top of page
Search

A CRC32-based GROUP BY strategy for MySQL

From time to time MySQL just doesn’t select the most effective way to execute a query. GROUP BY is a decent example. A client just required to add a unique key over some columns of a large table in GB’s size, and they first needed to find copies. In this case, we can do it in a quick and effective way.

For example we can have a table.


If we do the group by on address string, then it will take long time to find the duplicates.


But if we create one more column in table of crc_address int(10) unsigned NULL and store crc32 checksum in it and index that column . That will make our life easy and we can quickly find the duplicates in the address field.


Now if we query the table and group by on the crc_address field which is the crc32 checksum of address field. We can see the time difference.


This method will save significant amount of time for big tables even this method is helpful in joining big tables and save lot of time. Because crc32 generates the integer value which can be stored inside int(10) unsigned (4 Bytes) and processing integer value is quite faster then processing the string values. Down side of this method crc32 is quite noisy which can be reduced using it in combination with md5 hash. Thanks for using MySQL !

14 views0 comments

Recent Posts

See All

What are the future prospects of Java

According to a survey conducted, nearly 63% programmers mentioned that they will likely continue to work with Java along with coding with Python, SQL and HTML/CSS. In addition, the giants in the corpo

Deleting Duplicate Rows in MySQL

In this blog, you will study several ways to delete duplicate rows in MySQL. In this blog, I have shown you how to find duplicate values in a table. Once the duplicates rows are recognized, you may ne

Upload Data to MySQL tables using mysqlimport

Uploading quite a lot of rows of data from a text, csv, and excel file into a MySQL table is a repetitive task for sysadmins and DBAs who are handling MySQL database. This blog clarifies 4 applied exa

bottom of page