A CRC32-based GROUP BY strategy for MySQL

From time to time MySQL just doesn’t select the most effective way to execute a query. GROUP BY is a decent example. A client just required to add a unique key over some columns of a large table in GB’s size, and they first needed to find copies. In this case, we can do it in a quick and effective way.

For example we can have a table.

If we do the group by on address string, then it will take long time to find the duplicates.

But if we create one more column in table of crc_address int(10) unsigned NULL and store crc32 checksum in it and index that column . That will make our life easy and we can quickly find the duplicates in the address field.

Now if we query the table and group by on the crc_address field which is the crc32 checksum of address field. We can see the time difference.

This method will save significant amount of time for big tables even this method is helpful in joining big tables and save lot of time. Because crc32 generates the integer value which can be stored inside int(10) unsigned (4 Bytes) and processing integer value is quite faster then processing the string values. Down side of this method crc32 is quite noisy which can be reduced using it in combination with md5 hash. Thanks for using MySQL !

GoplarDB

The Database Experts

A CRC32-based GROUP BY strategy for MySQL

Recent Posts