In this blog, you will study several ways to delete duplicate rows in MySQL.
In this blog, I have shown you how to find duplicate values in a table. Once the duplicates rows are recognized, you may need to delete them to clean up your data.
Prepare sample data
The subsequent script creates the contacts table and inserts sample data into the contacts table for the demo.
You can perform this script to re-form test data after you execute a DELETE statement.
The subsequent query returns the duplicate emails in the contacts table:
As you can understand, we have four rows with duplicate emails.
Delete duplicate rows using DELETE JOIN statement
MySQL make available you with the DELETE JOIN statement that you can use to remove duplicate rows speedily.
The subsequent statement deletes duplicate rows and retains the highest id:
It specified that four rows have been deleted. You can execute the query that find duplicate emails once more to confirm the delete:
The query returns an empty set, which means that the duplicate rows have been deleted.
Let’s verify data from the contacts table:
The rows with id 2, 4, 7, and 9 have been deleted.
In case you want to delete duplicate rows and keep the lowest id, you can use the following statement:
Note that you can execute the script for creating contacts table again and test this query. The following output shows the data of the contacts table after removing duplicate rows.
Delete duplicate rows using an intermediate table
The subsequent shows the steps for removing duplicate rows using an in-between table:
Create a new table with the structure the same as the original table which you want to delete duplicate rows.
Insert distinct rows from the original table to the immediate table.
Drop the original table and rename the immediate table to the original table.
The subsequent queries illustrate the steps:
CREATE TABLE source_copy LIKE source;
INSERT INTO source_copy
SELECT * FROM source
GROUP BY col; -- column that has duplicate values
DROP TABLE source;
ALTER TABLE source_copy RENAME TO source;
For example, the subsequent statements delete rows with duplicate emails from the contacts table:
Delete duplicate rows using ROW_NUMBER() function
Note that the ROW_NUMBER() function has been supported since MySQL version 8.02 so you should check your MySQL version before using the function.
The subsequent statement uses the ROW_NUMBER() function to assign a consecutive integer number to each row. If the email is duplicate, the row number will be greater than one.
The subsequent statement returns the id list of the duplicate rows:
And you just delete the duplicate rows from the contacts table using the DELETE statement with a subquery in the WHERE clause:
MySQL issued the subsequent message:
4 row(s) affected
In this blog, you have learned how to delete duplicate rows in MySQL by using the DELETE JOIN statement or an intermediate table.