It has been few years already when clickhouse development team written blog post about how to update data in ClickHouse. Older version of ClickHouse only supports monthly partitions and changeable structures. We are waiting for easy approach to change the data and finally now we can do Updates and Deletes in ClickHouse. In this blog I will tell you how to do it.
Test Data
Load some test data into the ClickHouse table.
We will show how updates will work as Delete is going to be easy after that.
How it works
The sentence structure for updates and deletes is non-standard SQL. ClickHouse team needed to direct the modification from old-style SQL: new updates and deletes are batch actions, made asynchronously. It is even termed 'mutations'. Routine syntax highpoints the modification.
ALTER TABLE <table_name> DELETE WHERE <filter>;
and
ALTER TABLE <table_name> UPDATE col1 = expr1, ... WHERE <filter>;
In our test table there is column description.
Let's study that status 16 was a fault and we want to fix it. This is just 96% of the data, but without DELETE or UPDATE, we would have to load again the table.
It returns instantly, but the update is asynchronous, so we do not tell if data has been updated or not. Let's crisscross:
Appears to be working. The status of the update action can be observed up in system.mutations table:
Alternative exciting awareness gives system.parts table:
It displays that every part has been affected by update. But it has been fast for small test dataset.
How Delete works
Delete is simple to do.
Conclusion
ClickHouse lastly provisions UPDATE/DELETE functionality, however in its own tradition way. There are some restrictions, but it does the work when used for occasional actions. It makes ClickHouse GDPR submissive, since data can be deleted at all time.
Comments