ClickHouse/docs/en/operations/table_engines/replacingmergetree.md
Ivan Blinkov 0a4a5b36cc
Some WIP on documentation refactoring (#2659)
* Additional .gitignore entries

* Merge a bunch of small articles about system tables into single one

* Merge a bunch of small articles about formats into single one

* Adapt table with formats to English docs too

* Add SPb meetup link to main page

* Move Utilities out of top level of docs (the location is probably not yet final) + translate couple articles

* Merge MacOS.md into build_osx.md

* Move Data types higher in ToC

* Publish changelog on website alongside documentation

* Few fixes for en/table_engines/file.md

* Use smaller header sizes in changelogs

* Group up table engines inside ToC

* Move table engines out of top level too

* Specificy in ToC that query language is SQL based. Thats a bit excessive, but catches eye.

* Move stuff that is part of query language into respective folder

* Move table functions lower in ToC

* Lost redirects.txt update

* Do not rely on comments in yaml + fix few ru titles

* Extract major parts of queries.md into separate articles

* queries.md has been supposed to be removed

* Fix weird translation

* Fix a bunch of links

* There is only table of contents left

* "Query language" is actually part of SQL abbreviation

* Change filename in README.md too

* fix mistype
2018-07-18 13:00:53 +03:00

1.2 KiB

ReplacingMergeTree

This engine table differs from MergeTree in that it removes duplicate entries with the same primary key value.

The last optional parameter for the table engine is the version column. When merging, it reduces all rows with the same primary key value to just one row. If the version column is specified, it leaves the row with the highest version; otherwise, it leaves the last row.

The version column must have a type from the UInt family, Date, or DateTime.

ReplacingMergeTree(EventDate, (OrderID, EventDate, BannerID, ...), 8192, ver)

Note that data is only deduplicated during merges. Merging occurs in the background at an unknown time, so you can't plan for it. Some of the data may remain unprocessed. Although you can run an unscheduled merge using the OPTIMIZE query, don't count on using it, because the OPTIMIZE query will read and write a large amount of data.

Thus, ReplacingMergeTree is suitable for clearing out duplicate data in the background in order to save space, but it doesn't guarantee the absence of duplicates.

This engine is not used in Yandex.Metrica, but it has been applied in other Yandex projects.