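# clickhouse-copier

Copies data from the tables in one cluster to tables in another (or the same) cluster.

You can run multiple `clickhouse-copier` instances on different servers to perform the same job. ZooKeeper is used for syncing the processes.

After starting, `clickhouse-copier`:

- Connects to ZooKeeper and receives:
    - Copying jobs.
    - The state of the copying jobs.

- Performs the jobs.

    Each running process chooses the "closest" shard of the source cluster and copies the data into the destination cluster, resharding the data if necessary.

`clickhouse-copier` tracks the changes in ZooKeeper and applies them on the fly.

To reduce network traffic, we recommend running `clickhouse-copier` on the same server where the source data is located.

## Running clickhouse-copier

The utility should be run manually:

```bash
clickhouse-copier copier --daemon --config zookeeper.xml --task-path /task/path --base-dir /path/to/dir
```

Parameters:

- `daemon` — Starts `clickhouse-copier` in daemon mode.
- `config` — The path to the `zookeeper.xml` file with the parameters for the connection to ZooKeeper.
- `task-path` — The path to the ZooKeeper node. This node is used for syncing `clickhouse-copier` processes and storing tasks. Tasks are stored in `$task-path/description`.
- `task-file` — Optional path to a file with the task configuration for the initial upload to ZooKeeper.
- `task-upload-force` — Forces the upload of `task-file` even if the node already exists.
- `base-dir` — The path to logs and auxiliary files. When it starts, `clickhouse-copier` creates `clickhouse-copier_YYYYMMHHSS_<PID>` subdirectories in `$base-dir`. If this parameter is omitted, the directories are created in the directory where `clickhouse-copier` was launched.

## Format of zookeeper.xml

```xml
<yandex>
    <logger>
        <level>trace</level>
        <size>100M</size>
        <count>3</count>
    </logger>

    <zookeeper>
        <node index="1">
            <host>127.0.0.1</host>
            <port>2181</port>
        </node>
    </zookeeper>
</yandex>
```

## Configuration of copying tasks

```xml
<yandex>
    <!-- Cluster definitions, as in an ordinary server config -->
    <remote_servers>
        <source_cluster>
            <shard>
                <internal_replication>false</internal_replication>
                <replica>
                    <host>127.0.0.1</host>
                    <port>9000</port>
                </replica>
            </shard>
            ...
        </source_cluster>

        <destination_cluster>
        ...
        </destination_cluster>
    </remote_servers>

    <!-- Maximum number of simultaneously active workers -->
    <max_workers>2</max_workers>

    <!-- Settings used to fetch (pull) data from the source cluster tables -->
    <settings_pull>
        <readonly>1</readonly>
    </settings_pull>

    <!-- Settings used to insert (push) data into the destination cluster tables -->
    <settings_push>
        <readonly>0</readonly>
    </settings_push>

    <!-- Common settings for pull and push operations -->
    <settings>
        <connect_timeout>3</connect_timeout>
        <insert_distributed_sync>1</insert_distributed_sync>
    </settings>

    <!-- Copying tasks, one section per table -->
    <tables>
        <table_hits>
            <!-- Source cluster, database, and table -->
            <cluster_pull>source_cluster</cluster_pull>
            <database_pull>test</database_pull>
            <table_pull>hits</table_pull>

            <!-- Destination cluster, database, and table -->
            <cluster_push>destination_cluster</cluster_push>
            <database_push>test</database_push>
            <table_push>hits2</table_push>

            <!-- Engine of the destination tables -->
            <engine>
            ENGINE=ReplicatedMergeTree('/clickhouse/tables/{cluster}/{shard}/hits2', '{replica}')
            PARTITION BY toMonday(date)
            ORDER BY (CounterID, EventDate)
            </engine>

            <!-- Sharding key used to insert data into the destination cluster -->
            <sharding_key>jumpConsistentHash(intHash64(UserID), 2)</sharding_key>

            <!-- Optional filter applied when pulling data from the source -->
            <where_condition>CounterID != 0</where_condition>

            <!-- Partitions to copy -->
            <enabled_partitions>
                <partition>'2018-02-26'</partition>
                <partition>'2018-03-05'</partition>
                ...
            </enabled_partitions>
        </table_hits>

        <table_visits>
        ...
        </table_visits>
        ...
    </tables>
</yandex>
```

`clickhouse-copier` tracks the changes in `/task/path/description` and applies them on the fly. For instance, if you change the value of `max_workers`, the number of processes running tasks will also change.

[Original article](https://clickhouse.yandex/docs/en/operations/utils/clickhouse-copier/)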
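
As a rough illustration of how the parameters from "Running clickhouse-copier" fit together when several workers share one job, the sketch below only prints the command lines and starts nothing. `copier_cmd` is a hypothetical helper, and `task.xml` and the paths are placeholder values. The command form mirrors the invocation shown on this page. The sketch assumes that only the first worker needs `--task-file`, since that option is described as being for the initial upload to ZooKeeper.

```bash
#!/usr/bin/env bash
# Dry-run sketch: prints clickhouse-copier command lines instead of running them.
# copier_cmd is a hypothetical helper; task.xml and the paths are placeholders.

copier_cmd() {
    local task_path="$1"      # ZooKeeper node shared by all workers
    local base_dir="$2"       # directory for logs and auxiliary files
    local task_file="${3:-}"  # optional: task configuration to upload

    local cmd="clickhouse-copier copier --daemon --config zookeeper.xml"
    cmd="${cmd} --task-path ${task_path} --base-dir ${base_dir}"
    if [ -n "${task_file}" ]; then
        # Assumption: only the first worker uploads the task description.
        cmd="${cmd} --task-file ${task_file}"
    fi
    printf '%s\n' "${cmd}"
}

# First worker: would upload task.xml for the job at /task/path.
copier_cmd /task/path /path/to/dir task.xml

# Additional workers, possibly on other servers: would join the same job.
copier_cmd /task/path /path/to/dir
```

Printing each command first makes it easy to check the arguments before starting real workers on each server.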