Copies data from the tables in one cluster to tables in another (or the same) cluster.
You can run multiple `clickhouse-copier` instances on different servers to perform the same job. ZooKeeper is used for syncing the processes.
After starting, `clickhouse-copier`:
- Connects to ZooKeeper and receives:
    - Copying jobs.
    - The state of the copying jobs.
- Performs the jobs.
Each running process chooses the "closest" shard of the source cluster and copies the data into the destination cluster, resharding the data if necessary.
`clickhouse-copier` tracks the changes in ZooKeeper and applies them on the fly.
To reduce network traffic, we recommend running `clickhouse-copier` on the same server where the source data is located.
The utility is run manually with the following command-line parameters:

- `daemon` — Starts `clickhouse-copier` in daemon mode.
- `config` — The path to the `zookeeper.xml` file with the parameters for the connection to ZooKeeper.
- `task-path` — The path to the ZooKeeper node. This node is used for syncing `clickhouse-copier` processes and storing tasks. Tasks are stored in `$task-path/description`.
- `base-dir` — The path to logs and auxiliary files. When it starts, `clickhouse-copier` creates `clickhouse-copier_YYYYMMHHSS_<PID>` subdirectories in `$base-dir`. If this parameter is omitted, the directories are created in the directory where `clickhouse-copier` was launched.
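A typical invocation combines these parameters; in the sketch below the task path and directories are placeholders, not prescribed values:

```bash
clickhouse-copier --daemon \
    --config zookeeper.xml \
    --task-path /clickhouse/copier/task1 \
    --base-dir /var/log/clickhouse-copier
```

The `zookeeper.xml` passed via `config` usually contains just the logging settings and the ZooKeeper connection. A minimal sketch, assuming a single local ZooKeeper node (host, port, and logger values are illustrative):

```xml
<yandex>
    <logger>
        <level>trace</level>
        <size>100M</size>
        <count>3</count>
    </logger>

    <zookeeper>
        <node index="1">
            <host>127.0.0.1</host>
            <port>2181</port>
        </node>
    </zookeeper>
</yandex>
```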
An abridged example of a copying task description:

```xml
<yandex>
    ...
    <tables>
        <table_hits>
            ...
            <!-- Optional. Specifies which partitions to copy; if it is not specified, all partitions are copied.
                 If you already have some ready partitions in the destination cluster, they
                 will be removed at the start of the copying, since they will be interpreted
                 as unfinished data from the previous copying! -->
            <enabled_partitions>
                <partition>'2018-02-26'</partition>
                <partition>'2018-03-05'</partition>
                ...
            </enabled_partitions>
        </table_hits>

        <!-- Next table to copy. It is not copied until the previous table is fully copied. -->
        <table_visits>
            ...
        </table_visits>
        ...
    </tables>
</yandex>
```
`clickhouse-copier` tracks the changes in `/task/path/description` and applies them on the fly. For instance, if you change the value of `max_workers`, the number of processes running tasks will also change.
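The worker limit itself typically sits directly under the root element of that task description. A minimal sketch of that part (the value `2` is only illustrative):

```xml
<yandex>
    <!-- How many workers may process this task simultaneously;
         superfluous clickhouse-copier instances will stay idle. -->
    <max_workers>2</max_workers>
    ...
</yandex>
```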