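# clickhouse-copier util

The util copies table data from one cluster to new tables in another (possibly the same) cluster in a distributed and fault-tolerant manner.

The configuration of copying tasks is stored in a special ZooKeeper node (called the `/description` node). The ZooKeeper path to the description node is specified via the `--task-path /task/path` parameter, so the node `/task/path/description` must contain special XML content describing the copying tasks.

Many `clickhouse-copier` processes, located on any servers, can execute the same task simultaneously. The processes use the ZooKeeper node `/task/path/` to coordinate their work. You must not add additional child nodes to `/task/path/`.

Currently you are responsible for launching all `clickhouse-copier` processes manually. You can launch as many processes as you want, whenever and wherever you want. Each process tries to select the nearest available shard of the source cluster and copies some part of the data (a partition) from it to the whole destination cluster (with resharding). Therefore, it makes sense to launch copier processes on the source cluster nodes to reduce network usage.

Since the workers coordinate their work via ZooKeeper, in addition to `--task-path` you have to specify the ZooKeeper cluster configuration via the `--config-file zookeeper.xml` parameter. Example of `zookeeper.xml`:

```xml
<yandex>
    <zookeeper>
        <node index="1">
            <host>127.0.0.1</host>
            <port>2181</port>
        </node>
    </zookeeper>
</yandex>
```

When you run `clickhouse-copier --config-file zookeeper.xml --task-path /task/path`, the process connects to the ZooKeeper cluster, reads the task config from `/task/path/description`, and executes the tasks.

## Format of task config

Here is an example of `/task/path/description` content:

```xml
<yandex>
    <remote_servers>
        <source_cluster>
            <shard>
                <internal_replication>false</internal_replication>
                <replica>
                    <host>127.0.0.1</host>
                    <port>9000</port>
                </replica>
            </shard>
            ...
        </source_cluster>

        <destination_cluster>
            ...
        </destination_cluster>
    </remote_servers>

    <max_workers>2</max_workers>

    <settings_pull>
        <readonly>1</readonly>
    </settings_pull>

    <settings_push>
        <readonly>0</readonly>
    </settings_push>

    <settings>
        <connect_timeout>3</connect_timeout>
        <insert_distributed_sync>1</insert_distributed_sync>
    </settings>

    <tables>
        <table_hits>
            <cluster_pull>source_cluster</cluster_pull>
            <database_pull>test</database_pull>
            <table_pull>hits</table_pull>

            <cluster_push>destination_cluster</cluster_push>
            <database_push>test</database_push>
            <table_push>hits2</table_push>

            <engine>
            ENGINE=ReplicatedMergeTree('/clickhouse/tables/{cluster}/{shard}/hits2', '{replica}')
            PARTITION BY toMonday(date)
            ORDER BY (CounterID, EventDate)
            </engine>

            <sharding_key>jumpConsistentHash(intHash64(UserID), 2)</sharding_key>

            <where_condition>CounterID != 0</where_condition>

            <enabled_partitions>
                <partition>'2018-02-26'</partition>
                <partition>'2018-03-05'</partition>
                ...
            </enabled_partitions>
        </table_hits>
        ...
    </tables>
</yandex>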
```

`clickhouse-copier` processes watch the `/task/path/description` node for updates, so if you modify the config settings or the `max_workers` parameter, the running processes pick up the changes.

## Example

```bash
clickhouse-copier --daemon --config-file /path/to/copier/zookeeper.xml --task-path /clickhouse-copier/cluster1_tables_hits --base-dir /path/to/copier_logs
```

`--base-dir /path/to/copier_logs` specifies where the auxiliary and log files of the copier process are saved. In this case, the process creates a `/path/to/copier_logs/clickhouse-copier_YYYYMMHHSS_/` directory with log and status files. If it is not specified, the current directory is used (`/clickhouse-copier_YYYYMMHHSS_/` if it is run as a `--daemon`).
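Before any copier process can start, the task XML has to be stored in the `description` child of the task path. One way to do that is sketched below in Python, assuming the third-party `kazoo` ZooKeeper client (any ZooKeeper client would do). The script name `upload_task.py` and the local file `task.xml` are hypothetical. Because running copiers watch the description node, overwriting it with `set` is also how an updated config, for example a new `max_workers` value, reaches them.

```python
import sys
from pathlib import Path


def description_path(task_path: str) -> str:
    """Return the ZooKeeper node that holds the task description for task_path."""
    return task_path.rstrip("/") + "/description"


def upload_description(hosts: str, task_path: str, xml_file: str) -> None:
    """Create or overwrite <task_path>/description with the contents of xml_file."""
    # Assumption: kazoo is installed (pip install kazoo); imported lazily so the
    # path helper above can be used without it.
    from kazoo.client import KazooClient

    data = Path(xml_file).read_bytes()
    node = description_path(task_path)
    zk = KazooClient(hosts=hosts)
    zk.start()
    try:
        if zk.exists(node):
            # Overwrite in place; running copiers watch this node for changes.
            zk.set(node, data)
        else:
            # makepath=True also creates the parent task node if it is missing.
            zk.create(node, data, makepath=True)
    finally:
        zk.stop()
        zk.close()


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage: python upload_task.py 127.0.0.1:2181 /clickhouse-copier/cluster1_tables_hits task.xml
    upload_description(sys.argv[1], sys.argv[2], sys.argv[3])
```

Only the `description` node is written here, so the script respects the rule that no other child nodes are added under the task path by hand.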
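The `sharding_key` in the task example, `jumpConsistentHash(intHash64(UserID), 2)`, determines which of the two destination shards receives each row during resharding. The Python sketch below illustrates that mapping under stated assumptions: that ClickHouse's `jumpConsistentHash` is the standard Lamping and Veach jump consistent hash, and that `intHash64` can be approximated by MurmurHash3's `fmix64` finalizer, used here only as a stand-in. Results are not guaranteed to match a real cluster. `destination_shard` is a hypothetical helper that assumes two equally weighted destination shards.

```python
MASK64 = (1 << 64) - 1


def mix64(x: int) -> int:
    """Stand-in for intHash64 (assumption): MurmurHash3 fmix64 finalizer on 64 bits."""
    x &= MASK64
    x ^= x >> 33
    x = (x * 0xFF51AFD7ED558CCD) & MASK64
    x ^= x >> 33
    x = (x * 0xC4CEB9FE1A85EC53) & MASK64
    x ^= x >> 33
    return x


def jump_consistent_hash(key: int, num_buckets: int) -> int:
    """Lamping and Veach jump consistent hash: map a 64-bit key to [0, num_buckets)."""
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    key &= MASK64
    b, j = -1, 0
    while j < num_buckets:
        b = j
        # 64-bit linear congruential step, emulated with an explicit mask.
        key = (key * 2862933555777941757 + 1) & MASK64
        # Jump to the next candidate bucket using the top 31 bits of the key.
        j = int((b + 1) * ((1 << 31) / ((key >> 33) + 1)))
    return b


def destination_shard(user_id: int, shards: int = 2) -> int:
    """Hypothetical helper mimicking jumpConsistentHash(intHash64(UserID), shards)."""
    return jump_consistent_hash(mix64(user_id), shards)


if __name__ == "__main__":
    counts = [0, 0]
    for user_id in range(10_000):
        counts[destination_shard(user_id)] += 1
    print("rows per destination shard:", counts)
```

A jump consistent hash is a natural fit for a resharding key: when the number of buckets grows from `n` to `n + 1`, each key either stays in its bucket or moves to the new bucket `n`, so only about `1/(n + 1)` of the keys relocate.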