# clickhouse-copier
Copies data from the tables in one cluster to tables in another (or the same) cluster.
You can run multiple `clickhouse-copier` instances on different servers to perform the same job. ZooKeeper is used for syncing the processes.
After starting, `clickhouse-copier`:
- Connects to ZooKeeper and receives:
- Copying jobs.
- The state of the copying jobs.
- It performs the jobs.
Each running process chooses the "closest" shard of the source cluster and copies the data into the destination cluster, resharding the data if necessary.
`clickhouse-copier` tracks the changes in ZooKeeper and applies them on the fly.
To reduce network traffic, we recommend running `clickhouse-copier` on the same server where the source data is located.
## Running clickhouse-copier
The utility should be run manually:
```bash
clickhouse-copier copier --daemon --config zookeeper.xml --task-path /task/path --base-dir /path/to/dir
```
Parameters:
- `daemon` — Starts `clickhouse-copier` in daemon mode.
- `config` — The path to the `zookeeper.xml` file with the parameters for the connection to ZooKeeper.
- `task-path` — The path to the ZooKeeper node. This node is used for syncing `clickhouse-copier` processes and storing tasks. Tasks are stored in `$task-path/description`.
- `base-dir` — The path to logs and auxiliary files. When it starts, `clickhouse-copier` creates `clickhouse-copier_YYYYMMHHSS_` subdirectories in `$base-dir`. If this parameter is omitted, the directories are created in the directory where `clickhouse-copier` was launched.
## Format of zookeeper.xml
```xml
127.0.0.1
2181
```
## Configuration of copying tasks
```xml
false
127.0.0.1
9000
...
...
2
1
0
3
1
source_cluster
test
hits
destination_cluster
test
hits2
ENGINE=ReplicatedMergeTree('/clickhouse/tables/{cluster}/{shard}/hits2', '{replica}')
PARTITION BY toMonday(date)
ORDER BY (CounterID, EventDate)
jumpConsistentHash(intHash64(UserID), 2)
CounterID != 0
'2018-02-26'
'2018-03-05'
...
...
...
```
`clickhouse-copier` tracks the changes in `/task/path/description` and applies them on the fly. For instance, if you change the value of `max_workers`, the number of processes running tasks will also change.