This is a "usability test" of TimescaleDB. I have not used TimescaleDB before. I will try to install it, load the data and run benchmarks, and record every obstacle that I face.
Usability testing needs to be conducted by the most clueless person in the room. Doing this "usability test" requires a bit of patience and courage (to publish all the struggles as is).

Note: instead of using a clean VM, I have to run the benchmark on exactly the same bare-metal server where all the other benchmarks were run.

## Installation

Install as follows:
https://docs.timescale.com/timescaledb/latest/how-to-guides/install-timescaledb/self-hosted/ubuntu/installation-apt-ubuntu/#installation-apt-ubuntu

I've noticed that the TimescaleDB documentation website does not have a favicon, in contrast to the main page.
In other respects, it is quite neat.

```
sudo apt install postgresql-common
sudo sh /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh
sudo sh -c "echo 'deb [signed-by=/usr/share/keyrings/timescale.keyring] https://packagecloud.io/timescale/timescaledb/ubuntu/ $(lsb_release -c -s) main' > /etc/apt/sources.list.d/timescaledb.list"
wget --quiet -O - https://packagecloud.io/timescale/timescaledb/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/timescale.keyring
sudo apt-get update
sudo apt install timescaledb-2-postgresql-13
```

The documentation recommends tuning it:

```
sudo apt install timescaledb-tune

sudo timescaledb-tune --quiet --yes
Using postgresql.conf at this path:
/etc/postgresql/13/main/postgresql.conf

Writing backup to:
/tmp/timescaledb_tune.backup202110292328

Recommendations based on 125.88 GB of available memory and 32 CPUs for PostgreSQL 13
shared_preload_libraries = 'timescaledb' # (change requires restart)
shared_buffers = 32226MB
effective_cache_size = 96678MB
maintenance_work_mem = 2047MB
work_mem = 10312kB
timescaledb.max_background_workers = 8
max_worker_processes = 43
max_parallel_workers_per_gather = 16
max_parallel_workers = 32
wal_buffers = 16MB
min_wal_size = 512MB
default_statistics_target = 500
random_page_cost = 1.1
checkpoint_completion_target = 0.9
max_locks_per_transaction = 512
autovacuum_max_workers = 10
autovacuum_naptime = 10
effective_io_concurrency = 256
timescaledb.last_tuned = '2021-10-29T23:28:49+03:00'
timescaledb.last_tuned_version = '0.12.0'
Saving changes to: /etc/postgresql/13/main/postgresql.conf
```
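
As far as I can tell, the two big memory settings are simple fractions of the detected RAM: `shared_buffers` is about a quarter of it and `effective_cache_size` about three quarters. Here is a small sketch of that arithmetic. The function name `memory_fractions` is mine, the fractions are my guess from the numbers above rather than something taken from the tool's source, and 128904 MB is my own conversion of the reported 125.88 GB.

```bash
# Hypothetical helper: reproduce two of the values printed above,
# assuming shared_buffers = 1/4 and effective_cache_size = 3/4 of RAM.
memory_fractions() {
    total_mb=$1
    echo "shared_buffers = $((total_mb / 4))MB"
    echo "effective_cache_size = $((total_mb * 3 / 4))MB"
}

# 125.88 GB is roughly 128904 MB (an assumed rounding of the reported figure).
memory_fractions 128904
```

With that input the sketch prints the same two values as the tool, which at least makes the guess plausible.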

```
sudo service postgresql restart
```

Post-install setup:
https://docs.timescale.com/timescaledb/latest/how-to-guides/install-timescaledb/post-install-setup/

```
$ psql -U postgres -h localhost
Password for user postgres:
psql: error: connection to server at "localhost" (::1), port 5432 failed: fe_sendauth: no password supplied
```

How do I set up a password?

```
milovidov@mtlog-perftest03j:~$ psql -U postgres -h localhost
Password for user postgres:
psql: error: connection to server at "localhost" (::1), port 5432 failed: fe_sendauth: no password supplied
milovidov@mtlog-perftest03j:~$ psql
psql: error: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: FATAL: role "milovidov" does not exist
milovidov@mtlog-perftest03j:~$ sudo psql
psql: error: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: FATAL: role "root" does not exist
milovidov@mtlog-perftest03j:~$ psql -U postgres
psql: error: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: FATAL: Peer authentication failed for user "postgres"
milovidov@mtlog-perftest03j:~$ psql -U postgres -h localost
psql: error: could not translate host name "localost" to address: Name or service not known
milovidov@mtlog-perftest03j:~$ sudo psql -U postgres -h localost
psql: error: could not translate host name "localost" to address: Name or service not known
milovidov@mtlog-perftest03j:~$ sudo psql -U postgres -h localhost
Password for user postgres:
psql: error: connection to server at "localhost" (::1), port 5432 failed: fe_sendauth: no password supplied
milovidov@mtlog-perftest03j:~$ sudo -u postgres psql -h localhost
Password for user postgres:
psql: error: connection to server at "localhost" (::1), port 5432 failed: fe_sendauth: no password supplied
```

I found an answer here: https://stackoverflow.com/questions/12720967/how-to-change-postgresql-user-password

```
$ sudo -u postgres psql
psql (13.4 (Ubuntu 13.4-4.pgdg18.04+1), server 9.5.25)
Type "help" for help.

postgres=# \password postgres
Enter new password:
Enter it again:
postgres=#

CREATE database tutorial;

postgres=# CREATE EXTENSION IF NOT EXISTS timescaledb;
ERROR: could not open extension control file "/usr/share/postgresql/9.5/extension/timescaledb.control": No such file or directory
```

Looks like I have an old PostgreSQL.

```
$ ls -l /usr/share/postgresql/
10/ 11/ 13/ 9.5/
```

But there is also a newer PostgreSQL.

```
$ psql --version
psql (PostgreSQL) 13.4 (Ubuntu 13.4-4.pgdg18.04+1)
```

psql is new, so what is wrong?

Looks like I have all versions running simultaneously?

https://askubuntu.com/questions/17823/how-to-list-all-installed-packages

```
$ ps auxw | grep postgres
postgres 718818 0.0 0.5 33991600 730184 ? Ss 23:29 0:00 /usr/lib/postgresql/13/bin/postgres -D /var/lib/postgresql/13/main -c config_file=/etc/postgresql/13/main/postgresql.conf
postgres 718825 0.0 0.0 320356 27660 ? S 23:29 0:00 /usr/lib/postgresql/10/bin/postgres -D /var/lib/postgresql/10/main -c config_file=/etc/postgresql/10/main/postgresql.conf
postgres 718826 0.0 0.0 320712 27900 ? S 23:29 0:00 /usr/lib/postgresql/11/bin/postgres -D /var/lib/postgresql/11/main -c config_file=/etc/postgresql/11/main/postgresql.conf
postgres 718829 0.0 0.0 320468 7092 ? Ss 23:29 0:00 postgres: 10/main: checkpointer process
postgres 718830 0.0 0.0 320356 4300 ? Ss 23:29 0:00 postgres: 10/main: writer process
postgres 718831 0.0 0.0 320356 9204 ? Ss 23:29 0:00 postgres: 10/main: wal writer process
postgres 718832 0.0 0.0 320776 6964 ? Ss 23:29 0:00 postgres: 10/main: autovacuum launcher process
postgres 718833 0.0 0.0 175404 3596 ? Ss 23:29 0:00 postgres: 10/main: stats collector process
postgres 718834 0.0 0.0 320640 5052 ? Ss 23:29 0:00 postgres: 10/main: bgworker: logical replication launcher
postgres 718835 0.0 0.0 320820 5592 ? Ss 23:29 0:00 postgres: 11/main: checkpointer
postgres 718836 0.0 0.0 320712 4164 ? Ss 23:29 0:00 postgres: 11/main: background writer
postgres 718837 0.0 0.0 320712 9040 ? Ss 23:29 0:00 postgres: 11/main: walwriter
postgres 718838 0.0 0.0 321116 6824 ? Ss 23:29 0:00 postgres: 11/main: autovacuum launcher
postgres 718839 0.0 0.0 175752 3652 ? Ss 23:29 0:00 postgres: 11/main: stats collector
postgres 718840 0.0 0.0 321120 6640 ? Ss 23:29 0:00 postgres: 11/main: logical replication launcher
postgres 718842 0.0 0.1 33991700 263860 ? Ss 23:29 0:00 postgres: 13/main: checkpointer
postgres 718843 0.0 0.2 33991600 264096 ? Ss 23:29 0:00 postgres: 13/main: background writer
postgres 718844 0.0 0.0 33991600 22044 ? Ss 23:29 0:00 postgres: 13/main: walwriter
postgres 718845 0.0 0.0 33992284 7040 ? Ss 23:29 0:00 postgres: 13/main: autovacuum launcher
postgres 718846 0.0 0.0 177920 4320 ? Ss 23:29 0:00 postgres: 13/main: stats collector
postgres 718847 0.0 0.0 33992136 7972 ? Ss 23:29 0:00 postgres: 13/main: TimescaleDB Background Worker Launcher
postgres 718848 0.0 0.0 33992164 7248 ? Ss 23:29 0:00 postgres: 13/main: logical replication launcher
postgres 718857 0.0 0.0 304492 26284 ? S 23:29 0:00 /usr/lib/postgresql/9.5/bin/postgres -D /var/lib/postgresql/9.5/main -c config_file=/etc/postgresql/9.5/main/postgresql.conf
postgres 718859 0.0 0.0 304592 6480 ? Ss 23:29 0:00 postgres: checkpointer process
postgres 718860 0.0 0.0 304492 5656 ? Ss 23:29 0:00 postgres: writer process
postgres 718861 0.0 0.0 304492 4144 ? Ss 23:29 0:00 postgres: wal writer process
postgres 718862 0.0 0.0 304928 6896 ? Ss 23:29 0:00 postgres: autovacuum launcher process
postgres 718863 0.0 0.0 159744 4156 ? Ss 23:29 0:00 postgres: stats collector process
milovid+ 724277 0.0 0.0 14364 1024 pts/17 S+ 23:41 0:00 grep --color=auto postgres

$ apt list --installed | grep postgres

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

postgresql-10/now 10.16-1.pgdg18.04+1 amd64 [installed,upgradable to: 10.18-1.pgdg18.04+1]
postgresql-11/now 11.11-1.pgdg18.04+1 amd64 [installed,upgradable to: 11.13-1.pgdg18.04+1]
postgresql-11-postgis-3/now 3.1.1+dfsg-1.pgdg18.04+1 amd64 [installed,upgradable to: 3.1.4+dfsg-1.pgdg18.04+1]
postgresql-11-postgis-3-scripts/now 3.1.1+dfsg-1.pgdg18.04+1 all [installed,upgradable to: 3.1.4+dfsg-1.pgdg18.04+1]
postgresql-13/bionic-pgdg,now 13.4-4.pgdg18.04+1 amd64 [installed,automatic]
postgresql-9.5/bionic-pgdg,now 9.5.25-1.pgdg18.04+1 amd64 [installed]
postgresql-9.5-postgis-2.2-scripts/now 2.2.2+dfsg-4.pgdg14.04+1.yandex all [installed,local]
postgresql-client-10/now 10.16-1.pgdg18.04+1 amd64 [installed,upgradable to: 10.18-1.pgdg18.04+1]
postgresql-client-11/now 11.11-1.pgdg18.04+1 amd64 [installed,upgradable to: 11.13-1.pgdg18.04+1]
postgresql-client-13/bionic-pgdg,now 13.4-4.pgdg18.04+1 amd64 [installed,automatic]
postgresql-client-9.5/bionic-pgdg,now 9.5.25-1.pgdg18.04+1 amd64 [installed]
postgresql-client-common/bionic-pgdg,now 231.pgdg18.04+1 all [installed]
postgresql-common/bionic-pgdg,now 231.pgdg18.04+1 all [installed]
timescaledb-2-loader-postgresql-13/bionic,now 2.5.0~ubuntu18.04 amd64 [installed,automatic]
timescaledb-2-postgresql-13/bionic,now 2.5.0~ubuntu18.04 amd64 [installed]
```

Let's remove all older packages.

```
sudo apt remove postgresql-10 postgresql-11 postgresql-9.5 postgresql-client-10 postgresql-client-11 postgresql-client-9.5
```

Just in case:

```
sudo service postgresql restart
```

Now it stopped working:

```
$ sudo -u postgres psql
psql: error: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
Is the server running locally and accepting connections on that socket?

$ sudo -u postgres psql -h localhost
psql: error: connection to server at "localhost" (::1), port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
connection to server at "localhost" (127.0.0.1), port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
```

But it's running:

```
$ ps auxw | grep postgres
postgres 726158 0.5 0.5 33991600 730084 ? Ss 23:45 0:00 /usr/lib/postgresql/13/bin/postgres -D /var/lib/postgresql/13/main -c config_file=/etc/postgresql/13/main/postgresql.conf
postgres 726160 0.0 0.0 33991600 4256 ? Ss 23:45 0:00 postgres: 13/main: checkpointer
postgres 726161 0.1 0.1 33991600 150048 ? Ss 23:45 0:00 postgres: 13/main: background writer
postgres 726162 0.0 0.0 33991600 22044 ? Ss 23:45 0:00 postgres: 13/main: walwriter
postgres 726163 0.0 0.0 33992284 6976 ? Ss 23:45 0:00 postgres: 13/main: autovacuum launcher
postgres 726164 0.0 0.0 177920 4384 ? Ss 23:45 0:00 postgres: 13/main: stats collector
postgres 726165 0.0 0.0 33992136 7840 ? Ss 23:45 0:00 postgres: 13/main: TimescaleDB Background Worker Launcher
postgres 726166 0.0 0.0 33992164 7244 ? Ss 23:45 0:00 postgres: 13/main: logical replication launcher
milovid+ 726578 0.0 0.0 14364 1100 pts/17 S+ 23:46 0:00 grep --color=auto postgres
```

But it does not listen on port 5432:

```
$ netstat -n | grep 5432
```
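
One caveat about this check: as far as I know, `netstat -n` without `-l` (or `-a`) lists only connected sockets, so an empty result does not prove that nothing is listening. Below is a minimal sketch of a stricter check, assuming `ss` from iproute2 is available. The helper name `listening_on_port` and the sample line are made up for illustration and were not captured from this server.

```bash
# Reads `ss -ltn`-style output on stdin and succeeds if some socket
# is in LISTEN state on the given port (4th column is "address:port").
listening_on_port() {
    awk -v port="$1" '$1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 } END { exit !found }'
}

# Illustrative input only, not real output from this machine.
sample='LISTEN 0 128 127.0.0.1:5435 0.0.0.0:*'
if printf '%s\n' "$sample" | listening_on_port 5432; then
    echo "5432: listening"
else
    echo "5432: not listening"
fi
```

On the real server the call would be `ss -ltn | listening_on_port 5432`.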

Let's look at the config:

```
sudo mcedit /etc/postgresql/13/main/postgresql.conf
```

```
# - Connection Settings -

#listen_addresses = 'localhost'
```

Looks like I need to uncomment it.

```
sudo service postgresql restart
```

But it did not help:

```
$ sudo -u postgres psql -h localhost
psql: error: connection to server at "localhost" (::1), port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
connection to server at "localhost" (127.0.0.1), port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
```

Let's consult https://stackoverflow.com/questions/31091748/postgres-server-not-listening

It mentions some pg_hba.conf. BTW, what is HBA*? Let's find this file...

```
sudo mcedit /etc/postgresql/13/main/pg_hba.conf
```

\* Host-based authentication rules; it is explained inside this file.

Nothing is wrong in this file...

```
$ sudo service postgresql status
● postgresql.service - PostgreSQL RDBMS
Loaded: loaded (/lib/systemd/system/postgresql.service; enabled; vendor preset: enabled)
Active: active (exited) since Fri 2021-10-29 23:50:14 MSK; 6min ago
Process: 728545 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
Main PID: 728545 (code=exited, status=0/SUCCESS)

Oct 29 23:50:14 mtlog-perftest03j systemd[1]: postgresql.service: Changed dead -> start
Oct 29 23:50:14 mtlog-perftest03j systemd[1]: Starting PostgreSQL RDBMS...
Oct 29 23:50:14 mtlog-perftest03j systemd[728545]: postgresql.service: Executing: /bin/true
Oct 29 23:50:14 mtlog-perftest03j systemd[1]: postgresql.service: Child 728545 belongs to postgresql.service.
Oct 29 23:50:14 mtlog-perftest03j systemd[1]: postgresql.service: Main process exited, code=exited, status=0/SUCCESS
Oct 29 23:50:14 mtlog-perftest03j systemd[1]: postgresql.service: Changed start -> exited
Oct 29 23:50:14 mtlog-perftest03j systemd[1]: postgresql.service: Job postgresql.service/start finished, result=done
Oct 29 23:50:14 mtlog-perftest03j systemd[1]: Started PostgreSQL RDBMS.
Oct 29 23:50:14 mtlog-perftest03j systemd[1]: postgresql.service: Failed to send unit change signal for postgresql.service: Connection reset by peer
```

It's quite cryptic. What does "Failed to send unit change signal" mean? Is it good or bad?
What is the "unit"? Maybe it is a "systemd unit", a phrase that I've heard many times but don't really understand.

Almost gave up... Wow, I found the culprit! In `/etc/postgresql/13/main/postgresql.conf`:

```
port = 5435
```

Most likely this happened because multiple versions of PostgreSQL were installed.

Let's change it to 5432.

```
sudo mcedit /etc/postgresql/13/main/postgresql.conf
sudo service postgresql restart
```
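
In hindsight, opening the editor is not required just to see the port. Here is a minimal sketch of pulling it out of a config file with `sed`. The helper `conf_port` is my own, and it only understands the plain `port = N` form: it does not follow `include` directives or handle quoted values.

```bash
# Print the port a postgresql.conf-style file sets; falls back to 5432,
# the PostgreSQL default, when no uncommented "port = N" line exists.
# If the setting appears several times, the last one is reported.
conf_port() {
    port=$(sed -n 's/^[[:space:]]*port[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1)
    echo "${port:-5432}"
}

# Demonstration on a throwaway file containing the line I found above.
tmp=$(mktemp)
printf '%s\n' "#listen_addresses = 'localhost'" "port = 5435" > "$tmp"
conf_port "$tmp"
rm -f "$tmp"
```

Running it over each `/etc/postgresql/*/main/postgresql.conf` would have shown the conflict right away. On Debian and Ubuntu, `pg_lsclusters` from `postgresql-common` should list the port of every cluster as well.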

But now it does not accept the password:

```
milovidov@mtlog-perftest03j:~$ sudo -u postgres psql -h 127.0.0.1
Password for user postgres:
psql: error: connection to server at "127.0.0.1", port 5432 failed: fe_sendauth: no password supplied
milovidov@mtlog-perftest03j:~$ sudo -u postgres psql -h 127.0.0.1 --password ''
Password:
psql: error: connection to server at "127.0.0.1", port 5432 failed: fe_sendauth: no password supplied
milovidov@mtlog-perftest03j:~$ sudo -u postgres psql -h 127.0.0.1
Password for user postgres:
psql: error: connection to server at "127.0.0.1", port 5432 failed: fe_sendauth: no password supplied
```

It works this way:

```
$ sudo -u postgres psql
psql (13.4 (Ubuntu 13.4-4.pgdg18.04+1))
Type "help" for help.

postgres=# \password
Enter new password:
Enter it again:
```

It works, with fine ASCII art:

```
postgres=# CREATE database tutorial;
CREATE DATABASE
postgres=# \c tutorial
You are now connected to database "tutorial" as user "postgres".
tutorial=# CREATE EXTENSION IF NOT EXISTS timescaledb;
WARNING:
WELCOME TO
_____ _ _ ____________
|_ _(_) | | | _ \ ___ \
| | _ _ __ ___ ___ ___ ___ __ _| | ___| | | | |_/ /
| | | | _ ` _ \ / _ \/ __|/ __/ _` | |/ _ \ | | | ___ \
| | | | | | | | | __/\__ \ (_| (_| | | __/ |/ /| |_/ /
|_| |_|_| |_| |_|\___||___/\___\__,_|_|\___|___/ \____/
Running version 2.5.0
For more information on TimescaleDB, please visit the following links:

1. Getting started: https://docs.timescale.com/timescaledb/latest/getting-started
2. API reference documentation: https://docs.timescale.com/api/latest
3. How TimescaleDB is designed: https://docs.timescale.com/timescaledb/latest/overview/core-concepts

Note: TimescaleDB collects anonymous reports to better understand and assist our users.
For more information and how to disable, please see our docs https://docs.timescale.com/timescaledb/latest/how-to-guides/configuration/telemetry.

CREATE EXTENSION
```


## Creating Table

Continuing to https://docs.timescale.com/timescaledb/latest/how-to-guides/hypertables/create/

Create table:

```
CREATE TABLE hits_100m_obfuscated (
    WatchID BIGINT,
    JavaEnable SMALLINT,
    Title TEXT,
    GoodEvent SMALLINT,
    EventTime TIMESTAMP,
    EventDate Date,
    CounterID INTEGER,
    ClientIP INTEGER,
    RegionID INTEGER,
    UserID BIGINT,
    CounterClass SMALLINT,
    OS SMALLINT,
    UserAgent SMALLINT,
    URL TEXT,
    Referer TEXT,
    Refresh SMALLINT,
    RefererCategoryID SMALLINT,
    RefererRegionID INTEGER,
    URLCategoryID SMALLINT,
    URLRegionID INTEGER,
    ResolutionWidth SMALLINT,
    ResolutionHeight SMALLINT,
    ResolutionDepth SMALLINT,
    FlashMajor SMALLINT,
    FlashMinor SMALLINT,
    FlashMinor2 TEXT,
    NetMajor SMALLINT,
    NetMinor SMALLINT,
    UserAgentMajor SMALLINT,
    UserAgentMinor CHAR(2),
    CookieEnable SMALLINT,
    JavascriptEnable SMALLINT,
    IsMobile SMALLINT,
    MobilePhone SMALLINT,
    MobilePhoneModel TEXT,
    Params TEXT,
    IPNetworkID INTEGER,
    TraficSourceID SMALLINT,
    SearchEngineID SMALLINT,
    SearchPhrase TEXT,
    AdvEngineID SMALLINT,
    IsArtifical SMALLINT,
    WindowClientWidth SMALLINT,
    WindowClientHeight SMALLINT,
    ClientTimeZone SMALLINT,
    ClientEventTime TIMESTAMP,
    SilverlightVersion1 SMALLINT,
    SilverlightVersion2 SMALLINT,
    SilverlightVersion3 INTEGER,
    SilverlightVersion4 SMALLINT,
    PageCharset TEXT,
    CodeVersion INTEGER,
    IsLink SMALLINT,
    IsDownload SMALLINT,
    IsNotBounce SMALLINT,
    FUniqID BIGINT,
    OriginalURL TEXT,
    HID INTEGER,
    IsOldCounter SMALLINT,
    IsEvent SMALLINT,
    IsParameter SMALLINT,
    DontCountHits SMALLINT,
    WithHash SMALLINT,
    HitColor CHAR,
    LocalEventTime TIMESTAMP,
    Age SMALLINT,
    Sex SMALLINT,
    Income SMALLINT,
    Interests SMALLINT,
    Robotness SMALLINT,
    RemoteIP INTEGER,
    WindowName INTEGER,
    OpenerName INTEGER,
    HistoryLength SMALLINT,
    BrowserLanguage TEXT,
    BrowserCountry TEXT,
    SocialNetwork TEXT,
    SocialAction TEXT,
    HTTPError SMALLINT,
    SendTiming INTEGER,
    DNSTiming INTEGER,
    ConnectTiming INTEGER,
    ResponseStartTiming INTEGER,
    ResponseEndTiming INTEGER,
    FetchTiming INTEGER,
    SocialSourceNetworkID SMALLINT,
    SocialSourcePage TEXT,
    ParamPrice BIGINT,
    ParamOrderID TEXT,
    ParamCurrency TEXT,
    ParamCurrencyID SMALLINT,
    OpenstatServiceName TEXT,
    OpenstatCampaignID TEXT,
    OpenstatAdID TEXT,
    OpenstatSourceID TEXT,
    UTMSource TEXT,
    UTMMedium TEXT,
    UTMCampaign TEXT,
    UTMContent TEXT,
    UTMTerm TEXT,
    FromTag TEXT,
    HasGCLID SMALLINT,
    RefererHash BIGINT,
    URLHash BIGINT,
    CLID INTEGER
);
```

I remember PostgreSQL does not support unsigned integers. It also does not support TINYINT.
And it does not support zero bytes in TEXT fields. We will deal with it...

```
tutorial=# SELECT create_hypertable('hits_100m_obfuscated', 'EventTime');
ERROR: column "EventTime" does not exist
```

WTF?

Maybe it is because column names are lowercased?

```
tutorial=# SELECT create_hypertable('hits_100m_obfuscated', 'eventtime');
NOTICE: adding not-null constraint to column "eventtime"
DETAIL: Time dimensions cannot have NULL values.
create_hypertable
-----------------------------------
(1,public,hits_100m_obfuscated,t)
(1 row)
```
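
So the guess about lowercase was right: PostgreSQL folds unquoted identifiers to lowercase, so `EventTime` in `CREATE TABLE` was stored as `eventtime`, while the string passed to `create_hypertable` is apparently matched as is. Below is a rough model of the folding rule, as a sketch. `fold_identifier` is a made-up helper, not anything from PostgreSQL or TimescaleDB, and it ignores details such as escaped quotes inside quoted names.

```bash
# Rough model of PostgreSQL identifier folding: a double-quoted name keeps
# its case (quotes stripped), an unquoted one is folded to lowercase.
fold_identifier() {
    case $1 in
        \"*\") name=${1#\"}; echo "${name%\"}" ;;
        *)     echo "$1" | tr '[:upper:]' '[:lower:]' ;;
    esac
}

fold_identifier EventTime      # the name as written in CREATE TABLE
fold_identifier '"EventTime"'  # the same name written in double quotes
```

The first call prints `eventtime` and the second `EventTime`, which is the mismatch behind the "does not exist" error.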

Looks like I forgot to specify NOT NULL for every column.
Let's repeat...

```
tutorial=# DROP TABLE hits_100m_obfuscated
tutorial-# ;
DROP TABLE
tutorial=# CREATE TABLE hits_100m_obfuscated (
tutorial(# WatchID BIGINT NOT NULL,
tutorial(# JavaEnable SMALLINT NOT NULL,
tutorial(# Title TEXT NOT NULL,
tutorial(# GoodEvent SMALLINT NOT NULL,
tutorial(# EventTime TIMESTAMP NOT NULL,
tutorial(# EventDate Date NOT NULL,
tutorial(# CounterID INTEGER NOT NULL,
tutorial(# ClientIP INTEGER NOT NULL,
tutorial(# RegionID INTEGER NOT NULL,
tutorial(# UserID BIGINT NOT NULL,
tutorial(# CounterClass SMALLINT NOT NULL,
tutorial(# OS SMALLINT NOT NULL,
tutorial(# UserAgent SMALLINT NOT NULL,
tutorial(# URL TEXT NOT NULL,
tutorial(# Referer TEXT NOT NULL,
tutorial(# Refresh SMALLINT NOT NULL,
tutorial(# RefererCategoryID SMALLINT NOT NULL,
tutorial(# RefererRegionID INTEGER NOT NULL,
tutorial(# URLCategoryID SMALLINT NOT NULL,
tutorial(# URLRegionID INTEGER NOT NULL,
tutorial(# ResolutionWidth SMALLINT NOT NULL,
tutorial(# ResolutionHeight SMALLINT NOT NULL,
tutorial(# ResolutionDepth SMALLINT NOT NULL,
tutorial(# FlashMajor SMALLINT NOT NULL,
tutorial(# FlashMinor SMALLINT NOT NULL,
tutorial(# FlashMinor2 TEXT NOT NULL,
tutorial(# NetMajor SMALLINT NOT NULL,
tutorial(# NetMinor SMALLINT NOT NULL,
tutorial(# UserAgentMajor SMALLINT NOT NULL,
tutorial(# UserAgentMinor CHAR(2) NOT NULL,
tutorial(# CookieEnable SMALLINT NOT NULL,
tutorial(# JavascriptEnable SMALLINT NOT NULL,
tutorial(# IsMobile SMALLINT NOT NULL,
tutorial(# MobilePhone SMALLINT NOT NULL,
tutorial(# MobilePhoneModel TEXT NOT NULL,
tutorial(# Params TEXT NOT NULL,
tutorial(# IPNetworkID INTEGER NOT NULL,
tutorial(# TraficSourceID SMALLINT NOT NULL,
tutorial(# SearchEngineID SMALLINT NOT NULL,
tutorial(# SearchPhrase TEXT NOT NULL,
tutorial(# AdvEngineID SMALLINT NOT NULL,
tutorial(# IsArtifical SMALLINT NOT NULL,
tutorial(# WindowClientWidth SMALLINT NOT NULL,
tutorial(# WindowClientHeight SMALLINT NOT NULL,
tutorial(# ClientTimeZone SMALLINT NOT NULL,
tutorial(# ClientEventTime TIMESTAMP NOT NULL,
tutorial(# SilverlightVersion1 SMALLINT NOT NULL,
tutorial(# SilverlightVersion2 SMALLINT NOT NULL,
tutorial(# SilverlightVersion3 INTEGER NOT NULL,
tutorial(# SilverlightVersion4 SMALLINT NOT NULL,
tutorial(# PageCharset TEXT NOT NULL,
tutorial(# CodeVersion INTEGER NOT NULL,
tutorial(# IsLink SMALLINT NOT NULL,
tutorial(# IsDownload SMALLINT NOT NULL,
tutorial(# IsNotBounce SMALLINT NOT NULL,
tutorial(# FUniqID BIGINT NOT NULL,
tutorial(# OriginalURL TEXT NOT NULL,
tutorial(# HID INTEGER NOT NULL,
tutorial(# IsOldCounter SMALLINT NOT NULL,
tutorial(# IsEvent SMALLINT NOT NULL,
tutorial(# IsParameter SMALLINT NOT NULL,
tutorial(# DontCountHits SMALLINT NOT NULL,
tutorial(# WithHash SMALLINT NOT NULL,
tutorial(# HitColor CHAR NOT NULL,
tutorial(# LocalEventTime TIMESTAMP NOT NULL,
tutorial(# Age SMALLINT NOT NULL,
tutorial(# Sex SMALLINT NOT NULL,
tutorial(# Income SMALLINT NOT NULL,
tutorial(# Interests SMALLINT NOT NULL,
tutorial(# Robotness SMALLINT NOT NULL,
tutorial(# RemoteIP INTEGER NOT NULL,
tutorial(# WindowName INTEGER NOT NULL,
tutorial(# OpenerName INTEGER NOT NULL,
tutorial(# HistoryLength SMALLINT NOT NULL,
tutorial(# BrowserLanguage TEXT NOT NULL,
tutorial(# BrowserCountry TEXT NOT NULL,
tutorial(# SocialNetwork TEXT NOT NULL,
tutorial(# SocialAction TEXT NOT NULL,
tutorial(# HTTPError SMALLINT NOT NULL,
tutorial(# SendTiming INTEGER NOT NULL,
tutorial(# DNSTiming INTEGER NOT NULL,
tutorial(# ConnectTiming INTEGER NOT NULL,
tutorial(# ResponseStartTiming INTEGER NOT NULL,
tutorial(# ResponseEndTiming INTEGER NOT NULL,
tutorial(# FetchTiming INTEGER NOT NULL,
tutorial(# SocialSourceNetworkID SMALLINT NOT NULL,
tutorial(# SocialSourcePage TEXT NOT NULL,
tutorial(# ParamPrice BIGINT NOT NULL,
tutorial(# ParamOrderID TEXT NOT NULL,
tutorial(# ParamCurrency TEXT NOT NULL,
tutorial(# ParamCurrencyID SMALLINT NOT NULL,
tutorial(# OpenstatServiceName TEXT NOT NULL,
tutorial(# OpenstatCampaignID TEXT NOT NULL,
tutorial(# OpenstatAdID TEXT NOT NULL,
tutorial(# OpenstatSourceID TEXT NOT NULL,
tutorial(# UTMSource TEXT NOT NULL,
tutorial(# UTMMedium TEXT NOT NULL,
tutorial(# UTMCampaign TEXT NOT NULL,
tutorial(# UTMContent TEXT NOT NULL,
tutorial(# UTMTerm TEXT NOT NULL,
tutorial(# FromTag TEXT NOT NULL,
tutorial(# HasGCLID SMALLINT NOT NULL,
tutorial(# RefererHash BIGINT NOT NULL,
tutorial(# URLHash BIGINT NOT NULL,
tutorial(# CLID INTEGER NOT NULL
tutorial(# );
CREATE TABLE
tutorial=# SELECT create_hypertable('hits_100m_obfuscated', 'eventtime');
create_hypertable
-----------------------------------
(2,public,hits_100m_obfuscated,t)
(1 row)

tutorial=#
```

Now it's OK.


## Loading Data

Next, importing data:
https://docs.timescale.com/timescaledb/latest/how-to-guides/migrate-data/import-csv/#csv-import

```
SELECT WatchID::Int64, JavaEnable, toValidUTF8(Title), GoodEvent, EventTime, EventDate, CounterID::Int32, ClientIP::Int32, RegionID::Int32, UserID::Int64, CounterClass, OS, UserAgent, toValidUTF8(URL), toValidUTF8(Referer), Refresh, RefererCategoryID::Int16, RefererRegionID::Int32, URLCategoryID::Int16, URLRegionID::Int32, ResolutionWidth::Int16, ResolutionHeight::Int16, ResolutionDepth, FlashMajor, FlashMinor, FlashMinor2, NetMajor, NetMinor, UserAgentMajor::Int16, UserAgentMinor, CookieEnable, JavascriptEnable, IsMobile, MobilePhone, toValidUTF8(MobilePhoneModel), toValidUTF8(Params), IPNetworkID::Int32, TraficSourceID, SearchEngineID::Int16, toValidUTF8(SearchPhrase), AdvEngineID, IsArtifical, WindowClientWidth::Int16, WindowClientHeight::Int16, ClientTimeZone, ClientEventTime, SilverlightVersion1, SilverlightVersion2, SilverlightVersion3::Int32, SilverlightVersion4::Int16, toValidUTF8(PageCharset), CodeVersion::Int32, IsLink, IsDownload, IsNotBounce, FUniqID::Int64, toValidUTF8(OriginalURL), HID::Int32, IsOldCounter, IsEvent, IsParameter, DontCountHits, WithHash, HitColor, LocalEventTime, Age, Sex, Income, Interests::Int16, Robotness, RemoteIP::Int32, WindowName, OpenerName, HistoryLength, BrowserLanguage, BrowserCountry, toValidUTF8(SocialNetwork), toValidUTF8(SocialAction), HTTPError, SendTiming, DNSTiming, ConnectTiming, ResponseStartTiming, ResponseEndTiming, FetchTiming, SocialSourceNetworkID, toValidUTF8(SocialSourcePage), ParamPrice, toValidUTF8(ParamOrderID), ParamCurrency, ParamCurrencyID::Int16, OpenstatServiceName, OpenstatCampaignID, OpenstatAdID, OpenstatSourceID, UTMSource, UTMMedium, UTMCampaign, UTMContent, UTMTerm, FromTag, HasGCLID, RefererHash::Int64, URLHash::Int64, CLID::Int32
FROM hits_100m_obfuscated
INTO OUTFILE 'dump.csv'
FORMAT CSV
```
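
The `::Int16`, `::Int32` and `::Int64` casts are there because the corresponding ClickHouse columns are unsigned, and PostgreSQL has only signed integer types. My assumption (not verified in this log) is that such a cast keeps the bits and reinterprets them, so values in the upper half of the unsigned range wrap around to negative ones. Here is a sketch of what that would mean for a UInt32 value; `to_int32` is a hypothetical helper of mine.

```bash
# Model of reinterpreting an unsigned 32-bit value as signed 32-bit
# (two's complement wraparound); an assumption about the cast, see above.
to_int32() {
    if [ "$1" -ge 2147483648 ]; then
        echo $(( $1 - 4294967296 ))
    else
        echo "$1"
    fi
}

to_int32 4294967295   # largest UInt32 value
to_int32 2147483647   # largest Int32 value, unchanged
```

Under that assumption a value stored as 4294967295 arrives in PostgreSQL as -1. That is fine for grouping and counting, but worth remembering when reading the numbers.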

https://github.com/ClickHouse/ClickHouse/issues/30872
https://github.com/ClickHouse/ClickHouse/issues/30873

```
$ wc -c dump.csv
80865718769 dump.csv
```
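
For scale: that is about 75 GiB, or roughly 800 bytes per row if the table really holds 100 million rows. The row count is my assumption from the name `hits_100m_obfuscated`; I did not count the rows here. The arithmetic as a sketch, with a helper name of my own:

```bash
# Integer arithmetic only: size in whole GiB and average bytes per row.
csv_stats() {
    bytes=$1
    rows=$2
    echo "$((bytes / 1024 / 1024 / 1024)) GiB, $((bytes / rows)) bytes per row"
}

# The byte count is from wc above; the row count is an assumption.
csv_stats 80865718769 100000000
```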

```
milovidov@mtlog-perftest03j:~$ timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.csv --workers 16 --copy-options "CSV"
panic: could not connect: pq: password authentication failed for user "postgres"

goroutine 12 [running]:
main.processBatches(0xc00001e3c0, 0xc0000a66c0)
        /home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:238 +0x887
created by main.main
        /home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
milovidov@mtlog-perftest03j:~$ sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.csv --workers 16 --copy-options "CSV"
panic: could not connect: pq: password authentication failed for user "postgres"

goroutine 25 [running]:
main.processBatches(0xc00019a350, 0xc00019e660)
        /home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:238 +0x887
created by main.main
        /home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb


milovidov@mtlog-perftest03j:~$ sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.csv --workers 16 --copy-options "CSV" --host localhost
flag provided but not defined: -host
Usage of timescaledb-parallel-copy:
  -batch-size int
        Number of rows per insert (default 5000)
  -columns string
        Comma-separated columns present in CSV
  -connection string
        PostgreSQL connection url (default "host=localhost user=postgres sslmode=disable")
  -copy-options string
        Additional options to pass to COPY (e.g., NULL 'NULL') (default "CSV")
  -db-name string
        Database where the destination table exists
  -file string
        File to read from rather than stdin
  -header-line-count int
        Number of header lines (default 1)
  -limit int
        Number of rows to insert overall; 0 means to insert all
  -log-batches
        Whether to time individual batches.
  -reporting-period duration
        Period to report insert stats; if 0s, intermediate results will not be reported
  -schema string
        Destination table's schema (default "public")
  -skip-header
        Skip the first line of the input
  -split string
        Character to split by (default ",")
  -table string
        Destination table for insertions (default "test_table")
  -token-size int
        Maximum size to use for tokens. By default, this is 64KB, so any value less than that will be ignored (default 65536)
  -truncate
        Truncate the destination table before insert
  -verbose
        Print more information about copying statistics
  -version
        Show the version of this tool
  -workers int
        Number of parallel requests to make (default 1)


milovidov@mtlog-perftest03j:~$ sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.csv --workers 16 --copy-options "CSV" -connection 'host=localhost'
panic: could not connect: pq: password authentication failed for user "postgres"

goroutine 14 [running]:
main.processBatches(0xc0000183d0, 0xc0000a66c0)
        /home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:238 +0x887
created by main.main
        /home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
panic: could not connect: pq: password authentication failed for user "postgres"

goroutine 13 [running]:
main.processBatches(0xc0000183d0, 0xc0000a66c0)
        /home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:238 +0x887
created by main.main
        /home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
panic: could not connect: pq: password authentication failed for user "postgres"

goroutine 12 [running]:
main.processBatches(0xc0000183d0, 0xc0000a66c0)
        /home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:238 +0x887
created by main.main
        /home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb


milovidov@mtlog-perftest03j:~$ sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.csv --workers 16 --copy-options "CSV" -connection 'host=localhost password 12345'
panic: could not connect: cannot parse `host=localhost password 12345`: failed to parse as DSN (invalid dsn)

goroutine 13 [running]:
main.processBatches(0xc0000183d0, 0xc0000a66c0)
|
||
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:238 +0x887
|
||
created by main.main
|
||
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
|
||
|
||
|
||
milovidov@mtlog-perftest03j:~$ sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.csv --workers 16 --copy-options "CSV" -connection 'host=localhost password=12345'
|
||
panic: pq: invalid byte sequence for encoding "UTF8": 0xe0 0x22 0x2c
|
||
|
||
goroutine 34 [running]:
|
||
main.processBatches(0xc000132350, 0xc000136660)
|
||
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
|
||
created by main.main
|
||
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
|
||
panic: pq: invalid byte sequence for encoding "UTF8": 0xe0 0x22 0x2c
|
||
|
||
goroutine 30 [running]:
|
||
main.processBatches(0xc000132350, 0xc000136660)
|
||
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
|
||
created by main.main
|
||
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
|
||
```

Ok, now I've got something meaningful.
But it does not show which line has the error...

```
$ echo -e '\xe0\x22\x2c'
�",
```
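
Since the loader does not report the offending line, here is a minimal sketch of how one could locate it. It is not part of the original session; the function name `find_invalid_utf8_lines` and the tiny sample file are made up for illustration, and the real `dump.csv` would be passed instead.

```python
import os
import tempfile


def find_invalid_utf8_lines(path, limit=10):
    """Return up to `limit` tuples (line_number, byte_offset, offending_bytes)."""
    problems = []
    with open(path, "rb") as f:  # binary mode, so nothing is decoded implicitly
        for line_number, raw in enumerate(f, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError as err:
                # err.start is the offset of the first undecodable byte in this line
                problems.append((line_number, err.start, raw[err.start:err.start + 3]))
                if len(problems) >= limit:
                    break
    return problems


def demo():
    # Hypothetical stand-in for dump.csv: line 2 carries the bytes from the error.
    with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
        tmp.write(b'1,"ok"\n2,"\xe0",3\n3,"fine"\n')
        name = tmp.name
    try:
        return find_invalid_utf8_lines(name)
    finally:
        os.remove(name)


if __name__ == "__main__":
    print(demo())
```

Reading the file in binary mode and decoding line by line keeps memory usage flat, which matters for a dump of this size.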

Let's recreate the dump:

```
|
||
rm dump.csv
|
||
|
||
SELECT WatchID::Int64, JavaEnable, toValidUTF8(Title), GoodEvent, EventTime, EventDate, CounterID::Int32, ClientIP::Int32, RegionID::Int32,
|
||
UserID::Int64, CounterClass, OS, UserAgent, toValidUTF8(URL), toValidUTF8(Referer), Refresh, RefererCategoryID::Int16, RefererRegionID::Int32,
|
||
URLCategoryID::Int16, URLRegionID::Int32, ResolutionWidth::Int16, ResolutionHeight::Int16, ResolutionDepth, FlashMajor, FlashMinor,
|
||
FlashMinor2, NetMajor, NetMinor, UserAgentMajor::Int16, toValidUTF8(UserAgentMinor::String), CookieEnable, JavascriptEnable, IsMobile, MobilePhone,
|
||
toValidUTF8(MobilePhoneModel), toValidUTF8(Params), IPNetworkID::Int32, TraficSourceID, SearchEngineID::Int16, toValidUTF8(SearchPhrase),
|
||
AdvEngineID, IsArtifical, WindowClientWidth::Int16, WindowClientHeight::Int16, ClientTimeZone, ClientEventTime,
|
||
SilverlightVersion1, SilverlightVersion2, SilverlightVersion3::Int32, SilverlightVersion4::Int16, toValidUTF8(PageCharset),
|
||
CodeVersion::Int32, IsLink, IsDownload, IsNotBounce, FUniqID::Int64, toValidUTF8(OriginalURL), HID::Int32, IsOldCounter, IsEvent,
|
||
IsParameter, DontCountHits, WithHash, toValidUTF8(HitColor::String), LocalEventTime, Age, Sex, Income, Interests::Int16, Robotness, RemoteIP::Int32,
|
||
WindowName, OpenerName, HistoryLength, toValidUTF8(BrowserLanguage::String), toValidUTF8(BrowserCountry::String),
|
||
toValidUTF8(SocialNetwork), toValidUTF8(SocialAction),
|
||
HTTPError, SendTiming, DNSTiming, ConnectTiming, ResponseStartTiming, ResponseEndTiming, FetchTiming, SocialSourceNetworkID,
|
||
toValidUTF8(SocialSourcePage), ParamPrice, toValidUTF8(ParamOrderID), toValidUTF8(ParamCurrency::String),
|
||
ParamCurrencyID::Int16, OpenstatServiceName, OpenstatCampaignID, OpenstatAdID, OpenstatSourceID,
|
||
UTMSource, UTMMedium, UTMCampaign, UTMContent, UTMTerm, FromTag, HasGCLID, RefererHash::Int64, URLHash::Int64, CLID::Int32
|
||
FROM hits_100m_obfuscated
|
||
INTO OUTFILE 'dump.csv'
|
||
FORMAT CSV
|
||
```

```
$ sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.csv --workers 1 --copy-options "CSV" -connection 'host=localhost password=12345'
panic: pq: value too long for type character(2)

goroutine 6 [running]:
main.processBatches(0xc0000183d0, 0xc0000a66c0)
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
created by main.main
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
```

ALTER does not work:

```
tutorial=# ALTER TABLE hits_100m_obfuscated MODIFY COLUMN UserAgentMinor TEXT
tutorial-# ;
ERROR: syntax error at or near "MODIFY"
LINE 1: ALTER TABLE hits_100m_obfuscated MODIFY COLUMN UserAgentMino...
                                         ^
```

PostgreSQL uses an unusual syntax for ALTER:

```
tutorial=# ALTER TABLE hits_100m_obfuscated ALTER COLUMN UserAgentMinor TYPE TEXT
;
ALTER TABLE
tutorial=# \q
```

https://github.com/ClickHouse/ClickHouse/issues/30874

Now another error:

```
$ sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.csv --workers 1 --copy-options "CSV" -connection 'host=localhost password=12345'
panic: pq: value "2149615427" is out of range for type integer

goroutine 6 [running]:
main.processBatches(0xc0000183d0, 0xc0000a66c0)
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
created by main.main
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
```

```
|
||
$ grep -F '2149615427' dump.csv
|
||
5607505572457935073,0,"Лазар автоматические пылесосы подробная школы. Когалерея — Курсы на Автория пище Сноудента новые устами",1,"2013-07-15 07:47:45","2013-07-15",38,-1194330980,229,-6649844357037090659,0,2,3,"https://produkty%2Fkategory_id=&auto-nexus.html?blockfesty-i-korroszhego","http://tambov.irr.ua/yandex.ru/saledParam=0&user/auto.ria",1,10282,995,15014,519,1996,1781,23,14,2,"800",0,0,7,"D<>",1,1,0,0,"","",3392210,-1,0,"",0,0,1261,1007,135,"2013-07-15 21:54:13",0,0,0,0,"windows-1251;charset",1601,0,0,0,8184671896482443026,"",451733382,0,0,0,0,0,"5","2013-07-15 15:41:14",31,1,3,60,13,-1855237933,-1,-1,-1,"S0","h1","","",0,0,0,0,2149615427,36,3,0,"",0,"","NH",0,"","","","","","","","","","",0,-1103774879459415602,-2414747266057209563,0
|
||
^C
|
||
```
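
The value from the error does not fit into a 4-byte signed integer, whose maximum is 2147483647. The arithmetic can be sketched as follows; the helper names are hypothetical, and `wrap_to_signed` only illustrates a wrap-around reinterpretation, which I assume is what the `::Int32` casts in the dump query do for unsigned columns.

```python
def signed_range(bits):
    """Inclusive (low, high) bounds of a two's-complement integer with `bits` bits."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def fits_signed(value, bits):
    """True if `value` can be stored in a signed integer of the given width."""
    low, high = signed_range(bits)
    return low <= value <= high


def wrap_to_signed(value, bits):
    """Reinterpret an unsigned value as signed, wrapping around on overflow."""
    modulus = 1 << bits
    value %= modulus
    return value - modulus if value >= (1 << (bits - 1)) else value


if __name__ == "__main__":
    offending = 2149615427  # the value from the panic message
    print(fits_signed(offending, 32))               # does it fit PostgreSQL "integer"?
    print(fits_signed(offending, 64))               # does it fit "bigint"?
    print(wrap_to_signed(offending, 32))            # what a wrapping cast would store
    print(fits_signed(min(offending, 30000), 16))   # capping, like least(x, 30000)
```

Capping with `least(..., 30000)`, as the next dump query does for the timing columns, loses the original magnitude but keeps the value positive, while a wrapping cast keeps all the bits but flips the sign.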

Let's recreate the dump:

```
|
||
rm dump.csv
|
||
|
||
SELECT WatchID::Int64, JavaEnable, toValidUTF8(Title), GoodEvent, EventTime, EventDate, CounterID::Int32, ClientIP::Int32, RegionID::Int32,
|
||
UserID::Int64, CounterClass, OS, UserAgent, toValidUTF8(URL), toValidUTF8(Referer), Refresh, RefererCategoryID::Int16, RefererRegionID::Int32,
|
||
URLCategoryID::Int16, URLRegionID::Int32, ResolutionWidth::Int16, ResolutionHeight::Int16, ResolutionDepth, FlashMajor, FlashMinor,
|
||
FlashMinor2, NetMajor, NetMinor, UserAgentMajor::Int16, toValidUTF8(UserAgentMinor::String), CookieEnable, JavascriptEnable, IsMobile, MobilePhone,
|
||
toValidUTF8(MobilePhoneModel), toValidUTF8(Params), IPNetworkID::Int32, TraficSourceID, SearchEngineID::Int16, toValidUTF8(SearchPhrase),
|
||
AdvEngineID, IsArtifical, WindowClientWidth::Int16, WindowClientHeight::Int16, ClientTimeZone, ClientEventTime,
|
||
SilverlightVersion1, SilverlightVersion2, SilverlightVersion3::Int32, SilverlightVersion4::Int16, toValidUTF8(PageCharset),
|
||
CodeVersion::Int32, IsLink, IsDownload, IsNotBounce, FUniqID::Int64, toValidUTF8(OriginalURL), HID::Int32, IsOldCounter, IsEvent,
|
||
IsParameter, DontCountHits, WithHash, toValidUTF8(HitColor::String), LocalEventTime, Age, Sex, Income, Interests::Int16, Robotness, RemoteIP::Int32,
|
||
WindowName, OpenerName, HistoryLength, toValidUTF8(BrowserLanguage::String), toValidUTF8(BrowserCountry::String),
|
||
toValidUTF8(SocialNetwork), toValidUTF8(SocialAction),
|
||
HTTPError, least(SendTiming, 30000), least(DNSTiming, 30000), least(ConnectTiming, 30000), least(ResponseStartTiming, 30000),
|
||
least(ResponseEndTiming, 30000), least(FetchTiming, 30000), SocialSourceNetworkID,
|
||
toValidUTF8(SocialSourcePage), ParamPrice, toValidUTF8(ParamOrderID), toValidUTF8(ParamCurrency::String),
|
||
ParamCurrencyID::Int16, OpenstatServiceName, OpenstatCampaignID, OpenstatAdID, OpenstatSourceID,
|
||
UTMSource, UTMMedium, UTMCampaign, UTMContent, UTMTerm, FromTag, HasGCLID, RefererHash::Int64, URLHash::Int64, CLID::Int32
|
||
FROM hits_100m_obfuscated
|
||
INTO OUTFILE 'dump.csv'
|
||
FORMAT CSV
|
||
```

PostgreSQL does not support `USE database`.
But I remember that I can write `\c` instead. I guess `\c` means "change" (the database), although it is actually short for `\connect`. Or maybe it is called a "schema" or a "catalog" here.

```
milovidov@mtlog-perftest03j:~$ sudo -u postgres psql
psql (13.4 (Ubuntu 13.4-4.pgdg18.04+1))
Type "help" for help.

postgres=# SELECT count(*) FROM hits_100m_obfuscated;
ERROR: relation "hits_100m_obfuscated" does not exist
LINE 1: SELECT count(*) FROM hits_100m_obfuscated;
                             ^
postgres=# USE tutorial;
ERROR: syntax error at or near "USE"
LINE 1: USE tutorial;
        ^
postgres=# \c tutorial
You are now connected to database "tutorial" as user "postgres".
tutorial=# SELECT count(*) FROM hits_100m_obfuscated;
count
-------
69996
(1 row)
```

And the parallel loader has already loaded part of the data into my table (it is not transactional).
Let's truncate the table:

```
tutorial=# TRUNCATE TABLE hits_100m_obfuscated;
TRUNCATE TABLE
```

Surprisingly, it works!

Now it has started loading data:

```
$ time sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.csv --workers 16 --copy-options "CSV" -connection 'host=localhost password=12345'
```

But the loading does not use all 16 CPU cores, and it is not bottlenecked by IO either.

WTF:

```
$ time sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.csv --workers 16 --copy-options "CSV" -connection 'host=localhost password=12345'
panic: pq: could not extend file "base/16384/31264.1": wrote only 4096 of 8192 bytes at block 145407

goroutine 6 [running]:
main.processBatches(0xc0000183d0, 0xc0000a66c0)
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
created by main.main
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb

real 3m31.328s
user 0m35.016s
sys 0m6.964s
```

Looks like there is no disk space left:

```
milovidov@mtlog-perftest03j:~$ df -h /var/lib/postgresql/13/main
Filesystem Size Used Avail Use% Mounted on
/dev/md1 35G 33G 1.4G 97% /
```
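
The same check could have been done before starting a multi-minute load. Below is a minimal sketch, not something from the original session: `free_space_report` is a hypothetical helper, and comparing free space with the CSV size is only a rough lower bound, because I assume (without having measured it) that the table on disk can take more space than the dump.

```python
import os
import shutil


def free_space_report(data_dir, dump_path=None):
    """Summarize disk usage of the filesystem holding `data_dir`."""
    usage = shutil.disk_usage(data_dir)  # named tuple: total, used, free (bytes)
    report = {
        "total": usage.total,
        "used": usage.used,
        "free": usage.free,
        "used_pct": 100.0 * usage.used / usage.total if usage.total else 0.0,
    }
    if dump_path is not None:
        dump_bytes = os.path.getsize(dump_path)
        report["dump_bytes"] = dump_bytes
        # Rough lower bound only: the loaded table may be larger than the CSV.
        report["dump_fits"] = usage.free > dump_bytes
    return report


if __name__ == "__main__":
    # In the session above, the path would be /var/lib/postgresql/13/main.
    print(free_space_report(os.getcwd()))
```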

https://github.com/ClickHouse/ClickHouse/issues/30883

Let's move to another device.

```
|
||
milovidov@mtlog-perftest03j:~$ sudo mkdir /opt/postgresql
|
||
milovidov@mtlog-perftest03j:~$ sudo ls -l /var/lib/postgresql/13/main
|
||
total 88
|
||
drwx------ 6 postgres postgres 4096 Oct 30 00:06 base
|
||
drwx------ 2 postgres postgres 4096 Oct 30 02:07 global
|
||
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_commit_ts
|
||
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_dynshmem
|
||
drwx------ 4 postgres postgres 4096 Oct 30 02:10 pg_logical
|
||
drwx------ 4 postgres postgres 4096 Oct 29 23:27 pg_multixact
|
||
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_notify
|
||
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_replslot
|
||
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_serial
|
||
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_snapshots
|
||
drwx------ 2 postgres postgres 4096 Oct 30 02:10 pg_stat
|
||
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_stat_tmp
|
||
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_subtrans
|
||
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_tblspc
|
||
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_twophase
|
||
-rw------- 1 postgres postgres 3 Oct 29 23:27 PG_VERSION
|
||
drwx------ 3 postgres postgres 12288 Oct 30 02:10 pg_wal
|
||
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_xact
|
||
-rw------- 1 postgres postgres 88 Oct 29 23:27 postgresql.auto.conf
|
||
-rw------- 1 postgres postgres 130 Oct 30 00:03 postmaster.opts
|
||
milovidov@mtlog-perftest03j:~$ sudo chown postgres:postgres /opt/postgresql
|
||
milovidov@mtlog-perftest03j:~$ sudo mv /var/lib/postgresql/13/main/* /opt/postgresql
|
||
mv: cannot stat '/var/lib/postgresql/13/main/*': No such file or directory
|
||
milovidov@mtlog-perftest03j:~$ sudo bash -c 'mv /var/lib/postgresql/13/main/* /opt/postgresql'
|
||
sudo ln milovidov@mtlog-perftest03j:~$ #sudo ln -s /opt/postgresql /var/lib/postgresql/13/main
|
||
milovidov@mtlog-perftest03j:~$ sudo rm /var/lib/postgresql/13/main
|
||
rm: cannot remove '/var/lib/postgresql/13/main': Is a directory
|
||
milovidov@mtlog-perftest03j:~$ sudo rm -rf /var/lib/postgresql/13/main
|
||
milovidov@mtlog-perftest03j:~$ sudo ln -s /opt/postgresql /var/lib/postgresql/13/main
|
||
milovidov@mtlog-perftest03j:~$ sudo ls -l /var/lib/postgresql/13/main
|
||
lrwxrwxrwx 1 root root 15 Oct 30 02:12 /var/lib/postgresql/13/main -> /opt/postgresql
|
||
milovidov@mtlog-perftest03j:~$ sudo ls -l /opt/postgresql/
|
||
total 80
|
||
drwx------ 6 postgres postgres 4096 Oct 30 00:06 base
|
||
drwx------ 2 postgres postgres 4096 Oct 30 02:07 global
|
||
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_commit_ts
|
||
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_dynshmem
|
||
drwx------ 4 postgres postgres 4096 Oct 30 02:10 pg_logical
|
||
drwx------ 4 postgres postgres 4096 Oct 29 23:27 pg_multixact
|
||
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_notify
|
||
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_replslot
|
||
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_serial
|
||
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_snapshots
|
||
drwx------ 2 postgres postgres 4096 Oct 30 02:10 pg_stat
|
||
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_stat_tmp
|
||
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_subtrans
|
||
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_tblspc
|
||
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_twophase
|
||
-rw------- 1 postgres postgres 3 Oct 29 23:27 PG_VERSION
|
||
drwx------ 3 postgres postgres 4096 Oct 30 02:10 pg_wal
|
||
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_xact
|
||
-rw------- 1 postgres postgres 88 Oct 29 23:27 postgresql.auto.conf
|
||
-rw------- 1 postgres postgres 130 Oct 30 00:03 postmaster.opts
|
||
|
||
sudo service postgresql start
|
||
|
||
sudo less /var/log/postgresql/postgresql-13-main.log
|
||
|
||
2021-10-30 02:13:41.284 MSK [791362] FATAL: data directory "/var/lib/postgresql/13/main" has invalid permissions
|
||
2021-10-30 02:13:41.284 MSK [791362] DETAIL: Permissions should be u=rwx (0700) or u=rwx,g=rx (0750).
|
||
pg_ctl: could not start server
|
||
Examine the log output.
|
||
|
||
sudo chmod 0700 /var/lib/postgresql/13/main /opt/postgresql
|
||
sudo service postgresql start
|
||
|
||
postgres=# \c tutorial
|
||
You are now connected to database "tutorial" as user "postgres".
|
||
tutorial=# TRUNCATE TABLE hits_100m_obfuscated;
|
||
TRUNCATE TABLE
|
||
```
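
The log message above says the data directory must have mode 0700 or 0750. Here is a small sketch of checking that before starting the server; the helper names are hypothetical. Note that `os.stat` follows symlinks, so for a symlinked data directory it reports the mode of the target.

```python
import os
import stat
import tempfile

# Modes quoted in the PostgreSQL log message above.
ALLOWED_MODES = (0o700, 0o750)


def data_dir_mode(path):
    """Permission bits of `path` (symlinks are followed)."""
    return stat.S_IMODE(os.stat(path).st_mode)


def has_allowed_mode(path):
    """True if the directory mode is one of the modes the log message accepts."""
    return data_dir_mode(path) in ALLOWED_MODES


if __name__ == "__main__":
    # Hypothetical stand-in for /opt/postgresql.
    directory = tempfile.mkdtemp()
    try:
        os.chmod(directory, 0o755)
        print(oct(data_dir_mode(directory)), has_allowed_mode(directory))
        os.chmod(directory, 0o700)
        print(oct(data_dir_mode(directory)), has_allowed_mode(directory))
    finally:
        os.rmdir(directory)
```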

```
$ time sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.csv --workers 16 --copy-options "CSV" -connection 'host=localhost password=12345'
```

No success:

```
$ time sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.csv --workers 16 --copy-options "CSV" -connection 'host=localhost password=12345'
panic: pq: invalid byte sequence for encoding "UTF8": 0x00

goroutine 29 [running]:
main.processBatches(0xc000132350, 0xc000136660)
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
created by main.main
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb

real 11m47.879s
user 3m10.980s
sys 0m45.256s
```

The error message is misleading, because UTF-8 **does** support 0x00 (it encodes U+0000). It is just a PostgreSQL quirk: its text types cannot store NUL characters.
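
To back up the claim about 0x00, here is a tiny sketch showing that a strict UTF-8 decoder accepts the NUL byte, plus the stripping that the next dump query performs with `replaceAll(..., '\0', '')`. The function names are made up for illustration.

```python
def nul_is_valid_utf8():
    """True if a strict UTF-8 decoder accepts a string containing the byte 0x00."""
    try:
        b"a\x00b".decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def strip_nul(text):
    """Drop NUL characters, which PostgreSQL text values cannot hold."""
    return text.replace("\x00", "")


if __name__ == "__main__":
    print(nul_is_valid_utf8())
    print(repr(strip_nul("a\x00b")))
```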

Let's recreate the dump:

```
|
||
rm dump.csv
|
||
|
||
SELECT WatchID::Int64, JavaEnable, replaceAll(toValidUTF8(Title), '\0', ''), GoodEvent, EventTime, EventDate, CounterID::Int32, ClientIP::Int32, RegionID::Int32,
|
||
UserID::Int64, CounterClass, OS, UserAgent, replaceAll(toValidUTF8(URL), '\0', ''), replaceAll(toValidUTF8(Referer), '\0', ''), Refresh, RefererCategoryID::Int16, RefererRegionID::Int32,
|
||
URLCategoryID::Int16, URLRegionID::Int32, ResolutionWidth::Int16, ResolutionHeight::Int16, ResolutionDepth, FlashMajor, FlashMinor,
|
||
FlashMinor2, NetMajor, NetMinor, UserAgentMajor::Int16, replaceAll(toValidUTF8(UserAgentMinor::String), '\0', ''), CookieEnable, JavascriptEnable, IsMobile, MobilePhone,
|
||
replaceAll(toValidUTF8(MobilePhoneModel), '\0', ''), replaceAll(toValidUTF8(Params), '\0', ''), IPNetworkID::Int32, TraficSourceID, SearchEngineID::Int16, replaceAll(toValidUTF8(SearchPhrase), '\0', ''),
|
||
AdvEngineID, IsArtifical, WindowClientWidth::Int16, WindowClientHeight::Int16, ClientTimeZone, ClientEventTime,
|
||
SilverlightVersion1, SilverlightVersion2, SilverlightVersion3::Int32, SilverlightVersion4::Int16, replaceAll(toValidUTF8(PageCharset), '\0', ''),
|
||
CodeVersion::Int32, IsLink, IsDownload, IsNotBounce, FUniqID::Int64, replaceAll(toValidUTF8(OriginalURL), '\0', ''), HID::Int32, IsOldCounter, IsEvent,
|
||
IsParameter, DontCountHits, WithHash, replaceAll(toValidUTF8(HitColor::String), '\0', ''), LocalEventTime, Age, Sex, Income, Interests::Int16, Robotness, RemoteIP::Int32,
|
||
WindowName, OpenerName, HistoryLength, replaceAll(toValidUTF8(BrowserLanguage::String), '\0', ''), replaceAll(toValidUTF8(BrowserCountry::String), '\0', ''),
|
||
replaceAll(toValidUTF8(SocialNetwork), '\0', ''), replaceAll(toValidUTF8(SocialAction), '\0', ''),
|
||
HTTPError, least(SendTiming, 30000), least(DNSTiming, 30000), least(ConnectTiming, 30000), least(ResponseStartTiming, 30000),
|
||
least(ResponseEndTiming, 30000), least(FetchTiming, 30000), SocialSourceNetworkID,
|
||
replaceAll(toValidUTF8(SocialSourcePage), '\0', ''), ParamPrice, replaceAll(toValidUTF8(ParamOrderID), '\0', ''), replaceAll(toValidUTF8(ParamCurrency::String), '\0', ''),
|
||
ParamCurrencyID::Int16, OpenstatServiceName, OpenstatCampaignID, OpenstatAdID, OpenstatSourceID,
|
||
UTMSource, UTMMedium, UTMCampaign, UTMContent, UTMTerm, FromTag, HasGCLID, RefererHash::Int64, URLHash::Int64, CLID::Int32
|
||
FROM hits_100m_obfuscated
|
||
INTO OUTFILE 'dump.csv'
|
||
FORMAT CSV
|
||
```

WTF:

```
tutorial=# SELECT count(*) FROM hits_100m_obfuscated;
ERROR: could not load library "/usr/lib/postgresql/13/lib/llvmjit.so": libLLVM-6.0.so.1: cannot open shared object file: No such file or directory
```

Maybe just install LLVM?

```
sudo apt install llvm
```

It does not help:

```
milovidov@mtlog-perftest03j:~$ sudo -u postgres psql
psql (13.4 (Ubuntu 13.4-4.pgdg18.04+1))
Type "help" for help.

postgres=# \c tutorial
You are now connected to database "tutorial" as user "postgres".
tutorial=# SELECT count(*) FROM hits_100m_obfuscated;
ERROR: could not load library "/usr/lib/postgresql/13/lib/llvmjit.so": libLLVM-6.0.so.1: cannot open shared object file: No such file or directory
tutorial=#
```

Dependency on system libraries is harmful.

```
milovidov@mtlog-perftest03j:~$ ls -l /usr/lib/x86_64-linux-gnu/libLLVM-6.0.so
lrwxrwxrwx 1 root root 16 Apr 6 2018 /usr/lib/x86_64-linux-gnu/libLLVM-6.0.so -> libLLVM-6.0.so.1
milovidov@mtlog-perftest03j:~$ ls -l /usr/lib/x86_64-linux-gnu/libLLVM-6.0.so.1
ls: cannot access '/usr/lib/x86_64-linux-gnu/libLLVM-6.0.so.1': No such file or directory
```
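
What the two `ls` calls show is a dangling symlink: the link exists, but its target does not. A minimal sketch of detecting this programmatically follows; the helper name and the temporary directory are hypothetical, and the file names merely mirror the ones above.

```python
import os
import tempfile


def is_dangling_symlink(path):
    """True if `path` is a symlink whose target does not exist."""
    # os.path.exists follows symlinks, so it is False for a broken link.
    return os.path.islink(path) and not os.path.exists(path)


if __name__ == "__main__":
    directory = tempfile.mkdtemp()
    link = os.path.join(directory, "libLLVM-6.0.so")
    target = os.path.join(directory, "libLLVM-6.0.so.1")
    os.symlink("libLLVM-6.0.so.1", link)  # relative target, as in the listing above
    print(is_dangling_symlink(link))      # target is missing at this point
    open(target, "wb").close()            # create the target
    print(is_dangling_symlink(link))
    os.remove(link)
    os.remove(target)
    os.rmdir(directory)
```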

https://askubuntu.com/questions/481/how-do-i-find-the-package-that-provides-a-file

```
milovidov@mtlog-perftest03j:~$ dpkg -S libLLVM-6.0.so.1
llvm-6.0-dev: /usr/lib/llvm-6.0/lib/libLLVM-6.0.so.1
libllvm6.0:amd64: /usr/lib/x86_64-linux-gnu/libLLVM-6.0.so.1
```

Wow, it's absolutely broken:

```
|
||
milovidov@mtlog-perftest03j:~$ sudo apt remove llvm-6.0-dev
|
||
Reading package lists... Done
|
||
Building dependency tree
|
||
Reading state information... Done
|
||
The following packages were automatically installed and are no longer required:
|
||
libcgal13 libgmpxx4ldbl liblldb-11 libprotobuf-c1 libsfcgal1 mysql-server-core-5.7
|
||
Use 'sudo apt autoremove' to remove them.
|
||
The following packages will be REMOVED:
|
||
liblld-6.0-dev lld lld-6.0 llvm-6.0-dev
|
||
0 upgraded, 0 newly installed, 4 to remove and 293 not upgraded.
|
||
After this operation, 163 MB disk space will be freed.
|
||
Do you want to continue? [Y/n]
|
||
(Reading database ... 268641 files and directories currently installed.)
|
||
Removing liblld-6.0-dev (1:6.0-1ubuntu2) ...
|
||
Removing lld (1:6.0-41~exp5~ubuntu1) ...
|
||
Removing lld-6.0 (1:6.0-1ubuntu2) ...
|
||
Removing llvm-6.0-dev (1:6.0-1ubuntu2) ...
|
||
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
|
||
Processing triggers for libc-bin (2.27-3ubuntu1.4) ...
|
||
milovidov@mtlog-perftest03j:~$ sudo apt install llvm-6.0-dev
|
||
Reading package lists... Done
|
||
Building dependency tree
|
||
Reading state information... Done
|
||
The following packages were automatically installed and are no longer required:
|
||
libcgal13 libgmpxx4ldbl liblldb-11 libprotobuf-c1 libsfcgal1 mysql-server-core-5.7
|
||
Use 'sudo apt autoremove' to remove them.
|
||
The following NEW packages will be installed:
|
||
llvm-6.0-dev
|
||
0 upgraded, 1 newly installed, 0 to remove and 293 not upgraded.
|
||
Need to get 23.0 MB of archives.
|
||
After this operation, 160 MB of additional disk space will be used.
|
||
Get:1 http://mirror.yandex.ru/ubuntu bionic/main amd64 llvm-6.0-dev amd64 1:6.0-1ubuntu2 [23.0 MB]
|
||
Fetched 23.0 MB in 1s (42.5 MB/s)
|
||
Selecting previously unselected package llvm-6.0-dev.
|
||
(Reading database ... 267150 files and directories currently installed.)
|
||
Preparing to unpack .../llvm-6.0-dev_1%3a6.0-1ubuntu2_amd64.deb ...
|
||
Unpacking llvm-6.0-dev (1:6.0-1ubuntu2) ...
|
||
Setting up llvm-6.0-dev (1:6.0-1ubuntu2) ...
|
||
Processing triggers for libc-bin (2.27-3ubuntu1.4) ...
|
||
milovidov@mtlog-perftest03j:~$ ls -l /usr/lib/x86_64-linux-gnu/libLLVM-6.0.so
|
||
lrwxrwxrwx 1 root root 16 Apr 6 2018 /usr/lib/x86_64-linux-gnu/libLLVM-6.0.so -> libLLVM-6.0.so.1
|
||
milovidov@mtlog-perftest03j:~$ ls -l /usr/lib/x86_64-linux-gnu/libLLVM-6.0.so.1
|
||
ls: cannot access '/usr/lib/x86_64-linux-gnu/libLLVM-6.0.so.1': No such file or directory
|
||
```

Let's remove it just in case:

```
sudo apt remove llvm-6.0-dev
```

https://dba.stackexchange.com/questions/264955/handling-performance-problems-with-jit-in-postgres-12

JIT can be disabled with `set jit = off;`

```
tutorial=# set jit = off;
SET
tutorial=#
tutorial=# SELECT count(*) FROM hits_100m_obfuscated;
```

But now this SELECT query started and hung for multiple minutes without any result.
And I see something strange in `top`:

```
|
||
792553 postgres 20 0 32.418g 0.031t 0.031t D 2.4 25.3 3:43.84 postgres: 13/main: checkpointer
|
||
814659 postgres 20 0 32.432g 0.023t 0.023t D 0.0 18.8 0:14.53 postgres: 13/main: parallel worker for PID 813980
|
||
813980 postgres 20 0 32.433g 0.023t 0.023t D 0.0 18.4 0:14.47 postgres: 13/main: postgres tutorial [local] SELECT
|
||
814657 postgres 20 0 32.432g 0.016t 0.016t D 0.0 12.6 0:09.83 postgres: 13/main: parallel worker for PID 813980
|
||
814658 postgres 20 0 32.432g 0.015t 0.015t D 2.4 12.6 0:09.45 postgres: 13/main: parallel worker for PID 813980
|
||
814656 postgres 20 0 32.432g 0.015t 0.015t D 0.0 12.0 0:07.36 postgres: 13/main: parallel worker for PID 813980
|
||
792554 postgres 20 0 32.417g 5.394g 5.392g D 0.0 4.3 0:04.78 postgres: 13/main: background writer
|
||
```

The query did not finish in 30 minutes. How can it be so enormously slow?

Loading failed again:

```
$ time sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.csv --workers 16 --copy-options "CSV" -connection 'host=localhost password=12345'
panic: pq: extra data after last expected column

goroutine 14 [running]:
main.processBatches(0xc0000183d0, 0xc0000a66c0)
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
created by main.main
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb

real 20m57.936s
user 4m14.444s
sys 1m11.412s
```

Most likely PostgreSQL cannot recognize the proper CSV escaping of quotes, like `"Hello "" world"`.
Let's simply remove all double quotes from String values.

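
For reference, this is how a conventional CSV parser treats the doubled-quote form. The sketch uses Python's `csv` module, not PostgreSQL, so it shows what "proper CSV escaping" means rather than how the loader behaves. The second sample is my own assumption about another possible cause, not something verified here: a quoted field that contains a newline spans two physical lines, so any tool that cuts its input at line boundaries would hand the database two broken records.

```python
import csv
import io


def parse_csv(text):
    """Parse CSV text with Python's default dialect (quotes escaped by doubling)."""
    return list(csv.reader(io.StringIO(text, newline="")))


def naive_line_split(text):
    """Split on physical lines, ignoring CSV quoting (hypothetical naive batching)."""
    return text.splitlines()


def has_unbalanced_quotes(piece):
    """True if a piece contains an odd number of double quotes."""
    return piece.count('"') % 2 == 1


DOUBLED = '1,"Hello "" world",2\r\n'
MULTILINE = '1,"first line\nsecond line",2\r\n'

if __name__ == "__main__":
    print(parse_csv(DOUBLED))
    print(parse_csv(MULTILINE))
    for piece in naive_line_split(MULTILINE):
        print(repr(piece), has_unbalanced_quotes(piece))
```

If line-based splitting were the cause, removing quotes from the values would not help by itself, because the embedded newlines would still be there.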
```
|
||
rm dump.csv
|
||
|
||
SELECT WatchID::Int64, JavaEnable, replaceAll(replaceAll(toValidUTF8(Title), '\0', ''), '"', ''), GoodEvent, EventTime, EventDate, CounterID::Int32, ClientIP::Int32, RegionID::Int32,
|
||
UserID::Int64, CounterClass, OS, UserAgent, replaceAll(replaceAll(toValidUTF8(URL), '\0', ''), '"', ''), replaceAll(replaceAll(toValidUTF8(Referer), '\0', ''), '"', ''), Refresh, RefererCategoryID::Int16, RefererRegionID::Int32,
|
||
URLCategoryID::Int16, URLRegionID::Int32, ResolutionWidth::Int16, ResolutionHeight::Int16, ResolutionDepth, FlashMajor, FlashMinor,
|
||
FlashMinor2, NetMajor, NetMinor, UserAgentMajor::Int16, replaceAll(replaceAll(toValidUTF8(UserAgentMinor::String), '\0', ''), '"', ''), CookieEnable, JavascriptEnable, IsMobile, MobilePhone,
|
||
replaceAll(replaceAll(toValidUTF8(MobilePhoneModel), '\0', ''), '"', ''), replaceAll(replaceAll(toValidUTF8(Params), '\0', ''), '"', ''), IPNetworkID::Int32, TraficSourceID, SearchEngineID::Int16, replaceAll(replaceAll(toValidUTF8(SearchPhrase), '\0', ''), '"', ''),
|
||
AdvEngineID, IsArtifical, WindowClientWidth::Int16, WindowClientHeight::Int16, ClientTimeZone, ClientEventTime,
|
||
SilverlightVersion1, SilverlightVersion2, SilverlightVersion3::Int32, SilverlightVersion4::Int16, replaceAll(replaceAll(toValidUTF8(PageCharset), '\0', ''), '"', ''),
|
||
CodeVersion::Int32, IsLink, IsDownload, IsNotBounce, FUniqID::Int64, replaceAll(replaceAll(toValidUTF8(OriginalURL), '\0', ''), '"', ''), HID::Int32, IsOldCounter, IsEvent,
|
||
IsParameter, DontCountHits, WithHash, replaceAll(replaceAll(toValidUTF8(HitColor::String), '\0', ''), '"', ''), LocalEventTime, Age, Sex, Income, Interests::Int16, Robotness, RemoteIP::Int32,
|
||
WindowName, OpenerName, HistoryLength, replaceAll(replaceAll(toValidUTF8(BrowserLanguage::String), '\0', ''), '"', ''), replaceAll(replaceAll(toValidUTF8(BrowserCountry::String), '\0', ''), '"', ''),
|
||
replaceAll(replaceAll(toValidUTF8(SocialNetwork), '\0', ''), '"', ''), replaceAll(replaceAll(toValidUTF8(SocialAction), '\0', ''), '"', ''),
|
||
HTTPError, least(SendTiming, 30000), least(DNSTiming, 30000), least(ConnectTiming, 30000), least(ResponseStartTiming, 30000),
|
||
least(ResponseEndTiming, 30000), least(FetchTiming, 30000), SocialSourceNetworkID,
|
||
replaceAll(replaceAll(toValidUTF8(SocialSourcePage), '\0', ''), '"', ''), ParamPrice, replaceAll(replaceAll(toValidUTF8(ParamOrderID), '\0', ''), '"', ''), replaceAll(replaceAll(toValidUTF8(ParamCurrency::String), '\0', ''), '"', ''),
|
||
ParamCurrencyID::Int16, OpenstatServiceName, OpenstatCampaignID, OpenstatAdID, OpenstatSourceID,
|
||
UTMSource, UTMMedium, UTMCampaign, UTMContent, UTMTerm, FromTag, HasGCLID, RefererHash::Int64, URLHash::Int64, CLID::Int32
|
||
FROM hits_100m_obfuscated
|
||
INTO OUTFILE 'dump.csv'
|
||
FORMAT CSV
|
||
```
|
||
|
||
Oops, another trouble:
|
||
|
||
```
|
||
$ time sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.csv --workers 16 --copy-options "CSV" -connection 'host=localhost password=12345'
|
||
panic: pq: unterminated CSV quoted field
|
||
|
||
goroutine 19 [running]:
|
||
main.processBatches(0xc000132350, 0xc000136660)
|
||
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
|
||
created by main.main
|
||
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
|
||
|
||
real 0m38.278s
|
||
user 0m13.544s
|
||
sys 0m3.552s
|
||
```
|
||
|
||
I have hypothesis, maybe it is interpreting both backslashes and quotes in CSV?
|
||
We need to check, what is CSV, exactly, from TimescaleDB's standpoint.
|
||
|
||
https://www.postgresql.org/docs/9.2/sql-copy.html
|
||
|
||
Yes, PostgreSQL is using "fake CSV":
|
||
|
||
> This format option is used for importing and exporting the Comma Separated Value (CSV) file format used by many other programs, such as spreadsheets. Instead of the escaping rules used by PostgreSQL's standard text format, it produces and recognizes the common CSV escaping mechanism.
|
||
|
||
> The values in each record are separated by the DELIMITER character. If the value contains the delimiter character, the QUOTE character, the NULL string, a carriage return, or line feed character, then the whole value is prefixed and suffixed by the QUOTE character, and any occurrence within the value of a QUOTE character or the ESCAPE character is preceded by the escape character.
|
||
|
||
So it looks like CSV, but my guess is that it also interprets C-style backslash escapes inside values.
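
Another possible explanation, not verified here: if the loader cuts its input into batches line by line, a quoted field with an embedded line break is split in the middle, and PostgreSQL sees an opening quote with no closing one. A minimal sketch for checking a dump for such lines follows. The helper name `count_odd_quote_lines` and the sample rows are made up for illustration, and an odd number of quotes on a line is only a heuristic.

```shell
# Count lines that contain an odd number of double quotes.
# In well-formed CSV such a line can only be part of a quoted field
# that continues on another line.
count_odd_quote_lines() {
    awk '{ n = gsub(/"/, "&"); if (n % 2 == 1) odd++ } END { print odd + 0 }' "$1"
}

# Hypothetical sample: the second record has a line break inside its title.
sample=$(mktemp)
printf '%s\n' '1,"plain title",2' '2,"title with' 'a line break",3' '3,"say ""hi""",4' > "$sample"
count_odd_quote_lines "$sample"
rm -f "$sample"
```

A non-zero count on the real `dump.csv` would point to embedded line breaks rather than to backslashes.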
|
||
Let's remove both backslashes and quotes from our strings to make PostgreSQL happy.
|
||
|
||
```
|
||
rm dump.csv
|
||
|
||
SELECT WatchID::Int64, JavaEnable, replaceAll(replaceAll(replaceAll(toValidUTF8(Title), '\0', ''), '"', ''), '\\', ''), GoodEvent, EventTime, EventDate, CounterID::Int32, ClientIP::Int32, RegionID::Int32,
|
||
UserID::Int64, CounterClass, OS, UserAgent, replaceAll(replaceAll(replaceAll(toValidUTF8(URL), '\0', ''), '"', ''), '\\', ''), replaceAll(replaceAll(replaceAll(toValidUTF8(Referer), '\0', ''), '"', ''), '\\', ''), Refresh, RefererCategoryID::Int16, RefererRegionID::Int32,
|
||
URLCategoryID::Int16, URLRegionID::Int32, ResolutionWidth::Int16, ResolutionHeight::Int16, ResolutionDepth, FlashMajor, FlashMinor,
|
||
FlashMinor2, NetMajor, NetMinor, UserAgentMajor::Int16, replaceAll(replaceAll(replaceAll(toValidUTF8(UserAgentMinor::String), '\0', ''), '"', ''), '\\', ''), CookieEnable, JavascriptEnable, IsMobile, MobilePhone,
|
||
replaceAll(replaceAll(replaceAll(toValidUTF8(MobilePhoneModel), '\0', ''), '"', ''), '\\', ''), replaceAll(replaceAll(replaceAll(toValidUTF8(Params), '\0', ''), '"', ''), '\\', ''), IPNetworkID::Int32, TraficSourceID, SearchEngineID::Int16, replaceAll(replaceAll(replaceAll(toValidUTF8(SearchPhrase), '\0', ''), '"', ''), '\\', ''),
|
||
AdvEngineID, IsArtifical, WindowClientWidth::Int16, WindowClientHeight::Int16, ClientTimeZone, ClientEventTime,
|
||
SilverlightVersion1, SilverlightVersion2, SilverlightVersion3::Int32, SilverlightVersion4::Int16, replaceAll(replaceAll(replaceAll(toValidUTF8(PageCharset), '\0', ''), '"', ''), '\\', ''),
|
||
CodeVersion::Int32, IsLink, IsDownload, IsNotBounce, FUniqID::Int64, replaceAll(replaceAll(replaceAll(toValidUTF8(OriginalURL), '\0', ''), '"', ''), '\\', ''), HID::Int32, IsOldCounter, IsEvent,
|
||
IsParameter, DontCountHits, WithHash, replaceAll(replaceAll(replaceAll(toValidUTF8(HitColor::String), '\0', ''), '"', ''), '\\', ''), LocalEventTime, Age, Sex, Income, Interests::Int16, Robotness, RemoteIP::Int32,
|
||
WindowName, OpenerName, HistoryLength, replaceAll(replaceAll(replaceAll(toValidUTF8(BrowserLanguage::String), '\0', ''), '"', ''), '\\', ''), replaceAll(replaceAll(replaceAll(toValidUTF8(BrowserCountry::String), '\0', ''), '"', ''), '\\', ''),
|
||
replaceAll(replaceAll(replaceAll(toValidUTF8(SocialNetwork), '\0', ''), '"', ''), '\\', ''), replaceAll(replaceAll(replaceAll(toValidUTF8(SocialAction), '\0', ''), '"', ''), '\\', ''),
|
||
HTTPError, least(SendTiming, 30000), least(DNSTiming, 30000), least(ConnectTiming, 30000), least(ResponseStartTiming, 30000),
|
||
least(ResponseEndTiming, 30000), least(FetchTiming, 30000), SocialSourceNetworkID,
|
||
replaceAll(replaceAll(replaceAll(toValidUTF8(SocialSourcePage), '\0', ''), '"', ''), '\\', ''), ParamPrice, replaceAll(replaceAll(replaceAll(toValidUTF8(ParamOrderID), '\0', ''), '"', ''), '\\', ''), replaceAll(replaceAll(replaceAll(toValidUTF8(ParamCurrency::String), '\0', ''), '"', ''), '\\', ''),
|
||
ParamCurrencyID::Int16, OpenstatServiceName, OpenstatCampaignID, OpenstatAdID, OpenstatSourceID,
|
||
UTMSource, UTMMedium, UTMCampaign, UTMContent, UTMTerm, FromTag, HasGCLID, RefererHash::Int64, URLHash::Int64, CLID::Int32
|
||
FROM hits_100m_obfuscated
|
||
INTO OUTFILE 'dump.csv'
|
||
FORMAT CSV
|
||
```
|
||
|
||
It does not work at all:
|
||
|
||
```
|
||
$ time sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.csv --workers 16 --copy-options "CSV" -connection 'host=localhost password=12345'
|
||
panic: pq: invalid input syntax for type bigint: " ПЕСНЮ ПРЕСТИВАРКЕ ДОЛЖНО ЛИ,1,306,31432,304,22796,1011,879,37,15,5,700.224,2,7,13,D<>,1,1,0,0,",",3039109,-1,0,",0,0,779,292,135,2013-07-31 09:37:12,0,0,0,0,windows,1,0,0,0,6888403766694734958,http%3A//maps&sort_order_Kurzarm_DOB&sr=http%3A%2F%3Fpage=/ok.html?1=1&cid=577&oki=1&op_seo_entry=&op_uid=13225;IC"
|
||
|
||
goroutine 20 [running]:
|
||
main.processBatches(0xc0000183d0, 0xc0000a66c0)
|
||
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
|
||
created by main.main
|
||
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
|
||
|
||
real 1m47.915s
|
||
user 0m33.676s
|
||
sys 0m8.028s
|
||
```
|
||
|
||
Let's try switching from CSV to TSV, which PostgreSQL seems to understand better.
|
||
|
||
```
|
||
SELECT WatchID::Int64, JavaEnable, replaceAll(replaceAll(replaceAll(toValidUTF8(Title), '\0', ''), '"', ''), '\\', ''), GoodEvent, EventTime, EventDate, CounterID::Int32, ClientIP::Int32, RegionID::Int32,
|
||
UserID::Int64, CounterClass, OS, UserAgent, replaceAll(replaceAll(replaceAll(toValidUTF8(URL), '\0', ''), '"', ''), '\\', ''), replaceAll(replaceAll(replaceAll(toValidUTF8(Referer), '\0', ''), '"', ''), '\\', ''), Refresh, RefererCategoryID::Int16, RefererRegionID::Int32,
|
||
URLCategoryID::Int16, URLRegionID::Int32, ResolutionWidth::Int16, ResolutionHeight::Int16, ResolutionDepth, FlashMajor, FlashMinor,
|
||
FlashMinor2, NetMajor, NetMinor, UserAgentMajor::Int16, replaceAll(replaceAll(replaceAll(toValidUTF8(UserAgentMinor::String), '\0', ''), '"', ''), '\\', ''), CookieEnable, JavascriptEnable, IsMobile, MobilePhone,
|
||
replaceAll(replaceAll(replaceAll(toValidUTF8(MobilePhoneModel), '\0', ''), '"', ''), '\\', ''), replaceAll(replaceAll(replaceAll(toValidUTF8(Params), '\0', ''), '"', ''), '\\', ''), IPNetworkID::Int32, TraficSourceID, SearchEngineID::Int16, replaceAll(replaceAll(replaceAll(toValidUTF8(SearchPhrase), '\0', ''), '"', ''), '\\', ''),
|
||
AdvEngineID, IsArtifical, WindowClientWidth::Int16, WindowClientHeight::Int16, ClientTimeZone, ClientEventTime,
|
||
SilverlightVersion1, SilverlightVersion2, SilverlightVersion3::Int32, SilverlightVersion4::Int16, replaceAll(replaceAll(replaceAll(toValidUTF8(PageCharset), '\0', ''), '"', ''), '\\', ''),
|
||
CodeVersion::Int32, IsLink, IsDownload, IsNotBounce, FUniqID::Int64, replaceAll(replaceAll(replaceAll(toValidUTF8(OriginalURL), '\0', ''), '"', ''), '\\', ''), HID::Int32, IsOldCounter, IsEvent,
|
||
IsParameter, DontCountHits, WithHash, replaceAll(replaceAll(replaceAll(toValidUTF8(HitColor::String), '\0', ''), '"', ''), '\\', ''), LocalEventTime, Age, Sex, Income, Interests::Int16, Robotness, RemoteIP::Int32,
|
||
WindowName, OpenerName, HistoryLength, replaceAll(replaceAll(replaceAll(toValidUTF8(BrowserLanguage::String), '\0', ''), '"', ''), '\\', ''), replaceAll(replaceAll(replaceAll(toValidUTF8(BrowserCountry::String), '\0', ''), '"', ''), '\\', ''),
|
||
replaceAll(replaceAll(replaceAll(toValidUTF8(SocialNetwork), '\0', ''), '"', ''), '\\', ''), replaceAll(replaceAll(replaceAll(toValidUTF8(SocialAction), '\0', ''), '"', ''), '\\', ''),
|
||
HTTPError, least(SendTiming, 30000), least(DNSTiming, 30000), least(ConnectTiming, 30000), least(ResponseStartTiming, 30000),
|
||
least(ResponseEndTiming, 30000), least(FetchTiming, 30000), SocialSourceNetworkID,
|
||
replaceAll(replaceAll(replaceAll(toValidUTF8(SocialSourcePage), '\0', ''), '"', ''), '\\', ''), ParamPrice, replaceAll(replaceAll(replaceAll(toValidUTF8(ParamOrderID), '\0', ''), '"', ''), '\\', ''), replaceAll(replaceAll(replaceAll(toValidUTF8(ParamCurrency::String), '\0', ''), '"', ''), '\\', ''),
|
||
ParamCurrencyID::Int16, OpenstatServiceName, OpenstatCampaignID, OpenstatAdID, OpenstatSourceID,
|
||
UTMSource, UTMMedium, UTMCampaign, UTMContent, UTMTerm, FromTag, HasGCLID, RefererHash::Int64, URLHash::Int64, CLID::Int32
|
||
FROM hits_100m_obfuscated
|
||
INTO OUTFILE 'dump.tsv'
|
||
FORMAT TSV
|
||
```
|
||
|
||
But how do I pass TSV to the `timescaledb-parallel-copy` tool?
|
||
|
||
```
|
||
milovidov@mtlog-perftest03j:~$ time sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.tsv --workers 16 -connection 'host=localhost password=12345'
panic: pq: invalid input syntax for type bigint: "9076997425961590393\t0\tКино\t1\t2013-07-06 17:47:29\t2013-07-06\t225510\t-1056921538\t229\t3467937489264290637\t0\t2\t3\thttp://liver.ru/belgorod/page/1006.jки/доп_приборы\thttp://video.yandex.ru/1.561.540.000703/?order_Kurzarm_alia\t0\t16124\t20\t14328\t22\t1638\t1658\t23\t15\t7\t700\t0\t0\t17\tD<74>\t1\t1\t0\t0\t\t\t2095433\t-1\t0\t\t0\t1\t1369\t713\t135\t2013-07-06 16:25:42\t0\t0\t0\t0\twindows\t1601\t0\t0\t0\t5566829288329160346\t\t940752990\t0\t0\t0\t0\t0\t5\t2013-07-06 01:32:13\t55\t2\t3\t0\t2\t-1352932082\t-1\t-1\t-1\tS0\t<>\\f\t\t\t0\t0\t0\t0\t0\t0\t0\t0\t\t0\t\tNH\t0\t\t\t\t\t\t\t\t\t\t\t0\t6811023348165660452\t7011450103338277684\t0"
|
||
|
||
goroutine 20 [running]:
|
||
main.processBatches(0xc0000183d0, 0xc0000a66c0)
|
||
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
|
||
created by main.main
|
||
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
|
||
|
||
real 0m0.304s
|
||
user 0m0.044s
|
||
sys 0m0.044s
|
||
milovidov@mtlog-perftest03j:~$ time sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.tsv --copy-options "TEXT" --workers 16 -connection 'host=localhost password=12345'
|
||
panic: pq: syntax error at or near "TEXT"
|
||
|
||
goroutine 18 [running]:
|
||
main.processBatches(0xc0000183d0, 0xc0000a66c0)
|
||
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
|
||
created by main.main
|
||
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
|
||
|
||
real 0m0.044s
|
||
user 0m0.048s
|
||
sys 0m0.036s
|
||
milovidov@mtlog-perftest03j:~$ time sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.tsv --copy-options "text" --workers 16 -connection 'host=localhost password=12345'
|
||
panic: pq: syntax error at or near "text"
|
||
|
||
goroutine 18 [running]:
|
||
main.processBatches(0xc0000183d0, 0xc0000a66c0)
|
||
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
|
||
created by main.main
|
||
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
|
||
panic: pq: syntax error at or near "text"
|
||
|
||
goroutine 19 [running]:
|
||
main.processBatches(0xc0000183d0, 0xc0000a66c0)
|
||
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
|
||
created by main.main
|
||
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
|
||
|
||
real 0m0.057s
|
||
user 0m0.060s
|
||
sys 0m0.028s
|
||
milovidov@mtlog-perftest03j:~$ time sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.tsv --copy-options "Text" --workers 16 -connection 'host=localhost password=12345'
|
||
panic: pq: syntax error at or near "Text"
|
||
|
||
goroutine 11 [running]:
|
||
main.processBatches(0xc0000183d0, 0xc0000a66c0)
|
||
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
|
||
created by main.main
|
||
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
|
||
|
||
real 0m0.041s
|
||
user 0m0.052s
|
||
sys 0m0.032s
|
||
milovidov@mtlog-perftest03j:~$ time sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.tsv --copy-options "FORMAT text" --workers 16 -connection 'host=localhost password=12345'
|
||
panic: pq: syntax error at or near "FORMAT"
|
||
|
||
goroutine 21 [running]:
|
||
main.processBatches(0xc00019a350, 0xc00019e660)
|
||
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
|
||
created by main.main
|
||
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
|
||
|
||
real 0m0.045s
|
||
user 0m0.052s
|
||
sys 0m0.028s
|
||
```
|
||
|
||
Nothing works:
|
||
|
||
```
|
||
milovidov@mtlog-perftest03j:~$ time sudo -u postgres timescaledb-parallel-copy --help
|
||
Usage of timescaledb-parallel-copy:
|
||
-batch-size int
|
||
Number of rows per insert (default 5000)
|
||
-columns string
|
||
Comma-separated columns present in CSV
|
||
-connection string
|
||
PostgreSQL connection url (default "host=localhost user=postgres sslmode=disable")
|
||
-copy-options string
|
||
Additional options to pass to COPY (e.g., NULL 'NULL') (default "CSV")
|
||
-db-name string
|
||
Database where the destination table exists
|
||
-file string
|
||
File to read from rather than stdin
|
||
-header-line-count int
|
||
Number of header lines (default 1)
|
||
-limit int
|
||
Number of rows to insert overall; 0 means to insert all
|
||
-log-batches
|
||
Whether to time individual batches.
|
||
-reporting-period duration
|
||
Period to report insert stats; if 0s, intermediate results will not be reported
|
||
-schema string
|
||
Destination table's schema (default "public")
|
||
-skip-header
|
||
Skip the first line of the input
|
||
-split string
|
||
Character to split by (default ",")
|
||
-table string
|
||
Destination table for insertions (default "test_table")
|
||
-token-size int
|
||
Maximum size to use for tokens. By default, this is 64KB, so any value less than that will be ignored (default 65536)
|
||
-truncate
|
||
Truncate the destination table before insert
|
||
-verbose
|
||
Print more information about copying statistics
|
||
-version
|
||
Show the version of this tool
|
||
-workers int
|
||
Number of parallel requests to make (default 1)
|
||
|
||
real 0m0.009s
|
||
user 0m0.004s
|
||
sys 0m0.000s
|
||
milovidov@mtlog-perftest03j:~$ time sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.tsv --truncate --copy-options "" --workers 16 -connection 'host=localhost password=12345'
|
||
panic: pq: invalid input syntax for type bigint: "9076997425961590393 0 Кино 1 2013-07-06 17:47:29 2013-07-06 225510 -1056921538 229 3467937489264290637 0 2 3http://liver.ru/belgorod/page/1006.jки/доп_приборы http://video.yandex.ru/1.561.540.000703/?order_Kurzarm_alia 0 16124 20 14328 22 1638 1658 23 15 7 700 0017 D<> 1 1 0 0 2095433 -1 0 0 1 1369 713 135 2013-07-06 16:25:42 0 0 0 0 windows 1601 000 5566829288329160346 940752990 0 0 0 0 0 5 2013-07-06 01:32:13 55 2 3 0 2 -1352932082 -1 -1 -1 S0<53>\f 0 0 0 0 0 0 0 0 0 NH 0 06811023348165660452 7011450103338277684 0"
|
||
|
||
goroutine 13 [running]:
|
||
main.processBatches(0xc000019140, 0xc0001eb080)
|
||
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
|
||
created by main.main
|
||
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
|
||
|
||
real 0m0.191s
|
||
user 0m0.036s
|
||
sys 0m0.040s
|
||
milovidov@mtlog-perftest03j:~$ time sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.tsv --truncate --copy-options "NULL AS '\N'" --workers 16 -connection 'host=localhost password=12345'
|
||
panic: pq: invalid input syntax for type bigint: "9076997425961590393 0 Кино 1 2013-07-06 17:47:29 2013-07-06 225510 -1056921538 229 3467937489264290637 0 2 3http://liver.ru/belgorod/page/1006.jки/доп_приборы http://video.yandex.ru/1.561.540.000703/?order_Kurzarm_alia 0 16124 20 14328 22 1638 1658 23 15 7 700 0017 D<> 1 1 0 0 2095433 -1 0 0 1 1369 713 135 2013-07-06 16:25:42 0 0 0 0 windows 1601 000 5566829288329160346 940752990 0 0 0 0 0 5 2013-07-06 01:32:13 55 2 3 0 2 -1352932082 -1 -1 -1 S0<53>\f 0 0 0 0 0 0 0 0 0 NH 0 06811023348165660452 7011450103338277684 0"
|
||
|
||
goroutine 11 [running]:
|
||
main.processBatches(0xc000018900, 0xc0002886c0)
|
||
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
|
||
created by main.main
|
||
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
|
||
|
||
real 0m0.187s
|
||
user 0m0.020s
|
||
sys 0m0.048s
|
||
milovidov@mtlog-perftest03j:~$ time sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.tsv --truncate --copy-options "DELIMITER AS '\t'" --workers 16 -connection 'host=localhost password=12345'
|
||
panic: pq: conflicting or redundant options
|
||
|
||
goroutine 13 [running]:
|
||
main.processBatches(0xc000019140, 0xc0001e9080)
|
||
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
|
||
created by main.main
|
||
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
|
||
|
||
real 0m0.196s
|
||
user 0m0.048s
|
||
sys 0m0.020s
|
||
milovidov@mtlog-perftest03j:~$ time sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.tsv --truncate --copy-options "TEXT DELIMITER AS '\t'" --workers 16 -connection 'host=localhost password=12345'
|
||
panic: pq: syntax error at or near "TEXT"
|
||
|
||
goroutine 22 [running]:
|
||
main.processBatches(0xc000019140, 0xc0001e9080)
|
||
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
|
||
created by main.main
|
||
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
|
||
panic: pq: syntax error at or near "TEXT"
|
||
|
||
goroutine 11 [running]:
|
||
main.processBatches(0xc000019140, 0xc0001e9080)
|
||
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
|
||
created by main.main
|
||
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
|
||
|
||
real 0m0.191s
|
||
user 0m0.032s
|
||
sys 0m0.036s
|
||
milovidov@mtlog-perftest03j:~$ time sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.tsv --truncate --copy-options "DELIMITER AS e'\t'" --workers 16 -connection 'host=localhost password=12345'
|
||
panic: pq: conflicting or redundant options
|
||
|
||
goroutine 26 [running]:
|
||
main.processBatches(0xc0001330d0, 0xc0001e3020)
|
||
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
|
||
created by main.main
|
||
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
|
||
|
||
real 0m0.169s
|
||
user 0m0.056s
|
||
sys 0m0.016s
|
||
```
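
The help output above lists a `-split` flag ("Character to split by") that none of these attempts used. If the tool passes that delimiter to `COPY` itself, it would explain the "conflicting or redundant options" error whenever `DELIMITER` appears again in `-copy-options`. Below is a sketch of an invocation that might work, printed rather than executed. I have not tested it: that `'\t'` is accepted as a tab and that an empty `-copy-options` selects the text format are both assumptions, and `print_tsv_load_cmd` is a hypothetical helper.

```shell
# Print (do not run) a candidate command line for loading a TSV file.
# Assumptions, not confirmed anywhere in this log:
#   - the tool understands '\t' in -split as a tab character;
#   - an empty -copy-options leaves COPY in its default text format.
print_tsv_load_cmd() {
    split_arg="'\\t'"   # a literal backslash and 't' inside single quotes
    printf 'timescaledb-parallel-copy --db-name %s --table %s --file %s --split %s --copy-options "" --workers 16\n' \
        "$1" "$2" "$3" "$split_arg"
}

print_tsv_load_cmd tutorial hits_100m_obfuscated dump.tsv
```

The connection string is left out of the sketch; it would be passed with `-connection` as in the attempts above.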
|
||
|
||
I will try to avoid `timescaledb-parallel-copy` and use `psql` instead.
|
||
|
||
```
milovidov@mtlog-perftest03j:~$ sudo -u postgres psql
psql (13.4 (Ubuntu 13.4-4.pgdg18.04+1))
Type "help" for help.

postgres=# \c tutorial
You are now connected to database "tutorial" as user "postgres".
tutorial=# timing
tutorial-# COPY hits_100m_obfuscated FROM 'dump.tsv'
tutorial-# ;
ERROR: syntax error at or near "timing"
LINE 1: timing
^
tutorial=# \timing
Timing is on.
tutorial=# COPY hits_100m_obfuscated FROM 'dump.tsv';
ERROR: could not open file "dump.tsv" for reading: No such file or directory
HINT: COPY FROM instructs the PostgreSQL server process to read a file. You may want a client-side facility such as psql's \copy.
Time: 4.348 ms
tutorial=# \copy hits_100m_obfuscated FROM 'dump.tsv';
```
|
||
|
||
It started doing something... fairly slowly, using less than one CPU core.

Folks from TimescaleDB always recommend enabling compression, which is off by default.
Let's read about it:
|
||
|
||
https://docs.timescale.com/timescaledb/latest/how-to-guides/compression/
|
||
|
||
> We strongly recommend that you understand how compression works before you start enabling it on your hypertables.
|
||
|
||
The amount of hackery to overcome PostgreSQL limitations is overwhelming:
|
||
|
||
> When compression is enabled, TimescaleDB converts data stored in many rows into an array. This means that instead of using lots of rows to store the data, it stores the same data in a single row.
|
||
|
||
In the meantime, the copy finished in "just" 1.5 hours, at 19 245 rows/second. This is extremely slow, even for a single core.
|
||
|
||
```
tutorial=# \copy hits_100m_obfuscated FROM 'dump.tsv';
COPY 100000000
Time: 5195909.154 ms (01:26:35.909)
```
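
The rows-per-second figure can be re-derived from the row count and the `Time:` value reported by `psql`. A small sketch; `rows_per_second` is a helper name made up for this note, and the result is truncated to a whole number:

```shell
# rows_per_second ROWS MILLISECONDS
# Prints the average ingestion rate, truncated to an integer.
rows_per_second() {
    awk -v rows="$1" -v ms="$2" 'BEGIN { printf "%d\n", rows / (ms / 1000) }'
}

# Values taken from the COPY output above.
rows_per_second 100000000 5195909.154
```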
|
||
|
||
## Running Benchmark
|
||
|
||
Let's prepare for the benchmark...
What is needed to execute a single query in batch mode?
|
||
|
||
`man psql`
|
||
|
||
```
sudo -u postgres psql tutorial -t -c '\timing' -c 'SELECT 1' | grep 'Time'
```
|
||
|
||
Now we are ready to run our benchmark.
|
||
|
||
PostgreSQL does not have `SHOW PROCESSLIST`.
|
||
It has `select * from pg_stat_activity;` instead.
|
||
|
||
https://ma.ttias.be/show-full-processlist-equivalent-of-mysql-for-postgresql/
|
||
|
||
But it does not show query progress.
The first query, `SELECT count(*) FROM hits_100m_obfuscated`, just hung. It is reading something from disk...
|
||
|
||
Let's check the data volume:
|
||
|
||
```
$ sudo du -hcs /opt/postgresql/
68G /opt/postgresql/
```
|
||
|
||
Looks consistent for uncompressed data.
|
||
|
||
```
./benchmark.sh

grep -oP 'Time: \d+' log | grep -oP '\d+' | awk '{ if (n % 3 == 0) { printf("[") }; ++n; printf("%g", $1 / 1000); if (n % 3 == 0) { printf("],\n") } else { printf(", ") } }'
```
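
The one-liner above takes the whole-millisecond part of every `Time:` line, converts it to seconds, and prints the values in groups of three as `[a, b, c],`. The grouping by three suggests each query is run three times, but `benchmark.sh` is not shown here. A sketch of the same logic on a made-up log; `format_timings` and the sample timings are hypothetical, and `grep -oE` with `[0-9]` replaces `grep -oP` with `\d` only to avoid depending on PCRE support:

```shell
# format_timings LOGFILE
# Turns "Time: <ms>" lines into rows of three values in seconds.
format_timings() {
    grep -oE 'Time: [0-9]+' "$1" | grep -oE '[0-9]+' |
        awk '{ if (n % 3 == 0) { printf("[") }; ++n; printf("%g", $1 / 1000); if (n % 3 == 0) { printf("],\n") } else { printf(", ") } }'
}

# Hypothetical log with three runs of one query.
log=$(mktemp)
printf '%s\n' 'Time: 2500.123 ms' 'Time: 1500.456 ms' 'Time: 1000.789 ms' > "$log"
format_timings "$log"
rm -f "$log"
```

Note that the fractional milliseconds are dropped before the division, exactly as in the original command.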
|
||
|
||
Now let's enable compression.
|
||
|
||
```
ALTER TABLE hits_100m_obfuscated SET (timescaledb.compress);
SELECT add_compression_policy('hits_100m_obfuscated', INTERVAL '0 seconds');
```
|
||
|
||
```
milovidov@mtlog-perftest03j:~ClickHouse/benchmark/timescaledb$ sudo -u postgres psql tutorial
psql (13.4 (Ubuntu 13.4-4.pgdg18.04+1))
Type "help" for help.

tutorial=# ALTER TABLE hits_100m_obfuscated SET (timescaledb.compress);
ALTER TABLE
tutorial=# SELECT add_compression_policy('hits_100m_obfuscated', INTERVAL '0 seconds');
add_compression_policy
------------------------
1000
(1 row)
```
|
||
|
||
OK, in `top` I see that it started compression using a single CPU core.
|
||
|
||
```
|
||
300464 postgres 20 0 32.456g 932044 911452 D 48.0 0.7 1:08.11 postgres: 13/main: Compression Policy [1000]
|
||
```
|
||
|
||
Let's also define a better order for the data:
|
||
|
||
```
ALTER TABLE hits_100m_obfuscated
SET (timescaledb.compress,
timescaledb.compress_orderby = 'counterid, userid, event_time');
```
|
||
|
||
The query hung. Maybe it is waiting for the previous compression to finish?

After several minutes it answered:
|
||
|
||
```
ERROR: cannot change configuration on already compressed chunks
DETAIL: There are compressed chunks that prevent changing the existing compression configuration.
```
|
||
|
||
OK, at least some of the chunks will have the proper order.

After a few hours, it looks like the compression has finished.
|
||
|
||
```
sudo ncdu /var/lib/postgresql/13/main/

28.9 GiB [##########] /base
```
|
||
|
||
Yes, it looks compressed: about two times smaller, which is not much.
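
For a more exact figure, the ratio can be computed from the two sizes reported earlier. This is only a rough sketch: it assumes the 68G from `du` and the 28.9 GiB from `ncdu` measure the same data in the same units, although the two commands were pointed at different paths. `compression_ratio` is a made-up helper name.

```shell
# compression_ratio SIZE_BEFORE SIZE_AFTER
# Prints how many times smaller the data became, with two decimals.
compression_ratio() {
    awk -v before="$1" -v after="$2" 'BEGIN { printf "%.2f\n", before / after }'
}

# Sizes in GiB as reported by du and ncdu above.
compression_ratio 68 28.9
```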
|
||
|
||
Let's rerun the benchmark.
|
||
|
||
Ok, it's slightly faster.
|