This is a "usability test" of TimescaleDB. I have not used TimescaleDB before. I will try to install it, load the data, and run benchmarks, recording every obstacle I face. Usability testing needs to be conducted by the most clueless person in the room. Doing it requires a bit of patience and courage (to publish all the struggles as is).
Note: instead of using a clean VM, I have to run the benchmark on exactly the same bare-metal server where all the other benchmarks were run.
Installation
Install as follows: https://docs.timescale.com/timescaledb/latest/how-to-guides/install-timescaledb/self-hosted/ubuntu/installation-apt-ubuntu/#installation-apt-ubuntu
I've noticed that the TimescaleDB documentation website does not have a favicon, in contrast to the main page. Otherwise, it is quite neat.
sudo apt install postgresql-common
sudo sh /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh
sudo sh -c "echo 'deb [signed-by=/usr/share/keyrings/timescale.keyring] https://packagecloud.io/timescale/timescaledb/ubuntu/ $(lsb_release -c -s) main' > /etc/apt/sources.list.d/timescaledb.list"
wget --quiet -O - https://packagecloud.io/timescale/timescaledb/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/timescale.keyring
sudo apt-get update
sudo apt install timescaledb-2-postgresql-13
The documentation recommends tuning it:
sudo apt install timescaledb-tune
sudo timescaledb-tune --quiet --yes
Using postgresql.conf at this path:
/etc/postgresql/13/main/postgresql.conf
Writing backup to:
/tmp/timescaledb_tune.backup202110292328
Recommendations based on 125.88 GB of available memory and 32 CPUs for PostgreSQL 13
shared_preload_libraries = 'timescaledb' # (change requires restart)
shared_buffers = 32226MB
effective_cache_size = 96678MB
maintenance_work_mem = 2047MB
work_mem = 10312kB
timescaledb.max_background_workers = 8
max_worker_processes = 43
max_parallel_workers_per_gather = 16
max_parallel_workers = 32
wal_buffers = 16MB
min_wal_size = 512MB
default_statistics_target = 500
random_page_cost = 1.1
checkpoint_completion_target = 0.9
max_locks_per_transaction = 512
autovacuum_max_workers = 10
autovacuum_naptime = 10
effective_io_concurrency = 256
timescaledb.last_tuned = '2021-10-29T23:28:49+03:00'
timescaledb.last_tuned_version = '0.12.0'
Saving changes to: /etc/postgresql/13/main/postgresql.conf
sudo service postgresql restart
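The two memory settings in the output above are consistent with a simple split: one quarter of the memory for shared_buffers and three quarters for effective_cache_size. This ratio is inferred from the printed numbers only, not taken from the timescaledb-tune documentation, and the total_mb value below is an assumption (4 x 32226 MB, which is 125.88 GB when divided by 1024). A minimal arithmetic sketch:

```shell
# Assumed total memory in MB: 4 * 32226, i.e. about 125.88 GB.
total_mb=128904

# One quarter and three quarters of the total, using integer arithmetic.
echo "shared_buffers       = $((total_mb / 4))MB"
echo "effective_cache_size = $((total_mb * 3 / 4))MB"
```

Both lines reproduce the values that timescaledb-tune wrote, 32226MB and 96678MB.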
Post-install setup: https://docs.timescale.com/timescaledb/latest/how-to-guides/install-timescaledb/post-install-setup/
$ psql -U postgres -h localhost
Password for user postgres:
psql: error: connection to server at "localhost" (::1), port 5432 failed: fe_sendauth: no password supplied
How do I set up a password?
milovidov@mtlog-perftest03j:~$ psql -U postgres -h localhost
Password for user postgres:
psql: error: connection to server at "localhost" (::1), port 5432 failed: fe_sendauth: no password supplied
milovidov@mtlog-perftest03j:~$ psql
psql: error: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: FATAL: role "milovidov" does not exist
milovidov@mtlog-perftest03j:~$ sudo psql
psql: error: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: FATAL: role "root" does not exist
milovidov@mtlog-perftest03j:~$ psql -U postgres
psql: error: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: FATAL: Peer authentication failed for user "postgres"
milovidov@mtlog-perftest03j:~$ psql -U postgres -h localost
psql: error: could not translate host name "localost" to address: Name or service not known
milovidov@mtlog-perftest03j:~$ sudo psql -U postgres -h localost
psql: error: could not translate host name "localost" to address: Name or service not known
milovidov@mtlog-perftest03j:~$ sudo psql -U postgres -h localhost
Password for user postgres:
psql: error: connection to server at "localhost" (::1), port 5432 failed: fe_sendauth: no password supplied
milovidov@mtlog-perftest03j:~$ sudo -u postgres psql -h localhost
Password for user postgres:
psql: error: connection to server at "localhost" (::1), port 5432 failed: fe_sendauth: no password supplied
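The failures above are consistent with the usual Debian/Ubuntu setup, where pg_hba.conf uses "peer" authentication for Unix-socket connections (the OS user must match the database role) and password authentication for TCP connections to localhost. That layout is an assumption here; the sample file contents below are illustrative, and hba_methods is a hypothetical helper. A minimal sketch that lists the connection type and auth method of each active rule:

```shell
# Print the connection type and auth method of every active rule
# in a pg_hba.conf-style file. Assumes the method is the last field,
# which holds for rules without trailing auth options.
hba_methods() {
    awk '!/^[ \t]*#/ && NF { print $1, $NF }' "$1"
}

# Example with a sample file (contents are illustrative, not copied from the server).
sample=$(mktemp)
cat > "$sample" <<'EOF'
# TYPE  DATABASE  USER      ADDRESS        METHOD
local   all       postgres                 peer
host    all       all       127.0.0.1/32   md5
host    all       all       ::1/128        md5
EOF
hba_methods "$sample"
rm -f "$sample"
```

On the real server the same function could be pointed at /etc/postgresql/13/main/pg_hba.conf (reading it needs sudo). With a rule set like the sample, only `sudo -u postgres psql` without `-h` can get in before a password has been set.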
I found an answer here: https://stackoverflow.com/questions/12720967/how-to-change-postgresql-user-password
$ sudo -u postgres psql
psql (13.4 (Ubuntu 13.4-4.pgdg18.04+1), server 9.5.25)
Type "help" for help.
postgres=# \password postgres
Enter new password:
Enter it again:
postgres=#
CREATE database tutorial;
postgres=# CREATE EXTENSION IF NOT EXISTS timescaledb;
ERROR: could not open extension control file "/usr/share/postgresql/9.5/extension/timescaledb.control": No such file or directory
Looks like I have an old PostgreSQL.
$ ls -l /usr/share/postgresql/
10/ 11/ 13/ 9.5/
But there is also a newer PostgreSQL.
$ psql --version
psql (PostgreSQL) 13.4 (Ubuntu 13.4-4.pgdg18.04+1)
psql is new, so what is wrong?
Looks like I have all versions running simultaneously?
https://askubuntu.com/questions/17823/how-to-list-all-installed-packages
$ ps auxw | grep postgres
postgres 718818 0.0 0.5 33991600 730184 ? Ss 23:29 0:00 /usr/lib/postgresql/13/bin/postgres -D /var/lib/postgresql/13/main -c config_file=/etc/postgresql/13/main/postgresql.conf
postgres 718825 0.0 0.0 320356 27660 ? S 23:29 0:00 /usr/lib/postgresql/10/bin/postgres -D /var/lib/postgresql/10/main -c config_file=/etc/postgresql/10/main/postgresql.conf
postgres 718826 0.0 0.0 320712 27900 ? S 23:29 0:00 /usr/lib/postgresql/11/bin/postgres -D /var/lib/postgresql/11/main -c config_file=/etc/postgresql/11/main/postgresql.conf
postgres 718829 0.0 0.0 320468 7092 ? Ss 23:29 0:00 postgres: 10/main: checkpointer process
postgres 718830 0.0 0.0 320356 4300 ? Ss 23:29 0:00 postgres: 10/main: writer process
postgres 718831 0.0 0.0 320356 9204 ? Ss 23:29 0:00 postgres: 10/main: wal writer process
postgres 718832 0.0 0.0 320776 6964 ? Ss 23:29 0:00 postgres: 10/main: autovacuum launcher process
postgres 718833 0.0 0.0 175404 3596 ? Ss 23:29 0:00 postgres: 10/main: stats collector process
postgres 718834 0.0 0.0 320640 5052 ? Ss 23:29 0:00 postgres: 10/main: bgworker: logical replication launcher
postgres 718835 0.0 0.0 320820 5592 ? Ss 23:29 0:00 postgres: 11/main: checkpointer
postgres 718836 0.0 0.0 320712 4164 ? Ss 23:29 0:00 postgres: 11/main: background writer
postgres 718837 0.0 0.0 320712 9040 ? Ss 23:29 0:00 postgres: 11/main: walwriter
postgres 718838 0.0 0.0 321116 6824 ? Ss 23:29 0:00 postgres: 11/main: autovacuum launcher
postgres 718839 0.0 0.0 175752 3652 ? Ss 23:29 0:00 postgres: 11/main: stats collector
postgres 718840 0.0 0.0 321120 6640 ? Ss 23:29 0:00 postgres: 11/main: logical replication launcher
postgres 718842 0.0 0.1 33991700 263860 ? Ss 23:29 0:00 postgres: 13/main: checkpointer
postgres 718843 0.0 0.2 33991600 264096 ? Ss 23:29 0:00 postgres: 13/main: background writer
postgres 718844 0.0 0.0 33991600 22044 ? Ss 23:29 0:00 postgres: 13/main: walwriter
postgres 718845 0.0 0.0 33992284 7040 ? Ss 23:29 0:00 postgres: 13/main: autovacuum launcher
postgres 718846 0.0 0.0 177920 4320 ? Ss 23:29 0:00 postgres: 13/main: stats collector
postgres 718847 0.0 0.0 33992136 7972 ? Ss 23:29 0:00 postgres: 13/main: TimescaleDB Background Worker Launcher
postgres 718848 0.0 0.0 33992164 7248 ? Ss 23:29 0:00 postgres: 13/main: logical replication launcher
postgres 718857 0.0 0.0 304492 26284 ? S 23:29 0:00 /usr/lib/postgresql/9.5/bin/postgres -D /var/lib/postgresql/9.5/main -c config_file=/etc/postgresql/9.5/main/postgresql.conf
postgres 718859 0.0 0.0 304592 6480 ? Ss 23:29 0:00 postgres: checkpointer process
postgres 718860 0.0 0.0 304492 5656 ? Ss 23:29 0:00 postgres: writer process
postgres 718861 0.0 0.0 304492 4144 ? Ss 23:29 0:00 postgres: wal writer process
postgres 718862 0.0 0.0 304928 6896 ? Ss 23:29 0:00 postgres: autovacuum launcher process
postgres 718863 0.0 0.0 159744 4156 ? Ss 23:29 0:00 postgres: stats collector process
milovid+ 724277 0.0 0.0 14364 1024 pts/17 S+ 23:41 0:00 grep --color=auto postgres
$ apt list --installed | grep postgres
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
postgresql-10/now 10.16-1.pgdg18.04+1 amd64 [installed,upgradable to: 10.18-1.pgdg18.04+1]
postgresql-11/now 11.11-1.pgdg18.04+1 amd64 [installed,upgradable to: 11.13-1.pgdg18.04+1]
postgresql-11-postgis-3/now 3.1.1+dfsg-1.pgdg18.04+1 amd64 [installed,upgradable to: 3.1.4+dfsg-1.pgdg18.04+1]
postgresql-11-postgis-3-scripts/now 3.1.1+dfsg-1.pgdg18.04+1 all [installed,upgradable to: 3.1.4+dfsg-1.pgdg18.04+1]
postgresql-13/bionic-pgdg,now 13.4-4.pgdg18.04+1 amd64 [installed,automatic]
postgresql-9.5/bionic-pgdg,now 9.5.25-1.pgdg18.04+1 amd64 [installed]
postgresql-9.5-postgis-2.2-scripts/now 2.2.2+dfsg-4.pgdg14.04+1.yandex all [installed,local]
postgresql-client-10/now 10.16-1.pgdg18.04+1 amd64 [installed,upgradable to: 10.18-1.pgdg18.04+1]
postgresql-client-11/now 11.11-1.pgdg18.04+1 amd64 [installed,upgradable to: 11.13-1.pgdg18.04+1]
postgresql-client-13/bionic-pgdg,now 13.4-4.pgdg18.04+1 amd64 [installed,automatic]
postgresql-client-9.5/bionic-pgdg,now 9.5.25-1.pgdg18.04+1 amd64 [installed]
postgresql-client-common/bionic-pgdg,now 231.pgdg18.04+1 all [installed]
postgresql-common/bionic-pgdg,now 231.pgdg18.04+1 all [installed]
timescaledb-2-loader-postgresql-13/bionic,now 2.5.0~ubuntu18.04 amd64 [installed,automatic]
timescaledb-2-postgresql-13/bionic,now 2.5.0~ubuntu18.04 amd64 [installed]
Let's remove all older packages.
sudo apt remove postgresql-10 postgresql-11 postgresql-9.5 postgresql-client-10 postgresql-client-11 postgresql-client-9.5
Just in case:
sudo service postgresql restart
Now it has stopped working:
$ sudo -u postgres psql
psql: error: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
Is the server running locally and accepting connections on that socket?
$ sudo -u postgres psql -h localhost
psql: error: connection to server at "localhost" (::1), port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
connection to server at "localhost" (127.0.0.1), port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
But it's running:
$ ps auxw | grep postgres
postgres 726158 0.5 0.5 33991600 730084 ? Ss 23:45 0:00 /usr/lib/postgresql/13/bin/postgres -D /var/lib/postgresql/13/main -c config_file=/etc/postgresql/13/main/postgresql.conf
postgres 726160 0.0 0.0 33991600 4256 ? Ss 23:45 0:00 postgres: 13/main: checkpointer
postgres 726161 0.1 0.1 33991600 150048 ? Ss 23:45 0:00 postgres: 13/main: background writer
postgres 726162 0.0 0.0 33991600 22044 ? Ss 23:45 0:00 postgres: 13/main: walwriter
postgres 726163 0.0 0.0 33992284 6976 ? Ss 23:45 0:00 postgres: 13/main: autovacuum launcher
postgres 726164 0.0 0.0 177920 4384 ? Ss 23:45 0:00 postgres: 13/main: stats collector
postgres 726165 0.0 0.0 33992136 7840 ? Ss 23:45 0:00 postgres: 13/main: TimescaleDB Background Worker Launcher
postgres 726166 0.0 0.0 33992164 7244 ? Ss 23:45 0:00 postgres: 13/main: logical replication launcher
milovid+ 726578 0.0 0.0 14364 1100 pts/17 S+ 23:46 0:00 grep --color=auto postgres
But it is not listening on port 5432:
$ netstat -n | grep 5432
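One caveat about the check above: plain `netstat -n` lists only non-listening sockets, so an empty result does not say much about listeners. Adding `-l` (or using `ss -ltn`) prints the listening TCP sockets. A minimal sketch of a filter for such a listing follows; filter_port is a hypothetical helper, and the sample rows are made up for illustration:

```shell
# Keep only rows whose local address (4th column of `ss -ltn` or
# `netstat -ltn` output) ends in the given port. Reads the listing from stdin.
filter_port() {
    awk -v port="$1" '{ n = split($4, parts, ":"); if (parts[n] == port) print }'
}

# On a live system: ss -ltn | filter_port 5432
# Here the input is a made-up listing.
printf '%s\n' \
    'LISTEN 0 128 127.0.0.1:5435 0.0.0.0:*' \
    'LISTEN 0 128 127.0.0.1:22 0.0.0.0:*' | filter_port 5435
```

Listing all listening sockets without a port filter would also show which port the server actually picked.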
Let's look at the config:
sudo mcedit /etc/postgresql/13/main/postgresql.conf
# - Connection Settings -
#listen_addresses = 'localhost'
Looks like I need to uncomment it.
sudo service postgresql restart
But it did not help:
$ sudo -u postgres psql -h localhost
psql: error: connection to server at "localhost" (::1), port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
connection to server at "localhost" (127.0.0.1), port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
Let's consult https://stackoverflow.com/questions/31091748/postgres-server-not-listening
It mentions some pg_hba.conf. BTW, what is HBA*? Let's find this file...
sudo mcedit /etc/postgresql/13/main/pg_hba.conf
* HBA stands for host-based authentication; this is explained inside the file.
Nothing wrong in this file...
$ sudo service postgresql status
● postgresql.service - PostgreSQL RDBMS
Loaded: loaded (/lib/systemd/system/postgresql.service; enabled; vendor preset: enabled)
Active: active (exited) since Fri 2021-10-29 23:50:14 MSK; 6min ago
Process: 728545 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
Main PID: 728545 (code=exited, status=0/SUCCESS)
Oct 29 23:50:14 mtlog-perftest03j systemd[1]: postgresql.service: Changed dead -> start
Oct 29 23:50:14 mtlog-perftest03j systemd[1]: Starting PostgreSQL RDBMS...
Oct 29 23:50:14 mtlog-perftest03j systemd[728545]: postgresql.service: Executing: /bin/true
Oct 29 23:50:14 mtlog-perftest03j systemd[1]: postgresql.service: Child 728545 belongs to postgresql.service.
Oct 29 23:50:14 mtlog-perftest03j systemd[1]: postgresql.service: Main process exited, code=exited, status=0/SUCCESS
Oct 29 23:50:14 mtlog-perftest03j systemd[1]: postgresql.service: Changed start -> exited
Oct 29 23:50:14 mtlog-perftest03j systemd[1]: postgresql.service: Job postgresql.service/start finished, result=done
Oct 29 23:50:14 mtlog-perftest03j systemd[1]: Started PostgreSQL RDBMS.
Oct 29 23:50:14 mtlog-perftest03j systemd[1]: postgresql.service: Failed to send unit change signal for postgresql.service: Connection reset by peer
It's quite cryptic. What does "Failed to send unit change signal" mean? Is it good or bad? What is the "unit"? Maybe it is a "systemd unit", a phrase that I've heard many times but don't really understand.
Almost gave up... Wow, I found the culprit! In /etc/postgresql/13/main/postgresql.conf:
port = 5435
Most likely this happened because multiple versions of PostgreSQL were installed.
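A quicker way to spot this is to print the port of every installed cluster. On Debian/Ubuntu, `pg_lsclusters` from postgresql-common lists each cluster with its port. Where that is not available, the setting can be read from the config files directly. A minimal sketch, assuming the port is set in postgresql.conf itself and not in an included file; conf_port is a hypothetical helper:

```shell
# Print the effective "port" setting of a postgresql.conf-style file.
# Commented-out lines are ignored; if the setting appears several times,
# the last one wins, as in PostgreSQL itself.
conf_port() {
    sed -n 's/^[[:space:]]*port[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1
}

# Show the port of every cluster found under /etc/postgresql, if any.
for f in /etc/postgresql/*/*/postgresql.conf; do
    if [ -f "$f" ]; then
        echo "$f: $(conf_port "$f")"
    fi
done
```

An empty result for a file means the setting is commented out there and the default applies.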
Let's change to 5432.
sudo mcedit /etc/postgresql/13/main/postgresql.conf
sudo service postgresql restart
But now it does not accept the password:
milovidov@mtlog-perftest03j:~$ sudo -u postgres psql -h 127.0.0.1
Password for user postgres:
psql: error: connection to server at "127.0.0.1", port 5432 failed: fe_sendauth: no password supplied
milovidov@mtlog-perftest03j:~$ sudo -u postgres psql -h 127.0.0.1 --password ''
Password:
psql: error: connection to server at "127.0.0.1", port 5432 failed: fe_sendauth: no password supplied
milovidov@mtlog-perftest03j:~$ sudo -u postgres psql -h 127.0.0.1
Password for user postgres:
psql: error: connection to server at "127.0.0.1", port 5432 failed: fe_sendauth: no password supplied
Works this way:
$ sudo -u postgres psql
psql (13.4 (Ubuntu 13.4-4.pgdg18.04+1))
Type "help" for help.
postgres=# \password
Enter new password:
Enter it again:
It works, with fine ASCII art:
postgres=# CREATE database tutorial;
CREATE DATABASE
postgres=# \c tutorial
You are now connected to database "tutorial" as user "postgres".
tutorial=# CREATE EXTENSION IF NOT EXISTS timescaledb;
WARNING:
WELCOME TO
_____ _ _ ____________
|_ _(_) | | | _ \ ___ \
| | _ _ __ ___ ___ ___ ___ __ _| | ___| | | | |_/ /
| | | | _ ` _ \ / _ \/ __|/ __/ _` | |/ _ \ | | | ___ \
| | | | | | | | | __/\__ \ (_| (_| | | __/ |/ /| |_/ /
|_| |_|_| |_| |_|\___||___/\___\__,_|_|\___|___/ \____/
Running version 2.5.0
For more information on TimescaleDB, please visit the following links:
1. Getting started: https://docs.timescale.com/timescaledb/latest/getting-started
2. API reference documentation: https://docs.timescale.com/api/latest
3. How TimescaleDB is designed: https://docs.timescale.com/timescaledb/latest/overview/core-concepts
Note: TimescaleDB collects anonymous reports to better understand and assist our users.
For more information and how to disable, please see our docs https://docs.timescale.com/timescaledb/latest/how-to-guides/configuration/telemetry.
CREATE EXTENSION
Creating Table
Continuing to https://docs.timescale.com/timescaledb/latest/how-to-guides/hypertables/create/
Create table:
CREATE TABLE hits_100m_obfuscated (
WatchID BIGINT,
JavaEnable SMALLINT,
Title TEXT,
GoodEvent SMALLINT,
EventTime TIMESTAMP,
EventDate Date,
CounterID INTEGER,
ClientIP INTEGER,
RegionID INTEGER,
UserID BIGINT,
CounterClass SMALLINT,
OS SMALLINT,
UserAgent SMALLINT,
URL TEXT,
Referer TEXT,
Refresh SMALLINT,
RefererCategoryID SMALLINT,
RefererRegionID INTEGER,
URLCategoryID SMALLINT,
URLRegionID INTEGER,
ResolutionWidth SMALLINT,
ResolutionHeight SMALLINT,
ResolutionDepth SMALLINT,
FlashMajor SMALLINT,
FlashMinor SMALLINT,
FlashMinor2 TEXT,
NetMajor SMALLINT,
NetMinor SMALLINT,
UserAgentMajor SMALLINT,
UserAgentMinor CHAR(2),
CookieEnable SMALLINT,
JavascriptEnable SMALLINT,
IsMobile SMALLINT,
MobilePhone SMALLINT,
MobilePhoneModel TEXT,
Params TEXT,
IPNetworkID INTEGER,
TraficSourceID SMALLINT,
SearchEngineID SMALLINT,
SearchPhrase TEXT,
AdvEngineID SMALLINT,
IsArtifical SMALLINT,
WindowClientWidth SMALLINT,
WindowClientHeight SMALLINT,
ClientTimeZone SMALLINT,
ClientEventTime TIMESTAMP,
SilverlightVersion1 SMALLINT,
SilverlightVersion2 SMALLINT,
SilverlightVersion3 INTEGER,
SilverlightVersion4 SMALLINT,
PageCharset TEXT,
CodeVersion INTEGER,
IsLink SMALLINT,
IsDownload SMALLINT,
IsNotBounce SMALLINT,
FUniqID BIGINT,
OriginalURL TEXT,
HID INTEGER,
IsOldCounter SMALLINT,
IsEvent SMALLINT,
IsParameter SMALLINT,
DontCountHits SMALLINT,
WithHash SMALLINT,
HitColor CHAR,
LocalEventTime TIMESTAMP,
Age SMALLINT,
Sex SMALLINT,
Income SMALLINT,
Interests SMALLINT,
Robotness SMALLINT,
RemoteIP INTEGER,
WindowName INTEGER,
OpenerName INTEGER,
HistoryLength SMALLINT,
BrowserLanguage TEXT,
BrowserCountry TEXT,
SocialNetwork TEXT,
SocialAction TEXT,
HTTPError SMALLINT,
SendTiming INTEGER,
DNSTiming INTEGER,
ConnectTiming INTEGER,
ResponseStartTiming INTEGER,
ResponseEndTiming INTEGER,
FetchTiming INTEGER,
SocialSourceNetworkID SMALLINT,
SocialSourcePage TEXT,
ParamPrice BIGINT,
ParamOrderID TEXT,
ParamCurrency TEXT,
ParamCurrencyID SMALLINT,
OpenstatServiceName TEXT,
OpenstatCampaignID TEXT,
OpenstatAdID TEXT,
OpenstatSourceID TEXT,
UTMSource TEXT,
UTMMedium TEXT,
UTMCampaign TEXT,
UTMContent TEXT,
UTMTerm TEXT,
FromTag TEXT,
HasGCLID SMALLINT,
RefererHash BIGINT,
URLHash BIGINT,
CLID INTEGER
);
I remember PostgreSQL does not support unsigned integers. It also does not support TINYINT. And it does not support zero bytes in TEXT fields. We will deal with it...
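One way to cope with the missing unsigned types is to store each unsigned value in the signed type of the same width, so the upper half of the range becomes negative. The sketch below shows that wrap-around for 32-bit values. It assumes that this is how the cast on export behaves, which is not verified here; to_int32 is a hypothetical helper used only for illustration:

```shell
# Reinterpret an unsigned 32-bit value as a signed 32-bit one
# (two's complement wrap-around): values >= 2^31 become negative.
to_int32() {
    if [ "$1" -ge 2147483648 ]; then
        echo $(( $1 - 4294967296 ))
    else
        echo "$1"
    fi
}

to_int32 5            # prints 5
to_int32 4294967295   # prints -1
to_int32 3232235777   # prints -1062731519
```

The mapping is reversible, so no information is lost, but comparisons and ordering on such columns no longer match the unsigned originals.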
tutorial=# SELECT create_hypertable('hits_100m_obfuscated', 'EventTime');
ERROR: column "EventTime" does not exist
WTF?
Maybe it is because column names are lowercased?
tutorial=# SELECT create_hypertable('hits_100m_obfuscated', 'eventtime');
NOTICE: adding not-null constraint to column "eventtime"
DETAIL: Time dimensions cannot have NULL values.
create_hypertable
-----------------------------------
(1,public,hits_100m_obfuscated,t)
(1 row)
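The guess is right: PostgreSQL folds unquoted identifiers to lowercase, so the column created as EventTime is stored as eventtime, while the string passed to create_hypertable is matched as written. A minimal sketch of the folding rule; fold_ident is a hypothetical helper that only mimics the behavior for simple ASCII names:

```shell
# Mimic how PostgreSQL treats an identifier written in DDL:
# unquoted names are folded to lowercase, double-quoted names are kept as written.
fold_ident() {
    case "$1" in
        \"*\") printf '%s\n' "$1" | sed -e 's/^"//' -e 's/"$//' ;;
        *)     printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' ;;
    esac
}

fold_ident EventTime      # prints eventtime
fold_ident '"EventTime"'  # prints EventTime
```

Declaring the columns with double quotes would preserve the case, at the cost of having to quote them in every query.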
Looks like I forgot to specify NOT NULL for every column. Let's repeat...
tutorial=# DROP TABLE hits_100m_obfuscated
tutorial-# ;
DROP TABLE
tutorial=# CREATE TABLE hits_100m_obfuscated (
tutorial(# WatchID BIGINT NOT NULL,
tutorial(# JavaEnable SMALLINT NOT NULL,
tutorial(# Title TEXT NOT NULL,
tutorial(# GoodEvent SMALLINT NOT NULL,
tutorial(# EventTime TIMESTAMP NOT NULL,
tutorial(# EventDate Date NOT NULL,
tutorial(# CounterID INTEGER NOT NULL,
tutorial(# ClientIP INTEGER NOT NULL,
tutorial(# RegionID INTEGER NOT NULL,
tutorial(# UserID BIGINT NOT NULL,
tutorial(# CounterClass SMALLINT NOT NULL,
tutorial(# OS SMALLINT NOT NULL,
tutorial(# UserAgent SMALLINT NOT NULL,
tutorial(# URL TEXT NOT NULL,
tutorial(# Referer TEXT NOT NULL,
tutorial(# Refresh SMALLINT NOT NULL,
tutorial(# RefererCategoryID SMALLINT NOT NULL,
tutorial(# RefererRegionID INTEGER NOT NULL,
tutorial(# URLCategoryID SMALLINT NOT NULL,
tutorial(# URLRegionID INTEGER NOT NULL,
tutorial(# ResolutionWidth SMALLINT NOT NULL,
tutorial(# ResolutionHeight SMALLINT NOT NULL,
tutorial(# ResolutionDepth SMALLINT NOT NULL,
tutorial(# FlashMajor SMALLINT NOT NULL,
tutorial(# FlashMinor SMALLINT NOT NULL,
tutorial(# FlashMinor2 TEXT NOT NULL,
tutorial(# NetMajor SMALLINT NOT NULL,
tutorial(# NetMinor SMALLINT NOT NULL,
tutorial(# UserAgentMajor SMALLINT NOT NULL,
tutorial(# UserAgentMinor CHAR(2) NOT NULL,
tutorial(# CookieEnable SMALLINT NOT NULL,
tutorial(# JavascriptEnable SMALLINT NOT NULL,
tutorial(# IsMobile SMALLINT NOT NULL,
tutorial(# MobilePhone SMALLINT NOT NULL,
tutorial(# MobilePhoneModel TEXT NOT NULL,
tutorial(# Params TEXT NOT NULL,
tutorial(# IPNetworkID INTEGER NOT NULL,
tutorial(# TraficSourceID SMALLINT NOT NULL,
tutorial(# SearchEngineID SMALLINT NOT NULL,
tutorial(# SearchPhrase TEXT NOT NULL,
tutorial(# AdvEngineID SMALLINT NOT NULL,
tutorial(# IsArtifical SMALLINT NOT NULL,
tutorial(# WindowClientWidth SMALLINT NOT NULL,
tutorial(# WindowClientHeight SMALLINT NOT NULL,
tutorial(# ClientTimeZone SMALLINT NOT NULL,
tutorial(# ClientEventTime TIMESTAMP NOT NULL,
tutorial(# SilverlightVersion1 SMALLINT NOT NULL,
tutorial(# SilverlightVersion2 SMALLINT NOT NULL,
tutorial(# SilverlightVersion3 INTEGER NOT NULL,
tutorial(# SilverlightVersion4 SMALLINT NOT NULL,
tutorial(# PageCharset TEXT NOT NULL,
tutorial(# CodeVersion INTEGER NOT NULL,
tutorial(# IsLink SMALLINT NOT NULL,
tutorial(# IsDownload SMALLINT NOT NULL,
tutorial(# IsNotBounce SMALLINT NOT NULL,
tutorial(# FUniqID BIGINT NOT NULL,
tutorial(# OriginalURL TEXT NOT NULL,
tutorial(# HID INTEGER NOT NULL,
tutorial(# IsOldCounter SMALLINT NOT NULL,
tutorial(# IsEvent SMALLINT NOT NULL,
tutorial(# IsParameter SMALLINT NOT NULL,
tutorial(# DontCountHits SMALLINT NOT NULL,
tutorial(# WithHash SMALLINT NOT NULL,
tutorial(# HitColor CHAR NOT NULL,
tutorial(# LocalEventTime TIMESTAMP NOT NULL,
tutorial(# Age SMALLINT NOT NULL,
tutorial(# Sex SMALLINT NOT NULL,
tutorial(# Income SMALLINT NOT NULL,
tutorial(# Interests SMALLINT NOT NULL,
tutorial(# Robotness SMALLINT NOT NULL,
tutorial(# RemoteIP INTEGER NOT NULL,
tutorial(# WindowName INTEGER NOT NULL,
tutorial(# OpenerName INTEGER NOT NULL,
tutorial(# HistoryLength SMALLINT NOT NULL,
tutorial(# BrowserLanguage TEXT NOT NULL,
tutorial(# BrowserCountry TEXT NOT NULL,
tutorial(# SocialNetwork TEXT NOT NULL,
tutorial(# SocialAction TEXT NOT NULL,
tutorial(# HTTPError SMALLINT NOT NULL,
tutorial(# SendTiming INTEGER NOT NULL,
tutorial(# DNSTiming INTEGER NOT NULL,
tutorial(# ConnectTiming INTEGER NOT NULL,
tutorial(# ResponseStartTiming INTEGER NOT NULL,
tutorial(# ResponseEndTiming INTEGER NOT NULL,
tutorial(# FetchTiming INTEGER NOT NULL,
tutorial(# SocialSourceNetworkID SMALLINT NOT NULL,
tutorial(# SocialSourcePage TEXT NOT NULL,
tutorial(# ParamPrice BIGINT NOT NULL,
tutorial(# ParamOrderID TEXT NOT NULL,
tutorial(# ParamCurrency TEXT NOT NULL,
tutorial(# ParamCurrencyID SMALLINT NOT NULL,
tutorial(# OpenstatServiceName TEXT NOT NULL,
tutorial(# OpenstatCampaignID TEXT NOT NULL,
tutorial(# OpenstatAdID TEXT NOT NULL,
tutorial(# OpenstatSourceID TEXT NOT NULL,
tutorial(# UTMSource TEXT NOT NULL,
tutorial(# UTMMedium TEXT NOT NULL,
tutorial(# UTMCampaign TEXT NOT NULL,
tutorial(# UTMContent TEXT NOT NULL,
tutorial(# UTMTerm TEXT NOT NULL,
tutorial(# FromTag TEXT NOT NULL,
tutorial(# HasGCLID SMALLINT NOT NULL,
tutorial(# RefererHash BIGINT NOT NULL,
tutorial(# URLHash BIGINT NOT NULL,
tutorial(# CLID INTEGER NOT NULL
tutorial(# );
CREATE TABLE
tutorial=# SELECT create_hypertable('hits_100m_obfuscated', 'eventtime');
create_hypertable
-----------------------------------
(2,public,hits_100m_obfuscated,t)
(1 row)
tutorial=#
Now ok.
Loading Data
Next - importing data: https://docs.timescale.com/timescaledb/latest/how-to-guides/migrate-data/import-csv/#csv-import
First, export the dataset from ClickHouse to CSV:
SELECT WatchID::Int64, JavaEnable, toValidUTF8(Title), GoodEvent, EventTime, EventDate, CounterID::Int32, ClientIP::Int32, RegionID::Int32, UserID::Int64, CounterClass, OS, UserAgent, toValidUTF8(URL), toValidUTF8(Referer), Refresh, RefererCategoryID::Int16, RefererRegionID::Int32, URLCategoryID::Int16, URLRegionID::Int32, ResolutionWidth::Int16, ResolutionHeight::Int16, ResolutionDepth, FlashMajor, FlashMinor, FlashMinor2, NetMajor, NetMinor, UserAgentMajor::Int16, UserAgentMinor, CookieEnable, JavascriptEnable, IsMobile, MobilePhone, toValidUTF8(MobilePhoneModel), toValidUTF8(Params), IPNetworkID::Int32, TraficSourceID, SearchEngineID::Int16, toValidUTF8(SearchPhrase), AdvEngineID, IsArtifical, WindowClientWidth::Int16, WindowClientHeight::Int16, ClientTimeZone, ClientEventTime, SilverlightVersion1, SilverlightVersion2, SilverlightVersion3::Int32, SilverlightVersion4::Int16, toValidUTF8(PageCharset), CodeVersion::Int32, IsLink, IsDownload, IsNotBounce, FUniqID::Int64, toValidUTF8(OriginalURL), HID::Int32, IsOldCounter, IsEvent, IsParameter, DontCountHits, WithHash, HitColor, LocalEventTime, Age, Sex, Income, Interests::Int16, Robotness, RemoteIP::Int32, WindowName, OpenerName, HistoryLength, BrowserLanguage, BrowserCountry, toValidUTF8(SocialNetwork), toValidUTF8(SocialAction), HTTPError, SendTiming, DNSTiming, ConnectTiming, ResponseStartTiming, ResponseEndTiming, FetchTiming, SocialSourceNetworkID, toValidUTF8(SocialSourcePage), ParamPrice, toValidUTF8(ParamOrderID), ParamCurrency, ParamCurrencyID::Int16, OpenstatServiceName, OpenstatCampaignID, OpenstatAdID, OpenstatSourceID, UTMSource, UTMMedium, UTMCampaign, UTMContent, UTMTerm, FromTag, HasGCLID, RefererHash::Int64, URLHash::Int64, CLID::Int32
FROM hits_100m_obfuscated
INTO OUTFILE 'dump.csv'
FORMAT CSV
https://github.com/ClickHouse/ClickHouse/issues/30872 https://github.com/ClickHouse/ClickHouse/issues/30873
$ wc -c dump.csv
80865718769 dump.csv
milovidov@mtlog-perftest03j:~$ timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.csv --workers 16 --copy-options "CSV"
panic: could not connect: pq: password authentication failed for user "postgres"
goroutine 12 [running]:
main.processBatches(0xc00001e3c0, 0xc0000a66c0)
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:238 +0x887
created by main.main
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
milovidov@mtlog-perftest03j:~$ sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.csv --workers 16 --copy-options "CSV"
panic: could not connect: pq: password authentication failed for user "postgres"
goroutine 25 [running]:
main.processBatches(0xc00019a350, 0xc00019e660)
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:238 +0x887
created by main.main
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
milovidov@mtlog-perftest03j:~$ sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.csv --workers 16 --copy-options "CSV" --host localhost
flag provided but not defined: -host
Usage of timescaledb-parallel-copy:
-batch-size int
Number of rows per insert (default 5000)
-columns string
Comma-separated columns present in CSV
-connection string
PostgreSQL connection url (default "host=localhost user=postgres sslmode=disable")
-copy-options string
Additional options to pass to COPY (e.g., NULL 'NULL') (default "CSV")
-db-name string
Database where the destination table exists
-file string
File to read from rather than stdin
-header-line-count int
Number of header lines (default 1)
-limit int
Number of rows to insert overall; 0 means to insert all
-log-batches
Whether to time individual batches.
-reporting-period duration
Period to report insert stats; if 0s, intermediate results will not be reported
-schema string
Destination table's schema (default "public")
-skip-header
Skip the first line of the input
-split string
Character to split by (default ",")
-table string
Destination table for insertions (default "test_table")
-token-size int
Maximum size to use for tokens. By default, this is 64KB, so any value less than that will be ignored (default 65536)
-truncate
Truncate the destination table before insert
-verbose
Print more information about copying statistics
-version
Show the version of this tool
-workers int
Number of parallel requests to make (default 1)
milovidov@mtlog-perftest03j:~$ sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.csv --workers 16 --copy-options "CSV" -connection 'host=localhost'
panic: could not connect: pq: password authentication failed for user "postgres"
goroutine 14 [running]:
main.processBatches(0xc0000183d0, 0xc0000a66c0)
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:238 +0x887
created by main.main
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
panic: could not connect: pq: password authentication failed for user "postgres"
goroutine 13 [running]:
main.processBatches(0xc0000183d0, 0xc0000a66c0)
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:238 +0x887
created by main.main
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
panic: could not connect: pq: password authentication failed for user "postgres"
goroutine 12 [running]:
main.processBatches(0xc0000183d0, 0xc0000a66c0)
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:238 +0x887
created by main.main
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
milovidov@mtlog-perftest03j:~$ sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.csv --workers 16 --copy-options "CSV" -connection 'host=localhost password 12345'
panic: could not connect: cannot parse `host=localhost password 12345`: failed to parse as DSN (invalid dsn)
goroutine 13 [running]:
main.processBatches(0xc0000183d0, 0xc0000a66c0)
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:238 +0x887
created by main.main
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
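The parse error above comes from `password 12345`: in a keyword/value connection string every setting has the form key=value, and the settings are separated by spaces. In the libpq flavor of this syntax, values with spaces can be wrapped in single quotes, with backslashes and single quotes escaped by a backslash. A minimal sketch of a builder for such strings follows; make_dsn is a hypothetical helper, not part of timescaledb-parallel-copy, and whether the tool's Go driver accepts every quoted form is not verified here:

```shell
# Build a libpq-style keyword/value connection string from KEY=VALUE arguments.
# Every value is wrapped in single quotes; backslashes and single quotes
# inside the value are escaped with a backslash.
make_dsn() {
    out=""
    for pair in "$@"; do
        key=${pair%%=*}
        value=$(printf '%s' "${pair#*=}" | sed -e 's/\\/\\\\/g' -e "s/'/\\\\'/g")
        out="$out${out:+ }$key='$value'"
    done
    printf '%s\n' "$out"
}

make_dsn host=localhost user=postgres password=12345
# prints: host='localhost' user='postgres' password='12345'
```

Keeping the password out of the command line entirely (for example in the shell history) is a separate concern that this sketch does not address.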
milovidov@mtlog-perftest03j:~$ sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.csv --workers 16 --copy-options "CSV" -connection 'host=localhost password=12345'
panic: pq: invalid byte sequence for encoding "UTF8": 0xe0 0x22 0x2c
goroutine 34 [running]:
main.processBatches(0xc000132350, 0xc000136660)
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
created by main.main
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
panic: pq: invalid byte sequence for encoding "UTF8": 0xe0 0x22 0x2c
goroutine 30 [running]:
main.processBatches(0xc000132350, 0xc000136660)
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
created by main.main
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
OK, now I've got something meaningful. But it does not show which line has the error...
$ echo -e '\xe0\x22\x2c'
<0A>",
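Since the loader does not report the offending line, one way to narrow it down is to validate the dump outside the database. The sketch below relies on iconv exiting with a non-zero status on the first illegal byte sequence; is_valid_utf8 and check_chunks are hypothetical helpers, and the chunk size and the "chunk_" prefix are arbitrary choices. Note that check_chunks writes a second copy of the file to a temporary directory, which matters for an 80 GB dump:

```shell
# Succeed only if the file is valid UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Split the file into pieces of N lines (default 1000000) and report
# which pieces contain invalid UTF-8.
check_chunks() {
    dir=$(mktemp -d)
    split -l "${2:-1000000}" "$1" "$dir/chunk_"
    for piece in "$dir"/chunk_*; do
        is_valid_utf8 "$piece" || echo "invalid UTF-8 in ${piece##*/}"
    done
    rm -rf "$dir"
}
```

With the default chunk size, a report for chunk_aa points at the first million lines, chunk_ab at the second million, and so on; rerunning on the bad piece with a smaller chunk size narrows it down further.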
Let's recreate the dump:
rm dump.csv
SELECT WatchID::Int64, JavaEnable, toValidUTF8(Title), GoodEvent, EventTime, EventDate, CounterID::Int32, ClientIP::Int32, RegionID::Int32,
UserID::Int64, CounterClass, OS, UserAgent, toValidUTF8(URL), toValidUTF8(Referer), Refresh, RefererCategoryID::Int16, RefererRegionID::Int32,
URLCategoryID::Int16, URLRegionID::Int32, ResolutionWidth::Int16, ResolutionHeight::Int16, ResolutionDepth, FlashMajor, FlashMinor,
FlashMinor2, NetMajor, NetMinor, UserAgentMajor::Int16, toValidUTF8(UserAgentMinor::String), CookieEnable, JavascriptEnable, IsMobile, MobilePhone,
toValidUTF8(MobilePhoneModel), toValidUTF8(Params), IPNetworkID::Int32, TraficSourceID, SearchEngineID::Int16, toValidUTF8(SearchPhrase),
AdvEngineID, IsArtifical, WindowClientWidth::Int16, WindowClientHeight::Int16, ClientTimeZone, ClientEventTime,
SilverlightVersion1, SilverlightVersion2, SilverlightVersion3::Int32, SilverlightVersion4::Int16, toValidUTF8(PageCharset),
CodeVersion::Int32, IsLink, IsDownload, IsNotBounce, FUniqID::Int64, toValidUTF8(OriginalURL), HID::Int32, IsOldCounter, IsEvent,
IsParameter, DontCountHits, WithHash, toValidUTF8(HitColor::String), LocalEventTime, Age, Sex, Income, Interests::Int16, Robotness, RemoteIP::Int32,
WindowName, OpenerName, HistoryLength, toValidUTF8(BrowserLanguage::String), toValidUTF8(BrowserCountry::String),
toValidUTF8(SocialNetwork), toValidUTF8(SocialAction),
HTTPError, SendTiming, DNSTiming, ConnectTiming, ResponseStartTiming, ResponseEndTiming, FetchTiming, SocialSourceNetworkID,
toValidUTF8(SocialSourcePage), ParamPrice, toValidUTF8(ParamOrderID), toValidUTF8(ParamCurrency::String),
ParamCurrencyID::Int16, OpenstatServiceName, OpenstatCampaignID, OpenstatAdID, OpenstatSourceID,
UTMSource, UTMMedium, UTMCampaign, UTMContent, UTMTerm, FromTag, HasGCLID, RefererHash::Int64, URLHash::Int64, CLID::Int32
FROM hits_100m_obfuscated
INTO OUTFILE 'dump.csv'
FORMAT CSV
$ sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.csv --workers 1 --copy-options "CSV" -connection 'host=localhost password=12345'
panic: pq: value too long for type character(2)
goroutine 6 [running]:
main.processBatches(0xc0000183d0, 0xc0000a66c0)
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
created by main.main
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
ALTER does not work:
tutorial=# ALTER TABLE hits_100m_obfuscated MODIFY COLUMN UserAgentMinor TEXT
tutorial-# ;
ERROR: syntax error at or near "MODIFY"
LINE 1: ALTER TABLE hits_100m_obfuscated MODIFY COLUMN UserAgentMino...
^
PostgreSQL uses an unusual syntax for ALTER:
tutorial=# ALTER TABLE hits_100m_obfuscated ALTER COLUMN UserAgentMinor TYPE TEXT
;
ALTER TABLE
tutorial=# \q
https://github.com/ClickHouse/ClickHouse/issues/30874
Now something else goes wrong:
$ sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.csv --workers 1 --copy-options "CSV" -connection 'host=localhost password=12345'
panic: pq: value "2149615427" is out of range for type integer
goroutine 6 [running]:
main.processBatches(0xc0000183d0, 0xc0000a66c0)
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
created by main.main
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
$ grep -F '2149615427' dump.csv
5607505572457935073,0,"Лазар автоматические пылесосы подробная школы. Когалерея — Курсы на Автория пище Сноудента новые устами",1,"2013-07-15 07:47:45","2013-07-15",38,-1194330980,229,-6649844357037090659,0,2,3,"https://produkty%2Fkategory_id=&auto-nexus.html?blockfesty-i-korroszhego","http://tambov.irr.ua/yandex.ru/saledParam=0&user/auto.ria",1,10282,995,15014,519,1996,1781,23,14,2,"800",0,0,7,"D<>",1,1,0,0,"","",3392210,-1,0,"",0,0,1261,1007,135,"2013-07-15 21:54:13",0,0,0,0,"windows-1251;charset",1601,0,0,0,8184671896482443026,"",451733382,0,0,0,0,0,"5","2013-07-15 15:41:14",31,1,3,60,13,-1855237933,-1,-1,-1,"S0","h1","","",0,0,0,0,2149615427,36,3,0,"",0,"","NH",0,"","","","","","","","","","",0,-1103774879459415602,-2414747266057209563,0
^C
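PostgreSQL's integer type is a signed 32-bit value, so its maximum is 2147483647 and the value in the error message is just above it. A small Python sketch of the range check, plus the kind of capping that `least(x, 30000)` performs in the next dump query; `fits_int32` and `clamp` are hypothetical helper names:

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def fits_int32(value: int) -> bool:
    """True if value can be stored in a signed 32-bit integer column."""
    return INT32_MIN <= value <= INT32_MAX


def clamp(value: int, upper: int) -> int:
    """Cap value at upper, like least(value, upper) in SQL."""
    return min(value, upper)


print(fits_int32(2149615427))    # the value from the error message
print(clamp(2149615427, 30000))  # what the capped column would contain
```

Capping is lossy: every value above the cap becomes indistinguishable from the cap itself, which is acceptable here only because the goal is to get the data loaded.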
Let's recreate the dump:
rm dump.csv
SELECT WatchID::Int64, JavaEnable, toValidUTF8(Title), GoodEvent, EventTime, EventDate, CounterID::Int32, ClientIP::Int32, RegionID::Int32,
UserID::Int64, CounterClass, OS, UserAgent, toValidUTF8(URL), toValidUTF8(Referer), Refresh, RefererCategoryID::Int16, RefererRegionID::Int32,
URLCategoryID::Int16, URLRegionID::Int32, ResolutionWidth::Int16, ResolutionHeight::Int16, ResolutionDepth, FlashMajor, FlashMinor,
FlashMinor2, NetMajor, NetMinor, UserAgentMajor::Int16, toValidUTF8(UserAgentMinor::String), CookieEnable, JavascriptEnable, IsMobile, MobilePhone,
toValidUTF8(MobilePhoneModel), toValidUTF8(Params), IPNetworkID::Int32, TraficSourceID, SearchEngineID::Int16, toValidUTF8(SearchPhrase),
AdvEngineID, IsArtifical, WindowClientWidth::Int16, WindowClientHeight::Int16, ClientTimeZone, ClientEventTime,
SilverlightVersion1, SilverlightVersion2, SilverlightVersion3::Int32, SilverlightVersion4::Int16, toValidUTF8(PageCharset),
CodeVersion::Int32, IsLink, IsDownload, IsNotBounce, FUniqID::Int64, toValidUTF8(OriginalURL), HID::Int32, IsOldCounter, IsEvent,
IsParameter, DontCountHits, WithHash, toValidUTF8(HitColor::String), LocalEventTime, Age, Sex, Income, Interests::Int16, Robotness, RemoteIP::Int32,
WindowName, OpenerName, HistoryLength, toValidUTF8(BrowserLanguage::String), toValidUTF8(BrowserCountry::String),
toValidUTF8(SocialNetwork), toValidUTF8(SocialAction),
HTTPError, least(SendTiming, 30000), least(DNSTiming, 30000), least(ConnectTiming, 30000), least(ResponseStartTiming, 30000),
least(ResponseEndTiming, 30000), least(FetchTiming, 30000), SocialSourceNetworkID,
toValidUTF8(SocialSourcePage), ParamPrice, toValidUTF8(ParamOrderID), toValidUTF8(ParamCurrency::String),
ParamCurrencyID::Int16, OpenstatServiceName, OpenstatCampaignID, OpenstatAdID, OpenstatSourceID,
UTMSource, UTMMedium, UTMCampaign, UTMContent, UTMTerm, FromTag, HasGCLID, RefererHash::Int64, URLHash::Int64, CLID::Int32
FROM hits_100m_obfuscated
INTO OUTFILE 'dump.csv'
FORMAT CSV
PostgreSQL does not support USE database. But I remember that I can write \c instead. I guess \c means "change" (the database). Or it is called "schema" or "catalog".
milovidov@mtlog-perftest03j:~$ sudo -u postgres psql
psql (13.4 (Ubuntu 13.4-4.pgdg18.04+1))
Type "help" for help.
postgres=# SELECT count(*) FROM hits_100m_obfuscated;
ERROR: relation "hits_100m_obfuscated" does not exist
LINE 1: SELECT count(*) FROM hits_100m_obfuscated;
^
postgres=# USE tutorial;
ERROR: syntax error at or near "USE"
LINE 1: USE tutorial;
^
postgres=# \c tutorial
You are now connected to database "tutorial" as user "postgres".
tutorial=# SELECT count(*) FROM hits_100m_obfuscated;
count
-------
69996
(1 row)
And the parallel loader has already loaded part of the data into my table (it is not transactional). Let's truncate the table:
tutorial=# TRUNCATE TABLE hits_100m_obfuscated;
TRUNCATE TABLE
Surprisingly, it works!
Now it started loading data:
$ time sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.csv --workers 16 --copy-options "CSV" -connection 'host=localhost password=12345'
But the loading does not use 16 CPU cores, and it is not bottlenecked by IO either.
WTF:
$ time sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.csv --workers 16 --copy-options "CSV" -connection 'host=localhost password=12345'
panic: pq: could not extend file "base/16384/31264.1": wrote only 4096 of 8192 bytes at block 145407
goroutine 6 [running]:
main.processBatches(0xc0000183d0, 0xc0000a66c0)
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
created by main.main
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
real 3m31.328s
user 0m35.016s
sys 0m6.964s
Looks like there is no space:
milovidov@mtlog-perftest03j:~$ df -h /var/lib/postgresql/13/main
Filesystem Size Used Avail Use% Mounted on
/dev/md1 35G 33G 1.4G 97% /
https://github.com/ClickHouse/ClickHouse/issues/30883
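A failure like this can be caught before starting a multi-minute load by checking free space first. A minimal Python sketch using the standard library; `has_room` is a hypothetical helper, and the required size is an assumption the caller has to supply:

```python
import shutil


def has_room(path: str, needed_bytes: int) -> bool:
    """True if the filesystem holding path has at least needed_bytes free."""
    return shutil.disk_usage(path).free >= needed_bytes


# Assumed figure for illustration only: require 100 GiB on the root filesystem.
print(has_room("/", 100 * 2**30))
```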
Let's move to another device.
milovidov@mtlog-perftest03j:~$ sudo mkdir /opt/postgresql
milovidov@mtlog-perftest03j:~$ sudo ls -l /var/lib/postgresql/13/main
total 88
drwx------ 6 postgres postgres 4096 Oct 30 00:06 base
drwx------ 2 postgres postgres 4096 Oct 30 02:07 global
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_commit_ts
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_dynshmem
drwx------ 4 postgres postgres 4096 Oct 30 02:10 pg_logical
drwx------ 4 postgres postgres 4096 Oct 29 23:27 pg_multixact
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_notify
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_replslot
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_serial
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_snapshots
drwx------ 2 postgres postgres 4096 Oct 30 02:10 pg_stat
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_stat_tmp
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_subtrans
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_tblspc
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_twophase
-rw------- 1 postgres postgres 3 Oct 29 23:27 PG_VERSION
drwx------ 3 postgres postgres 12288 Oct 30 02:10 pg_wal
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_xact
-rw------- 1 postgres postgres 88 Oct 29 23:27 postgresql.auto.conf
-rw------- 1 postgres postgres 130 Oct 30 00:03 postmaster.opts
milovidov@mtlog-perftest03j:~$ sudo chown postgres:postgres /opt/postgresql
milovidov@mtlog-perftest03j:~$ sudo mv /var/lib/postgresql/13/main/* /opt/postgresql
mv: cannot stat '/var/lib/postgresql/13/main/*': No such file or directory
milovidov@mtlog-perftest03j:~$ sudo bash -c 'mv /var/lib/postgresql/13/main/* /opt/postgresql'
milovidov@mtlog-perftest03j:~$ #sudo ln -s /opt/postgresql /var/lib/postgresql/13/main
milovidov@mtlog-perftest03j:~$ sudo rm /var/lib/postgresql/13/main
rm: cannot remove '/var/lib/postgresql/13/main': Is a directory
milovidov@mtlog-perftest03j:~$ sudo rm -rf /var/lib/postgresql/13/main
milovidov@mtlog-perftest03j:~$ sudo ln -s /opt/postgresql /var/lib/postgresql/13/main
milovidov@mtlog-perftest03j:~$ sudo ls -l /var/lib/postgresql/13/main
lrwxrwxrwx 1 root root 15 Oct 30 02:12 /var/lib/postgresql/13/main -> /opt/postgresql
milovidov@mtlog-perftest03j:~$ sudo ls -l /opt/postgresql/
total 80
drwx------ 6 postgres postgres 4096 Oct 30 00:06 base
drwx------ 2 postgres postgres 4096 Oct 30 02:07 global
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_commit_ts
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_dynshmem
drwx------ 4 postgres postgres 4096 Oct 30 02:10 pg_logical
drwx------ 4 postgres postgres 4096 Oct 29 23:27 pg_multixact
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_notify
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_replslot
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_serial
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_snapshots
drwx------ 2 postgres postgres 4096 Oct 30 02:10 pg_stat
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_stat_tmp
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_subtrans
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_tblspc
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_twophase
-rw------- 1 postgres postgres 3 Oct 29 23:27 PG_VERSION
drwx------ 3 postgres postgres 4096 Oct 30 02:10 pg_wal
drwx------ 2 postgres postgres 4096 Oct 29 23:27 pg_xact
-rw------- 1 postgres postgres 88 Oct 29 23:27 postgresql.auto.conf
-rw------- 1 postgres postgres 130 Oct 30 00:03 postmaster.opts
sudo service postgresql start
sudo less /var/log/postgresql/postgresql-13-main.log
2021-10-30 02:13:41.284 MSK [791362] FATAL: data directory "/var/lib/postgresql/13/main" has invalid permissions
2021-10-30 02:13:41.284 MSK [791362] DETAIL: Permissions should be u=rwx (0700) or u=rwx,g=rx (0750).
pg_ctl: could not start server
Examine the log output.
sudo chmod 0700 /var/lib/postgresql/13/main /opt/postgresql
sudo service postgresql start
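The permission rule from the log message (0700 or 0750) can be checked up front. A minimal Python sketch; `data_dir_mode_ok` is a hypothetical helper, and the temporary directory merely stands in for the real data directory:

```python
import os
import tempfile

ALLOWED_MODES = (0o700, 0o750)  # the two modes named in the log message


def data_dir_mode_ok(path: str) -> bool:
    """True if the directory's rwx permission bits are 0700 or 0750."""
    # os.stat follows symlinks, so a symlinked data directory is checked
    # by the mode of its target.
    mode = os.stat(path).st_mode & 0o777
    return mode in ALLOWED_MODES


demo = tempfile.mkdtemp()        # created readable only by the current user
print(data_dir_mode_ok(demo))
os.chmod(demo, 0o755)            # world-readable: not in the allowed list
print(data_dir_mode_ok(demo))
```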
postgres=# \c tutorial
You are now connected to database "tutorial" as user "postgres".
tutorial=# TRUNCATE TABLE hits_100m_obfuscated;
TRUNCATE TABLE
$ time sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.csv --workers 16 --copy-options "CSV" -connection 'host=localhost password=12345'
No success:
$ time sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.csv --workers 16 --copy-options "CSV" -connection 'host=localhost password=12345'
panic: pq: invalid byte sequence for encoding "UTF8": 0x00
goroutine 29 [running]:
main.processBatches(0xc000132350, 0xc000136660)
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
created by main.main
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
real 11m47.879s
user 3m10.980s
sys 0m45.256s
The error message is misleading, because 0x00 is a valid byte in UTF-8. PostgreSQL simply does not allow NUL bytes in text values.
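Both halves of this can be checked in a few lines of Python: the byte 0x00 decodes as UTF-8 without complaint, and it can be stripped before loading. `strip_nul` is a hypothetical helper that mirrors the `replaceAll(..., '\0', '')` calls in the dump query:

```python
def strip_nul(text: str) -> str:
    """Remove NUL characters from a string value."""
    return text.replace("\x00", "")


raw = b"abc\x00def"
decoded = raw.decode("utf-8")      # succeeds: 0x00 encodes U+0000 in UTF-8
print(repr(decoded))
print(repr(strip_nul(decoded)))
```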
Let's recreate the dump:
rm dump.csv
SELECT WatchID::Int64, JavaEnable, replaceAll(toValidUTF8(Title), '\0', ''), GoodEvent, EventTime, EventDate, CounterID::Int32, ClientIP::Int32, RegionID::Int32,
UserID::Int64, CounterClass, OS, UserAgent, replaceAll(toValidUTF8(URL), '\0', ''), replaceAll(toValidUTF8(Referer), '\0', ''), Refresh, RefererCategoryID::Int16, RefererRegionID::Int32,
URLCategoryID::Int16, URLRegionID::Int32, ResolutionWidth::Int16, ResolutionHeight::Int16, ResolutionDepth, FlashMajor, FlashMinor,
FlashMinor2, NetMajor, NetMinor, UserAgentMajor::Int16, replaceAll(toValidUTF8(UserAgentMinor::String), '\0', ''), CookieEnable, JavascriptEnable, IsMobile, MobilePhone,
replaceAll(toValidUTF8(MobilePhoneModel), '\0', ''), replaceAll(toValidUTF8(Params), '\0', ''), IPNetworkID::Int32, TraficSourceID, SearchEngineID::Int16, replaceAll(toValidUTF8(SearchPhrase), '\0', ''),
AdvEngineID, IsArtifical, WindowClientWidth::Int16, WindowClientHeight::Int16, ClientTimeZone, ClientEventTime,
SilverlightVersion1, SilverlightVersion2, SilverlightVersion3::Int32, SilverlightVersion4::Int16, replaceAll(toValidUTF8(PageCharset), '\0', ''),
CodeVersion::Int32, IsLink, IsDownload, IsNotBounce, FUniqID::Int64, replaceAll(toValidUTF8(OriginalURL), '\0', ''), HID::Int32, IsOldCounter, IsEvent,
IsParameter, DontCountHits, WithHash, replaceAll(toValidUTF8(HitColor::String), '\0', ''), LocalEventTime, Age, Sex, Income, Interests::Int16, Robotness, RemoteIP::Int32,
WindowName, OpenerName, HistoryLength, replaceAll(toValidUTF8(BrowserLanguage::String), '\0', ''), replaceAll(toValidUTF8(BrowserCountry::String), '\0', ''),
replaceAll(toValidUTF8(SocialNetwork), '\0', ''), replaceAll(toValidUTF8(SocialAction), '\0', ''),
HTTPError, least(SendTiming, 30000), least(DNSTiming, 30000), least(ConnectTiming, 30000), least(ResponseStartTiming, 30000),
least(ResponseEndTiming, 30000), least(FetchTiming, 30000), SocialSourceNetworkID,
replaceAll(toValidUTF8(SocialSourcePage), '\0', ''), ParamPrice, replaceAll(toValidUTF8(ParamOrderID), '\0', ''), replaceAll(toValidUTF8(ParamCurrency::String), '\0', ''),
ParamCurrencyID::Int16, OpenstatServiceName, OpenstatCampaignID, OpenstatAdID, OpenstatSourceID,
UTMSource, UTMMedium, UTMCampaign, UTMContent, UTMTerm, FromTag, HasGCLID, RefererHash::Int64, URLHash::Int64, CLID::Int32
FROM hits_100m_obfuscated
INTO OUTFILE 'dump.csv'
FORMAT CSV
WTF:
tutorial=# SELECT count(*) FROM hits_100m_obfuscated;
ERROR: could not load library "/usr/lib/postgresql/13/lib/llvmjit.so": libLLVM-6.0.so.1: cannot open shared object file: No such file or directory
Maybe just install LLVM?
sudo apt install llvm
It does not help:
milovidov@mtlog-perftest03j:~$ sudo -u postgres psql
psql (13.4 (Ubuntu 13.4-4.pgdg18.04+1))
Type "help" for help.
postgres=# \c tutorial
You are now connected to database "tutorial" as user "postgres".
tutorial=# SELECT count(*) FROM hits_100m_obfuscated;
ERROR: could not load library "/usr/lib/postgresql/13/lib/llvmjit.so": libLLVM-6.0.so.1: cannot open shared object file: No such file or directory
tutorial=#
Dependency on system libraries is harmful.
milovidov@mtlog-perftest03j:~$ ls -l /usr/lib/x86_64-linux-gnu/libLLVM-6.0.so
lrwxrwxrwx 1 root root 16 Apr 6 2018 /usr/lib/x86_64-linux-gnu/libLLVM-6.0.so -> libLLVM-6.0.so.1
milovidov@mtlog-perftest03j:~$ ls -l /usr/lib/x86_64-linux-gnu/libLLVM-6.0.so.1
ls: cannot access '/usr/lib/x86_64-linux-gnu/libLLVM-6.0.so.1': No such file or directory
https://askubuntu.com/questions/481/how-do-i-find-the-package-that-provides-a-file
milovidov@mtlog-perftest03j:~$ dpkg -S libLLVM-6.0.so.1
llvm-6.0-dev: /usr/lib/llvm-6.0/lib/libLLVM-6.0.so.1
libllvm6.0:amd64: /usr/lib/x86_64-linux-gnu/libLLVM-6.0.so.1
Wow, it's absolutely broken:
milovidov@mtlog-perftest03j:~$ sudo apt remove llvm-6.0-dev
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
libcgal13 libgmpxx4ldbl liblldb-11 libprotobuf-c1 libsfcgal1 mysql-server-core-5.7
Use 'sudo apt autoremove' to remove them.
The following packages will be REMOVED:
liblld-6.0-dev lld lld-6.0 llvm-6.0-dev
0 upgraded, 0 newly installed, 4 to remove and 293 not upgraded.
After this operation, 163 MB disk space will be freed.
Do you want to continue? [Y/n]
(Reading database ... 268641 files and directories currently installed.)
Removing liblld-6.0-dev (1:6.0-1ubuntu2) ...
Removing lld (1:6.0-41~exp5~ubuntu1) ...
Removing lld-6.0 (1:6.0-1ubuntu2) ...
Removing llvm-6.0-dev (1:6.0-1ubuntu2) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
Processing triggers for libc-bin (2.27-3ubuntu1.4) ...
milovidov@mtlog-perftest03j:~$ sudo apt install llvm-6.0-dev
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
libcgal13 libgmpxx4ldbl liblldb-11 libprotobuf-c1 libsfcgal1 mysql-server-core-5.7
Use 'sudo apt autoremove' to remove them.
The following NEW packages will be installed:
llvm-6.0-dev
0 upgraded, 1 newly installed, 0 to remove and 293 not upgraded.
Need to get 23.0 MB of archives.
After this operation, 160 MB of additional disk space will be used.
Get:1 http://mirror.yandex.ru/ubuntu bionic/main amd64 llvm-6.0-dev amd64 1:6.0-1ubuntu2 [23.0 MB]
Fetched 23.0 MB in 1s (42.5 MB/s)
Selecting previously unselected package llvm-6.0-dev.
(Reading database ... 267150 files and directories currently installed.)
Preparing to unpack .../llvm-6.0-dev_1%3a6.0-1ubuntu2_amd64.deb ...
Unpacking llvm-6.0-dev (1:6.0-1ubuntu2) ...
Setting up llvm-6.0-dev (1:6.0-1ubuntu2) ...
Processing triggers for libc-bin (2.27-3ubuntu1.4) ...
milovidov@mtlog-perftest03j:~$ ls -l /usr/lib/x86_64-linux-gnu/libLLVM-6.0.so
lrwxrwxrwx 1 root root 16 Apr 6 2018 /usr/lib/x86_64-linux-gnu/libLLVM-6.0.so -> libLLVM-6.0.so.1
milovidov@mtlog-perftest03j:~$ ls -l /usr/lib/x86_64-linux-gnu/libLLVM-6.0.so.1
ls: cannot access '/usr/lib/x86_64-linux-gnu/libLLVM-6.0.so.1': No such file or directory
Let's remove just in case:
sudo apt remove llvm-6.0-dev
https://dba.stackexchange.com/questions/264955/handling-performance-problems-with-jit-in-postgres-12
JIT can be disabled by set jit = off;
tutorial=# set jit = off;
SET
tutorial=#
tutorial=# SELECT count(*) FROM hits_100m_obfuscated;
But now this SELECT query started and hung for several minutes without any result.
And I see something strange in top:
792553 postgres 20 0 32.418g 0.031t 0.031t D 2.4 25.3 3:43.84 postgres: 13/main: checkpointer
814659 postgres 20 0 32.432g 0.023t 0.023t D 0.0 18.8 0:14.53 postgres: 13/main: parallel worker for PID 813980
813980 postgres 20 0 32.433g 0.023t 0.023t D 0.0 18.4 0:14.47 postgres: 13/main: postgres tutorial [local] SELECT
814657 postgres 20 0 32.432g 0.016t 0.016t D 0.0 12.6 0:09.83 postgres: 13/main: parallel worker for PID 813980
814658 postgres 20 0 32.432g 0.015t 0.015t D 2.4 12.6 0:09.45 postgres: 13/main: parallel worker for PID 813980
814656 postgres 20 0 32.432g 0.015t 0.015t D 0.0 12.0 0:07.36 postgres: 13/main: parallel worker for PID 813980
792554 postgres 20 0 32.417g 5.394g 5.392g D 0.0 4.3 0:04.78 postgres: 13/main: background writer
The query did not finish in 30 minutes. How can it be so enormously slow?
Loading failed, again:
$ time sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.csv --workers 16 --copy-options "CSV" -connection 'host=localhost password=12345'
panic: pq: extra data after last expected column
goroutine 14 [running]:
main.processBatches(0xc0000183d0, 0xc0000a66c0)
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
created by main.main
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
real 20m57.936s
user 4m14.444s
sys 1m11.412s
Most likely PostgreSQL cannot recognize proper CSV escaping of quotes like "Hello "" world".
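For reference, this is how the quote-doubling convention behaves in a parser that follows it. A minimal sketch with Python's standard `csv` module; it only illustrates the convention and says nothing about how PostgreSQL or timescaledb-parallel-copy read the file:

```python
import csv
import io

line = '1,"Hello "" world",2\n'

# Python's default csv dialect treats a doubled quote inside a quoted field
# as one literal quote character.
row = next(csv.reader(io.StringIO(line)))
print(row)

# Writing the parsed row back produces the same doubled-quote form.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerow(row)
print(out.getvalue() == line)
```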
Let's simply remove all double quotes from String values.
rm dump.csv
SELECT WatchID::Int64, JavaEnable, replaceAll(replaceAll(toValidUTF8(Title), '\0', ''), '"', ''), GoodEvent, EventTime, EventDate, CounterID::Int32, ClientIP::Int32, RegionID::Int32,
UserID::Int64, CounterClass, OS, UserAgent, replaceAll(replaceAll(toValidUTF8(URL), '\0', ''), '"', ''), replaceAll(replaceAll(toValidUTF8(Referer), '\0', ''), '"', ''), Refresh, RefererCategoryID::Int16, RefererRegionID::Int32,
URLCategoryID::Int16, URLRegionID::Int32, ResolutionWidth::Int16, ResolutionHeight::Int16, ResolutionDepth, FlashMajor, FlashMinor,
FlashMinor2, NetMajor, NetMinor, UserAgentMajor::Int16, replaceAll(replaceAll(toValidUTF8(UserAgentMinor::String), '\0', ''), '"', ''), CookieEnable, JavascriptEnable, IsMobile, MobilePhone,
replaceAll(replaceAll(toValidUTF8(MobilePhoneModel), '\0', ''), '"', ''), replaceAll(replaceAll(toValidUTF8(Params), '\0', ''), '"', ''), IPNetworkID::Int32, TraficSourceID, SearchEngineID::Int16, replaceAll(replaceAll(toValidUTF8(SearchPhrase), '\0', ''), '"', ''),
AdvEngineID, IsArtifical, WindowClientWidth::Int16, WindowClientHeight::Int16, ClientTimeZone, ClientEventTime,
SilverlightVersion1, SilverlightVersion2, SilverlightVersion3::Int32, SilverlightVersion4::Int16, replaceAll(replaceAll(toValidUTF8(PageCharset), '\0', ''), '"', ''),
CodeVersion::Int32, IsLink, IsDownload, IsNotBounce, FUniqID::Int64, replaceAll(replaceAll(toValidUTF8(OriginalURL), '\0', ''), '"', ''), HID::Int32, IsOldCounter, IsEvent,
IsParameter, DontCountHits, WithHash, replaceAll(replaceAll(toValidUTF8(HitColor::String), '\0', ''), '"', ''), LocalEventTime, Age, Sex, Income, Interests::Int16, Robotness, RemoteIP::Int32,
WindowName, OpenerName, HistoryLength, replaceAll(replaceAll(toValidUTF8(BrowserLanguage::String), '\0', ''), '"', ''), replaceAll(replaceAll(toValidUTF8(BrowserCountry::String), '\0', ''), '"', ''),
replaceAll(replaceAll(toValidUTF8(SocialNetwork), '\0', ''), '"', ''), replaceAll(replaceAll(toValidUTF8(SocialAction), '\0', ''), '"', ''),
HTTPError, least(SendTiming, 30000), least(DNSTiming, 30000), least(ConnectTiming, 30000), least(ResponseStartTiming, 30000),
least(ResponseEndTiming, 30000), least(FetchTiming, 30000), SocialSourceNetworkID,
replaceAll(replaceAll(toValidUTF8(SocialSourcePage), '\0', ''), '"', ''), ParamPrice, replaceAll(replaceAll(toValidUTF8(ParamOrderID), '\0', ''), '"', ''), replaceAll(replaceAll(toValidUTF8(ParamCurrency::String), '\0', ''), '"', ''),
ParamCurrencyID::Int16, OpenstatServiceName, OpenstatCampaignID, OpenstatAdID, OpenstatSourceID,
UTMSource, UTMMedium, UTMCampaign, UTMContent, UTMTerm, FromTag, HasGCLID, RefererHash::Int64, URLHash::Int64, CLID::Int32
FROM hits_100m_obfuscated
INTO OUTFILE 'dump.csv'
FORMAT CSV
Oops, more trouble:
$ time sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.csv --workers 16 --copy-options "CSV" -connection 'host=localhost password=12345'
panic: pq: unterminated CSV quoted field
goroutine 19 [running]:
main.processBatches(0xc000132350, 0xc000136660)
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
created by main.main
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
real 0m38.278s
user 0m13.544s
sys 0m3.552s
I have a hypothesis: maybe it interprets both backslashes and quotes in CSV? We need to check what exactly CSV is from TimescaleDB's standpoint.
https://www.postgresql.org/docs/9.2/sql-copy.html
Yes, PostgreSQL is using "fake CSV":
This format option is used for importing and exporting the Comma Separated Value (CSV) file format used by many other programs, such as spreadsheets. Instead of the escaping rules used by PostgreSQL's standard text format, it produces and recognizes the common CSV escaping mechanism.
The values in each record are separated by the DELIMITER character. If the value contains the delimiter character, the QUOTE character, the NULL string, a carriage return, or line feed character, then the whole value is prefixed and suffixed by the QUOTE character, and any occurrence within the value of a QUOTE character or the ESCAPE character is preceded by the escape character.
So it looks like CSV, but it apparently also uses C-style backslash escapes inside values. Let's remove both backslashes and quotes from our strings to make PostgreSQL happy.
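The three nested `replaceAll` calls in the query that follows amount to deleting three characters from every string column. A sketch of the same transformation in Python, with `sanitize` as a hypothetical helper name:

```python
def sanitize(text: str) -> str:
    """Drop NUL, double-quote and backslash characters from a string value."""
    # str.translate with None values deletes the mapped characters in one pass
    return text.translate({0: None, ord('"'): None, ord("\\"): None})


print(sanitize('C:\\path "quoted"\x00 end'))
```

This changes the data, so it is a workaround for getting the benchmark loaded rather than a faithful export.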
rm dump.csv
SELECT WatchID::Int64, JavaEnable, replaceAll(replaceAll(replaceAll(toValidUTF8(Title), '\0', ''), '"', ''), '\\', ''), GoodEvent, EventTime, EventDate, CounterID::Int32, ClientIP::Int32, RegionID::Int32,
UserID::Int64, CounterClass, OS, UserAgent, replaceAll(replaceAll(replaceAll(toValidUTF8(URL), '\0', ''), '"', ''), '\\', ''), replaceAll(replaceAll(replaceAll(toValidUTF8(Referer), '\0', ''), '"', ''), '\\', ''), Refresh, RefererCategoryID::Int16, RefererRegionID::Int32,
URLCategoryID::Int16, URLRegionID::Int32, ResolutionWidth::Int16, ResolutionHeight::Int16, ResolutionDepth, FlashMajor, FlashMinor,
FlashMinor2, NetMajor, NetMinor, UserAgentMajor::Int16, replaceAll(replaceAll(replaceAll(toValidUTF8(UserAgentMinor::String), '\0', ''), '"', ''), '\\', ''), CookieEnable, JavascriptEnable, IsMobile, MobilePhone,
replaceAll(replaceAll(replaceAll(toValidUTF8(MobilePhoneModel), '\0', ''), '"', ''), '\\', ''), replaceAll(replaceAll(replaceAll(toValidUTF8(Params), '\0', ''), '"', ''), '\\', ''), IPNetworkID::Int32, TraficSourceID, SearchEngineID::Int16, replaceAll(replaceAll(replaceAll(toValidUTF8(SearchPhrase), '\0', ''), '"', ''), '\\', ''),
AdvEngineID, IsArtifical, WindowClientWidth::Int16, WindowClientHeight::Int16, ClientTimeZone, ClientEventTime,
SilverlightVersion1, SilverlightVersion2, SilverlightVersion3::Int32, SilverlightVersion4::Int16, replaceAll(replaceAll(replaceAll(toValidUTF8(PageCharset), '\0', ''), '"', ''), '\\', ''),
CodeVersion::Int32, IsLink, IsDownload, IsNotBounce, FUniqID::Int64, replaceAll(replaceAll(replaceAll(toValidUTF8(OriginalURL), '\0', ''), '"', ''), '\\', ''), HID::Int32, IsOldCounter, IsEvent,
IsParameter, DontCountHits, WithHash, replaceAll(replaceAll(replaceAll(toValidUTF8(HitColor::String), '\0', ''), '"', ''), '\\', ''), LocalEventTime, Age, Sex, Income, Interests::Int16, Robotness, RemoteIP::Int32,
WindowName, OpenerName, HistoryLength, replaceAll(replaceAll(replaceAll(toValidUTF8(BrowserLanguage::String), '\0', ''), '"', ''), '\\', ''), replaceAll(replaceAll(replaceAll(toValidUTF8(BrowserCountry::String), '\0', ''), '"', ''), '\\', ''),
replaceAll(replaceAll(replaceAll(toValidUTF8(SocialNetwork), '\0', ''), '"', ''), '\\', ''), replaceAll(replaceAll(replaceAll(toValidUTF8(SocialAction), '\0', ''), '"', ''), '\\', ''),
HTTPError, least(SendTiming, 30000), least(DNSTiming, 30000), least(ConnectTiming, 30000), least(ResponseStartTiming, 30000),
least(ResponseEndTiming, 30000), least(FetchTiming, 30000), SocialSourceNetworkID,
replaceAll(replaceAll(replaceAll(toValidUTF8(SocialSourcePage), '\0', ''), '"', ''), '\\', ''), ParamPrice, replaceAll(replaceAll(replaceAll(toValidUTF8(ParamOrderID), '\0', ''), '"', ''), '\\', ''), replaceAll(replaceAll(replaceAll(toValidUTF8(ParamCurrency::String), '\0', ''), '"', ''), '\\', ''),
ParamCurrencyID::Int16, OpenstatServiceName, OpenstatCampaignID, OpenstatAdID, OpenstatSourceID,
UTMSource, UTMMedium, UTMCampaign, UTMContent, UTMTerm, FromTag, HasGCLID, RefererHash::Int64, URLHash::Int64, CLID::Int32
FROM hits_100m_obfuscated
INTO OUTFILE 'dump.csv'
FORMAT CSV
It does not work at all:
$ time sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.csv --workers 16 --copy-options "CSV" -connection 'host=localhost password=12345'
panic: pq: invalid input syntax for type bigint: " ПЕСНЮ ПРЕСТИВАРКЕ ДОЛЖНО ЛИ,1,306,31432,304,22796,1011,879,37,15,5,700.224,2,7,13,D<>,1,1,0,0,",",3039109,-1,0,",0,0,779,292,135,2013-07-31 09:37:12,0,0,0,0,windows,1,0,0,0,6888403766694734958,http%3A//maps&sort_order_Kurzarm_DOB&sr=http%3A%2F%3Fpage=/ok.html?1=1&cid=577&oki=1&op_seo_entry=&op_uid=13225;IC"
goroutine 20 [running]:
main.processBatches(0xc0000183d0, 0xc0000a66c0)
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
created by main.main
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
real 1m47.915s
user 0m33.676s
sys 0m8.028s
Maybe let's switch from CSV to TSV, which PostgreSQL seems to understand better.
SELECT WatchID::Int64, JavaEnable, replaceAll(replaceAll(replaceAll(toValidUTF8(Title), '\0', ''), '"', ''), '\\', ''), GoodEvent, EventTime, EventDate, CounterID::Int32, ClientIP::Int32, RegionID::Int32,
UserID::Int64, CounterClass, OS, UserAgent, replaceAll(replaceAll(replaceAll(toValidUTF8(URL), '\0', ''), '"', ''), '\\', ''), replaceAll(replaceAll(replaceAll(toValidUTF8(Referer), '\0', ''), '"', ''), '\\', ''), Refresh, RefererCategoryID::Int16, RefererRegionID::Int32,
URLCategoryID::Int16, URLRegionID::Int32, ResolutionWidth::Int16, ResolutionHeight::Int16, ResolutionDepth, FlashMajor, FlashMinor,
FlashMinor2, NetMajor, NetMinor, UserAgentMajor::Int16, replaceAll(replaceAll(replaceAll(toValidUTF8(UserAgentMinor::String), '\0', ''), '"', ''), '\\', ''), CookieEnable, JavascriptEnable, IsMobile, MobilePhone,
replaceAll(replaceAll(replaceAll(toValidUTF8(MobilePhoneModel), '\0', ''), '"', ''), '\\', ''), replaceAll(replaceAll(replaceAll(toValidUTF8(Params), '\0', ''), '"', ''), '\\', ''), IPNetworkID::Int32, TraficSourceID, SearchEngineID::Int16, replaceAll(replaceAll(replaceAll(toValidUTF8(SearchPhrase), '\0', ''), '"', ''), '\\', ''),
AdvEngineID, IsArtifical, WindowClientWidth::Int16, WindowClientHeight::Int16, ClientTimeZone, ClientEventTime,
SilverlightVersion1, SilverlightVersion2, SilverlightVersion3::Int32, SilverlightVersion4::Int16, replaceAll(replaceAll(replaceAll(toValidUTF8(PageCharset), '\0', ''), '"', ''), '\\', ''),
CodeVersion::Int32, IsLink, IsDownload, IsNotBounce, FUniqID::Int64, replaceAll(replaceAll(replaceAll(toValidUTF8(OriginalURL), '\0', ''), '"', ''), '\\', ''), HID::Int32, IsOldCounter, IsEvent,
IsParameter, DontCountHits, WithHash, replaceAll(replaceAll(replaceAll(toValidUTF8(HitColor::String), '\0', ''), '"', ''), '\\', ''), LocalEventTime, Age, Sex, Income, Interests::Int16, Robotness, RemoteIP::Int32,
WindowName, OpenerName, HistoryLength, replaceAll(replaceAll(replaceAll(toValidUTF8(BrowserLanguage::String), '\0', ''), '"', ''), '\\', ''), replaceAll(replaceAll(replaceAll(toValidUTF8(BrowserCountry::String), '\0', ''), '"', ''), '\\', ''),
replaceAll(replaceAll(replaceAll(toValidUTF8(SocialNetwork), '\0', ''), '"', ''), '\\', ''), replaceAll(replaceAll(replaceAll(toValidUTF8(SocialAction), '\0', ''), '"', ''), '\\', ''),
HTTPError, least(SendTiming, 30000), least(DNSTiming, 30000), least(ConnectTiming, 30000), least(ResponseStartTiming, 30000),
least(ResponseEndTiming, 30000), least(FetchTiming, 30000), SocialSourceNetworkID,
replaceAll(replaceAll(replaceAll(toValidUTF8(SocialSourcePage), '\0', ''), '"', ''), '\\', ''), ParamPrice, replaceAll(replaceAll(replaceAll(toValidUTF8(ParamOrderID), '\0', ''), '"', ''), '\\', ''), replaceAll(replaceAll(replaceAll(toValidUTF8(ParamCurrency::String), '\0', ''), '"', ''), '\\', ''),
ParamCurrencyID::Int16, OpenstatServiceName, OpenstatCampaignID, OpenstatAdID, OpenstatSourceID,
UTMSource, UTMMedium, UTMCampaign, UTMContent, UTMTerm, FromTag, HasGCLID, RefererHash::Int64, URLHash::Int64, CLID::Int32
FROM hits_100m_obfuscated
INTO OUTFILE 'dump.tsv'
FORMAT TSV
But how to pass TSV to the timescaledb-parallel-copy tool?
milovidov@mtlog-perftest03j:~$ time sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.tsv --workers 16 -connection 'host=localhost password=12345'
panic: pq: invalid input syntax for type bigint: "9076997425961590393\t0\tКино\t1\t2013-07-06 17:47:29\t2013-07-06\t225510\t-1056921538\t229\t3467937489264290637\t0\t2\t3\thttp://liver.ru/belgorod/page/1006.jки/доп_приборы\thttp://video.yandex.ru/1.561.540.000703/?order_Kurzarm_alia\t0\t16124\t20\t14328\t22\t1638\t1658\t23\t15\t7\t700\t0\t0\t17\tD<74>\t1\t1\t0\t0\t\t\t2095433\t-1\t0\t\t0\t1\t1369\t713\t135\t2013-07-06 16:25:42\t0\t0\t0\t0\twindows\t1601\t0\t0\t0\t5566829288329160346\t\t940752990\t0\t0\t0\t0\t0\t5\t2013-07-06 01:32:13\t55\t2\t3\t0\t2\t-1352932082\t-1\t-1\t-1\tS0\t<>\\f\t\t\t0\t0\t0\t0\t0\t0\t0\t0\t\t0\t\tNH\t0\t\t\t\t\t\t\t\t\t\t\t0\t6811023348165660452\t7011450103338277684\t0"
goroutine 20 [running]:
main.processBatches(0xc0000183d0, 0xc0000a66c0)
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
created by main.main
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
real 0m0.304s
user 0m0.044s
sys 0m0.044s
milovidov@mtlog-perftest03j:~$ time sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.tsv --copy-options "TEXT" --workers 16 -connection 'host=localhost password=12345'
panic: pq: syntax error at or near "TEXT"
goroutine 18 [running]:
main.processBatches(0xc0000183d0, 0xc0000a66c0)
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
created by main.main
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
real 0m0.044s
user 0m0.048s
sys 0m0.036s
milovidov@mtlog-perftest03j:~$ time sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.tsv --copy-options "text" --workers 16 -connection 'host=localhost password=12345'
panic: pq: syntax error at or near "text"
goroutine 18 [running]:
main.processBatches(0xc0000183d0, 0xc0000a66c0)
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
created by main.main
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
panic: pq: syntax error at or near "text"
goroutine 19 [running]:
main.processBatches(0xc0000183d0, 0xc0000a66c0)
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
created by main.main
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
real 0m0.057s
user 0m0.060s
sys 0m0.028s
milovidov@mtlog-perftest03j:~$ time sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.tsv --copy-options "Text" --workers 16 -connection 'host=localhost password=12345'
panic: pq: syntax error at or near "Text"
goroutine 11 [running]:
main.processBatches(0xc0000183d0, 0xc0000a66c0)
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
created by main.main
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
real 0m0.041s
user 0m0.052s
sys 0m0.032s
milovidov@mtlog-perftest03j:~$ time sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.tsv --copy-options "FORMAT text" --workers 16 -connection 'host=localhost password=12345'
panic: pq: syntax error at or near "FORMAT"
goroutine 21 [running]:
main.processBatches(0xc00019a350, 0xc00019e660)
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
created by main.main
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
real 0m0.045s
user 0m0.052s
sys 0m0.028s
Nothing works. Let's look at the help:
milovidov@mtlog-perftest03j:~$ time sudo -u postgres timescaledb-parallel-copy --help
Usage of timescaledb-parallel-copy:
-batch-size int
Number of rows per insert (default 5000)
-columns string
Comma-separated columns present in CSV
-connection string
PostgreSQL connection url (default "host=localhost user=postgres sslmode=disable")
-copy-options string
Additional options to pass to COPY (e.g., NULL 'NULL') (default "CSV")
-db-name string
Database where the destination table exists
-file string
File to read from rather than stdin
-header-line-count int
Number of header lines (default 1)
-limit int
Number of rows to insert overall; 0 means to insert all
-log-batches
Whether to time individual batches.
-reporting-period duration
Period to report insert stats; if 0s, intermediate results will not be reported
-schema string
Destination table's schema (default "public")
-skip-header
Skip the first line of the input
-split string
Character to split by (default ",")
-table string
Destination table for insertions (default "test_table")
-token-size int
Maximum size to use for tokens. By default, this is 64KB, so any value less than that will be ignored (default 65536)
-truncate
Truncate the destination table before insert
-verbose
Print more information about copying statistics
-version
Show the version of this tool
-workers int
Number of parallel requests to make (default 1)
real 0m0.009s
user 0m0.004s
sys 0m0.000s
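The first panic shows the whole row being rejected as a single bigint value, which is what one would expect if the row was never split into columns. A minimal sketch of that effect, using a made-up three-column row and awk as a stand-in for field splitting (the sample row and the count_fields helper are illustrative assumptions, not part of timescaledb-parallel-copy):

```shell
# Hypothetical helper: count the fields of each stdin line for a given delimiter.
count_fields() {
    awk -F "$1" '{ print NF }'
}

# A made-up tab-separated row with three columns and no commas.
row=$(printf '9076997425961590393\t0\tKino')

# Splitting on "," (the default listed for -split) leaves a single field.
printf '%s\n' "$row" | count_fields ','    # prints 1

# Splitting on a tab yields the three columns.
printf '%s\n' "$row" | count_fields '\t'   # prints 3
```

The help output lists a -split flag whose default is ","; whether passing a tab there would be enough to load this dump was not tried in this session.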
milovidov@mtlog-perftest03j:~$ time sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.tsv --truncate --copy-options "" --workers 16 -connection 'host=localhost password=12345'
panic: pq: invalid input syntax for type bigint: "9076997425961590393 0 Кино 1 2013-07-06 17:47:29 2013-07-06 225510 -1056921538 229 3467937489264290637 0 2 3http://liver.ru/belgorod/page/1006.jки/доп_приборы http://video.yandex.ru/1.561.540.000703/?order_Kurzarm_alia 0 16124 20 14328 22 1638 1658 23 15 7 700 0017 D<> 1 1 0 0 2095433 -1 0 0 1 1369 713 135 2013-07-06 16:25:42 0 0 0 0 windows 1601 000 5566829288329160346 940752990 0 0 0 0 0 5 2013-07-06 01:32:13 55 2 3 0 2 -1352932082 -1 -1 -1 S0<53>\f 0 0 0 0 0 0 0 0 0 NH 0 06811023348165660452 7011450103338277684 0"
goroutine 13 [running]:
main.processBatches(0xc000019140, 0xc0001eb080)
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
created by main.main
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
real 0m0.191s
user 0m0.036s
sys 0m0.040s
milovidov@mtlog-perftest03j:~$ time sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.tsv --truncate --copy-options "NULL AS '\N'" --workers 16 -connection 'host=localhost password=12345'
panic: pq: invalid input syntax for type bigint: "9076997425961590393 0 Кино 1 2013-07-06 17:47:29 2013-07-06 225510 -1056921538 229 3467937489264290637 0 2 3http://liver.ru/belgorod/page/1006.jки/доп_приборы http://video.yandex.ru/1.561.540.000703/?order_Kurzarm_alia 0 16124 20 14328 22 1638 1658 23 15 7 700 0017 D<> 1 1 0 0 2095433 -1 0 0 1 1369 713 135 2013-07-06 16:25:42 0 0 0 0 windows 1601 000 5566829288329160346 940752990 0 0 0 0 0 5 2013-07-06 01:32:13 55 2 3 0 2 -1352932082 -1 -1 -1 S0<53>\f 0 0 0 0 0 0 0 0 0 NH 0 06811023348165660452 7011450103338277684 0"
goroutine 11 [running]:
main.processBatches(0xc000018900, 0xc0002886c0)
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
created by main.main
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
real 0m0.187s
user 0m0.020s
sys 0m0.048s
milovidov@mtlog-perftest03j:~$ time sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.tsv --truncate --copy-options "DELIMITER AS '\t'" --workers 16 -connection 'host=localhost password=12345'
panic: pq: conflicting or redundant options
goroutine 13 [running]:
main.processBatches(0xc000019140, 0xc0001e9080)
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
created by main.main
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
real 0m0.196s
user 0m0.048s
sys 0m0.020s
milovidov@mtlog-perftest03j:~$ time sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.tsv --truncate --copy-options "TEXT DELIMITER AS '\t'" --workers 16 -connection 'host=localhost password=12345'
panic: pq: syntax error at or near "TEXT"
goroutine 22 [running]:
main.processBatches(0xc000019140, 0xc0001e9080)
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
created by main.main
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
panic: pq: syntax error at or near "TEXT"
goroutine 11 [running]:
main.processBatches(0xc000019140, 0xc0001e9080)
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
created by main.main
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
real 0m0.191s
user 0m0.032s
sys 0m0.036s
milovidov@mtlog-perftest03j:~$ time sudo -u postgres timescaledb-parallel-copy --db-name tutorial --table hits_100m_obfuscated --file dump.tsv --truncate --copy-options "DELIMITER AS e'\t'" --workers 16 -connection 'host=localhost password=12345'
panic: pq: conflicting or redundant options
goroutine 26 [running]:
main.processBatches(0xc0001330d0, 0xc0001e3020)
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:262 +0x879
created by main.main
/home/builder/go/src/github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy/main.go:148 +0x1bb
real 0m0.169s
user 0m0.056s
sys 0m0.016s
I will try to avoid timescaledb-parallel-copy and use psql instead.
milovidov@mtlog-perftest03j:~$ sudo -u postgres psql
psql (13.4 (Ubuntu 13.4-4.pgdg18.04+1))
Type "help" for help.
postgres=# \c tutorial
You are now connected to database "tutorial" as user "postgres".
tutorial=# timing
tutorial-# COPY hits_100m_obfuscated FROM 'dump.tsv'
tutorial-# ;
ERROR: syntax error at or near "timing"
LINE 1: timing
^
tutorial=# \timing
Timing is on.
tutorial=# COPY hits_100m_obfuscated FROM 'dump.tsv';
ERROR: could not open file "dump.tsv" for reading: No such file or directory
HINT: COPY FROM instructs the PostgreSQL server process to read a file. You may want a client-side facility such as psql's \copy.
Time: 4.348 ms
tutorial=# \copy hits_100m_obfuscated FROM 'dump.tsv';
It started to do something... fairly slowly, using less than one CPU core.
Folks from TimescaleDB always recommend enabling compression, which is not on by default. Let's read about it:
https://docs.timescale.com/timescaledb/latest/how-to-guides/compression/
We strongly recommend that you understand how compression works before you start enabling it on your hypertables.
The amount of hackery to overcome PostgreSQL limitations is overwhelming:
When compression is enabled, TimescaleDB converts data stored in many rows into an array. This means that instead of using lots of rows to store the data, it stores the same data in a single row.
In the meantime, the copy finished in "just" about 1.5 hours (19 245 rows/second). This is extremely slow, even for a single core.
tutorial=# \copy hits_100m_obfuscated FROM 'dump.tsv';
COPY 100000000
Time: 5195909.154 ms (01:26:35.909)
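The rows/second figure can be re-derived from the two numbers psql printed. A small sketch of that arithmetic (rows_per_second is a hypothetical helper name; the inputs are copied from the output above):

```shell
# Hypothetical helper: rows per second from a row count and elapsed milliseconds,
# truncated to an integer.
rows_per_second() {
    awk -v rows="$1" -v ms="$2" 'BEGIN { printf "%d\n", rows / (ms / 1000) }'
}

# 100000000 rows in 5195909.154 ms, as reported by COPY and \timing.
rows_per_second 100000000 5195909.154   # prints 19245
```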
Running Benchmark
Let's prepare for the benchmark... What is needed to execute a single query in batch mode?
man psql
sudo -u postgres psql tutorial -t -c '\timing' -c 'SELECT 1' | grep 'Time'
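The awk post-processing used later groups timings in threes, which suggests every query is run three times. benchmark.sh itself is not shown here, so the following is only a guess at the core loop: RUNNER and run_three_times are hypothetical names, and the default RUNNER simply reuses the psql invocation above.

```shell
# Assumption: RUNNER is the command prefix that runs one query and prints psql's "Time:" line.
RUNNER=${RUNNER:-"sudo -u postgres psql tutorial -t -c \\timing -c"}

# Run the given query three times and keep only the timing lines.
run_three_times() {
    query=$1
    for i in 1 2 3; do
        # RUNNER is left unquoted on purpose so it splits into command and arguments.
        $RUNNER "$query" | grep 'Time'
    done
}

# Usage (not run here): run_three_times 'SELECT 1' | tee -a log
```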
Now we are ready to run our benchmark.
PostgreSQL does not have SHOW PROCESSLIST.
It has select * from pg_stat_activity; instead.
https://ma.ttias.be/show-full-processlist-equivalent-of-mysql-for-postgresql/
But it does not show query progress.
The first query, SELECT count(*) FROM hits_100m_obfuscated, just hung. It is reading something from disk...
Let's check the data volume:
$ sudo du -hcs /opt/postgresql/
68G /opt/postgresql/
Looks consistent for uncompressed data.
./benchmark.sh
grep -oP 'Time: \d+' log | grep -oP '\d+' | awk '{ if (n % 3 == 0) { printf("[") }; ++n; printf("%g", $1 / 1000); if (n % 3 == 0) { printf("],\n") } else { printf(", ") } }'
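To see what this pipeline produces, here it is wrapped in a function and fed three made-up timing lines (format_times is a hypothetical name; the sample values are not real measurements). Note that the first grep keeps only the integer part of the milliseconds, so fractions of a millisecond are dropped.

```shell
# Hypothetical wrapper around the pipeline above: reads psql "Time:" lines from stdin
# and prints one bracketed triple of seconds per three lines.
format_times() {
    grep -oP 'Time: \d+' | grep -oP '\d+' |
        awk '{ if (n % 3 == 0) { printf("[") }; ++n; printf("%g", $1 / 1000); if (n % 3 == 0) { printf("],\n") } else { printf(", ") } }'
}

# Three made-up timings for one query.
printf 'Time: 1234.567 ms\nTime: 250.100 ms\nTime: 99.900 ms\n' | format_times
# prints [1.234, 0.25, 0.099],
```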
Now let's enable compression.
ALTER TABLE hits_100m_obfuscated SET (timescaledb.compress);
SELECT add_compression_policy('hits_100m_obfuscated', INTERVAL '0 seconds');
milovidov@mtlog-perftest03j:~ClickHouse/benchmark/timescaledb$ sudo -u postgres psql tutorial
psql (13.4 (Ubuntu 13.4-4.pgdg18.04+1))
Type "help" for help.
tutorial=# ALTER TABLE hits_100m_obfuscated SET (timescaledb.compress);
ALTER TABLE
tutorial=# SELECT add_compression_policy('hits_100m_obfuscated', INTERVAL '0 seconds');
add_compression_policy
------------------------
1000
(1 row)
Ok, in top I see that it started compression using a single CPU core.
300464 postgres 20 0 32.456g 932044 911452 D 48.0 0.7 1:08.11 postgres: 13/main: Compression Policy [1000]
Let's also define a better order for the data:
ALTER TABLE hits_100m_obfuscated
SET (timescaledb.compress,
timescaledb.compress_orderby = 'counterid, userid, event_time');
The query hung. Maybe it's waiting for the previous compression to finish?
After several minutes it answered:
ERROR: cannot change configuration on already compressed chunks
DETAIL: There are compressed chunks that prevent changing the existing compression configuration.
Ok, at least some of the chunks will have the proper order.
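The error message suggests that the ordering has to be configured before any chunk is compressed. A sketch of the same two statements in the opposite order, assuming that is sufficient (this ordering was not tested in this session; compression_sql is a hypothetical helper that only prints the SQL):

```shell
# Emit the statements so that compress_orderby is set before the compression policy exists.
compression_sql() {
    cat <<'SQL'
ALTER TABLE hits_100m_obfuscated
    SET (timescaledb.compress,
         timescaledb.compress_orderby = 'counterid, userid, event_time');
SELECT add_compression_policy('hits_100m_obfuscated', INTERVAL '0 seconds');
SQL
}

# Usage (not run here): compression_sql | sudo -u postgres psql tutorial
```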
After a few hours, it looks like the compression finished.
sudo ncdu /var/lib/postgresql/13/main/
28.9 GiB [##########] /base
Yes, it looks like it's compressed. About two times, which is not much.
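The rough ratio can be computed from the two sizes reported above (68G from du before compression, 28.9 GiB from ncdu after). This is only an estimate, since the two numbers come from different tools and different paths; compression_ratio is a hypothetical helper.

```shell
# Hypothetical helper: ratio of size before to size after, with two decimals.
compression_ratio() {
    awk -v before="$1" -v after="$2" 'BEGIN { printf "%.2f\n", before / after }'
}

compression_ratio 68 28.9   # prints 2.35
```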
Let's rerun the benchmark.
Ok, it's slightly faster.