Introduction refactoring + a bunch of docs fixes (#8010)

* Create SECURITY.md

* [experimental] auto-mark documentation PRs with labels

* revert #6544

* Sync RPM packages instructions to other docs languages

* Move tutorial to documentation with old content (for now)

* refactor installation guide a bit

* add ../en/getting_started/index.md

* Rename ya_metrica_task.md

* Rename ya_metrica_task.md

* Refactor Yandex.Metrica dataset description

* WIP on rewriting tutorial

* tmp commit

* lots of docs fixes

* partially revert c136bee4ce

* try to fix docs build in CI

* try to fix docs build in CI

* few minor improvements

* Quick refactoring of last portion of tutorial (not thoroughly tested though)

* fix link
This commit is contained in:
Ivan Blinkov 2019-12-05 19:36:51 +03:00 committed by GitHub
parent 852e891499
commit 387cbca505
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
223 changed files with 1623 additions and 1476 deletions

View File

@ -1,51 +1,62 @@
# Anonymized Yandex.Metrica Data
Dataset consists of two tables containing anonymized data about hits (`hits_v1`) and visits (`visits_v1`) of Yandex.Metrica. Each of the tables can be downloaded as a compressed `tsv.xz` file or as prepared partitions. In addition to that, an extended version of the `hits` table containing 100 million rows is available as [TSV](https://clickhouse-datasets.s3.yandex.net/hits/tsv/hits_100m_obfuscated_v1.tsv.xz) and as [prepared partitions](https://clickhouse-datasets.s3.yandex.net/hits/partitions/hits_100m_obfuscated_v1.tar.xz).
Dataset consists of two tables containing anonymized data about hits (`hits_v1`) and visits (`visits_v1`) of Yandex.Metrica. You can read more about Yandex.Metrica in [ClickHouse history](../../introduction/history.md) section.
The dataset consists of two tables, either of them can be downloaded as a compressed `tsv.xz` file or as prepared partitions. In addition to that, an extended version of the `hits` table containing 100 million rows is available as TSV at <https://clickhouse-datasets.s3.yandex.net/hits/tsv/hits_100m_obfuscated_v1.tsv.xz> and as prepared partitions at <https://clickhouse-datasets.s3.yandex.net/hits/partitions/hits_100m_obfuscated_v1.tar.xz>.
## Obtaining Tables from Prepared Partitions
**Download and import hits:**
```bash
$ curl -O https://clickhouse-datasets.s3.yandex.net/hits/partitions/hits_v1.tar
$ tar xvf hits_v1.tar -C /var/lib/clickhouse # path to ClickHouse data directory
$ # check permissions on unpacked data, fix if required
$ sudo service clickhouse-server restart
$ clickhouse-client --query "SELECT COUNT(*) FROM datasets.hits_v1"
Download and import hits table:
``` bash
curl -O https://clickhouse-datasets.s3.yandex.net/hits/partitions/hits_v1.tar
tar xvf hits_v1.tar -C /var/lib/clickhouse # path to ClickHouse data directory
# check permissions on unpacked data, fix if required
sudo service clickhouse-server restart
clickhouse-client --query "SELECT COUNT(*) FROM datasets.hits_v1"
```
**Download and import visits:**
```bash
$ curl -O https://clickhouse-datasets.s3.yandex.net/visits/partitions/visits_v1.tar
$ tar xvf visits_v1.tar -C /var/lib/clickhouse # path to ClickHouse data directory
$ # check permissions on unpacked data, fix if required
$ sudo service clickhouse-server restart
$ clickhouse-client --query "SELECT COUNT(*) FROM datasets.visits_v1"
Download and import visits:
``` bash
curl -O https://clickhouse-datasets.s3.yandex.net/visits/partitions/visits_v1.tar
tar xvf visits_v1.tar -C /var/lib/clickhouse # path to ClickHouse data directory
# check permissions on unpacked data, fix if required
sudo service clickhouse-server restart
clickhouse-client --query "SELECT COUNT(*) FROM datasets.visits_v1"
```
## Obtaining Tables from Compressed tsv-file
**Download and import hits from compressed tsv-file**
```bash
$ curl https://clickhouse-datasets.s3.yandex.net/hits/tsv/hits_v1.tsv.xz | unxz --threads=`nproc` > hits_v1.tsv
$ # now create table
$ clickhouse-client --query "CREATE DATABASE IF NOT EXISTS datasets"
$ clickhouse-client --query "CREATE TABLE datasets.hits_v1 ( WatchID UInt64, JavaEnable UInt8, Title String, GoodEvent Int16, EventTime DateTime, EventDate Date, CounterID UInt32, ClientIP UInt32, ClientIP6 FixedString(16), RegionID UInt32, UserID UInt64, CounterClass Int8, OS UInt8, UserAgent UInt8, URL String, Referer String, URLDomain String, RefererDomain String, Refresh UInt8, IsRobot UInt8, RefererCategories Array(UInt16), URLCategories Array(UInt16), URLRegions Array(UInt32), RefererRegions Array(UInt32), ResolutionWidth UInt16, ResolutionHeight UInt16, ResolutionDepth UInt8, FlashMajor UInt8, FlashMinor UInt8, FlashMinor2 String, NetMajor UInt8, NetMinor UInt8, UserAgentMajor UInt16, UserAgentMinor FixedString(2), CookieEnable UInt8, JavascriptEnable UInt8, IsMobile UInt8, MobilePhone UInt8, MobilePhoneModel String, Params String, IPNetworkID UInt32, TraficSourceID Int8, SearchEngineID UInt16, SearchPhrase String, AdvEngineID UInt8, IsArtifical UInt8, WindowClientWidth UInt16, WindowClientHeight UInt16, ClientTimeZone Int16, ClientEventTime DateTime, SilverlightVersion1 UInt8, SilverlightVersion2 UInt8, SilverlightVersion3 UInt32, SilverlightVersion4 UInt16, PageCharset String, CodeVersion UInt32, IsLink UInt8, IsDownload UInt8, IsNotBounce UInt8, FUniqID UInt64, HID UInt32, IsOldCounter UInt8, IsEvent UInt8, IsParameter UInt8, DontCountHits UInt8, WithHash UInt8, HitColor FixedString(1), UTCEventTime DateTime, Age UInt8, Sex UInt8, Income UInt8, Interests UInt16, Robotness UInt8, GeneralInterests Array(UInt16), RemoteIP UInt32, RemoteIP6 FixedString(16), WindowName Int32, OpenerName Int32, HistoryLength Int16, BrowserLanguage FixedString(2), BrowserCountry FixedString(2), SocialNetwork String, SocialAction String, HTTPError UInt16, SendTiming Int32, DNSTiming Int32, ConnectTiming Int32, ResponseStartTiming Int32, ResponseEndTiming Int32, FetchTiming Int32, RedirectTiming Int32, DOMInteractiveTiming Int32, DOMContentLoadedTiming Int32, DOMCompleteTiming Int32, LoadEventStartTiming Int32, LoadEventEndTiming Int32, NSToDOMContentLoadedTiming Int32, FirstPaintTiming Int32, RedirectCount Int8, SocialSourceNetworkID UInt8, SocialSourcePage String, ParamPrice Int64, ParamOrderID String, ParamCurrency FixedString(3), ParamCurrencyID UInt16, GoalsReached Array(UInt32), OpenstatServiceName String, OpenstatCampaignID String, OpenstatAdID String, OpenstatSourceID String, UTMSource String, UTMMedium String, UTMCampaign String, UTMContent String, UTMTerm String, FromTag String, HasGCLID UInt8, RefererHash UInt64, URLHash UInt64, CLID UInt32, YCLID UInt64, ShareService String, ShareURL String, ShareTitle String, ParsedParams Nested(Key1 String, Key2 String, Key3 String, Key4 String, Key5 String, ValueDouble Float64), IslandID FixedString(16), RequestNum UInt32, RequestTry UInt8) ENGINE = MergeTree() PARTITION BY toYYYYMM(EventDate) ORDER BY (CounterID, EventDate, intHash32(UserID)) SAMPLE BY intHash32(UserID) SETTINGS index_granularity = 8192"
$ # import data
$ cat hits_v1.tsv | clickhouse-client --query "INSERT INTO datasets.hits_v1 FORMAT TSV" --max_insert_block_size=100000
$ # optionally you can optimize table
$ clickhouse-client --query "OPTIMIZE TABLE datasets.hits_v1 FINAL"
$ clickhouse-client --query "SELECT COUNT(*) FROM datasets.hits_v1"
## Obtaining Tables from Compressed TSV File
Download and import hits from compressed TSV file:
``` bash
curl https://clickhouse-datasets.s3.yandex.net/hits/tsv/hits_v1.tsv.xz | unxz --threads=`nproc` > hits_v1.tsv
# now create table
clickhouse-client --query "CREATE DATABASE IF NOT EXISTS datasets"
clickhouse-client --query "CREATE TABLE datasets.hits_v1 ( WatchID UInt64, JavaEnable UInt8, Title String, GoodEvent Int16, EventTime DateTime, EventDate Date, CounterID UInt32, ClientIP UInt32, ClientIP6 FixedString(16), RegionID UInt32, UserID UInt64, CounterClass Int8, OS UInt8, UserAgent UInt8, URL String, Referer String, URLDomain String, RefererDomain String, Refresh UInt8, IsRobot UInt8, RefererCategories Array(UInt16), URLCategories Array(UInt16), URLRegions Array(UInt32), RefererRegions Array(UInt32), ResolutionWidth UInt16, ResolutionHeight UInt16, ResolutionDepth UInt8, FlashMajor UInt8, FlashMinor UInt8, FlashMinor2 String, NetMajor UInt8, NetMinor UInt8, UserAgentMajor UInt16, UserAgentMinor FixedString(2), CookieEnable UInt8, JavascriptEnable UInt8, IsMobile UInt8, MobilePhone UInt8, MobilePhoneModel String, Params String, IPNetworkID UInt32, TraficSourceID Int8, SearchEngineID UInt16, SearchPhrase String, AdvEngineID UInt8, IsArtifical UInt8, WindowClientWidth UInt16, WindowClientHeight UInt16, ClientTimeZone Int16, ClientEventTime DateTime, SilverlightVersion1 UInt8, SilverlightVersion2 UInt8, SilverlightVersion3 UInt32, SilverlightVersion4 UInt16, PageCharset String, CodeVersion UInt32, IsLink UInt8, IsDownload UInt8, IsNotBounce UInt8, FUniqID UInt64, HID UInt32, IsOldCounter UInt8, IsEvent UInt8, IsParameter UInt8, DontCountHits UInt8, WithHash UInt8, HitColor FixedString(1), UTCEventTime DateTime, Age UInt8, Sex UInt8, Income UInt8, Interests UInt16, Robotness UInt8, GeneralInterests Array(UInt16), RemoteIP UInt32, RemoteIP6 FixedString(16), WindowName Int32, OpenerName Int32, HistoryLength Int16, BrowserLanguage FixedString(2), BrowserCountry FixedString(2), SocialNetwork String, SocialAction String, HTTPError UInt16, SendTiming Int32, DNSTiming Int32, ConnectTiming Int32, ResponseStartTiming Int32, ResponseEndTiming Int32, FetchTiming Int32, RedirectTiming Int32, DOMInteractiveTiming Int32, DOMContentLoadedTiming Int32, DOMCompleteTiming Int32, LoadEventStartTiming Int32, LoadEventEndTiming Int32, NSToDOMContentLoadedTiming Int32, FirstPaintTiming Int32, RedirectCount Int8, SocialSourceNetworkID UInt8, SocialSourcePage String, ParamPrice Int64, ParamOrderID String, ParamCurrency FixedString(3), ParamCurrencyID UInt16, GoalsReached Array(UInt32), OpenstatServiceName String, OpenstatCampaignID String, OpenstatAdID String, OpenstatSourceID String, UTMSource String, UTMMedium String, UTMCampaign String, UTMContent String, UTMTerm String, FromTag String, HasGCLID UInt8, RefererHash UInt64, URLHash UInt64, CLID UInt32, YCLID UInt64, ShareService String, ShareURL String, ShareTitle String, ParsedParams Nested(Key1 String, Key2 String, Key3 String, Key4 String, Key5 String, ValueDouble Float64), IslandID FixedString(16), RequestNum UInt32, RequestTry UInt8) ENGINE = MergeTree() PARTITION BY toYYYYMM(EventDate) ORDER BY (CounterID, EventDate, intHash32(UserID)) SAMPLE BY intHash32(UserID) SETTINGS index_granularity = 8192"
# import data
cat hits_v1.tsv | clickhouse-client --query "INSERT INTO datasets.hits_v1 FORMAT TSV" --max_insert_block_size=100000
# optionally you can optimize table
clickhouse-client --query "OPTIMIZE TABLE datasets.hits_v1 FINAL"
clickhouse-client --query "SELECT COUNT(*) FROM datasets.hits_v1"
```
**Download and import visits from compressed tsv-file**
```bash
$ curl https://clickhouse-datasets.s3.yandex.net/visits/tsv/visits_v1.tsv.xz | unxz --threads=`nproc` > visits_v1.tsv
$ # now create table
$ clickhouse-client --query "CREATE DATABASE IF NOT EXISTS datasets"
$ clickhouse-client --query "CREATE TABLE datasets.visits_v1 ( CounterID UInt32, StartDate Date, Sign Int8, IsNew UInt8, VisitID UInt64, UserID UInt64, StartTime DateTime, Duration UInt32, UTCStartTime DateTime, PageViews Int32, Hits Int32, IsBounce UInt8, Referer String, StartURL String, RefererDomain String, StartURLDomain String, EndURL String, LinkURL String, IsDownload UInt8, TraficSourceID Int8, SearchEngineID UInt16, SearchPhrase String, AdvEngineID UInt8, PlaceID Int32, RefererCategories Array(UInt16), URLCategories Array(UInt16), URLRegions Array(UInt32), RefererRegions Array(UInt32), IsYandex UInt8, GoalReachesDepth Int32, GoalReachesURL Int32, GoalReachesAny Int32, SocialSourceNetworkID UInt8, SocialSourcePage String, MobilePhoneModel String, ClientEventTime DateTime, RegionID UInt32, ClientIP UInt32, ClientIP6 FixedString(16), RemoteIP UInt32, RemoteIP6 FixedString(16), IPNetworkID UInt32, SilverlightVersion3 UInt32, CodeVersion UInt32, ResolutionWidth UInt16, ResolutionHeight UInt16, UserAgentMajor UInt16, UserAgentMinor UInt16, WindowClientWidth UInt16, WindowClientHeight UInt16, SilverlightVersion2 UInt8, SilverlightVersion4 UInt16, FlashVersion3 UInt16, FlashVersion4 UInt16, ClientTimeZone Int16, OS UInt8, UserAgent UInt8, ResolutionDepth UInt8, FlashMajor UInt8, FlashMinor UInt8, NetMajor UInt8, NetMinor UInt8, MobilePhone UInt8, SilverlightVersion1 UInt8, Age UInt8, Sex UInt8, Income UInt8, JavaEnable UInt8, CookieEnable UInt8, JavascriptEnable UInt8, IsMobile UInt8, BrowserLanguage UInt16, BrowserCountry UInt16, Interests UInt16, Robotness UInt8, GeneralInterests Array(UInt16), Params Array(String), Goals Nested(ID UInt32, Serial UInt32, EventTime DateTime, Price Int64, OrderID String, CurrencyID UInt32), WatchIDs Array(UInt64), ParamSumPrice Int64, ParamCurrency FixedString(3), ParamCurrencyID UInt16, ClickLogID UInt64, ClickEventID Int32, ClickGoodEvent Int32, ClickEventTime DateTime, ClickPriorityID Int32, ClickPhraseID Int32, ClickPageID Int32, ClickPlaceID Int32, ClickTypeID Int32, ClickResourceID Int32, ClickCost UInt32, ClickClientIP UInt32, ClickDomainID UInt32, ClickURL String, ClickAttempt UInt8, ClickOrderID UInt32, ClickBannerID UInt32, ClickMarketCategoryID UInt32, ClickMarketPP UInt32, ClickMarketCategoryName String, ClickMarketPPName String, ClickAWAPSCampaignName String, ClickPageName String, ClickTargetType UInt16, ClickTargetPhraseID UInt64, ClickContextType UInt8, ClickSelectType Int8, ClickOptions String, ClickGroupBannerID Int32, OpenstatServiceName String, OpenstatCampaignID String, OpenstatAdID String, OpenstatSourceID String, UTMSource String, UTMMedium String, UTMCampaign String, UTMContent String, UTMTerm String, FromTag String, HasGCLID UInt8, FirstVisit DateTime, PredLastVisit Date, LastVisit Date, TotalVisits UInt32, TraficSource Nested(ID Int8, SearchEngineID UInt16, AdvEngineID UInt8, PlaceID UInt16, SocialSourceNetworkID UInt8, Domain String, SearchPhrase String, SocialSourcePage String), Attendance FixedString(16), CLID UInt32, YCLID UInt64, NormalizedRefererHash UInt64, SearchPhraseHash UInt64, RefererDomainHash UInt64, NormalizedStartURLHash UInt64, StartURLDomainHash UInt64, NormalizedEndURLHash UInt64, TopLevelDomain UInt64, URLScheme UInt64, OpenstatServiceNameHash UInt64, OpenstatCampaignIDHash UInt64, OpenstatAdIDHash UInt64, OpenstatSourceIDHash UInt64, UTMSourceHash UInt64, UTMMediumHash UInt64, UTMCampaignHash UInt64, UTMContentHash UInt64, UTMTermHash UInt64, FromHash UInt64, WebVisorEnabled UInt8, WebVisorActivity UInt32, ParsedParams Nested(Key1 String, Key2 String, Key3 String, Key4 String, Key5 String, ValueDouble Float64), Market Nested(Type UInt8, GoalID UInt32, OrderID String, OrderPrice Int64, PP UInt32, DirectPlaceID UInt32, DirectOrderID UInt32, DirectBannerID UInt32, GoodID String, GoodName String, GoodQuantity Int32, GoodPrice Int64), IslandID FixedString(16)) ENGINE = CollapsingMergeTree(StartDate, intHash32(UserID), (CounterID, StartDate, intHash32(UserID), VisitID), 8192, Sign)"
$ # import data
$ cat visits_v1.tsv | clickhouse-client --query "INSERT INTO datasets.visits_v1 FORMAT TSV" --max_insert_block_size=100000
$ # optionally you can optimize table
$ clickhouse-client --query "OPTIMIZE TABLE datasets.visits_v1 FINAL"
$ clickhouse-client --query "SELECT COUNT(*) FROM datasets.visits_v1"
Download and import visits from compressed tsv-file:
``` bash
curl https://clickhouse-datasets.s3.yandex.net/visits/tsv/visits_v1.tsv.xz | unxz --threads=`nproc` > visits_v1.tsv
# now create table
clickhouse-client --query "CREATE DATABASE IF NOT EXISTS datasets"
clickhouse-client --query "CREATE TABLE datasets.visits_v1 ( CounterID UInt32, StartDate Date, Sign Int8, IsNew UInt8, VisitID UInt64, UserID UInt64, StartTime DateTime, Duration UInt32, UTCStartTime DateTime, PageViews Int32, Hits Int32, IsBounce UInt8, Referer String, StartURL String, RefererDomain String, StartURLDomain String, EndURL String, LinkURL String, IsDownload UInt8, TraficSourceID Int8, SearchEngineID UInt16, SearchPhrase String, AdvEngineID UInt8, PlaceID Int32, RefererCategories Array(UInt16), URLCategories Array(UInt16), URLRegions Array(UInt32), RefererRegions Array(UInt32), IsYandex UInt8, GoalReachesDepth Int32, GoalReachesURL Int32, GoalReachesAny Int32, SocialSourceNetworkID UInt8, SocialSourcePage String, MobilePhoneModel String, ClientEventTime DateTime, RegionID UInt32, ClientIP UInt32, ClientIP6 FixedString(16), RemoteIP UInt32, RemoteIP6 FixedString(16), IPNetworkID UInt32, SilverlightVersion3 UInt32, CodeVersion UInt32, ResolutionWidth UInt16, ResolutionHeight UInt16, UserAgentMajor UInt16, UserAgentMinor UInt16, WindowClientWidth UInt16, WindowClientHeight UInt16, SilverlightVersion2 UInt8, SilverlightVersion4 UInt16, FlashVersion3 UInt16, FlashVersion4 UInt16, ClientTimeZone Int16, OS UInt8, UserAgent UInt8, ResolutionDepth UInt8, FlashMajor UInt8, FlashMinor UInt8, NetMajor UInt8, NetMinor UInt8, MobilePhone UInt8, SilverlightVersion1 UInt8, Age UInt8, Sex UInt8, Income UInt8, JavaEnable UInt8, CookieEnable UInt8, JavascriptEnable UInt8, IsMobile UInt8, BrowserLanguage UInt16, BrowserCountry UInt16, Interests UInt16, Robotness UInt8, GeneralInterests Array(UInt16), Params Array(String), Goals Nested(ID UInt32, Serial UInt32, EventTime DateTime, Price Int64, OrderID String, CurrencyID UInt32), WatchIDs Array(UInt64), ParamSumPrice Int64, ParamCurrency FixedString(3), ParamCurrencyID UInt16, ClickLogID UInt64, ClickEventID Int32, ClickGoodEvent Int32, ClickEventTime DateTime, ClickPriorityID Int32, ClickPhraseID Int32, ClickPageID Int32, ClickPlaceID Int32, ClickTypeID Int32, ClickResourceID Int32, ClickCost UInt32, ClickClientIP UInt32, ClickDomainID UInt32, ClickURL String, ClickAttempt UInt8, ClickOrderID UInt32, ClickBannerID UInt32, ClickMarketCategoryID UInt32, ClickMarketPP UInt32, ClickMarketCategoryName String, ClickMarketPPName String, ClickAWAPSCampaignName String, ClickPageName String, ClickTargetType UInt16, ClickTargetPhraseID UInt64, ClickContextType UInt8, ClickSelectType Int8, ClickOptions String, ClickGroupBannerID Int32, OpenstatServiceName String, OpenstatCampaignID String, OpenstatAdID String, OpenstatSourceID String, UTMSource String, UTMMedium String, UTMCampaign String, UTMContent String, UTMTerm String, FromTag String, HasGCLID UInt8, FirstVisit DateTime, PredLastVisit Date, LastVisit Date, TotalVisits UInt32, TraficSource Nested(ID Int8, SearchEngineID UInt16, AdvEngineID UInt8, PlaceID UInt16, SocialSourceNetworkID UInt8, Domain String, SearchPhrase String, SocialSourcePage String), Attendance FixedString(16), CLID UInt32, YCLID UInt64, NormalizedRefererHash UInt64, SearchPhraseHash UInt64, RefererDomainHash UInt64, NormalizedStartURLHash UInt64, StartURLDomainHash UInt64, NormalizedEndURLHash UInt64, TopLevelDomain UInt64, URLScheme UInt64, OpenstatServiceNameHash UInt64, OpenstatCampaignIDHash UInt64, OpenstatAdIDHash UInt64, OpenstatSourceIDHash UInt64, UTMSourceHash UInt64, UTMMediumHash UInt64, UTMCampaignHash UInt64, UTMContentHash UInt64, UTMTermHash UInt64, FromHash UInt64, WebVisorEnabled UInt8, WebVisorActivity UInt32, ParsedParams Nested(Key1 String, Key2 String, Key3 String, Key4 String, Key5 String, ValueDouble Float64), Market Nested(Type UInt8, GoalID UInt32, OrderID String, OrderPrice Int64, PP UInt32, DirectPlaceID UInt32, DirectOrderID UInt32, DirectBannerID UInt32, GoodID String, GoodName String, GoodQuantity Int32, GoodPrice Int64), IslandID FixedString(16)) ENGINE = CollapsingMergeTree(StartDate, intHash32(UserID), (CounterID, StartDate, intHash32(UserID), VisitID), 8192, Sign)"
# import data
cat visits_v1.tsv | clickhouse-client --query "INSERT INTO datasets.visits_v1 FORMAT TSV" --max_insert_block_size=100000
# optionally you can optimize table
clickhouse-client --query "OPTIMIZE TABLE datasets.visits_v1 FINAL"
clickhouse-client --query "SELECT COUNT(*) FROM datasets.visits_v1"
```
## Queries
Examples of queries to these tables (they are named `test.hits` and `test.visits`) can be found among [stateful tests](https://github.com/ClickHouse/ClickHouse/tree/master/dbms/tests/queries/1_stateful) and in some [performance tests](https://github.com/ClickHouse/ClickHouse/tree/master/dbms/tests/performance) of ClickHouse.
## Example Queries
[ClickHouse tutorial](../../getting_started/tutorial.md) is based on Yandex.Metrica dataset and the recommended way to get started with this dataset is to just go through tutorial.
Additional examples of queries to these tables can be found among [stateful tests](https://github.com/yandex/ClickHouse/tree/master/dbms/tests/queries/1_stateful) of ClickHouse (they are named `test.hists` and `test.visits` there).

View File

@ -1,147 +1,8 @@
# Getting Started
## System Requirements
ClickHouse can run on any Linux, FreeBSD or Mac OS X with x86\_64 CPU architecture.
Though pre-built binaries are typically compiled to leverage SSE 4.2 instruction set, so unless otherwise stated usage of CPU that supports it becomes an additional system requirement. Here's the command to check if current CPU has support for SSE 4.2:
``` bash
$ grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"
```
## Installation
### From DEB Packages
Yandex ClickHouse team recommends using official pre-compiled `deb` packages for Debian or Ubuntu.
To install official packages add the Yandex repository in `/etc/apt/sources.list` or in a separate `/etc/apt/sources.list.d/clickhouse.list` file:
```bash
$ deb http://repo.yandex.ru/clickhouse/deb/stable/ main/
```
If you want to use the most recent version, replace `stable` with `testing` (this is recommended for your testing environments).
Then run these commands to actually install packages:
```bash
$ sudo apt-get install dirmngr # optional
$ sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv E0C56BD4 # optional
$ sudo apt-get update
$ sudo apt-get install clickhouse-client clickhouse-server
```
You can also download and install packages manually from here: <https://repo.yandex.ru/clickhouse/deb/stable/main/>.
### From RPM Packages
Yandex ClickHouse team recommends using official pre-compiled `rpm` packages for CentOS, RedHat and all other rpm-based Linux distributions.
First you need to add the official repository:
```bash
$ sudo yum install yum-utils
$ sudo rpm --import https://repo.yandex.ru/clickhouse/CLICKHOUSE-KEY.GPG
$ sudo yum-config-manager --add-repo https://repo.yandex.ru/clickhouse/rpm/stable/x86_64
```
If you want to use the most recent version, replace `stable` with `testing` (this is recommended for your testing environments).
Then run these commands to actually install packages:
```bash
$ sudo yum install clickhouse-server clickhouse-client
```
You can also download and install packages manually from here: <https://repo.yandex.ru/clickhouse/rpm/stable/x86_64>.
### From Docker Image
To run ClickHouse inside Docker follow the guide on [Docker Hub](https://hub.docker.com/r/yandex/clickhouse-server/). Those images use official `deb` packages inside.
### From Sources
To manually compile ClickHouse, follow the instructions for [Linux](../development/build.md) or [Mac OS X](../development/build_osx.md).
You can compile packages and install them or use programs without installing packages. Also by building manually you can disable SSE 4.2 requirement or build for AArch64 CPUs.
```text
Client: dbms/programs/clickhouse-client
Server: dbms/programs/clickhouse-server
```
You'll need to create a data and metadata folders and `chown` them for the desired user. Their paths can be changed in server config (src/dbms/programs/server/config.xml), by default they are:
```text
/opt/clickhouse/data/default/
/opt/clickhouse/metadata/default/
```
On Gentoo you can just use `emerge clickhouse` to install ClickHouse from sources.
## Launch
To start the server as a daemon, run:
``` bash
$ sudo service clickhouse-server start
```
If you don't have `service` command, run as
``` bash
$ sudo /etc/init.d/clickhouse-server start
```
See the logs in the `/var/log/clickhouse-server/` directory.
If the server doesn't start, check the configurations in the file `/etc/clickhouse-server/config.xml`.
You can also manually launch the server from the console:
``` bash
$ clickhouse-server --config-file=/etc/clickhouse-server/config.xml
```
In this case, the log will be printed to the console, which is convenient during development.
If the configuration file is in the current directory, you don't need to specify the `--config-file` parameter. By default, it uses `./config.xml`.
ClickHouse supports access restriction settings. They are located in the `users.xml` file (next to `config.xml`).
By default, access is allowed from anywhere for the `default` user, without a password. See `user/default/networks`.
For more information, see the section ["Configuration Files"](../operations/configuration_files.md).
After launching server, you can use the command-line client to connect to it:
``` bash
$ clickhouse-client
```
By default it connects to `localhost:9000` on behalf of the user `default` without a password. It can also be used to connect to a remote server using `--host` argument.
The terminal must use UTF-8 encoding.
For more information, see the section ["Command-line client"](../interfaces/cli.md).
Example:
``` bash
$ ./clickhouse-client
ClickHouse client version 0.0.18749.
Connecting to localhost:9000.
Connected to ClickHouse server version 0.0.18749.
```
```sql
SELECT 1
```
```text
┌─1─┐
│ 1 │
└───┘
```
**Congratulations, the system works!**
To continue experimenting, you can download one of test data sets or go through [tutorial](https://clickhouse.yandex/tutorial.html).
If you are new to ClickHouse and want to get a hands-on feeling of it's performance, first of all you need to go through the [installation process](install.md). After that you can:
* [Go through detailed tutorial](tutorial.md)
* [Experiment with example datasets](example_datasets/ontime.md)
[Original article](https://clickhouse.yandex/docs/en/getting_started/) <!--hide-->

View File

@ -0,0 +1,153 @@
# Installation
## System Requirements
ClickHouse can run on any Linux, FreeBSD or Mac OS X with x86\_64, AArch64 or PowerPC64LE CPU architecture.
Official pre-built binaries are typically compiled for x86\_64 and leverage SSE 4.2 instruction set, so unless otherwise stated usage of CPU that supports it becomes an additional system requirement. Here's the command to check if current CPU has support for SSE 4.2:
``` bash
$ grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"
```
To run ClickHouse on processors that do not support SSE 4.2 or have AArch64 or PowerPC64LE architecture, you should [build ClickHouse from sources](#from-sources) with proper configuration adjustments.
## Available Installation Options
### From DEB Packages
It is recommended to use official pre-compiled `deb` packages for Debian or Ubuntu.
To install official packages add the Yandex repository in `/etc/apt/sources.list` or in a separate `/etc/apt/sources.list.d/clickhouse.list` file:
```
deb http://repo.yandex.ru/clickhouse/deb/stable/ main/
```
If you want to use the most recent version, replace `stable` with `testing` (this is recommended for your testing environments).
Then run these commands to actually install packages:
```bash
sudo apt-get install dirmngr # optional
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv E0C56BD4 # optional
sudo apt-get update
sudo apt-get install clickhouse-client clickhouse-server
```
You can also download and install packages manually from here: <https://repo.yandex.ru/clickhouse/deb/stable/main/>.
### From RPM Packages
It is recommended to use official pre-compiled `rpm` packages for CentOS, RedHat and all other rpm-based Linux distributions.
First you need to add the official repository:
```bash
sudo yum install yum-utils
sudo rpm --import https://repo.yandex.ru/clickhouse/CLICKHOUSE-KEY.GPG
sudo yum-config-manager --add-repo https://repo.yandex.ru/clickhouse/rpm/stable/x86_64
```
If you want to use the most recent version, replace `stable` with `testing` (this is recommended for your testing environments).
Then run these commands to actually install packages:
```bash
sudo yum install clickhouse-server clickhouse-client
```
You can also download and install packages manually from here: <https://repo.yandex.ru/clickhouse/rpm/stable/x86_64>.
### From Docker Image
To run ClickHouse inside Docker follow the guide on [Docker Hub](https://hub.docker.com/r/yandex/clickhouse-server/). Those images use official `deb` packages inside.
### From Sources
To manually compile ClickHouse, follow the instructions for [Linux](../development/build.md) or [Mac OS X](../development/build_osx.md).
You can compile packages and install them or use programs without installing packages. Also by building manually you can disable SSE 4.2 requirement or build for AArch64 CPUs.
```
Client: dbms/programs/clickhouse-client
Server: dbms/programs/clickhouse-server
```
You'll need to create a data and metadata folders and `chown` them for the desired user. Their paths can be changed in server config (src/dbms/programs/server/config.xml), by default they are:
```
/opt/clickhouse/data/default/
/opt/clickhouse/metadata/default/
```
On Gentoo you can just use `emerge clickhouse` to install ClickHouse from sources.
## Launch
To start the server as a daemon, run:
``` bash
$ sudo service clickhouse-server start
```
If you don't have `service` command, run as
``` bash
$ sudo /etc/init.d/clickhouse-server start
```
See the logs in the `/var/log/clickhouse-server/` directory.
If the server doesn't start, check the configurations in the file `/etc/clickhouse-server/config.xml`.
You can also manually launch the server from the console:
``` bash
$ clickhouse-server --config-file=/etc/clickhouse-server/config.xml
```
In this case, the log will be printed to the console, which is convenient during development.
If the configuration file is in the current directory, you don't need to specify the `--config-file` parameter. By default, it uses `./config.xml`.
ClickHouse supports access restriction settings. They are located in the `users.xml` file (next to `config.xml`).
By default, access is allowed from anywhere for the `default` user, without a password. See `user/default/networks`.
For more information, see the section ["Configuration Files"](../operations/configuration_files.md).
After launching server, you can use the command-line client to connect to it:
``` bash
$ clickhouse-client
```
By default it connects to `localhost:9000` on behalf of the user `default` without a password. It can also be used to connect to a remote server using `--host` argument.
The terminal must use UTF-8 encoding.
For more information, see the section ["Command-line client"](../interfaces/cli.md).
Example:
``` bash
$ ./clickhouse-client
ClickHouse client version 0.0.18749.
Connecting to localhost:9000.
Connected to ClickHouse server version 0.0.18749.
:) SELECT 1
SELECT 1
┌─1─┐
│ 1 │
└───┘
1 rows in set. Elapsed: 0.003 sec.
:)
```
**Congratulations, the system works!**
To continue experimenting, you can download one of test data sets or go through [tutorial](https://clickhouse.yandex/tutorial.html).
[Original article](https://clickhouse.yandex/docs/en/getting_started/install/) <!--hide-->

View File

@ -0,0 +1,645 @@
# ClickHouse Tutorial
## What to Expect from This Tutorial?
By going through this tutorial you'll learn how to set up basic ClickHouse cluster, it'll be small, but fault tolerant and scalable. We will use one of example datasets to fill it with data and execute some demo queries.
## Single Node Setup
To postpone complexities of distributed environment, we'll start with deploying ClickHouse on a single server or virtual machine. ClickHouse is usually installed from [deb](index.md#from-deb-packages) or [rpm](index.md#from-rpm-packages) packages, but there are [alternatives](index.md#from-docker-image) for the operating systems that do no support them.
For example, you have chosen `deb` packages and executed:
``` bash
sudo apt-get install dirmngr
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv E0C56BD4
echo "deb http://repo.yandex.ru/clickhouse/deb/stable/ main/" | sudo tee /etc/apt/sources.list.d/clickhouse.list
sudo apt-get update
sudo apt-get install -y clickhouse-server clickhouse-client
```
What do we have in the packages that got installed:
* `clickhouse-client` package contains [clickhouse-client](../interfaces/cli.md) application, interactive ClickHouse console client.
* `clickhouse-common` package contains a ClickHouse executable file.
* `clickhouse-server` package contains configuration files to run ClickHouse as a server.
Server config files are located in `/etc/clickhouse-server/`. Before going further please notice the `<path>` element in `config.xml`. Path determines the location for data storage, so it should be located on volume with large disk capacity, the default value is `/var/lib/clickhouse/`. If you want to adjust the configuration it's not really handy to directly edit `config.xml` file, considering it might get rewritten on future package updates. Recommended way to override the config elements is to create [files in config.d directory](../operations/configuration_files.md) which serve as "patches" to config.xml.
As you might have noticed, `clickhouse-server` is not launched automatically after package installation. It won't be automatically restarted after updates either. The way you start the server depends on your init system, usually it's:
``` bash
sudo service clickhouse-server start
```
or
``` bash
sudo /etc/init.d/clickhouse-server start
```
The default location for server logs is `/var/log/clickhouse-server/`. Server will be ready to handle client connections once `Ready for connections` message was logged.
Once the `clickhouse-server` is up and running, we can use `clickhouse-client` to connect to the server and run some test queries like `SELECT "Hello, world!";`.
<details markdown="1"><summary>Quick tips for clickhouse-client</summary>
Interactive mode:
``` bash
clickhouse-client
clickhouse-client --host=... --port=... --user=... --password=...
```
Enable multiline queries:
``` bash
clickhouse-client -m
clickhouse-client --multiline
```
Run queries in batch-mode:
``` bash
clickhouse-client --query='SELECT 1'
echo 'SELECT 1' | clickhouse-client
clickhouse-client <<< 'SELECT 1'
```
Insert data from a file in specified format:
``` bash
clickhouse-client --query='INSERT INTO table VALUES' < data.txt
clickhouse-client --query='INSERT INTO table FORMAT TabSeparated' < data.tsv
```
</details>
## Import Sample Dataset
Now it's time to fill our ClickHouse server with some sample data. In this tutorial we'll use anonymized data of Yandex.Metrica, the first service that run ClickHouse in production way before it became open-source (more on that in [history section](../introduction/history.md)). There are [multiple ways to import Yandex.Metrica dataset](example_datasets/metrica.md) and for the sake of the tutorial we'll go with the most realistic one.
### Download and Extract Table Data
``` bash
curl https://clickhouse-datasets.s3.yandex.net/hits/tsv/hits_v1.tsv.xz | unxz --threads=`nproc` > hits_v1.tsv
curl https://clickhouse-datasets.s3.yandex.net/visits/tsv/visits_v1.tsv.xz | unxz --threads=`nproc` > visits_v1.tsv
```
The extracted files are about 10GB in size.
### Create Tables
Tables are logically grouped into "databases". There's a `default` database, but we'll create a new one named `tutorial`:
``` bash
clickhouse-client --query "CREATE DATABASE IF NOT EXISTS tutorial"
```
Syntax for creating tables is way more complicated compared to databases (see [reference](../query_language/create.md). In general `CREATE TABLE` statement has to specify three key things:
1. Name of table to create.
2. Table schema, i.e. list of columns and their [data types](../data_types/index.md).
3. [Table engine](../operations/table_engines/index.md) and it's settings, which determines all the details on how queries to this table will be physically executed.
Yandex.Metrica is a web analytics service and sample dataset doesn't cover it's full functionality, so there are only two tables to create:
* `hits` is a table with each action done by all users on all websites covered by the service.
* `visits` is a table that contains pre-built sessions instead of individual actions.
Let's see and execute the real create table queries for these tables:
``` sql
CREATE TABLE tutorial.hits_v1
(
`WatchID` UInt64,
`JavaEnable` UInt8,
`Title` String,
`GoodEvent` Int16,
`EventTime` DateTime,
`EventDate` Date,
`CounterID` UInt32,
`ClientIP` UInt32,
`ClientIP6` FixedString(16),
`RegionID` UInt32,
`UserID` UInt64,
`CounterClass` Int8,
`OS` UInt8,
`UserAgent` UInt8,
`URL` String,
`Referer` String,
`URLDomain` String,
`RefererDomain` String,
`Refresh` UInt8,
`IsRobot` UInt8,
`RefererCategories` Array(UInt16),
`URLCategories` Array(UInt16),
`URLRegions` Array(UInt32),
`RefererRegions` Array(UInt32),
`ResolutionWidth` UInt16,
`ResolutionHeight` UInt16,
`ResolutionDepth` UInt8,
`FlashMajor` UInt8,
`FlashMinor` UInt8,
`FlashMinor2` String,
`NetMajor` UInt8,
`NetMinor` UInt8,
`UserAgentMajor` UInt16,
`UserAgentMinor` FixedString(2),
`CookieEnable` UInt8,
`JavascriptEnable` UInt8,
`IsMobile` UInt8,
`MobilePhone` UInt8,
`MobilePhoneModel` String,
`Params` String,
`IPNetworkID` UInt32,
`TraficSourceID` Int8,
`SearchEngineID` UInt16,
`SearchPhrase` String,
`AdvEngineID` UInt8,
`IsArtifical` UInt8,
`WindowClientWidth` UInt16,
`WindowClientHeight` UInt16,
`ClientTimeZone` Int16,
`ClientEventTime` DateTime,
`SilverlightVersion1` UInt8,
`SilverlightVersion2` UInt8,
`SilverlightVersion3` UInt32,
`SilverlightVersion4` UInt16,
`PageCharset` String,
`CodeVersion` UInt32,
`IsLink` UInt8,
`IsDownload` UInt8,
`IsNotBounce` UInt8,
`FUniqID` UInt64,
`HID` UInt32,
`IsOldCounter` UInt8,
`IsEvent` UInt8,
`IsParameter` UInt8,
`DontCountHits` UInt8,
`WithHash` UInt8,
`HitColor` FixedString(1),
`UTCEventTime` DateTime,
`Age` UInt8,
`Sex` UInt8,
`Income` UInt8,
`Interests` UInt16,
`Robotness` UInt8,
`GeneralInterests` Array(UInt16),
`RemoteIP` UInt32,
`RemoteIP6` FixedString(16),
`WindowName` Int32,
`OpenerName` Int32,
`HistoryLength` Int16,
`BrowserLanguage` FixedString(2),
`BrowserCountry` FixedString(2),
`SocialNetwork` String,
`SocialAction` String,
`HTTPError` UInt16,
`SendTiming` Int32,
`DNSTiming` Int32,
`ConnectTiming` Int32,
`ResponseStartTiming` Int32,
`ResponseEndTiming` Int32,
`FetchTiming` Int32,
`RedirectTiming` Int32,
`DOMInteractiveTiming` Int32,
`DOMContentLoadedTiming` Int32,
`DOMCompleteTiming` Int32,
`LoadEventStartTiming` Int32,
`LoadEventEndTiming` Int32,
`NSToDOMContentLoadedTiming` Int32,
`FirstPaintTiming` Int32,
`RedirectCount` Int8,
`SocialSourceNetworkID` UInt8,
`SocialSourcePage` String,
`ParamPrice` Int64,
`ParamOrderID` String,
`ParamCurrency` FixedString(3),
`ParamCurrencyID` UInt16,
`GoalsReached` Array(UInt32),
`OpenstatServiceName` String,
`OpenstatCampaignID` String,
`OpenstatAdID` String,
`OpenstatSourceID` String,
`UTMSource` String,
`UTMMedium` String,
`UTMCampaign` String,
`UTMContent` String,
`UTMTerm` String,
`FromTag` String,
`HasGCLID` UInt8,
`RefererHash` UInt64,
`URLHash` UInt64,
`CLID` UInt32,
`YCLID` UInt64,
`ShareService` String,
`ShareURL` String,
`ShareTitle` String,
`ParsedParams` Nested(
Key1 String,
Key2 String,
Key3 String,
Key4 String,
Key5 String,
ValueDouble Float64),
`IslandID` FixedString(16),
`RequestNum` UInt32,
`RequestTry` UInt8
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(EventDate)
ORDER BY (CounterID, EventDate, intHash32(UserID))
SAMPLE BY intHash32(UserID)
SETTINGS index_granularity = 8192
```
``` sql
CREATE TABLE tutorial.visits_v1
(
`CounterID` UInt32,
`StartDate` Date,
`Sign` Int8,
`IsNew` UInt8,
`VisitID` UInt64,
`UserID` UInt64,
`StartTime` DateTime,
`Duration` UInt32,
`UTCStartTime` DateTime,
`PageViews` Int32,
`Hits` Int32,
`IsBounce` UInt8,
`Referer` String,
`StartURL` String,
`RefererDomain` String,
`StartURLDomain` String,
`EndURL` String,
`LinkURL` String,
`IsDownload` UInt8,
`TraficSourceID` Int8,
`SearchEngineID` UInt16,
`SearchPhrase` String,
`AdvEngineID` UInt8,
`PlaceID` Int32,
`RefererCategories` Array(UInt16),
`URLCategories` Array(UInt16),
`URLRegions` Array(UInt32),
`RefererRegions` Array(UInt32),
`IsYandex` UInt8,
`GoalReachesDepth` Int32,
`GoalReachesURL` Int32,
`GoalReachesAny` Int32,
`SocialSourceNetworkID` UInt8,
`SocialSourcePage` String,
`MobilePhoneModel` String,
`ClientEventTime` DateTime,
`RegionID` UInt32,
`ClientIP` UInt32,
`ClientIP6` FixedString(16),
`RemoteIP` UInt32,
`RemoteIP6` FixedString(16),
`IPNetworkID` UInt32,
`SilverlightVersion3` UInt32,
`CodeVersion` UInt32,
`ResolutionWidth` UInt16,
`ResolutionHeight` UInt16,
`UserAgentMajor` UInt16,
`UserAgentMinor` UInt16,
`WindowClientWidth` UInt16,
`WindowClientHeight` UInt16,
`SilverlightVersion2` UInt8,
`SilverlightVersion4` UInt16,
`FlashVersion3` UInt16,
`FlashVersion4` UInt16,
`ClientTimeZone` Int16,
`OS` UInt8,
`UserAgent` UInt8,
`ResolutionDepth` UInt8,
`FlashMajor` UInt8,
`FlashMinor` UInt8,
`NetMajor` UInt8,
`NetMinor` UInt8,
`MobilePhone` UInt8,
`SilverlightVersion1` UInt8,
`Age` UInt8,
`Sex` UInt8,
`Income` UInt8,
`JavaEnable` UInt8,
`CookieEnable` UInt8,
`JavascriptEnable` UInt8,
`IsMobile` UInt8,
`BrowserLanguage` UInt16,
`BrowserCountry` UInt16,
`Interests` UInt16,
`Robotness` UInt8,
`GeneralInterests` Array(UInt16),
`Params` Array(String),
`Goals` Nested(
ID UInt32,
Serial UInt32,
EventTime DateTime,
Price Int64,
OrderID String,
CurrencyID UInt32),
`WatchIDs` Array(UInt64),
`ParamSumPrice` Int64,
`ParamCurrency` FixedString(3),
`ParamCurrencyID` UInt16,
`ClickLogID` UInt64,
`ClickEventID` Int32,
`ClickGoodEvent` Int32,
`ClickEventTime` DateTime,
`ClickPriorityID` Int32,
`ClickPhraseID` Int32,
`ClickPageID` Int32,
`ClickPlaceID` Int32,
`ClickTypeID` Int32,
`ClickResourceID` Int32,
`ClickCost` UInt32,
`ClickClientIP` UInt32,
`ClickDomainID` UInt32,
`ClickURL` String,
`ClickAttempt` UInt8,
`ClickOrderID` UInt32,
`ClickBannerID` UInt32,
`ClickMarketCategoryID` UInt32,
`ClickMarketPP` UInt32,
`ClickMarketCategoryName` String,
`ClickMarketPPName` String,
`ClickAWAPSCampaignName` String,
`ClickPageName` String,
`ClickTargetType` UInt16,
`ClickTargetPhraseID` UInt64,
`ClickContextType` UInt8,
`ClickSelectType` Int8,
`ClickOptions` String,
`ClickGroupBannerID` Int32,
`OpenstatServiceName` String,
`OpenstatCampaignID` String,
`OpenstatAdID` String,
`OpenstatSourceID` String,
`UTMSource` String,
`UTMMedium` String,
`UTMCampaign` String,
`UTMContent` String,
`UTMTerm` String,
`FromTag` String,
`HasGCLID` UInt8,
`FirstVisit` DateTime,
`PredLastVisit` Date,
`LastVisit` Date,
`TotalVisits` UInt32,
`TraficSource` Nested(
ID Int8,
SearchEngineID UInt16,
AdvEngineID UInt8,
PlaceID UInt16,
SocialSourceNetworkID UInt8,
Domain String,
SearchPhrase String,
SocialSourcePage String),
`Attendance` FixedString(16),
`CLID` UInt32,
`YCLID` UInt64,
`NormalizedRefererHash` UInt64,
`SearchPhraseHash` UInt64,
`RefererDomainHash` UInt64,
`NormalizedStartURLHash` UInt64,
`StartURLDomainHash` UInt64,
`NormalizedEndURLHash` UInt64,
`TopLevelDomain` UInt64,
`URLScheme` UInt64,
`OpenstatServiceNameHash` UInt64,
`OpenstatCampaignIDHash` UInt64,
`OpenstatAdIDHash` UInt64,
`OpenstatSourceIDHash` UInt64,
`UTMSourceHash` UInt64,
`UTMMediumHash` UInt64,
`UTMCampaignHash` UInt64,
`UTMContentHash` UInt64,
`UTMTermHash` UInt64,
`FromHash` UInt64,
`WebVisorEnabled` UInt8,
`WebVisorActivity` UInt32,
`ParsedParams` Nested(
Key1 String,
Key2 String,
Key3 String,
Key4 String,
Key5 String,
ValueDouble Float64),
`Market` Nested(
Type UInt8,
GoalID UInt32,
OrderID String,
OrderPrice Int64,
PP UInt32,
DirectPlaceID UInt32,
DirectOrderID UInt32,
DirectBannerID UInt32,
GoodID String,
GoodName String,
GoodQuantity Int32,
GoodPrice Int64),
`IslandID` FixedString(16)
)
ENGINE = CollapsingMergeTree(Sign)
PARTITION BY toYYYYMM(StartDate)
ORDER BY (CounterID, StartDate, intHash32(UserID), VisitID)
SAMPLE BY intHash32(UserID)
SETTINGS index_granularity = 8192
```
You can execute those queries using interactive mode of `clickhouse-client` (just launch it in terminal without specifying a query in advance) or try some [alternative interface](../interfaces/index.md) if you ant.
As we can see, `hits_v1` uses the [basic MergeTree engine](../operations/table_engines/mergetree.md), while the `visits_v1` uses the [Collapsing](../operations/table_engines/collapsingmergetree.md) variant.
### Import Data
Data import to ClickHouse is done via [INSERT INTO](../query_language/insert_into.md) query like in many other SQL databases. However data is usually provided in one of the [supported formats](../interfaces/formats.md) instead of `VALUES` clause (which is also supported).
The files we downloaded earlier are in tab-separated format, so here's how to import them via console client:
``` bash
clickhouse-client --query "INSERT INTO tutorial.hits_v1 FORMAT TSV" --max_insert_block_size=100000 < hits_v1.tsv
clickhouse-client --query "INSERT INTO tutorial.visits_v1 FORMAT TSV" --max_insert_block_size=100000 < visits_v1.tsv
```
ClickHouse has a lot of [settings to tune](../operations/settings/index.md) and one way to specify them in console client is via arguments, as we can see with `--max_insert_block_size`. The easiest way to figure out what settings are available, what do they mean and what the defaults are is to query the `system.settings` table:
``` sql
SELECT name, value, changed, description
FROM system.settings
WHERE name LIKE '%max_insert_b%'
FORMAT TSV
max_insert_block_size 1048576 0 "The maximum block size for insertion, if we control the creation of blocks for insertion."
```
Optionally you can [OPTIMIZE](../query_language/misc/#misc_operations-optimize) the tables after import. Tables that are configured with MergeTree-family engine always do merges of data parts in background to optimize data storage (or at least check if it makes sense). These queries will just force table engine to do storage optimization right now instead of some time later:
``` bash
clickhouse-client --query "OPTIMIZE TABLE tutorial.hits_v1 FINAL"
clickhouse-client --query "OPTIMIZE TABLE tutorial.visits_v1 FINAL"
```
This is I/O and CPU intensive operation so if the table constantly receives new data it's better to leave it alone and let merges run in background.
Now we can check that the tables are successfully imported:
``` bash
clickhouse-client --query "SELECT COUNT(*) FROM tutorial.hits_v1"
clickhouse-client --query "SELECT COUNT(*) FROM tutorial.visits_v1"
```
## Example Queries
``` sql
SELECT
StartURL AS URL,
AVG(Duration) AS AvgDuration
FROM tutorial.visits_v1
WHERE StartDate BETWEEN '2014-03-23' AND '2014-03-30'
GROUP BY URL
ORDER BY AvgDuration DESC
LIMIT 10
```
``` sql
SELECT
sum(Sign) AS visits,
sumIf(Sign, has(Goals.ID, 1105530)) AS goal_visits,
(100. * goal_visits) / visits AS goal_percent
FROM tutorial.visits_v1
WHERE (CounterID = 912887) AND (toYYYYMM(StartDate) = 201403) AND (domain(StartURL) = 'yandex.ru')
```
## Cluster Deployment
ClickHouse cluster is a homogenous cluster. Steps to set up:
1. Install ClickHouse server on all machines of the cluster
2. Set up cluster configs in configuration files
3. Create local tables on each instance
4. Create a [Distributed table](../operations/table_engines/distributed.md)
[Distributed table](../operations/table_engines/distributed.md) is actually a kind of "view" to local tables of ClickHouse cluster. SELECT query from a distributed table will be executed using resources of all cluster's shards. You may specify configs for multiple clusters and create multiple distributed tables providing views to different clusters.
Example config for cluster with three shards, one replica each:
``` xml
<remote_servers>
<perftest_3shards_1replicas>
<shard>
<replica>
<host>example-perftest01j.yandex.ru</host>
<port>9000</port>
</replica>
</shard>
<shard>
<replica>
<host>example-perftest02j.yandex.ru</host>
<port>9000</port>
</replica>
</shard>
<shard>
<replica>
<host>example-perftest03j.yandex.ru</host>
<port>9000</port>
</replica>
</shard>
</perftest_3shards_1replicas>
</remote_servers>
```
For further demonstration let's create new local table with exactly the same `CREATE TABLE` query that we used for `hits_v1`, but different table name:
``` sql
CREATE TABLE tutorial.hits_local (...) ENGINE = MergeTree() ...
```
Creating a distributed table providing a view into local tables of the cluster:
``` sql
CREATE TABLE tutorial.hits_all AS tutorial.hits_local
ENGINE = Distributed(perftest_3shards_1replicas, tutorial, hits_local, rand());
```
Common practice is to create similar Distributed tables on all machines of the cluster. This would allow to run distributed queries on any machine of the cluster. Also there's an alternative option to create temporary distributed table for a given SELECT query using [remote](../query_language/table_functions/remote.md) table function.
Let's run [INSERT SELECT](../query_language/insert_into.md) into Distributed table to spread the table to multiple servers.
``` sql
INSERT INTO tutorial.hits_all SELECT * FROM tutorial.hits_v1;
```
!!! warning "Notice"
This approach is not suitable for sharding of large tables. There's a separate tool [clickhouse-copier](../operations/utils/clickhouse-copier.md) that can re-shard arbitrary large tables.
As you could expect computationally heavy queries are executed N times faster being launched on 3 servers instead of one.
In this case we have used a cluster with 3 shards each contains a single replica.
To provide resilience in production environment we recommend that each shard should contain 2-3 replicas distributed between multiple data-centers. Note that ClickHouse supports unlimited number of replicas.
Example config for cluster of one shard containing three replicas:
``` xml
<remote_servers>
...
<perftest_1shards_3replicas>
<shard>
<replica>
<host>example-perftest01j.yandex.ru</host>
<port>9000</port>
</replica>
<replica>
<host>example-perftest02j.yandex.ru</host>
<port>9000</port>
</replica>
<replica>
<host>example-perftest03j.yandex.ru</host>
<port>9000</port>
</replica>
</shard>
</perftest_1shards_3replicas>
</remote_servers>
```
To enable native replication <a href="http://zookeeper.apache.org/" rel="external nofollow">ZooKeeper</a> is required. ClickHouse will take care of data consistency on all replicas and run restore procedure after failure
automatically. It's recommended to deploy ZooKeeper cluster to separate servers.
ZooKeeper is not a strict requirement: in some simple cases you can duplicate the data by writing it into all the replicas from your application code. This approach is **not** recommended, in this case ClickHouse won't be able to
guarantee data consistency on all replicas. This remains the responsibility of your application.
ZooKeeper locations need to be specified in configuration file:
``` xml
<zookeeper-servers>
<node>
<host>zoo01.yandex.ru</host>
<port>2181</port>
</node>
<node>
<host>zoo02.yandex.ru</host>
<port>2181</port>
</node>
<node>
<host>zoo03.yandex.ru</host>
<port>2181</port>
</node>
</zookeeper-servers>
```
Also we need to set macros for identifying each shard and replica, it will be used on table creation:
``` xml
<macros>
<shard>01</shard>
<replica>01</replica>
</macros>
```
If there are no replicas at the moment on replicated table creation, a new first replica will be instantiated. If there are already live replicas, new replica will clone the data from existing ones. You have an option to create all replicated tables first and that insert data to it. Another option is to create some replicas and add the others after or during data insertion.
``` sql
CREATE TABLE tutorial.hits_replica (...)
ENGINE = ReplcatedMergeTree(
'/clickhouse_perftest/tables/{shard}/hits',
'{replica}'
)
...
```
Here we use [ReplicatedMergeTree](../operations/table_engines/replication.md) table engine. In parameters we specify ZooKeeper path containing shard and replica identifiers.
``` sql
INSERT INTO tutorial.hits_replica SELECT * FROM tutorial.hits_local;
```
Replication operates in multi-master mode. Data can be loaded into any replica and it will be synced with other instances automatically. Replication is asynchronous so at a given moment of time not all replicas may contain recently inserted data. To allow data insertion at least one replica should be up. Others will sync up data and repair consistency once they will become active again. Please notice that such approach allows for the low possibility of loss of just appended data.

View File

@ -1,4 +1,4 @@
# Yandex.Metrica Use Case
# ClickHouse History
ClickHouse was originally developed to power [Yandex.Metrica](https://metrica.yandex.com/), [the second largest web analytics platform in the world](http://w3techs.com/technologies/overview/traffic_analysis/all), and continues to be the core component of this system. With more than 13 trillion records in the database and more than 20 billion events daily, ClickHouse allows generating custom reports on the fly directly from non-aggregated data. This article briefly covers the goals of ClickHouse in the early stages of its development.
@ -47,4 +47,4 @@ OLAPServer worked well for non-aggregated data, but it had many restrictions tha
To remove the limitations of OLAPServer and solve the problem of working with non-aggregated data for all reports, we developed the ClickHouse DBMS.
[Original article](https://clickhouse.yandex/docs/en/introduction/ya_metrika_task/) <!--hide-->
[Original article](https://clickhouse.yandex/docs/en/introduction/history/) <!--hide-->

View File

@ -1,197 +1,11 @@
<div dir="rtl" markdown="1">
# ﻥﺪﺷ ﻉﻭﺮﺷ
# شروع به کار
ﻖﯾﺮﻃ ﺯﺍ ﺪﯾﺎﺑ ﻪﻤﻫ ﺯﺍ ﻝﻭﺍ ، ﺪﯿﻨﮐ ﺱﺎﺴﺣﺍ ﺍﺭ ﻥﺁ ﺩﺮﮑﻠﻤﻋ ﺪﯿﻫﺍﻮﺧ ﯽﻣ ﻭ ﺪﯿﺘﺴﻫ ﺩﺭﺍﻭ ﻩﺯﺎﺗ[ﺐﺼﻧ ﻞﺣﺍﺮﻣ](install.md).
ﺪﯿﻨﮐ ﺏﺎﺨﺘﻧﺍ ﺍﺭ ﺮﯾﺯ ﯼﺎﻫ ﻪﻨﯾﺰﮔ ﺯﺍ ﯽﮑﯾ ﺪﯿﻧﺍﻮﺗ ﯽﻣ ﻥﺁ ﺯﺍ ﺲﭘ:
## نیازمندی های سیستم
این یک سیستم چند سکویی (Cross-Platform) نمی باشد. این ابزار نیاز به Linux Ubuntu Precise (12.04) یا جدیدتر، با معماری x86\_64 و پشتیبانی از SSE 4.2 می باشد. برای چک کردن SSE 4.2 خروجی دستور زیر را بررسی کنید:
* [ﺪﯿﻨﮐ ﯽﻃ ﺍﺭ ﻞﺼﻔﻣ ﺵﺯﻮﻣﺁ](tutorial.md)
* [ﺪﯿﻨﮐ ﺶﯾﺎﻣﺯﺁ ﻪﻧﻮﻤﻧ ﯼﺎﻫ ﻩﺩﺍﺩ ﺎﺑ](example_datasets/ontime.md)
[ﯽﻠﺻﺍ ﻪﻟﺎﻘﻣ](https://clickhouse.yandex/docs/fa/getting_started/) <!--hide-->
</div>
```bash
grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"
```
<div dir="rtl" markdown="1">
پیشنهاد می کنیم از Ubuntu TrustyT، Ubuntu Xenial یا Ubuntu Precise استفاده کنید. ترمینال باید از UTF-8 پشتیبانی کند. (به صورت پیش فرض در Ubuntu پشتیبانی می شود).
## نصب
### نصب از طریق پکیج های Debian/Ubuntu
در فایل `/etc/apt/sources.list` (یا در یک فایل جدا `/etc/apt/sources.list.d/clickhouse.list`)، Repo زیر را اضافه کنید:
</div>
```
deb http://repo.yandex.ru/clickhouse/deb/stable/ main/
```
<div dir="rtl" markdown="1">
اگر شما میخوایید جدیدترین نسخه ی تست را استفاده کنید، 'stable' رو به 'testing' تغییر بدید.
سپس دستورات زیر را اجرا کنید:
</div>
```bash
sudo apt-get install dirmngr # optional
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv E0C56BD4 # optional
sudo apt-get update
sudo apt-get install clickhouse-client clickhouse-server
```
<div dir="rtl" markdown="1">
شما همچنین می توانید از طریق لینک زیر پکیج ClickHouse را به صورت دستی دانلود و نصب کنید: <https://repo.yandex.ru/clickhouse/deb/stable/main/>.
ClickHouse دارای تنظیمات محدودیت دسترسی می باشد. این تنظیمات در فایل 'users.xml' (کنار 'config.xml') می باشد. به صورت پیش فرض دسترسی برای کاربر 'default' از همه جا بدون نیاز به پسورد وجود دارد. 'user/default/networks' را مشاهده کنید. برای اطلاعات بیشتر قسمت "تنظیمات فایل ها" را مشاهده کنید.
RPM ﯼﺎﻫ ﻪﺘﺴﺑ ﺯﺍ ###
.ﺪﻨﮐ ﯽﻣ ﻪﯿﺻﻮﺗ ﺲﮐﻮﻨﯿﻟ ﺮﺑ ﯽﻨﺘﺒﻣ rpm ﺮﺑ ﯽﻨﺘﺒﻣ ﯼﺎﻫ ﻊﯾﺯﻮﺗ ﺮﯾﺎﺳ ﻭ CentOS ، RedHat ﯼﺍ
:ﺪﯿﻨﮐ ﻪﻓﺎﺿﺍ ﺍﺭ ﯽﻤﺳﺭ ﻥﺰﺨﻣ ﺪﯾﺎﺑ ﺍﺪﺘﺑﺍ
```bash
sudo yum install yum-utils
sudo rpm --import https://repo.yandex.ru/clickhouse/CLICKHOUSE-KEY.GPG
sudo yum-config-manager --add-repo https://repo.yandex.ru/clickhouse/rpm/stable/x86_64
```
.(ﺩﻮﺷ ﯽﻣ ﻪﯿﺻﻮﺗ ﺎﻤﺷ ﺶﯾﺎﻣﺯﺁ ﯼﺎﻫ ﻂﯿﺤﻣ ﯼﺍﺮﺑ ﻦﯾﺍ) ﺪﯿﻨﮐ ﻦﯾﺰﮕﯾﺎﺟ "ﺖﺴﺗ" ﺎﺑ ﺍﺭ "ﺭﺍﺪﯾﺎﭘ"
:ﺪﯿﻨﮐ ﺐﺼﻧ ﺍﺭ ﻪﺘﺴﺑ ﻊﻗﺍﻭ ﺭﺩ ﺎﺗ ﺪﯿﻨﮐ ﺍﺮﺟﺍ ﺍﺭ ﺕﺍﺭﻮﺘﺳﺩ ﻦﯾﺍ ﺲﭙﺳ
```bash
sudo yum install clickhouse-server clickhouse-client
```
.<https://repo.yandex.ru/clickhouse/rpm/stable/x86_64> :ﺪﯿﻨﮐ ﺐﺼﻧ ﻭ ﯼﺮﯿﮔﺭﺎﺑ ﺎﺠﻨ
Docker Image ﺯﺍ ###
.ﺪﻨﻨﮐ ﯽﻣ ﻩﺩﺎﻔﺘﺳﺍ ﻞﺧﺍﺩ ﺭﺩ "deb" ﯽﻤﺳﺭ ﯼﺎﻫ ﻪﺘﺴﺑ ﺯﺍ ﺮﯾﻭﺎﺼﺗ ﻦﯾﺍ .ﺪﯿﻨﮐ ﻝﺎﺒﻧﺩ ﺍﺭ (/ht
### نصب از طریق Source
برای Compile، دستورالعمل های فایل build.md را دنبال کنید:
شما میتوانید پکیج را compile و نصب کنید. شما همچنین می توانید بدون نصب پکیج از برنامه ها استفاده کنید.
</div>
```
Client: dbms/programs/clickhouse-client
Server: dbms/programs/clickhouse-server
```
<div dir="rtl" markdown="1">
برای سرور، یک کاتالوگ با دیتا بسازید، مانند
</div>
```
/opt/clickhouse/data/default/
/opt/clickhouse/metadata/default/
```
<div dir="rtl" markdown="1">
(قابل تنظیم در تنظیمات سرور). 'chown' را برای کاربر دلخواه اجرا کنید.
به مسیر لاگ ها در تنظیمات سرور توجه کنید (src/dbms/programs/config.xml).
### روش های دیگر نصب
Docker image: <https://hub.docker.com/r/yandex/clickhouse-server/>
پکیج RPM برای CentOS یا RHEL: <https://github.com/Altinity/clickhouse-rpm-install>
Gentoo: `emerge clickhouse`
## راه اندازی
برای استارت سرور (به صورت daemon)، دستور زیر را اجرا کنید:
</div>
```bash
sudo service clickhouse-server start
```
<div dir="rtl" markdown="1">
لاگ های دایرکتوری `/var/log/clickhouse-server/` directory. را مشاهده کنید.
اگر سرور استارت نشد، فایل تنظیمات را بررسی کنید `/etc/clickhouse-server/config.xml.`
شما همچنین می توانید سرور را از طریق کنسول راه اندازی کنید:
</div>
```bash
clickhouse-server --config-file=/etc/clickhouse-server/config.xml
```
<div dir="rtl" markdown="1">
در این مورد که مناسب زمان توسعه می باشد، لاگ ها در کنسول پرینت می شوند. اگر فایل تنظیمات در دایرکتوری جاری باشد، نیازی به مشخص کردن '--config-file' نمی باشد. به صورت پیش فرض از './config.xml' استفاده می شود.
شما می توانید از کلاینت command-line برای اتصال به سرور استفاده کنید:
</div>
```bash
clickhouse-client
```
<div dir="rtl" markdown="1">
پارامترهای پیش فرض، نشان از اتصال به localhost:9000 از طرف کاربر 'default' بدون پسورد را می دهد. از کلاینت میتوان برای اتصال به یک سرور remote استفاده کرد. مثال:
</div>
```bash
clickhouse-client --host=example.com
```
<div dir="rtl" markdown="1">
برای اطلاعات بیشتر، بخش "کلاینت Command-line" را مشاهده کنید.
چک کردن سیستم:
</div>
```bash
milovidov@hostname:~/work/metrica/src/dbms/src/Client$ ./clickhouse-client
ClickHouse client version 0.0.18749.
Connecting to localhost:9000.
Connected to ClickHouse server version 0.0.18749.
:) SELECT 1
SELECT 1
┌─1─┐
│ 1 │
└───┘
1 rows in set. Elapsed: 0.003 sec.
:)
```
<div dir="rtl" markdown="1">
**تبریک میگم، سیستم کار می کنه!**
برای ادامه آزمایشات، شما میتوانید دیتاست های تستی را دریافت و امتحان کنید.
</div>
[مقاله اصلی](https://clickhouse.yandex/docs/fa/getting_started/) <!--hide-->

View File

@ -0,0 +1,199 @@
<div dir="rtl" markdown="1">
# ﯼﺯﺍﺪﻧﺍ ﻩﺍﺭ ﻭ ﺐﺼﻧ
## نیازمندی های سیستم
ClickHouse ﺲﮐﻮﻨﯿﻟ ﻉﻮﻧ ﺮﻫ ﯼﻭﺭ ﺮﺑ ﺪﻧﺍﻮﺗ ﯽﻣ ، FreeBSD ﺎﯾ Mac OS X ﯼﺭﺎﻤﻌﻣ ﺎﺑ CPU x
:ﺖﺳﺍ ﻩﺪﻣﺁ ، ﺪﻨﮐ ﯽﻣ ﯽﻧﺎﺒﯿﺘﺸﭘ SSE 4.2 ﺯﺍ ﯽﻠﻌﻓ CPU ﺎﯾﺁ ﻪﮑﻨﯾﺍ ﯽﺳﺭﺮﺑ ﯼﺍﺮﺑ ﺭﻮﺘﺳﺩ ﻦﯾﺍ
</div>
```bash
grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"
```
<div dir="rtl" markdown="1">
ﺪﯾﺎﺑ ، ﺪﻧﺭﺍﺪﻧ PowerPC64LE ﺎﯾ AArch64 ﯼﺭﺎﻤﻌﻣ ﺎﯾ ﺪﻨﻨﮐ ﯽﻤﻧ ﯽﻧﺎﺒﯿﺘﺸﭘ SSE 4.2 ﺯﺍ ﻪﮐ[ClickHouse ﺪﯿﻨﮐ ﺩﺎﺠﯾﺍ ﻊﺑﺎﻨﻣ ﺯﺍ ﺍﺭ](#from-sources) ﺐﺳﺎﻨﻣ ﺕﺎﻤﯿﻈﻨﺗ ﺎﺑ
##ﺩﻮﺟﻮﻣ ﺐﺼﻧ ﯼﺎﻫ ﻪﻨﯾﺰﮔ
### نصب از طریق پکیج های Debian/Ubuntu {#from-deb-packages}
در فایل `/etc/apt/sources.list` (یا در یک فایل جدا `/etc/apt/sources.list.d/clickhouse.list`)، Repo زیر را اضافه کنید:
</div>
```
deb http://repo.yandex.ru/clickhouse/deb/stable/ main/
```
<div dir="rtl" markdown="1">
اگر شما میخوایید جدیدترین نسخه ی تست را استفاده کنید، 'stable' رو به 'testing' تغییر بدید.
سپس دستورات زیر را اجرا کنید:
</div>
```bash
sudo apt-get install dirmngr # optional
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv E0C56BD4 # optional
sudo apt-get update
sudo apt-get install clickhouse-client clickhouse-server
```
<div dir="rtl" markdown="1">
شما همچنین می توانید از طریق لینک زیر پکیج ClickHouse را به صورت دستی دانلود و نصب کنید: <https://repo.yandex.ru/clickhouse/deb/stable/main/>.
ClickHouse دارای تنظیمات محدودیت دسترسی می باشد. این تنظیمات در فایل 'users.xml' (کنار 'config.xml') می باشد. به صورت پیش فرض دسترسی برای کاربر 'default' از همه جا بدون نیاز به پسورد وجود دارد. 'user/default/networks' را مشاهده کنید. برای اطلاعات بیشتر قسمت "تنظیمات فایل ها" را مشاهده کنید.
### RPM ﯼﺎﻫ ﻪﺘﺴﺑ ﺯﺍ {#from-rpm-packages}
.ﺪﻨﮐ ﯽﻣ ﻪﯿﺻﻮﺗ ﺲﮐﻮﻨﯿﻟ ﺮﺑ ﯽﻨﺘﺒﻣ rpm ﺮﺑ ﯽﻨﺘﺒﻣ ﯼﺎﻫ ﻊﯾﺯﻮﺗ ﺮﯾﺎﺳ ﻭ CentOS ، RedHat ﯼﺍ
:ﺪﯿﻨﮐ ﻪﻓﺎﺿﺍ ﺍﺭ ﯽﻤﺳﺭ ﻥﺰﺨﻣ ﺪﯾﺎﺑ ﺍﺪﺘﺑﺍ
```bash
sudo yum install yum-utils
sudo rpm --import https://repo.yandex.ru/clickhouse/CLICKHOUSE-KEY.GPG
sudo yum-config-manager --add-repo https://repo.yandex.ru/clickhouse/rpm/stable/x86_64
```
.(ﺩﻮﺷ ﯽﻣ ﻪﯿﺻﻮﺗ ﺎﻤﺷ ﺶﯾﺎﻣﺯﺁ ﯼﺎﻫ ﻂﯿﺤﻣ ﯼﺍﺮﺑ ﻦﯾﺍ) ﺪﯿﻨﮐ ﻦﯾﺰﮕﯾﺎﺟ "ﺖﺴﺗ" ﺎﺑ ﺍﺭ "ﺭﺍﺪﯾﺎﭘ"
:ﺪﯿﻨﮐ ﺐﺼﻧ ﺍﺭ ﻪﺘﺴﺑ ﻊﻗﺍﻭ ﺭﺩ ﺎﺗ ﺪﯿﻨﮐ ﺍﺮﺟﺍ ﺍﺭ ﺕﺍﺭﻮﺘﺳﺩ ﻦﯾﺍ ﺲﭙﺳ
```bash
sudo yum install clickhouse-server clickhouse-client
```
.<https://repo.yandex.ru/clickhouse/rpm/stable/x86_64> :ﺪﯿﻨﮐ ﺐﺼﻧ ﻭ ﯼﺮﯿﮔﺭﺎﺑ ﺎﺠﻨ
Docker Image ﺯﺍ ###
.ﺪﻨﻨﮐ ﯽﻣ ﻩﺩﺎﻔﺘﺳﺍ ﻞﺧﺍﺩ ﺭﺩ "deb" ﯽﻤﺳﺭ ﯼﺎﻫ ﻪﺘﺴﺑ ﺯﺍ ﺮﯾﻭﺎﺼﺗ ﻦﯾﺍ .ﺪﯿﻨﮐ ﻝﺎﺒﻧﺩ ﺍﺭ (/ht
### نصب از طریق Source {#from-sources}
برای Compile، دستورالعمل های فایل build.md را دنبال کنید:
شما میتوانید پکیج را compile و نصب کنید. شما همچنین می توانید بدون نصب پکیج از برنامه ها استفاده کنید.
</div>
```
Client: dbms/programs/clickhouse-client
Server: dbms/programs/clickhouse-server
```
<div dir="rtl" markdown="1">
برای سرور، یک کاتالوگ با دیتا بسازید، مانند
</div>
```
/opt/clickhouse/data/default/
/opt/clickhouse/metadata/default/
```
<div dir="rtl" markdown="1">
(قابل تنظیم در تنظیمات سرور). 'chown' را برای کاربر دلخواه اجرا کنید.
به مسیر لاگ ها در تنظیمات سرور توجه کنید (src/dbms/programs/config.xml).
### روش های دیگر نصب {#from-docker-image}
Docker image: <https://hub.docker.com/r/yandex/clickhouse-server/>
پکیج RPM برای CentOS یا RHEL: <https://github.com/Altinity/clickhouse-rpm-install>
Gentoo: `emerge clickhouse`
## راه اندازی
برای استارت سرور (به صورت daemon)، دستور زیر را اجرا کنید:
</div>
```bash
sudo service clickhouse-server start
```
<div dir="rtl" markdown="1">
لاگ های دایرکتوری `/var/log/clickhouse-server/` directory. را مشاهده کنید.
اگر سرور استارت نشد، فایل تنظیمات را بررسی کنید `/etc/clickhouse-server/config.xml.`
شما همچنین می توانید سرور را از طریق کنسول راه اندازی کنید:
</div>
```bash
clickhouse-server --config-file=/etc/clickhouse-server/config.xml
```
<div dir="rtl" markdown="1">
در این مورد که مناسب زمان توسعه می باشد، لاگ ها در کنسول پرینت می شوند. اگر فایل تنظیمات در دایرکتوری جاری باشد، نیازی به مشخص کردن '--config-file' نمی باشد. به صورت پیش فرض از './config.xml' استفاده می شود.
شما می توانید از کلاینت command-line برای اتصال به سرور استفاده کنید:
</div>
```bash
clickhouse-client
```
<div dir="rtl" markdown="1">
پارامترهای پیش فرض، نشان از اتصال به localhost:9000 از طرف کاربر 'default' بدون پسورد را می دهد. از کلاینت میتوان برای اتصال به یک سرور remote استفاده کرد. مثال:
</div>
```bash
clickhouse-client --host=example.com
```
<div dir="rtl" markdown="1">
برای اطلاعات بیشتر، بخش "کلاینت Command-line" را مشاهده کنید.
چک کردن سیستم:
</div>
```bash
milovidov@hostname:~/work/metrica/src/dbms/src/Client$ ./clickhouse-client
ClickHouse client version 0.0.18749.
Connecting to localhost:9000.
Connected to ClickHouse server version 0.0.18749.
:) SELECT 1
SELECT 1
┌─1─┐
│ 1 │
└───┘
1 rows in set. Elapsed: 0.003 sec.
:)
```
<div dir="rtl" markdown="1">
**تبریک میگم، سیستم کار می کنه!**
برای ادامه آزمایشات، شما میتوانید دیتاست های تستی را دریافت و امتحان کنید.
</div>
[مقاله اصلی](https://clickhouse.yandex/docs/fa/getting_started/install/) <!--hide-->

View File

@ -0,0 +1 @@
../../en/getting_started/tutorial.md

View File

@ -1,6 +1,6 @@
<div dir="rtl" markdown="1">
# Yandex.Metrica use case
# ClickHouse ﻪﭽﺨﯾﺭﺎﺗ
ClickHouse در ابتدا برای قدرت به Yandex.Metrica دومین بستر آنالیز وب در دنیا توسعه داده شد، و همچنان جز اصلی آن است. ClickHouse اجازه می دهند که با بیش از 13 تریلیون رکورد در دیتابیس و بیش از 20 میلیارد event در روز، گزارش های مستقیم (On the fly) از داده های non-aggregate تهیه کنیم. این مقاله پیشنیه ی تاریخی در ارتباط با اهداف اصلی ClickHouse قبل از آنکه به یک محصول open source تبدیل شود، می دهد.

1
docs/ja/changelog.md Symbolic link
View File

@ -0,0 +1 @@
../../CHANGELOG.md

1
docs/ja/data_types/array.md Symbolic link
View File

@ -0,0 +1 @@
../../en/data_types/array.md

View File

@ -0,0 +1 @@
../../en/data_types/boolean.md

1
docs/ja/data_types/date.md Symbolic link
View File

@ -0,0 +1 @@
../../en/data_types/date.md

View File

@ -0,0 +1 @@
../../en/data_types/datetime.md

View File

@ -0,0 +1 @@
../../en/data_types/decimal.md

View File

@ -0,0 +1 @@
../../../en/data_types/domains/ipv4.md

View File

@ -0,0 +1 @@
../../../en/data_types/domains/ipv6.md

View File

@ -0,0 +1 @@
../../../en/data_types/domains/overview.md

1
docs/ja/data_types/enum.md Symbolic link
View File

@ -0,0 +1 @@
../../en/data_types/enum.md

View File

@ -0,0 +1 @@
../../en/data_types/fixedstring.md

1
docs/ja/data_types/float.md Symbolic link
View File

@ -0,0 +1 @@
../../en/data_types/float.md

1
docs/ja/data_types/index.md Symbolic link
View File

@ -0,0 +1 @@
../../en/data_types/index.md

View File

@ -0,0 +1 @@
../../en/data_types/int_uint.md

View File

@ -0,0 +1 @@
../../../en/data_types/nested_data_structures/aggregatefunction.md

View File

@ -0,0 +1 @@
../../../en/data_types/nested_data_structures/index.md

View File

@ -0,0 +1 @@
../../../en/data_types/nested_data_structures/nested.md

View File

@ -0,0 +1 @@
../../en/data_types/nullable.md

View File

@ -0,0 +1 @@
../../../en/data_types/special_data_types/expression.md

View File

@ -0,0 +1 @@
../../../en/data_types/special_data_types/index.md

View File

@ -0,0 +1 @@
../../../en/data_types/special_data_types/interval.md

View File

@ -0,0 +1 @@
../../../en/data_types/special_data_types/nothing.md

View File

@ -0,0 +1 @@
../../../en/data_types/special_data_types/set.md

View File

@ -0,0 +1 @@
../../en/data_types/string.md

1
docs/ja/data_types/tuple.md Symbolic link
View File

@ -0,0 +1 @@
../../en/data_types/tuple.md

1
docs/ja/data_types/uuid.md Symbolic link
View File

@ -0,0 +1 @@
../../en/data_types/uuid.md

View File

@ -0,0 +1 @@
../../en/database_engines/index.md

View File

@ -0,0 +1 @@
../../en/database_engines/lazy.md

View File

@ -0,0 +1 @@
../../en/database_engines/mysql.md

View File

@ -0,0 +1 @@
../../en/development/architecture.md

View File

@ -0,0 +1 @@
../../en/development/build.md

View File

@ -0,0 +1 @@
../../en/development/build_cross_arm.md

View File

@ -0,0 +1 @@
../../en/development/build_cross_osx.md

View File

@ -0,0 +1 @@
../../en/development/build_osx.md

View File

@ -0,0 +1 @@
../../en/development/contrib.md

View File

@ -0,0 +1 @@
../../en/development/developer_instruction.md

View File

@ -0,0 +1 @@
../../en/development/index.md

View File

@ -0,0 +1 @@
../../en/development/style.md

View File

@ -0,0 +1 @@
../../en/development/tests.md

1
docs/ja/faq/general.md Symbolic link
View File

@ -0,0 +1 @@
../../en/faq/general.md

View File

@ -0,0 +1 @@
../../../en/getting_started/example_datasets/amplab_benchmark.md

View File

@ -0,0 +1 @@
../../../en/getting_started/example_datasets/criteo.md

View File

@ -0,0 +1 @@
../../../en/getting_started/example_datasets/metrica.md

View File

@ -0,0 +1 @@
../../../en/getting_started/example_datasets/nyc_taxi.md

View File

@ -0,0 +1 @@
../../../en/getting_started/example_datasets/ontime.md

View File

@ -0,0 +1 @@
../../../en/getting_started/example_datasets/star_schema.md

View File

@ -0,0 +1 @@
../../../en/getting_started/example_datasets/wikistat.md

View File

@ -0,0 +1 @@
../../en/getting_started/index.md

View File

@ -0,0 +1 @@
../../en/getting_started/install.md

View File

@ -0,0 +1 @@
../../en/getting_started/tutorial.md

View File

@ -0,0 +1 @@
../../en/guides/apply_catboost_model.md

1
docs/ja/guides/index.md Symbolic link
View File

@ -0,0 +1 @@
../../en/guides/index.md

Binary file not shown.

After

Width:  |  Height:  |  Size: 44 KiB

12
docs/ja/images/logo.svg Normal file
View File

@ -0,0 +1,12 @@
<svg xmlns="http://www.w3.org/2000/svg" width="54" height="48" viewBox="0 0 9 8">
<style>
.o{fill:#fc0}
.r{fill:#f00}
</style>
<path class="r" d="M0,7 h1 v1 h-1 z"/>
<path class="o" d="M0,0 h1 v7 h-1 z"/>
<path class="o" d="M2,0 h1 v8 h-1 z"/>
<path class="o" d="M4,0 h1 v8 h-1 z"/>
<path class="o" d="M6,0 h1 v8 h-1 z"/>
<path class="o" d="M8,3.25 h1 v1.5 h-1 z"/>
</svg>

After

Width:  |  Height:  |  Size: 421 B

Binary file not shown.

After

Width:  |  Height:  |  Size: 41 KiB

View File

@ -1,142 +0,0 @@
# ClickHouseとは?
ClickHouseは、クエリのオンライン分析処理OLAP用の列指向のデータベース管理システムDBMSです。
「通常の」行指向のDBMSでは、データは次の順序で保存されます。
| Row | WatchID | JavaEnable | Title | GoodEvent | EventTime |
| ------ | ------------------- | ---------- | ------------------ | --------- | ------------------- |
| #0 | 89354350662 | 1 | Investor Relations | 1 | 2016-05-18 05:19:20 |
| #1 | 90329509958 | 0 | Contact us | 1 | 2016-05-18 08:10:20 |
| #2 | 89953706054 | 1 | Mission | 1 | 2016-05-18 07:38:00 |
| #N | ... | ... | ... | ... | ... |
つまり、行に関連するすべての値は物理的に隣り合わせに格納されます。
行指向のDBMSの例MySQL, Postgres および MS SQL Server
{: .grey }
列指向のDBMSでは、データは次のように保存されます
| Row: | #0 | #1 | #2 | #N |
| ----------- | ------------------- | ------------------- | ------------------- | ------------------- |
| WatchID: | 89354350662 | 90329509958 | 89953706054 | ... |
| JavaEnable: | 1 | 0 | 1 | ... |
| Title: | Investor Relations | Contact us | Mission | ... |
| GoodEvent: | 1 | 1 | 1 | ... |
| EventTime: | 2016-05-18 05:19:20 | 2016-05-18 08:10:20 | 2016-05-18 07:38:00 | ... |
これらの例は、データが配置される順序のみを示しています。
異なる列の値は別々に保存され、同じ列のデータは一緒に保存されます。
列指向DBMSの例Vertica, Paraccel (Actian Matrix and Amazon Redshift), Sybase IQ, Exasol, Infobright, InfiniDB, MonetDB (VectorWise and Actian Vector), LucidDB, SAP HANA, Google Dremel, Google PowerDrill, Druid および kdb+
{: .grey }
異なったデータ格納の順序は、異なったシナリオにより適します。
データアクセスシナリオとは、クエリの実行内容、頻度、割合を指します。クエリで読み取られるの各種データの量(行、列、バイト)。データの読み取りと更新の関係。作業データのサイズとローカルでの使用方法。トランザクションが使用されるかどうか、およびそれらがどの程度分離されているか。データ複製と論理的整合性の要件。クエリの種類ごとの遅延とスループットの要件など。
システムの負荷が高いほど、使用シナリオの要件に一致するようにセットアップされたシステムをカスタマイズすることがより重要になり、このカスタマイズはより細かくなります。大きく異なるシナリオに等しく適したシステムはありません。システムがさまざまなシナリオに適応可能である場合、高負荷下では、システムはすべてのシナリオを同等に不十分に処理するか、1つまたはいくつかの可能なシナリオでうまく機能します。
## OLAPシナリオの主要なプロパティ
- リクエストの大部分は読み取りアクセス用である。
- データは、単一行ではなく、かなり大きなバッチ(> 1000行で更新されます。または、まったく更新されない。
- データはDBに追加されるが、変更されない。
- 読み取りの場合、非常に多くの行がDBから抽出されるが、一部の列のみ。
- テーブルは「幅が広く」、多数の列が含まれる。
- クエリは比較的まれ(通常、サーバーあたり毎秒数百あるいはそれ以下の数のクエリ)。
- 単純なクエリでは、約50ミリ秒の遅延が容認される。
- 列の値はかなり小さく、数値や短い文字列たとえば、URLごとに60バイト
- 単一のクエリを処理する場合、高いスループットが必要(サーバーあたり毎秒最大数十億行)。
- トランザクションは必要ない。
- データの一貫性の要件が低い。
- クエリごとに1つの大きなテーブルがある。 1つを除くすべてのテーブルは小さい。
- クエリ結果は、ソースデータよりも大幅に小さくなる。つまり、データはフィルター処理または集計されるため、結果は単一サーバーのRAMに収まる。
OLAPシナリオは、他の一般的なシナリオOLTPやKey-Valueアクセスなどとは非常に異なることが容易にわかります。 したがって、まともなパフォーマンスを得るには、OLTPまたはKey-Value DBを使用して分析クエリを処理しようとするのは無意味です。 たとえば、分析にMongoDBまたはRedisを使用しようとすると、OLAPデータベースに比べてパフォーマンスが非常に低下します。
## OLAPシナリオで列指向データベースがよりよく機能する理由
列指向データベースは、OLAPシナリオにより適しています。ほとんどのクエリの処理が少なくとも100倍高速です。 理由を以下に詳しく説明しますが、その根拠は視覚的に簡単に説明できます:
**行指向DBMS**
![Row-oriented](images/row_oriented.gif#)
**列指向DBMS**
![Column-oriented](images/column_oriented.gif#)
違いがわかりましたか?
### Input/output
1. 分析クエリでは、少数のテーブル列のみを読み取る必要があります。列指向のデータベースでは、必要なデータのみを読み取ることができます。たとえば、100のうち5つの列が必要な場合、I/Oが20倍削減されることが期待できます。
2. データはパケットで読み取られるため、圧縮が容易です。列のデータも圧縮が簡単です。これにより、I/Oボリュームがさらに削減されます。
3. I/Oの削減により、より多くのデータがシステムキャッシュに収まります。
たとえば、「各広告プラットフォームのレコード数をカウントする」クエリでは、1つの「広告プラットフォームID」列を読み取る必要がありますが、これは非圧縮では1バイトの領域を要します。トラフィックのほとんどが広告プラットフォームからのものではない場合、この列は少なくとも10倍の圧縮が期待できます。高速な圧縮アルゴリズムを使用すれば、1秒あたり少なくとも非圧縮データに換算して数ギガバイトの速度でデータを展開できます。つまり、このクエリは、単一のサーバーで1秒あたり約数十億行の速度で処理できます。この速度はまさに実際に達成されます。
<details markdown="1"><summary>Example</summary>
```
$ clickhouse-client
ClickHouse client version 0.0.52053.
Connecting to localhost:9000.
Connected to ClickHouse server version 0.0.52053.
:) SELECT CounterID, count() FROM hits GROUP BY CounterID ORDER BY count() DESC LIMIT 20
SELECT
CounterID,
count()
FROM hits
GROUP BY CounterID
ORDER BY count() DESC
LIMIT 20
┌─CounterID─┬──count()─┐
│ 114208 │ 56057344 │
│ 115080 │ 51619590 │
│ 3228 │ 44658301 │
│ 38230 │ 42045932 │
│ 145263 │ 42042158 │
│ 91244 │ 38297270 │
│ 154139 │ 26647572 │
│ 150748 │ 24112755 │
│ 242232 │ 21302571 │
│ 338158 │ 13507087 │
│ 62180 │ 12229491 │
│ 82264 │ 12187441 │
│ 232261 │ 12148031 │
│ 146272 │ 11438516 │
│ 168777 │ 11403636 │
│ 4120072 │ 11227824 │
│ 10938808 │ 10519739 │
│ 74088 │ 9047015 │
│ 115079 │ 8837972 │
│ 337234 │ 8205961 │
└───────────┴──────────┘
20 rows in set. Elapsed: 0.153 sec. Processed 1.00 billion rows, 4.00 GB (6.53 billion rows/s., 26.10 GB/s.)
:)
```
</details>
### CPU
クエリを実行するには大量の行を処理する必要があるため、個別の行ではなくベクター全体のすべての操作をディスパッチするか、ディスパッチコストがほとんどないようにクエリエンジンを実装すると効率的です。 適切なディスクサブシステムでこれを行わないと、クエリインタープリターが必然的にCPUを失速させます。
データを列に格納し、可能な場合は列ごとに処理することは理にかなっています。
これを行うには2つの方法があります:
1. ベクトルエンジン。 すべての操作は、個別の値ではなく、ベクトルに対して記述されます。 これは、オペレーションを頻繁に呼び出す必要がなく、ディスパッチコストが無視できることを意味します。 操作コードには、最適化された内部サイクルが含まれています。
2. コード生成。 クエリ用に生成されたコードには、すべての間接的な呼び出しが含まれています。
これは、単純なクエリを実行する場合には意味がないため、「通常の」データベースでは実行されません。 ただし、例外があります。 たとえば、MemSQLはコード生成を使用して、SQLクエリを処理する際の遅延を減らします。 比較のために、分析DBMSではレイテンシではなくスループットの最適化が必要です。
CPU効率のために、クエリ言語は宣言型SQLまたはMDX、または少なくともベクトルJ、Kでなければなりません。 クエリには、最適化を可能にする暗黙的なループのみを含める必要があります。
[Original article](https://clickhouse.yandex/docs/en/) <!--hide-->

1
docs/ja/index.md Symbolic link
View File

@ -0,0 +1 @@
../en/index.md

1
docs/ja/interfaces/cli.md Symbolic link
View File

@ -0,0 +1 @@
../../en/interfaces/cli.md

1
docs/ja/interfaces/cpp.md Symbolic link
View File

@ -0,0 +1 @@
../../en/interfaces/cpp.md

View File

@ -0,0 +1 @@
../../en/interfaces/formats.md

1
docs/ja/interfaces/http.md Symbolic link
View File

@ -0,0 +1 @@
../../en/interfaces/http.md

1
docs/ja/interfaces/index.md Symbolic link
View File

@ -0,0 +1 @@
../../en/interfaces/index.md

1
docs/ja/interfaces/jdbc.md Symbolic link
View File

@ -0,0 +1 @@
../../en/interfaces/jdbc.md

1
docs/ja/interfaces/odbc.md Symbolic link
View File

@ -0,0 +1 @@
../../en/interfaces/odbc.md

1
docs/ja/interfaces/tcp.md Symbolic link
View File

@ -0,0 +1 @@
../../en/interfaces/tcp.md

View File

@ -0,0 +1 @@
../../../en/interfaces/third-party/client_libraries.md

1
docs/ja/interfaces/third-party/gui.md vendored Symbolic link
View File

@ -0,0 +1 @@
../../../en/interfaces/third-party/gui.md

View File

@ -0,0 +1 @@
../../../en/interfaces/third-party/integrations.md

1
docs/ja/interfaces/third-party/proxy.md vendored Symbolic link
View File

@ -0,0 +1 @@
../../../en/interfaces/third-party/proxy.md

View File

@ -0,0 +1 @@
../../en/introduction/distinctive_features.md

View File

@ -0,0 +1 @@
../../en/introduction/features_considered_disadvantages.md

View File

@ -0,0 +1 @@
../../en/introduction/history.md

View File

@ -0,0 +1 @@
../../en/introduction/performance.md

View File

@ -0,0 +1 @@
../../en/operations/access_rights.md

View File

@ -0,0 +1 @@
../../en/operations/backup.md

View File

@ -0,0 +1 @@
../../en/operations/configuration_files.md

1
docs/ja/operations/index.md Symbolic link
View File

@ -0,0 +1 @@
../../en/operations/index.md

View File

@ -0,0 +1 @@
../../en/operations/monitoring.md

View File

@ -0,0 +1 @@
../../en/operations/quotas.md

View File

@ -0,0 +1 @@
../../en/operations/requirements.md

View File

@ -0,0 +1 @@
../../../en/operations/server_settings/index.md

View File

@ -0,0 +1 @@
../../../en/operations/server_settings/settings.md

View File

@ -0,0 +1 @@
../../../en/operations/settings/constraints_on_settings.md

View File

@ -0,0 +1 @@
../../../en/operations/settings/index.md

View File

@ -0,0 +1 @@
../../../en/operations/settings/permissions_for_queries.md

View File

@ -0,0 +1 @@
../../../en/operations/settings/query_complexity.md

View File

@ -0,0 +1 @@
../../../en/operations/settings/settings.md

View File

@ -0,0 +1 @@
../../../en/operations/settings/settings_profiles.md

View File

@ -0,0 +1 @@
../../../en/operations/settings/settings_users.md

View File

@ -0,0 +1 @@
../../en/operations/system_tables.md

View File

@ -0,0 +1 @@
../../../en/operations/table_engines/aggregatingmergetree.md

Some files were not shown because too many files have changed in this diff Show More