diff --git a/docs/en/getting-started/example-datasets/menus.md b/docs/en/getting-started/example-datasets/menus.md index 14202500595..665944b3e6f 100644 --- a/docs/en/getting-started/example-datasets/menus.md +++ b/docs/en/getting-started/example-datasets/menus.md @@ -3,7 +3,7 @@ toc_priority: 21 toc_title: Menus --- -# New York Public Library "What's on the Menu?" Dataset +# New York Public Library "What's on the Menu?" Dataset {#menus-dataset} The dataset is created by the New York Public Library. It contains historical data on the menus of hotels, restaurants and cafes with the dishes along with their prices. @@ -11,34 +11,38 @@ Source: http://menus.nypl.org/data The data is in public domain. The data is from library's archive and it may be incomplete and difficult for statistical analysis. Nevertheless it is also very yummy. -The size is just 1.3 million records about dishes in the menus (a very small data volume for ClickHouse, but it's still a good example). +The size is just 1.3 million records about dishes in the menus — it's a very small data volume for ClickHouse, but it's still a good example. -## Download the Dataset +## Download the Dataset {#download-dataset} -``` +Run the command: + +```bash wget https://s3.amazonaws.com/menusdata.nypl.org/gzips/2021_08_01_07_01_17_data.tgz ``` Replace the link to the up to date link from http://menus.nypl.org/data if needed. Download size is about 35 MB. -## Unpack the Dataset +## Unpack the Dataset {#unpack-dataset} -``` +```bash tar xvf 2021_08_01_07_01_17_data.tgz ``` Uncompressed size is about 150 MB. The data is normalized consisted of four tables: -- Menu: information about menus: the name of the restaurant, the date when menu was seen, etc; -- Dish: information about dishes: the name of the dish along with some characteristic; -- MenuPage: information about the pages in the menus; every page belongs to some menu; -- MenuItem: an item of the menu - a dish along with its price on some menu page: links to dish and menu page. +- `Menu` — Information about menus: the name of the restaurant, the date when menu was seen, etc. +- `Dish` — Information about dishes: the name of the dish along with some characteristic. +- `MenuPage` — Information about the pages in the menus, because every page belongs to some menu. +- `MenuItem` — An item of the menu. A dish along with its price on some menu page: links to dish and menu page. -## Create the Tables +## Create the Tables {#create-tables} -``` +We use [Decimal](../../sql-reference/data-types/decimal.md) data type to store prices. + +```sql CREATE TABLE dish ( id UInt32, @@ -101,35 +105,33 @@ CREATE TABLE menu_item ) ENGINE = MergeTree ORDER BY id; ``` -We use `Decimal` data type to store prices. Everything else is quite straightforward. +## Import the Data {#import-data} -## Import Data +Upload data into ClickHouse, run: -Upload data into ClickHouse: - -``` +```bash clickhouse-client --format_csv_allow_single_quotes 0 --input_format_null_as_default 0 --query "INSERT INTO dish FORMAT CSVWithNames" < Dish.csv clickhouse-client --format_csv_allow_single_quotes 0 --input_format_null_as_default 0 --query "INSERT INTO menu FORMAT CSVWithNames" < Menu.csv clickhouse-client --format_csv_allow_single_quotes 0 --input_format_null_as_default 0 --query "INSERT INTO menu_page FORMAT CSVWithNames" < MenuPage.csv clickhouse-client --format_csv_allow_single_quotes 0 --input_format_null_as_default 0 --date_time_input_format best_effort --query "INSERT INTO menu_item FORMAT CSVWithNames" < MenuItem.csv ``` -We use `CSVWithNames` format as the data is represented by CSV with header. +We use [CSVWithNames](../../interfaces/formats.md#csvwithnames) format as the data is represented by CSV with header. We disable `format_csv_allow_single_quotes` as only double quotes are used for data fields and single quotes can be inside the values and should not confuse the CSV parser. -We disable `input_format_null_as_default` as our data does not have NULLs. Otherwise ClickHouse will try to parse `\N` sequences and can be confused with `\` in data. +We disable [input_format_null_as_default](../../operations/settings/settings.md#settings-input-format-null-as-default) as our data does not have [NULL](../../sql-reference/syntax.md#null-literal). Otherwise ClickHouse will try to parse `\N` sequences and can be confused with `\` in data. -The setting `--date_time_input_format best_effort` allows to parse `DateTime` fields in wide variety of formats. For example, ISO-8601 without seconds like '2000-01-01 01:02' will be recognized. Without this setting only fixed DateTime format is allowed. +The setting [date_time_input_format best_effort](../../operations/settings/settings.md#settings-date_time_input_format) allows to parse [DateTime](../../sql-reference/data-types/datetime.md) fields in wide variety of formats. For example, ISO-8601 without seconds like '2000-01-01 01:02' will be recognized. Without this setting only fixed DateTime format is allowed. -## Denormalize the Data +## Denormalize the Data {#denormalize-data} -Data is presented in multiple tables in normalized form. It means you have to perform JOINs if you want to query, e.g. dish names from menu items. -For typical analytical tasks it is way more efficient to deal with pre-JOINed data to avoid doing JOIN every time. It is called "denormalized" data. +Data is presented in multiple tables in [normalized form](https://en.wikipedia.org/wiki/Database_normalization#Normal_forms). It means you have to perform [JOIN](../../sql-reference/statements/select/join.md#select-join) if you want to query, e.g. dish names from menu items. +For typical analytical tasks it is way more efficient to deal with pre-JOINed data to avoid doing `JOIN` every time. It is called "denormalized" data. -We will create a table that will contain all the data JOINed together: +We will create a table `menu_item_denorm` where will contain all the data JOINed together: -``` +```sql CREATE TABLE menu_item_denorm ENGINE = MergeTree ORDER BY (dish_name, created_at) AS SELECT @@ -171,21 +173,32 @@ AS SELECT FROM menu_item JOIN dish ON menu_item.dish_id = dish.id JOIN menu_page ON menu_item.menu_page_id = menu_page.id - JOIN menu ON menu_page.menu_id = menu.id + JOIN menu ON menu_page.menu_id = menu.id; ``` -## Validate the Data +## Validate the Data {#validate-data} -``` -SELECT count() FROM menu_item_denorm -1329175 +Query: + +```sql +SELECT count() FROM menu_item_denorm; ``` -## Run Some Queries - -Averaged historical prices of dishes: +Result: +```text +┌─count()─┐ +│ 1329175 │ +└─────────┘ ``` + +## Run Some Queries {#run-queries} + +### Averaged historical prices of dishes {#query-averaged-historical-prices} + +Query: + +```sql SELECT round(toUInt32OrZero(extract(menu_date, '^\\d{4}')), -1) AS d, count(), @@ -194,8 +207,12 @@ SELECT FROM menu_item_denorm WHERE (menu_currency = 'Dollars') AND (d > 0) AND (d < 2022) GROUP BY d -ORDER BY d ASC +ORDER BY d ASC; +``` +Result: + +```text ┌────d─┬─count()─┬─round(avg(price), 2)─┬─bar(avg(price), 0, 100, 100)─┐ │ 1850 │ 618 │ 1.5 │ █▍ │ │ 1860 │ 1634 │ 1.29 │ █▎ │ @@ -215,15 +232,15 @@ ORDER BY d ASC │ 2000 │ 2467 │ 11.85 │ ███████████▋ │ │ 2010 │ 597 │ 25.66 │ █████████████████████████▋ │ └──────┴─────────┴──────────────────────┴──────────────────────────────┘ - -17 rows in set. Elapsed: 0.044 sec. Processed 1.33 million rows, 54.62 MB (30.00 million rows/s., 1.23 GB/s.) ``` Take it with a grain of salt. -### Burger Prices: +### Burger Prices {#query-burger-prices} -``` +Query: + +```sql SELECT round(toUInt32OrZero(extract(menu_date, '^\\d{4}')), -1) AS d, count(), @@ -232,8 +249,12 @@ SELECT FROM menu_item_denorm WHERE (menu_currency = 'Dollars') AND (d > 0) AND (d < 2022) AND (dish_name ILIKE '%burger%') GROUP BY d -ORDER BY d ASC +ORDER BY d ASC; +``` +Result: + +```text ┌────d─┬─count()─┬─round(avg(price), 2)─┬─bar(avg(price), 0, 50, 100)───────────┐ │ 1880 │ 2 │ 0.42 │ ▋ │ │ 1890 │ 7 │ 0.85 │ █▋ │ @@ -250,13 +271,13 @@ ORDER BY d ASC │ 2000 │ 21 │ 7.14 │ ██████████████▎ │ │ 2010 │ 6 │ 18.42 │ ████████████████████████████████████▋ │ └──────┴─────────┴──────────────────────┴───────────────────────────────────────┘ - -14 rows in set. Elapsed: 0.052 sec. Processed 1.33 million rows, 94.15 MB (25.48 million rows/s., 1.80 GB/s.) ``` -### Vodka: +### Vodka {#query-vodka} -``` +Query: + +```sql SELECT round(toUInt32OrZero(extract(menu_date, '^\\d{4}')), -1) AS d, count(), @@ -265,8 +286,12 @@ SELECT FROM menu_item_denorm WHERE (menu_currency IN ('Dollars', '')) AND (d > 0) AND (d < 2022) AND (dish_name ILIKE '%vodka%') GROUP BY d -ORDER BY d ASC +ORDER BY d ASC; +``` +Result: + +```text ┌────d─┬─count()─┬─round(avg(price), 2)─┬─bar(avg(price), 0, 50, 100)─┐ │ 1910 │ 2 │ 0 │ │ │ 1920 │ 1 │ 0.3 │ ▌ │ @@ -282,11 +307,13 @@ ORDER BY d ASC To get vodka we have to write `ILIKE '%vodka%'` and this definitely makes a statement. -### Caviar: +### Caviar {#query-caviar} Let's print caviar prices. Also let's print a name of any dish with caviar. -``` +Query: + +```sql SELECT round(toUInt32OrZero(extract(menu_date, '^\\d{4}')), -1) AS d, count(), @@ -296,8 +323,12 @@ SELECT FROM menu_item_denorm WHERE (menu_currency IN ('Dollars', '')) AND (d > 0) AND (d < 2022) AND (dish_name ILIKE '%caviar%') GROUP BY d -ORDER BY d ASC +ORDER BY d ASC; +``` +Result: + +```text ┌────d─┬─count()─┬─round(avg(price), 2)─┬─bar(avg(price), 0, 50, 100)──────┬─any(dish_name)──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ 1090 │ 1 │ 0 │ │ Caviar │ │ 1880 │ 3 │ 0 │ │ Caviar │ @@ -319,6 +350,6 @@ ORDER BY d ASC At least they have caviar with vodka. Very nice. -### Test it in Playground +## Online Playground {#playground} The data is uploaded to ClickHouse Playground, [example](https://gh-api.clickhouse.com/play?user=play#U0VMRUNUCiAgICByb3VuZCh0b1VJbnQzMk9yWmVybyhleHRyYWN0KG1lbnVfZGF0ZSwgJ15cXGR7NH0nKSksIC0xKSBBUyBkLAogICAgY291bnQoKSwKICAgIHJvdW5kKGF2ZyhwcmljZSksIDIpLAogICAgYmFyKGF2ZyhwcmljZSksIDAsIDUwLCAxMDApLAogICAgYW55KGRpc2hfbmFtZSkKRlJPTSBtZW51X2l0ZW1fZGVub3JtCldIRVJFIChtZW51X2N1cnJlbmN5IElOICgnRG9sbGFycycsICcnKSkgQU5EIChkID4gMCkgQU5EIChkIDwgMjAyMikgQU5EIChkaXNoX25hbWUgSUxJS0UgJyVjYXZpYXIlJykKR1JPVVAgQlkgZApPUkRFUiBCWSBkIEFTQw==). diff --git a/docs/en/interfaces/formats.md b/docs/en/interfaces/formats.md index 6e06fc1f06a..07104311b44 100644 --- a/docs/en/interfaces/formats.md +++ b/docs/en/interfaces/formats.md @@ -386,7 +386,7 @@ The CSV format supports the output of totals and extremes the same way as `TabSe ## CSVWithNames {#csvwithnames} -Also prints the header row, similar to `TabSeparatedWithNames`. +Also prints the header row, similar to [TabSeparatedWithNames](#tabseparatedwithnames). ## CustomSeparated {#format-customseparated} diff --git a/docs/en/operations/server-configuration-parameters/settings.md b/docs/en/operations/server-configuration-parameters/settings.md index 3d299e21019..eec034e2d42 100644 --- a/docs/en/operations/server-configuration-parameters/settings.md +++ b/docs/en/operations/server-configuration-parameters/settings.md @@ -45,7 +45,7 @@ Configuration template: - `min_part_size` – The minimum size of a data part. - `min_part_size_ratio` – The ratio of the data part size to the table size. - `method` – Compression method. Acceptable values: `lz4`, `lz4hc`, `zstd`. -- `level` – Compression level. See [Codecs](../../sql-reference/statements/create/table/#create-query-general-purpose-codecs). +- `level` – Compression level. See [Codecs](../../sql-reference/statements/create/table.md#create-query-general-purpose-codecs). You can configure multiple `` sections. diff --git a/docs/en/sql-reference/data-types/datetime64.md b/docs/en/sql-reference/data-types/datetime64.md index 112461535f7..bac1a080a80 100644 --- a/docs/en/sql-reference/data-types/datetime64.md +++ b/docs/en/sql-reference/data-types/datetime64.md @@ -7,7 +7,8 @@ toc_title: DateTime64 Allows to store an instant in time, that can be expressed as a calendar date and a time of a day, with defined sub-second precision -Tick size (precision): 10-precision seconds +Tick size (precision): 10-precision seconds. Valid range: [ 0 : 9 ]. +Typically are used - 3 (milliseconds), 6 (microseconds), 9 (nanoseconds). **Syntax:** diff --git a/docs/en/sql-reference/functions/geo/s2.md b/docs/en/sql-reference/functions/geo/s2.md new file mode 100644 index 00000000000..d669b1c8b32 --- /dev/null +++ b/docs/en/sql-reference/functions/geo/s2.md @@ -0,0 +1,386 @@ +--- +toc_title: S2 Geometry +--- + +# Functions for Working with S2 Index {#s2Index} + +[S2](https://s2geometry.io/) is a geographical indexing system where all geographical data is represented on a three-dimensional sphere (similar to a globe). + +In the S2 library points are represented as unit length vectors called S2 point indices (points on the surface of a three dimensional unit sphere) as opposed to traditional (latitude, longitude) pairs. + +## geoToS2 {#geoToS2} + +Returns [S2](#s2index) point index corresponding to the provided coordinates `(longitude, latitude)`. + +**Syntax** + +``` sql +geoToS2(lon, lat) +``` + +**Arguments** + +- `lon` — Longitude. [Float64](../../../sql-reference/data-types/float.md). +- `lat` — Latitude. [Float64](../../../sql-reference/data-types/float.md). + +**Returned values** + +- S2 point index. + +Type: [UInt64](../../../sql-reference/data-types/int-uint.md). + +**Example** + +Query: + +``` sql +SELECT geoToS2(37.79506683, 55.71290588) as s2Index; +``` + +Result: + +``` text +┌─────────────s2Index─┐ +│ 4704772434919038107 │ +└─────────────────────┘ +``` + +## s2ToGeo {#s2ToGeo} + +Returns geo coordinates `(longitude, latitude)` corresponding to the provided [S2](#s2index) point index. + +**Syntax** + +``` sql +s2ToGeo(s2index) +``` + +**Arguments** + +- `s2Index` — S2 Index. [UInt64](../../../sql-reference/data-types/int-uint.md). + +**Returned values** + +- A tuple consisting of two values: `tuple(lon,lat)`. + +Type: `lon` - [Float64](../../../sql-reference/data-types/float.md). `lat` — [Float64](../../../sql-reference/data-types/float.md). + +**Example** + +Query: + +``` sql +SELECT s2ToGeo(4704772434919038107) as s2Coodrinates; +``` + +Result: + +``` text +┌─s2Coodrinates────────────────────────┐ +│ (37.79506681471008,55.7129059052841) │ +└──────────────────────────────────────┘ +``` + +## s2GetNeighbors {#s2GetNeighbors} + +Returns S2 neighbor indices corresponding to the provided [S2](#s2index)). Each cell in the S2 system is a quadrilateral bounded by four geodesics. So, each cell has 4 neighbors. + +**Syntax** + +``` sql +s2GetNeighbors(s2index) +``` + +**Arguments** + +- `s2index` — S2 Index. [UInt64](../../../sql-reference/data-types/int-uint.md). + +**Returned values** + +- An array consisting of the 4 neighbor indices: `array[s2index1, s2index3, s2index2, s2index4]`. + +Type: Each S2 index is [UInt64](../../../sql-reference/data-types/int-uint.md). + +**Example** + +Query: + +``` sql + select s2GetNeighbors(5074766849661468672) AS s2Neighbors; +``` + +Result: + +``` text +┌─s2Neighbors───────────────────────────────────────────────────────────────────────┐ +│ [5074766987100422144,5074766712222515200,5074767536856236032,5074767261978329088] │ +└───────────────────────────────────────────────────────────────────────────────────┘ +``` + +## s2CellsIntersect {#s2CellsIntersect} + +Determines if the two provided [S2](#s2index)) cell indices intersect or not. + +**Syntax** + +``` sql +s2CellsIntersect(s2index1, s2index2) +``` + +**Arguments** + +- `siIndex1`, `s2index2` — S2 Index. [UInt64](../../../sql-reference/data-types/int-uint.md). + +**Returned values** + +- 1 — If the S2 cell indices intersect. +- 0 — If the S2 cell indices don't intersect. + +Type: [UInt8](../../../sql-reference/data-types/int-uint.md). + +**Example** + +Query: + +``` sql + select s2CellsIntersect(9926595209846587392, 9926594385212866560) as intersect; +``` + +Result: + +``` text +┌─intersect─┐ +│ 1 │ +└───────────┘ +``` + +## s2CapContains {#s2CapContains} + +A cap represents a portion of the sphere that has been cut off by a plane. It is defined by a point on a sphere and a radius in degrees. + +Determines if a cap contains a s2 point index. + +**Syntax** + +``` sql +s2CapContains(center, degrees, point) +``` + +**Arguments** + +- `center` - S2 point index corresponding to the cap. [UInt64](../../../sql-reference/data-types/int-uint.md). + - `degrees` - Radius of the cap in degrees. [Float64](../../../sql-reference/data-types/float.md). + - `point` - S2 point index. [UInt64](../../../sql-reference/data-types/int-uint.md). + +**Returned values** + +- 1 — If the cap contains the S2 point index. +- 0 — If the cap doesn't contain the S2 point index. + +Type: [UInt8](../../../sql-reference/data-types/int-uint.md). + +**Example** + +Query: + +``` sql +select s2CapContains(1157339245694594829, 1.0, 1157347770437378819) as capContains; +``` + +Result: + +``` text +┌─capContains─┐ +│ 1 │ +└─────────────┘ +``` + +## s2CapUnion {#s2CapUnion} + +A cap represents a portion of the sphere that has been cut off by a plane. It is defined by a point on a sphere and a radius in degrees. + +Determines the smallest cap that contains the given two input caps. + +**Syntax** + +``` sql +s2CapUnion(center1, radius1, center2, radius2) +``` + +**Arguments** + +- `center1`, `center2` - S2 point indices corresponding to the two input caps. [UInt64](../../../sql-reference/data-types/int-uint.md). + - `radius1`, `radius2` - Radii of the two input caps in degrees. [Float64](../../../sql-reference/data-types/float.md). + +**Returned values** + +- `center` - S2 point index corresponding the center of the smallest cap containing the two input caps. Type: [UInt64](../../../sql-reference/data-types/int-uint.md). + - `radius` - Radius of the smallest cap containing the two input caps. Type: [Float64](../../../sql-reference/data-types/float.md). + +**Example** + +Query: + +``` sql +SELECT s2CapUnion(3814912406305146967, 1.0, 1157347770437378819, 1.0) AS capUnion; +``` + +Result: + +``` text +┌─capUnion───────────────────────────────┐ +│ (4534655147792050737,60.2088283994957) │ +└────────────────────────────────────────┘ +``` + +## s2RectAdd{#s2RectAdd} + +In the S2 system, a rectangle is represented by a type of S2Region called a S2LatLngRect that represents a rectangle in latitude-longitude space. + +Increases the size of the bounding rectangle to include the given S2 point index. + +**Syntax** + +``` sql +s2RectAdd(s2pointLow, s2pointHigh, s2Point) +``` + +**Arguments** + +- `s2PointLow` - Low S2 point index corresponding to the rectangle. [UInt64](../../../sql-reference/data-types/int-uint.md). +- `s2PointHigh` - High S2 point index corresponding to the rectangle. [UInt64](../../../sql-reference/data-types/int-uint.md). +- `s2Point` - Target S2 point index that the bound rectangle should be grown to include. [UInt64](../../../sql-reference/data-types/int-uint.md). + +**Returned values** + +- `s2PointLow` - Low S2 cell id corresponding to the grown rectangle. Type: [UInt64](../../../sql-reference/data-types/int-uint.md). + - `s2PointHigh` - Hight S2 cell id corresponding to the grown rectangle. Type: [UInt64](../../../sql-reference/data-types/float.md). + +**Example** + +Query: + +``` sql +SELECT s2RectAdd(5178914411069187297, 5177056748191934217, 5179056748191934217) as rectAdd; +``` + +Result: + +``` text +┌─rectAdd───────────────────────────────────┐ +│ (5179062030687166815,5177056748191934217) │ +└───────────────────────────────────────────┘ +``` + +## s2RectContains{#s2RectContains} + +In the S2 system, a rectangle is represented by a type of S2Region called a S2LatLngRect that represents a rectangle in latitude-longitude space. + +Determines if a given rectangle contains a S2 point index. + +**Syntax** + +``` sql +s2RectContains(s2PointLow, s2PointHi, s2Point) +``` + +**Arguments** + +- `s2PointLow` - Low S2 point index corresponding to the rectangle. [UInt64](../../../sql-reference/data-types/int-uint.md). +- `s2PointHigh` - High S2 point index corresponding to the rectangle. [UInt64](../../../sql-reference/data-types/int-uint.md). +- `s2Point` - Target S2 point index. [UInt64](../../../sql-reference/data-types/int-uint.md). + +**Returned values** + +- 1 — If the rectangle contains the given S2 point. +- 0 — If the rectangle doesn't contain the given S2 point. + +**Example** + +Query: + +``` sql +SELECT s2RectContains(5179062030687166815, 5177056748191934217, 5177914411069187297) AS rectContains +``` + +Result: + +``` text +┌─rectContains─┐ +│ 0 │ +└──────────────┘ +``` + +## s2RectUinion{#s2RectUnion} + +In the S2 system, a rectangle is represented by a type of S2Region called a S2LatLngRect that represents a rectangle in latitude-longitude space. + +Returns the smallest rectangle containing the union of this rectangle and the given rectangle. + +**Syntax** + +``` sql +s2RectUnion(s2Rect1PointLow, s2Rect1PointHi, s2Rect2PointLow, s2Rect2PointHi) +``` + +**Arguments** + +- `s2Rect1PointLow`, `s2Rect1PointHi` - Low and High S2 point indices corresponding to the first rectangle. [UInt64](../../../sql-reference/data-types/int-uint.md). +- `s2Rect2PointLow`, `s2Rect2PointHi` - Low and High S2 point indices corresponding to the second rectangle. [UInt64](../../../sql-reference/data-types/int-uint.md). + +**Returned values** + +- `s2UnionRect2PointLow` - Low S2 cell id corresponding to the union rectangle. Type: [UInt64](../../../sql-reference/data-types/int-uint.md). +- `s2UnionRect2PointHi` - High S2 cell id corresponding to the union rectangle. Type: [UInt64](../../../sql-reference/data-types/int-uint.md). + +**Example** + +Query: + +``` sql +SELECT s2RectUnion(5178914411069187297, 5177056748191934217, 5179062030687166815, 5177056748191934217) AS rectUnion +``` + +Result: + +``` text +┌─rectUnion─────────────────────────────────┐ +│ (5179062030687166815,5177056748191934217) │ +└───────────────────────────────────────────┘ +``` + +## s2RectIntersection{#s2RectIntersection} + +Returns the smallest Rectangle containing the intersection of this rectangle and the given rectangle. + +**Syntax** + +``` sql +s2RectIntersection(s2Rect1PointLow, s2Rect1PointHi, s2Rect2PointLow, s2Rect2PointHi) +``` + +**Arguments** + +- `s2Rect1PointLow`, `s2Rect1PointHi` - Low and High S2 point indices corresponding to the first rectangle. [UInt64](../../../sql-reference/data-types/int-uint.md). +- `s2Rect2PointLow`, `s2Rect2PointHi` - Low and High S2 point indices corresponding to the second rectangle. [UInt64](../../../sql-reference/data-types/int-uint.md). + +**Returned values** + +- `s2UnionRect2PointLow` - Low S2 cell id corresponding to the rectangle containing the intersection of the given rectangles. Type: [UInt64](../../../sql-reference/data-types/int-uint.md). +- `s2UnionRect2PointHi` - Hi S2 cell id corresponding to the rectangle containing the intersection of the given rectangles. Type: [UInt64](../../../sql-reference/data-types/int-uint.md). + +**Example** + +Query: + +``` sql +SELECT s2RectIntersection(5178914411069187297, 5177056748191934217, 5179062030687166815, 5177056748191934217) AS rectIntersection +``` + +Result: + +``` text +┌─rectIntersection──────────────────────────┐ +│ (5178914411069187297,5177056748191934217) │ +└───────────────────────────────────────────┘ +``` diff --git a/docs/ru/getting-started/example-datasets/menus.md b/docs/ru/getting-started/example-datasets/menus.md deleted file mode 120000 index d82b3198962..00000000000 --- a/docs/ru/getting-started/example-datasets/menus.md +++ /dev/null @@ -1 +0,0 @@ -../../../en/getting-started/example-datasets/menus.md \ No newline at end of file diff --git a/docs/ru/getting-started/example-datasets/menus.md b/docs/ru/getting-started/example-datasets/menus.md new file mode 100644 index 00000000000..5b549975b8f --- /dev/null +++ b/docs/ru/getting-started/example-datasets/menus.md @@ -0,0 +1,360 @@ +--- +toc_priority: 21 +toc_title: Меню +--- + +# Набор данных публичной библиотеки Нью-Йорка "Что в меню?" {#menus-dataset} + +Набор данных создан Нью-Йоркской публичной библиотекой. Он содержит исторические данные о меню отелей, ресторанов и кафе с блюдами, а также их ценами. + +Источник: http://menus.nypl.org/data +Эти данные находятся в открытом доступе. + +Данные взяты из архива библиотеки, и они могут быть неполными и сложными для статистического анализа. Тем не менее, это тоже очень интересно. +В наборе всего 1,3 миллиона записей о блюдах в меню — очень небольшой объем данных для ClickHouse, но это все равно хороший пример. + +## Загрузите набор данных {#download-dataset} + +Выполните команду: + +```bash +wget https://s3.amazonaws.com/menusdata.nypl.org/gzips/2021_08_01_07_01_17_data.tgz +``` + +При необходимости замените ссылку на актуальную ссылку с http://menus.nypl.org/data. +Размер архива составляет около 35 МБ. + +## Распакуйте набор данных {#unpack-dataset} + +```bash +tar xvf 2021_08_01_07_01_17_data.tgz +``` + +Размер распакованных данных составляет около 150 МБ. + +Данные нормализованы и состоят из четырех таблиц: +- `Menu` — информация о меню: название ресторана, дата, когда было просмотрено меню, и т.д. +- `Dish` — информация о блюдах: название блюда вместе с некоторыми характеристиками. +- `MenuPage` — информация о страницах в меню, потому что каждая страница принадлежит какому-либо меню. +- `MenuItem` — один из пунктов меню. Блюдо вместе с его ценой на какой-либо странице меню: ссылки на блюдо и страницу меню. + +## Создайте таблицы {#create-tables} + +Для хранения цен используется тип данных [Decimal](../../sql-reference/data-types/decimal.md). + +```sql +CREATE TABLE dish +( + id UInt32, + name String, + description String, + menus_appeared UInt32, + times_appeared Int32, + first_appeared UInt16, + last_appeared UInt16, + lowest_price Decimal64(3), + highest_price Decimal64(3) +) ENGINE = MergeTree ORDER BY id; + +CREATE TABLE menu +( + id UInt32, + name String, + sponsor String, + event String, + venue String, + place String, + physical_description String, + occasion String, + notes String, + call_number String, + keywords String, + language String, + date String, + location String, + location_type String, + currency String, + currency_symbol String, + status String, + page_count UInt16, + dish_count UInt16 +) ENGINE = MergeTree ORDER BY id; + +CREATE TABLE menu_page +( + id UInt32, + menu_id UInt32, + page_number UInt16, + image_id String, + full_height UInt16, + full_width UInt16, + uuid UUID +) ENGINE = MergeTree ORDER BY id; + +CREATE TABLE menu_item +( + id UInt32, + menu_page_id UInt32, + price Decimal64(3), + high_price Decimal64(3), + dish_id UInt32, + created_at DateTime, + updated_at DateTime, + xpos Float64, + ypos Float64 +) ENGINE = MergeTree ORDER BY id; +``` + +## Импортируйте данные {#import-data} + +Импортируйте данные в ClickHouse, выполните команды: + +```bash +clickhouse-client --format_csv_allow_single_quotes 0 --input_format_null_as_default 0 --query "INSERT INTO dish FORMAT CSVWithNames" < Dish.csv +clickhouse-client --format_csv_allow_single_quotes 0 --input_format_null_as_default 0 --query "INSERT INTO menu FORMAT CSVWithNames" < Menu.csv +clickhouse-client --format_csv_allow_single_quotes 0 --input_format_null_as_default 0 --query "INSERT INTO menu_page FORMAT CSVWithNames" < MenuPage.csv +clickhouse-client --format_csv_allow_single_quotes 0 --input_format_null_as_default 0 --date_time_input_format best_effort --query "INSERT INTO menu_item FORMAT CSVWithNames" < MenuItem.csv +``` + +Поскольку данные представлены в формате CSV с заголовком, используется формат [CSVWithNames](../../interfaces/formats.md#csvwithnames). + +Отключите `format_csv_allow_single_quotes`, так как для данных используются только двойные кавычки, а одинарные кавычки могут находиться внутри значений и не должны сбивать с толку CSV-парсер. + +Отключите [input_format_null_as_default](../../operations/settings/settings.md#settings-input-format-null-as-default), поскольку в данных нет значений [NULL](../../sql-reference/syntax.md#null-literal). + +В противном случае ClickHouse попытается проанализировать последовательности `\N` и может перепутать с `\` в данных. + +Настройка [date_time_input_format best_effort](../../operations/settings/settings.md#settings-date_time_input_format) позволяет анализировать поля [DateTime](../../sql-reference/data-types/datetime.md) в самых разных форматах. К примеру, будет распознан ISO-8601 без секунд: '2000-01-01 01:02'. Без этой настройки допускается только фиксированный формат даты и времени. + +## Денормализуйте данные {#denormalize-data} + +Данные представлены в нескольких таблицах в [нормализованном виде](https://ru.wikipedia.org/wiki/%D0%9D%D0%BE%D1%80%D0%BC%D0%B0%D0%BB%D1%8C%D0%BD%D0%B0%D1%8F_%D1%84%D0%BE%D1%80%D0%BC%D0%B0). + +Это означает, что вам нужно использовать условие объединения [JOIN](../../sql-reference/statements/select/join.md#select-join), если вы хотите получить, например, названия блюд из пунктов меню. + +Для типовых аналитических задач гораздо эффективнее работать с предварительно объединенными данными, чтобы не использовать `JOIN` каждый раз. Такие данные называются денормализованными. + +Создайте таблицу `menu_item_denorm`, которая будет содержать все данные, объединенные вместе: + +```sql +CREATE TABLE menu_item_denorm +ENGINE = MergeTree ORDER BY (dish_name, created_at) +AS SELECT + price, + high_price, + created_at, + updated_at, + xpos, + ypos, + dish.id AS dish_id, + dish.name AS dish_name, + dish.description AS dish_description, + dish.menus_appeared AS dish_menus_appeared, + dish.times_appeared AS dish_times_appeared, + dish.first_appeared AS dish_first_appeared, + dish.last_appeared AS dish_last_appeared, + dish.lowest_price AS dish_lowest_price, + dish.highest_price AS dish_highest_price, + menu.id AS menu_id, + menu.name AS menu_name, + menu.sponsor AS menu_sponsor, + menu.event AS menu_event, + menu.venue AS menu_venue, + menu.place AS menu_place, + menu.physical_description AS menu_physical_description, + menu.occasion AS menu_occasion, + menu.notes AS menu_notes, + menu.call_number AS menu_call_number, + menu.keywords AS menu_keywords, + menu.language AS menu_language, + menu.date AS menu_date, + menu.location AS menu_location, + menu.location_type AS menu_location_type, + menu.currency AS menu_currency, + menu.currency_symbol AS menu_currency_symbol, + menu.status AS menu_status, + menu.page_count AS menu_page_count, + menu.dish_count AS menu_dish_count +FROM menu_item + JOIN dish ON menu_item.dish_id = dish.id + JOIN menu_page ON menu_item.menu_page_id = menu_page.id + JOIN menu ON menu_page.menu_id = menu.id; +``` + +## Проверьте загруженные данные {#validate-data} + +Запрос: + +```sql +SELECT count() FROM menu_item_denorm; +``` + +Результат: + +```text +┌─count()─┐ +│ 1329175 │ +└─────────┘ +``` + +## Примеры запросов {#run-queries} + +### Усредненные исторические цены на блюда {#query-averaged-historical-prices} + +Запрос: + +```sql +SELECT + round(toUInt32OrZero(extract(menu_date, '^\\d{4}')), -1) AS d, + count(), + round(avg(price), 2), + bar(avg(price), 0, 100, 100) +FROM menu_item_denorm +WHERE (menu_currency = 'Dollars') AND (d > 0) AND (d < 2022) +GROUP BY d +ORDER BY d ASC; +``` + +Результат: + +```text +┌────d─┬─count()─┬─round(avg(price), 2)─┬─bar(avg(price), 0, 100, 100)─┐ +│ 1850 │ 618 │ 1.5 │ █▍ │ +│ 1860 │ 1634 │ 1.29 │ █▎ │ +│ 1870 │ 2215 │ 1.36 │ █▎ │ +│ 1880 │ 3909 │ 1.01 │ █ │ +│ 1890 │ 8837 │ 1.4 │ █▍ │ +│ 1900 │ 176292 │ 0.68 │ ▋ │ +│ 1910 │ 212196 │ 0.88 │ ▊ │ +│ 1920 │ 179590 │ 0.74 │ ▋ │ +│ 1930 │ 73707 │ 0.6 │ ▌ │ +│ 1940 │ 58795 │ 0.57 │ ▌ │ +│ 1950 │ 41407 │ 0.95 │ ▊ │ +│ 1960 │ 51179 │ 1.32 │ █▎ │ +│ 1970 │ 12914 │ 1.86 │ █▋ │ +│ 1980 │ 7268 │ 4.35 │ ████▎ │ +│ 1990 │ 11055 │ 6.03 │ ██████ │ +│ 2000 │ 2467 │ 11.85 │ ███████████▋ │ +│ 2010 │ 597 │ 25.66 │ █████████████████████████▋ │ +└──────┴─────────┴──────────────────────┴──────────────────────────────┘ +``` + +Просто не принимайте это всерьез. + +### Цены на бургеры {#query-burger-prices} + +Запрос: + +```sql +SELECT + round(toUInt32OrZero(extract(menu_date, '^\\d{4}')), -1) AS d, + count(), + round(avg(price), 2), + bar(avg(price), 0, 50, 100) +FROM menu_item_denorm +WHERE (menu_currency = 'Dollars') AND (d > 0) AND (d < 2022) AND (dish_name ILIKE '%burger%') +GROUP BY d +ORDER BY d ASC; +``` + +Результат: + +```text +┌────d─┬─count()─┬─round(avg(price), 2)─┬─bar(avg(price), 0, 50, 100)───────────┐ +│ 1880 │ 2 │ 0.42 │ ▋ │ +│ 1890 │ 7 │ 0.85 │ █▋ │ +│ 1900 │ 399 │ 0.49 │ ▊ │ +│ 1910 │ 589 │ 0.68 │ █▎ │ +│ 1920 │ 280 │ 0.56 │ █ │ +│ 1930 │ 74 │ 0.42 │ ▋ │ +│ 1940 │ 119 │ 0.59 │ █▏ │ +│ 1950 │ 134 │ 1.09 │ ██▏ │ +│ 1960 │ 272 │ 0.92 │ █▋ │ +│ 1970 │ 108 │ 1.18 │ ██▎ │ +│ 1980 │ 88 │ 2.82 │ █████▋ │ +│ 1990 │ 184 │ 3.68 │ ███████▎ │ +│ 2000 │ 21 │ 7.14 │ ██████████████▎ │ +│ 2010 │ 6 │ 18.42 │ ████████████████████████████████████▋ │ +└──────┴─────────┴──────────────────────┴───────────────────────────────────────┘ +``` + +### Водка {#query-vodka} + +Запрос: + +```sql +SELECT + round(toUInt32OrZero(extract(menu_date, '^\\d{4}')), -1) AS d, + count(), + round(avg(price), 2), + bar(avg(price), 0, 50, 100) +FROM menu_item_denorm +WHERE (menu_currency IN ('Dollars', '')) AND (d > 0) AND (d < 2022) AND (dish_name ILIKE '%vodka%') +GROUP BY d +ORDER BY d ASC; +``` + +Результат: + +```text +┌────d─┬─count()─┬─round(avg(price), 2)─┬─bar(avg(price), 0, 50, 100)─┐ +│ 1910 │ 2 │ 0 │ │ +│ 1920 │ 1 │ 0.3 │ ▌ │ +│ 1940 │ 21 │ 0.42 │ ▋ │ +│ 1950 │ 14 │ 0.59 │ █▏ │ +│ 1960 │ 113 │ 2.17 │ ████▎ │ +│ 1970 │ 37 │ 0.68 │ █▎ │ +│ 1980 │ 19 │ 2.55 │ █████ │ +│ 1990 │ 86 │ 3.6 │ ███████▏ │ +│ 2000 │ 2 │ 3.98 │ ███████▊ │ +└──────┴─────────┴──────────────────────┴─────────────────────────────┘ +``` + +Чтобы получить водку, мы должны написать `ILIKE '%vodka%'`, и это хорошая идея. + +### Икра {#query-caviar} + +Посмотрите цены на икру. Получите название любого блюда с икрой. + +Запрос: + +```sql +SELECT + round(toUInt32OrZero(extract(menu_date, '^\\d{4}')), -1) AS d, + count(), + round(avg(price), 2), + bar(avg(price), 0, 50, 100), + any(dish_name) +FROM menu_item_denorm +WHERE (menu_currency IN ('Dollars', '')) AND (d > 0) AND (d < 2022) AND (dish_name ILIKE '%caviar%') +GROUP BY d +ORDER BY d ASC; +``` + +Результат: + +```text +┌────d─┬─count()─┬─round(avg(price), 2)─┬─bar(avg(price), 0, 50, 100)──────┬─any(dish_name)──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ 1090 │ 1 │ 0 │ │ Caviar │ +│ 1880 │ 3 │ 0 │ │ Caviar │ +│ 1890 │ 39 │ 0.59 │ █▏ │ Butter and caviar │ +│ 1900 │ 1014 │ 0.34 │ ▋ │ Anchovy Caviar on Toast │ +│ 1910 │ 1588 │ 1.35 │ ██▋ │ 1/1 Brötchen Caviar │ +│ 1920 │ 927 │ 1.37 │ ██▋ │ ASTRAKAN CAVIAR │ +│ 1930 │ 289 │ 1.91 │ ███▋ │ Astrachan caviar │ +│ 1940 │ 201 │ 0.83 │ █▋ │ (SPECIAL) Domestic Caviar Sandwich │ +│ 1950 │ 81 │ 2.27 │ ████▌ │ Beluga Caviar │ +│ 1960 │ 126 │ 2.21 │ ████▍ │ Beluga Caviar │ +│ 1970 │ 105 │ 0.95 │ █▊ │ BELUGA MALOSSOL CAVIAR AMERICAN DRESSING │ +│ 1980 │ 12 │ 7.22 │ ██████████████▍ │ Authentic Iranian Beluga Caviar the world's finest black caviar presented in ice garni and a sampling of chilled 100° Russian vodka │ +│ 1990 │ 74 │ 14.42 │ ████████████████████████████▋ │ Avocado Salad, Fresh cut avocado with caviare │ +│ 2000 │ 3 │ 7.82 │ ███████████████▋ │ Aufgeschlagenes Kartoffelsueppchen mit Forellencaviar │ +│ 2010 │ 6 │ 15.58 │ ███████████████████████████████▏ │ "OYSTERS AND PEARLS" "Sabayon" of Pearl Tapioca with Island Creek Oysters and Russian Sevruga Caviar │ +└──────┴─────────┴──────────────────────┴──────────────────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` + +По крайней мере, есть икра с водкой. Очень мило. + +## Online Playground {#playground} + +Этот набор данных доступен в интерактивном ресурсе [Online Playground](https://gh-api.clickhouse.tech/play?user=play#U0VMRUNUCiAgICByb3VuZCh0b1VJbnQzMk9yWmVybyhleHRyYWN0KG1lbnVfZGF0ZSwgJ15cXGR7NH0nKSksIC0xKSBBUyBkLAogICAgY291bnQoKSwKICAgIHJvdW5kKGF2ZyhwcmljZSksIDIpLAogICAgYmFyKGF2ZyhwcmljZSksIDAsIDUwLCAxMDApLAogICAgYW55KGRpc2hfbmFtZSkKRlJPTSBtZW51X2l0ZW1fZGVub3JtCldIRVJFIChtZW51X2N1cnJlbmN5IElOICgnRG9sbGFycycsICcnKSkgQU5EIChkID4gMCkgQU5EIChkIDwgMjAyMikgQU5EIChkaXNoX25hbWUgSUxJS0UgJyVjYXZpYXIlJykKR1JPVVAgQlkgZApPUkRFUiBCWSBkIEFTQw==). diff --git a/docs/ru/interfaces/formats.md b/docs/ru/interfaces/formats.md index 059ddc3ed44..d0bcb057cf2 100644 --- a/docs/ru/interfaces/formats.md +++ b/docs/ru/interfaces/formats.md @@ -364,7 +364,7 @@ $ clickhouse-client --format_csv_delimiter="|" --query="INSERT INTO test.csv FOR ## CSVWithNames {#csvwithnames} -Выводит также заголовок, аналогично `TabSeparatedWithNames`. +Выводит также заголовок, аналогично [TabSeparatedWithNames](#tabseparatedwithnames). ## CustomSeparated {#format-customseparated} diff --git a/docs/ru/sql-reference/data-types/datetime64.md b/docs/ru/sql-reference/data-types/datetime64.md index 73daada3af3..869543dbbaf 100644 --- a/docs/ru/sql-reference/data-types/datetime64.md +++ b/docs/ru/sql-reference/data-types/datetime64.md @@ -7,7 +7,8 @@ toc_title: DateTime64 Позволяет хранить момент времени, который может быть представлен как календарная дата и время, с заданной суб-секундной точностью. -Размер тика (точность, precision): 10-precision секунд, где precision - целочисленный параметр. +Размер тика (точность, precision): 10-precision секунд, где precision - целочисленный параметр. Возможные значения: [ 0 : 9 ]. +Обычно используются - 3 (миллисекунды), 6 (микросекунды), 9 (наносекунды). **Синтаксис:**