Add anchors to all the headings.

romanzhukov 2021-09-16 17:02:36 +03:00
parent 1be67e00cb
commit d3e6b9044a
2 changed files with 29 additions and 30 deletions

@ -3,7 +3,7 @@ toc_priority: 21
toc_title: Menus
---
# New York Public Library "What's on the Menu?" Dataset
# New York Public Library "What's on the Menu?" Dataset {#menus-dataset}
The dataset is created by the New York Public Library. It contains historical data on the menus of hotels, restaurants and cafes, including the dishes and their prices.
@ -13,7 +13,7 @@ The data is in public domain.
The data comes from the library's archive, so it may be incomplete and difficult to analyze statistically. Nevertheless, it is also very yummy.
The dataset contains just 1.3 million records about dishes in the menus. That is a very small data volume for ClickHouse, but it is still a good example.
## Download the Dataset
## Download the Dataset {#download-dataset}
Run the command:
@ -24,7 +24,7 @@ wget https://s3.amazonaws.com/menusdata.nypl.org/gzips/2021_08_01_07_01_17_data.
Replace the link with an up-to-date one from http://menus.nypl.org/data if needed.
Download size is about 35 MB.
## Unpack the Dataset
## Unpack the Dataset {#unpack-dataset}
```bash
tar xvf 2021_08_01_07_01_17_data.tgz
@ -38,7 +38,7 @@ The data is normalized consisted of four tables:
- MenuPage — information about the pages in the menus, because every page belongs to some menu.
- MenuItem — an item of the menu. A dish along with its price on some menu page: links to dish and menu page.
## Create the Tables
## Create the Tables {#create-tables}
```sql
CREATE TABLE dish
@ -105,7 +105,7 @@ CREATE TABLE menu_item
We use [Decimal](../../sql-reference/data-types/decimal.md) data type to store prices. Everything else is quite straightforward.
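As a rough illustration of why a fixed-point type suits prices, the query below compares decimal and floating-point addition. It is only a sketch: the scale of 2 and the sample values are assumptions chosen for the example, not the column definition used in the tables above.

```sql
-- Illustrative only: the values and the scale (2) are assumptions,
-- not taken from the dataset schema.
SELECT
    toDecimal64('0.10', 2) + toDecimal64('0.20', 2) AS decimal_sum, -- exact fixed-point arithmetic
    0.1 + 0.2 AS float_sum;                                         -- binary floating point, may show a rounding error
```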
## Import Data
## Import the Data {#import-data}
Upload data into ClickHouse:
@ -120,13 +120,13 @@ We use [CSVWithNames](../../interfaces/formats.md#csvwithnames) format as the da
We disable `format_csv_allow_single_quotes` because only double quotes are used to quote data fields; single quotes can appear inside the values and should not confuse the CSV parser.
We disable `input_format_null_as_default` because our data does not have NULLs. Otherwise ClickHouse will try to parse `\N` sequences and can be confused by `\` in the data.
We disable [input_format_null_as_default](../../operations/settings/settings.md#settings-input-format-null-as-default) because our data does not have [NULL](../../sql-reference/syntax.md#null-literal). Otherwise ClickHouse will try to parse `\N` sequences and can be confused by `\` in the data.
The setting `--date_time_input_format best_effort` allows parsing `DateTime` fields in a wide variety of formats. For example, ISO-8601 without seconds, like '2000-01-01 01:02', will be recognized. Without this setting, only the fixed [DateTime](../../sql-reference/data-types/datetime.md) format is allowed.
The setting [--date_time_input_format best_effort](../../operations/settings/settings.md#settings-date_time_input_format) allows parsing [DateTime](../../sql-reference/data-types/datetime.md) fields in a wide variety of formats. For example, ISO-8601 without seconds, like '2000-01-01 01:02', will be recognized. Without this setting, only the fixed DateTime format is allowed.
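To get a feel for what best-effort parsing accepts, a closely related parser is available as the SQL function `parseDateTimeBestEffort`. The sketch below is not part of the import command; the input strings are made up for illustration.

```sql
-- Sample strings are made up for illustration;
-- the resulting DateTime values depend on the server time zone.
SELECT
    parseDateTimeBestEffort('2000-01-01 01:02') AS without_seconds,
    parseDateTimeBestEffort('2000-01-01T01:02:03') AS iso_with_t;
```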
## Denormalize the Data
## Denormalize the Data {#denormalize-data}
The data is presented in multiple tables in normalized form. This means you have to perform JOINs if you want to query, for example, dish names from menu items.
The data is presented in multiple tables in normalized form. This means you have to perform a [JOIN](../../sql-reference/statements/select/join.md#select-join) if you want to query, for example, dish names from menu items.
For typical analytical tasks it is far more efficient to work with pre-JOINed data and avoid doing a JOIN every time. Such data is called "denormalized".
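For instance, without a denormalized table, looking up dish names for menu items needs a JOIN in every query. A minimal sketch, assuming `menu_item` has `dish_id` and `price` columns and `dish` has `id` and `name` columns (the table definitions are not shown in full here, so treat these names as assumptions):

```sql
-- Column names (dish_id, id, name, price) are assumed for illustration.
SELECT
    dish.name AS dish_name,
    menu_item.price AS price
FROM menu_item
JOIN dish ON menu_item.dish_id = dish.id
LIMIT 10;
```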
We will create a table that will contain all the data JOINed together:
@ -176,7 +176,7 @@ FROM menu_item
JOIN menu ON menu_page.menu_id = menu.id;
```
## Validate the Data
## Validate the Data {#validate-data}
Query:
@ -192,9 +192,9 @@ Result:
└─────────┘
```
## Run Some Queries
## Run Some Queries {#run-queries}
### Averaged historical prices of dishes
### Averaged historical prices of dishes {#query-averaged-historical-prices}
Query:
@ -236,7 +236,7 @@ Result:
Take it with a grain of salt.
### Burger Prices
### Burger Prices {#query-burger-prices}
Query:
@ -273,7 +273,7 @@ Result:
└──────┴─────────┴──────────────────────┴───────────────────────────────────────┘
```
### Vodka
### Vodka {#query-vodka}
Query:
@ -307,7 +307,7 @@ Result:
To get vodka we have to write `ILIKE '%vodka%'` and this definitely makes a statement.
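`ILIKE` is the case-insensitive variant of `LIKE`. A small sketch with a made-up dish name shows the difference:

```sql
-- The dish name is a made-up example string.
SELECT
    'Imported VODKA' ILIKE '%vodka%' AS ilike_match, -- case-insensitive match
    'Imported VODKA' LIKE '%vodka%' AS like_match;   -- case-sensitive match
```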
### Caviar
### Caviar {#query-caviar}
Let's print caviar prices, along with the name of any dish with caviar.
@ -350,6 +350,6 @@ Result:
At least they have caviar with vodka. Very nice.
### Test it in Playground
### Test it in Playground {#playground}
The data is uploaded to ClickHouse Playground, [example](https://gh-api.clickhouse.tech/play?user=play#U0VMRUNUCiAgICByb3VuZCh0b1VJbnQzMk9yWmVybyhleHRyYWN0KG1lbnVfZGF0ZSwgJ15cXGR7NH0nKSksIC0xKSBBUyBkLAogICAgY291bnQoKSwKICAgIHJvdW5kKGF2ZyhwcmljZSksIDIpLAogICAgYmFyKGF2ZyhwcmljZSksIDAsIDUwLCAxMDApLAogICAgYW55KGRpc2hfbmFtZSkKRlJPTSBtZW51X2l0ZW1fZGVub3JtCldIRVJFIChtZW51X2N1cnJlbmN5IElOICgnRG9sbGFycycsICcnKSkgQU5EIChkID4gMCkgQU5EIChkIDwgMjAyMikgQU5EIChkaXNoX25hbWUgSUxJS0UgJyVjYXZpYXIlJykKR1JPVVAgQlkgZApPUkRFUiBCWSBkIEFTQw==).

@ -3,7 +3,7 @@ toc_priority: 21
toc_title: Menus
---
# New York Public Library "What's on the Menu?" Dataset
# New York Public Library "What's on the Menu?" Dataset {#menus-dataset}
The dataset is created by the New York Public Library. It contains historical data on the menus of hotels, restaurants and cafes, including the dishes and their prices.
@ -13,7 +13,7 @@ toc_title: Menus
The data comes from the library's archive, so it may be incomplete and difficult to analyze statistically. Nevertheless, it is also very interesting.
The dataset contains just 1.3 million records about dishes in the menus. That is a very small data volume for ClickHouse, but it is still a good example.
## Download the Dataset
## Download the Dataset {#download-dataset}
Run the command:
@ -24,7 +24,7 @@ wget https://s3.amazonaws.com/menusdata.nypl.org/gzips/2021_08_01_07_01_17_data.
Replace the link with an up-to-date one from http://menus.nypl.org/data if needed.
The download size is about 35 MB.
## Unpack the Dataset
## Unpack the Dataset {#unpack-dataset}
```bash
tar xvf 2021_08_01_07_01_17_data.tgz
@ -38,7 +38,7 @@ The data is normalized consisted of four tables:
- MenuPage — information about the pages in the menus, because every page belongs to some menu.
- MenuItem — an item of the menu. A dish along with its price on some menu page: links to dish and menu page.
## Create the Tables
## Create the Tables {#create-tables}
```sql
CREATE TABLE dish
@ -105,7 +105,7 @@ CREATE TABLE menu_item
We use [Decimal](../../sql-reference/data-types/decimal.md) data type to store prices. Everything else is quite straightforward.
## Import Data
## Import Data {#import-data}
Upload data into ClickHouse:
@ -125,7 +125,7 @@ We disable `input_format_null_as_default` as our data does not have NULLs. Other
The setting `--date_time_input_format best_effort` allows parsing `DateTime` fields in a wide variety of formats. For example, ISO-8601 without seconds, like '2000-01-01 01:02', will be recognized. Without this setting, only the fixed [DateTime](../../sql-reference/data-types/datetime.md) format is allowed.
## Denormalize the Data
## Denormalize the Data {#denormalize-data}
The data is presented in multiple tables in normalized form. This means you have to perform JOINs if you want to query, for example, dish names from menu items.
For typical analytical tasks it is far more efficient to work with pre-JOINed data and avoid doing a JOIN every time. Such data is called "denormalized".
@ -177,8 +177,7 @@ FROM menu_item
JOIN menu ON menu_page.menu_id = menu.id;
```
## Validate the Data
## Validate the Data {#validate-data}
Query:
@ -194,9 +193,9 @@ SELECT count() FROM menu_item_denorm;
└─────────┘
```
## Examples
## Example Queries {#run-queries}
### Averaged historical prices of dishes
### Averaged historical prices of dishes {#query-averaged-historical-prices}
Query:
@ -238,7 +237,7 @@ ORDER BY d ASC;
Just don't take it too seriously.
### Burger Prices
### Burger Prices {#query-burger-prices}
Query:
@ -275,7 +274,7 @@ ORDER BY d ASC;
└──────┴─────────┴──────────────────────┴───────────────────────────────────────┘
```
### Vodka
### Vodka {#query-vodka}
Query:
@ -309,7 +308,7 @@ ORDER BY d ASC;
To get vodka we have to write `ILIKE '%vodka%'` and this definitely makes a statement.
### Caviar
### Caviar {#query-caviar}
Let's print caviar prices, along with the name of any dish with caviar.
@ -352,6 +351,6 @@ ORDER BY d ASC;
At least there is caviar with vodka. Very nice.
### Test it in Playground
### Test it in Playground {#playground}
The data is uploaded to ClickHouse Playground, [example](https://gh-api.clickhouse.tech/play?user=play#U0VMRUNUCiAgICByb3VuZCh0b1VJbnQzMk9yWmVybyhleHRyYWN0KG1lbnVfZGF0ZSwgJ15cXGR7NH0nKSksIC0xKSBBUyBkLAogICAgY291bnQoKSwKICAgIHJvdW5kKGF2ZyhwcmljZSksIDIpLAogICAgYmFyKGF2ZyhwcmljZSksIDAsIDUwLCAxMDApLAogICAgYW55KGRpc2hfbmFtZSkKRlJPTSBtZW51X2l0ZW1fZGVub3JtCldIRVJFIChtZW51X2N1cnJlbmN5IElOICgnRG9sbGFycycsICcnKSkgQU5EIChkID4gMCkgQU5EIChkIDwgMjAyMikgQU5EIChkaXNoX25hbWUgSUxJS0UgJyVjYXZpYXIlJykKR1JPVVAgQlkgZApPUkRFUiBCWSBkIEFTQw==).