mirror of https://github.com/ClickHouse/ClickHouse.git (synced 2024-11-25 09:02:00 +00:00)
Merge pull request #44708 from Avogar/schema-inference-docs
Add detailed documentation about schema inference
commit bc456feb4b
1570 docs/en/interfaces/schema-inference.md (Normal file)
File diff suppressed because it is too large
@@ -3625,7 +3625,7 @@ z IPv4
 Controls making inferred types `Nullable` in schema inference for formats without information about nullability.
 If the setting is enabled, the inferred type will be `Nullable` only if column contains `NULL` in a sample that is parsed during schema inference.
 
-Default value: `false`.
+Default value: `true`.
 
 ## input_format_try_infer_integers {#input_format_try_infer_integers}
 
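The hunk above flips the documented default of `schema_inference_make_columns_nullable` from `false` to `true`, and the `system.schema_inference_cache` example added in this commit shows the resulting `Nullable(Float64)` and `Array(Nullable(String))` types. The following is a toy Python model of how that setting and `input_format_try_infer_integers` combine for JSON-like rows, written only to reproduce that example output; it is not ClickHouse's implementation. The function names (`infer_type`, `make_nullable`, `describe_rows`) are hypothetical, and only numbers, strings and arrays are modelled. For arrays the model wraps the element type rather than the array itself, because ClickHouse does not allow composite types such as `Array` inside `Nullable`.

```python
from typing import Any, Dict, List, Tuple

# Toy model of schema inference for JSON-like rows; NOT ClickHouse's actual implementation.
# Hypothetical helpers; only numbers, strings and (nested) arrays are modelled.


def infer_type(values: List[Any], try_infer_integers: bool = False) -> str:
    """Infer a base type name from the sampled values of one column."""
    sample = [v for v in values if v is not None]  # NULLs carry no type information
    if sample and all(isinstance(v, list) for v in sample):
        # Infer the element type from all array elements combined.
        elements = [e for v in sample for e in v]
        return f"Array({infer_type(elements, try_infer_integers)})"
    if sample and all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in sample):
        # With integer inference off (as in the DESCRIBE example), every number becomes Float64.
        if try_infer_integers and all(isinstance(v, int) for v in sample):
            return "Int64"
        return "Float64"
    return "String"  # strings, and anything else this toy model does not handle


def make_nullable(type_name: str) -> str:
    """Wrap a type in Nullable; for arrays, wrap the element type instead."""
    if type_name.startswith("Array(") and type_name.endswith(")"):
        return f"Array({make_nullable(type_name[len('Array('):-1])})"
    return f"Nullable({type_name})"


def describe_rows(
    rows: List[Dict[str, Any]],
    try_infer_integers: bool = False,
    make_columns_nullable: bool = True,
) -> List[Tuple[str, str]]:
    """Return (column name, type) pairs in first-seen column order."""
    names: List[str] = []
    for row in rows:
        for key in row:
            if key not in names:
                names.append(key)
    columns = []
    for name in names:
        type_name = infer_type([row.get(name) for row in rows], try_infer_integers)
        if make_columns_nullable:
            type_name = make_nullable(type_name)
        columns.append((name, type_name))
    return columns


# The four rows of data.jsonl from the schema_inference_cache example.
ROWS = [
    {"id": 1, "age": 25, "name": "Josh", "hobbies": ["football", "cooking", "music"]},
    {"id": 2, "age": 19, "name": "Alan", "hobbies": ["tennis", "art"]},
    {"id": 3, "age": 32, "name": "Lana", "hobbies": ["fitness", "reading", "shopping"]},
    {"id": 4, "age": 47, "name": "Brayan", "hobbies": ["movies", "skydiving"]},
]

if __name__ == "__main__":
    for name, type_name in describe_rows(ROWS):
        print(name, type_name)
```

With the defaults the model prints the same four column types as the `DESCRIBE` example; in the model, `try_infer_integers=True` turns `id` and `age` into `Nullable(Int64)`, and `make_columns_nullable=False` drops the `Nullable` wrappers.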
70 docs/en/operations/system-tables/schema_inference_cache.md (Normal file)
@@ -0,0 +1,70 @@
+---
+slug: /en/operations/system-tables/schema_inference_cache
+---
+# Schema inference cache
+
+Contains information about all cached file schemas.
+
+Columns:
+- `storage` ([String](/docs/en/sql-reference/data-types/string.md)) — Storage name: File, URL, S3 or HDFS.
+- `source` ([String](/docs/en/sql-reference/data-types/string.md)) — File source.
+- `format` ([String](/docs/en/sql-reference/data-types/string.md)) — Format name.
+- `additional_format_info` ([String](/docs/en/sql-reference/data-types/string.md)) - Additional information required to identify the schema. For example, format specific settings.
+- `registration_time` ([DateTime](/docs/en/sql-reference/data-types/datetime.md)) — Timestamp when schema was added in cache.
+- `schema` ([String](/docs/en/sql-reference/data-types/string.md)) - Cached schema.
+
+**Example**
+
+Let's say we have a file `data.jsonl` with this content:
+```json
+{"id" : 1, "age" : 25, "name" : "Josh", "hobbies" : ["football", "cooking", "music"]}
+{"id" : 2, "age" : 19, "name" : "Alan", "hobbies" : ["tennis", "art"]}
+{"id" : 3, "age" : 32, "name" : "Lana", "hobbies" : ["fitness", "reading", "shopping"]}
+{"id" : 4, "age" : 47, "name" : "Brayan", "hobbies" : ["movies", "skydiving"]}
+```
+
+:::tip
+Place `data.jsonl` in the `user_files_path` directory. You can find this by looking
+in your ClickHouse configuration files. The default is:
+```
+<user_files_path>/var/lib/clickhouse/user_files/</user_files_path>
+```
+:::
+
+Open `clickhouse-client` and run the `DESCRIBE` query:
+
+```sql
+DESCRIBE file('data.jsonl') SETTINGS input_format_try_infer_integers=0;
+```
+
+```response
+┌─name────┬─type────────────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
+│ id      │ Nullable(Float64)       │              │                    │         │                  │                │
+│ age     │ Nullable(Float64)       │              │                    │         │                  │                │
+│ name    │ Nullable(String)        │              │                    │         │                  │                │
+│ hobbies │ Array(Nullable(String)) │              │                    │         │                  │                │
+└─────────┴─────────────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
+```
+
+Let's see the content of the `system.schema_inference_cache` table:
+
+```sql
+SELECT *
+FROM system.schema_inference_cache
+FORMAT Vertical
+```
+```response
+Row 1:
+──────
+storage: File
+source: /home/droscigno/user_files/data.jsonl
+format: JSONEachRow
+additional_format_info: schema_inference_hints=, max_rows_to_read_for_schema_inference=25000, schema_inference_make_columns_nullable=true, try_infer_integers=false, try_infer_dates=true, try_infer_datetimes=true, try_infer_numbers_from_strings=true, read_bools_as_numbers=true, try_infer_objects=false
+registration_time: 2022-12-29 17:49:52
+schema: id Nullable(Float64), age Nullable(Float64), name Nullable(String), hobbies Array(Nullable(String))
+```
+
+
+**See also**
+- [Automatic schema inference from input data](/docs/en/interfaces/schema-inference.md)
+
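In the example row above, both `schema` and `additional_format_info` are plain `String` columns, so a client that wants to inspect cached entries programmatically has to split them itself. Below is a minimal Python sketch, assuming the layouts visible in that row: `schema` is a comma-separated list of `name type` pairs, and `additional_format_info` is a comma-separated list of `key=value` pairs. The helper names are hypothetical, values are left as strings, and quoted column names containing spaces, or parentheses inside string literals (for example in `Enum` values), are not handled.

```python
import re
from typing import Dict, List, Tuple


def split_top_level(text: str, sep: str = ",") -> List[str]:
    """Split on `sep`, ignoring separators nested inside parentheses."""
    parts: List[str] = []
    current: List[str] = []
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]


def parse_schema(schema: str) -> List[Tuple[str, str]]:
    """Turn 'id Nullable(Float64), ...' into [('id', 'Nullable(Float64)'), ...]."""
    columns = []
    for item in split_top_level(schema):
        # Assumption: the column name is the first token and the rest is the type.
        name, _, type_name = item.partition(" ")
        columns.append((name, type_name))
    return columns


def parse_format_info(info: str) -> Dict[str, str]:
    """Turn 'key=value, key=value' into a dict, splitting only before 'identifier='."""
    settings: Dict[str, str] = {}
    # The lookahead keeps commas inside values (e.g. schema hints) from splitting an entry.
    for item in re.split(r", (?=[A-Za-z_]+=)", info):
        if item:
            key, _, value = item.partition("=")
            settings[key] = value
    return settings


# Values copied from the example row of system.schema_inference_cache.
SCHEMA = "id Nullable(Float64), age Nullable(Float64), name Nullable(String), hobbies Array(Nullable(String))"
INFO = (
    "schema_inference_hints=, max_rows_to_read_for_schema_inference=25000, "
    "schema_inference_make_columns_nullable=true, try_infer_integers=false, "
    "try_infer_dates=true, try_infer_datetimes=true, try_infer_numbers_from_strings=true, "
    "read_bools_as_numbers=true, try_infer_objects=false"
)

if __name__ == "__main__":
    print(parse_schema(SCHEMA))
    print(parse_format_info(INFO)["try_infer_integers"])
```

Here `parse_schema(SCHEMA)` yields the four `(name, type)` pairs of the cached schema, and `parse_format_info(INFO)["try_infer_integers"]` is the string `"false"`, matching the `input_format_try_infer_integers=0` used in the `DESCRIBE` query.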
@@ -33,6 +33,7 @@ CustomSeparatedWithNames
 CustomSeparatedWithNamesAndTypes
 DBMSs
 DateTime
+DateTimes
 DockerHub
 Doxygen
 Encodings
@@ -55,6 +56,7 @@ IPv
 IntN
 Integrations
 JSONAsString
+JSONAsObject
 JSONColumns
 JSONColumnsWithMetadata
 JSONCompact
@@ -171,6 +173,7 @@ Werror
 Woboq
 WriteBuffer
 WriteBuffers
+WithNamesAndTypes
 XCode
 YAML
 YYYY
@@ -247,6 +250,7 @@ datafiles
 dataset
 datasets
 datetime
+datetimes
 dbms
 ddl
 deallocation
@@ -361,6 +365,7 @@ mysqldump
 mysqljs
 noop
 nullable
+nullability
 num
 obfuscator
 odbc