---
toc_priority: 67
toc_title: NLP
---
# [experimental] Natural Language Processing functions {#nlp-functions}
!!! warning "Warning"
    This is an experimental feature that is currently in development and is not ready for general use. It will change in unpredictable, backwards-incompatible ways in future releases. Set `allow_experimental_nlp_functions = 1` to enable it.
## stem {#stem}
Performs stemming on a given word.

**Syntax**
``` sql
stem('language', word)
```
**Arguments**
- `language` — Language whose rules will be applied. Must be in lowercase. [String](../../sql-reference/data-types/string.md#string).
- `word` — Word that needs to be stemmed. Must be in lowercase. [String](../../sql-reference/data-types/string.md#string).

**Examples**

Query:
``` sql
SELECT arrayMap(x -> stem('en', x), ['I', 'think', 'it', 'is', 'a', 'blessing', 'in', 'disguise']) as res;
```
Result:
``` text
┌─res────────────────────────────────────────────────┐
│ ['I','think','it','is','a','bless','in','disguis'] │
└────────────────────────────────────────────────────┘
```
## lemmatize {#lemmatize}
Performs lemmatization on a given word. Requires dictionaries, which can be obtained from the [lemmagen3 repository](https://github.com/vpodpecan/lemmagen3/tree/master/src/lemmagen3/models).

**Syntax**
``` sql
lemmatize('language', word)
```
**Arguments**
- `language` — Language whose rules will be applied. [String](../../sql-reference/data-types/string.md#string).
- `word` — Word that needs to be lemmatized. Must be in lowercase. [String](../../sql-reference/data-types/string.md#string).

**Examples**

Query:
``` sql
SELECT lemmatize('en', 'wolves');
```
Result:
``` text
┌─lemmatize('en', 'wolves')─┐
│ wolf                      │
└───────────────────────────┘
```
Configuration:
``` xml
<lemmatizers>
    <lemmatizer>
        <lang>en</lang>
        <path>en.bin</path>
    </lemmatizer>
</lemmatizers>
```
## synonyms {#synonyms}
Finds synonyms of a given word. There are two types of synonym extensions: `plain` and `wordnet`.

With the `plain` extension type, you provide a path to a plain text file in which each line corresponds to one synonym set. Words on a line must be separated by spaces or tab characters.

With the `wordnet` extension type, you provide a path to a directory containing a WordNet thesaurus. The thesaurus must contain a WordNet sense index.

**Syntax**
``` sql
synonyms('extension_name', word)
```
**Arguments**
- `extension_name` — Name of the extension in which the search will be performed. [String](../../sql-reference/data-types/string.md#string).
- `word` — Word to search for in the extension. [String](../../sql-reference/data-types/string.md#string).

**Examples**

Query:
``` sql
SELECT synonyms('list', 'important');
```
Result:
``` text
┌─synonyms('list', 'important')────────────┐
│ ['important','big','critical','crucial'] │
└──────────────────────────────────────────┘
```
Configuration:
``` xml
<synonyms_extensions>
    <extension>
        <name>en</name>
        <type>plain</type>
        <path>en.txt</path>
    </extension>
    <extension>
        <name>en</name>
        <type>wordnet</type>
        <path>en/</path>
    </extension>
</synonyms_extensions>
```
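To make the `plain` format more concrete, here is a minimal Python sketch of how a file such as `en.txt` can be interpreted: every line is one synonym set, and looking up any word on a line returns that whole set. This is an illustration only, not ClickHouse's implementation. The sample file contents, the helper names `load_plain_synonyms` and `lookup`, the "first line wins" rule for words that appear on several lines, and the empty result for unknown words are all assumptions. The first sample line reuses the set from the `synonyms('list', 'important')` example above.

```python
import io
from typing import Dict, List, TextIO


def load_plain_synonyms(stream: TextIO) -> Dict[str, List[str]]:
    """Index a `plain` synonym file: one synonym set per line.

    Words on a line are split on runs of spaces or tabs. If a word occurs
    on more than one line, this sketch keeps the first set (an assumption).
    """
    index: Dict[str, List[str]] = {}
    for line in stream:
        words = line.split()  # no-argument split() treats any whitespace run as one separator
        for word in words:
            index.setdefault(word, words)
    return index


def lookup(index: Dict[str, List[str]], word: str) -> List[str]:
    """Return a copy of the synonym set containing `word`, or [] if absent (assumed)."""
    return list(index.get(word, []))


# Hypothetical contents of en.txt; the first line mirrors the example output above.
sample = io.StringIO("important big critical crucial\nhappy\tglad joyful\n")
index = load_plain_synonyms(sample)

print(lookup(index, "important"))  # ['important', 'big', 'critical', 'crucial']
print(lookup(index, "glad"))       # ['happy', 'glad', 'joyful']
print(lookup(index, "missing"))    # []
```

Because every word on a line maps to the same list, a lookup by any member of the set, not only the first word, returns the full set, including the queried word itself, which is consistent with the shape of the `synonyms` example result.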