This is an experimental feature that is currently in development and is not ready for general use. It will change in unpredictable backwards-incompatible ways in future releases. Set `allow_experimental_nlp_functions = 1` to enable it.
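For example, the setting can be enabled for the current session before calling any of the functions described here. This is a minimal sketch; the setting could equally be applied through a settings profile.

```sql
-- Enable the experimental NLP functions for the current session only
SET allow_experimental_nlp_functions = 1;
```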
Performs lemmatization on a given word. Needs dictionaries to operate, which can be obtained [here](https://github.com/vpodpecan/lemmagen3/tree/master/src/lemmagen3/models).
With the `plain` extension type we need to provide a path to a simple text file, where each line corresponds to a certain synonym set. Words in this line must be separated with space or tab characters.
## detectLanguage
Detects the language of the UTF8-encoded input string. The function uses the [CLD2 library](https://github.com/CLD2Owners/cld2) for detection and returns the 2-letter ISO language code.
The `detectLanguage` function works best when providing over 200 characters in the input string.
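One way to act on this guidance is to skip detection for short inputs. The sketch below assumes a hypothetical table `documents` with a `String` column `body`. Returning `NULL` for short strings is an illustrative choice, not something the function does itself.

```sql
-- Hypothetical table: documents(id UInt64, body String)
SELECT
    id,
    -- Run detection only when the text exceeds 200 characters
    if(lengthUTF8(body) > 200, detectLanguage(body), NULL) AS lang
FROM documents;
```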
**Syntax**
``` sql
detectLanguage('text_to_be_analyzed')
```
**Arguments**
- `text_to_be_analyzed`: The text (one or more sentences) to analyze. [String](../../sql-reference/data-types/string.md#string).
**Returned value**
- The 2-letter ISO code of the detected language.
Other possible results:
- `un` = unknown, cannot detect any language.
- `other` = the detected language does not have a 2-letter code.
**Examples**
Query:
```sql
SELECT detectLanguage('Je pense que je ne parviendrai jamais à parler français comme un natif. Where there’s a will, there’s a way.');
```
Result:
```response
fr
```
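In practice the function is more often applied to a column than to a literal. The following sketch counts rows per detected language. It assumes a hypothetical table `documents` with a `String` column `body`.

```sql
-- Hypothetical table: documents(id UInt64, body String)
SELECT
    detectLanguage(body) AS lang,  -- 2-letter code, 'un', or 'other'
    count() AS docs
FROM documents
GROUP BY lang
ORDER BY docs DESC;
```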
## detectLanguageMixed
Similar to the `detectLanguage` function, but `detectLanguageMixed` returns a `Map` in which each 2-letter language code is mapped to the proportion of the text detected as that language.
**Syntax**
``` sql
detectLanguageMixed('text_to_be_analyzed')
```
**Arguments**
- `text_to_be_analyzed`: The text (one or more sentences) to analyze. [String](../../sql-reference/data-types/string.md#string).
**Returned value**
- `Map(String, Float32)`: The keys are 2-letter ISO codes and the values are the proportion of the text found for that language.
**Examples**
Query:
```sql
SELECT detectLanguageMixed('二兎を追う者は一兎をも得ず二兎を追う者は一兎をも得ず A vaincre sans peril, on triomphe sans gloire.');
```
Result:
```response
┌─detectLanguageMixed()─┐
│ {'ja':0.62,'fr':0.36} │
└───────────────────────┘
```
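Because the result is a `Map`, individual languages can be read with the usual map subscript and `mapKeys`. This is a sketch reusing the query above. The aliases `langs`, `detected`, `ja_share`, and `de_share` are illustrative names.

```sql
WITH detectLanguageMixed('二兎を追う者は一兎をも得ず二兎を追う者は一兎をも得ず A vaincre sans peril, on triomphe sans gloire.') AS langs
SELECT
    mapKeys(langs) AS detected,  -- all language codes found in the text
    langs['ja'] AS ja_share,     -- proportion detected as Japanese
    langs['de'] AS de_share;     -- a key absent from the map yields the default value 0
```

Subscripting a missing key returns the type's default rather than raising an error, so a value of 0 can mean the language was not detected at all.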
## detectLanguageUnknown
Similar to the `detectLanguage` function, except the `detectLanguageUnknown` function works with non-UTF8-encoded strings. Prefer this version when your character set is UTF-16 or UTF-32.
**Syntax**
``` sql
detectLanguageUnknown('text_to_be_analyzed')
```
**Arguments**
- `text_to_be_analyzed`: The text (one or more sentences) to analyze. [String](../../sql-reference/data-types/string.md#string).
**Returned value**
- The 2-letter ISO code of the detected language.
Other possible results:
- `un` = unknown, cannot detect any language.
- `other` = the detected language does not have a 2-letter code.
**Examples**
Query:
```sql
SELECT detectLanguageUnknown('Ich bleibe für ein paar Tage.');
```
Result:
```response
┌─detectLanguageUnknown('Ich bleibe für ein paar Tage.')─┐