Merge pull request #27084 from evillique/nlp-docs

Improve documentation for NLP functions
2024-11-22 07:31:57 +00:00 · 2021-08-02 21:08:30 +03:00 · 8234ed6ecf
commit 8234ed6ecf
parent d63a5e1c96 0f6fff47d3
2 changed files with 23 additions and 9 deletions
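
Reviewer note: the `plain` synonym extension documented in the diff below takes a text file in which each line is one synonym set, with words separated by spaces or tabs. The following is a minimal Python sketch of how such a file could be read. It is an illustration of the file format only, not ClickHouse's implementation; the function name `load_plain_synonyms` and the sample words are hypothetical, and whether ClickHouse includes the queried word itself in the result is not stated in this diff.

```python
from collections import defaultdict


def load_plain_synonyms(text):
    """Map each word to the other words that share a line with it.

    Hypothetical helper: it only illustrates the documented file layout
    (one synonym set per line, words separated by spaces or tabs).
    """
    index = defaultdict(set)
    for line in text.splitlines():
        # str.split() with no arguments splits on any run of whitespace,
        # which covers both the space and tab separators.
        words = line.split()
        for word in words:
            index[word].update(w for w in words if w != word)
    return index


# Two synonym sets: the first space-separated, the second mixing tab and space.
sample = "important big critical crucial\nhappy\tglad cheerful\n"
index = load_plain_synonyms(sample)
print(sorted(index["glad"]))
```

Blank lines produce an empty word list and are skipped naturally. A word that appears on several lines accumulates the members of every set it belongs to, which is one possible reading of the format rather than documented behavior.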
--- a/docs/en/sql-reference/functions/nlp-functions.md
+++ b/docs/en/sql-reference/functions/nlp-functions.md
@@ -3,11 +3,14 @@ toc_priority: 67
 toc_title: NLP
 ---

-# Natural Language Processing functions {#nlp-functions}
+# [experimental] Natural Language Processing functions {#nlp-functions}
+
+!!! warning "Warning"
+    This is an experimental feature that is currently in development and is not ready for general use. It will change in unpredictable backwards-incompatible ways in future releases. Set `allow_experimental_nlp_functions = 1` to enable it.

 ## stem {#stem}

-Performs stemming on a previously tokenized text.
+Performs stemming on a given word.

 **Syntax**

@@ -38,7 +41,7 @@ Result:

 ## lemmatize {#lemmatize}

-Performs lemmatization on a given word.
+Performs lemmatization on a given word. It needs dictionaries to operate; these can be obtained from the [lemmagen3 models directory](https://github.com/vpodpecan/lemmagen3/tree/master/src/lemmagen3/models).

 **Syntax**

@@ -79,7 +82,11 @@ Configuration:

 ## synonyms {#synonyms}

-Finds synonyms to a given word. 
+Finds synonyms of a given word. There are two types of synonym extensions: `plain` and `wordnet`.
+
+With the `plain` extension type, provide a path to a simple text file in which each line is one synonym set. Words in a line must be separated by space or tab characters.
+
+With the `wordnet` extension type, provide a path to a directory containing a WordNet thesaurus. The thesaurus must contain a WordNet sense index.

 **Syntax**

@@ -89,7 +96,7 @@ synonyms('extension_name', word)

 **Arguments**

-   `extension_name` — Name of the extention in which search will be performed. [String](../../sql-reference/data-types/string.md#string).
+-   `extension_name` — Name of the extension in which search will be performed. [String](../../sql-reference/data-types/string.md#string).
 -   `word` — Word that will be searched in extension. [String](../../sql-reference/data-types/string.md#string).

 **Examples**
@@ -122,4 +129,4 @@ Configuration:
        <path>en/</path>
    </extension>
 </synonyms_extensions>
-```
+```
--- a/docs/ru/sql-reference/functions/nlp-functions.md
+++ b/docs/ru/sql-reference/functions/nlp-functions.md
@@ -3,7 +3,10 @@ toc_priority: 67
 toc_title: NLP
 ---

-# Functions for working with natural language {#nlp-functions}
+# [experimental] Functions for working with natural language {#nlp-functions}
+
+!!! warning "Warning"
+    The functions for working with natural language are currently an experimental feature. To use them, enable the setting `allow_experimental_nlp_functions = 1`.

 ## stem {#stem}

@@ -38,7 +41,7 @@ Result:

 ## lemmatize {#lemmatize}

-This function performs lemmatization on a given word.
+This function performs lemmatization on a given word. The lemmatizer needs dictionaries to operate, which can be found [here](https://github.com/vpodpecan/lemmagen3/tree/master/src/lemmagen3/models).

 **Syntax**

@@ -79,7 +82,11 @@ SELECT lemmatize('en', 'wolves');

 ## synonyms {#synonyms}

-Finds synonyms of a given word.
+Finds synonyms of a given word. Two types of dictionary extensions are available: `plain` and `wordnet`.
+
+For the `plain` extension type, specify the path to a simple text file in which each line corresponds to one synonym set. Words in a line must be separated by a space or a tab character.
+
+For the `wordnet` extension type, specify the path to a WordNet thesaurus. The thesaurus must contain a WordNet sense index.

 **Syntax**