---
slug: /en/engines/table-engines/integrations/mongodb
sidebar_position: 135
sidebar_label: MongoDB
---
# MongoDB

The MongoDB engine is a read-only table engine that allows reading data from a remote [MongoDB](https://www.mongodb.com/) collection.

Only MongoDB v3.6+ servers are supported.

:::note
If you run into problems, please report the issue and try [the legacy implementation](../../../operations/server-configuration-parameters/settings.md#use_legacy_mongodb_integration).
Keep in mind that the legacy implementation is deprecated and will be removed in a future release.
:::
## Creating a Table {#creating-a-table}
``` sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name
(
    name1 [type1],
    name2 [type2],
    ...
) ENGINE = MongoDB(host:port, database, collection, user, password [, options]);
```
**Engine Parameters**
- `host:port` — MongoDB server address.
- `database` — Remote database name.
- `collection` — Remote collection name.
- `user` — MongoDB user.
- `password` — User password.
- `options` — MongoDB connection string options (optional parameter).
:::tip
If you are using the MongoDB Atlas cloud offering:

- the connection URL can be obtained from the 'Atlas SQL' option;
- use the options `'connectTimeoutMS=10000&ssl=true&authSource=admin'`.
:::
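As a sketch of the positional form, the statements below create tables over a hypothetical MongoDB server `mongo1:27017`. The database `test`, the collection `simple_table`, the credentials, and the column names are illustrative placeholders, not values from a real deployment. Each engine argument is passed as a string literal, and the optional last argument carries the connection string options.

```sql
-- Hypothetical server, database, collection and credentials.
CREATE TABLE mongo_table
(
    key UInt64,
    data String
) ENGINE = MongoDB('mongo1:27017', 'test', 'simple_table', 'testuser', 'clickhouse');

-- Same collection, with connection string options passed as the sixth argument
-- (here the Atlas-style options from the tip; adjust them to your server).
CREATE TABLE mongo_table_opts
(
    key UInt64,
    data String
) ENGINE = MongoDB('mongo1:27017', 'test', 'simple_table', 'testuser', 'clickhouse',
                   'connectTimeoutMS=10000&ssl=true&authSource=admin');
```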
Alternatively, you can pass a connection URI:
``` sql
ENGINE = MongoDB(uri, collection);
```
**Engine Parameters**
- `uri` — MongoDB server's connection URI.
- `collection` — Remote collection name.
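For comparison, here is a minimal sketch of the URI form, assuming the same hypothetical server, credentials, database `test`, and collection `simple_table`. The database name goes in the URI path, as in the usage example on this page, and connection options go in the URI's standard `?key=value` query string.

```sql
-- Hypothetical host, credentials, database (`test`) and collection (`simple_table`).
CREATE TABLE mongo_table_uri
(
    key UInt64,
    data String
) ENGINE = MongoDB('mongodb://testuser:clickhouse@mongo1:27017/test?authSource=admin', 'simple_table');
```

If the user name or password contains characters such as `@`, `:` or `/`, they have to be percent-encoded in the URI, as with any MongoDB connection string.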
## Types mappings
| MongoDB            | ClickHouse                                                                   |
|--------------------|------------------------------------------------------------------------------|
| bool, int32, int64 | *any numeric type*, String                                                   |
| double             | Float64, String                                                              |
| date               | Date, Date32, DateTime, DateTime64, String                                   |
| string             | String, UUID                                                                 |
| document           | String (as JSON)                                                             |
| array              | Array, String (as JSON)                                                      |
| oid                | String                                                                       |
| binary             | String if in a column; base64-encoded string if inside an array or document |
| *any other*        | String                                                                       |

If a key is not found in a MongoDB document (for example, because the column name doesn't match), the column's default value is inserted, or `NULL` if the column is nullable.
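To illustrate the mapping and the missing-key rule, here is a sketch over a hypothetical `movies` collection in which some documents lack the `rating` and `votes` fields. The collection, its field names, and the connection URI are assumptions chosen to exercise several rows of the table.

```sql
-- Hypothetical collection; comments show the MongoDB type -> ClickHouse type used.
CREATE TABLE mongo_movies
(
    _id String,                -- oid -> String
    title String,              -- string -> String
    released DateTime64(3),    -- date -> DateTime64
    rating Nullable(Float64),  -- double -> Float64; missing key -> NULL
    votes UInt32,              -- int32 -> any numeric type; missing key -> 0 (type default)
    genres Array(String),      -- array -> Array
    awards String              -- embedded document -> String (as JSON)
) ENGINE = MongoDB('mongodb://testuser:clickhouse@mongo1:27017/test', 'movies');
```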
## Supported clauses
Only queries with simple expressions are supported (for example, `WHERE field = <constant> ORDER BY field2 LIMIT <constant>`).
Such expressions are translated to the MongoDB query language and executed on the server side.
You can lift these restrictions by setting [mongodb_throw_on_unsupported_query](../../../operations/settings/settings.md#mongodb_throw_on_unsupported_query) to `0`.
In that case, ClickHouse converts the query on a best-effort basis, but this can lead to a full table scan and processing on the ClickHouse side.
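A sketch of the difference, assuming a hypothetical table `mongo_table` with columns `key UInt64` and `data String`. The first query only uses the simple pattern described above. The second applies a function (`lower`) to the column, which this sketch assumes cannot be translated, so it enables best-effort processing for that one query through a `SETTINGS` clause instead of a session-wide `SET`.

```sql
-- Simple filter, sort and limit: expected to be translated and executed by MongoDB.
SELECT key, data
FROM mongo_table
WHERE data = 'hello'
ORDER BY key
LIMIT 10;

-- Function applied to the column (assumed not translatable): allow best-effort
-- processing for this query only; filtering may then happen on the ClickHouse side.
SELECT key, data
FROM mongo_table
WHERE lower(data) = 'hello'
SETTINGS mongodb_throw_on_unsupported_query = 0;
```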
:::note
It's always better to explicitly set the type of a literal, because MongoDB requires strictly typed filters.\
For example, suppose you want to filter by `Date`:
```sql
SELECT * FROM mongo_table WHERE date = '2024-01-01'
```

This will not work, because MongoDB does not cast a string to `Date`, so you need to cast it manually:
```sql
-- Two equivalent ways to build a typed Date literal: the :: cast and toDate()
SELECT * FROM mongo_table WHERE date = '2024-01-01'::Date OR date = toDate('2024-01-01')
```

This applies to `Date`, `Date32`, `DateTime`, `Bool`, and `UUID`.
:::
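The same approach carries over to the other listed types. A minimal sketch, assuming a hypothetical `mongo_table` with columns `id UUID`, `created DateTime`, and `day Date32`; each string literal is converted with a standard ClickHouse function so that the filter is already typed.

```sql
-- UUID column: convert the string literal with toUUID().
SELECT * FROM mongo_table WHERE id = toUUID('61f0c404-5cb3-11e7-907b-a6006ad3dba0');

-- DateTime column: convert with toDateTime() (or '...'::DateTime).
SELECT * FROM mongo_table WHERE created = toDateTime('2024-01-01 00:00:00');

-- Date32 column: convert with toDate32().
SELECT * FROM mongo_table WHERE day = toDate32('2024-01-01');
```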
## Usage Example {#usage-example}
Assuming MongoDB has the [sample_mflix](https://www.mongodb.com/docs/atlas/sample-data/sample-mflix) dataset loaded, create a table in ClickHouse that reads data from its `movies` collection:
``` sql
CREATE TABLE sample_mflix_table
(
    _id String,
    title String,
    plot String,
    genres Array(String),
    directors Array(String),
    writers Array(String),
    released Date,
    imdb String,
    year String
) ENGINE = MongoDB('mongodb+srv://<USERNAME>:<PASSWORD>@cluster0.cdojylq.mongodb.net/sample_mflix', 'movies');
```
Query:

``` sql
SELECT count() FROM sample_mflix_table
```

``` text
   ┌─count()─┐
1. │   21349 │
   └─────────┘
```
```SQL
-- JSONExtractString cannot be pushed down to MongoDB
SET mongodb_throw_on_unsupported_query = 0;

-- Find all 'Back to the Future' movies with rating > 7.5
SELECT title, plot, genres, directors, released FROM sample_mflix_table
WHERE title IN ('Back to the Future', 'Back to the Future Part II', 'Back to the Future Part III')
  AND toFloat32(JSONExtractString(imdb, 'rating')) > 7.5
ORDER BY year
FORMAT Vertical;
```
```text
Row 1:
──────
title:     Back to the Future
plot:      A young man is accidentally sent 30 years into the past in a time-traveling DeLorean invented by his friend, Dr. Emmett Brown, and must make sure his high-school-age parents unite in order to save his own existence.
genres:    ['Adventure','Comedy','Sci-Fi']
directors: ['Robert Zemeckis']
released:  1985-07-03

Row 2:
──────
title:     Back to the Future Part II
plot:      After visiting 2015, Marty McFly must repeat his visit to 1955 to prevent disastrous changes to 1985... without interfering with his first trip.
genres:    ['Action','Adventure','Comedy']
directors: ['Robert Zemeckis']
released:  1989-11-22
```
```SQL
-- Find top 3 movies based on Cormac McCarthy's books
SELECT title, toFloat32(JSONExtractString(imdb, 'rating')) as rating
FROM sample_mflix_table
WHERE arrayExists(x -> x like 'Cormac McCarthy%', writers)
ORDER BY rating DESC
LIMIT 3;
```
```text
   ┌─title──────────────────┬─rating─┐
1. │ No Country for Old Men │    8.1 │
2. │ The Sunset Limited     │    7.4 │
3. │ The Road               │    7.3 │
   └────────────────────────┴────────┘
```
## Troubleshooting
You can see the generated MongoDB query in `DEBUG`-level logs.

Implementation details can be found in the [mongocxx](https://github.com/mongodb/mongo-cxx-driver) and [mongoc](https://github.com/mongodb/mongo-c-driver) documentation.
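To see those logs from a client session, one option is ClickHouse's `send_logs_level` setting, which forwards server log messages of the given level and above to the client. A sketch, assuming the `sample_mflix_table` from the usage example. The exact wording of the log line depends on the ClickHouse version and is not reproduced here.

```sql
-- Forward server log messages of DEBUG level and above to this session.
SET send_logs_level = 'debug';

-- The generated MongoDB query is expected to appear among the forwarded log lines.
SELECT title FROM sample_mflix_table WHERE title = 'The Road';
```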