Inverted indexes are an experimental type of [secondary indexes](/docs/en/engines/table-engines/mergetree-family/mergetree.md/#available-types-of-indices) which provide fast text search
Let's look at the performance improvements of inverted indexes on a large dataset with lots of text. We will use 28.7M rows of comments on the popular Hacker News website. Here is the table without an inverted index:
```sql
CREATE TABLE hackernews (
id UInt64,
deleted UInt8,
type String,
author String,
timestamp DateTime,
comment String,
dead UInt8,
parent UInt64,
poll UInt64,
children Array(UInt32),
url String,
score UInt32,
title String,
parts Array(UInt32),
descendants UInt32
)
ENGINE = MergeTree
ORDER BY (type, author);
```
The 28.7M rows are in a Parquet file in S3 - let's insert them into the `hackernews` table:
Consider the following simple search for the term `ClickHouse` (and its varied upper and lower cases) in the `comment` column:
```sql
SELECT count()
FROM hackernews
WHERE hasToken(lower(comment), 'clickhouse');
```
Notice it takes 3 seconds to execute the query:
```response
┌─count()─┐
│ 1145 │
└─────────┘
1 row in set. Elapsed: 3.001 sec. Processed 28.74 million rows, 9.75 GB (9.58 million rows/s., 3.25 GB/s.)
```
We will use `ALTER TABLE` and add an inverted index on the lowercase of the `comment` column, then materialize it (which can take a while - wait for it to materialize):
```sql
ALTER TABLE hackernews
ADD INDEX comment_lowercase(lower(comment)) TYPE inverted;
ALTER TABLE hackernews MATERIALIZE INDEX comment_lowercase;
```
We run the same query...
```sql
SELECT count()
FROM hackernews
WHERE hasToken(lower(comment), 'clickhouse')
```
...and notice the query executes 4x faster:
```response
┌─count()─┐
│ 1145 │
└─────────┘
1 row in set. Elapsed: 0.747 sec. Processed 4.49 million rows, 1.77 GB (6.01 million rows/s., 2.37 GB/s.)