Commit Graph

199 Commits

Author SHA1 Message Date
taiyang-li
14f84f02d5 Merge branch 'master' into async_hdfs_read_buffer 2022-05-23 18:36:21 +08:00
Nikolai Kochetov
56feef01e7 Move some resources 2022-05-20 19:49:31 +00:00
avogar
a4cf07708c Fix comments 2022-05-20 14:57:27 +00:00
avogar
566d1b15fd Merge branch 'master' of github.com:ClickHouse/ClickHouse into formats-with-names 2022-05-20 13:54:52 +00:00
Kseniia Sumarokova
d4ad138a04
Merge pull request #37103 from bigo-sg/hive_partition_key_read
optimization for reading hive file when all columns to read are partition keys
2022-05-19 14:24:00 +02:00
lgbo-ustc
1497e08301 update exception msg 2022-05-17 19:27:43 +08:00
taiyang-li
14ab7eb5a3 merge master and solve conflict 2022-05-17 16:28:08 +08:00
lgbo-ustc
0b3468a150 TOO_MANY_PARTITIONS 2022-05-17 15:50:03 +08:00
lgbo-ustc
f4f4a2d85b reuse setting max_partitions_to_read 2022-05-17 15:49:14 +08:00
lgbo-ustc
a161a21992 add max partitions check for each hive table 2022-05-17 15:37:32 +08:00
lgbo-ustc
d8ad9ad2a6 update codes 2022-05-17 09:27:03 +08:00
avogar
68bb07d166 Better naming 2022-05-13 18:39:19 +00:00
avogar
b17fec659a Improve performance and memory usage for select of subset of columns for some formats 2022-05-13 13:51:28 +00:00
lgbo-ustc
4411fd87c8 reading optimization when all columns to read are partition keys 2022-05-11 16:49:30 +08:00
Robert Schulze
e583099158 Fix build, pt. V 2022-05-04 15:50:52 +02:00
mergify[bot]
64084b5e32 Merge branch 'master' into shared_ptr_helper3 2022-05-03 20:46:16 +00:00
Dmitry Novik
5ba7a55c18
Merge pull request #36650 from bigo-sg/hive_text_parallel_parsing
Parallel parsing of hive text format
2022-05-03 15:56:28 +02:00
Robert Schulze
777b5bc15b
Don't let storages inherit from boost::noncopyable
... IStorage has deleted copy ctor / assignment already
2022-05-03 09:07:08 +02:00
Robert Schulze
330212e0f4
Remove inherited create() method + disallow copying
The original motivation for this commit was that shared_ptr_helper used
std::shared_ptr<>() which does two heap allocations instead of
make_shared<>() which does a single allocation. Turned out that
1. the affected code (--> Storages/) is not on a hot path (rendering the
   performance argument moot ...)
2. yet copying Storage objects is potentially dangerous and was
   previously allowed.

Hence, this change

- removes shared_ptr_helper and as a result all inherited create() methods,

- instead, Storage objects are now created using make_shared<>() by the
  caller (for that to work, many constructors had to be made public), and

- all Storage classes were marked as noncopyable using boost::noncopyable.

In sum, we are (likely) not making things faster but the code becomes
cleaner and harder to misuse.
2022-05-02 08:46:52 +02:00
Amos Bird
4a5e4274f0 base should not depend on Common 2022-04-29 10:26:35 +08:00
taiyang-li
99dee35b6e parallel parsing of hive text format 2022-04-26 14:33:10 +08:00
taiyang-li
9e37764bb0 fix build error 2022-04-22 12:37:01 +08:00
taiyang-li
883139ff69 fix code syle 2022-04-21 18:31:13 +08:00
taiyang-li
94d0358b15 fix code style 2022-04-21 17:40:55 +08:00
taiyang-li
169dae2a35 ready for review 2022-04-21 17:37:12 +08:00
taiyang-li
9a251a820b fix bug 2022-04-21 17:13:59 +08:00
taiyang-li
87e76a1757 add swtich contril 2022-04-21 12:30:14 +08:00
taiyang-li
3b722eea7a profileing 2022-04-20 20:59:36 +08:00
taiyang-li
d533b569ad debugging 2022-04-20 19:58:31 +08:00
taiyang-li
56fe6fa608 finish dev 2022-04-20 17:49:53 +08:00
taiyang-li
fb6a56d4b0 finish debug 2022-04-20 16:24:18 +08:00
taiyang-li
0ad2a76fae Merge remote-tracking branch 'origin/master' into async_hdfs_read_buffer 2022-04-16 18:45:39 +08:00
taiyang-li
cd83fd5f8a tobe debug 2022-04-16 18:41:18 +08:00
avogar
42726639f3 Check ORC/Parquet/Arrow format magic bytes before loading file in memory 2022-04-13 19:27:38 +00:00
taiyang-li
090fd72884 fix bug 2022-04-11 11:19:31 +08:00
taiyang-li
7e89f760f3 remove useless code 2022-04-09 10:43:58 +08:00
taiyang-li
70f4503ba5 use global context for cache 2022-04-09 00:28:07 +08:00
taiyang-li
cd807da838 finish test 2022-04-09 00:15:33 +08:00
taiyang-li
e319df1799 finish dev 2022-04-08 23:58:56 +08:00
taiyang-li
2c99ef0ecc refactor HiveTableMetadata 2022-04-08 23:04:24 +08:00
taiyang-li
2e6f0db825 first commit 2022-04-08 15:12:24 +08:00
taiyang-li
87507ec9e8 fix conflicts 2022-04-07 20:52:54 +08:00
taiyang-li
d7c79c3a54 merge master and solve conflicts 2022-04-07 20:48:16 +08:00
taiyang-li
e9de38c52b fix bug 2022-04-07 20:45:07 +08:00
taiyang-li
2dc420c66b rename some symbols in hivefile 2022-04-07 15:48:42 +08:00
taiyang-li
4763a39802 merge bigo-sg/use_minmax_index and solve conflict 2022-04-07 15:45:28 +08:00
taiyang-li
046a2ba51c rename some symboles 2022-04-07 15:35:08 +08:00
taiyang-li
ad074fee91 merge use_minmax_index and solve conflict 2022-04-07 15:19:45 +08:00
taiyang-li
f02d769343 fix build error 2022-04-07 14:29:35 +08:00
taiyang-li
acc7046d54 remove some useless virtual and rename some functions in HiveFile 2022-04-07 11:46:57 +08:00
taiyang-li
df00bd214d merge bigo-sg/use_minmax_index and solve conflict 2022-04-07 11:18:24 +08:00
taiyang-li
2ef316801c Merge branch 'master' into use_minmax_index 2022-04-07 10:53:25 +08:00
taiyang-li
0b0c8ef09e add integration tests 2022-04-06 18:47:34 +08:00
taiyang-li
acb9f1632e suppoort skip splits in orc and parquet 2022-04-06 16:40:22 +08:00
taiyang-li
43e8af697a fix code style 2022-04-06 11:41:16 +08:00
taiyang-li
38f149b533 optimize trivial count hive query 2022-04-04 15:28:26 +08:00
taiyang-li
4e2d5f1841 Merge remote-tracking branch 'bigo-sg/use_minmax_index' into optimize_trivial_hive_query 2022-04-04 10:42:28 +08:00
taiyang-li
cbfc0f6bac fix typo 2022-04-04 10:42:22 +08:00
Kseniia Sumarokova
d3b3294872
Merge pull request #35365 from bigo-sg/improve_access_type
Improve check access in table functions
2022-04-01 10:47:02 +02:00
taiyang-li
16bb4c4ad0 respect remote_url_allow_hosts for hive 2022-03-30 15:33:59 +08:00
taiyang-li
0af6fdb576 fix building 2022-03-30 11:28:21 +08:00
taiyang-li
b79cec6806 Merge branch 'use_minmax_index' of https://github.com/bigo-sg/ClickHouse into use_minmax_index 2022-03-25 23:33:49 +08:00
taiyang-li
eee8949150 fix code 2022-03-25 23:33:46 +08:00
taiyang-li
4aaa361f2e Merge remote-tracking branch 'ck/master' into use_minmax_index 2022-03-25 22:48:03 +08:00
李扬
9cc528b01f Update HiveFile.h 2022-03-23 21:57:58 +08:00
taiyang-li
ae3d55c6a2 merge master and fix conflict 2022-03-23 14:31:12 +08:00
taiyang-li
68d5b538aa fix build error 2022-03-23 11:15:42 +08:00
lgbo-ustc
967d5a8055 Merge remote-tracking branch 'ck/master' into hive_column_pruning_bug 2022-03-21 19:52:06 +08:00
taiyang-li
49b6f3dfc5 merge master and fix conflict 2022-03-21 15:05:43 +08:00
taiyang-li
bf05b94940 fix build 2022-03-21 15:03:28 +08:00
taiyang-li
7d50bd1eb3 add access type hive 2022-03-21 11:19:45 +08:00
lgbo-ustc
f7aa40af5b update codes 2022-03-21 09:25:20 +08:00
lgbo-ustc
e78cfe3b26 update codes 2022-03-18 15:07:52 +08:00
lgbo-ustc
abfaa82bca fixed hive query bugs 2022-03-15 12:01:34 +08:00
Anton Popov
36ec379aeb Merge remote-tracking branch 'upstream/master' into HEAD 2022-03-14 16:28:35 +00:00
Kseniia Sumarokova
e6ee891c9c
Merge pull request #34957 from bigo-sg/hive_random_access_file_cache
Optimization for first time to read a random access readbuffer in hive
2022-03-10 11:36:22 +01:00
Kseniia Sumarokova
1eb2bae792
Merge pull request #34954 from bigo-sg/hive_read_columns_pruning
read columns pruning for hive
2022-03-08 10:17:24 +01:00
lgbo-ustc
256e92ffee Merge remote-tracking branch 'ck/master' into hive_random_access_file_cache 2022-03-08 14:14:40 +08:00
lgbo-ustc
a8cfc2458a update codes 2022-03-08 11:55:15 +08:00
Kseniia Sumarokova
5511f2f6e6
Merge pull request #34940 from bigo-sg/hive_client_connection_pool
Use connection pool in HiveMetastoreClient
2022-03-07 17:14:56 +01:00
Kseniia Sumarokova
28b9ec01c0
Merge pull request #34945 from bigo-sg/hive_bug_fixed
unexpected result when use `in` in hive query
2022-03-07 17:13:11 +01:00
lgbo-ustc
8ae5296ee8 fixed compile errors 2022-03-07 17:26:48 +08:00
lgbo-ustc
cfeedd2cb5 fixed code style 2022-03-07 12:28:31 +08:00
lgbo-ustc
c37eedd887 update codes 2022-03-07 10:30:54 +08:00
lgbo-ustc
75a50a30c4 update codes 2022-03-07 09:43:53 +08:00
lgbo-ustc
d907b70cc4 update codes: get actual read block 2022-03-07 09:26:05 +08:00
lgbo-ustc
f4d8fb46c5 update codes 2022-03-07 09:26:05 +08:00
lgbo-ustc
62c1bd5ae9 hive read columns pruning 2022-03-07 09:26:05 +08:00
Anton Popov
c1fdcf7a64 Merge remote-tracking branch 'upstream/master' into HEAD 2022-03-01 20:21:39 +03:00
lgbo-ustc
ca470e1b94 lazy initialization about getting hive metadata in HiveStorage 2022-03-01 19:04:44 +08:00
lgbo-ustc
5ed41bda9b fixed code style 2022-03-01 17:20:32 +08:00
lgbo-ustc
91a45d799e optimization for first time to read a random access readbuffer 2022-03-01 15:22:07 +08:00
lgbo-ustc
99cd25d70e add new table function: hive() 2022-02-28 20:51:33 +08:00
lgbo-ustc
6473767c99 fixed code style 2022-02-28 17:10:56 +08:00
lgbo-ustc
5885cfd869 fixed bug : unexpected result when using in clause for filtering partitions 2022-02-28 16:47:50 +08:00
lgbo-ustc
c5e02be44e fixed code-style 2022-02-28 15:22:54 +08:00
lgbo-ustc
2176d74cd1 Use connection pool in HiveMetastoreClient
1. remove lock for hive metastore client access
2. auo reconnect when connection is broken
2022-02-28 15:11:38 +08:00
taiyang-li
a4baec6d26 fix building 2022-02-16 15:12:43 +08:00
taiyang-li
afcb295273 fix compile error 2022-02-16 14:51:56 +08:00
taiyang-li
f19f0d847f fix code style 2022-02-16 12:23:06 +08:00