Merge branch 'master' into better_parallel_hash2

Nikita Taranov 2024-11-10 20:24:37 +01:00 committed by GitHub
commit db30c33d04
GPG Key ID: B5690EEEBB952194
1510 changed files with 48352 additions and 16128 deletions


@ -12,7 +12,7 @@ tests/ci/cancel_and_rerun_workflow_lambda/app.py
- Backward Incompatible Change
- Build/Testing/Packaging Improvement
- Documentation (changelog entry is not required)
- Critical Bug Fix (crash, LOGICAL_ERROR, data loss, RBAC)
- Critical Bug Fix (crash, data loss, RBAC)
- Bug Fix (user-visible misbehavior in an official stable release)
- CI Fix or Improvement (changelog entry is not required)
- Not for changelog (changelog entry is not required)

.gitignore

@ -159,6 +159,7 @@ website/package-lock.json
/programs/server/store
/programs/server/uuid
/programs/server/coordination
/programs/server/workload
# temporary test files
tests/queries/0_stateless/test_*

.gitmodules

@ -227,12 +227,6 @@
[submodule "contrib/minizip-ng"]
path = contrib/minizip-ng
url = https://github.com/zlib-ng/minizip-ng
[submodule "contrib/qpl"]
path = contrib/qpl
url = https://github.com/intel/qpl
[submodule "contrib/idxd-config"]
path = contrib/idxd-config
url = https://github.com/intel/idxd-config
[submodule "contrib/QAT-ZSTD-Plugin"]
path = contrib/QAT-ZSTD-Plugin
url = https://github.com/intel/QAT-ZSTD-Plugin
@ -338,7 +332,7 @@
url = https://github.com/ClickHouse/usearch.git
[submodule "contrib/SimSIMD"]
path = contrib/SimSIMD
url = https://github.com/ashvardanian/SimSIMD.git
url = https://github.com/ClickHouse/SimSIMD.git
[submodule "contrib/FP16"]
path = contrib/FP16
url = https://github.com/Maratyszcza/FP16.git


@ -1,4 +1,5 @@
### Table of Contents
**[ClickHouse release v24.10, 2024-10-31](#2410)**<br/>
**[ClickHouse release v24.9, 2024-09-26](#249)**<br/>
**[ClickHouse release v24.8 LTS, 2024-08-20](#248)**<br/>
**[ClickHouse release v24.7, 2024-07-30](#247)**<br/>
@ -12,6 +13,165 @@
# 2024 Changelog
### <a id="2410"></a> ClickHouse release 24.10, 2024-10-31
#### Backward Incompatible Change
* Allow to write `SETTINGS` before `FORMAT` in a chain of queries with `UNION` when subqueries are inside parentheses. This closes [#39712](https://github.com/ClickHouse/ClickHouse/issues/39712). Change the behavior when a query has the SETTINGS clause specified twice in a sequence. The closest SETTINGS clause will have a preference for the corresponding subquery. In the previous versions, the outermost SETTINGS clause could take a preference over the inner one. [#68614](https://github.com/ClickHouse/ClickHouse/pull/68614) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Reordering of filter conditions from `[PRE]WHERE` clause is now allowed by default. It could be disabled by setting `allow_reorder_prewhere_conditions` to `false`. [#70657](https://github.com/ClickHouse/ClickHouse/pull/70657) ([Nikita Taranov](https://github.com/nickitat)).
* Remove the `idxd-config` library, which has an incompatible license. This also removes the experimental Intel DeflateQPL codec. [#70987](https://github.com/ClickHouse/ClickHouse/pull/70987) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
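
The `SETTINGS` precedence change above can be pictured as a nearest-scope lookup. The following is a minimal Python sketch, not ClickHouse code: the setting names and values are hypothetical, and the idea that a setting not overridden by the inner clause still falls through to the outer clause is an assumption made for illustration.

```python
from collections import ChainMap


def effective_settings(*scopes):
    """Combine SETTINGS clauses, ordered from innermost to outermost.

    ChainMap returns the value from the first mapping that has the key,
    so the closest clause wins for the subquery it belongs to.
    """
    return ChainMap(*scopes)


# Hypothetical clauses: one on the subquery, one on the enclosing query.
inner_clause = {"max_threads": 1}
outer_clause = {"max_threads": 8, "max_block_size": 1000}

resolved = effective_settings(inner_clause, outer_clause)
print(resolved["max_threads"])     # value taken from the inner clause
print(resolved["max_block_size"])  # only the outer clause defines this one
```

Under the previous behavior described in that entry, the outermost clause could win instead, which would correspond to reversing the order of the scopes.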
#### New Feature
* Allow granting access to wildcard prefixes, e.g. `GRANT SELECT ON db.table_prefix_* TO user`. [#65311](https://github.com/ClickHouse/ClickHouse/pull/65311) ([pufit](https://github.com/pufit)).
* If you press space bar during query runtime, the client will display a real-time table with detailed metrics. You can enable it globally with the new `--progress-table` option in clickhouse-client; a new `--enable-progress-table-toggle` is associated with the `--progress-table` option, and toggles the rendering of the progress table by pressing the control key (Space). [#63689](https://github.com/ClickHouse/ClickHouse/pull/63689) ([Maria Khristenko](https://github.com/mariaKhr)), [#70423](https://github.com/ClickHouse/ClickHouse/pull/70423) ([Julia Kartseva](https://github.com/jkartseva)).
* Allow to cache read files for object storage table engines and data lakes using hash from ETag + file path as cache key. [#70135](https://github.com/ClickHouse/ClickHouse/pull/70135) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Support creating a table with a query: `CREATE TABLE ... CLONE AS ...`. It clones the source table's schema and then attaches all partitions to the newly created table. This feature is only supported with tables of the `MergeTree` family. Closes [#65015](https://github.com/ClickHouse/ClickHouse/issues/65015). [#69091](https://github.com/ClickHouse/ClickHouse/pull/69091) ([tuanpach](https://github.com/tuanpach)).
* Add a new system table, `system.query_metric_log`, which contains the history of memory and metric values from the `system.events` table for individual queries, periodically flushed to disk. [#66532](https://github.com/ClickHouse/ClickHouse/pull/66532) ([Pablo Marcos](https://github.com/pamarcos)).
* A simple SELECT query can be written with implicit SELECT to enable calculator-style expressions, e.g., `ch "1 + 2"`. This is controlled by a new setting, `implicit_select`. [#68502](https://github.com/ClickHouse/ClickHouse/pull/68502) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Support the `--copy` mode for clickhouse local as a shortcut for format conversion [#68503](https://github.com/ClickHouse/ClickHouse/issues/68503). [#68583](https://github.com/ClickHouse/ClickHouse/pull/68583) ([Denis Hananein](https://github.com/denis-hananein)).
* Add a builtin HTML page for visualizing merges which is available at the `/merges` path. [#70821](https://github.com/ClickHouse/ClickHouse/pull/70821) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add support for `arrayUnion` function. [#68989](https://github.com/ClickHouse/ClickHouse/pull/68989) ([Peter Nguyen](https://github.com/petern48)).
* Allow parametrised SQL aliases. [#50665](https://github.com/ClickHouse/ClickHouse/pull/50665) ([Anton Kozlov](https://github.com/tonickkozlov)).
* A new aggregate function `quantileExactWeightedInterpolated`, which is an interpolated version of `quantileExactWeighted`. Some people may wonder why we need a new `quantileExactWeightedInterpolated` since we already have `quantileExactInterpolatedWeighted`. The reason is that the new one is more accurate than the old one. This is for Spark compatibility. [#69619](https://github.com/ClickHouse/ClickHouse/pull/69619) ([李扬](https://github.com/taiyang-li)).
* A new function `arrayElementOrNull`. It returns `NULL` if the array index is out of range or a Map key is not found. [#69646](https://github.com/ClickHouse/ClickHouse/pull/69646) ([李扬](https://github.com/taiyang-li)).
* Allows users to specify regular expressions through new `message_regexp` and `message_regexp_negative` fields in the `config.xml` file to filter out logging. The logging is applied to the formatted un-colored text for the most intuitive developer experience. [#69657](https://github.com/ClickHouse/ClickHouse/pull/69657) ([Peter Nguyen](https://github.com/petern48)).
* Added `RIPEMD160` function, which computes the RIPEMD-160 cryptographic hash of a string. Example: `SELECT HEX(RIPEMD160('The quick brown fox jumps over the lazy dog'))` returns `37F332F68DB77BD9D7EDD4969571AD671CF9DD3B`. [#70087](https://github.com/ClickHouse/ClickHouse/pull/70087) ([Dergousov Maxim](https://github.com/m7kss1)).
* Support reading `Iceberg` tables on `HDFS`. [#70268](https://github.com/ClickHouse/ClickHouse/pull/70268) ([flynn](https://github.com/ucasfl)).
* Support for CTE in the form of `WITH ... INSERT`, as previously we only supported `INSERT ... WITH ...`. [#70593](https://github.com/ClickHouse/ClickHouse/pull/70593) ([Shichao Jin](https://github.com/jsc0218)).
* MongoDB integration: support for all MongoDB types, support for WHERE and ORDER BY statements on the MongoDB side, restriction for expressions unsupported by MongoDB. Note that the new integration is disabled by default. To use it, set `<use_legacy_mongodb_integration>` to `false` in the server config. [#63279](https://github.com/ClickHouse/ClickHouse/pull/63279) ([Kirill Nikiforov](https://github.com/allmazz)).
* A new function `getSettingOrDefault` added to return the default value and avoid exception if a custom setting is not found in the current profile. [#69917](https://github.com/ClickHouse/ClickHouse/pull/69917) ([Shankar](https://github.com/shiyer7474)).
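
The out-of-range behavior described for `arrayElementOrNull` above can be illustrated with a small Python sketch. This is not the ClickHouse implementation. It assumes 1-based indexes with negative indexes counting from the end (as with `arrayElement`), and it treats index `0` as out of range, which is an assumption of this sketch rather than something the entry states.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def array_element_or_null(arr: Sequence[T], index: int) -> Optional[T]:
    """Return the element at a 1-based index, or None when out of range."""
    if 0 < index <= len(arr):
        return arr[index - 1]   # positive indexes are 1-based
    if index < 0 and -index <= len(arr):
        return arr[index]       # negative indexes count from the end
    return None                 # assumption: 0 and out-of-range yield NULL


values = [10, 20, 30]
print(array_element_or_null(values, 2))   # second element
print(array_element_or_null(values, 5))   # out of range, so None
print(array_element_or_null(values, -1))  # last element
```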
#### Experimental feature
* Refreshable materialized views are production ready. [#70550](https://github.com/ClickHouse/ClickHouse/pull/70550) ([Michael Kolupaev](https://github.com/al13n321)). Refreshable materialized views are now supported in Replicated databases. [#60669](https://github.com/ClickHouse/ClickHouse/pull/60669) ([Michael Kolupaev](https://github.com/al13n321)).
* Parallel replicas are moved from experimental to beta. Reworked settings that control the behavior of parallel replicas algorithms. A quick recap: ClickHouse has four different algorithms for parallel reading involving multiple replicas, which is reflected in the setting `parallel_replicas_mode`; its default value is `read_tasks`. Additionally, the toggle-switch setting `enable_parallel_replicas` has been added. [#63151](https://github.com/ClickHouse/ClickHouse/pull/63151) ([Alexey Milovidov](https://github.com/alexey-milovidov)), ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Support for the `Dynamic` type in most functions by executing them on internal types inside `Dynamic`. [#69691](https://github.com/ClickHouse/ClickHouse/pull/69691) ([Pavel Kruglov](https://github.com/Avogar)).
* Allow to read/write the `JSON` type as a binary string in `RowBinary` format under settings `input_format_binary_read_json_as_string/output_format_binary_write_json_as_string`. [#70288](https://github.com/ClickHouse/ClickHouse/pull/70288) ([Pavel Kruglov](https://github.com/Avogar)).
* Allow to serialize/deserialize `JSON` column as single String column in the Native format. For output use setting `output_format_native_write_json_as_string`. For input, use serialization version `1` before the column data. [#70312](https://github.com/ClickHouse/ClickHouse/pull/70312) ([Pavel Kruglov](https://github.com/Avogar)).
* Introduced a special (experimental) mode of a merge selector for MergeTree tables which makes it more aggressive for the partitions that are close to the limit by the number of parts. It is controlled by the `merge_selector_use_blurry_base` MergeTree-level setting. [#70645](https://github.com/ClickHouse/ClickHouse/pull/70645) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Implement generic ser/de between Avro's `Union` and ClickHouse's `Variant` types. Resolves [#69713](https://github.com/ClickHouse/ClickHouse/issues/69713). [#69712](https://github.com/ClickHouse/ClickHouse/pull/69712) ([Jiří Kozlovský](https://github.com/jirislav)).
#### Performance Improvement
* Refactor `IDisk` and `IObjectStorage` for better performance. Tables from `plain` and `plain_rewritable` object storages will initialize faster. [#68146](https://github.com/ClickHouse/ClickHouse/pull/68146) ([Alexey Milovidov](https://github.com/alexey-milovidov), [Julia Kartseva](https://github.com/jkartseva)). Do not call the LIST object storage API when determining if a file or directory exists on the plain rewritable disk, as it can be cost-inefficient. [#70852](https://github.com/ClickHouse/ClickHouse/pull/70852) ([Julia Kartseva](https://github.com/jkartseva)). Reduce the number of object storage HEAD API requests in the plain_rewritable disk. [#70915](https://github.com/ClickHouse/ClickHouse/pull/70915) ([Julia Kartseva](https://github.com/jkartseva)).
* Added an ability to parse data directly into sparse columns. [#69828](https://github.com/ClickHouse/ClickHouse/pull/69828) ([Anton Popov](https://github.com/CurtizJ)).
* Improved performance of parsing formats with high number of missed values (e.g. `JSONEachRow`). [#69875](https://github.com/ClickHouse/ClickHouse/pull/69875) ([Anton Popov](https://github.com/CurtizJ)).
* Supports parallel reading of parquet row groups and prefetching of row groups in single-threaded mode. [#69862](https://github.com/ClickHouse/ClickHouse/pull/69862) ([LiuNeng](https://github.com/liuneng1994)).
* Support minmax index for `pointInPolygon`. [#62085](https://github.com/ClickHouse/ClickHouse/pull/62085) ([JackyWoo](https://github.com/JackyWoo)).
* Use bloom filters when reading Parquet files. [#62966](https://github.com/ClickHouse/ClickHouse/pull/62966) ([Arthur Passos](https://github.com/arthurpassos)).
* Lock-free parts rename, so that INSERT does not affect SELECT through the parts lock (under normal circumstances with `fsync_part_directory`, the QPS of SELECT running in parallel with INSERT increased 2x; under heavy load the effect is even bigger). Note that this only covers `ReplicatedMergeTree` for now. [#64955](https://github.com/ClickHouse/ClickHouse/pull/64955) ([Azat Khuzhin](https://github.com/azat)).
* Respect `ttl_only_drop_parts` on `materialize ttl`; only read necessary columns to recalculate TTL and drop parts by replacing them with an empty one. [#65488](https://github.com/ClickHouse/ClickHouse/pull/65488) ([Andrey Zvonov](https://github.com/zvonand)).
* Optimized thread creation in the ThreadPool to minimize lock contention. Thread creation is now performed outside of the critical section to avoid delays in job scheduling and thread management under high load conditions. This leads to a much more responsive ClickHouse under heavy concurrent load. [#68694](https://github.com/ClickHouse/ClickHouse/pull/68694) ([filimonov](https://github.com/filimonov)).
* Enable reading `LowCardinality` string columns from `ORC`. [#69481](https://github.com/ClickHouse/ClickHouse/pull/69481) ([李扬](https://github.com/taiyang-li)).
* Use `LowCardinality` for `ProfileEvents` in system logs such as `part_log`, `query_views_log`, `filesystem_cache_log`. [#70152](https://github.com/ClickHouse/ClickHouse/pull/70152) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Improve performance of `fromUnixTimestamp`/`toUnixTimestamp` functions. [#71042](https://github.com/ClickHouse/ClickHouse/pull/71042) ([kevinyhzou](https://github.com/KevinyhZou)).
* Don't disable nonblocking read from page cache for the entire server when reading from a blocking I/O. This was leading to a poorer performance when a single filesystem (e.g., tmpfs) didn't support the `preadv2` syscall while others do. [#70299](https://github.com/ClickHouse/ClickHouse/pull/70299) ([Antonio Andelic](https://github.com/antonio2368)).
* `ALTER TABLE .. REPLACE PARTITION` doesn't wait anymore for mutations/merges that happen in other partitions. [#59138](https://github.com/ClickHouse/ClickHouse/pull/59138) ([Vasily Nemkov](https://github.com/Enmk)).
* Don't do validation when synchronizing ACL from Keeper, since it is already validated during creation. It shouldn't matter that much, but there are installations with tens of thousands or even more users created, and the unnecessary hash validation can take a long time to finish during server startup (it synchronizes everything from Keeper). [#70644](https://github.com/ClickHouse/ClickHouse/pull/70644) ([Raúl Marín](https://github.com/Algunenano)).
#### Improvement
* `CREATE TABLE AS` will copy `PRIMARY KEY`, `ORDER BY`, and similar clauses (of `MergeTree` tables). [#69739](https://github.com/ClickHouse/ClickHouse/pull/69739) ([sakulali](https://github.com/sakulali)).
* Support 64-bit XID in Keeper. It can be enabled with the `use_xid_64` configuration value. [#69908](https://github.com/ClickHouse/ClickHouse/pull/69908) ([Antonio Andelic](https://github.com/antonio2368)).
* Command-line arguments for Bool settings are set to true when no value is provided for the argument (e.g. `clickhouse-client --optimize_aggregation_in_order --query "SELECT 1"`). [#70459](https://github.com/ClickHouse/ClickHouse/pull/70459) ([davidtsuk](https://github.com/davidtsuk)).
* Added user-level settings `min_free_disk_bytes_to_throw_insert` and `min_free_disk_ratio_to_throw_insert` to prevent insertions on disks that are almost full. [#69755](https://github.com/ClickHouse/ClickHouse/pull/69755) ([Marco Vilas Boas](https://github.com/marco-vb)).
* Embedded documentation for settings will be strictly more detailed and complete than the documentation on the website. This is the first step before making the website documentation always auto-generated from the source code. This has long-standing implications: - it will be guaranteed to have every setting; - there is no chance of having default values obsolete; - we can generate this documentation for each ClickHouse version; - the documentation can be displayed by the server itself even without Internet access. Generate the docs on the website from the source code. [#70289](https://github.com/ClickHouse/ClickHouse/pull/70289) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Allow an empty needle in the function `replace`, matching the behavior of PostgreSQL. [#69918](https://github.com/ClickHouse/ClickHouse/pull/69918) ([zhanglistar](https://github.com/zhanglistar)).
* Allow empty needle in functions `replaceRegexp*`. [#70053](https://github.com/ClickHouse/ClickHouse/pull/70053) ([zhanglistar](https://github.com/zhanglistar)).
* Symbolic links for tables in the `data/database_name/` directory are created for the actual paths to the table's data, depending on the storage policy, instead of the `store/...` directory on the default disk. [#61777](https://github.com/ClickHouse/ClickHouse/pull/61777) ([Kirill](https://github.com/kirillgarbar)).
* While parsing an `Enum` field from `JSON`, a string containing an integer will be interpreted as the corresponding `Enum` element. This closes [#65119](https://github.com/ClickHouse/ClickHouse/issues/65119). [#66801](https://github.com/ClickHouse/ClickHouse/pull/66801) ([scanhex12](https://github.com/scanhex12)).
* Allow `TRIM` -ing `LEADING` or `TRAILING` empty string as a no-op. Closes [#67792](https://github.com/ClickHouse/ClickHouse/issues/67792). [#68455](https://github.com/ClickHouse/ClickHouse/pull/68455) ([Peter Nguyen](https://github.com/petern48)).
* Improve compatibility of `cast(timestamp as String)` with Spark. [#69179](https://github.com/ClickHouse/ClickHouse/pull/69179) ([Wenzheng Liu](https://github.com/lwz9103)).
* Always use the new analyzer to calculate constant expressions when `enable_analyzer` is set to `true`. Support calculation of `executable` table function arguments without using `SELECT` query for constant expressions. [#69292](https://github.com/ClickHouse/ClickHouse/pull/69292) ([Dmitry Novik](https://github.com/novikd)).
* Add a setting `enable_secure_identifiers` to disallow identifiers with special characters. [#69411](https://github.com/ClickHouse/ClickHouse/pull/69411) ([tuanpach](https://github.com/tuanpach)).
* Add `show_create_query_identifier_quoting_rule` to define identifier quoting behavior in the `SHOW CREATE TABLE` query result. Possible values: - `user_display`: When the identifier is a keyword. - `when_necessary`: When the identifier is one of `{"distinct", "all", "table"}` and when it could lead to ambiguity: column names, dictionary attribute names. - `always`: Always quote identifiers. [#69448](https://github.com/ClickHouse/ClickHouse/pull/69448) ([tuanpach](https://github.com/tuanpach)).
* Improve restoring of access entities' dependencies [#69563](https://github.com/ClickHouse/ClickHouse/pull/69563) ([Vitaly Baranov](https://github.com/vitlibar)).
* If you run `clickhouse-client` or another CLI application, it starts up slowly due to an overloaded server, and you start typing your query, such as `SELECT`, previous versions displayed the remainder of the terminal echo contents before printing the greeting message, such as `SELECTClickHouse local version 24.10.1.1.` instead of `ClickHouse local version 24.10.1.1.`. Now it is fixed. This closes [#31696](https://github.com/ClickHouse/ClickHouse/issues/31696). [#69856](https://github.com/ClickHouse/ClickHouse/pull/69856) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add new column `readonly_duration` to the `system.replicas` table. Needed to be able to distinguish actual readonly replicas from sentinel ones in alerts. [#69871](https://github.com/ClickHouse/ClickHouse/pull/69871) ([Miсhael Stetsyuk](https://github.com/mstetsyuk)).
* Change the type of the `join_output_by_rowlist_perkey_rows_threshold` setting to unsigned integer. [#69886](https://github.com/ClickHouse/ClickHouse/pull/69886) ([kevinyhzou](https://github.com/KevinyhZou)).
* Enhance OpenTelemetry span logging to include query settings. [#70011](https://github.com/ClickHouse/ClickHouse/pull/70011) ([sharathks118](https://github.com/sharathks118)).
* Add diagnostic info about higher-order array functions if lambda result type is unexpected. [#70093](https://github.com/ClickHouse/ClickHouse/pull/70093) ([ttanay](https://github.com/ttanay)).
* Keeper improvement: less locking during cluster changes. [#70275](https://github.com/ClickHouse/ClickHouse/pull/70275) ([Antonio Andelic](https://github.com/antonio2368)).
* Add `WITH IMPLICIT` and `FINAL` keywords to the `SHOW GRANTS` command. Fix a minor bug with implicit grants: [#70094](https://github.com/ClickHouse/ClickHouse/issues/70094). [#70293](https://github.com/ClickHouse/ClickHouse/pull/70293) ([pufit](https://github.com/pufit)).
* Respect `compatibility` for MergeTree settings. The `compatibility` value is taken from the `default` profile on server startup, and default MergeTree settings are changed accordingly. Further changes of the `compatibility` setting do not affect MergeTree settings. [#70322](https://github.com/ClickHouse/ClickHouse/pull/70322) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Avoid spamming the logs with large HTTP response bodies in case of errors during inter-server communication. [#70487](https://github.com/ClickHouse/ClickHouse/pull/70487) ([Vladimir Cherkasov](https://github.com/vdimir)).
* Added a new setting `max_parts_to_move` to control the maximum number of parts that can be moved at once. [#70520](https://github.com/ClickHouse/ClickHouse/pull/70520) ([Vladimir Cherkasov](https://github.com/vdimir)).
* Limit the frequency of certain log messages. [#70601](https://github.com/ClickHouse/ClickHouse/pull/70601) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* `CHECK TABLE` with `PART` qualifier was incorrectly formatted in the client. [#70660](https://github.com/ClickHouse/ClickHouse/pull/70660) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Support writing the column index and the offset index using parquet native writer. [#70669](https://github.com/ClickHouse/ClickHouse/pull/70669) ([LiuNeng](https://github.com/liuneng1994)).
* Support parsing `DateTime64` for microsecond and timezone in joda syntax ("joda" is a popular Java library for date and time, and the "joda syntax" is that library's style). [#70737](https://github.com/ClickHouse/ClickHouse/pull/70737) ([kevinyhzou](https://github.com/KevinyhZou)).
* Changed an approach to figure out if a cloud storage supports [batch delete](https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html) or not. [#70786](https://github.com/ClickHouse/ClickHouse/pull/70786) ([Vitaly Baranov](https://github.com/vitlibar)).
* Support for Parquet page v2 in the native reader. [#70807](https://github.com/ClickHouse/ClickHouse/pull/70807) ([Arthur Passos](https://github.com/arthurpassos)).
* Added a check for tables that have both `storage_policy` and `disk` set, and a check that a new storage policy is compatible with the old one when the `disk` setting is used. [#70839](https://github.com/ClickHouse/ClickHouse/pull/70839) ([Kirill](https://github.com/kirillgarbar)).
* Add `system.s3_queue_settings` and `system.azure_queue_settings`. [#70841](https://github.com/ClickHouse/ClickHouse/pull/70841) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Functions `base58Encode` and `base58Decode` now accept arguments of type `FixedString`. Example: `SELECT base58Encode(toFixedString('plaintext', 9));`. [#70846](https://github.com/ClickHouse/ClickHouse/pull/70846) ([Faizan Patel](https://github.com/faizan2786)).
* Add the `partition` column to every entry type of the part log. Previously, it was set only for some entries. This closes [#70819](https://github.com/ClickHouse/ClickHouse/issues/70819). [#70848](https://github.com/ClickHouse/ClickHouse/pull/70848) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add `MergeStart` and `MutateStart` events into `system.part_log` which helps with merges analysis and visualization. [#70850](https://github.com/ClickHouse/ClickHouse/pull/70850) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add a profile event about the number of merged source parts. It allows the monitoring of the fanout of the merge tree in production. [#70908](https://github.com/ClickHouse/ClickHouse/pull/70908) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Background downloads to the filesystem cache are enabled again. [#70929](https://github.com/ClickHouse/ClickHouse/pull/70929) ([Nikita Taranov](https://github.com/nickitat)).
* Add a new merge selector algorithm, named `Trivial`, for professional usage only. It is worse than the `Simple` merge selector. [#70969](https://github.com/ClickHouse/ClickHouse/pull/70969) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Support for atomic `CREATE OR REPLACE VIEW`. [#70536](https://github.com/ClickHouse/ClickHouse/pull/70536) ([tuanpach](https://github.com/tuanpach)).
* Added `strict_once` mode to aggregate function `windowFunnel` to avoid counting one event several times in case it matches multiple conditions, close [#21835](https://github.com/ClickHouse/ClickHouse/issues/21835). [#69738](https://github.com/ClickHouse/ClickHouse/pull/69738) ([Vladimir Cherkasov](https://github.com/vdimir)).
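
The Bool command-line behavior described above (a flag given without a value is read as `true`) can be sketched in Python with `argparse`. This only illustrates the parsing rule and is not how `clickhouse-client` is implemented; the `parse_bool` helper, the accepted spellings, and the `False` default are assumptions of the sketch.

```python
import argparse


def parse_bool(text: str) -> bool:
    """Hypothetical helper: accept a few common spellings of a Bool value."""
    lowered = text.lower()
    if lowered in ("1", "true"):
        return True
    if lowered in ("0", "false"):
        return False
    raise argparse.ArgumentTypeError(f"not a Bool value: {text!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # nargs="?" makes the value optional; const is used when it is omitted.
    parser.add_argument(
        "--optimize_aggregation_in_order",
        nargs="?",
        const=True,
        default=False,  # assumed default, for illustration only
        type=parse_bool,
    )
    parser.add_argument("--query")
    return parser


args = build_parser().parse_args(
    ["--optimize_aggregation_in_order", "--query", "SELECT 1"]
)
print(args.optimize_aggregation_in_order, args.query)
```

Because the next token starts with `--`, the parser treats it as another option rather than as the value of the Bool flag, so the flag falls back to `const=True`.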
#### Bug Fix (user-visible misbehavior in an official stable release)
* Apply configuration updates in global context object. It fixes issues like [#62308](https://github.com/ClickHouse/ClickHouse/issues/62308). [#62944](https://github.com/ClickHouse/ClickHouse/pull/62944) ([Amos Bird](https://github.com/amosbird)).
* Fix `ReadSettings` not using user-set values, because only defaults were used. [#65625](https://github.com/ClickHouse/ClickHouse/pull/65625) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix type mismatch issue in `sumMapFiltered` when using signed arguments. [#58408](https://github.com/ClickHouse/ClickHouse/pull/58408) ([Chen768959](https://github.com/Chen768959)).
* Fix toHour-like conversion functions' monotonicity when optional time zone argument is passed. [#60264](https://github.com/ClickHouse/ClickHouse/pull/60264) ([Amos Bird](https://github.com/amosbird)).
* Relax `supportsPrewhere` check for `Merge` tables. This fixes [#61064](https://github.com/ClickHouse/ClickHouse/issues/61064). It was hardened unnecessarily in [#60082](https://github.com/ClickHouse/ClickHouse/issues/60082). [#61091](https://github.com/ClickHouse/ClickHouse/pull/61091) ([Amos Bird](https://github.com/amosbird)).
* Fix `use_concurrency_control` setting handling for proper `concurrent_threads_soft_limit_num` limit enforcing. This enables concurrency control by default because previously it was broken. [#61473](https://github.com/ClickHouse/ClickHouse/pull/61473) ([Sergei Trifonov](https://github.com/serxa)).
* Fix incorrect `JOIN ON` section optimization in case of `IS NULL` check under any other function (like `NOT`) that may lead to wrong results. Closes [#67915](https://github.com/ClickHouse/ClickHouse/issues/67915). [#68049](https://github.com/ClickHouse/ClickHouse/pull/68049) ([Vladimir Cherkasov](https://github.com/vdimir)).
* Prevent `ALTER` queries that would make the `CREATE` query of tables invalid. [#68574](https://github.com/ClickHouse/ClickHouse/pull/68574) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Fix inconsistent AST formatting for `negate` (`-`) and `NOT` functions with tuples and arrays. [#68600](https://github.com/ClickHouse/ClickHouse/pull/68600) ([Vladimir Cherkasov](https://github.com/vdimir)).
* Fix insertion of incomplete type into `Dynamic` during deserialization. It could lead to `Parameter out of bound` errors. [#69291](https://github.com/ClickHouse/ClickHouse/pull/69291) ([Pavel Kruglov](https://github.com/Avogar)).
* Zero-copy replication, which is experimental and should not be used in production: fix an infinite loop after `restore replica` in the replicated merge tree with zero copy. [#69293](https://github.com/ClickHouse/ClickHouse/pull/69293) ([MikhailBurdukov](https://github.com/MikhailBurdukov)).
* Restore the default value of `processing_threads_num` to the number of CPU cores in the `S3Queue` storage. [#69384](https://github.com/ClickHouse/ClickHouse/pull/69384) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Bypass try/catch flow when de/serializing nested repeated protobuf to nested columns (fixes [#41971](https://github.com/ClickHouse/ClickHouse/issues/41971)). [#69556](https://github.com/ClickHouse/ClickHouse/pull/69556) ([Eliot Hautefeuille](https://github.com/hileef)).
* Fix crash during insertion into FixedString column in PostgreSQL engine. [#69584](https://github.com/ClickHouse/ClickHouse/pull/69584) ([Pavel Kruglov](https://github.com/Avogar)).
* Fix crash when executing `create view t as (with recursive 42 as ttt select ttt);`. [#69676](https://github.com/ClickHouse/ClickHouse/pull/69676) ([Han Fei](https://github.com/hanfei1991)).
* Fixed `maxMapState` throwing 'Bad get' if value type is DateTime64. [#69787](https://github.com/ClickHouse/ClickHouse/pull/69787) ([Michael Kolupaev](https://github.com/al13n321)).
* Fix `getSubcolumn` with `LowCardinality` columns by overriding `useDefaultImplementationForLowCardinalityColumns` to return `true`. [#69831](https://github.com/ClickHouse/ClickHouse/pull/69831) ([Miсhael Stetsyuk](https://github.com/mstetsyuk)).
* Fix permanently blocked distributed sends if a DROP of a distributed table failed. [#69843](https://github.com/ClickHouse/ClickHouse/pull/69843) ([Azat Khuzhin](https://github.com/azat)).
* Fix non-cancellable queries containing WITH FILL with NaN keys. This closes [#69261](https://github.com/ClickHouse/ClickHouse/issues/69261). [#69845](https://github.com/ClickHouse/ClickHouse/pull/69845) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix analyzer default with old compatibility value. [#69895](https://github.com/ClickHouse/ClickHouse/pull/69895) ([Raúl Marín](https://github.com/Algunenano)).
* Don't check dependencies during CREATE OR REPLACE VIEW during DROP of old table. Previously CREATE OR REPLACE query failed when there are dependent tables of the recreated view. [#69907](https://github.com/ClickHouse/ClickHouse/pull/69907) ([Pavel Kruglov](https://github.com/Avogar)).
* Something for Decimal. Fixes [#69730](https://github.com/ClickHouse/ClickHouse/issues/69730). [#69978](https://github.com/ClickHouse/ClickHouse/pull/69978) ([Arthur Passos](https://github.com/arthurpassos)).
* Now DEFINER/INVOKER will work with parameterized views. [#69984](https://github.com/ClickHouse/ClickHouse/pull/69984) ([pufit](https://github.com/pufit)).
* Fix parsing for view's definers. [#69985](https://github.com/ClickHouse/ClickHouse/pull/69985) ([pufit](https://github.com/pufit)).
* Fixed a bug where the timezone could change the result of a query with `Date` or `Date32` arguments. [#70036](https://github.com/ClickHouse/ClickHouse/pull/70036) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* Fixes `Block structure mismatch` for queries with nested views and `WHERE` condition. Fixes [#66209](https://github.com/ClickHouse/ClickHouse/issues/66209). [#70054](https://github.com/ClickHouse/ClickHouse/pull/70054) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Avoid reusing columns among different named tuples when evaluating `tuple` functions. This fixes [#70022](https://github.com/ClickHouse/ClickHouse/issues/70022). [#70103](https://github.com/ClickHouse/ClickHouse/pull/70103) ([Amos Bird](https://github.com/amosbird)).
* Fix wrong LOGICAL_ERROR when replacing literals in ranges. [#70122](https://github.com/ClickHouse/ClickHouse/pull/70122) ([Pablo Marcos](https://github.com/pamarcos)).
* Check for Nullable(Nothing) type during ALTER TABLE MODIFY COLUMN/QUERY to prevent tables with such data type. [#70123](https://github.com/ClickHouse/ClickHouse/pull/70123) ([Pavel Kruglov](https://github.com/Avogar)).
* Proper error message for illegal query `JOIN ... ON *`, closes [#68650](https://github.com/ClickHouse/ClickHouse/issues/68650). [#70124](https://github.com/ClickHouse/ClickHouse/pull/70124) ([Vladimir Cherkasov](https://github.com/vdimir)).
* Fix wrong result with skipping index. [#70127](https://github.com/ClickHouse/ClickHouse/pull/70127) ([Raúl Marín](https://github.com/Algunenano)).
* Fix data race in ColumnObject/ColumnTuple decompress method that could lead to heap use after free. [#70137](https://github.com/ClickHouse/ClickHouse/pull/70137) ([Pavel Kruglov](https://github.com/Avogar)).
* Fix possible hang in ALTER COLUMN with Dynamic type. [#70144](https://github.com/ClickHouse/ClickHouse/pull/70144) ([Pavel Kruglov](https://github.com/Avogar)).
* Now ClickHouse will consider more errors as retriable and will not mark data parts as broken in case of such errors. [#70145](https://github.com/ClickHouse/ClickHouse/pull/70145) ([alesapin](https://github.com/alesapin)).
* Use correct `max_types` parameter during Dynamic type creation for JSON subcolumn. [#70147](https://github.com/ClickHouse/ClickHouse/pull/70147) ([Pavel Kruglov](https://github.com/Avogar)).
* Fix the password being displayed in `system.query_log` for users with bcrypt password authentication method. [#70148](https://github.com/ClickHouse/ClickHouse/pull/70148) ([Nikolay Degterinsky](https://github.com/evillique)).
* Fix event counter for the native interface (InterfaceNativeSendBytes). [#70153](https://github.com/ClickHouse/ClickHouse/pull/70153) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Fix possible crash related to JSON columns. [#70172](https://github.com/ClickHouse/ClickHouse/pull/70172) ([Pavel Kruglov](https://github.com/Avogar)).
* Fix multiple issues with arrayMin and arrayMax. [#70207](https://github.com/ClickHouse/ClickHouse/pull/70207) ([Raúl Marín](https://github.com/Algunenano)).
* Respect setting allow_simdjson in the JSON type parser. [#70218](https://github.com/ClickHouse/ClickHouse/pull/70218) ([Pavel Kruglov](https://github.com/Avogar)).
* Fix a null pointer dereference on creating a materialized view with two selects and an `INTERSECT`, e.g. `CREATE MATERIALIZED VIEW v0 AS (SELECT 1) INTERSECT (SELECT 1);`. [#70264](https://github.com/ClickHouse/ClickHouse/pull/70264) ([Konstantin Bogdanov](https://github.com/thevar1able)).
* Don't modify global settings with startup scripts. Previously, changing a setting in a startup script would change it globally. [#70310](https://github.com/ClickHouse/ClickHouse/pull/70310) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix ALTER of `Dynamic` type with reducing max_types parameter that could lead to server crash. [#70328](https://github.com/ClickHouse/ClickHouse/pull/70328) ([Pavel Kruglov](https://github.com/Avogar)).
* Fix crash when using WITH FILL incorrectly. [#70338](https://github.com/ClickHouse/ClickHouse/pull/70338) ([Raúl Marín](https://github.com/Algunenano)).
* Fix possible use-after-free in `SYSTEM DROP FORMAT SCHEMA CACHE FOR Protobuf`. [#70358](https://github.com/ClickHouse/ClickHouse/pull/70358) ([Azat Khuzhin](https://github.com/azat)).
* Fix crash during GROUP BY JSON sub-object subcolumn. [#70374](https://github.com/ClickHouse/ClickHouse/pull/70374) ([Pavel Kruglov](https://github.com/Avogar)).
* Don't prefetch parts for vertical merges if part has no rows. [#70452](https://github.com/ClickHouse/ClickHouse/pull/70452) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix crash in WHERE with lambda functions. [#70464](https://github.com/ClickHouse/ClickHouse/pull/70464) ([Raúl Marín](https://github.com/Algunenano)).
* Fix table creation with `CREATE ... AS table_function(...)` with database `Replicated` and unavailable table function source on secondary replica. [#70511](https://github.com/ClickHouse/ClickHouse/pull/70511) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Ignore all output on async insert with `wait_for_async_insert=1`. Closes [#62644](https://github.com/ClickHouse/ClickHouse/issues/62644). [#70530](https://github.com/ClickHouse/ClickHouse/pull/70530) ([Konstantin Bogdanov](https://github.com/thevar1able)).
* Ignore frozen_metadata.txt while traversing shadow directory from system.remote_data_paths. [#70590](https://github.com/ClickHouse/ClickHouse/pull/70590) ([Aleksei Filatov](https://github.com/aalexfvk)).
* Fix creation of stateful window functions on misaligned memory. [#70631](https://github.com/ClickHouse/ClickHouse/pull/70631) ([Raúl Marín](https://github.com/Algunenano)).
* Fixed rare crashes in `SELECT`-s and merges after adding a column of `Array` type with non-empty default expression. [#70695](https://github.com/ClickHouse/ClickHouse/pull/70695) ([Anton Popov](https://github.com/CurtizJ)).
* Insert into table function s3 will respect query settings. [#70696](https://github.com/ClickHouse/ClickHouse/pull/70696) ([Vladimir Cherkasov](https://github.com/vdimir)).
* Fix infinite recursion when inferring a protobuf schema when skipping unsupported fields is enabled. [#70697](https://github.com/ClickHouse/ClickHouse/pull/70697) ([Raúl Marín](https://github.com/Algunenano)).
* Disable enable_named_columns_in_function_tuple by default. [#70833](https://github.com/ClickHouse/ClickHouse/pull/70833) ([Raúl Marín](https://github.com/Algunenano)).
* Fix S3Queue table engine setting processing_threads_num not being effective in case it was deduced from the number of cpu cores on the server. [#70837](https://github.com/ClickHouse/ClickHouse/pull/70837) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Normalize named tuple arguments in aggregation states. This fixes [#69732](https://github.com/ClickHouse/ClickHouse/issues/69732). [#70853](https://github.com/ClickHouse/ClickHouse/pull/70853) ([Amos Bird](https://github.com/amosbird)).
* Fix a logical error due to negative zeros in the two-level hash table. This closes [#70973](https://github.com/ClickHouse/ClickHouse/issues/70973). [#70979](https://github.com/ClickHouse/ClickHouse/pull/70979) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix `limit by`, `limit with ties` for distributed and parallel replicas. [#70880](https://github.com/ClickHouse/ClickHouse/pull/70880) ([Nikita Taranov](https://github.com/nickitat)).
### <a id="249"></a> ClickHouse release 24.9, 2024-09-26
#### Backward Incompatible Change
@ -328,6 +488,7 @@
* Remove `is_deterministic` field from the `system.functions` table. [#66630](https://github.com/ClickHouse/ClickHouse/pull/66630) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Function `tuple` will now try to construct named tuples in query (controlled by `enable_named_columns_in_function_tuple`). Introduce function `tupleNames` to extract names from tuples. [#54881](https://github.com/ClickHouse/ClickHouse/pull/54881) ([Amos Bird](https://github.com/amosbird)).
* Change how deduplication for Materialized Views works. Fixed a lot of cases like: - on destination table: data is split into 2 or more blocks and those blocks are considered duplicates when they are inserted in parallel. - on MV destination table: equal blocks are deduplicated, which happens when an MV often produces equal data for different input data due to aggregation. - on MV destination table: equal blocks which come from different MVs are deduplicated. [#61601](https://github.com/ClickHouse/ClickHouse/pull/61601) ([Sema Checherinda](https://github.com/CheSema)).
* Functions `bitShiftLeft` and `bitShiftRight` return an error for out of bounds shift positions. [#65838](https://github.com/ClickHouse/ClickHouse/pull/65838) ([Pablo Marcos](https://github.com/pamarcos)).
#### New Feature
* Add `ASOF JOIN` support for `full_sorting_join` algorithm. [#55051](https://github.com/ClickHouse/ClickHouse/pull/55051) ([vdimir](https://github.com/vdimir)).
@ -439,7 +600,6 @@
* Functions `bitTest`, `bitTestAll`, and `bitTestAny` now return an error if the specified bit index is out-of-bounds [#65818](https://github.com/ClickHouse/ClickHouse/pull/65818) ([Pablo Marcos](https://github.com/pamarcos)).
* Setting `join_any_take_last_row` is supported in any query with hash join. [#65820](https://github.com/ClickHouse/ClickHouse/pull/65820) ([vdimir](https://github.com/vdimir)).
* Better handling of join conditions involving `IS NULL` checks (for example `ON (a = b AND (a IS NOT NULL) AND (b IS NOT NULL) ) OR ( (a IS NULL) AND (b IS NULL) )` is rewritten to `ON a <=> b`), fix incorrect optimization when conditions other than `IS NULL` are present. [#65835](https://github.com/ClickHouse/ClickHouse/pull/65835) ([vdimir](https://github.com/vdimir)).
* Functions `bitShiftLeft` and `bitShiftRight` return an error for out of bounds shift positions. [#65838](https://github.com/ClickHouse/ClickHouse/pull/65838) ([Pablo Marcos](https://github.com/pamarcos)).
* Fix growing memory usage in S3Queue. [#65839](https://github.com/ClickHouse/ClickHouse/pull/65839) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix tie handling in `arrayAUC` to match sklearn. [#65840](https://github.com/ClickHouse/ClickHouse/pull/65840) ([gabrielmcg44](https://github.com/gabrielmcg44)).
* Fix possible issues with MySQL server protocol TLS connections. [#65917](https://github.com/ClickHouse/ClickHouse/pull/65917) ([Azat Khuzhin](https://github.com/azat)).

View File

@ -88,6 +88,7 @@ string (TOUPPER ${CMAKE_BUILD_TYPE} CMAKE_BUILD_TYPE_UC)
list(REVERSE CMAKE_FIND_LIBRARY_SUFFIXES)
option (ENABLE_FUZZING "Fuzzy testing using libfuzzer" OFF)
option (ENABLE_FUZZER_TEST "Build testing fuzzers in order to test libFuzzer functionality" OFF)
if (ENABLE_FUZZING)
# Also set WITH_COVERAGE=1 for better fuzzing process

View File

@ -42,31 +42,20 @@ Keep an eye out for upcoming meetups and events around the world. Somewhere else
Upcoming meetups
* [Jakarta Meetup](https://www.meetup.com/clickhouse-indonesia-user-group/events/303191359/) - October 1
* [Singapore Meetup](https://www.meetup.com/clickhouse-singapore-meetup-group/events/303212064/) - October 3
* [Madrid Meetup](https://www.meetup.com/clickhouse-spain-user-group/events/303096564/) - October 22
* [Oslo Meetup](https://www.meetup.com/open-source-real-time-data-warehouse-real-time-analytics/events/302938622) - October 31
* [Barcelona Meetup](https://www.meetup.com/clickhouse-spain-user-group/events/303096876/) - November 12
* [Ghent Meetup](https://www.meetup.com/clickhouse-belgium-user-group/events/303049405/) - November 19
* [Dubai Meetup](https://www.meetup.com/clickhouse-dubai-meetup-group/events/303096989/) - November 21
* [Paris Meetup](https://www.meetup.com/clickhouse-france-user-group/events/303096434) - November 26
* [Amsterdam Meetup](https://www.meetup.com/clickhouse-netherlands-user-group/events/303638814) - December 3
* [Stockholm Meetup](https://www.meetup.com/clickhouse-stockholm-user-group/events/304382411) - December 9
* [New York Meetup](https://www.meetup.com/clickhouse-new-york-user-group/events/304268174) - December 9
* [San Francisco Meetup](https://www.meetup.com/clickhouse-silicon-valley-meetup-group/events/304286951/) - December 12
Recently completed meetups
* [ClickHouse Guangzhou User Group Meetup](https://mp.weixin.qq.com/s/GSvo-7xUoVzCsuUvlLTpCw) - August 25
* [Seattle Meetup (Statsig)](https://www.meetup.com/clickhouse-seattle-user-group/events/302518075/) - August 27
* [Melbourne Meetup](https://www.meetup.com/clickhouse-australia-user-group/events/302732666/) - August 27
* [Sydney Meetup](https://www.meetup.com/clickhouse-australia-user-group/events/302862966/) - September 5
* [Zurich Meetup](https://www.meetup.com/clickhouse-switzerland-meetup-group/events/302267429/) - September 5
* [San Francisco Meetup (Cloudflare)](https://www.meetup.com/clickhouse-silicon-valley-meetup-group/events/302540575) - September 5
* [Raleigh Meetup (Deutsche Bank)](https://www.meetup.com/triangletechtalks/events/302723486/) - September 9
* [New York Meetup (Rokt)](https://www.meetup.com/clickhouse-new-york-user-group/events/302575342) - September 10
* [Toronto Meetup (Shopify)](https://www.meetup.com/clickhouse-toronto-user-group/events/301490855/) - September 10
* [Chicago Meetup (Jump Capital)](https://lu.ma/43tvmrfw) - September 12
* [London Meetup](https://www.meetup.com/clickhouse-london-user-group/events/302977267) - September 17
* [Austin Meetup](https://www.meetup.com/clickhouse-austin-user-group/events/302558689/) - September 17
* [Bangalore Meetup](https://www.meetup.com/clickhouse-bangalore-user-group/events/303208274/) - September 18
* [Tel Aviv Meetup](https://www.meetup.com/clickhouse-meetup-israel/events/303095121) - September 22
* [Madrid Meetup](https://www.meetup.com/clickhouse-spain-user-group/events/303096564/) - October 22
* [Singapore Meetup](https://www.meetup.com/clickhouse-singapore-meetup-group/events/303212064/) - October 3
* [Jakarta Meetup](https://www.meetup.com/clickhouse-indonesia-user-group/events/303191359/) - October 1
## Recent Recordings
* **Recent Meetup Videos**: [Meetup Playlist](https://www.youtube.com/playlist?list=PL0Z2YDlm0b3iNDUzpY1S3L_iV4nARda_U) Whenever possible, recordings of the ClickHouse Community Meetups are edited and presented as individual talks. Currently featuring "Modern SQL in 2023", "Fast, Concurrent, and Consistent Asynchronous INSERTS in ClickHouse", and "Full-Text Indices: Design and Experiments"

View File

@ -14,9 +14,10 @@ The following versions of ClickHouse server are currently supported with securit
| Version | Supported |
|:-|:-|
| 24.10 | ✔️ |
| 24.9 | ✔️ |
| 24.8 | ✔️ |
| 24.7 | ✔️ |
| 24.7 | ❌ |
| 24.6 | ❌ |
| 24.5 | ❌ |
| 24.4 | ❌ |

View File

@ -86,7 +86,7 @@ using StringRefs = std::vector<StringRef>;
* For more information, see hash_map_string_2.cpp
*/
inline bool compare8(const char * p1, const char * p2)
inline bool compare16(const char * p1, const char * p2)
{
return 0xFFFF == _mm_movemask_epi8(_mm_cmpeq_epi8(
_mm_loadu_si128(reinterpret_cast<const __m128i *>(p1)),
@ -115,7 +115,7 @@ inline bool compare64(const char * p1, const char * p2)
#elif defined(__aarch64__) && defined(__ARM_NEON)
inline bool compare8(const char * p1, const char * p2)
inline bool compare16(const char * p1, const char * p2)
{
uint64_t mask = getNibbleMask(vceqq_u8(
vld1q_u8(reinterpret_cast<const unsigned char *>(p1)), vld1q_u8(reinterpret_cast<const unsigned char *>(p2))));
@ -185,13 +185,22 @@ inline bool memequalWide(const char * p1, const char * p2, size_t size)
switch (size / 16) // NOLINT(bugprone-switch-missing-default-case)
{
case 3: if (!compare8(p1 + 32, p2 + 32)) return false; [[fallthrough]];
case 2: if (!compare8(p1 + 16, p2 + 16)) return false; [[fallthrough]];
case 1: if (!compare8(p1, p2)) return false; [[fallthrough]];
case 3:
if (!compare16(p1 + 32, p2 + 32))
return false;
[[fallthrough]];
case 2:
if (!compare16(p1 + 16, p2 + 16))
return false;
[[fallthrough]];
case 1:
if (!compare16(p1, p2))
return false;
[[fallthrough]];
default: ;
}
return compare8(p1 + size - 16, p2 + size - 16);
return compare16(p1 + size - 16, p2 + size - 16);
}
#endif
@ -369,11 +378,15 @@ namespace PackedZeroTraits
{
template <typename Second, template <typename, typename> class PackedPairNoInit>
inline bool check(const PackedPairNoInit<StringRef, Second> p)
{ return 0 == p.key.size; }
{
return 0 == p.key.size;
}
template <typename Second, template <typename, typename> class PackedPairNoInit>
inline void set(PackedPairNoInit<StringRef, Second> & p)
{ p.key.size = 0; }
{
p.key.size = 0;
}
}
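
Reviewer note on the `memequalWide` hunk above: the rename from `compare8` to `compare16` matches what the helper does, since it compares 16 bytes at a time. The tail handling relies on an overlapping final compare rather than a byte-by-byte loop. Below is a minimal portable sketch of that idea, not the actual implementation: `compare16Portable` and `memequalSmallSketch` are hypothetical names, `std::memcmp` stands in for the SIMD intrinsics, and the sketch assumes `16 <= size < 64`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical portable stand-in for the SIMD compare16:
// true if the 16 bytes at p1 and p2 are identical.
inline bool compare16Portable(const char * p1, const char * p2)
{
    return std::memcmp(p1, p2, 16) == 0;
}

// Sketch of the tail comparison, assuming 16 <= size < 64.
inline bool memequalSmallSketch(const char * p1, const char * p2, std::size_t size)
{
    assert(size >= 16 && size < 64);

    // Compare every full 16-byte chunk, highest offset first.
    switch (size / 16)
    {
        case 3:
            if (!compare16Portable(p1 + 32, p2 + 32))
                return false;
            [[fallthrough]];
        case 2:
            if (!compare16Portable(p1 + 16, p2 + 16))
                return false;
            [[fallthrough]];
        case 1:
            if (!compare16Portable(p1, p2))
                return false;
            [[fallthrough]];
        default:
            break;
    }

    // The last 16 bytes may overlap a chunk that was already compared.
    // Re-checking those bytes is harmless and covers the remainder.
    return compare16Portable(p1 + size - 16, p2 + size - 16);
}
```

For a 40-byte input, the sketch checks bytes 16..31, then 0..15, then 24..39, so the last 8 bytes are covered by the overlapping final compare.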

View File

@ -4,6 +4,7 @@
#include <string>
#include <sstream>
#include <cctz/time_zone.h>
#include <fmt/core.h>
inline std::string to_string(const std::time_t & time)
@ -11,18 +12,6 @@ inline std::string to_string(const std::time_t & time)
return cctz::format("%Y-%m-%d %H:%M:%S", std::chrono::system_clock::from_time_t(time), cctz::local_time_zone());
}
template <typename Clock, typename Duration = typename Clock::duration>
std::string to_string(const std::chrono::time_point<Clock, Duration> & tp)
{
// Don't use DateLUT because it shows weird characters for
// TimePoint::max(). I wish we could use C++20 format, but it's not
// there yet.
// return DateLUT::instance().timeToString(std::chrono::system_clock::to_time_t(tp));
auto in_time_t = std::chrono::system_clock::to_time_t(tp);
return to_string(in_time_t);
}
template <typename Rep, typename Period = std::ratio<1>>
std::string to_string(const std::chrono::duration<Rep, Period> & duration)
{
@ -33,6 +22,20 @@ std::string to_string(const std::chrono::duration<Rep, Period> & duration)
return std::to_string(seconds_as_double.count()) + "s";
}
template <typename Clock, typename Duration = typename Clock::duration>
std::string to_string(const std::chrono::time_point<Clock, Duration> & tp)
{
// Don't use DateLUT because it shows weird characters for
// TimePoint::max(). I wish we could use C++20 format, but it's not
// there yet.
// return DateLUT::instance().timeToString(std::chrono::system_clock::to_time_t(tp));
if constexpr (std::is_same_v<Clock, std::chrono::system_clock>)
return to_string(std::chrono::system_clock::to_time_t(tp));
else
return to_string(tp.time_since_epoch());
}
template <typename Clock, typename Duration = typename Clock::duration>
std::ostream & operator<<(std::ostream & o, const std::chrono::time_point<Clock, Duration> & tp)
{
@ -44,3 +47,23 @@ std::ostream & operator<<(std::ostream & o, const std::chrono::duration<Rep, Per
{
return o << to_string(duration);
}
template <typename Clock, typename Duration>
struct fmt::formatter<std::chrono::time_point<Clock, Duration>> : fmt::formatter<std::string>
{
template <typename FormatCtx>
auto format(const std::chrono::time_point<Clock, Duration> & tp, FormatCtx & ctx) const
{
return fmt::formatter<std::string>::format(::to_string(tp), ctx);
}
};
template <typename Rep, typename Period>
struct fmt::formatter<std::chrono::duration<Rep, Period>> : fmt::formatter<std::string>
{
template <typename FormatCtx>
auto format(const std::chrono::duration<Rep, Period> & duration, FormatCtx & ctx) const
{
return fmt::formatter<std::string>::format(::to_string(duration), ctx);
}
};
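
Reviewer note on the `to_string(time_point)` change above: the `if constexpr` branch matters because `to_time_t` exists only on `std::chrono::system_clock`, so time points from other clocks fall back to formatting the duration since epoch. The following is a standard-library-only sketch of that dispatch. `describeSeconds` and `describeTimePoint` are hypothetical names, and the `"unix:"` output format is an assumption made for illustration; the real header formats wall-clock time through cctz.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <type_traits>

// Hypothetical helper: renders a number of seconds, e.g. "3.000000s".
inline std::string describeSeconds(double seconds)
{
    return std::to_string(seconds) + "s";
}

template <typename Clock, typename Duration>
std::string describeTimePoint(const std::chrono::time_point<Clock, Duration> & tp)
{
    if constexpr (std::is_same_v<Clock, std::chrono::system_clock>)
    {
        // Only system_clock can be converted to time_t.
        auto t = std::chrono::system_clock::to_time_t(
            std::chrono::time_point_cast<std::chrono::system_clock::duration>(tp));
        return "unix:" + std::to_string(static_cast<long long>(t));
    }
    else
    {
        // Other clocks (e.g. steady_clock) have no calendar meaning,
        // so report the elapsed time since that clock's epoch instead.
        std::chrono::duration<double> since_epoch = tp.time_since_epoch();
        return describeSeconds(since_epoch.count());
    }
}
```

Because the discarded branch is never instantiated, a `steady_clock` time point compiles without ever touching `to_time_t`.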

View File

@ -25,9 +25,10 @@
// We don't have libc struct available here.
// Compute aux vector manually (from /proc/self/auxv).
//
// Right now there is only 51 AT_* constants,
// so 64 should be enough until this implementation will be replaced with musl.
static unsigned long __auxv_procfs[64];
// Right now there are 51 AT_* constants. Custom kernels have been encountered
// making use of up to 71. 128 should be enough until this implementation is
// replaced with musl.
static unsigned long __auxv_procfs[128];
static unsigned long __auxv_secure = 0;
// Common
static unsigned long * __auxv_environ = NULL;

View File

@ -952,6 +952,8 @@ private:
static std::pair<LoggerMapIterator, bool> add(Logger * pLogger);
static std::optional<LoggerMapIterator> find(const std::string & name);
static Logger * findRawPtr(const std::string & name);
void unsafeSetChannel(Channel * pChannel);
Channel* unsafeGetChannel() const;
Logger();
Logger(const Logger &);

View File

@ -61,6 +61,13 @@ Logger::~Logger()
void Logger::setChannel(Channel* pChannel)
{
std::lock_guard<std::mutex> lock(getLoggerMutex());
unsafeSetChannel(pChannel);
}
void Logger::unsafeSetChannel(Channel* pChannel)
{
if (_pChannel) _pChannel->release();
_pChannel = pChannel;
@ -69,6 +76,14 @@ void Logger::setChannel(Channel* pChannel)
Channel* Logger::getChannel() const
{
std::lock_guard<std::mutex> lock(getLoggerMutex());
return unsafeGetChannel();
}
Channel* Logger::unsafeGetChannel() const
{
return _pChannel;
}
@ -89,7 +104,7 @@ void Logger::setLevel(const std::string& level)
void Logger::setProperty(const std::string& name, const std::string& value)
{
if (name == "channel")
setChannel(LoggingRegistry::defaultRegistry().channelForName(value));
unsafeSetChannel(LoggingRegistry::defaultRegistry().channelForName(value));
else if (name == "level")
setLevel(value);
else
@ -160,7 +175,7 @@ void Logger::setChannel(const std::string& name, Channel* pChannel)
if (len == 0 ||
(it.first.compare(0, len, name) == 0 && (it.first.length() == len || it.first[len] == '.')))
{
it.second.logger->setChannel(pChannel);
it.second.logger->unsafeSetChannel(pChannel);
}
}
}
@ -393,7 +408,7 @@ std::pair<Logger::LoggerMapIterator, bool> Logger::unsafeGet(const std::string&
else
{
Logger& par = parent(name);
logger = new Logger(name, par.getChannel(), par.getLevel());
logger = new Logger(name, par.unsafeGetChannel(), par.getLevel());
}
return add(logger);
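
Reviewer note on the `Logger` hunks above: the change splits each accessor into a public method that takes the logger mutex and a private `unsafe*` method that does not, presumably so that code paths already holding the mutex can avoid locking a non-recursive `std::mutex` twice. A minimal sketch of that pattern follows. `LoggerSketch`, `sharedMutex`, and `setChannelForBoth` are hypothetical names for illustration and are not part of Poco.

```cpp
#include <cassert>
#include <mutex>
#include <string>

class LoggerSketch
{
public:
    // Public entry points take the shared lock, then delegate.
    void setChannel(const std::string & channel)
    {
        std::lock_guard<std::mutex> lock(sharedMutex());
        unsafeSetChannel(channel);
    }

    std::string getChannel() const
    {
        std::lock_guard<std::mutex> lock(sharedMutex());
        return unsafeGetChannel();
    }

    // A compound operation locks once and uses only the unsafe helpers.
    // Calling a.setChannel() here would try to lock the same mutex again.
    static void setChannelForBoth(LoggerSketch & a, LoggerSketch & b, const std::string & channel)
    {
        std::lock_guard<std::mutex> lock(sharedMutex());
        a.unsafeSetChannel(channel);
        b.unsafeSetChannel(channel);
    }

private:
    // One mutex shared by all instances, standing in for getLoggerMutex().
    static std::mutex & sharedMutex()
    {
        static std::mutex m;
        return m;
    }

    // Callers must already hold sharedMutex().
    void unsafeSetChannel(const std::string & channel) { channel_ = channel; }
    std::string unsafeGetChannel() const { return channel_; }

    std::string channel_;
};
```

Keeping the `unsafe*` methods private, as the diff does, limits the no-lock variants to code that can be audited for holding the mutex.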

1
ci/README.md Normal file
View File

@ -0,0 +1 @@
Note: This directory is under active development for CI improvements and is not currently in use within the scope of the existing CI pipeline.

View File

@ -0,0 +1,109 @@
# docker build -t clickhouse/fasttest .
FROM ubuntu:22.04
# ARG for quick switch to a given ubuntu mirror
ARG apt_archive="http://archive.ubuntu.com"
RUN sed -i "s|http://archive.ubuntu.com|$apt_archive|g" /etc/apt/sources.list
ENV DEBIAN_FRONTEND=noninteractive LLVM_VERSION=18
RUN apt-get update \
&& apt-get install \
apt-transport-https \
apt-utils \
ca-certificates \
curl \
gnupg \
lsb-release \
wget \
git \
--yes --no-install-recommends --verbose-versions \
&& export LLVM_PUBKEY_HASH="bda960a8da687a275a2078d43c111d66b1c6a893a3275271beedf266c1ff4a0cdecb429c7a5cccf9f486ea7aa43fd27f" \
&& wget -nv -O /tmp/llvm-snapshot.gpg.key https://apt.llvm.org/llvm-snapshot.gpg.key \
&& echo "${LLVM_PUBKEY_HASH} /tmp/llvm-snapshot.gpg.key" | sha384sum -c \
&& apt-key add /tmp/llvm-snapshot.gpg.key \
&& export CODENAME="$(lsb_release --codename --short | tr 'A-Z' 'a-z')" \
&& echo "deb https://apt.llvm.org/${CODENAME}/ llvm-toolchain-${CODENAME}-${LLVM_VERSION} main" >> \
/etc/apt/sources.list \
&& apt-get update \
&& apt-get install --yes --no-install-recommends --verbose-versions llvm-${LLVM_VERSION} \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/* /var/cache/debconf /tmp/*
# moreutils - provides ts for FT
# expect, bzip2 - required by FT
# bsdmainutils - provides hexdump for FT
# nasm - nasm compiler for one of the submodules, required for normal build
# yasm - assembler for libhdfs3, required for normal build
RUN apt-get update \
&& apt-get install \
clang-${LLVM_VERSION} \
cmake \
libclang-${LLVM_VERSION}-dev \
libclang-rt-${LLVM_VERSION}-dev \
lld-${LLVM_VERSION} \
llvm-${LLVM_VERSION}-dev \
lsof \
ninja-build \
python3 \
python3-pip \
zstd \
moreutils \
expect \
bsdmainutils \
pv \
jq \
bzip2 \
nasm \
yasm \
--yes --no-install-recommends \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/* /var/cache/debconf /tmp/*
COPY --from=clickhouse/cctools:0d6b90a7a490 /opt/gdb /opt/gdb
# Give suid to gdb to grant it attach permissions
RUN chmod u+s /opt/gdb/bin/gdb
ENV PATH="/opt/gdb/bin:${PATH}"
# This symlink is required by gcc to find the lld linker
RUN ln -s /usr/bin/lld-${LLVM_VERSION} /usr/bin/ld.lld
# FIXME: workaround for "The imported target "merge-fdata" references the file" error
# https://salsa.debian.org/pkg-llvm-team/llvm-toolchain/-/commit/992e52c0b156a5ba9c6a8a54f8c4857ddd3d371d
RUN sed -i '/_IMPORT_CHECK_FILES_FOR_\(mlir-\|llvm-bolt\|merge-fdata\|MLIR\)/ {s|^|#|}' /usr/lib/llvm-${LLVM_VERSION}/lib/cmake/llvm/LLVMExports-*.cmake
# LLVM changes paths for compiler-rt libraries. For some reason clang-18.1.8 cannot pick up libraries from the default install path.
# It's a very dirty workaround; it would be better to build the compiler and LLVM ourselves and use that. Details: https://github.com/llvm/llvm-project/issues/95792
RUN test ! -d /usr/lib/llvm-18/lib/clang/18/lib/x86_64-pc-linux-gnu || ln -s /usr/lib/llvm-18/lib/clang/18/lib/x86_64-pc-linux-gnu /usr/lib/llvm-18/lib/clang/18/lib/x86_64-unknown-linux-gnu
ARG TARGETARCH
ARG SCCACHE_VERSION=v0.7.7
ENV SCCACHE_IGNORE_SERVER_IO_ERROR=1
# sccache requires a value for the region. So by default we use The Default Region
ENV SCCACHE_REGION=us-east-1
RUN arch=${TARGETARCH} \
&& case $arch in \
amd64) rarch=x86_64 ;; \
arm64) rarch=aarch64 ;; \
esac \
&& curl -Ls "https://github.com/mozilla/sccache/releases/download/$SCCACHE_VERSION/sccache-$SCCACHE_VERSION-$rarch-unknown-linux-musl.tar.gz" | \
tar xz -C /tmp \
&& mv "/tmp/sccache-$SCCACHE_VERSION-$rarch-unknown-linux-musl/sccache" /usr/bin \
&& rm "/tmp/sccache-$SCCACHE_VERSION-$rarch-unknown-linux-musl" -r
COPY requirements.txt /
RUN pip3 install --no-cache-dir -r /requirements.txt
# chmod 777 to make the container user independent
RUN mkdir -p /var/lib/clickhouse \
&& chmod 777 /var/lib/clickhouse
ENV TZ=Europe/Amsterdam
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
RUN groupadd --system --gid 1000 clickhouse \
&& useradd --system --gid 1000 --uid 1000 -m clickhouse \
&& mkdir -p /.cache/sccache && chmod 777 /.cache/sccache
ENV PYTHONPATH="/wd"
ENV PYTHONUNBUFFERED=1

View File

@ -0,0 +1,6 @@
Jinja2==3.1.3
numpy==1.26.4
requests==2.32.3
pandas==1.5.3
scipy==1.12.0
#https://clickhouse-builds.s3.amazonaws.com/packages/praktika-0.1-py3-none-any.whl

View File

@ -0,0 +1,5 @@
requests==2.32.3
yamllint==1.26.3
codespell==2.2.1
#use praktika from CH repo
#https://clickhouse-builds.s3.amazonaws.com/packages/praktika-0.1-py3-none-any.whl

102
ci/jobs/build_clickhouse.py Normal file
View File

@ -0,0 +1,102 @@
import argparse
from praktika.result import Result
from praktika.settings import Settings
from praktika.utils import MetaClasses, Shell, Utils
class JobStages(metaclass=MetaClasses.WithIter):
CHECKOUT_SUBMODULES = "checkout"
CMAKE = "cmake"
BUILD = "build"
def parse_args():
parser = argparse.ArgumentParser(description="ClickHouse Build Job")
parser.add_argument("BUILD_TYPE", help="Type: <amd|arm_debug|release_sanitizer>")
parser.add_argument("--param", help="Optional custom job start stage", default=None)
return parser.parse_args()
def main():
args = parse_args()
stop_watch = Utils.Stopwatch()
stages = list(JobStages)
stage = args.param or JobStages.CHECKOUT_SUBMODULES
if stage:
assert stage in JobStages, f"--param must be one of [{list(JobStages)}]"
print(f"Job will start from stage [{stage}]")
while stage in stages:
stages.pop(0)
stages.insert(0, stage)
cmake_build_type = "Release"
sanitizer = ""
if "debug" in args.BUILD_TYPE.lower():
print("Build type set: debug")
cmake_build_type = "Debug"
if "asan" in args.BUILD_TYPE.lower():
print("Sanitizer set: address")
sanitizer = "address"
# if Environment.is_local_run():
# build_cache_type = "disabled"
# else:
build_cache_type = "sccache"
current_directory = Utils.cwd()
build_dir = f"{Settings.TEMP_DIR}/build"
res = True
results = []
if res and JobStages.CHECKOUT_SUBMODULES in stages:
Shell.check(f"rm -rf {build_dir} && mkdir -p {build_dir}")
results.append(
Result.create_from_command_execution(
name="Checkout Submodules",
command=f"git submodule sync --recursive && git submodule init && git submodule update --depth 1 --recursive --jobs {min([Utils.cpu_count(), 20])}",
)
)
res = results[-1].is_ok()
if res and JobStages.CMAKE in stages:
results.append(
Result.create_from_command_execution(
name="Cmake configuration",
command=f"cmake --debug-trycompile -DCMAKE_VERBOSE_MAKEFILE=1 -LA -DCMAKE_BUILD_TYPE={cmake_build_type} \
-DSANITIZE={sanitizer} -DENABLE_CHECK_HEAVY_BUILDS=1 -DENABLE_CLICKHOUSE_SELF_EXTRACTING=1 -DENABLE_TESTS=0 \
-DENABLE_UTILS=0 -DCMAKE_FIND_PACKAGE_NO_PACKAGE_REGISTRY=ON -DCMAKE_INSTALL_PREFIX=/usr \
-DCMAKE_INSTALL_SYSCONFDIR=/etc -DCMAKE_INSTALL_LOCALSTATEDIR=/var -DCMAKE_SKIP_INSTALL_ALL_DEPENDENCY=ON \
-DCMAKE_C_COMPILER=clang-18 -DCMAKE_CXX_COMPILER=clang++-18 -DCOMPILER_CACHE={build_cache_type} -DENABLE_TESTS=1 \
-DENABLE_BUILD_PROFILING=1 {current_directory}",
workdir=build_dir,
with_log=True,
)
)
res = results[-1].is_ok()
if res and JobStages.BUILD in stages:
Shell.check("sccache --show-stats")
results.append(
Result.create_from_command_execution(
name="Build ClickHouse",
command="ninja clickhouse-bundle clickhouse-odbc-bridge clickhouse-library-bridge",
workdir=build_dir,
with_log=True,
)
)
Shell.check("sccache --show-stats")
Shell.check(f"ls -l {build_dir}/programs/")
res = results[-1].is_ok()
Result.create_from(results=results, stopwatch=stop_watch).finish_job_accordingly()
if __name__ == "__main__":
main()

View File

@ -2,7 +2,6 @@ import math
import multiprocessing
import os
import re
import sys
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path
@ -51,25 +50,6 @@ def run_check_concurrent(check_name, check_function, files, nproc=NPROC):
return result
def run_simple_check(check_name, check_function, **kwargs):
stop_watch = Utils.Stopwatch()
error = check_function(**kwargs)
result = Result(
name=check_name,
status=Result.Status.SUCCESS if not error else Result.Status.FAILED,
start_time=stop_watch.start_time,
duration=stop_watch.duration,
info=error,
)
return result
def run_check(check_name, check_function, files):
return run_check_concurrent(check_name, check_function, files, nproc=1)
def check_duplicate_includes(file_path):
includes = []
with open(file_path, "r", encoding="utf-8", errors="ignore") as f:
@ -88,7 +68,7 @@ def check_duplicate_includes(file_path):
def check_whitespaces(file_paths):
for file in file_paths:
exit_code, out, err = Shell.get_res_stdout_stderr(
f'./ci_v2/jobs/scripts/check_style/double_whitespaces.pl "{file}"',
f'./ci/jobs/scripts/check_style/double_whitespaces.pl "{file}"',
verbose=False,
)
if out or err:
@ -117,7 +97,7 @@ def check_xmllint(file_paths):
def check_functional_test_cases(files):
"""
Queries with event_date should have yesterday() not today()
NOTE: it is not that accuate, but at least something.
NOTE: it is not that accurate, but at least something.
"""
patterns = [
@ -194,7 +174,7 @@ def check_broken_links(path, exclude_paths):
def check_cpp_code():
res, out, err = Shell.get_res_stdout_stderr(
"./ci_v2/jobs/scripts/check_style/check_cpp.sh"
"./ci/jobs/scripts/check_style/check_cpp.sh"
)
if err:
out += err
@ -203,7 +183,7 @@ def check_cpp_code():
def check_repo_submodules():
res, out, err = Shell.get_res_stdout_stderr(
"./ci_v2/jobs/scripts/check_style/check_submodules.sh"
"./ci/jobs/scripts/check_style/check_submodules.sh"
)
if err:
out += err
@ -212,7 +192,7 @@ def check_repo_submodules():
def check_other():
res, out, err = Shell.get_res_stdout_stderr(
"./ci_v2/jobs/scripts/check_style/checks_to_refactor.sh"
"./ci/jobs/scripts/check_style/checks_to_refactor.sh"
)
if err:
out += err
@ -221,7 +201,7 @@ def check_other():
def check_codespell():
res, out, err = Shell.get_res_stdout_stderr(
"./ci_v2/jobs/scripts/check_style/check_typos.sh"
"./ci/jobs/scripts/check_style/check_typos.sh"
)
if err:
out += err
@ -230,7 +210,7 @@ def check_codespell():
def check_aspell():
res, out, err = Shell.get_res_stdout_stderr(
"./ci_v2/jobs/scripts/check_style/check_aspell.sh"
"./ci/jobs/scripts/check_style/check_aspell.sh"
)
if err:
out += err
@ -239,7 +219,7 @@ def check_aspell():
def check_mypy():
res, out, err = Shell.get_res_stdout_stderr(
"./ci_v2/jobs/scripts/check_style/check-mypy"
"./ci/jobs/scripts/check_style/check-mypy"
)
if err:
out += err
@ -248,7 +228,7 @@ def check_mypy():
def check_pylint():
res, out, err = Shell.get_res_stdout_stderr(
"./ci_v2/jobs/scripts/check_style/check-pylint"
"./ci/jobs/scripts/check_style/check-pylint"
)
if err:
out += err
@ -345,66 +325,58 @@ if __name__ == "__main__":
)
)
results.append(
run_check(
check_name="Check Tests Numbers",
check_function=check_gaps_in_tests_numbers,
files=functional_test_files,
Result.create_from_command_execution(
name="Check Tests Numbers",
command=check_gaps_in_tests_numbers,
command_args=[functional_test_files],
)
)
results.append(
run_simple_check(
check_name="Check Broken Symlinks",
check_function=check_broken_links,
path="./",
exclude_paths=["contrib/", "metadata/", "programs/server/data"],
Result.create_from_command_execution(
name="Check Broken Symlinks",
command=check_broken_links,
command_kwargs={
"path": "./",
"exclude_paths": ["contrib/", "metadata/", "programs/server/data"],
},
)
)
results.append(
run_simple_check(
check_name="Check CPP code",
check_function=check_cpp_code,
Result.create_from_command_execution(
name="Check CPP code",
command=check_cpp_code,
)
)
results.append(
run_simple_check(
check_name="Check Submodules",
check_function=check_repo_submodules,
Result.create_from_command_execution(
name="Check Submodules",
command=check_repo_submodules,
)
)
results.append(
run_check(
check_name="Check File Names",
check_function=check_file_names,
files=all_files,
Result.create_from_command_execution(
name="Check File Names",
command=check_file_names,
command_args=[all_files],
)
)
results.append(
run_simple_check(
check_name="Check Many Different Things",
check_function=check_other,
Result.create_from_command_execution(
name="Check Many Different Things",
command=check_other,
)
)
results.append(
run_simple_check(
check_name="Check Codespell",
check_function=check_codespell,
Result.create_from_command_execution(
name="Check Codespell",
command=check_codespell,
)
)
results.append(
run_simple_check(
check_name="Check Aspell",
check_function=check_aspell,
Result.create_from_command_execution(
name="Check Aspell",
command=check_aspell,
)
)
res = Result.create_from(results=results, stopwatch=stop_watch).dump()
if not res.is_ok():
print("Style check: failed")
for result in results:
if not result.is_ok():
print("Failed check:")
print(" | ", result)
sys.exit(1)
else:
print("Style check: ok")
Result.create_from(results=results, stopwatch=stop_watch).finish_job_accordingly()
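Reviewer note: the hunk above replaces `run_check`/`run_simple_check` with a single entry point that takes a callable plus optional `command_args` and `command_kwargs`. A minimal sketch of that calling pattern follows. It is not praktika's implementation: `CheckResult` and `run_named_check` are hypothetical names, and the convention that a check returns an error string (empty on success) is an assumption made for this example.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class CheckResult:
    """Hypothetical stand-in for a result object; not praktika's Result."""

    name: str
    ok: bool
    duration: float
    info: str = ""


def run_named_check(
    name: str,
    command: Callable[..., Any],
    command_args: Optional[List[Any]] = None,
    command_kwargs: Optional[Dict[str, Any]] = None,
) -> CheckResult:
    """Call `command` with the given arguments and wrap the outcome.

    Assumption: the callable returns an error string, empty when the check passes.
    """
    start = time.monotonic()
    try:
        output = command(*(command_args or []), **(command_kwargs or {}))
        ok, info = not output, str(output or "")
    except Exception as e:  # a crashing check counts as a failed check
        ok, info = False, f"exception: {e}"
    return CheckResult(name, ok, time.monotonic() - start, info)
```

With one dispatcher, positional checks (`command_args=[all_files]`) and keyword checks (`command_kwargs={"path": "./", ...}`) share the same timing and error handling.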

ci/jobs/fast_test.py Normal file

@ -0,0 +1,337 @@
import argparse
import threading
from pathlib import Path
from praktika.result import Result
from praktika.settings import Settings
from praktika.utils import MetaClasses, Shell, Utils
from ci.jobs.scripts.functional_tests_results import FTResultsProcessor
class ClickHouseProc:
def __init__(self):
self.ch_config_dir = f"{Settings.TEMP_DIR}/etc/clickhouse-server"
self.pid_file = f"{self.ch_config_dir}/clickhouse-server.pid"
self.config_file = f"{self.ch_config_dir}/config.xml"
self.user_files_path = f"{self.ch_config_dir}/user_files"
self.test_output_file = f"{Settings.OUTPUT_DIR}/test_result.txt"
self.command = f"clickhouse-server --config-file {self.config_file} --pid-file {self.pid_file} -- --path {self.ch_config_dir} --user_files_path {self.user_files_path} --top_level_domains_path {self.ch_config_dir}/top_level_domains --keeper_server.storage_path {self.ch_config_dir}/coordination"
self.proc = None
self.pid = 0
nproc = int(Utils.cpu_count() / 2)
self.fast_test_command = f"clickhouse-test --hung-check --fast-tests-only --no-random-settings --no-random-merge-tree-settings --no-long --testname --shard --zookeeper --check-zookeeper-session --order random --print-time --report-logs-stats --jobs {nproc} -- '' | ts '%Y-%m-%d %H:%M:%S' \
| tee -a \"{self.test_output_file}\""
# TODO: store info in case of failure
self.info = ""
self.info_file = ""
Utils.set_env("CLICKHOUSE_CONFIG_DIR", self.ch_config_dir)
Utils.set_env("CLICKHOUSE_CONFIG", self.config_file)
Utils.set_env("CLICKHOUSE_USER_FILES", self.user_files_path)
Utils.set_env("CLICKHOUSE_SCHEMA_FILES", f"{self.ch_config_dir}/format_schemas")
def start(self):
print("Starting ClickHouse server")
Shell.check(f"rm {self.pid_file}")
def run_clickhouse():
self.proc = Shell.run_async(
self.command, verbose=True, suppress_output=True
)
thread = threading.Thread(target=run_clickhouse)
thread.daemon = True # Allow program to exit even if thread is still running
thread.start()
# self.proc = Shell.run_async(self.command, verbose=True)
started = False
try:
for _ in range(5):
pid = Shell.get_output(f"cat {self.pid_file}").strip()
if not pid:
Utils.sleep(1)
continue
started = True
print(f"Got pid from fs [{pid}]")
_ = int(pid)
break
except Exception:
pass
if not started:
stdout = self.proc.stdout.read().strip() if self.proc.stdout else ""
stderr = self.proc.stderr.read().strip() if self.proc.stderr else ""
Utils.print_formatted_error("Failed to start ClickHouse", stdout, stderr)
return False
print(f"ClickHouse server started successfully, pid [{pid}]")
return True
    def wait_ready(self):
        res, out, err = 0, "", ""
        attempts = 30
        delay = 2
        for attempt in range(attempts):
            res, out, err = Shell.get_res_stdout_stderr(
                'clickhouse-client --query "select 1"', verbose=True
            )
            if out.strip() == "1":
                print("Server ready")
                break
            else:
                print("Server not ready, wait")
                Utils.sleep(delay)
        else:
            # for/else: reached only when no attempt hit `break`
            Utils.print_formatted_error(
                f"Server not ready after [{attempts*delay}s]", out, err
            )
            return False
        return True
def run_fast_test(self):
if Path(self.test_output_file).exists():
Path(self.test_output_file).unlink()
exit_code = Shell.run(self.fast_test_command)
return exit_code == 0
    def terminate(self):
        print("Terminate ClickHouse process")
        timeout = 10
        if self.proc:
            Utils.terminate_process_group(self.proc.pid)

            self.proc.terminate()
            try:
                self.proc.wait(timeout=timeout)
                print(f"Process {self.proc.pid} terminated gracefully.")
            except Exception:
                print(
                    f"Process {self.proc.pid} did not terminate in {timeout} seconds, killing it..."
                )
                Utils.terminate_process_group(self.proc.pid, force=True)
                self.proc.wait()  # Wait for the process to be fully killed
                print(f"Process {self.proc.pid} was killed.")
def clone_submodules():
submodules_to_update = [
"contrib/sysroot",
"contrib/magic_enum",
"contrib/abseil-cpp",
"contrib/boost",
"contrib/zlib-ng",
"contrib/libxml2",
"contrib/libunwind",
"contrib/fmtlib",
"contrib/aklomp-base64",
"contrib/cctz",
"contrib/libcpuid",
"contrib/libdivide",
"contrib/double-conversion",
"contrib/llvm-project",
"contrib/lz4",
"contrib/zstd",
"contrib/fastops",
"contrib/rapidjson",
"contrib/re2",
"contrib/sparsehash-c11",
"contrib/croaring",
"contrib/miniselect",
"contrib/xz",
"contrib/dragonbox",
"contrib/fast_float",
"contrib/NuRaft",
"contrib/jemalloc",
"contrib/replxx",
"contrib/wyhash",
"contrib/c-ares",
"contrib/morton-nd",
"contrib/xxHash",
"contrib/expected",
"contrib/simdjson",
"contrib/liburing",
"contrib/libfiu",
"contrib/incbin",
"contrib/yaml-cpp",
]
res = Shell.check("git submodule sync", verbose=True, strict=True)
res = res and Shell.check("git submodule init", verbose=True, strict=True)
res = res and Shell.check(
command=f"xargs --max-procs={min([Utils.cpu_count(), 20])} --null --no-run-if-empty --max-args=1 git submodule update --depth 1 --single-branch",
stdin_str="\0".join(submodules_to_update) + "\0",
timeout=120,
retries=3,
verbose=True,
)
res = res and Shell.check("git submodule foreach git reset --hard", verbose=True)
res = res and Shell.check("git submodule foreach git checkout @ -f", verbose=True)
res = res and Shell.check("git submodule foreach git clean -xfd", verbose=True)
return res
def update_path_ch_config(config_file_path=""):
print("Updating path in clickhouse config")
config_file_path = (
config_file_path or f"{Settings.TEMP_DIR}/etc/clickhouse-server/config.xml"
)
ssl_config_file_path = (
f"{Settings.TEMP_DIR}/etc/clickhouse-server/config.d/ssl_certs.xml"
)
try:
with open(config_file_path, "r", encoding="utf-8") as file:
content = file.read()
with open(ssl_config_file_path, "r", encoding="utf-8") as file:
ssl_config_content = file.read()
content = content.replace(">/var/", f">{Settings.TEMP_DIR}/var/")
content = content.replace(">/etc/", f">{Settings.TEMP_DIR}/etc/")
ssl_config_content = ssl_config_content.replace(
">/etc/", f">{Settings.TEMP_DIR}/etc/"
)
with open(config_file_path, "w", encoding="utf-8") as file:
file.write(content)
with open(ssl_config_file_path, "w", encoding="utf-8") as file:
file.write(ssl_config_content)
except Exception as e:
print(f"ERROR: failed to update config, exception: {e}")
return False
return True
class JobStages(metaclass=MetaClasses.WithIter):
    CHECKOUT_SUBMODULES = "checkout"
    CMAKE = "cmake"
    BUILD = "build"
    CONFIG = "config"
    TEST = "test"


def parse_args():
    parser = argparse.ArgumentParser(description="ClickHouse Fast Test Job")
    parser.add_argument("--param", help="Optional custom job start stage", default=None)
    return parser.parse_args()
def main():
args = parse_args()
stop_watch = Utils.Stopwatch()
stages = list(JobStages)
stage = args.param or JobStages.CHECKOUT_SUBMODULES
if stage:
assert stage in JobStages, f"--param must be one of [{list(JobStages)}]"
print(f"Job will start from stage [{stage}]")
while stage in stages:
stages.pop(0)
stages.insert(0, stage)
current_directory = Utils.cwd()
build_dir = f"{Settings.TEMP_DIR}/build"
Utils.add_to_PATH(f"{build_dir}/programs:{current_directory}/tests")
res = True
results = []
if res and JobStages.CHECKOUT_SUBMODULES in stages:
Shell.check(f"rm -rf {build_dir} && mkdir -p {build_dir}")
results.append(
Result.create_from_command_execution(
name="Checkout Submodules for Minimal Build",
command=clone_submodules,
)
)
res = results[-1].is_ok()
if res and JobStages.CMAKE in stages:
results.append(
Result.create_from_command_execution(
name="Cmake configuration",
command=f"cmake {current_directory} -DCMAKE_CXX_COMPILER=clang++-18 -DCMAKE_C_COMPILER=clang-18 \
-DCMAKE_TOOLCHAIN_FILE={current_directory}/cmake/linux/toolchain-x86_64-musl.cmake -DENABLE_LIBRARIES=0 \
-DENABLE_TESTS=0 -DENABLE_UTILS=0 -DENABLE_THINLTO=0 -DENABLE_NURAFT=1 -DENABLE_SIMDJSON=1 \
-DENABLE_JEMALLOC=1 -DENABLE_LIBURING=1 -DENABLE_YAML_CPP=1 -DCOMPILER_CACHE=sccache",
workdir=build_dir,
with_log=True,
)
)
res = results[-1].is_ok()
if res and JobStages.BUILD in stages:
Shell.check("sccache --show-stats")
results.append(
Result.create_from_command_execution(
name="Build ClickHouse",
command="ninja clickhouse-bundle clickhouse-stripped",
workdir=build_dir,
with_log=True,
)
)
Shell.check("sccache --show-stats")
res = results[-1].is_ok()
if res and JobStages.BUILD in stages:
commands = [
f"mkdir -p {Settings.OUTPUT_DIR}/binaries",
f"cp ./programs/clickhouse {Settings.OUTPUT_DIR}/binaries/clickhouse",
f"zstd --threads=0 --force programs/clickhouse-stripped -o {Settings.OUTPUT_DIR}/binaries/clickhouse-stripped.zst",
"sccache --show-stats",
"clickhouse-client --version",
"clickhouse-test --help",
]
results.append(
Result.create_from_command_execution(
name="Check and Compress binary",
command=commands,
workdir=build_dir,
with_log=True,
)
)
res = results[-1].is_ok()
if res and JobStages.CONFIG in stages:
commands = [
f"rm -rf {Settings.TEMP_DIR}/etc/ && mkdir -p {Settings.TEMP_DIR}/etc/clickhouse-client {Settings.TEMP_DIR}/etc/clickhouse-server",
f"cp {current_directory}/programs/server/config.xml {current_directory}/programs/server/users.xml {Settings.TEMP_DIR}/etc/clickhouse-server/",
f"{current_directory}/tests/config/install.sh {Settings.TEMP_DIR}/etc/clickhouse-server {Settings.TEMP_DIR}/etc/clickhouse-client",
# f"cp -a {current_directory}/programs/server/config.d/log_to_console.xml {Settings.TEMP_DIR}/etc/clickhouse-server/config.d/",
f"rm -f {Settings.TEMP_DIR}/etc/clickhouse-server/config.d/secure_ports.xml",
update_path_ch_config,
]
results.append(
Result.create_from_command_execution(
name="Install ClickHouse Config",
command=commands,
with_log=True,
)
)
res = results[-1].is_ok()
CH = ClickHouseProc()
if res and JobStages.TEST in stages:
stop_watch_ = Utils.Stopwatch()
step_name = "Start ClickHouse Server"
print(step_name)
res = CH.start()
res = res and CH.wait_ready()
results.append(
Result.create_from(name=step_name, status=res, stopwatch=stop_watch_)
)
if res and JobStages.TEST in stages:
step_name = "Tests"
print(step_name)
res = res and CH.run_fast_test()
if res:
results.append(FTResultsProcessor(wd=Settings.OUTPUT_DIR).run())
CH.terminate()
Result.create_from(results=results, stopwatch=stop_watch).finish_job_accordingly()
if __name__ == "__main__":
main()
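Reviewer note: `main()` trims the stage list with a `while stage in stages: stages.pop(0)` loop followed by `stages.insert(0, stage)`, so the job resumes from the stage passed via `--param`. The sketch below shows the same selection as a slice. `STAGES` mirrors the `JobStages` values from the file; the helper name `stages_from` is hypothetical.

```python
from typing import List, Optional

# Same order as JobStages in fast_test.py
STAGES = ["checkout", "cmake", "build", "config", "test"]


def stages_from(start: Optional[str], all_stages: Optional[List[str]] = None) -> List[str]:
    """Return the stages to run, beginning at `start` (default: the first stage)."""
    all_stages = STAGES if all_stages is None else all_stages
    stage = start or all_stages[0]
    if stage not in all_stages:
        raise ValueError(f"--param must be one of {all_stages}")
    # Equivalent to popping from the front until `stage` is gone, then re-inserting it
    return all_stages[all_stages.index(stage):]
```

For example, `stages_from("build")` skips the checkout and cmake steps, which is what makes a local rerun with `--param build` cheap.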


@ -14,7 +14,8 @@
LC_ALL="en_US.UTF-8"
ROOT_PATH="."
EXCLUDE_DIRS='build/|integration/|widechar_width/|glibc-compatibility/|poco/|memcpy/|consistent-hashing|benchmark|tests/.*.cpp|utils/keeper-bench/example.yaml'
EXCLUDE='build/|integration/|widechar_width/|glibc-compatibility/|poco/|memcpy/|consistent-hashing|benchmark|tests/.*.cpp|utils/keeper-bench/example.yaml'
EXCLUDE_DOCS='Settings\.cpp|FormatFactorySettingsDeclaration\.h'
# From [1]:
# But since array_to_string_internal() in array.c still loops over array
@ -31,7 +32,8 @@ function in_array()
}
find $ROOT_PATH/{src,base,programs,utils} -name '*.h' -or -name '*.cpp' 2>/dev/null |
grep -vP $EXCLUDE_DIRS |
grep -vP $EXCLUDE |
grep -vP $EXCLUDE_DOCS |
xargs grep $@ -P '((class|struct|namespace|enum|if|for|while|else|throw|switch).*|\)(\s*const)?(\s*override)?\s*)\{$|\s$|^ {1,3}[^\* ]\S|\t|^\s*(if|else if|if constexpr|else if constexpr|for|while|catch|switch)\(|\( [^\s\\]|\S \)' |
# a curly brace not in a new line, but not for the case of C++11 init or agg. initialization | trailing whitespace | number of ws not a multiple of 4, but not in the case of comment continuation | missing whitespace after for/if/while... before opening brace | whitespaces inside braces
grep -v -P '(//|:\s+\*|\$\(\()| \)"'
@ -39,39 +41,19 @@ find $ROOT_PATH/{src,base,programs,utils} -name '*.h' -or -name '*.cpp' 2>/dev/n
# Tabs
find $ROOT_PATH/{src,base,programs,utils} -name '*.h' -or -name '*.cpp' 2>/dev/null |
grep -vP $EXCLUDE_DIRS |
xargs grep $@ -F $'\t'
grep -vP $EXCLUDE |
xargs grep $@ -F $'\t' && echo '^ tabs are not allowed'
# // namespace comments are unneeded
find $ROOT_PATH/{src,base,programs,utils} -name '*.h' -or -name '*.cpp' 2>/dev/null |
grep -vP $EXCLUDE_DIRS |
grep -vP $EXCLUDE |
xargs grep $@ -P '}\s*//+\s*namespace\s*'
# Broken symlinks
find -L $ROOT_PATH -type l 2>/dev/null | grep -v contrib && echo "^ Broken symlinks found"
# Duplicated or incorrect setting declarations
SETTINGS_FILE=$(mktemp)
cat $ROOT_PATH/src/Core/Settings.cpp $ROOT_PATH/src/Core/FormatFactorySettingsDeclaration.h | grep "M(" | awk '{print substr($2, 0, length($2) - 1) " " substr($1, 3, length($1) - 3) " SettingsDeclaration" }' > ${SETTINGS_FILE}
find $ROOT_PATH/{src,base,programs,utils} -name '*.h' -or -name '*.cpp' | xargs grep "extern const Settings" -T | awk '{print substr($5, 0, length($5) -1) " " substr($4, 9) " " substr($1, 0, length($1) - 1)}' >> ${SETTINGS_FILE}
# Duplicate extern declarations for settings
awk '{if (seen[$0]++) print $3 " -> " $1 ;}' ${SETTINGS_FILE} | while read line;
do
echo "Found duplicated setting declaration in: $line"
done
# Incorrect declarations for settings
for setting in $(awk '{print $1 " " $2}' ${SETTINGS_FILE} | sort | uniq | awk '{ print $1 }' | sort | uniq -d);
do
expected=$(grep "^$setting " ${SETTINGS_FILE} | grep SettingsDeclaration | awk '{ print $2 }')
grep "^$setting " ${SETTINGS_FILE} | grep -v " $expected" | awk '{ print $3 " found setting " $1 " with type " $2 }' | while read line;
do
echo "In $line but it should be $expected"
done
done
rm ${SETTINGS_FILE}
bash $ROOT_PATH/utils/check-style/check-settings-style
# Unused/Undefined/Duplicates ErrorCodes/ProfileEvents/CurrentMetrics
declare -A EXTERN_TYPES
@ -91,12 +73,14 @@ EXTERN_TYPES_EXCLUDES=(
ProfileEvents::Timer
ProfileEvents::Type
ProfileEvents::TypeEnum
ProfileEvents::ValueType
ProfileEvents::dumpToMapColumn
ProfileEvents::getProfileEvents
ProfileEvents::ThreadIdToCountersSnapshot
ProfileEvents::LOCAL_NAME
ProfileEvents::keeper_profile_events
ProfileEvents::CountersIncrement
ProfileEvents::size
CurrentMetrics::add
CurrentMetrics::sub
@ -108,6 +92,7 @@ EXTERN_TYPES_EXCLUDES=(
CurrentMetrics::values
CurrentMetrics::Value
CurrentMetrics::keeper_metrics
CurrentMetrics::size
ErrorCodes::ErrorCode
ErrorCodes::getName
@ -130,7 +115,7 @@ for extern_type in ${!EXTERN_TYPES[@]}; do
# and this matches with zkutil::CreateMode
grep -v -e 'src/Common/ZooKeeper/Types.h' -e 'src/Coordination/KeeperConstants.cpp'
} | {
grep -vP $EXCLUDE_DIRS | xargs grep -l -P "extern const $type_of_extern $allowed_chars"
grep -vP $EXCLUDE | xargs grep -l -P "extern const $type_of_extern $allowed_chars"
} | while read file; do
grep -P "extern const $type_of_extern $allowed_chars;" $file | sed -r -e "s/^.*?extern const $type_of_extern ($allowed_chars);.*?$/\1/" | while read val; do
if ! grep -q "$extern_type::$val" $file; then
@ -148,7 +133,7 @@ for extern_type in ${!EXTERN_TYPES[@]}; do
# sed -i -r "0,/(\s*)extern const $type_of_extern [$allowed_chars]+/s//\1extern const $type_of_extern $val;\n&/" $file || \
# awk '{ print; if (ns == 1) { ns = 2 }; if (ns == 2) { ns = 0; print "namespace $extern_type\n{\n extern const $type_of_extern '$val';\n}" } }; /namespace DB/ { ns = 1; };' < $file > ${file}.tmp && mv ${file}.tmp $file )
find $ROOT_PATH/{src,base,programs,utils} -name '*.h' -or -name '*.cpp' | {
grep -vP $EXCLUDE_DIRS | xargs grep -l -P "$extern_type::$allowed_chars"
grep -vP $EXCLUDE | xargs grep -l -P "$extern_type::$allowed_chars"
} | while read file; do
grep -P "$extern_type::$allowed_chars" $file | grep -P -v '^\s*//' | sed -r -e "s/^.*?$extern_type::($allowed_chars).*?$/\1/" | while read val; do
if ! grep -q "extern const $type_of_extern $val" $file; then
@ -161,7 +146,7 @@ for extern_type in ${!EXTERN_TYPES[@]}; do
# Duplicates
find $ROOT_PATH/{src,base,programs,utils} -name '*.h' -or -name '*.cpp' | {
grep -vP $EXCLUDE_DIRS | xargs grep -l -P "$extern_type::$allowed_chars"
grep -vP $EXCLUDE | xargs grep -l -P "$extern_type::$allowed_chars"
} | while read file; do
grep -P "extern const $type_of_extern $allowed_chars;" $file | sort | uniq -c | grep -v -P ' +1 ' && echo "Duplicate $extern_type in file $file"
done
@ -169,32 +154,32 @@ done
# Three or more consecutive empty lines
find $ROOT_PATH/{src,base,programs,utils} -name '*.h' -or -name '*.cpp' |
grep -vP $EXCLUDE_DIRS |
grep -vP $EXCLUDE |
while read file; do awk '/^$/ { ++i; if (i > 2) { print "More than two consecutive empty lines in file '$file'" } } /./ { i = 0 }' $file; done
# Check that every header file has #pragma once in first line
find $ROOT_PATH/{src,programs,utils} -name '*.h' |
grep -vP $EXCLUDE_DIRS |
grep -vP $EXCLUDE |
while read file; do [[ $(head -n1 $file) != '#pragma once' ]] && echo "File $file must have '#pragma once' in first line"; done
# Too many exclamation marks
find $ROOT_PATH/{src,base,programs,utils} -name '*.h' -or -name '*.cpp' |
grep -vP $EXCLUDE_DIRS |
grep -vP $EXCLUDE |
xargs grep -F '!!!' | grep -P '.' && echo "Too many exclamation marks (looks dirty, unconfident)."
# Exclamation mark in a message
find $ROOT_PATH/{src,base,programs,utils} -name '*.h' -or -name '*.cpp' |
grep -vP $EXCLUDE_DIRS |
grep -vP $EXCLUDE |
xargs grep -F '!",' | grep -P '.' && echo "No need for an exclamation mark (looks dirty, unconfident)."
# Trailing whitespaces
find $ROOT_PATH/{src,base,programs,utils} -name '*.h' -or -name '*.cpp' |
grep -vP $EXCLUDE_DIRS |
grep -vP $EXCLUDE |
xargs grep -n -P ' $' | grep -n -P '.' && echo "^ Trailing whitespaces."
# Forbid stringstream because it's easy to use them incorrectly and hard to debug possible issues
find $ROOT_PATH/{src,programs,utils} -name '*.h' -or -name '*.cpp' |
grep -vP $EXCLUDE_DIRS |
grep -vP $EXCLUDE |
xargs grep -P 'std::[io]?stringstream' | grep -v "STYLE_CHECK_ALLOW_STD_STRING_STREAM" && echo "Use WriteBufferFromOwnString or ReadBufferFromString instead of std::stringstream"
# Forbid std::cerr/std::cout in src (fine in programs/utils)
@ -204,6 +189,7 @@ std_cerr_cout_excludes=(
_fuzzer
# OK
src/Common/ProgressIndication.cpp
src/Common/ProgressTable.cpp
# only under #ifdef DBMS_HASH_MAP_DEBUG_RESIZES, that is used only in tests
src/Common/HashTable/HashTable.h
# SensitiveDataMasker::printStats()
@ -230,11 +216,10 @@ std_cerr_cout_excludes=(
)
sources_with_std_cerr_cout=( $(
find $ROOT_PATH/{src,base} -name '*.h' -or -name '*.cpp' | \
grep -vP $EXCLUDE_DIRS | \
grep -vP $EXCLUDE | \
grep -F -v $(printf -- "-e %s " "${std_cerr_cout_excludes[@]}") | \
xargs grep -F --with-filename -e std::cerr -e std::cout | cut -d: -f1 | sort -u
) )
# Exclude comments
for src in "${sources_with_std_cerr_cout[@]}"; do
# suppress stderr, since it may contain warning for #pragma once in headers
@ -279,23 +264,23 @@ fi
# Forbid std::filesystem::is_symlink and std::filesystem::read_symlink, because it's easy to use them incorrectly
find $ROOT_PATH/{src,programs,utils} -name '*.h' -or -name '*.cpp' |
grep -vP $EXCLUDE_DIRS |
grep -vP $EXCLUDE |
xargs grep -P '::(is|read)_symlink' | grep -v "STYLE_CHECK_ALLOW_STD_FS_SYMLINK" && echo "Use DB::FS::isSymlink and DB::FS::readSymlink instead"
# Forbid __builtin_unreachable(), because it's hard to debug when it becomes reachable
find $ROOT_PATH/{src,programs,utils} -name '*.h' -or -name '*.cpp' |
grep -vP $EXCLUDE_DIRS |
grep -vP $EXCLUDE |
xargs grep -P '__builtin_unreachable' && echo "Use UNREACHABLE() from defines.h instead"
# Forbid mt19937() and random_device() which are outdated and slow
find $ROOT_PATH/{src,programs,utils} -name '*.h' -or -name '*.cpp' |
grep -vP $EXCLUDE_DIRS |
grep -vP $EXCLUDE |
xargs grep -P '(std::mt19937|std::mersenne_twister_engine|std::random_device)' && echo "Use pcg64_fast (from pcg_random.h) and randomSeed (from Common/randomSeed.h) instead"
# Require checking return value of close(),
# since it can hide fd misuse and break other places.
find $ROOT_PATH/{src,programs,utils} -name '*.h' -or -name '*.cpp' |
grep -vP $EXCLUDE_DIRS |
grep -vP $EXCLUDE |
xargs grep -e ' close(.*fd' -e ' ::close(' | grep -v = && echo "Return value of close() should be checked"
# A small typo can lead to debug code in release builds, see https://github.com/ClickHouse/ClickHouse/pull/47647
@ -322,18 +307,15 @@ ls -1d $ROOT_PATH/contrib/*-cmake | xargs -I@ find @ -name 'CMakeLists.txt' -or
# Wrong spelling of abbreviations, e.g. SQL is right, Sql is wrong. XMLHttpRequest is very wrong.
find $ROOT_PATH/{src,base,programs,utils} -name '*.h' -or -name '*.cpp' |
grep -vP $EXCLUDE_DIRS |
grep -vP $EXCLUDE |
xargs grep -P 'Sql|Html|Xml|Cpu|Tcp|Udp|Http|Db|Json|Yaml' | grep -v -P 'RabbitMQ|Azure|Aws|aws|Avro|IO/S3' &&
echo "Abbreviations such as SQL, XML, HTTP, should be in all caps. For example, SQL is right, Sql is wrong. XMLHttpRequest is very wrong."
find $ROOT_PATH/{src,base,programs,utils} -name '*.h' -or -name '*.cpp' |
grep -vP $EXCLUDE_DIRS |
grep -vP $EXCLUDE |
xargs grep -F -i 'ErrorCodes::LOGICAL_ERROR, "Logical error:' &&
echo "If an exception has LOGICAL_ERROR code, there is no need to include the text 'Logical error' in the exception message, because then the phrase 'Logical error' will be printed twice."
# There shouldn't be any code snippets under GPL or LGPL
find $ROOT_PATH/{src,base,programs} -name '*.h' -or -name '*.cpp' 2>/dev/null | xargs grep -i -F 'General Public License' && echo "There shouldn't be any code snippets under GPL or LGPL"
PATTERN="allow_";
DIFF=$(comm -3 <(grep -o "\b$PATTERN\w*\b" $ROOT_PATH/src/Core/Settings.cpp | sort -u) <(grep -o -h "\b$PATTERN\w*\b" $ROOT_PATH/src/Databases/enableAllExperimentalSettings.cpp $ROOT_PATH/utils/check-style/experimental_settings_ignore.txt | sort -u));
[ -n "$DIFF" ] && echo "$DIFF" && echo "^^ Detected 'allow_*' settings that might need to be included in src/Databases/enableAllExperimentalSettings.cpp" && echo "Alternatively, consider adding an exception to utils/check-style/experimental_settings_ignore.txt"


@ -0,0 +1,284 @@
import dataclasses
from typing import List
from praktika.environment import Environment
from praktika.result import Result
OK_SIGN = "[ OK "
FAIL_SIGN = "[ FAIL "
TIMEOUT_SIGN = "[ Timeout! "
UNKNOWN_SIGN = "[ UNKNOWN "
SKIPPED_SIGN = "[ SKIPPED "
HUNG_SIGN = "Found hung queries in processlist"
SERVER_DIED_SIGN = "Server died, terminating all processes"
SERVER_DIED_SIGN2 = "Server does not respond to health check"
DATABASE_SIGN = "Database: "
SUCCESS_FINISH_SIGNS = ["All tests have finished", "No tests were run"]
RETRIES_SIGN = "Some tests were restarted"
# def write_results(results_file, status_file, results, status):
# with open(results_file, "w", encoding="utf-8") as f:
# out = csv.writer(f, delimiter="\t")
# out.writerows(results)
# with open(status_file, "w", encoding="utf-8") as f:
# out = csv.writer(f, delimiter="\t")
# out.writerow(status)
BROKEN_TESTS_ANALYZER_TECH_DEBT = [
"01624_soft_constraints",
# Check after ConstantNode refactoring
"02944_variant_as_common_type",
]
class FTResultsProcessor:
@dataclasses.dataclass
class Summary:
total: int
skipped: int
unknown: int
failed: int
success: int
test_results: List[Result]
hung: bool = False
server_died: bool = False
retries: bool = False
success_finish: bool = False
test_end: bool = True
def __init__(self, wd):
self.tests_output_file = f"{wd}/test_result.txt"
# self.test_results_parsed_file = f"{wd}/test_result.tsv"
# self.status_file = f"{wd}/check_status.tsv"
self.broken_tests = BROKEN_TESTS_ANALYZER_TECH_DEBT
def _process_test_output(self):
total = 0
skipped = 0
unknown = 0
failed = 0
success = 0
hung = False
server_died = False
retries = False
success_finish = False
test_results = []
test_end = True
with open(self.tests_output_file, "r", encoding="utf-8") as test_file:
for line in test_file:
original_line = line
line = line.strip()
if any(s in line for s in SUCCESS_FINISH_SIGNS):
success_finish = True
# Ignore hung check report, since it may be quite large.
# (and may break python parser which has limit of 128KiB for each row).
if HUNG_SIGN in line:
hung = True
break
if SERVER_DIED_SIGN in line or SERVER_DIED_SIGN2 in line:
server_died = True
if RETRIES_SIGN in line:
retries = True
if any(
sign in line
for sign in (OK_SIGN, FAIL_SIGN, UNKNOWN_SIGN, SKIPPED_SIGN)
):
test_name = line.split(" ")[2].split(":")[0]
test_time = ""
try:
time_token = line.split("]")[1].strip().split()[0]
float(time_token)
test_time = time_token
except (IndexError, ValueError):
pass
total += 1
if TIMEOUT_SIGN in line:
if test_name in self.broken_tests:
success += 1
test_results.append((test_name, "BROKEN", test_time, []))
else:
failed += 1
test_results.append((test_name, "Timeout", test_time, []))
elif FAIL_SIGN in line:
if test_name in self.broken_tests:
success += 1
test_results.append((test_name, "BROKEN", test_time, []))
else:
failed += 1
test_results.append((test_name, "FAIL", test_time, []))
elif UNKNOWN_SIGN in line:
unknown += 1
test_results.append((test_name, "FAIL", test_time, []))
elif SKIPPED_SIGN in line:
skipped += 1
test_results.append((test_name, "SKIPPED", test_time, []))
else:
if OK_SIGN in line and test_name in self.broken_tests:
skipped += 1
test_results.append(
(
test_name,
"NOT_FAILED",
test_time,
[
"This test passed. Update analyzer_tech_debt.txt.\n"
],
)
)
else:
success += int(OK_SIGN in line)
test_results.append((test_name, "OK", test_time, []))
test_end = False
elif (
len(test_results) > 0
and test_results[-1][1] == "FAIL"
and not test_end
):
test_results[-1][3].append(original_line)
# Database printed after everything else in case of failures,
# so this is a stop marker for capturing test output.
#
# And it is handled after everything else to include line with database into the report.
if DATABASE_SIGN in line:
test_end = True
test_results = [
Result(
name=test[0],
status=test[1],
start_time=None,
duration=float(test[2]) if test[2] else 0.0,
info="".join(test[3])[:8192],
)
for test in test_results
]
s = self.Summary(
total=total,
skipped=skipped,
unknown=unknown,
failed=failed,
success=success,
test_results=test_results,
hung=hung,
server_died=server_died,
success_finish=success_finish,
retries=retries,
)
return s
def run(self):
state = Result.Status.SUCCESS
s = self._process_test_output()
test_results = s.test_results
# # Check test_results.tsv for sanitizer asserts, crashes and other critical errors.
# # If the file is present, it's expected to be generated by stress_test.lib check for critical errors
# # In the end this file will be fully regenerated, including both results from critical errors check and
# # functional test results.
# if test_results_path and os.path.exists(test_results_path):
# with open(test_results_path, "r", encoding="utf-8") as test_results_file:
# existing_test_results = list(
# csv.reader(test_results_file, delimiter="\t")
# )
# for test in existing_test_results:
# if len(test) < 2:
# unknown += 1
# else:
# test_results.append(test)
#
# if test[1] != "OK":
# failed += 1
# else:
# success += 1
# is_flaky_check = 1 < int(os.environ.get("NUM_TRIES", 1))
# logging.info("Is flaky check: %s", is_flaky_check)
# # If no tests were run (success == 0) it indicates an error (e.g. server did not start or crashed immediately)
# # But it's Ok for "flaky checks" - they can contain just one test for check which is marked as skipped.
# if failed != 0 or unknown != 0 or (success == 0 and (not is_flaky_check)):
if s.failed != 0 or s.unknown != 0:
state = Result.Status.FAILED
if s.hung:
state = Result.Status.FAILED
test_results.append(
Result("Some queries hung", "FAIL", info="Some queries hung")
)
elif s.server_died:
state = Result.Status.FAILED
# When ClickHouse server crashes, some tests are still running
# and fail because they cannot connect to server
for result in test_results:
if result.status == "FAIL":
result.status = "SERVER_DIED"
test_results.append(Result("Server died", "FAIL", info="Server died"))
elif not s.success_finish:
state = Result.Status.FAILED
test_results.append(
Result("Tests are not finished", "FAIL", info="Tests are not finished")
)
elif s.retries:
test_results.append(
Result("Some tests restarted", "SKIPPED", info="Some tests restarted")
)
else:
pass
# TODO: !!!
# def test_result_comparator(item):
# # sort by status then by check name
# order = {
# "FAIL": 0,
# "SERVER_DIED": 1,
# "Timeout": 2,
# "NOT_FAILED": 3,
# "BROKEN": 4,
# "OK": 5,
# "SKIPPED": 6,
# }
# return order.get(item[1], 10), str(item[0]), item[1]
#
# test_results.sort(key=test_result_comparator)
return Result.create_from(
name=Environment.JOB_NAME,
results=test_results,
status=state,
files=[self.tests_output_file],
with_info_from_results=False,
)
# if __name__ == "__main__":
# logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
# parser = argparse.ArgumentParser(
# description="ClickHouse script for parsing results of functional tests"
# )
#
# parser.add_argument("--out-results-file", default="/test_output/test_results.tsv")
# parser.add_argument("--out-status-file", default="/test_output/check_status.tsv")
# args = parser.parse_args()
#
# broken_tests = []
# state, description, test_results = process_result(
# args.in_results_dir,
# broken_tests,
# args.in_test_result_file,
# args.in_results_file,
# )
# logging.info("Result parsed")
# status = (state, description)
#
#
#
# write_results(args.out_results_file, args.out_status_file, test_results, status)
# logging.info("Result written")
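Reviewer note: `_process_test_output` extracts the test name and duration from each status line by splitting on spaces and on `]`. The sketch below isolates that parsing step. The sample line layout (a `ts` timestamp, then `name:`, then the status sign and time) is an assumption inferred from the `| ts '%Y-%m-%d %H:%M:%S'` pipe in `fast_test.py`. The helper name `parse_test_line`, the simplified status mapping, and the `0.0` fallback for a missing time are choices made for this example only.

```python
from typing import Optional, Tuple

# Status signs as defined at the top of functional_tests_results.py
STATUS_SIGNS = {
    "[ OK ": "OK",
    "[ FAIL ": "FAIL",
    "[ UNKNOWN ": "FAIL",
    "[ SKIPPED ": "SKIPPED",
}


def parse_test_line(line: str) -> Optional[Tuple[str, str, float]]:
    """Return (test_name, status, duration) or None if the line is not a status line."""
    line = line.strip()
    status = next((s for sign, s in STATUS_SIGNS.items() if sign in line), None)
    if status is None:
        return None
    # Tokens 0 and 1 are the date and time added by `ts`; token 2 is "name:"
    test_name = line.split(" ")[2].split(":")[0]
    try:
        duration = float(line.split("]")[1].strip().split()[0])
    except (IndexError, ValueError):
        duration = 0.0  # no parseable time on this line
    return test_name, status, duration
```

Catching only `IndexError` and `ValueError` keeps unrelated failures visible, and a numeric fallback avoids calling `float("")` later when results are built.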

ci/praktika/__init__.py Normal file

@ -0,0 +1,5 @@
from .artifact import Artifact
from .docker import Docker
from .job import Job
from .secret import Secret
from .workflow import Workflow

ci/praktika/__main__.py Normal file

@ -0,0 +1,94 @@
import argparse
import sys
from praktika.html_prepare import Html
from praktika.utils import Utils
from praktika.validator import Validator
from praktika.yaml_generator import YamlGenerator
def create_parser():
parser = argparse.ArgumentParser(prog="python3 -m praktika")
subparsers = parser.add_subparsers(dest="command", help="Available subcommands")
run_parser = subparsers.add_parser("run", help="Job Runner")
run_parser.add_argument("--job", help="Job Name", type=str, required=True)
run_parser.add_argument(
"--workflow",
help="Workflow Name (required if job name is not uniq per config)",
type=str,
default="",
)
run_parser.add_argument(
"--no-docker",
help="Do not run job in docker even if job config says so, for local test",
action="store_true",
)
run_parser.add_argument(
"--docker",
help="Custom docker image for job run, for local test",
type=str,
default="",
)
run_parser.add_argument(
"--param",
help="Custom parameter to pass into a job script; it is up to the job script how to use it (for local test)",
type=str,
default=None,
)
run_parser.add_argument(
"--ci",
help="When not set - dummy env will be generated, for local test",
action="store_true",
default=False,
)
_yaml_parser = subparsers.add_parser("yaml", help="Generates Yaml Workflows")
_html_parser = subparsers.add_parser("html", help="Uploads HTML page for reports")
return parser
if __name__ == "__main__":
parser = create_parser()
args = parser.parse_args()
if args.command == "yaml":
Validator().validate()
YamlGenerator().generate()
elif args.command == "html":
Html.prepare()
elif args.command == "run":
from praktika.mangle import _get_workflows
from praktika.runner import Runner
workflows = _get_workflows(name=args.workflow or None)
job_workflow_pairs = []
for workflow in workflows:
job = workflow.find_job(args.job, lazy=True)
if job:
job_workflow_pairs.append((job, workflow))
if not job_workflow_pairs:
Utils.raise_with_error(
f"Failed to find job [{args.job}] workflow [{args.workflow}]"
)
elif len(job_workflow_pairs) > 1:
Utils.raise_with_error(
f"More than one job [{args.job}] found - try specifying workflow name with --workflow"
)
else:
job, workflow = job_workflow_pairs[0]
print(f"Going to run job [{job.name}], workflow [{workflow.name}]")
Runner().run(
workflow=workflow,
job=job,
docker=args.docker,
dummy_env=not args.ci,
no_docker=args.no_docker,
param=args.param,
)
else:
parser.print_help()
sys.exit(1)

198
ci/praktika/_environment.py Normal file

@ -0,0 +1,198 @@
import dataclasses
import json
import os
from pathlib import Path
from types import SimpleNamespace
from typing import Any, Dict, List, Type
from praktika import Workflow
from praktika._settings import _Settings
from praktika.utils import MetaClasses, T
@dataclasses.dataclass
class _Environment(MetaClasses.Serializable):
WORKFLOW_NAME: str
JOB_NAME: str
REPOSITORY: str
BRANCH: str
SHA: str
PR_NUMBER: int
EVENT_TYPE: str
JOB_OUTPUT_STREAM: str
EVENT_FILE_PATH: str
CHANGE_URL: str
COMMIT_URL: str
BASE_BRANCH: str
RUN_ID: str
RUN_URL: str
INSTANCE_TYPE: str
INSTANCE_ID: str
INSTANCE_LIFE_CYCLE: str
LOCAL_RUN: bool = False
PARAMETER: Any = None
REPORT_INFO: List[str] = dataclasses.field(default_factory=list)
name = "environment"
@classmethod
def file_name_static(cls, _name=""):
return f"{_Settings.TEMP_DIR}/{cls.name}.json"
@classmethod
def from_dict(cls: Type[T], obj: Dict[str, Any]) -> T:
JOB_OUTPUT_STREAM = os.getenv("GITHUB_OUTPUT", "")
obj["JOB_OUTPUT_STREAM"] = JOB_OUTPUT_STREAM
if "PARAMETER" in obj:
obj["PARAMETER"] = _to_object(obj["PARAMETER"])
return cls(**obj)
def add_info(self, info):
self.REPORT_INFO.append(info)
self.dump()
@classmethod
def get(cls):
if Path(cls.file_name_static()).is_file():
return cls.from_fs("environment")
else:
print("WARNING: Environment: get from env")
env = cls.from_env()
env.dump()
return env
def set_job_name(self, job_name):
self.JOB_NAME = job_name
self.dump()
return self
@staticmethod
def get_needs_statuses():
if Path(_Settings.WORKFLOW_STATUS_FILE).is_file():
with open(_Settings.WORKFLOW_STATUS_FILE, "r", encoding="utf8") as f:
return json.load(f)
else:
print(
f"ERROR: Status file [{_Settings.WORKFLOW_STATUS_FILE}] does not exist"
)
raise RuntimeError()
@classmethod
def from_env(cls) -> "_Environment":
WORKFLOW_NAME = os.getenv("GITHUB_WORKFLOW", "")
JOB_NAME = os.getenv("JOB_NAME", "")
REPOSITORY = os.getenv("GITHUB_REPOSITORY", "")
BRANCH = os.getenv("GITHUB_HEAD_REF", "")
EVENT_FILE_PATH = os.getenv("GITHUB_EVENT_PATH", "")
JOB_OUTPUT_STREAM = os.getenv("GITHUB_OUTPUT", "")
RUN_ID = os.getenv("GITHUB_RUN_ID", "0")
RUN_URL = f"https://github.com/{REPOSITORY}/actions/runs/{RUN_ID}"
BASE_BRANCH = os.getenv("GITHUB_BASE_REF", "")
if EVENT_FILE_PATH:
with open(EVENT_FILE_PATH, "r", encoding="utf-8") as f:
github_event = json.load(f)
if "pull_request" in github_event:
EVENT_TYPE = Workflow.Event.PULL_REQUEST
PR_NUMBER = github_event["pull_request"]["number"]
SHA = github_event["pull_request"]["head"]["sha"]
CHANGE_URL = github_event["pull_request"]["html_url"]
COMMIT_URL = CHANGE_URL + f"/commits/{SHA}"
elif "commits" in github_event:
EVENT_TYPE = Workflow.Event.PUSH
SHA = github_event["after"]
CHANGE_URL = github_event["head_commit"]["url"] # commit url
PR_NUMBER = 0
COMMIT_URL = CHANGE_URL
else:
assert False, "TODO: not supported"
else:
print("WARNING: Local execution - dummy Environment will be generated")
SHA = "TEST"
PR_NUMBER = -1
EVENT_TYPE = Workflow.Event.PUSH
CHANGE_URL = ""
COMMIT_URL = ""
INSTANCE_TYPE = (
os.getenv("INSTANCE_TYPE", None)
# or Shell.get_output("ec2metadata --instance-type")
or ""
)
INSTANCE_ID = (
os.getenv("INSTANCE_ID", None)
# or Shell.get_output("ec2metadata --instance-id")
or ""
)
INSTANCE_LIFE_CYCLE = (
os.getenv("INSTANCE_LIFE_CYCLE", None)
# or Shell.get_output(
# "curl -s --fail http://169.254.169.254/latest/meta-data/instance-life-cycle"
# )
or ""
)
return _Environment(
WORKFLOW_NAME=WORKFLOW_NAME,
JOB_NAME=JOB_NAME,
REPOSITORY=REPOSITORY,
BRANCH=BRANCH,
EVENT_FILE_PATH=EVENT_FILE_PATH,
JOB_OUTPUT_STREAM=JOB_OUTPUT_STREAM,
SHA=SHA,
EVENT_TYPE=EVENT_TYPE,
PR_NUMBER=PR_NUMBER,
RUN_ID=RUN_ID,
CHANGE_URL=CHANGE_URL,
COMMIT_URL=COMMIT_URL,
RUN_URL=RUN_URL,
BASE_BRANCH=BASE_BRANCH,
INSTANCE_TYPE=INSTANCE_TYPE,
INSTANCE_ID=INSTANCE_ID,
INSTANCE_LIFE_CYCLE=INSTANCE_LIFE_CYCLE,
REPORT_INFO=[],
)
def get_s3_prefix(self, latest=False):
return self.get_s3_prefix_static(self.PR_NUMBER, self.BRANCH, self.SHA, latest)
@classmethod
def get_s3_prefix_static(cls, pr_number, branch, sha, latest=False):
prefix = ""
if pr_number > 0:
prefix += f"{pr_number}"
else:
prefix += f"{branch}"
if latest:
prefix += "/latest"
elif sha:
prefix += f"/{sha}"
return prefix
# TODO: find a better place for this function. This file should not import praktika.settings at module level,
# because that requires reading the user's config; that is why the imports are nested inside the function.
def get_report_url(self):
import urllib.parse
from praktika.settings import Settings
from praktika.utils import Utils
path = Settings.HTML_S3_PATH
for bucket, endpoint in Settings.S3_BUCKET_TO_HTTP_ENDPOINT.items():
if bucket in path:
path = path.replace(bucket, endpoint)
break
REPORT_URL = f"https://{path}/{Path(Settings.HTML_PAGE_FILE).name}?PR={self.PR_NUMBER}&sha={self.SHA}&name_0={urllib.parse.quote(self.WORKFLOW_NAME, safe='')}&name_1={urllib.parse.quote(self.JOB_NAME, safe='')}"
return REPORT_URL
def is_local_run(self):
return self.LOCAL_RUN
def _to_object(data):
if isinstance(data, dict):
return SimpleNamespace(**{k: _to_object(v) for k, v in data.items()})
elif isinstance(data, list):
return [_to_object(i) for i in data]
else:
return data
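
The prefix rule in `get_s3_prefix_static` above can be tried in isolation. The following is a minimal standalone sketch, not part of the commit; the function name `s3_prefix` and the sample values are hypothetical and only illustrate the rule (PR number takes precedence over branch, `latest` takes precedence over the commit SHA).

```python
def s3_prefix(pr_number: int, branch: str, sha: str, latest: bool = False) -> str:
    """Hypothetical mirror of the rule above, for illustration only."""
    # A positive PR number wins; otherwise the branch name is used.
    prefix = str(pr_number) if pr_number > 0 else branch
    if latest:
        prefix += "/latest"
    elif sha:
        prefix += f"/{sha}"
    return prefix


if __name__ == "__main__":
    print(s3_prefix(123, "feature", "abc123"))
    print(s3_prefix(0, "master", "abc123", latest=True))
    # Values of the dummy local environment: PR_NUMBER=-1, BRANCH="", SHA="TEST"
    print(s3_prefix(-1, "", "TEST"))
```

With the dummy local environment (empty branch, `PR_NUMBER = -1`), this rule produces a prefix that starts with a slash, which may be worth keeping in mind when building S3 paths for local runs.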

124
ci/praktika/_settings.py Normal file

@ -0,0 +1,124 @@
import dataclasses
from pathlib import Path
from typing import Dict, Iterable, List, Optional
@dataclasses.dataclass
class _Settings:
######################################
# Pipeline generation settings #
######################################
CI_PATH = "./ci"
WORKFLOW_PATH_PREFIX: str = "./.github/workflows"
WORKFLOWS_DIRECTORY: str = f"{CI_PATH}/workflows"
SETTINGS_DIRECTORY: str = f"{CI_PATH}/settings"
CI_CONFIG_JOB_NAME = "Config Workflow"
DOCKER_BUILD_JOB_NAME = "Docker Builds"
FINISH_WORKFLOW_JOB_NAME = "Finish Workflow"
READY_FOR_MERGE_STATUS_NAME = "Ready for Merge"
CI_CONFIG_RUNS_ON: Optional[List[str]] = None
DOCKER_BUILD_RUNS_ON: Optional[List[str]] = None
VALIDATE_FILE_PATHS: bool = True
######################################
# Runtime Settings #
######################################
MAX_RETRIES_S3 = 3
MAX_RETRIES_GH = 3
######################################
# S3 (artifact storage) settings #
######################################
S3_ARTIFACT_PATH: str = ""
######################################
# CI workspace settings #
######################################
TEMP_DIR: str = "/tmp/praktika"
OUTPUT_DIR: str = f"{TEMP_DIR}/output"
INPUT_DIR: str = f"{TEMP_DIR}/input"
PYTHON_INTERPRETER: str = "python3"
PYTHON_PACKET_MANAGER: str = "pip3"
PYTHON_VERSION: str = "3.9"
INSTALL_PYTHON_FOR_NATIVE_JOBS: bool = False
INSTALL_PYTHON_REQS_FOR_NATIVE_JOBS: str = "./ci/requirements.txt"
ENVIRONMENT_VAR_FILE: str = f"{TEMP_DIR}/environment.json"
RUN_LOG: str = f"{TEMP_DIR}/praktika_run.log"
SECRET_GH_APP_ID: str = "GH_APP_ID"
SECRET_GH_APP_PEM_KEY: str = "GH_APP_PEM_KEY"
ENV_SETUP_SCRIPT: str = "/tmp/praktika_setup_env.sh"
WORKFLOW_STATUS_FILE: str = f"{TEMP_DIR}/workflow_status.json"
######################################
# CI Cache settings #
######################################
CACHE_VERSION: int = 1
CACHE_DIGEST_LEN: int = 20
CACHE_S3_PATH: str = ""
CACHE_LOCAL_PATH: str = f"{TEMP_DIR}/ci_cache"
######################################
# Report settings #
######################################
HTML_S3_PATH: str = ""
HTML_PAGE_FILE: str = "./praktika/json.html"
TEXT_CONTENT_EXTENSIONS: Iterable[str] = frozenset([".txt", ".log"])
S3_BUCKET_TO_HTTP_ENDPOINT: Optional[Dict[str, str]] = None
DOCKERHUB_USERNAME: str = ""
DOCKERHUB_SECRET: str = ""
DOCKER_WD: str = "/wd"
######################################
# CI DB Settings #
######################################
SECRET_CI_DB_URL: str = "CI_DB_URL"
SECRET_CI_DB_PASSWORD: str = "CI_DB_PASSWORD"
CI_DB_DB_NAME = ""
CI_DB_TABLE_NAME = ""
CI_DB_INSERT_TIMEOUT_SEC = 5
_USER_DEFINED_SETTINGS = [
"S3_ARTIFACT_PATH",
"CACHE_S3_PATH",
"HTML_S3_PATH",
"S3_BUCKET_TO_HTTP_ENDPOINT",
"TEXT_CONTENT_EXTENSIONS",
"TEMP_DIR",
"OUTPUT_DIR",
"INPUT_DIR",
"CI_CONFIG_RUNS_ON",
"DOCKER_BUILD_RUNS_ON",
"CI_CONFIG_JOB_NAME",
"PYTHON_INTERPRETER",
"PYTHON_VERSION",
"PYTHON_PACKET_MANAGER",
"INSTALL_PYTHON_FOR_NATIVE_JOBS",
"INSTALL_PYTHON_REQS_FOR_NATIVE_JOBS",
"MAX_RETRIES_S3",
"MAX_RETRIES_GH",
"VALIDATE_FILE_PATHS",
"DOCKERHUB_USERNAME",
"DOCKERHUB_SECRET",
"READY_FOR_MERGE_STATUS_NAME",
"SECRET_CI_DB_URL",
"SECRET_CI_DB_PASSWORD",
"CI_DB_DB_NAME",
"CI_DB_TABLE_NAME",
"CI_DB_INSERT_TIMEOUT_SEC",
"SECRET_GH_APP_PEM_KEY",
"SECRET_GH_APP_ID",
]
class GHRunners:
ubuntu = "ubuntu-latest"
if __name__ == "__main__":
for setting in _USER_DEFINED_SETTINGS:
print(getattr(_Settings(), setting))
# print(dataclasses.asdict(_Settings()))

33
ci/praktika/artifact.py Normal file

@ -0,0 +1,33 @@
from dataclasses import dataclass
class Artifact:
class Type:
GH = "github"
S3 = "s3"
PHONY = "phony"
@dataclass
class Config:
"""
name - artifact name
type - artifact type, see Artifact.Type
path - file path or glob, e.g. "path/**/[abc]rtifac?/*"
"""
name: str
type: str
path: str
_provided_by: str = ""
_s3_path: str = ""
def is_s3_artifact(self):
return self.type == Artifact.Type.S3
@classmethod
def define_artifact(cls, name, type, path):
return cls.Config(name=name, type=type, path=path)
@classmethod
def define_gh_artifact(cls, name, path):
return cls.define_artifact(name=name, type=cls.Type.GH, path=path)

127
ci/praktika/cache.py Normal file

@ -0,0 +1,127 @@
import dataclasses
import json
from pathlib import Path
from praktika import Artifact, Job, Workflow
from praktika._environment import _Environment
from praktika.digest import Digest
from praktika.s3 import S3
from praktika.settings import Settings
from praktika.utils import Utils
class Cache:
@dataclasses.dataclass
class CacheRecord:
class Type:
SUCCESS = "success"
type: str
sha: str
pr_number: int
branch: str
def dump(self, path):
with open(path, "w", encoding="utf8") as f:
json.dump(dataclasses.asdict(self), f)
@classmethod
def from_fs(cls, path):
with open(path, "r", encoding="utf8") as f:
return Cache.CacheRecord(**json.load(f))
@classmethod
def from_dict(cls, obj):
return Cache.CacheRecord(**obj)
def __init__(self):
self.digest = Digest()
self.success = {} # type Dict[str, Any]
@classmethod
def push_success_record(cls, job_name, job_digest, sha):
type_ = Cache.CacheRecord.Type.SUCCESS
record = Cache.CacheRecord(
type=type_,
sha=sha,
pr_number=_Environment.get().PR_NUMBER,
branch=_Environment.get().BRANCH,
)
assert (
Settings.CACHE_S3_PATH
), "Setting CACHE_S3_PATH must be defined when CI Cache is enabled"
record_path = f"{Settings.CACHE_S3_PATH}/v{Settings.CACHE_VERSION}/{Utils.normalize_string(job_name)}/{job_digest}"
record_file = Path(Settings.TEMP_DIR) / type_
record.dump(record_file)
S3.copy_file_to_s3(s3_path=record_path, local_path=record_file)
record_file.unlink()
def fetch_success(self, job_name, job_digest):
type_ = Cache.CacheRecord.Type.SUCCESS
assert (
Settings.CACHE_S3_PATH
), "Setting CACHE_S3_PATH must be defined when CI Cache is enabled"
record_path = f"{Settings.CACHE_S3_PATH}/v{Settings.CACHE_VERSION}/{Utils.normalize_string(job_name)}/{job_digest}/{type_}"
record_file_local_dir = (
f"{Settings.CACHE_LOCAL_PATH}/{Utils.normalize_string(job_name)}/"
)
Path(record_file_local_dir).mkdir(parents=True, exist_ok=True)
if S3.head_object(record_path):
res = S3.copy_file_from_s3(
s3_path=record_path, local_path=record_file_local_dir
)
else:
res = None
if res:
print(f"Cache record found, job [{job_name}], digest [{job_digest}]")
self.success[job_name] = True
return Cache.CacheRecord.from_fs(Path(record_file_local_dir) / type_)
return None
if __name__ == "__main__":
# test
c = Cache()
workflow = Workflow.Config(
name="TEST",
event=Workflow.Event.PULL_REQUEST,
jobs=[
Job.Config(
name="JobA",
runs_on=["some"],
command="python -m unittest ./ci/tests/example_1/test_example_produce_artifact.py",
provides=["greet"],
job_requirements=Job.Requirements(
python_requirements_txt="./ci/requirements.txt"
),
digest_config=Job.CacheDigestConfig(
# example: use glob to include files
include_paths=["./ci/tests/example_1/test_example_consume*.py"],
),
),
Job.Config(
name="JobB",
runs_on=["some"],
command="python -m unittest ./ci/tests/example_1/test_example_consume_artifact.py",
requires=["greet"],
job_requirements=Job.Requirements(
python_requirements_txt="./ci/requirements.txt"
),
digest_config=Job.CacheDigestConfig(
# example: use dir to include files recursively
include_paths=["./ci/tests/example_1"],
# example: use glob to exclude files from digest
exclude_paths=[
"./ci/tests/example_1/test_example_consume*",
"./**/*.pyc",
],
),
),
],
artifacts=[Artifact.Config(type="s3", name="greet", path="hello")],
enable_cache=True,
)
for job in workflow.jobs:
print(c.digest.calc_job_digest(job))

136
ci/praktika/cidb.py Normal file

@ -0,0 +1,136 @@
import copy
import dataclasses
import json
from typing import Optional
import requests
from praktika._environment import _Environment
from praktika.result import Result
from praktika.settings import Settings
from praktika.utils import Utils
class CIDB:
@dataclasses.dataclass
class TableRecord:
pull_request_number: int
commit_sha: str
commit_url: str
check_name: str
check_status: str
check_duration_ms: int
check_start_time: str
report_url: str
pull_request_url: str
base_ref: str
base_repo: str
head_ref: str
head_repo: str
task_url: str
instance_type: str
instance_id: str
test_name: str
test_status: str
test_duration_ms: Optional[int]
test_context_raw: str
def __init__(self, url, passwd):
self.url = url
self.auth = {
"X-ClickHouse-User": "default",
"X-ClickHouse-Key": passwd,
}
@classmethod
def json_data_generator(cls, result: Result):
env = _Environment.get()
base_record = cls.TableRecord(
pull_request_number=env.PR_NUMBER,
commit_sha=env.SHA,
commit_url=env.COMMIT_URL,
check_name=result.name,
check_status=result.status,
check_duration_ms=int(result.duration * 1000),
check_start_time=Utils.timestamp_to_str(result.start_time),
report_url=env.get_report_url(),
pull_request_url=env.CHANGE_URL,
base_ref=env.BASE_BRANCH,
base_repo=env.REPOSITORY,
head_ref=env.BRANCH,
# TODO: remove from table?
head_repo=env.REPOSITORY,
# TODO: remove from table?
task_url="",
instance_type=",".join([env.INSTANCE_TYPE, env.INSTANCE_LIFE_CYCLE]),
instance_id=env.INSTANCE_ID,
test_name="",
test_status="",
test_duration_ms=None,
test_context_raw=result.info,
)
yield json.dumps(dataclasses.asdict(base_record))
for result_ in result.results:
record = copy.deepcopy(base_record)
record.test_name = result_.name
if result_.start_time:
record.check_start_time = Utils.timestamp_to_str(result_.start_time)
record.test_status = result_.status
record.test_duration_ms = int(result_.duration * 1000)
record.test_context_raw = result_.info
yield json.dumps(dataclasses.asdict(record))
def insert(self, result: Result):
# Query parameters for the ClickHouse HTTP interface
params = {
"database": Settings.CI_DB_DB_NAME,
"query": f"INSERT INTO {Settings.CI_DB_TABLE_NAME} FORMAT JSONEachRow",
"date_time_input_format": "best_effort",
"send_logs_level": "warning",
}
session = requests.Session()
for json_str in self.json_data_generator(result):
try:
response1 = session.post(
url=self.url,
params=params,
data=json_str,
headers=self.auth,
timeout=Settings.CI_DB_INSERT_TIMEOUT_SEC,
)
except Exception as ex:
raise ex
session.close()
def check(self):
# Smoke-test query parameters for the ClickHouse HTTP interface
params = {
"database": Settings.CI_DB_DB_NAME,
"query": "SELECT 1",
}
try:
response = requests.post(
url=self.url,
params=params,
data="",
headers=self.auth,
timeout=Settings.CI_DB_INSERT_TIMEOUT_SEC,
)
if not response.ok:
print("ERROR: No connection to CI DB")
return (
False,
f"ERROR: No connection to CI DB [{response.status_code}/{response.reason}]",
)
if response.json() != 1:
print("ERROR: CI DB smoke test failed select 1 == 1")
return (
False,
f"ERROR: CI DB smoke test failed [select 1 ==> {response.json()}]",
)
except Exception as ex:
print(f"ERROR: Exception [{ex}]")
return False, f"CIDB: ERROR: Exception [{ex}]"
return True, ""
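
The record fan-out in `json_data_generator` above (one row for the check itself, then one row per test result) can be sketched without a deep copy. This is an illustrative sketch under stated assumptions, not the commit's code: `Row` and `rows_as_json` are hypothetical names, the row has only a few of the table's columns, and `dataclasses.replace` is swapped in for `copy.deepcopy` plus attribute assignment.

```python
import dataclasses
import json
from typing import Iterator, List, Optional, Tuple


@dataclasses.dataclass
class Row:
    # Hypothetical, heavily reduced version of the table record
    check_name: str
    check_status: str
    test_name: str = ""
    test_status: str = ""
    test_duration_ms: Optional[int] = None


def rows_as_json(
    check_name: str, check_status: str, tests: List[Tuple[str, str, float]]
) -> Iterator[str]:
    """Yield one JSON line for the check, then one per (name, status, seconds) test."""
    base = Row(check_name=check_name, check_status=check_status)
    yield json.dumps(dataclasses.asdict(base))
    for name, status, duration_sec in tests:
        # replace() builds a new row from the base one with only the test fields changed
        row = dataclasses.replace(
            base,
            test_name=name,
            test_status=status,
            test_duration_ms=int(duration_sec * 1000),
        )
        yield json.dumps(dataclasses.asdict(row))


if __name__ == "__main__":
    for line in rows_as_json("Build", "OK", [("t1", "OK", 0.25)]):
        print(line)
```

Each yielded string is one JSON object per line, which is the shape the `JSONEachRow` input format expects.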

112
ci/praktika/digest.py Normal file

@ -0,0 +1,112 @@
import dataclasses
import hashlib
import os
from hashlib import md5
from pathlib import Path
from typing import List
from praktika import Job
from praktika.docker import Docker
from praktika.settings import Settings
from praktika.utils import Utils
class Digest:
def __init__(self):
self.digest_cache = {}
@staticmethod
def _hash_digest_config(digest_config: Job.CacheDigestConfig) -> str:
data_dict = dataclasses.asdict(digest_config)
hash_obj = md5()
hash_obj.update(str(data_dict).encode())
hash_string = hash_obj.hexdigest()
return hash_string
def calc_job_digest(self, job_config: Job.Config):
config = job_config.digest_config
if not config:
return "f" * Settings.CACHE_DIGEST_LEN
cache_key = self._hash_digest_config(config)
if cache_key in self.digest_cache:
return self.digest_cache[cache_key]
included_files = Utils.traverse_paths(
job_config.digest_config.include_paths,
job_config.digest_config.exclude_paths,
sorted=True,
)
print(
f"calc digest for job [{job_config.name}]: hash_key [{cache_key}], include [{len(included_files)}] files"
)
# Sort files to ensure consistent hash calculation
included_files.sort()
# Calculate MD5 hash
res = ""
if not included_files:
res = "f" * Settings.CACHE_DIGEST_LEN
print(f"NOTE: empty digest config [{config}] - return dummy digest")
else:
hash_md5 = hashlib.md5()
for file_path in included_files:
res = self._calc_file_digest(file_path, hash_md5)
assert res
self.digest_cache[cache_key] = res
return res
def calc_docker_digest(
self,
docker_config: Docker.Config,
dependency_configs: List[Docker.Config],
hash_md5=None,
):
"""
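:param docker_config: Docker.Config to calculate digest for
:param dependency_configs: list of Docker.Config(s) that docker_config depends on
:param hash_md5: optional hash object to update; a new MD5 object is created if omitted
:return: hex digest truncated to Settings.CACHE_DIGEST_LEN characters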
"""
print(f"Calculate digest for docker [{docker_config.name}]")
paths = Utils.traverse_path(docker_config.path, sorted=True)
if not hash_md5:
hash_md5 = hashlib.md5()
dependencies = []
for dependency_name in docker_config.depends_on:
for dependency_config in dependency_configs:
if dependency_config.name == dependency_name:
print(
f"Add docker [{dependency_config.name}] as dependency for docker [{docker_config.name}] digest calculation"
)
dependencies.append(dependency_config)
for dependency in dependencies:
_ = self.calc_docker_digest(dependency, dependency_configs, hash_md5)
for path in paths:
_ = self._calc_file_digest(path, hash_md5=hash_md5)
return hash_md5.hexdigest()[: Settings.CACHE_DIGEST_LEN]
@staticmethod
def _calc_file_digest(file_path, hash_md5):
# Resolve file path if it's a symbolic link
resolved_path = file_path
if Path(file_path).is_symlink():
resolved_path = os.path.realpath(file_path)
if not Path(resolved_path).is_file():
print(
f"WARNING: No valid file resolved by link {file_path} -> {resolved_path} - skipping digest calculation"
)
return hash_md5.hexdigest()[: Settings.CACHE_DIGEST_LEN]
with open(resolved_path, "rb") as f:
for chunk in iter(lambda: f.read(4096), b""):
hash_md5.update(chunk)
return hash_md5.hexdigest()[: Settings.CACHE_DIGEST_LEN]
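
The digest scheme used above (feed every file, in sorted order, into a single MD5 object in 4096-byte chunks, then truncate the hex digest) can be reproduced in a few standalone lines. This is a minimal sketch, not the commit's implementation: `files_digest` and `DIGEST_LEN` are hypothetical names, and symlink resolution and exclude patterns are left out.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Iterable

DIGEST_LEN = 20  # assumption: mirrors the CACHE_DIGEST_LEN default shown earlier


def files_digest(paths: Iterable[str], digest_len: int = DIGEST_LEN) -> str:
    """Hash the contents of all files in sorted path order and truncate the result."""
    md5 = hashlib.md5()
    for path in sorted(paths):
        with open(path, "rb") as f:
            # Read in fixed-size chunks so large files are not loaded into memory at once
            for chunk in iter(lambda: f.read(4096), b""):
                md5.update(chunk)
    return md5.hexdigest()[:digest_len]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        a, b = Path(tmp) / "a.txt", Path(tmp) / "b.txt"
        a.write_bytes(b"hello")
        b.write_bytes(b"world")
        # The input order does not matter because paths are sorted first
        print(files_digest([str(b), str(a)]))
```

Only file contents enter the hash in this sketch, so renaming a file without changing the sorted order of paths leaves the digest unchanged.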

60
ci/praktika/docker.py Normal file

@ -0,0 +1,60 @@
import dataclasses
from typing import List
from praktika.utils import Shell
class Docker:
class Platforms:
ARM = "linux/arm64"
AMD = "linux/amd64"
arm_amd = [ARM, AMD]
@dataclasses.dataclass
class Config:
name: str
path: str
depends_on: List[str]
platforms: List[str]
@classmethod
def build(cls, config: "Docker.Config", log_file, digests, add_latest):
tags_substr = f" -t {config.name}:{digests[config.name]}"
if add_latest:
tags_substr += f" -t {config.name}:latest"
from_tag = ""
if config.depends_on:
assert (
len(config.depends_on) == 1
), f"Only one dependency in depends_on is currently supported, docker [{config}]"
from_tag = f" --build-arg FROM_TAG={digests[config.depends_on[0]]}"
command = f"docker buildx build --platform {','.join(config.platforms)} {tags_substr} {from_tag} --cache-to type=inline --cache-from type=registry,ref={config.name} --push {config.path}"
return Shell.run(command, log_file=log_file, verbose=True)
@classmethod
def sort_in_build_order(cls, dockers: List["Docker.Config"]):
ready_names = []
i = 0
while i < len(dockers):
docker = dockers[i]
if not docker.depends_on or all(
dep in ready_names for dep in docker.depends_on
):
ready_names.append(docker.name)
i += 1
else:
dockers.append(dockers.pop(i))
return dockers
@classmethod
def login(cls, user_name, user_password):
print("Docker: log in to dockerhub")
return Shell.check(
f"docker login --username '{user_name}' --password-stdin",
strict=True,
stdin_str=user_password,
encoding="utf-8",
verbose=True,
)
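
`sort_in_build_order` above keeps rotating unresolved entries to the end of the list, so a missing or cyclic dependency would make it loop indefinitely. One possible alternative, shown here only as a sketch and not as the commit's approach, is the standard-library `graphlib.TopologicalSorter` (Python 3.9+), which reports cycles explicitly. The function name `build_order` and the plain name-to-dependencies mapping are assumptions made for illustration.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List


def build_order(depends_on: Dict[str, List[str]]) -> List[str]:
    """Return image names so that every dependency precedes its dependents."""
    known = set(depends_on)
    for name, deps in depends_on.items():
        missing = [d for d in deps if d not in known]
        if missing:
            raise ValueError(f"docker [{name}] depends on unknown image(s) {missing}")
    try:
        # The mapping is node -> predecessors, which is what TopologicalSorter expects
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as e:
        # The detected cycle is the second element of the exception args
        raise ValueError(f"dependency cycle: {e.args[1]}") from e


if __name__ == "__main__":
    print(build_order({"app": ["base"], "base": [], "tools": ["base"]}))
```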


@ -0,0 +1,3 @@
from praktika._environment import _Environment
Environment = _Environment.get()


@ -0,0 +1,4 @@
from praktika.execution.machine_init import run
if __name__ == "__main__":
run()


@ -0,0 +1,31 @@
import os
from praktika.utils import MetaClasses
class ScalingType(metaclass=MetaClasses.WithIter):
DISABLED = "disabled"
AUTOMATIC_SCALE_DOWN = "scale_down"
AUTOMATIC_SCALE_UP_DOWN = "scale"
class DefaultExecutionSettings:
GH_ACTIONS_DIRECTORY: str = "/home/ubuntu/gh_actions"
RUNNER_SCALING_TYPE: str = ScalingType.AUTOMATIC_SCALE_UP_DOWN
MAX_WAIT_TIME_BEFORE_SCALE_DOWN_SEC: int = 30
class ExecutionSettings:
GH_ACTIONS_DIRECTORY = os.getenv(
"GH_ACTIONS_DIRECTORY", DefaultExecutionSettings.GH_ACTIONS_DIRECTORY
)
RUNNER_SCALING_TYPE = os.getenv(
"RUNNER_SCALING_TYPE", DefaultExecutionSettings.RUNNER_SCALING_TYPE
)
MAX_WAIT_TIME_BEFORE_SCALE_DOWN_SEC = int(
os.getenv(
"MAX_WAIT_TIME_BEFORE_SCALE_DOWN_SEC",
DefaultExecutionSettings.MAX_WAIT_TIME_BEFORE_SCALE_DOWN_SEC,
)
)
LOCAL_EXECUTION = os.getenv("CLOUD", "0") == "0"


@ -0,0 +1,338 @@
import os
import platform
import signal
import time
import traceback
import requests
from praktika.execution.execution_settings import ExecutionSettings, ScalingType
from praktika.utils import ContextManager, Shell
class StateMachine:
class StateNames:
INIT = "init"
WAIT = "wait"
RUN = "run"
def __init__(self):
self.state = self.StateNames.INIT
self.scale_type = ExecutionSettings.RUNNER_SCALING_TYPE
self.machine = Machine(scaling_type=self.scale_type).update_instance_info()
self.state_updated_at = int(time.time())
self.forked = False
def kick(self):
if self.state == self.StateNames.INIT:
self.machine.config_actions().run_actions_async()
print("State Machine: INIT -> WAIT")
self.state = self.StateNames.WAIT
self.state_updated_at = int(time.time())
# TODO: add monitoring
if not self.machine.is_actions_process_healthy():
print(f"ERROR: GH runner process unexpectedly died")
self.machine.self_terminate(decrease_capacity=False)
elif self.state == self.StateNames.WAIT:
res = self.machine.check_job_assigned()
if res:
print("State Machine: WAIT -> RUN")
self.state = self.StateNames.RUN
self.state_updated_at = int(time.time())
self.check_scale_up()
else:
self.check_scale_down()
elif self.state == self.StateNames.RUN:
res = self.machine.check_job_running()
if res:
pass
else:
print("State Machine: RUN -> INIT")
self.state = self.StateNames.INIT
self.state_updated_at = int(time.time())
def check_scale_down(self):
if self.scale_type not in (
ScalingType.AUTOMATIC_SCALE_DOWN,
ScalingType.AUTOMATIC_SCALE_UP_DOWN,
):
return
if self.scale_type == ScalingType.AUTOMATIC_SCALE_UP_DOWN and not self.forked:
print(
"Scaling type is AUTOMATIC_SCALE_UP_DOWN and machine has not run a job - do not scale down"
)
return
if (
int(time.time()) - self.state_updated_at
> ExecutionSettings.MAX_WAIT_TIME_BEFORE_SCALE_DOWN_SEC
):
print(
f"No job assigned for more than MAX_WAIT_TIME_BEFORE_SCALE_DOWN_SEC [{ExecutionSettings.MAX_WAIT_TIME_BEFORE_SCALE_DOWN_SEC}] - scale down the instance"
)
if not ExecutionSettings.LOCAL_EXECUTION:
self.machine.self_terminate(decrease_capacity=True)
else:
print("Local execution - skip scaling operation")
def check_scale_up(self):
if self.scale_type not in (ScalingType.AUTOMATIC_SCALE_UP_DOWN,):
return
if self.forked:
print("This instance already forked once - do not scale up")
return
self.machine.self_fork()
self.forked = True
def run(self):
self.machine.unconfig_actions()
while True:
self.kick()
time.sleep(5)
def terminate(self):
try:
self.machine.unconfig_actions()
except Exception:
print("WARNING: failed to unconfig runner")
if not ExecutionSettings.LOCAL_EXECUTION:
if self.machine is not None:
self.machine.self_terminate(decrease_capacity=False)
time.sleep(10)
# wait termination
print("ERROR: failed to terminate instance via aws cli - try os call")
os.system("sudo shutdown now")
else:
print("NOTE: Local execution - machine won't be terminated")
class Machine:
@staticmethod
def get_latest_gh_actions_release():
url = f"https://api.github.com/repos/actions/runner/releases/latest"
response = requests.get(url, timeout=5)
if response.status_code == 200:
latest_release = response.json()
return latest_release["tag_name"].removeprefix("v")
else:
print(f"Failed to get the latest release: {response.status_code}")
return None
def __init__(self, scaling_type):
self.os_name = platform.system().lower()
assert self.os_name == "linux", f"Unsupported OS [{self.os_name}]"
if platform.machine() == "x86_64":
self.arch = "x64"
elif "aarch64" in platform.machine().lower():
self.arch = "arm64"
else:
assert False, f"Unsupported arch [{platform.machine()}]"
self.instance_id = None
self.asg_name = None
self.runner_api_endpoint = None
self.runner_type = None
self.labels = []
self.proc = None
assert scaling_type in ScalingType
self.scaling_type = scaling_type
def install_gh_actions_runner(self):
gh_actions_version = self.get_latest_gh_actions_release()
assert self.os_name and gh_actions_version and self.arch
Shell.check(
f"rm -rf {ExecutionSettings.GH_ACTIONS_DIRECTORY}",
strict=True,
verbose=True,
)
Shell.check(
f"mkdir {ExecutionSettings.GH_ACTIONS_DIRECTORY}", strict=True, verbose=True
)
with ContextManager.cd(ExecutionSettings.GH_ACTIONS_DIRECTORY):
Shell.check(
f"curl -O -L https://github.com/actions/runner/releases/download/v{gh_actions_version}/actions-runner-{self.os_name}-{self.arch}-{gh_actions_version}.tar.gz",
strict=True,
verbose=True,
)
Shell.check(f"tar xzf *tar.gz", strict=True, verbose=True)
Shell.check(f"rm -f *tar.gz", strict=True, verbose=True)
Shell.check(f"sudo ./bin/installdependencies.sh", strict=True, verbose=True)
Shell.check(
f"chown -R ubuntu:ubuntu {ExecutionSettings.GH_ACTIONS_DIRECTORY}",
strict=True,
verbose=True,
)
def _get_gh_token_from_ssm(self):
gh_token = Shell.get_output_or_raise(
"/usr/local/bin/aws ssm get-parameter --name github_runner_registration_token --with-decryption --output text --query Parameter.Value"
)
return gh_token
def update_instance_info(self):
self.instance_id = Shell.get_output_or_raise("ec2metadata --instance-id")
assert self.instance_id
self.asg_name = Shell.get_output(
f"aws ec2 describe-instances --instance-id {self.instance_id} --query \"Reservations[].Instances[].Tags[?Key=='aws:autoscaling:groupName'].Value\" --output text"
)
# self.runner_type = Shell.get_output_or_raise(
# f'/usr/local/bin/aws ec2 describe-tags --filters "Name=resource-id,Values={self.instance_id}" --query "Tags[?Key==\'github:runner-type\'].Value" --output text'
# )
self.runner_type = self.asg_name
if (
self.scaling_type != ScalingType.DISABLED
and not ExecutionSettings.LOCAL_EXECUTION
):
assert (
self.asg_name and self.runner_type
), f"Failed to retrieve ASG name, which is required for scaling_type [{self.scaling_type}]"
org = os.getenv("MY_ORG", "")
assert (
org
), "MY_ORG env variable must be set to use init script for runner machine"
self.runner_api_endpoint = f"https://github.com/{org}"
self.labels = ["self-hosted", self.runner_type]
return self
@classmethod
def check_job_assigned(cls):
runner_pid = Shell.get_output_or_raise("pgrep Runner.Listener")
if not runner_pid:
print("check_job_assigned: No runner pid")
return False
log_file = Shell.get_output_or_raise(
f"lsof -p {runner_pid} | grep -o {ExecutionSettings.GH_ACTIONS_DIRECTORY}/_diag/Runner.*log"
)
if not log_file:
print("check_job_assigned: No log file")
return False
return Shell.check(f"grep -q 'Terminal] .* Running job:' {log_file}")
def check_job_running(self):
if self.proc is None:
print(f"WARNING: No job started")
return False
exit_code = self.proc.poll()
if exit_code is None:
return True
else:
print(f"Job runner finished with exit code [{exit_code}]")
self.proc = None
return False
def config_actions(self):
if not self.instance_id:
self.update_instance_info()
token = self._get_gh_token_from_ssm()
assert token and self.instance_id and self.runner_api_endpoint and self.labels
command = f"sudo -u ubuntu {ExecutionSettings.GH_ACTIONS_DIRECTORY}/config.sh --token {token} \
--url {self.runner_api_endpoint} --ephemeral --unattended --replace \
--runnergroup Default --labels {','.join(self.labels)} --work wd --name {self.instance_id}"
res = 1
i = 0
while i < 10 and res != 0:
res = Shell.run(command)
i += 1
if res != 0:
print(
f"ERROR: failed to configure GH actions runner after [{i}] attempts, exit code [{res}], retry after 10s"
)
time.sleep(10)
self._get_gh_token_from_ssm()
if res == 0:
print("GH action runner has been configured")
else:
assert False, "GH actions runner configuration failed"
return self
def unconfig_actions(self):
token = self._get_gh_token_from_ssm()
command = f"sudo -u ubuntu {ExecutionSettings.GH_ACTIONS_DIRECTORY}/config.sh remove --token {token}"
Shell.check(command, strict=True)
return self
def run_actions_async(self):
command = f"sudo -u ubuntu {ExecutionSettings.GH_ACTIONS_DIRECTORY}/run.sh"
self.proc = Shell.run_async(command)
assert self.proc is not None
return self
def is_actions_process_healthy(self):
try:
if self.proc.poll() is None:
return True
stdout, stderr = self.proc.communicate()
if self.proc.returncode != 0:
# Handle failure
print(
f"GH Action process failed with return code {self.proc.returncode}"
)
print(f"Error output: {stderr}")
return False
else:
print(f"GH Action process is not running")
return False
except Exception as e:
print(f"GH Action process exception: {e}")
return False
def self_terminate(self, decrease_capacity):
print(
f"WARNING: Self terminate is called, decrease_capacity [{decrease_capacity}]"
)
traceback.print_stack()
if not self.instance_id:
self.update_instance_info()
assert self.instance_id
command = f"aws autoscaling terminate-instance-in-auto-scaling-group --instance-id {self.instance_id}"
if decrease_capacity:
command += " --should-decrement-desired-capacity"
else:
command += " --no-should-decrement-desired-capacity"
Shell.check(
command=command,
verbose=True,
)
def self_fork(self):
current_capacity = Shell.get_output(
f'aws autoscaling describe-auto-scaling-groups --auto-scaling-group-name {self.asg_name} \
--query "AutoScalingGroups[0].DesiredCapacity" --output text'
)
current_capacity = int(current_capacity)
if not current_capacity:
print("ERROR: failed to get current capacity - cannot scale up")
return
desired_capacity = current_capacity + 1
command = f"aws autoscaling set-desired-capacity --auto-scaling-group-name {self.asg_name} --desired-capacity {desired_capacity}"
print(f"Increase capacity [{current_capacity} -> {desired_capacity}]")
res = Shell.check(
command=command,
verbose=True,
)
if not res:
print("ERROR: failed to increase capacity - cannot scale up")
def handle_signal(signum, _frame):
print(f"FATAL: Received signal {signum}")
raise RuntimeError(f"killed by signal {signum}")
def run():
signal.signal(signal.SIGINT, handle_signal)
signal.signal(signal.SIGTERM, handle_signal)
m = None
try:
m = StateMachine()
m.run()
except Exception as e:
print(f"FATAL: Exception [{e}] - terminate instance")
time.sleep(10)
if m:
m.terminate()
raise e
if __name__ == "__main__":
run()


@ -0,0 +1,102 @@
import base64
import random
import struct
import zlib
def create_favicon():
# Image dimensions
width = 32
height = 32
# Initialize a transparent background image (RGBA: 4 bytes per pixel)
image_data = bytearray(
[0, 0, 0, 0] * width * height
) # Set alpha to 0 for transparency
# Draw 4 vertical lines with color #FAFF68 (RGB: 250, 255, 104)
line_color = [250, 255, 104, 255] # RGBA for #FAFF68 with full opacity
line_width = 4
space_width = 3
x_start = space_width
line_number = 4
line_height = height - space_width
for i in range(line_number):
# Randomly pick a starting y position for each line
y_start = random.randint(0, height - 1)
# Draw the line with random shift along Y-axis
for y in range(line_height):
y_pos = (y + y_start) % height
for x in range(line_width):
pixel_index = (y_pos * width + x_start + x) * 4
image_data[pixel_index : pixel_index + 4] = line_color
x_start += line_width + space_width
# Convert the RGBA image to PNG format
png_data = create_png(width, height, image_data)
# Convert PNG to ICO format
ico_data = create_ico(png_data)
return ico_data
def create_png(width, height, image_data):
def write_chunk(chunk_type, data):
chunk_len = struct.pack(">I", len(data))
chunk_crc = struct.pack(">I", zlib.crc32(chunk_type + data) & 0xFFFFFFFF)
return chunk_len + chunk_type + data + chunk_crc
png_signature = b"\x89PNG\r\n\x1a\n"
ihdr_chunk = struct.pack(">IIBBBBB", width, height, 8, 6, 0, 0, 0)
idat_data = zlib.compress(
b"".join(
b"\x00" + image_data[y * width * 4 : (y + 1) * width * 4]
for y in range(height)
),
9,
)
idat_chunk = write_chunk(b"IDAT", idat_data)
iend_chunk = write_chunk(b"IEND", b"")
return png_signature + write_chunk(b"IHDR", ihdr_chunk) + idat_chunk + iend_chunk
def create_ico(png_data):
# ICO header: reserved (2 bytes), type (2 bytes), image count (2 bytes)
ico_header = struct.pack("<HHH", 0, 1, 1)
# ICO entry: width, height, color count, reserved, color planes, bits per pixel, size, offset
ico_entry = struct.pack("<BBBBHHII", 32, 32, 0, 0, 1, 32, len(png_data), 22)
return ico_header + ico_entry + png_data
def save_favicon_to_disk(ico_data, file_path="favicon.ico"):
with open(file_path, "wb") as f:
f.write(ico_data)
print(f"Favicon saved to {file_path}")
def lambda_handler(event, context):
# Generate the favicon
favicon_data = create_favicon()
# Return the favicon as a binary response
return {
"statusCode": 200,
"headers": {
"Content-Type": "image/x-icon",
"Content-Disposition": 'inline; filename="favicon.ico"',
},
"body": base64.b64encode(favicon_data).decode("utf-8"),
"isBase64Encoded": True,
}
# Optional: Call the function directly to generate and save favicon locally (if running outside Lambda)
if __name__ == "__main__":
favicon_data = create_favicon()
save_favicon_to_disk(favicon_data)
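
The offset `22` passed to `struct.pack` in `create_ico` follows from the ICO layout: a 6-byte header plus one 16-byte directory entry. The sketch below checks that arithmetic by decoding the directory again. It assumes a single embedded image; `parse_ico_directory` is a hypothetical helper introduced for illustration and is not part of the original file.

```python
import struct

ICO_HEADER = "<HHH"      # reserved, type (1 = icon), image count
ICO_ENTRY = "<BBBBHHII"  # width, height, colors, reserved, planes, bpp, size, offset


def parse_ico_directory(ico_data: bytes) -> dict:
    """Decode the ICO header and its first directory entry (hypothetical helper)."""
    header_size = struct.calcsize(ICO_HEADER)  # 6 bytes
    entry_size = struct.calcsize(ICO_ENTRY)    # 16 bytes
    _reserved, ico_type, count = struct.unpack_from(ICO_HEADER, ico_data, 0)
    width, height, _colors, _res, planes, bpp, size, offset = struct.unpack_from(
        ICO_ENTRY, ico_data, header_size
    )
    return {
        "type": ico_type,
        "count": count,
        "width": width,
        "height": height,
        "planes": planes,
        "bpp": bpp,
        "size": size,
        "offset": offset,
        # For a single image, the payload starts right after header + one entry
        "expected_offset": header_size + entry_size,
    }
```

If `offset` and `expected_offset` disagree, an ICO reader would look for the PNG payload in the wrong place, so the hard-coded value has to change whenever more directory entries are added.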

105
ci/praktika/gh.py Normal file

@ -0,0 +1,105 @@
import json
import time
from praktika._environment import _Environment
from praktika.result import Result
from praktika.settings import Settings
from praktika.utils import Shell
class GH:
@classmethod
def do_command_with_retries(cls, command):
res = False
retry_count = 0
out, err = "", ""
while retry_count < Settings.MAX_RETRIES_GH and not res:
ret_code, out, err = Shell.get_res_stdout_stderr(command, verbose=True)
res = ret_code == 0
if not res and "Validation Failed" in err:
print("ERROR: GH command validation error")
break
if not res and "Bad credentials" in err:
print("ERROR: GH credentials/auth failure")
break
if not res:
retry_count += 1
time.sleep(5)
if not res:
print(
f"ERROR: Failed to execute gh command [{command}] out:[{out}] err:[{err}] after [{retry_count}] attempts"
)
return res
@classmethod
def post_pr_comment(
cls, comment_body, or_update_comment_with_substring, repo=None, pr=None
):
if not repo:
repo = _Environment.get().REPOSITORY
if not pr:
pr = _Environment.get().PR_NUMBER
if or_update_comment_with_substring:
print(f"check comment [{comment_body}] created")
cmd_check_created = f'gh api -H "Accept: application/vnd.github.v3+json" \
"/repos/{repo}/issues/{pr}/comments" \
--jq \'.[] | {{id: .id, body: .body}}\' | grep -F "{or_update_comment_with_substring}"'
output = Shell.get_output(cmd_check_created)
if output:
comment_ids = []
try:
comment_ids = [
json.loads(item.strip())["id"] for item in output.split("\n")
]
except Exception as ex:
print(f"Failed to retrieve PR comments with [{ex}]")
for id in comment_ids:
cmd = f'gh api \
-X PATCH \
-H "Accept: application/vnd.github.v3+json" \
"/repos/{repo}/issues/comments/{id}" \
-f body=\'{comment_body}\''
print(f"Update existing comments [{id}]")
return cls.do_command_with_retries(cmd)
cmd = f'gh pr comment {pr} --body "{comment_body}"'
return cls.do_command_with_retries(cmd)
@classmethod
def post_commit_status(cls, name, status, description, url):
status = cls.convert_to_gh_status(status)
command = (
f"gh api -X POST -H 'Accept: application/vnd.github.v3+json' "
f"/repos/{_Environment.get().REPOSITORY}/statuses/{_Environment.get().SHA} "
f"-f state='{status}' -f target_url='{url}' "
f"-f description='{description}' -f context='{name}'"
)
return cls.do_command_with_retries(command)
@classmethod
def convert_to_gh_status(cls, status):
if status in (
Result.Status.PENDING,
Result.Status.SUCCESS,
Result.Status.FAILED,
Result.Status.ERROR,
):
return status
if status == Result.Status.RUNNING:
return Result.Status.PENDING
else:
assert (
False
), f"Invalid status [{status}] to be set as GH commit status.state"
if __name__ == "__main__":
# test
GH.post_pr_comment(
comment_body="foobar",
or_update_comment_with_substring="CI",
repo="ClickHouse/praktika",
pr=15,
)
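
The retry policy in `do_command_with_retries` can be read as: retry transient failures up to a limit, but stop at once on errors that a retry cannot fix. A minimal standalone sketch of that policy follows. `run_with_retries`, its parameters, and the injected `run` callable are hypothetical names used only for illustration; the original calls `Shell.get_res_stdout_stderr` and sleeps 5 seconds between attempts.

```python
import time
from typing import Callable, Sequence, Tuple


def run_with_retries(
    run: Callable[[], Tuple[int, str, str]],
    max_retries: int = 3,
    fatal_markers: Sequence[str] = ("Validation Failed", "Bad credentials"),
    delay: float = 0.0,
) -> bool:
    """Call `run` until it returns exit code 0 or the retry budget is spent.

    `run` returns (exit_code, stdout, stderr). A stderr containing any of
    `fatal_markers` aborts immediately, since retrying cannot help.
    """
    attempts = 0
    while attempts < max_retries:
        ret_code, _out, err = run()
        if ret_code == 0:
            return True
        if any(marker in err for marker in fatal_markers):
            return False
        attempts += 1
        time.sleep(delay)
    return False
```

Passing the command runner in as a callable keeps the policy testable without invoking `gh`.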

71
ci/praktika/gh_auth.py Normal file

@ -0,0 +1,71 @@
import sys
import time
from typing import List
import requests
from jwt import JWT, jwk_from_pem
from praktika import Workflow
from praktika.mangle import _get_workflows
from praktika.settings import Settings
from praktika.utils import Shell
class GHAuth:
@staticmethod
def _generate_jwt(client_id, pem):
pem = str.encode(pem)
signing_key = jwk_from_pem(pem)
payload = {
"iat": int(time.time()),
"exp": int(time.time()) + 600,
"iss": client_id,
}
# Create JWT
jwt_instance = JWT()
encoded_jwt = jwt_instance.encode(payload, signing_key, alg="RS256")
return encoded_jwt
@staticmethod
def _get_installation_id(jwt_token):
headers = {
"Authorization": f"Bearer {jwt_token}",
"Accept": "application/vnd.github.v3+json",
}
response = requests.get(
"https://api.github.com/app/installations", headers=headers, timeout=10
)
response.raise_for_status()
installations = response.json()
assert installations, "No installations found for the GitHub App"
return installations[0]["id"]
@staticmethod
def _get_access_token(jwt_token, installation_id):
headers = {
"Authorization": f"Bearer {jwt_token}",
"Accept": "application/vnd.github.v3+json",
}
url = (
f"https://api.github.com/app/installations/{installation_id}/access_tokens"
)
response = requests.post(url, headers=headers, timeout=10)
response.raise_for_status()
return response.json()["token"]
@classmethod
def auth(cls, workflow_name) -> None:
wf = _get_workflows(workflow_name) # type: List[Workflow.Config]
pem = wf[0].get_secret(Settings.SECRET_GH_APP_PEM_KEY).get_value()
assert pem
app_id = wf[0].get_secret(Settings.SECRET_GH_APP_ID).get_value()
# Generate JWT
jwt_token = cls._generate_jwt(app_id, pem)
# Get Installation ID
installation_id = cls._get_installation_id(jwt_token)
# Get Installation Access Token
access_token = cls._get_access_token(jwt_token, installation_id)
Shell.check(f"echo {access_token} | gh auth login --with-token", strict=True)
if __name__ == "__main__":
GHAuth.auth(sys.argv[1])

124
ci/praktika/hook_cache.py Normal file

@ -0,0 +1,124 @@
from praktika._environment import _Environment
from praktika.cache import Cache
from praktika.mangle import _get_workflows
from praktika.runtime import RunConfig
from praktika.settings import Settings
from praktika.utils import Utils
class CacheRunnerHooks:
@classmethod
def configure(cls, _workflow):
workflow_config = RunConfig.from_fs(_workflow.name)
cache = Cache()
assert _Environment.get().WORKFLOW_NAME
workflow = _get_workflows(name=_Environment.get().WORKFLOW_NAME)[0]
print(f"Workflow Configure, workflow [{workflow.name}]")
assert (
workflow.enable_cache
), f"Outdated yaml pipelines or BUG. Configuration must be run only for workflow with enabled cache, workflow [{workflow.name}]"
artifact_digest_map = {}
job_digest_map = {}
for job in workflow.jobs:
if not job.digest_config:
print(
f"NOTE: job [{job.name}] has no Config.digest_config - skip cache check, always run"
)
digest = cache.digest.calc_job_digest(job_config=job)
job_digest_map[job.name] = digest
if job.provides:
# assign the job digest also to the artifacts it provides
for artifact in job.provides:
artifact_digest_map[artifact] = digest
for job in workflow.jobs:
digests_combined_list = []
if job.requires:
# include the digests of required artifacts in the job digest, so that they affect the job state
for artifact_name in job.requires:
if artifact_name not in [
artifact.name for artifact in workflow.artifacts
]:
# a phony artifact is assumed not to affect the jobs that depend on it
continue
digests_combined_list.append(artifact_digest_map[artifact_name])
digests_combined_list.append(job_digest_map[job.name])
final_digest = "-".join(digests_combined_list)
workflow_config.digest_jobs[job.name] = final_digest
assert (
workflow_config.digest_jobs
), f"BUG, Workflow with enabled cache must have job digests after configuration, wf [{workflow.name}]"
print("Check remote cache")
job_to_cache_record = {}
for job_name, job_digest in workflow_config.digest_jobs.items():
record = cache.fetch_success(job_name=job_name, job_digest=job_digest)
if record:
assert (
Utils.normalize_string(job_name)
not in workflow_config.cache_success
)
workflow_config.cache_success.append(job_name)
workflow_config.cache_success_base64.append(Utils.to_base64(job_name))
job_to_cache_record[job_name] = record
print("Check artifacts to reuse")
for job in workflow.jobs:
if job.name in workflow_config.cache_success:
if job.provides:
for artifact_name in job.provides:
workflow_config.cache_artifacts[artifact_name] = (
job_to_cache_record[job.name]
)
print("Write config to GH's job output")
with open(_Environment.get().JOB_OUTPUT_STREAM, "a", encoding="utf8") as f:
print(
f"DATA={workflow_config.to_json()}",
file=f,
)
print(f"WorkflowRuntimeConfig: [{workflow_config.to_json(pretty=True)}]")
print(
"Dump WorkflowConfig to fs, the next hooks in this job might want to see it"
)
workflow_config.dump()
return workflow_config
@classmethod
def pre_run(cls, _workflow, _job, _required_artifacts=None):
path_prefixes = []
if _job.name == Settings.CI_CONFIG_JOB_NAME:
# SPECIAL handling
return path_prefixes
env = _Environment.get()
runtime_config = RunConfig.from_fs(_workflow.name)
required_artifacts = []
if _required_artifacts:
required_artifacts = _required_artifacts
for artifact in required_artifacts:
if artifact.name in runtime_config.cache_artifacts:
record = runtime_config.cache_artifacts[artifact.name]
print(f"Reuse artifact [{artifact.name}] from [{record}]")
path_prefixes.append(
env.get_s3_prefix_static(
record.pr_number, record.branch, record.sha
)
)
else:
path_prefixes.append(env.get_s3_prefix())
return path_prefixes
@classmethod
def run(cls, workflow, job):
pass
@classmethod
def post_run(cls, workflow, job):
if job.name == Settings.CI_CONFIG_JOB_NAME:
return
if job.digest_config:
# cache is enabled, and this is a job that is supposed to be cached (it has a digest config defined)
workflow_runtime = RunConfig.from_fs(workflow.name)
job_digest = workflow_runtime.digest_jobs[job.name]
Cache.push_success_record(job.name, job_digest, workflow_runtime.sha)
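
The digest logic in `configure` has two passes: each artifact inherits the digest of the job that provides it, and each job's final digest is the digests of its required (non-phony) artifacts followed by its own, joined with `-`. The sketch below isolates that rule on plain dictionaries. `combine_digests` and its argument names are hypothetical and stand in for the `workflow.jobs` and `workflow.artifacts` objects used above.

```python
from typing import Dict, List, Set


def combine_digests(
    job_digests: Dict[str, str],
    provides: Dict[str, List[str]],
    requires: Dict[str, List[str]],
    real_artifacts: Set[str],
) -> Dict[str, str]:
    """Return the final per-job digests (hypothetical standalone helper)."""
    # Pass 1: an artifact carries the digest of the job that provides it
    artifact_digest: Dict[str, str] = {}
    for job, artifacts in provides.items():
        for artifact in artifacts:
            artifact_digest[artifact] = job_digests[job]

    # Pass 2: required artifact digests first, then the job's own digest
    final: Dict[str, str] = {}
    for job, own_digest in job_digests.items():
        parts = [
            artifact_digest[name]
            for name in requires.get(job, [])
            if name in real_artifacts  # phony artifacts do not affect the digest
        ]
        parts.append(own_digest)
        final[job] = "-".join(parts)
    return final
```

Because upstream digests are part of the key, a change in a build job invalidates the cached success records of every job that consumes its artifacts.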

198
ci/praktika/hook_html.py Normal file

@ -0,0 +1,198 @@
import dataclasses
import json
import urllib.parse
from pathlib import Path
from typing import List
from praktika._environment import _Environment
from praktika.gh import GH
from praktika.parser import WorkflowConfigParser
from praktika.result import Result, ResultInfo
from praktika.runtime import RunConfig
from praktika.s3 import S3
from praktika.settings import Settings
from praktika.utils import Shell, Utils
@dataclasses.dataclass
class GitCommit:
date: str
message: str
sha: str
@staticmethod
def from_json(json_data: str) -> List["GitCommit"]:
commits = []
try:
data = json.loads(json_data)
commits = [
GitCommit(
message=commit["messageHeadline"],
sha=commit["oid"],
date=commit["committedDate"],
)
for commit in data.get("commits", [])
]
except Exception as e:
print(
f"ERROR: Failed to deserialize commit's data: [{json_data}], ex: [{e}]"
)
return commits
class HtmlRunnerHooks:
@classmethod
def configure(cls, _workflow):
def _get_pr_commits(pr_number):
res = []
if not pr_number:
return res
output = Shell.get_output(f"gh pr view {pr_number} --json commits")
if output:
res = GitCommit.from_json(output)
return res
# generate pending Results for all jobs in the workflow
if _workflow.enable_cache:
skip_jobs = RunConfig.from_fs(_workflow.name).cache_success
else:
skip_jobs = []
env = _Environment.get()
results = []
for job in _workflow.jobs:
if job.name not in skip_jobs:
result = Result.generate_pending(job.name)
else:
result = Result.generate_skipped(job.name)
results.append(result)
summary_result = Result.generate_pending(_workflow.name, results=results)
summary_result.aux_links.append(env.CHANGE_URL)
summary_result.aux_links.append(env.RUN_URL)
summary_result.start_time = Utils.timestamp()
page_url = "/".join(
["https:/", Settings.HTML_S3_PATH, str(Path(Settings.HTML_PAGE_FILE).name)]
)
for bucket, endpoint in Settings.S3_BUCKET_TO_HTTP_ENDPOINT.items():
page_url = page_url.replace(bucket, endpoint)
# TODO: add support for non-PRs (use branch?)
page_url += f"?PR={env.PR_NUMBER}&sha=latest&name_0={urllib.parse.quote(env.WORKFLOW_NAME, safe='')}"
summary_result.html_link = page_url
# clean the previous latest results in PR if any
if env.PR_NUMBER:
S3.clean_latest_result()
S3.copy_result_to_s3(
summary_result,
unlock=False,
)
print(f"CI Status page url [{page_url}]")
res1 = GH.post_commit_status(
name=_workflow.name,
status=Result.Status.PENDING,
description="",
url=page_url,
)
res2 = GH.post_pr_comment(
comment_body=f"Workflow [[{_workflow.name}]({page_url})], commit [{_Environment.get().SHA[:8]}]",
or_update_comment_with_substring="Workflow [",
)
if not (res1 or res2):
Utils.raise_with_error(
"Failed to set both GH commit status and PR comment with Workflow Status, cannot proceed"
)
if env.PR_NUMBER:
commits = _get_pr_commits(env.PR_NUMBER)
# TODO: upload commits data to s3 to visualise it on a report page
print(commits)
@classmethod
def pre_run(cls, _workflow, _job):
result = Result.from_fs(_job.name)
S3.copy_result_from_s3(
Result.file_name_static(_workflow.name),
)
workflow_result = Result.from_fs(_workflow.name)
workflow_result.update_sub_result(result)
S3.copy_result_to_s3(
workflow_result,
unlock=True,
)
@classmethod
def run(cls, _workflow, _job):
pass
@classmethod
def post_run(cls, _workflow, _job, info_errors):
result = Result.from_fs(_job.name)
env = _Environment.get()
S3.copy_result_from_s3(
Result.file_name_static(_workflow.name),
lock=True,
)
workflow_result = Result.from_fs(_workflow.name)
print(f"Workflow info [{workflow_result.info}], info_errors [{info_errors}]")
env_info = env.REPORT_INFO
if env_info:
print(
f"WARNING: some info lines are set in Environment - append to report [{env_info}]"
)
info_errors += env_info
if info_errors:
info_errors = [f" | {error}" for error in info_errors]
info_str = f"{_job.name}:\n"
info_str += "\n".join(info_errors)
print("Update workflow results with new info")
workflow_result.set_info(info_str)
old_status = workflow_result.status
S3.upload_result_files_to_s3(result)
workflow_result.update_sub_result(result)
skipped_job_results = []
if not result.is_ok():
print(
"Current job failed - find dependee jobs in the workflow and set their statuses to skipped"
)
workflow_config_parsed = WorkflowConfigParser(_workflow).parse()
for dependee_job in workflow_config_parsed.workflow_yaml_config.jobs:
if _job.name in dependee_job.needs:
if _workflow.get_job(dependee_job.name).run_unless_cancelled:
continue
print(
f"NOTE: Set job [{dependee_job.name}] status to [{Result.Status.SKIPPED}] due to current failure"
)
skipped_job_results.append(
Result(
name=dependee_job.name,
status=Result.Status.SKIPPED,
info=ResultInfo.SKIPPED_DUE_TO_PREVIOUS_FAILURE
+ f" [{_job.name}]",
)
)
for skipped_job_result in skipped_job_results:
workflow_result.update_sub_result(skipped_job_result)
S3.copy_result_to_s3(
workflow_result,
unlock=True,
)
if workflow_result.status != old_status:
print(
f"Update GH commit status [{result.name}]: [{old_status} -> {workflow_result.status}], link [{workflow_result.html_link}]"
)
GH.post_commit_status(
name=workflow_result.name,
status=GH.convert_to_gh_status(workflow_result.status),
description="",
url=workflow_result.html_link,
)


@ -0,0 +1,43 @@
from abc import ABC, abstractmethod
from praktika import Workflow
class HookInterface(ABC):
@abstractmethod
def pre_run(self, _workflow, _job):
"""
runs in pre-run step
:param _workflow:
:param _job:
:return:
"""
pass
@abstractmethod
def run(self, _workflow, _job):
"""
runs in run step
:param _workflow:
:param _job:
:return:
"""
pass
@abstractmethod
def post_run(self, _workflow, _job):
"""
runs in post-run step
:param _workflow:
:param _job:
:return:
"""
pass
@abstractmethod
def configure(self, _workflow: Workflow.Config):
"""
runs in initial WorkflowConfig job in run step
:return:
"""
pass


@ -0,0 +1,10 @@
from praktika.s3 import S3
from praktika.settings import Settings
class Html:
@classmethod
def prepare(cls):
S3.copy_file_to_s3(
s3_path=Settings.HTML_S3_PATH, local_path=Settings.HTML_PAGE_FILE
)

102
ci/praktika/job.py Normal file

@ -0,0 +1,102 @@
import copy
import json
from dataclasses import dataclass, field
from typing import Any, List, Optional
class Job:
@dataclass
class Requirements:
python: bool = False
python_requirements_txt: str = ""
@dataclass
class CacheDigestConfig:
include_paths: List[str] = field(default_factory=list)
exclude_paths: List[str] = field(default_factory=list)
@dataclass
class Config:
# Job Name
name: str
# Machine's label to run job on. For instance [ubuntu-latest] for free gh runner
runs_on: List[str]
# Job Run Command
command: str
# What job requires
# May be phony or physical names
requires: List[str] = field(default_factory=list)
# What job provides
# May be phony or physical names
provides: List[str] = field(default_factory=list)
job_requirements: Optional["Job.Requirements"] = None
timeout: int = 1 * 3600
digest_config: Optional["Job.CacheDigestConfig"] = None
run_in_docker: str = ""
run_unless_cancelled: bool = False
allow_merge_on_failure: bool = False
parameter: Any = None
def parametrize(
self,
parameter: Optional[List[Any]] = None,
runs_on: Optional[List[List[str]]] = None,
timeout: Optional[List[int]] = None,
):
assert (
parameter or runs_on
), "Either :parameter or :runs_on must be a non-empty list for parametrization"
if not parameter:
parameter = [None] * len(runs_on)
if not runs_on:
runs_on = [None] * len(parameter)
if not timeout:
timeout = [None] * len(parameter)
assert (
len(parameter) == len(runs_on) == len(timeout)
), "Parametrization lists must be of the same size"
res = []
for parameter_, runs_on_, timeout_ in zip(parameter, runs_on, timeout):
obj = copy.deepcopy(self)
if parameter_:
obj.parameter = parameter_
if runs_on_:
obj.runs_on = runs_on_
if timeout_:
obj.timeout = timeout_
obj.name = obj.get_job_name_with_parameter()
res.append(obj)
return res
def get_job_name_with_parameter(self):
name, parameter, runs_on = self.name, self.parameter, self.runs_on
res = name
name_params = []
if isinstance(parameter, (list, dict)):
name_params.append(json.dumps(parameter))
elif parameter is not None:
name_params.append(parameter)
if runs_on:
assert isinstance(runs_on, list)
name_params.append(json.dumps(runs_on))
if name_params:
name_params = [str(param) for param in name_params]
res += f" ({', '.join(name_params)})"
self.name = res
return res
def __repr__(self):
return self.name
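
The naming rule in `get_job_name_with_parameter` is easier to see on concrete inputs: list and dict parameters are JSON-encoded, scalar parameters are stringified, the `runs_on` labels are JSON-encoded, and everything is appended to the base name in parentheses. A minimal sketch as a free function; `job_name_with_parameter` is a hypothetical name, and unlike the method above it does not mutate any object.

```python
import json
from typing import Any, List, Optional


def job_name_with_parameter(
    name: str, parameter: Any = None, runs_on: Optional[List[str]] = None
) -> str:
    """Build a parametrized job name (hypothetical standalone variant)."""
    parts = []
    if isinstance(parameter, (list, dict)):
        # Structured parameters are serialized as JSON
        parts.append(json.dumps(parameter))
    elif parameter is not None:
        parts.append(str(parameter))
    if runs_on:
        # Runner labels are always a list, so they are JSON-encoded as well
        parts.append(json.dumps(runs_on))
    return f"{name} ({', '.join(parts)})" if parts else name
```

For example, `job_name_with_parameter("Build", "amd", ["ubuntu"])` yields `Build (amd, ["ubuntu"])`, while a job with neither a parameter nor runner labels keeps its base name.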

745
ci/praktika/json.html Normal file

@ -0,0 +1,745 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>praktika report</title>
<link rel="icon" href="https://w4z3pajszlbkfcw2wcylfei5km0xmwag.lambda-url.us-east-1.on.aws/" type="image/x-icon">
<style>
/* Default (Day Theme) */
:root {
--background-color: white;
--text-color: #000;
--tile-background: #f9f9f9;
--footer-background: #f1f1f1;
--footer-text-color: #000;
--status-width: 300px;
}
body {
background-color: var(--background-color);
color: var(--text-color);
height: 100%;
margin: 0;
display: flex;
flex-direction: column;
font-family: 'IBM Plex Mono Condensed', monospace, sans-serif;
--header-background-color: #f4f4f4;
}
body.night-theme {
--background-color: #1F1F1C;
--text-color: #fff;
--tile-background: black;
--header-background-color: #1F1F1C;
}
#info-container {
margin-left: calc(var(--status-width) + 20px);
margin-bottom: 10px;
background-color: var(--tile-background);
padding: 10px;
text-align: left;
}
#status-container {
position: fixed;
top: 0;
bottom: 0;
left: 0;
width: var(--status-width);
background-color: var(--tile-background);
padding: 20px;
box-sizing: border-box;
font-size: 18px;
margin: 0;
}
#status-container a {
color: #007bff;
text-decoration: underline;
font-weight: bold;
cursor: pointer;
display: inline-block;
margin-top: 5px;
margin-left: 20px;
padding: 2px 0;
font-size: 0.8em;
}
#status-container a:hover {
color: #0056b3;
text-decoration: none;
}
.key-value-pair {
display: flex; /* Enable Flexbox for alignment */
justify-content: space-between; /* Distribute space between key and value */
margin-bottom: 20px; /* Add space between each pair */
}
.json-key {
font-weight: bold;
}
.json-value {
font-weight: normal;
font-family: 'Source Code Pro', monospace, sans-serif;
letter-spacing: -0.5px;
}
#result-container {
background-color: var(--tile-background);
margin-left: calc(var(--status-width) + 20px);
padding: 20px;
box-sizing: border-box;
text-align: center;
font-size: 18px;
font-weight: normal;
flex-grow: 1;
}
#footer {
padding: 10px;
position: fixed;
bottom: 0;
left: 0;
right: 0;
background-color: #1F1F1C;
color: white;
font-size: 14px;
display: flex;
justify-content: space-between; /* Ensure the .left expands, and .right and .settings are aligned to the right */
align-items: center;
}
#footer a {
color: white;
text-decoration: none;
}
#footer .left {
flex-grow: 1; /* Takes up all the available space */
}
/* make some space around '/' in the navigation line */
#footer .left span.separator {
margin-left: 5px;
margin-right: 5px;
}
#footer .right, #footer .settings {
display: flex;
align-items: center;
}
#footer .right a::before {
content: "#";
margin-left: 10px;
color: #e0e0e0;
}
#footer .right::before, #footer .settings::before {
content: "|"; /* Add separator before right and settings sections */
margin-left: 10px;
margin-right: 10px;
color: #e0e0e0;
}
#theme-toggle {
cursor: pointer;
font-size: 20px;
color: white;
}
#theme-toggle:hover {
color: #e0e0e0;
}
#footer a:hover {
text-decoration: underline;
}
#links {
margin-top: 10px;
padding: 15px;
border: 1px solid #ccc;
border-radius: 5px;
background-color: #f9f9f9;
}
#links a {
display: block;
margin-bottom: 5px;
padding: 5px 10px;
background-color: #D5D5D5;
color: black;
text-decoration: none;
border-radius: 5px;
}
#links a:hover {
background-color: #D5D5D5;
}
table {
width: 100%;
border-collapse: collapse;
}
th.name-column, td.name-column {
max-width: 400px; /* Set the maximum width for the column */
white-space: nowrap; /* Prevent text from wrapping */
overflow: hidden; /* Hide the overflowed text */
text-overflow: ellipsis; /* Show ellipsis (...) for overflowed text */
}
th.status-column, td.status-column {
max-width: 100px; /* Set the maximum width for the column */
white-space: nowrap; /* Prevent text from wrapping */
overflow: hidden; /* Hide the overflowed text */
text-overflow: ellipsis; /* Show ellipsis (...) for overflowed text */
}
th.time-column, td.time-column {
max-width: 120px; /* Set the maximum width for the column */
white-space: nowrap; /* Prevent text from wrapping */
text-align: right;
}
th.info-column, td.info-column {
width: 100%; /* Allow the column to take all the remaining space */
}
th, td {
padding: 8px;
border: 1px solid #ddd;
text-align: left;
}
th {
background-color: var(--header-background-color);
}
.status-success {
color: green;
font-weight: bold;
}
.status-fail {
color: red;
font-weight: bold;
}
.status-pending {
color: #d4a017;
font-weight: bold;
}
.status-broken {
color: purple;
font-weight: bold;
}
.status-run {
color: blue;
font-weight: bold;
}
.status-error {
color: darkred;
font-weight: bold;
}
.status-other {
color: grey;
font-weight: bold;
}
</style>
</head>
<body>
<div id="info-container"></div>
<div id="status-container"></div>
<div id="result-container"></div>
<footer id="footer">
<div class="left"></div>
<div class="right"></div>
<div class="settings">
<span id="theme-toggle">☀️</span>
</div>
</footer>
<script>
function toggleTheme() {
document.body.classList.toggle('night-theme');
const toggleIcon = document.getElementById('theme-toggle');
if (document.body.classList.contains('night-theme')) {
toggleIcon.textContent = '☾'; // Moon for night mode
} else {
toggleIcon.textContent = '☀️'; // Sun for day mode
}
}
// Attach the toggle function to the click event of the icon
document.getElementById('theme-toggle').addEventListener('click', toggleTheme);
function formatTimestamp(timestamp, showDate = true) {
const date = new Date(timestamp * 1000);
const day = String(date.getDate()).padStart(2, '0');
const monthNames = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
"Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];
const month = monthNames[date.getMonth()];
const year = date.getFullYear();
const hours = String(date.getHours()).padStart(2, '0');
const minutes = String(date.getMinutes()).padStart(2, '0');
const seconds = String(date.getSeconds()).padStart(2, '0');
//const milliseconds = String(date.getMilliseconds()).padStart(2, '0');
return showDate
? `${day}-${month}-${year} ${hours}:${minutes}:${seconds}`
: `${hours}:${minutes}:${seconds}`;
}
function formatDuration(durationInSeconds, detailed = false) {
// Check if the duration is empty, null, or not a number
if (!durationInSeconds || isNaN(durationInSeconds)) {
return '';
}
// Ensure duration is a floating-point number
const duration = parseFloat(durationInSeconds);
if (detailed) {
// Format in the detailed format with hours, minutes, and seconds
const hours = Math.floor(duration / 3600);
const minutes = Math.floor((duration % 3600) / 60);
const seconds = Math.floor(duration % 60);
const formattedHours = hours > 0 ? `${hours}h ` : '';
const formattedMinutes = minutes > 0 ? `${minutes}m ` : '';
const formattedSeconds = `${String(seconds).padStart(2, '0')}s`;
return `${formattedHours}${formattedMinutes}${formattedSeconds}`.trim();
} else {
// Format in the default format with seconds and milliseconds
const seconds = Math.floor(duration);
const milliseconds = Math.floor((duration % 1) * 1000);
const formattedSeconds = String(seconds);
const formattedMilliseconds = String(milliseconds).padStart(3, '0');
return `${formattedSeconds}.${formattedMilliseconds}`;
}
}
// Function to determine status class based on value
function getStatusClass(status) {
const lowerStatus = status.toLowerCase();
if (lowerStatus.includes('success') || lowerStatus === 'ok') return 'status-success';
if (lowerStatus.includes('fail')) return 'status-fail';
if (lowerStatus.includes('pending')) return 'status-pending';
if (lowerStatus.includes('broken')) return 'status-broken';
if (lowerStatus.includes('run')) return 'status-run';
if (lowerStatus.includes('error')) return 'status-error';
return 'status-other';
}
function addKeyValueToStatus(key, value) {
const statusContainer = document.getElementById('status-container');
let keyValuePair = document.createElement('div');
keyValuePair.className = 'key-value-pair';
const keyElement = document.createElement('div');
keyElement.className = 'json-key';
keyElement.textContent = key + ':';
const valueElement = document.createElement('div');
valueElement.className = 'json-value';
valueElement.textContent = value;
keyValuePair.appendChild(keyElement)
keyValuePair.appendChild(valueElement)
statusContainer.appendChild(keyValuePair);
}
function addFileButtonToStatus(key, links) {
if (links == null) {
return
}
const statusContainer = document.getElementById('status-container');
const keyElement = document.createElement('div');
keyElement.className = 'json-key';
keyElement.textContent = (columnSymbols[key] || key) + ':';
statusContainer.appendChild(keyElement);
if (Array.isArray(links) && links.length > 0) {
links.forEach(link => {
const textLink = document.createElement('a');
textLink.href = link;
textLink.textContent = link.split('/').pop();
textLink.target = '_blank';
statusContainer.appendChild(textLink);
statusContainer.appendChild(document.createElement('br'));
});
}
}
function addStatusToStatus(status, start_time, duration) {
const statusContainer = document.getElementById('status-container')
let keyValuePair = document.createElement('div');
keyValuePair.className = 'key-value-pair';
let keyElement = document.createElement('div');
let valueElement = document.createElement('div');
keyElement.className = 'json-key';
valueElement.className = 'json-value';
keyElement.textContent = (columnSymbols['status'] || 'status') + ':';
valueElement.classList.add('status-value');
valueElement.classList.add(getStatusClass(status));
valueElement.textContent = status;
keyValuePair.appendChild(keyElement);
keyValuePair.appendChild(valueElement);
statusContainer.appendChild(keyValuePair);
keyValuePair = document.createElement('div');
keyValuePair.className = 'key-value-pair';
keyElement = document.createElement('div');
valueElement = document.createElement('div');
keyElement.className = 'json-key';
valueElement.className = 'json-value';
keyElement.textContent = (columnSymbols['start_time'] || 'start_time') + ':';
valueElement.textContent = formatTimestamp(start_time);
keyValuePair.appendChild(keyElement);
keyValuePair.appendChild(valueElement);
statusContainer.appendChild(keyValuePair);
keyValuePair = document.createElement('div');
keyValuePair.className = 'key-value-pair';
keyElement = document.createElement('div');
valueElement = document.createElement('div');
keyElement.className = 'json-key';
valueElement.className = 'json-value';
keyElement.textContent = (columnSymbols['duration'] || 'duration') + ':';
if (duration === null) {
// Set initial value to 0 and add a unique ID or data attribute to identify the duration element
valueElement.textContent = '00:00:00';
valueElement.setAttribute('id', 'duration-value');
} else {
// Format the duration if it's a valid number
valueElement.textContent = formatDuration(duration, true);
}
keyValuePair.appendChild(keyElement);
keyValuePair.appendChild(valueElement);
statusContainer.appendChild(keyValuePair);
}
function navigatePath(jsonObj, nameArray) {
let baseParams = new URLSearchParams(window.location.search);
let keysToDelete = [];
baseParams.forEach((value, key) => {
if (key.startsWith('name_')) {
keysToDelete.push(key); // Collect the keys to delete
}
});
keysToDelete.forEach((key) => baseParams.delete(key));
let pathNames = [];
let pathLinks = [];
let currentObj = jsonObj;
// Add the first entry (root level)
baseParams.set(`name_0`, currentObj.name);
pathNames.push(currentObj.name);
pathLinks.push(`<span class="separator">/</span><a href="${window.location.pathname}?${baseParams.toString()}">${currentObj.name}</a>`);
// Iterate through the nameArray, skipping index 0 (the root level was handled above)
for (const [index, name] of nameArray.entries()) {
if (index === 0) continue;
if (currentObj && Array.isArray(currentObj.results)) {
const nextResult = currentObj.results.find(result => result.name === name);
if (nextResult) {
baseParams.set(`name_${index}`, nextResult.name);
pathNames.push(nextResult.name); // Correctly push nextResult name, not currentObj.name
pathLinks.push(`<span class="separator">/</span><a href="${window.location.pathname}?${baseParams.toString()}">${nextResult.name}</a>`);
currentObj = nextResult; // Move to the next object in the hierarchy
} else {
console.error(`Name "${name}" not found in results array.`);
return null; // Name not found in results array
}
} else {
console.error(`Current object is not structured as expected.`);
return null; // Current object is not structured as expected
}
}
const footerLeft = document.querySelector('#footer .left');
footerLeft.innerHTML = pathLinks.join('');
return currentObj;
}
// Define the fixed columns globally, so both functions can use it
const columns = ['name', 'status', 'start_time', 'duration', 'info'];
const columnSymbols = {
name: '📂',
status: '✔️',
start_time: '🕒',
duration: '⏳',
info: '',
files: '📄'
};
function createResultsTable(results, nest_level) {
if (results && Array.isArray(results) && results.length > 0) {
const table = document.createElement('table');
const thead = document.createElement('thead');
const tbody = document.createElement('tbody');
// Get the current URL parameters
const currentUrl = new URL(window.location.href);
// Create table headers based on the fixed columns
const headerRow = document.createElement('tr');
columns.forEach(column => {
const th = document.createElement('th');
th.textContent = columnSymbols[column] || column;
th.style.cursor = 'pointer'; // Make headers clickable
th.addEventListener('click', () => sortTable(results, column, tbody, nest_level)); // Add click event to sort the table
headerRow.appendChild(th);
});
thead.appendChild(headerRow);
// Create table rows
populateTableRows(tbody, results, columns, nest_level);
table.appendChild(thead);
table.appendChild(tbody);
return table;
}
return null;
}
function populateTableRows(tbody, results, columns, nest_level) {
const currentUrl = new URL(window.location.href); // Get the current URL
// Clear existing rows if re-rendering (used in sorting)
tbody.innerHTML = '';
results.forEach((result, index) => {
const row = document.createElement('tr');
columns.forEach(column => {
const td = document.createElement('td');
const value = result[column];
if (column === 'name') {
// Create a link for the name field, using name_X
const link = document.createElement('a');
const newUrl = new URL(currentUrl); // Create a fresh copy of the URL for each row
newUrl.searchParams.set(`name_${nest_level}`, value); // Use backticks for string interpolation
link.href = newUrl.toString();
link.textContent = value;
td.classList.add('name-column');
td.appendChild(link);
} else if (column === 'status') {
// Apply status formatting
const span = document.createElement('span');
span.className = getStatusClass(value);
span.textContent = value;
td.classList.add('status-column');
td.appendChild(span);
} else if (column === 'start_time') {
td.classList.add('time-column');
td.textContent = value ? formatTimestamp(value, false) : '';
} else if (column === 'duration') {
td.classList.add('time-column');
td.textContent = value ? formatDuration(value) : '';
} else if (column === 'info') {
// For info and other columns, just display the value
td.textContent = value || '';
td.classList.add('info-column');
}
row.appendChild(td);
});
tbody.appendChild(row);
});
}
function sortTable(results, key, tbody, nest_level) {
// Find the table header element for the given key
let th = null;
const tableHeaders = document.querySelectorAll('th'); // Select all table headers
tableHeaders.forEach(header => {
// Headers display the column symbol when one is defined, so match on the rendered text
if (header.textContent.trim() === (columnSymbols[key] || key)) {
th = header;
}
});
if (!th) {
console.error(`No table header found for key: ${key}`);
return;
}
// Determine the current sort direction
let ascending = th.getAttribute('data-sort-direction') !== 'asc';
// Toggle the sort direction for the next click
th.setAttribute('data-sort-direction', ascending ? 'asc' : 'desc');
// Sort the results array by the given key
results.sort((a, b) => {
if (a[key] < b[key]) return ascending ? -1 : 1;
if (a[key] > b[key]) return ascending ? 1 : -1;
return 0;
});
// Re-populate the table with sorted data
populateTableRows(tbody, results, columns, nest_level);
}
function loadJSON(PR, sha, nameParams) {
const infoElement = document.getElementById('info-container');
let lastModifiedTime = null;
const task = nameParams[0].toLowerCase();
// Construct the URL dynamically based on PR, sha, and name_X
const baseUrl = window.location.origin + window.location.pathname.replace('/json.html', '');
const path = `${baseUrl}/${encodeURIComponent(PR)}/${encodeURIComponent(sha)}/result_${task}.json`;
fetch(path, {cache: "no-cache"})
.then(response => {
if (!response.ok) {
throw new Error(`HTTP error! status: ${response.status}`);
}
lastModifiedTime = response.headers.get('Last-Modified');
return response.json();
})
.then(data => {
const linksDiv = document.getElementById('links');
const resultsDiv = document.getElementById('result-container');
const footerRight = document.querySelector('#footer .right');
let targetData = navigatePath(data, nameParams);
let nest_level = nameParams.length;
if (targetData) {
infoElement.style.display = 'none';
// Handle footer links if present
if (Array.isArray(data.aux_links) && data.aux_links.length > 0) {
data.aux_links.forEach(link => {
const a = document.createElement('a');
a.href = link;
a.textContent = link.split('/').pop();
a.target = '_blank';
footerRight.appendChild(a);
});
}
addStatusToStatus(targetData.status, targetData.start_time, targetData.duration)
// Handle links
addFileButtonToStatus('files', targetData.links)
// Handle duration update if duration is null and start_time exists
if (targetData.duration === null && targetData.start_time) {
let duration = Math.floor(Date.now() / 1000 - targetData.start_time);
const durationElement = document.getElementById('duration-value');
const intervalId = setInterval(() => {
duration++;
durationElement.textContent = formatDuration(duration, true);
}, 1000);
}
// If 'results' exists and is non-empty, create the table
const resultsData = targetData.results;
if (Array.isArray(resultsData) && resultsData.length > 0) {
const table = createResultsTable(resultsData, nest_level);
if (table) {
resultsDiv.appendChild(table);
}
}
} else {
infoElement.textContent = 'Object Not Found';
infoElement.style.display = 'block';
}
// Set up auto-reload if Last-Modified header is present
if (lastModifiedTime) {
setInterval(() => {
checkForUpdate(path, lastModifiedTime);
}, 30000); // 30000 milliseconds = 30 seconds
}
})
.catch(error => {
console.error('Error loading JSON:', error);
infoElement.textContent = 'Error loading data';
infoElement.style.display = 'block';
});
}
// Function to check if the JSON file is updated
function checkForUpdate(path, lastModifiedTime) {
fetch(path, {method: 'HEAD'})
.then(response => {
if (!response.ok) {
throw new Error(`HTTP error! status: ${response.status}`);
}
const newLastModifiedTime = response.headers.get('Last-Modified');
if (newLastModifiedTime && new Date(newLastModifiedTime) > new Date(lastModifiedTime)) {
// If the JSON file has been updated, reload the page
window.location.reload();
}
})
.catch(error => {
console.error('Error checking for update:', error);
});
}
// Initialize the page and load JSON from URL parameter
function init() {
const urlParams = new URLSearchParams(window.location.search);
const PR = urlParams.get('PR');
const sha = urlParams.get('sha');
const root_name = urlParams.get('name_0');
const nameParams = [];
urlParams.forEach((value, key) => {
if (key.startsWith('name_')) {
const index = parseInt(key.split('_')[1], 10);
nameParams[index] = value;
}
});
if (PR) {
addKeyValueToStatus("PR", PR)
} else {
console.error("Missing required URL parameter: PR")
}
addKeyValueToStatus("sha", sha);
if (nameParams[1]) {
addKeyValueToStatus("job", nameParams[1]);
}
addKeyValueToStatus("workflow", nameParams[0]);
if (PR && sha && root_name) {
loadJSON(PR, sha, nameParams);
} else {
document.getElementById('title').textContent = 'Error: Missing required URL parameters: PR, sha, or name_0';
}
}
window.onload = init;
</script>
</body>
</html>

137
ci/praktika/mangle.py Normal file

@ -0,0 +1,137 @@
import copy
import importlib.util
from pathlib import Path
from typing import Any, Dict
from praktika import Job
from praktika._settings import _USER_DEFINED_SETTINGS, _Settings
from praktika.utils import ContextManager, Utils
def _get_workflows(name=None, file=None):
"""
Gets user's workflow configs
"""
res = []
with ContextManager.cd():
directory = Path(_Settings.WORKFLOWS_DIRECTORY)
for py_file in directory.glob("*.py"):
if file and file not in str(py_file):
continue
# Module name is the file name without ".py"; the spec still needs the full file name
module_name = py_file.stem
spec = importlib.util.spec_from_file_location(
module_name, f"{_Settings.WORKFLOWS_DIRECTORY}/{py_file.name}"
)
assert spec
foo = importlib.util.module_from_spec(spec)
assert spec.loader
spec.loader.exec_module(foo)
try:
for workflow in foo.WORKFLOWS:
if name:
if name == workflow.name:
print(f"Read workflow [{name}] config from [{module_name}]")
res = [workflow]
break
else:
continue
else:
res += foo.WORKFLOWS
print(f"Read workflow configs from [{module_name}]")
except Exception as e:
print(
f"WARNING: Failed to add WORKFLOWS config from [{module_name}], exception [{e}]"
)
if not res:
Utils.raise_with_error(f"Failed to find workflow [{name or file}]")
for workflow in res:
# add native jobs
_update_workflow_with_native_jobs(workflow)
# fill in artifact properties, e.g. _provided_by
_update_workflow_artifacts(workflow)
return res
def _update_workflow_artifacts(workflow):
artifact_job = {}
for job in workflow.jobs:
for artifact_name in job.provides:
assert artifact_name not in artifact_job
artifact_job[artifact_name] = job.name
for artifact in workflow.artifacts:
artifact._provided_by = artifact_job[artifact.name]
def _update_workflow_with_native_jobs(workflow):
if workflow.dockers:
from praktika.native_jobs import _docker_build_job
print(f"Enable native job [{_docker_build_job.name}] for [{workflow.name}]")
aux_job = copy.deepcopy(_docker_build_job)
if workflow.enable_cache:
print(
f"Add automatic digest config for [{aux_job.name}] job since cache is enabled"
)
docker_digest_config = Job.CacheDigestConfig()
for docker_config in workflow.dockers:
docker_digest_config.include_paths.append(docker_config.path)
aux_job.digest_config = docker_digest_config
workflow.jobs.insert(0, aux_job)
for job in workflow.jobs[1:]:
if not job.requires:
job.requires = []
job.requires.append(aux_job.name)
if (
workflow.enable_cache
or workflow.enable_report
or workflow.enable_merge_ready_status
):
from praktika.native_jobs import _workflow_config_job
print(f"Enable native job [{_workflow_config_job.name}] for [{workflow.name}]")
aux_job = copy.deepcopy(_workflow_config_job)
workflow.jobs.insert(0, aux_job)
for job in workflow.jobs[1:]:
if not job.requires:
job.requires = []
job.requires.append(aux_job.name)
if workflow.enable_merge_ready_status:
from praktika.native_jobs import _final_job
print(f"Enable native job [{_final_job.name}] for [{workflow.name}]")
aux_job = copy.deepcopy(_final_job)
for job in workflow.jobs:
aux_job.requires.append(job.name)
workflow.jobs.append(aux_job)
def _get_user_settings() -> Dict[str, Any]:
"""
Gets user's settings
"""
res = {} # type: Dict[str, Any]
directory = Path(_Settings.SETTINGS_DIRECTORY)
for py_file in directory.glob("*.py"):
# Module name is the file name without ".py"; the spec still needs the full file name
module_name = py_file.stem
spec = importlib.util.spec_from_file_location(
module_name, f"{_Settings.SETTINGS_DIRECTORY}/{py_file.name}"
)
assert spec
foo = importlib.util.module_from_spec(spec)
assert spec.loader
spec.loader.exec_module(foo)
for setting in _USER_DEFINED_SETTINGS:
try:
value = getattr(foo, setting)
res[setting] = value
print(f"Apply user defined setting [{setting} = {value}]")
except Exception as e:
pass
return res
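
The way `_update_workflow_with_native_jobs` wires native jobs into a workflow can be sketched in isolation. This is a minimal illustration, not the praktika implementation: `JobSketch`, `add_leading_job`, `add_final_job` and the job names are hypothetical stand-ins, and only the `name` and `requires` fields of a job are modelled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class JobSketch:
    # Hypothetical stand-in for Job.Config: only the fields used in this sketch
    name: str
    requires: List[str] = field(default_factory=list)


def add_leading_job(jobs: List[JobSketch], aux: JobSketch) -> None:
    """Put aux first and make every pre-existing job depend on it."""
    jobs.insert(0, aux)
    for job in jobs[1:]:
        job.requires.append(aux.name)


def add_final_job(jobs: List[JobSketch], aux: JobSketch) -> None:
    """Append aux last and make it depend on every job added so far."""
    aux.requires.extend(job.name for job in jobs)
    jobs.append(aux)


# Hypothetical job names; the real ones come from Settings
jobs = [JobSketch("Build"), JobSketch("Test", requires=["Build"])]
add_leading_job(jobs, JobSketch("Docker Builds"))
add_leading_job(jobs, JobSketch("Config Workflow"))
add_final_job(jobs, JobSketch("Finish Workflow"))
order = [job.name for job in jobs]
```

Because the docker job is inserted before the config job, the config job ends up first and the docker job itself depends on it, while the finishing job depends on every other job in the workflow.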

378
ci/praktika/native_jobs.py Normal file

@ -0,0 +1,378 @@
import sys
from typing import Dict
from praktika import Job, Workflow
from praktika._environment import _Environment
from praktika.cidb import CIDB
from praktika.digest import Digest
from praktika.docker import Docker
from praktika.gh import GH
from praktika.hook_cache import CacheRunnerHooks
from praktika.hook_html import HtmlRunnerHooks
from praktika.mangle import _get_workflows
from praktika.result import Result, ResultInfo
from praktika.runtime import RunConfig
from praktika.s3 import S3
from praktika.settings import Settings
from praktika.utils import Shell, Utils
assert Settings.CI_CONFIG_RUNS_ON
_workflow_config_job = Job.Config(
name=Settings.CI_CONFIG_JOB_NAME,
runs_on=Settings.CI_CONFIG_RUNS_ON,
job_requirements=(
Job.Requirements(
python=Settings.INSTALL_PYTHON_FOR_NATIVE_JOBS,
python_requirements_txt=Settings.INSTALL_PYTHON_REQS_FOR_NATIVE_JOBS,
)
if Settings.INSTALL_PYTHON_REQS_FOR_NATIVE_JOBS
else None
),
command=f"{Settings.PYTHON_INTERPRETER} -m praktika.native_jobs '{Settings.CI_CONFIG_JOB_NAME}'",
)
_docker_build_job = Job.Config(
name=Settings.DOCKER_BUILD_JOB_NAME,
runs_on=Settings.DOCKER_BUILD_RUNS_ON,
job_requirements=Job.Requirements(
python=Settings.INSTALL_PYTHON_FOR_NATIVE_JOBS,
python_requirements_txt="",
),
timeout=4 * 3600,
command=f"{Settings.PYTHON_INTERPRETER} -m praktika.native_jobs '{Settings.DOCKER_BUILD_JOB_NAME}'",
)
_final_job = Job.Config(
name=Settings.FINISH_WORKFLOW_JOB_NAME,
runs_on=Settings.CI_CONFIG_RUNS_ON,
job_requirements=Job.Requirements(
python=Settings.INSTALL_PYTHON_FOR_NATIVE_JOBS,
python_requirements_txt="",
),
command=f"{Settings.PYTHON_INTERPRETER} -m praktika.native_jobs '{Settings.FINISH_WORKFLOW_JOB_NAME}'",
run_unless_cancelled=True,
)
def _build_dockers(workflow, job_name):
print(f"Start [{job_name}], workflow [{workflow.name}]")
dockers = workflow.dockers
ready = []
results = []
job_status = Result.Status.SUCCESS
job_info = ""
dockers = Docker.sort_in_build_order(dockers)
docker_digests = {} # type: Dict[str, str]
for docker in dockers:
docker_digests[docker.name] = Digest().calc_docker_digest(docker, dockers)
if not Shell.check(
"docker buildx inspect --bootstrap | grep -q docker-container", verbose=True
):
print("Install docker container driver")
if not Shell.check(
"docker buildx create --use --name mybuilder --driver docker-container",
verbose=True,
):
job_status = Result.Status.FAILED
job_info = "Failed to install docker buildx driver"
if job_status == Result.Status.SUCCESS:
if not Docker.login(
Settings.DOCKERHUB_USERNAME,
user_password=workflow.get_secret(Settings.DOCKERHUB_SECRET).get_value(),
):
job_status = Result.Status.FAILED
job_info = "Failed to login to dockerhub"
if job_status == Result.Status.SUCCESS:
for docker in dockers:
assert (
docker.name not in ready
), f"All docker names must be unique [{dockers}]"
stopwatch = Utils.Stopwatch()
info = f"{docker.name}:{docker_digests[docker.name]}"
log_file = f"{Settings.OUTPUT_DIR}/docker_{Utils.normalize_string(docker.name)}.log"
files = []
code, out, err = Shell.get_res_stdout_stderr(
f"docker manifest inspect {docker.name}:{docker_digests[docker.name]}"
)
print(
f"Docker inspect results for {docker.name}:{docker_digests[docker.name]}: exit code [{code}], out [{out}], err [{err}]"
)
if "no such manifest" in err:
ret_code = Docker.build(
docker, log_file=log_file, digests=docker_digests, add_latest=False
)
if ret_code == 0:
status = Result.Status.SUCCESS
else:
status = Result.Status.FAILED
job_status = Result.Status.FAILED
info += f", failed with exit code: {ret_code}, see log"
files.append(log_file)
else:
print(
f"Docker image [{docker.name}:{docker_digests[docker.name]}] exists - skip build"
)
status = Result.Status.SKIPPED
ready.append(docker.name)
results.append(
Result(
name=docker.name,
status=status,
info=info,
duration=stopwatch.duration,
start_time=stopwatch.start_time,
files=files,
)
)
Result.from_fs(job_name).set_status(job_status).set_results(results).set_info(
job_info
)
if job_status != Result.Status.SUCCESS:
sys.exit(1)
def _config_workflow(workflow: Workflow.Config, job_name):
def _check_yaml_up_to_date():
print("Check workflows are up to date")
stop_watch = Utils.Stopwatch()
exit_code, output, err = Shell.get_res_stdout_stderr(
f"git diff-index HEAD -- {Settings.WORKFLOW_PATH_PREFIX}"
)
info = ""
status = Result.Status.SUCCESS
if exit_code != 0:
info = f"workspace has uncommitted files unexpectedly [{output}]"
status = Result.Status.ERROR
print("ERROR: ", info)
else:
Shell.check(f"{Settings.PYTHON_INTERPRETER} -m praktika --generate")
exit_code, output, err = Shell.get_res_stdout_stderr(
f"git diff-index HEAD -- {Settings.WORKFLOW_PATH_PREFIX}"
)
if exit_code != 0:
info = f"workspace has outdated workflows [{output}] - regenerate with [python -m praktika --generate]"
status = Result.Status.ERROR
print("ERROR: ", info)
return (
Result(
name="Check Workflows updated",
status=status,
start_time=stop_watch.start_time,
duration=stop_watch.duration,
info=info,
),
info,
)
def _check_secrets(secrets):
print("Check Secrets")
stop_watch = Utils.Stopwatch()
infos = []
for secret_config in secrets:
value = secret_config.get_value()
if not value:
info = f"ERROR: Failed to read secret [{secret_config.name}]"
infos.append(info)
print(info)
info = "\n".join(infos)
return (
Result(
name="Check Secrets",
status=(Result.Status.FAILED if infos else Result.Status.SUCCESS),
start_time=stop_watch.start_time,
duration=stop_watch.duration,
info=info,
),
info,
)
def _check_db(workflow):
stop_watch = Utils.Stopwatch()
res, info = CIDB(
workflow.get_secret(Settings.SECRET_CI_DB_URL).get_value(),
workflow.get_secret(Settings.SECRET_CI_DB_PASSWORD).get_value(),
).check()
return (
Result(
name="Check CI DB",
status=(Result.Status.FAILED if not res else Result.Status.SUCCESS),
start_time=stop_watch.start_time,
duration=stop_watch.duration,
info=info,
),
info,
)
print(f"Start [{job_name}], workflow [{workflow.name}]")
results = []
files = []
info_lines = []
job_status = Result.Status.SUCCESS
workflow_config = RunConfig(
name=workflow.name,
digest_jobs={},
digest_dockers={},
sha=_Environment.get().SHA,
cache_success=[],
cache_success_base64=[],
cache_artifacts={},
).dump()
# checks:
result_, info = _check_yaml_up_to_date()
if result_.status != Result.Status.SUCCESS:
print("ERROR: yaml files are outdated - regenerate, commit and push")
job_status = Result.Status.ERROR
info_lines.append(job_name + ": " + info)
results.append(result_)
if workflow.secrets:
result_, info = _check_secrets(workflow.secrets)
if result_.status != Result.Status.SUCCESS:
print(f"ERROR: Invalid secrets in workflow [{workflow.name}]")
job_status = Result.Status.ERROR
info_lines.append(job_name + ": " + info)
results.append(result_)
if workflow.enable_cidb:
result_, info = _check_db(workflow)
if result_.status != Result.Status.SUCCESS:
job_status = Result.Status.ERROR
info_lines.append(job_name + ": " + info)
results.append(result_)
# config:
if workflow.dockers:
print("Calculate docker's digests")
dockers = workflow.dockers
dockers = Docker.sort_in_build_order(dockers)
for docker in dockers:
workflow_config.digest_dockers[docker.name] = Digest().calc_docker_digest(
docker, dockers
)
workflow_config.dump()
if workflow.enable_cache:
print("Cache Lookup")
stop_watch = Utils.Stopwatch()
workflow_config = CacheRunnerHooks.configure(workflow)
results.append(
Result(
name="Cache Lookup",
status=Result.Status.SUCCESS,
start_time=stop_watch.start_time,
duration=stop_watch.duration,
)
)
files.append(RunConfig.file_name_static(workflow.name))
workflow_config.dump()
if workflow.enable_report:
print("Init report")
stop_watch = Utils.Stopwatch()
HtmlRunnerHooks.configure(workflow)
results.append(
Result(
name="Init Report",
status=Result.Status.SUCCESS,
start_time=stop_watch.start_time,
duration=stop_watch.duration,
)
)
files.append(Result.file_name_static(workflow.name))
Result.from_fs(job_name).set_status(job_status).set_results(results).set_files(
files
).set_info("\n".join(info_lines))
if job_status != Result.Status.SUCCESS:
sys.exit(1)
def _finish_workflow(workflow, job_name):
print(f"Start [{job_name}], workflow [{workflow.name}]")
env = _Environment.get()
print("Check Actions statuses")
print(env.get_needs_statuses())
print("Check Workflow results")
S3.copy_result_from_s3(
Result.file_name_static(workflow.name),
lock=False,
)
workflow_result = Result.from_fs(workflow.name)
ready_for_merge_status = Result.Status.SUCCESS
ready_for_merge_description = ""
failed_results = []
update_final_report = False
for result in workflow_result.results:
if result.name == job_name or result.status in (
Result.Status.SUCCESS,
Result.Status.SKIPPED,
):
continue
if not result.is_completed():
print(
f"ERROR: job [{result.name}] has not finished in the workflow - set status to error"
)
result.status = Result.Status.ERROR
# dump workflow result after update - to have an updated result in post
workflow_result.dump()
# add error into env - should appear in the report
env.add_info(ResultInfo.NOT_FINALIZED + f" [{result.name}]")
update_final_report = True
job = workflow.get_job(result.name)
if not job or not job.allow_merge_on_failure:
print(
f"NOTE: Result for [{result.name}] has non-OK status [{result.status}]"
)
ready_for_merge_status = Result.Status.FAILED
failed_results.append(result.name.split("(", maxsplit=1)[0]) # cut name
if failed_results:
ready_for_merge_description = f"failed: {', '.join(failed_results)}"
if not GH.post_commit_status(
name=Settings.READY_FOR_MERGE_STATUS_NAME + f" [{workflow.name}]",
status=ready_for_merge_status,
description=ready_for_merge_description,
url="",
):
print(f"ERROR: failed to set status [{Settings.READY_FOR_MERGE_STATUS_NAME}]")
env.add_info(ResultInfo.GH_STATUS_ERROR)
if update_final_report:
S3.copy_result_to_s3(
workflow_result,
unlock=False,
) # no lock - no unlock
Result.from_fs(job_name).set_status(Result.Status.SUCCESS).set_info(
ready_for_merge_description
)
if __name__ == "__main__":
job_name = sys.argv[1]
assert job_name, "Job name must be provided as input argument"
workflow = _get_workflows(name=_Environment.get().WORKFLOW_NAME)[0]
if job_name == Settings.DOCKER_BUILD_JOB_NAME:
_build_dockers(workflow, job_name)
elif job_name == Settings.CI_CONFIG_JOB_NAME:
_config_workflow(workflow, job_name)
elif job_name == Settings.FINISH_WORKFLOW_JOB_NAME:
_finish_workflow(workflow, job_name)
else:
assert False, f"BUG, job name [{job_name}]"
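
The merge-readiness decision made in `_finish_workflow` can be illustrated without S3 or GitHub access. This is a simplified sketch under stated assumptions: `merge_ready_status` is a hypothetical helper, results are reduced to `(name, status)` pairs, and the status strings other than `"skipped"` are assumed rather than taken from `Result.Status`.

```python
from typing import Dict, List, Tuple

# Assumed status strings for this sketch; only "skipped" is confirmed by result.py
SUCCESS, SKIPPED, FAILED = "success", "skipped", "failure"


def merge_ready_status(
    results: List[Tuple[str, str]], allow_merge_on_failure: Dict[str, bool]
) -> Tuple[str, str]:
    """Return (status, description) for the merge-ready commit status."""
    failed = []
    for name, status in results:
        if status in (SUCCESS, SKIPPED):
            continue
        # Unknown jobs and jobs without allow_merge_on_failure block the merge
        if not allow_merge_on_failure.get(name, False):
            # Cut the parametrized suffix; strip() is an addition in this sketch
            failed.append(name.split("(", maxsplit=1)[0].strip())
    if failed:
        return FAILED, f"failed: {', '.join(failed)}"
    return SUCCESS, ""


demo = merge_ready_status(
    [
        ("Build", "success"),
        ("Tests (debug)", "failure"),
        ("Flaky (x)", "failure"),
        ("Docs", "skipped"),
    ],
    {"Flaky (x)": True},
)
```

A job flagged with `allow_merge_on_failure` is left out of the description, so it does not turn the overall status into a failure.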

258
ci/praktika/parser.py Normal file

@ -0,0 +1,258 @@
import dataclasses
from typing import Any, Dict, List
from praktika import Artifact, Workflow
from praktika.mangle import _get_workflows
class AddonType:
PY = "py"
@dataclasses.dataclass
class WorkflowYaml:
@dataclasses.dataclass
class JobYaml:
name: str
needs: List[str]
runs_on: List[str]
artifacts_gh_requires: List["WorkflowYaml.ArtifactYaml"]
artifacts_gh_provides: List["WorkflowYaml.ArtifactYaml"]
addons: List["WorkflowYaml.JobAddonYaml"]
gh_app_auth: bool
run_unless_cancelled: bool
parameter: Any
def __repr__(self):
return self.name
@dataclasses.dataclass
class ArtifactYaml:
name: str
provided_by: str
required_by: List[str]
path: str
type: str
def __repr__(self):
return self.name
@dataclasses.dataclass
class JobAddonYaml:
install_python: bool
requirements_txt_path: str
name: str
event: str
branches: List[str]
jobs: List[JobYaml]
job_to_config: Dict[str, JobYaml]
artifact_to_config: Dict[str, ArtifactYaml]
secret_names_gh: List[str]
enable_cache: bool
class WorkflowConfigParser:
def __init__(self, config: Workflow.Config):
self.workflow_name = config.name
self.config = config
self.requires_all = [] # type: List[str]
self.provides_all = [] # type: List[str]
self.job_names_all = [] # type: List[str]
self.artifact_to_providing_job_map = {} # type: Dict[str, List[str]]
self.artifact_to_job_requires_map = {} # type: Dict[str, List[str]]
self.artifact_map = {} # type: Dict[str, List[Artifact.Config]]
self.job_to_provides_artifacts = {} # type: Dict[str, List[Artifact.Config]]
self.job_to_requires_artifacts = {} # type: Dict[str, List[Artifact.Config]]
self.workflow_yaml_config = WorkflowYaml(
name=self.workflow_name,
event=config.event,
branches=[],
jobs=[],
secret_names_gh=[],
job_to_config={},
artifact_to_config={},
enable_cache=False,
)
def parse(self):
self.workflow_yaml_config.enable_cache = self.config.enable_cache
# populate WorkflowYaml.branches
if self.config.event in (Workflow.Event.PUSH,):
assert (
self.config.branches
), f'Workflow.Config.branches (e.g. ["main"]) must be set for workflow with event [{self.config.event}], workflow [{self.workflow_name}]'
assert (
not self.config.base_branches
), f'Workflow.Config.base_branches (e.g. ["main"]) must not be set for workflow with event [{self.config.event}], workflow [{self.workflow_name}]'
assert isinstance(
self.config.branches, list
), f'Workflow.Config.branches must be of type list (e.g. ["main"]), workflow [{self.workflow_name}]'
self.workflow_yaml_config.branches = self.config.branches
elif self.config.event in (Workflow.Event.PULL_REQUEST,):
assert (
self.config.base_branches
), f'Workflow.Config.base_branches (e.g. ["main"]) must be set for workflow with event [{self.config.event}], workflow [{self.workflow_name}]'
assert (
not self.config.branches
), f'Workflow.Config.branches (e.g. ["main"]) must not be set for workflow with event [{self.config.event}], workflow [{self.workflow_name}]'
assert isinstance(
self.config.base_branches, list
), f'Workflow.Config.base_branches must be of type list (e.g. ["main"]), workflow [{self.workflow_name}]'
self.workflow_yaml_config.branches = self.config.base_branches
# populate WorkflowYaml.artifact_to_config with phony artifacts
for job in self.config.jobs:
assert (
job.name not in self.workflow_yaml_config.artifact_to_config
), f"Job name [{job.name}] is not unique, workflow [{self.workflow_name}]"
self.workflow_yaml_config.artifact_to_config[job.name] = (
WorkflowYaml.ArtifactYaml(
name=job.name,
provided_by=job.name,
required_by=[],
path="",
type=Artifact.Type.PHONY,
)
)
# populate jobs
for job in self.config.jobs:
job_yaml_config = WorkflowYaml.JobYaml(
name=job.name,
addons=[],
artifacts_gh_requires=[],
artifacts_gh_provides=[],
needs=[],
runs_on=[],
gh_app_auth=False,
run_unless_cancelled=job.run_unless_cancelled,
parameter=None,
)
self.workflow_yaml_config.jobs.append(job_yaml_config)
assert (
job.name not in self.workflow_yaml_config.job_to_config
), f"Job name [{job.name}] is not unique, workflow [{self.workflow_name}]"
self.workflow_yaml_config.job_to_config[job.name] = job_yaml_config
# populate WorkflowYaml.artifact_to_config
if self.config.artifacts:
for artifact in self.config.artifacts:
assert (
artifact.name not in self.workflow_yaml_config.artifact_to_config
), f"Artifact name [{artifact.name}] is not unique, workflow [{self.workflow_name}]"
artifact_yaml_config = WorkflowYaml.ArtifactYaml(
name=artifact.name,
provided_by="",
required_by=[],
path=artifact.path,
type=artifact.type,
)
self.workflow_yaml_config.artifact_to_config[artifact.name] = (
artifact_yaml_config
)
# populate ArtifactYaml.provided_by
for job in self.config.jobs:
if job.provides:
for artifact_name in job.provides:
assert (
artifact_name in self.workflow_yaml_config.artifact_to_config
), f"Artifact [{artifact_name}] has no config, job [{job.name}], workflow [{self.workflow_name}]"
assert not self.workflow_yaml_config.artifact_to_config[
artifact_name
].provided_by, f"Artifact [{artifact_name}] provided by multiple jobs [{self.workflow_yaml_config.artifact_to_config[artifact_name].provided_by}] and [{job.name}]"
self.workflow_yaml_config.artifact_to_config[
artifact_name
].provided_by = job.name
# populate ArtifactYaml.required_by
for job in self.config.jobs:
if job.requires:
for artifact_name in job.requires:
assert (
artifact_name in self.workflow_yaml_config.artifact_to_config
), f"Artifact [{artifact_name}] has no config, job [{job.name}], workflow [{self.workflow_name}]"
assert self.workflow_yaml_config.artifact_to_config[
artifact_name
].provided_by, f"Artifact [{artifact_name}] has no job providing it, required by job [{job.name}], workflow [{self.workflow_name}]"
self.workflow_yaml_config.artifact_to_config[
artifact_name
].required_by.append(job.name)
# populate JobYaml.addons
for job in self.config.jobs:
if job.job_requirements:
addon_yaml = WorkflowYaml.JobAddonYaml(
requirements_txt_path=job.job_requirements.python_requirements_txt,
install_python=job.job_requirements.python,
)
self.workflow_yaml_config.job_to_config[job.name].addons.append(
addon_yaml
)
if self.config.enable_report:
for job in self.config.jobs:
# auth required for every job with enabled HTML, so that workflow summary status can be updated
self.workflow_yaml_config.job_to_config[job.name].gh_app_auth = True
# populate JobYaml.runs_on
for job in self.config.jobs:
self.workflow_yaml_config.job_to_config[job.name].runs_on = job.runs_on
# populate JobYaml.artifacts_gh_requires, JobYaml.artifacts_gh_provides and JobYaml.needs
for (
artifact_name,
artifact,
) in self.workflow_yaml_config.artifact_to_config.items():
# assert (
# artifact.provided_by
# and artifact.provided_by in self.workflow_yaml_config.job_to_config
# ), f"Artifact [{artifact_name}] has no valid job providing it [{artifact.provided_by}]"
for job_name in artifact.required_by:
if (
artifact.provided_by
not in self.workflow_yaml_config.job_to_config[job_name].needs
):
self.workflow_yaml_config.job_to_config[job_name].needs.append(
artifact.provided_by
)
if artifact.type in (Artifact.Type.GH,):
self.workflow_yaml_config.job_to_config[
job_name
].artifacts_gh_requires.append(artifact)
elif artifact.type in (Artifact.Type.PHONY, Artifact.Type.S3):
pass
else:
assert (
False
), f"Artifact [{artifact_name}] has unsupported type [{artifact.type}]"
if not artifact.required_by and artifact.type != Artifact.Type.PHONY:
print(
f"WARNING: Artifact [{artifact_name}] provided by job [{artifact.provided_by}] not required by any job in workflow [{self.workflow_name}]"
)
if artifact.type == Artifact.Type.GH:
self.workflow_yaml_config.job_to_config[
artifact.provided_by
].artifacts_gh_provides.append(artifact)
# populate JobYaml.parameter
for job in self.config.jobs:
self.workflow_yaml_config.job_to_config[job.name].parameter = job.parameter
# populate secrets
for secret_config in self.config.secrets:
if secret_config.is_gh():
self.workflow_yaml_config.secret_names_gh.append(secret_config.name)
return self
if __name__ == "__main__":
# test
workflows = _get_workflows()
for workflow in workflows:
WorkflowConfigParser(workflow).parse()
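The `needs` population in the parser can be pictured with plain data structures. The following is a minimal sketch, not the parser itself: `ArtifactSpec` and `derive_needs` are hypothetical stand-ins, and the job names are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ArtifactSpec:
    # Hypothetical stand-in for an artifact entry, not praktika's real class
    name: str
    provided_by: str
    required_by: List[str] = field(default_factory=list)


def derive_needs(artifacts: List[ArtifactSpec]) -> Dict[str, List[str]]:
    """Map each consuming job to the jobs it must wait for, without duplicates."""
    needs: Dict[str, List[str]] = {}
    for artifact in artifacts:
        for job_name in artifact.required_by:
            job_needs = needs.setdefault(job_name, [])
            if artifact.provided_by not in job_needs:
                job_needs.append(artifact.provided_by)
    return needs


artifacts = [
    ArtifactSpec("binary", provided_by="Build", required_by=["Test", "Package"]),
    ArtifactSpec("symbols", provided_by="Build", required_by=["Test"]),
    ArtifactSpec("report", provided_by="Test", required_by=["Package"]),
]
needs = derive_needs(artifacts)
print(needs)
```

A job that consumes two artifacts from the same provider still lists that provider only once, which mirrors the `not in ... .needs` check in the parser.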

354
ci/praktika/result.py Normal file

@ -0,0 +1,354 @@
import dataclasses
import datetime
import sys
from collections.abc import Container
from pathlib import Path
from typing import Any, Dict, List, Optional
from praktika._environment import _Environment
from praktika._settings import _Settings
from praktika.utils import ContextManager, MetaClasses, Shell, Utils
@dataclasses.dataclass
class Result(MetaClasses.Serializable):
"""
Represents the outcome of a workflow/job/task or any operation, along with associated metadata.
This class supports nesting of results to represent tasks with sub-tasks, and includes
various attributes to track status, timing, files, and links.
Attributes:
name (str): The name of the task.
status (str): The current status of the task. Should be one of the values defined in the Status class.
start_time (Optional[float]): The start time of the task in Unix timestamp format. None if not started.
duration (Optional[float]): The duration of the task in seconds. None if not completed.
results (List[Result]): A list of sub-results representing nested tasks.
files (List[str]): A list of file paths or names related to the result.
links (List[str]): A list of URLs related to the result (e.g., links to reports or resources).
info (str): Additional information about the result. Free-form text.
# TODO: rename
aux_links (List[str]): A list of auxiliary links that provide additional context for the result.
# TODO: remove
html_link (str): A direct link to an HTML representation of the result (e.g., a detailed report page).
Inner Class:
Status: Defines possible statuses for the task, such as "success", "failure", etc.
"""
class Status:
SKIPPED = "skipped"
SUCCESS = "success"
FAILED = "failure"
PENDING = "pending"
RUNNING = "running"
ERROR = "error"
name: str
status: str
start_time: Optional[float] = None
duration: Optional[float] = None
results: List["Result"] = dataclasses.field(default_factory=list)
files: List[str] = dataclasses.field(default_factory=list)
links: List[str] = dataclasses.field(default_factory=list)
info: str = ""
aux_links: List[str] = dataclasses.field(default_factory=list)
html_link: str = ""
@staticmethod
def create_from(
name="",
results: List["Result"] = None,
stopwatch: Utils.Stopwatch = None,
status="",
files=None,
info="",
with_info_from_results=True,
):
if isinstance(status, bool):
status = Result.Status.SUCCESS if status else Result.Status.FAILED
if not results and not status:
print("ERROR: Either .results or .status must be provided")
raise RuntimeError("Either .results or .status must be provided")
if not name:
name = _Environment.get().JOB_NAME
if not name:
print("ERROR: Failed to guess the .name")
raise RuntimeError("Failed to guess the .name")
result_status = status or Result.Status.SUCCESS
infos = []
if info:
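# str is itself a Container; extending a list with it would add single characters
if isinstance(info, Container) and not isinstance(info, str):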
infos += info
else:
infos.append(info)
if results and not status:
for result in results:
if result.status not in (Result.Status.SUCCESS, Result.Status.FAILED):
Utils.raise_with_error(
f"Unexpected result status [{result.status}] for Result.create_from call"
)
if result.status != Result.Status.SUCCESS:
result_status = Result.Status.FAILED
if results:
for result in results:
if result.info and with_info_from_results:
infos.append(f"{result.name}: {result.info}")
return Result(
name=name,
status=result_status,
start_time=stopwatch.start_time if stopwatch else None,
duration=stopwatch.duration if stopwatch else None,
info="\n".join(infos) if infos else "",
results=results or [],
files=files or [],
)
@staticmethod
def get():
return Result.from_fs(_Environment.get().JOB_NAME)
def is_completed(self):
return self.status not in (Result.Status.PENDING, Result.Status.RUNNING)
def is_running(self):
return self.status in (Result.Status.RUNNING,)
def is_ok(self):
return self.status in (Result.Status.SKIPPED, Result.Status.SUCCESS)
def set_status(self, status) -> "Result":
self.status = status
self.dump()
return self
def set_success(self) -> "Result":
return self.set_status(Result.Status.SUCCESS)
def set_results(self, results: List["Result"]) -> "Result":
self.results = results
self.dump()
return self
def set_files(self, files) -> "Result":
for file in files:
assert Path(
file
).is_file(), f"Not valid file [{file}] from file list [{files}]"
if not self.files:
self.files = []
self.files += files
self.dump()
return self
def set_info(self, info: str) -> "Result":
if self.info:
self.info += "\n"
self.info += info
self.dump()
return self
def set_link(self, link) -> "Result":
self.links.append(link)
self.dump()
return self
@classmethod
def file_name_static(cls, name):
return f"{_Settings.TEMP_DIR}/result_{Utils.normalize_string(name)}.json"
@classmethod
def from_dict(cls, obj: Dict[str, Any]) -> "Result":
sub_results = []
for result_dict in obj["results"] or []:
sub_res = cls.from_dict(result_dict)
sub_results.append(sub_res)
obj["results"] = sub_results
return Result(**obj)
def update_duration(self):
if not self.duration and self.start_time:
self.duration = datetime.datetime.utcnow().timestamp() - self.start_time
else:
if self.duration:
print(
f"NOTE: duration is set for job [{self.name}] Result - do not update by CI"
)
else:
print(
f"NOTE: start_time is not set for job [{self.name}] Result - do not update duration"
)
return self
def update_sub_result(self, result: "Result"):
assert self.results, "BUG?"
for i, result_ in enumerate(self.results):
if result_.name == result.name:
self.results[i] = result
self._update_status()
return self
def _update_status(self):
was_pending = False
was_running = False
if self.status == self.Status.PENDING:
was_pending = True
if self.status == self.Status.RUNNING:
was_running = True
has_pending, has_running, has_failed = False, False, False
for result_ in self.results:
if result_.status in (self.Status.RUNNING,):
has_running = True
if result_.status in (self.Status.PENDING,):
has_pending = True
if result_.status in (self.Status.ERROR, self.Status.FAILED):
has_failed = True
if has_running:
self.status = self.Status.RUNNING
elif has_pending:
self.status = self.Status.PENDING
elif has_failed:
self.status = self.Status.FAILED
else:
self.status = self.Status.SUCCESS
if (was_pending or was_running) and self.status not in (
self.Status.PENDING,
self.Status.RUNNING,
):
print("Pipeline finished")
self.update_duration()
@classmethod
def generate_pending(cls, name, results=None):
return Result(
name=name,
status=Result.Status.PENDING,
start_time=None,
duration=None,
results=results or [],
files=[],
links=[],
info="",
)
@classmethod
def generate_skipped(cls, name, results=None):
return Result(
name=name,
status=Result.Status.SKIPPED,
start_time=None,
duration=None,
results=results or [],
files=[],
links=[],
info="from cache",
)
@classmethod
def create_from_command_execution(
cls,
name,
command,
with_log=False,
fail_fast=True,
workdir=None,
command_args=None,
command_kwargs=None,
):
"""
Executes shell commands or Python callables, optionally logging output, and handles errors.
:param name: Check name
:param command: Shell command (str) or Python callable, or list of them.
:param workdir: Optional working directory.
:param with_log: Boolean flag to log output to a file.
:param fail_fast: Boolean flag to stop execution if one command fails.
:param command_args: Positional arguments for the callable command.
:param command_kwargs: Keyword arguments for the callable command.
:return: Result object with status and optional log file.
"""
# Stopwatch to track execution time
stop_watch_ = Utils.Stopwatch()
command_args = command_args or []
command_kwargs = command_kwargs or {}
# Set log file path if logging is enabled
log_file = (
f"{_Settings.TEMP_DIR}/{Utils.normalize_string(name)}.log"
if with_log
else None
)
# Ensure the command is a list for consistent iteration
if not isinstance(command, list):
fail_fast = False
command = [command]
print(f"> Starting execution for [{name}]")
res = True # Track success/failure status
error_infos = []
for command_ in command:
if callable(command_):
# If command is a Python function, call it with provided arguments
result = command_(*command_args, **command_kwargs)
if isinstance(result, bool):
res = result
elif result:
error_infos.append(str(result))
res = False
else:
# Run shell command in a specified directory with logging and verbosity
with ContextManager.cd(workdir):
exit_code = Shell.run(command_, verbose=True, log_file=log_file)
res = exit_code == 0
# If fail_fast is enabled, stop on first failure
if not res and fail_fast:
print(f"Execution stopped due to failure in [{command_}]")
break
# Create and return the result object with status and log file (if any)
return Result.create_from(
name=name,
status=res,
stopwatch=stop_watch_,
info=error_infos,
files=[log_file] if log_file else None,
)
def finish_job_accordingly(self):
self.dump()
if not self.is_ok():
print("ERROR: Job Failed")
for result in self.results:
if not result.is_ok():
print("Failed checks:")
print(" | ", result)
sys.exit(1)
else:
print("ok")
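The fail-fast loop in `create_from_command_execution` can be reduced to a small sketch for callables only. This is an illustration under stated assumptions: `run_steps` is a hypothetical helper, and unlike the method above it combines step outcomes with `and`, so a later success never hides an earlier failure.

```python
from typing import Callable, List, Tuple


def run_steps(
    steps: List[Callable[[], object]], fail_fast: bool = True
) -> Tuple[bool, List[str]]:
    """Run callables in order; a step fails by returning False or an error message."""
    ok = True
    errors: List[str] = []
    for step in steps:
        outcome = step()
        if isinstance(outcome, bool):
            step_ok = outcome
        elif outcome:
            # Any other truthy return value is treated as an error description
            errors.append(str(outcome))
            step_ok = False
        else:
            step_ok = True
        ok = ok and step_ok
        if not step_ok and fail_fast:
            break
    return ok, errors


calls: List[str] = []


def good() -> bool:
    calls.append("good")
    return True


def bad() -> str:
    calls.append("bad")
    return "disk full"


fast_result = run_steps([bad, good])
fast_calls = list(calls)
calls.clear()
full_result = run_steps([bad, good], fail_fast=False)
full_calls = list(calls)
```

With `fail_fast=True` the second step never runs; with `fail_fast=False` both run and the overall outcome is still a failure.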
class ResultInfo:
SETUP_ENV_JOB_FAILED = (
"Failed to set up job env, it's a praktika bug or misconfiguration"
)
PRE_JOB_FAILED = (
"Failed to do a job pre-run step, it's a praktika bug or misconfiguration"
)
KILLED = "Job killed or terminated, no Result provided"
NOT_FOUND_IMPOSSIBLE = (
"No Result file (bug, or job misbehaviour, must not ever happen)"
)
SKIPPED_DUE_TO_PREVIOUS_FAILURE = "Skipped due to previous failure"
TIMEOUT = "Timeout"
GH_STATUS_ERROR = "Failed to set GH commit status"
NOT_FINALIZED = (
"Job did not provide Result: job script bug, dead CI runner, or praktika bug"
)
S3_ERROR = "S3 call failure"
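The roll-up rule in `Result._update_status` gives running sub-results priority over pending ones, and pending ones priority over failures. A minimal sketch of that precedence, using plain strings instead of the `Result` class (the function name `aggregate_status` is an assumption for illustration):

```python
from typing import List

SUCCESS, FAILED, ERROR = "success", "failure", "error"
PENDING, RUNNING = "pending", "running"


def aggregate_status(sub_statuses: List[str]) -> str:
    """Roll sub-result statuses up into one parent status."""
    if RUNNING in sub_statuses:
        return RUNNING
    if PENDING in sub_statuses:
        return PENDING
    if FAILED in sub_statuses or ERROR in sub_statuses:
        # Both "failure" and "error" in a sub-result mark the parent as failed
        return FAILED
    return SUCCESS


print(aggregate_status(["success", "running", "failure"]))
print(aggregate_status(["success", "error"]))
```

A parent therefore reports a failure only once nothing below it is still running or pending.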

348
ci/praktika/runner.py Normal file

@ -0,0 +1,348 @@
import os
import re
import sys
import traceback
from pathlib import Path
from praktika._environment import _Environment
from praktika.artifact import Artifact
from praktika.cidb import CIDB
from praktika.digest import Digest
from praktika.hook_cache import CacheRunnerHooks
from praktika.hook_html import HtmlRunnerHooks
from praktika.result import Result, ResultInfo
from praktika.runtime import RunConfig
from praktika.s3 import S3
from praktika.settings import Settings
from praktika.utils import Shell, TeePopen, Utils
class Runner:
@staticmethod
def generate_dummy_environment(workflow, job):
print("WARNING: Generate dummy env for local test")
Shell.check(
f"mkdir -p {Settings.TEMP_DIR} {Settings.INPUT_DIR} {Settings.OUTPUT_DIR}"
)
_Environment(
WORKFLOW_NAME=workflow.name,
JOB_NAME=job.name,
REPOSITORY="",
BRANCH="",
SHA="",
PR_NUMBER=-1,
EVENT_TYPE="",
JOB_OUTPUT_STREAM="",
EVENT_FILE_PATH="",
CHANGE_URL="",
COMMIT_URL="",
BASE_BRANCH="",
RUN_URL="",
RUN_ID="",
INSTANCE_ID="",
INSTANCE_TYPE="",
INSTANCE_LIFE_CYCLE="",
LOCAL_RUN=True,
).dump()
workflow_config = RunConfig(
name=workflow.name,
digest_jobs={},
digest_dockers={},
sha="",
cache_success=[],
cache_success_base64=[],
cache_artifacts={},
)
for docker in workflow.dockers:
workflow_config.digest_dockers[docker.name] = Digest().calc_docker_digest(
docker, workflow.dockers
)
workflow_config.dump()
Result.generate_pending(job.name).dump()
def _setup_env(self, _workflow, job):
# source env file to write data into fs (workflow config json, workflow status json)
Shell.check(f". {Settings.ENV_SETUP_SCRIPT}", verbose=True, strict=True)
# parse the same env script and apply envs from python so that this process sees them
with open(Settings.ENV_SETUP_SCRIPT, "r") as f:
content = f.read()
export_pattern = re.compile(
r"export (\w+)=\$\(cat<<\'EOF\'\n(.*?)EOF\n\)", re.DOTALL
)
matches = export_pattern.findall(content)
for key, value in matches:
value = value.strip()
os.environ[key] = value
print(f"Set environment variable {key}.")
print("Read GH Environment")
env = _Environment.from_env()
env.JOB_NAME = job.name
env.PARAMETER = job.parameter
env.dump()
print(env)
return 0
def _pre_run(self, workflow, job):
env = _Environment.get()
result = Result(
name=job.name,
status=Result.Status.RUNNING,
start_time=Utils.timestamp(),
)
result.dump()
if workflow.enable_report and job.name != Settings.CI_CONFIG_JOB_NAME:
print("Update Job and Workflow Report")
HtmlRunnerHooks.pre_run(workflow, job)
print("Download required artifacts")
required_artifacts = []
if job.requires and workflow.artifacts:
for requires_artifact_name in job.requires:
for artifact in workflow.artifacts:
if (
artifact.name == requires_artifact_name
and artifact.type == Artifact.Type.S3
):
required_artifacts.append(artifact)
print(f"--- Job requires s3 artifacts [{required_artifacts}]")
if workflow.enable_cache:
prefixes = CacheRunnerHooks.pre_run(
_job=job, _workflow=workflow, _required_artifacts=required_artifacts
)
else:
prefixes = [env.get_s3_prefix()] * len(required_artifacts)
for artifact, prefix in zip(required_artifacts, prefixes):
s3_path = f"{Settings.S3_ARTIFACT_PATH}/{prefix}/{Utils.normalize_string(artifact._provided_by)}/{Path(artifact.path).name}"
assert S3.copy_file_from_s3(s3_path=s3_path, local_path=Settings.INPUT_DIR)
return 0
def _run(self, workflow, job, docker="", no_docker=False, param=None):
if param:
if not isinstance(param, str):
Utils.raise_with_error(
f"Custom param for local tests must be of type str, got [{type(param)}]"
)
env = _Environment.get()
env.dump()
if job.run_in_docker and not no_docker:
# TODO: add support for any image, including not from ci config (e.g. ubuntu:latest)
docker_tag = RunConfig.from_fs(workflow.name).digest_dockers[
job.run_in_docker
]
docker = docker or f"{job.run_in_docker}:{docker_tag}"
cmd = f"docker run --rm --user \"$(id -u):$(id -g)\" -e PYTHONPATH='{Settings.DOCKER_WD}:{Settings.DOCKER_WD}/ci' --volume ./:{Settings.DOCKER_WD} --volume {Settings.TEMP_DIR}:{Settings.TEMP_DIR} --workdir={Settings.DOCKER_WD} {docker} {job.command}"
else:
cmd = job.command
if param:
print(f"Custom --param [{param}] will be passed to job's script")
cmd += f" --param {param}"
print(f"--- Run command [{cmd}]")
with TeePopen(cmd, timeout=job.timeout) as process:
exit_code = process.wait()
result = Result.from_fs(job.name)
if exit_code != 0:
if not result.is_completed():
if process.timeout_exceeded:
print(
f"WARNING: Job timed out: [{job.name}], timeout [{job.timeout}], exit code [{exit_code}]"
)
result.set_status(Result.Status.ERROR).set_info(
ResultInfo.TIMEOUT
)
elif result.is_running():
info = f"ERROR: Job terminated with an error, exit code [{exit_code}] - set status to [{Result.Status.ERROR}]"
print(info)
result.set_status(Result.Status.ERROR).set_info(info)
else:
info = f"ERROR: Invalid status [{result.status}] for exit code [{exit_code}] - switch to [{Result.Status.ERROR}]"
print(info)
result.set_status(Result.Status.ERROR).set_info(info)
result.dump()
return exit_code
def _post_run(
self, workflow, job, setup_env_exit_code, prerun_exit_code, run_exit_code
):
info_errors = []
env = _Environment.get()
result_exist = Result.exist(job.name)
if setup_env_exit_code != 0:
info = f"ERROR: {ResultInfo.SETUP_ENV_JOB_FAILED}"
print(info)
# set Result with error and logs
Result(
name=job.name,
status=Result.Status.ERROR,
start_time=Utils.timestamp(),
duration=0.0,
info=info,
).dump()
elif prerun_exit_code != 0:
info = f"ERROR: {ResultInfo.PRE_JOB_FAILED}"
print(info)
# set Result with error and logs
Result(
name=job.name,
status=Result.Status.ERROR,
start_time=Utils.timestamp(),
duration=0.0,
info=info,
).dump()
elif not result_exist:
info = f"ERROR: {ResultInfo.NOT_FOUND_IMPOSSIBLE}"
print(info)
Result(
name=job.name,
start_time=Utils.timestamp(),
duration=None,
status=Result.Status.ERROR,
info=ResultInfo.NOT_FOUND_IMPOSSIBLE,
).dump()
result = Result.from_fs(job.name)
if not result.is_completed():
info = f"ERROR: {ResultInfo.KILLED}"
print(info)
result.set_info(info).set_status(Result.Status.ERROR).dump()
result.set_files(files=[Settings.RUN_LOG])
result.update_duration().dump()
if result.info and result.status != Result.Status.SUCCESS:
# provide job info to workflow level
info_errors.append(result.info)
if run_exit_code == 0:
providing_artifacts = []
if job.provides and workflow.artifacts:
for provides_artifact_name in job.provides:
for artifact in workflow.artifacts:
if (
artifact.name == provides_artifact_name
and artifact.type == Artifact.Type.S3
):
providing_artifacts.append(artifact)
if providing_artifacts:
print(f"Job provides s3 artifacts [{providing_artifacts}]")
for artifact in providing_artifacts:
try:
assert Shell.check(
f"ls -l {artifact.path}", verbose=True
), f"Artifact {artifact.path} not found"
s3_path = f"{Settings.S3_ARTIFACT_PATH}/{env.get_s3_prefix()}/{Utils.normalize_string(env.JOB_NAME)}"
link = S3.copy_file_to_s3(
s3_path=s3_path, local_path=artifact.path
)
result.set_link(link)
except Exception as e:
error = (
f"ERROR: Failed to upload artifact [{artifact}], ex [{e}]"
)
print(error)
info_errors.append(error)
result.set_status(Result.Status.ERROR)
if workflow.enable_cidb:
print("Insert results to CIDB")
try:
CIDB(
url=workflow.get_secret(Settings.SECRET_CI_DB_URL).get_value(),
passwd=workflow.get_secret(
Settings.SECRET_CI_DB_PASSWORD
).get_value(),
).insert(result)
except Exception as ex:
error = f"ERROR: Failed to insert data into CI DB, exception [{ex}]"
print(error)
info_errors.append(error)
result.dump()
# always in the end
if workflow.enable_cache:
print(f"Run CI cache hook")
if result.is_ok():
CacheRunnerHooks.post_run(workflow, job)
if workflow.enable_report:
print(f"Run html report hook")
HtmlRunnerHooks.post_run(workflow, job, info_errors)
return True
def run(
self, workflow, job, docker="", dummy_env=False, no_docker=False, param=None
):
res = True
setup_env_code = -10
prerun_code = -10
run_code = -10
if res and not dummy_env:
print(
f"\n\n=== Setup env script [{job.name}], workflow [{workflow.name}] ==="
)
try:
setup_env_code = self._setup_env(workflow, job)
# Source the bash script and capture the environment variables
res = setup_env_code == 0
if not res:
print(
f"ERROR: Setup env script failed with exit code [{setup_env_code}]"
)
except Exception as e:
print(f"ERROR: Setup env script failed with exception [{e}]")
traceback.print_exc()
res = False  # an exception during setup must not let the pre-run and run steps proceed
print(f"=== Setup env finished ===\n\n")
else:
self.generate_dummy_environment(workflow, job)
if res and not dummy_env:
res = False
print(f"=== Pre run script [{job.name}], workflow [{workflow.name}] ===")
try:
prerun_code = self._pre_run(workflow, job)
res = prerun_code == 0
if not res:
print(f"ERROR: Pre-run failed with exit code [{prerun_code}]")
except Exception as e:
print(f"ERROR: Pre-run script failed with exception [{e}]")
traceback.print_exc()
print(f"=== Pre run finished ===\n\n")
if res:
res = False
print(f"=== Run script [{job.name}], workflow [{workflow.name}] ===")
try:
run_code = self._run(
workflow, job, docker=docker, no_docker=no_docker, param=param
)
res = run_code == 0
if not res:
print(f"ERROR: Run failed with exit code [{run_code}]")
except Exception as e:
print(f"ERROR: Run script failed with exception [{e}]")
traceback.print_exc()
print(f"=== Run script finished ===\n\n")
if not dummy_env:
print(f"=== Post run script [{job.name}], workflow [{workflow.name}] ===")
self._post_run(workflow, job, setup_env_code, prerun_code, run_code)
print(f"=== Post run script finished ===")
if not res:
sys.exit(1)
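The regular expression in `_setup_env` extracts heredoc-style exports from the env setup script. The script itself is generated elsewhere and is not shown here, so the sample text below is an assumed layout chosen to match the pattern; the variable names are made up.

```python
import re

# Same shape as the pattern in _setup_env: export NAME=$(cat<<'EOF' ... EOF )
EXPORT_PATTERN = re.compile(
    r"export (\w+)=\$\(cat<<'EOF'\n(.*?)EOF\n\)", re.DOTALL
)

# Assumed script layout, for illustration only
script = (
    "export JOB_NAME=$(cat<<'EOF'\n"
    "Style Check\n"
    "EOF\n"
    ")\n"
    "export PAYLOAD=$(cat<<'EOF'\n"
    '{"a": 1,\n'
    ' "b": 2}\n'
    "EOF\n"
    ")\n"
)

# re.DOTALL lets the lazy group span several lines of heredoc content
parsed = {key: value.strip() for key, value in EXPORT_PATTERN.findall(script)}
print(parsed)
```

The lazy `(.*?)` stops at the first `EOF` line, so multi-line values survive intact while adjacent exports stay separate.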

35
ci/praktika/runtime.py Normal file

@ -0,0 +1,35 @@
from dataclasses import dataclass
from typing import Dict, List
from praktika.cache import Cache
from praktika.settings import Settings
from praktika.utils import MetaClasses, Utils
@dataclass
class RunConfig(MetaClasses.Serializable):
name: str
digest_jobs: Dict[str, str]
digest_dockers: Dict[str, str]
cache_success: List[str]
# special characters in job names might cause issues if used directly in yaml syntax - create a base64 encoded list to avoid this
cache_success_base64: List[str]
cache_artifacts: Dict[str, Cache.CacheRecord]
sha: str
@classmethod
def from_dict(cls, obj):
cache_artifacts = obj["cache_artifacts"]
cache_artifacts_deserialized = {}
for artifact_name, cache_artifact in cache_artifacts.items():
cache_artifacts_deserialized[artifact_name] = Cache.CacheRecord.from_dict(
cache_artifact
)
obj["cache_artifacts"] = cache_artifacts_deserialized
return RunConfig(**obj)
@classmethod
def file_name_static(cls, name):
return (
f"{Settings.TEMP_DIR}/workflow_config_{Utils.normalize_string(name)}.json"
)
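`RunConfig.from_dict` rebuilds nested records by hand because JSON loading only yields plain dictionaries. A minimal sketch of the same idea, where `Record` and `Config` are hypothetical stand-ins rather than the praktika classes:

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class Record:
    # Hypothetical stand-in for a nested cache record
    sha: str
    job_digest: str


@dataclass
class Config:
    name: str
    records: Dict[str, Record]

    @classmethod
    def from_dict(cls, obj):
        # json.loads returns dicts, so nested dataclasses are rebuilt explicitly
        records = {key: Record(**value) for key, value in obj["records"].items()}
        return cls(name=obj["name"], records=records)


original = Config(name="PR", records={"binary": Record(sha="abc", job_digest="123")})
restored = Config.from_dict(json.loads(json.dumps(asdict(original))))
print(restored)
```

Without the explicit rebuild, `restored.records["binary"]` would be a plain dict and attribute access such as `.sha` would fail.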

295
ci/praktika/s3.py Normal file

@ -0,0 +1,295 @@
import dataclasses
import json
import time
from pathlib import Path
from typing import Dict
from praktika._environment import _Environment
from praktika.settings import Settings
from praktika.utils import Shell, Utils
class S3:
@dataclasses.dataclass
class Object:
AcceptRanges: str
Expiration: str
LastModified: str
ContentLength: int
ETag: str
ContentType: str
ServerSideEncryption: str
Metadata: Dict
def has_tags(self, tags):
meta = self.Metadata
for k, v in tags.items():
if k not in meta or meta[k] != v:
print(f"tag [{k}={v}] does not match meta [{meta}]")
return False
return True
@classmethod
def clean_s3_directory(cls, s3_path):
assert len(s3_path.split("/")) > 2, "check to not delete too much"
cmd = f"aws s3 rm s3://{s3_path} --recursive"
cls.run_command_with_retries(cmd, retries=1)
return
@classmethod
def copy_file_to_s3(cls, s3_path, local_path, text=False):
assert Path(local_path).exists(), f"Path [{local_path}] does not exist"
assert Path(s3_path), f"Invalid S3 Path [{s3_path}]"
assert Path(
local_path
).is_file(), f"Path [{local_path}] is not file. Only files are supported"
file_name = Path(local_path).name
s3_full_path = s3_path
if not s3_full_path.endswith(file_name):
s3_full_path = f"{s3_path}/{Path(local_path).name}"
cmd = f"aws s3 cp {local_path} s3://{s3_full_path}"
if text:
cmd += " --content-type text/plain"
res = cls.run_command_with_retries(cmd)
if not res:
raise RuntimeError(f"Failed to copy [{local_path}] to s3 [{s3_full_path}]")
bucket = s3_path.split("/")[0]
endpoint = Settings.S3_BUCKET_TO_HTTP_ENDPOINT[bucket]
assert endpoint
return f"https://{s3_full_path}".replace(bucket, endpoint)
@classmethod
def put(cls, s3_path, local_path, text=False, metadata=None):
assert Path(local_path).exists(), f"Path [{local_path}] does not exist"
assert Path(s3_path), f"Invalid S3 Path [{s3_path}]"
assert Path(
local_path
).is_file(), f"Path [{local_path}] is not file. Only files are supported"
file_name = Path(local_path).name
s3_full_path = s3_path
if not s3_full_path.endswith(file_name):
s3_full_path = f"{s3_path}/{Path(local_path).name}"
s3_full_path = str(s3_full_path).removeprefix("s3://")
bucket, key = s3_full_path.split("/", maxsplit=1)
command = (
f"aws s3api put-object --bucket {bucket} --key {key} --body {local_path}"
)
if metadata:
# --metadata expects a single comma-separated map (k1=v1,k2=v2)
pairs = ",".join(f"{k}={v}" for k, v in metadata.items())
command += f" --metadata {pairs}"
if text:
command += " --content-type text/plain"
res = cls.run_command_with_retries(command)
assert res
@classmethod
def run_command_with_retries(cls, command, retries=Settings.MAX_RETRIES_S3):
i = 0
res = False
while not res and i < retries:
i += 1
ret_code, stdout, stderr = Shell.get_res_stdout_stderr(
command, verbose=True
)
if "aws sso login" in stderr:
print("ERROR: aws login expired")
break
elif "does not exist" in stderr:
print("ERROR: requested file does not exist")
break
if ret_code != 0:
print(
f"ERROR: aws s3 cp failed, stdout/stderr err: [{stderr}], out [{stdout}]"
)
res = ret_code == 0
return res
@classmethod
def get_link(cls, s3_path, local_path):
s3_full_path = f"{s3_path}/{Path(local_path).name}"
bucket = s3_path.split("/")[0]
endpoint = Settings.S3_BUCKET_TO_HTTP_ENDPOINT[bucket]
return f"https://{s3_full_path}".replace(bucket, endpoint)
@classmethod
def copy_file_from_s3(cls, s3_path, local_path):
assert Path(s3_path), f"Invalid S3 Path [{s3_path}]"
if Path(local_path).is_dir():
local_path = Path(local_path) / Path(s3_path).name
else:
assert Path(
local_path
).parent.is_dir(), f"Parent path for [{local_path}] does not exist"
cmd = f"aws s3 cp s3://{s3_path} {local_path}"
res = cls.run_command_with_retries(cmd)
return res
@classmethod
def head_object(cls, s3_path):
s3_path = str(s3_path).removeprefix("s3://")
bucket, key = s3_path.split("/", maxsplit=1)
output = Shell.get_output(
f"aws s3api head-object --bucket {bucket} --key {key}", verbose=True
)
if not output:
return None
else:
return cls.Object(**json.loads(output))
@classmethod
def delete(cls, s3_path):
assert Path(s3_path), f"Invalid S3 Path [{s3_path}]"
return Shell.check(
f"aws s3 rm s3://{s3_path}",
verbose=True,
)
# TODO: this should probably be moved into a separate file that is used only inside praktika,
# keeping this module free of Settings, Environment, etc. imports so that it is easy to use externally
@classmethod
def copy_result_to_s3(cls, result, unlock=True):
result.dump()
env = _Environment.get()
s3_path = f"{Settings.HTML_S3_PATH}/{env.get_s3_prefix()}"
s3_path_full = f"{s3_path}/{Path(result.file_name()).name}"
url = S3.copy_file_to_s3(s3_path=s3_path, local_path=result.file_name())
if env.PR_NUMBER:
print("Duplicate Result for latest commit alias in PR")
s3_path = f"{Settings.HTML_S3_PATH}/{env.get_s3_prefix(latest=True)}"
url = S3.copy_file_to_s3(s3_path=s3_path, local_path=result.file_name())
if unlock:
if not cls.unlock(s3_path_full):
print(f"ERROR: File [{s3_path_full}] unlock failure")
assert False # TODO: investigate
return url
@classmethod
def copy_result_from_s3(cls, local_path, lock=True):
env = _Environment.get()
file_name = Path(local_path).name
s3_path = f"{Settings.HTML_S3_PATH}/{env.get_s3_prefix()}/{file_name}"
if lock:
cls.lock(s3_path)
if not S3.copy_file_from_s3(s3_path=s3_path, local_path=local_path):
print(f"ERROR: failed to cp file [{s3_path}] from s3")
raise RuntimeError(f"Failed to copy file [{s3_path}] from s3")
@classmethod
def lock(cls, s3_path, level=0):
assert level < 3, "Never"
env = _Environment.get()
s3_path_lock = s3_path + f".lock"
file_path_lock = f"{Settings.TEMP_DIR}/{Path(s3_path_lock).name}"
assert Shell.check(
f"echo '''{env.JOB_NAME}''' > {file_path_lock}", verbose=True
), "Never"
i = 20
meta = S3.head_object(s3_path_lock)
while meta:
print(f"WARNING: Failed to acquire lock, meta [{meta}] - wait")
i -= 5
if i < 0:
info = f"ERROR: lock acquire failure - unlock forcefully"
print(info)
env.add_info(info)
break
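time.sleep(5)
meta = S3.head_object(s3_path_lock)  # re-check the lock, otherwise the loop can only end by timing out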
metadata = {"job": Utils.to_base64(env.JOB_NAME)}
S3.put(
s3_path=s3_path_lock,
local_path=file_path_lock,
metadata=metadata,
)
time.sleep(1)
obj = S3.head_object(s3_path_lock)
if not obj or not obj.has_tags(tags=metadata):
print(f"WARNING: locked by another job [{obj}]")
env.add_info("S3 lock file failure")
cls.lock(s3_path, level=level + 1)
print("INFO: lock acquired")
@classmethod
def unlock(cls, s3_path):
s3_path_lock = s3_path + ".lock"
env = _Environment.get()
obj = S3.head_object(s3_path_lock)
if not obj:
print("ERROR: lock file is removed")
assert False # investigate
elif not obj.has_tags({"job": Utils.to_base64(env.JOB_NAME)}):
print("ERROR: lock file was acquired by another job")
assert False # investigate
if not S3.delete(s3_path_lock):
print(f"ERROR: File [{s3_path_lock}] delete failure")
print("INFO: lock released")
return True
@classmethod
def get_result_link(cls, result):
env = _Environment.get()
s3_path = f"{Settings.HTML_S3_PATH}/{env.get_s3_prefix(latest=True if env.PR_NUMBER else False)}"
return S3.get_link(s3_path=s3_path, local_path=result.file_name())
@classmethod
def clean_latest_result(cls):
env = _Environment.get()
env.SHA = "latest"
assert env.PR_NUMBER
s3_path = f"{Settings.HTML_S3_PATH}/{env.get_s3_prefix()}"
S3.clean_s3_directory(s3_path=s3_path)
@classmethod
def _upload_file_to_s3(
cls, local_file_path, upload_to_s3: bool, text: bool = False, s3_subprefix=""
) -> str:
if upload_to_s3:
env = _Environment.get()
s3_path = f"{Settings.HTML_S3_PATH}/{env.get_s3_prefix()}"
if s3_subprefix:
s3_subprefix = s3_subprefix.removeprefix("/").removesuffix("/")
s3_path += f"/{s3_subprefix}"
html_link = S3.copy_file_to_s3(
s3_path=s3_path, local_path=local_file_path, text=text
)
return html_link
return f"file://{Path(local_file_path).absolute()}"
@classmethod
def upload_result_files_to_s3(cls, result):
if result.results:
for result_ in result.results:
cls.upload_result_files_to_s3(result_)
for file in result.files:
if not Path(file).is_file():
print(f"ERROR: Invalid file [{file}] in [{result.name}] - skip upload")
result.info += f"\nWARNING: Result file [{file}] was not found"
file_link = cls._upload_file_to_s3(file, upload_to_s3=False)
else:
is_text = False
for text_file_suffix in Settings.TEXT_CONTENT_EXTENSIONS:
if file.endswith(text_file_suffix):
print(
f"File [{file}] matches Settings.TEXT_CONTENT_EXTENSIONS [{Settings.TEXT_CONTENT_EXTENSIONS}] - add text attribute for s3 object"
)
is_text = True
break
file_link = cls._upload_file_to_s3(
file,
upload_to_s3=True,
text=is_text,
s3_subprefix=Utils.normalize_string(result.name),
)
result.links.append(file_link)
if result.files:
print(
f"Result files [{result.files}] uploaded to s3 [{result.links[-len(result.files):]}] - clean files list"
)
result.files = []
result.dump()
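The retry policy in `run_command_with_retries` stops early when the error cannot be cured by retrying. A minimal sketch of that policy using `subprocess` directly: `run_with_retries` and `FATAL_MARKERS` are hypothetical names, and the Python interpreter stands in for the `aws` CLI so the example runs anywhere.

```python
import subprocess
import sys
from typing import Sequence

# Substrings in stderr that make another attempt pointless (illustrative list)
FATAL_MARKERS = ("does not exist", "aws sso login")


def run_with_retries(command: Sequence[str], retries: int = 3) -> bool:
    """Run a command up to `retries` times, stopping early on fatal errors."""
    for attempt in range(1, retries + 1):
        proc = subprocess.run(command, capture_output=True, text=True)
        if proc.returncode == 0:
            return True
        print(f"attempt {attempt}/{retries} failed: {proc.stderr.strip()}")
        if any(marker in proc.stderr for marker in FATAL_MARKERS):
            break
    return False


ok = run_with_retries([sys.executable, "-c", "pass"])
failed = run_with_retries([sys.executable, "-c", "raise SystemExit(1)"], retries=2)
```

Passing the command as a list avoids shell quoting issues, which is a design choice of this sketch and differs from the string commands used in the module above.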

61
ci/praktika/secret.py Normal file

@ -0,0 +1,61 @@
import dataclasses
import os
from praktika.utils import Shell
class Secret:
class Type:
AWS_SSM_VAR = "aws parameter"
AWS_SSM_SECRET = "aws secret"
GH_SECRET = "gh secret"
@dataclasses.dataclass
class Config:
name: str
type: str
def is_gh(self):
return self.type == Secret.Type.GH_SECRET
def get_value(self):
if self.type == Secret.Type.AWS_SSM_VAR:
return self.get_aws_ssm_var()
if self.type == Secret.Type.AWS_SSM_SECRET:
return self.get_aws_ssm_secret()
elif self.type == Secret.Type.GH_SECRET:
return self.get_gh_secret()
else:
assert False, f"Not supported secret type, secret [{self}]"
def get_aws_ssm_var(self):
res = Shell.get_output(
f"aws ssm get-parameter --name {self.name} --with-decryption --output text --query Parameter.Value",
)
if not res:
print(f"ERROR: Failed to get secret [{self.name}]")
raise RuntimeError()
return res
def get_aws_ssm_secret(self):
name, secret_key_name = self.name, ""
if "." in self.name:
name, secret_key_name = self.name.split(".")
cmd = f"aws secretsmanager get-secret-value --secret-id {name} --query SecretString --output text"
if secret_key_name:
cmd += f" | jq -r '.[\"{secret_key_name}\"]'"
res = Shell.get_output(cmd, verbose=True)
if not res:
print(f"ERROR: Failed to get secret [{self.name}]")
raise RuntimeError()
return res
def get_gh_secret(self):
res = os.getenv(f"{self.name}")
if not res:
print(f"ERROR: Failed to get secret [{self.name}]")
raise RuntimeError()
return res
def __repr__(self):
return self.name
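`get_aws_ssm_secret` treats a name such as `secret_id.key` as a secret plus one field of its JSON value, and delegates the field lookup to `jq`. The same lookup can be sketched in pure Python. The helper names below are hypothetical, and `str.partition` splits on the first dot only, which differs from the `split(".")` call above when a name contains several dots.

```python
import json
from typing import Tuple


def split_secret_name(full_name: str) -> Tuple[str, str]:
    """Split 'secret_id.key' into (secret_id, key); key is '' when absent."""
    secret_id, _, key = full_name.partition(".")
    return secret_id, key


def pick_value(secret_string: str, key: str) -> str:
    """Return the whole secret, or one field of a JSON-encoded secret."""
    if not key:
        return secret_string
    return json.loads(secret_string)[key]


secret_id, key = split_secret_name("ci_db.password")
value = pick_value('{"user": "ci", "password": "s3cr3t"}', key)
print(secret_id, key)
```

The secret name and JSON payload here are invented for the example; no real secret layout is implied.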

8
ci/praktika/settings.py Normal file

@ -0,0 +1,8 @@
from praktika._settings import _Settings
from praktika.mangle import _get_user_settings
Settings = _Settings()
user_settings = _get_user_settings()
for setting, value in user_settings.items():
Settings.__setattr__(setting, value)
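The override loop in `settings.py` copies user values over the defaults. A minimal sketch of the pattern with `setattr`: the `Defaults` class and its setting names are hypothetical, and the unknown-key check is an addition of this sketch that the module above does not perform.

```python
class Defaults:
    # Hypothetical defaults; the real setting names live in praktika._settings
    TEMP_DIR = "/tmp/ci"
    MAX_RETRIES = 3


def apply_overrides(settings, overrides):
    """Copy user-provided values onto a settings object, rejecting unknown keys."""
    for key, value in overrides.items():
        if not hasattr(settings, key):
            raise AttributeError(f"Unknown setting [{key}]")
        setattr(settings, key, value)
    return settings


settings = apply_overrides(Defaults(), {"MAX_RETRIES": 5})
print(settings.MAX_RETRIES, settings.TEMP_DIR)
```

Overrides land on the instance, so the class-level defaults stay untouched.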

597
ci/praktika/utils.py Normal file

@ -0,0 +1,597 @@
import base64
import dataclasses
import glob
import json
import multiprocessing
import os
import re
import signal
import subprocess
import sys
import time
from abc import ABC, abstractmethod
from contextlib import contextmanager
from datetime import datetime
from pathlib import Path
from threading import Thread
from types import SimpleNamespace
from typing import Any, Dict, Iterator, List, Optional, Type, TypeVar, Union
from praktika._settings import _Settings
T = TypeVar("T", bound="Serializable")
class MetaClasses:
class WithIter(type):
def __iter__(cls):
return (v for k, v in cls.__dict__.items() if not k.startswith("_"))
@dataclasses.dataclass
class Serializable(ABC):
@classmethod
def to_dict(cls, obj):
if dataclasses.is_dataclass(obj):
return {k: cls.to_dict(v) for k, v in dataclasses.asdict(obj).items()}
elif isinstance(obj, SimpleNamespace):
return {k: cls.to_dict(v) for k, v in vars(obj).items()}
elif isinstance(obj, list):
return [cls.to_dict(i) for i in obj]
elif isinstance(obj, dict):
return {k: cls.to_dict(v) for k, v in obj.items()}
else:
return obj
@classmethod
def from_dict(cls: Type[T], obj: Dict[str, Any]) -> T:
return cls(**obj)
@classmethod
def from_fs(cls: Type[T], name) -> T:
with open(cls.file_name_static(name), "r", encoding="utf8") as f:
try:
return cls.from_dict(json.load(f))
except json.decoder.JSONDecodeError as ex:
print(f"ERROR: failed to parse json, ex [{ex}]")
print(f"JSON content [{cls.file_name_static(name)}]")
Shell.check(f"cat {cls.file_name_static(name)}")
raise ex
@classmethod
@abstractmethod
def file_name_static(cls, name):
pass
def file_name(self):
return self.file_name_static(self.name)
def dump(self):
with open(self.file_name(), "w", encoding="utf8") as f:
json.dump(self.to_dict(self), f, indent=4)
return self
@classmethod
def exist(cls, name):
return Path(cls.file_name_static(name)).is_file()
def to_json(self, pretty=False):
return json.dumps(dataclasses.asdict(self), indent=4 if pretty else None)
class ContextManager:
@staticmethod
@contextmanager
def cd(to: Optional[Union[Path, str]] = None) -> Iterator[None]:
"""
changes current working directory to @to or `git root` if @to is None
:param to:
:return:
"""
if not to:
try:
to = Shell.get_output_or_raise("git rev-parse --show-toplevel")
except Exception:
pass
if not to:
if Path(_Settings.DOCKER_WD).is_dir():
to = _Settings.DOCKER_WD
if not to:
assert False, "Failed to determine working directory: not inside a git repo and DOCKER_WD does not exist"
assert to
old_pwd = os.getcwd()
os.chdir(to)
try:
yield
finally:
os.chdir(old_pwd)
class Shell:
@classmethod
def get_output_or_raise(cls, command, verbose=False):
return cls.get_output(command, verbose=verbose, strict=True).strip()
@classmethod
def get_output(cls, command, strict=False, verbose=False):
if verbose:
print(f"Run command [{command}]")
res = subprocess.run(
command,
shell=True,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
text=True,
)
if res.stderr:
print(f"WARNING: stderr: {res.stderr.strip()}")
if strict and res.returncode != 0:
raise RuntimeError(f"command failed with {res.returncode}")
return res.stdout.strip()
@classmethod
def get_res_stdout_stderr(cls, command, verbose=True):
if verbose:
print(f"Run command [{command}]")
res = subprocess.run(
command,
shell=True,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
text=True,
)
return res.returncode, res.stdout.strip(), res.stderr.strip()
@classmethod
def check(
cls,
command,
log_file=None,
strict=False,
verbose=False,
dry_run=False,
stdin_str=None,
timeout=None,
retries=0,
**kwargs,
):
return (
cls.run(
command,
log_file,
strict,
verbose,
dry_run,
stdin_str,
retries=retries,
timeout=timeout,
**kwargs,
)
== 0
)
@classmethod
def run(
cls,
command,
log_file=None,
strict=False,
verbose=False,
dry_run=False,
stdin_str=None,
timeout=None,
retries=0,
**kwargs,
):
def _check_timeout(timeout, process) -> None:
if not timeout:
return
time.sleep(timeout)
print(
f"WARNING: Timeout exceeded [{timeout}], sending SIGTERM to process group [{process.pid}]"
)
try:
os.killpg(process.pid, signal.SIGTERM)
except ProcessLookupError:
print("Process already terminated.")
return
time_wait = 0
wait_interval = 5
# Wait for process to terminate
while process.poll() is None and time_wait < 100:
print("Waiting for process to exit...")
time.sleep(wait_interval)
time_wait += wait_interval
# Force kill if still running
if process.poll() is None:
print("WARNING: Process still running after SIGTERM, sending SIGKILL")
try:
os.killpg(process.pid, signal.SIGKILL)
except ProcessLookupError:
print("Process already terminated.")
# Dry-run
if dry_run:
print(f"Dry-run. Would run command [{command}]")
return 0 # Return success for dry-run
if verbose:
print(f"Run command: [{command}]")
log_file = log_file or "/dev/null"
proc = None
for retry in range(retries + 1):
try:
with open(log_file, "w") as log_fp:
proc = subprocess.Popen(
command,
shell=True,
stderr=subprocess.STDOUT,
stdout=subprocess.PIPE,
stdin=subprocess.PIPE if stdin_str else None,
universal_newlines=True,
start_new_session=True, # Start a new process group for signal handling
bufsize=1, # Line-buffered
errors="backslashreplace",
**kwargs,
)
# Start the timeout thread if specified
if timeout:
t = Thread(target=_check_timeout, args=(timeout, proc))
t.daemon = True
t.start()
# Write stdin if provided
if stdin_str:
proc.stdin.write(stdin_str)
proc.stdin.close()
# Process output in real-time
if proc.stdout:
for line in proc.stdout:
sys.stdout.write(line)
log_fp.write(line)
proc.wait() # Wait for the process to finish
if proc.returncode == 0:
break # Exit retry loop if success
else:
if verbose:
print(
f"ERROR: command [{command}] failed, exit code: {proc.returncode}, retry: {retry}/{retries}"
)
except Exception as e:
if verbose:
print(
f"ERROR: command failed, exception: {e}, retry: {retry}/{retries}"
)
if proc:
proc.kill()
# Handle strict mode (ensure process success or fail)
if strict:
assert (
proc and proc.returncode == 0
), f"Command failed with return code {proc.returncode if proc else None}"
return proc.returncode if proc else 1 # Return 1 if process never started
@classmethod
def run_async(
cls,
command,
stdin_str=None,
verbose=False,
suppress_output=False,
**kwargs,
):
if verbose:
print(f"Run command in background [{command}]")
proc = subprocess.Popen(
command,
shell=True,
stderr=subprocess.STDOUT if not suppress_output else subprocess.DEVNULL,
stdout=subprocess.PIPE if not suppress_output else subprocess.DEVNULL,
stdin=subprocess.PIPE if stdin_str else None,
universal_newlines=True,
start_new_session=True,
bufsize=1,
errors="backslashreplace",
**kwargs,
)
if proc.stdout:
for line in proc.stdout:
print(line, end="")
return proc
class Utils:
@staticmethod
def terminate_process_group(pid, force=False):
if not force:
os.killpg(os.getpgid(pid), signal.SIGTERM)
else:
os.killpg(os.getpgid(pid), signal.SIGKILL)
@staticmethod
def set_env(key, val):
os.environ[key] = val
@staticmethod
def print_formatted_error(error_message, stdout="", stderr=""):
stdout_lines = stdout.splitlines() if stdout else []
stderr_lines = stderr.splitlines() if stderr else []
print(f"ERROR: {error_message}")
if stdout_lines:
print(" Out:")
for line in stdout_lines:
print(f" | {line}")
if stderr_lines:
print(" Err:")
for line in stderr_lines:
print(f" | {line}")
@staticmethod
def sleep(seconds):
time.sleep(seconds)
@staticmethod
def cwd():
return Path.cwd()
@staticmethod
def cpu_count():
return multiprocessing.cpu_count()
@staticmethod
def raise_with_error(error_message, stdout="", stderr="", ex=None):
Utils.print_formatted_error(error_message, stdout, stderr)
raise ex or RuntimeError()
@staticmethod
def timestamp():
return time.time()
@staticmethod
def timestamp_to_str(timestamp):
return time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(timestamp))
@staticmethod
def get_failed_tests_number(description: str) -> Optional[int]:
description = description.lower()
pattern = r"fail:\s*(\d+)\s*(?=,|$)"
match = re.search(pattern, description)
if match:
return int(match.group(1))
return None
@staticmethod
def is_killed_with_oom():
if Shell.check(
"sudo dmesg -T | grep -q -e 'Out of memory: Killed process' -e 'oom_reaper: reaped process' -e 'oom-kill:constraint=CONSTRAINT_NONE'"
):
return True
return False
@staticmethod
def clear_dmesg():
Shell.check("sudo dmesg --clear", verbose=True)
@staticmethod
def to_base64(value):
assert isinstance(value, str), f"TODO: not supported for {type(value)}"
string_bytes = value.encode("utf-8")
base64_bytes = base64.b64encode(string_bytes)
base64_string = base64_bytes.decode("utf-8")
return base64_string
@staticmethod
def is_hex(s):
try:
int(s, 16)
return True
except ValueError:
return False
@staticmethod
def normalize_string(string: str) -> str:
res = string.lower()
for r in (
(" ", "_"),
("(", ""),
(")", ""),
("{", ""),
("}", ""),
("'", ""),
("[", ""),
("]", ""),
(",", ""),
("/", "_"),
("-", "_"),
(":", ""),
('"', ""),
):
res = res.replace(*r)
return res
@staticmethod
def traverse_path(path, file_suffixes=None, sorted=False, not_exists_ok=False):
res = []
def is_valid_file(file):
if file_suffixes is None:
return True
return any(file.endswith(suffix) for suffix in file_suffixes)
if os.path.isfile(path):
if is_valid_file(path):
res.append(path)
elif os.path.isdir(path):
for root, dirs, files in os.walk(path):
for file in files:
full_path = os.path.join(root, file)
if is_valid_file(full_path):
res.append(full_path)
elif "*" in str(path):
res.extend(
[
f
for f in glob.glob(path, recursive=True)
if os.path.isfile(f) and is_valid_file(f)
]
)
else:
if not_exists_ok:
pass
else:
assert False, f"File does not exist or not valid [{path}]"
if sorted:
res.sort(reverse=True)
return res
@classmethod
def traverse_paths(
cls,
include_paths,
exclude_paths,
file_suffixes=None,
sorted=False,
not_exists_ok=False,
) -> List["str"]:
included_files_ = set()
for path in include_paths:
included_files_.update(cls.traverse_path(path, file_suffixes=file_suffixes))
excluded_files = set()
for path in exclude_paths:
res = cls.traverse_path(path, not_exists_ok=not_exists_ok)
if not res:
print(
f"WARNING: Utils.traverse_paths excluded 0 files by path [{path}] in exclude_paths"
)
else:
excluded_files.update(res)
res = [f for f in included_files_ if f not in excluded_files]
if sorted:
res.sort(reverse=True)
return res
@classmethod
def add_to_PATH(cls, path):
path_cur = os.getenv("PATH", "")
if path_cur:
path += ":" + path_cur
os.environ["PATH"] = path
class Stopwatch:
def __init__(self):
self.start_time = time.time()
@property
def duration(self) -> float:
return time.time() - self.start_time
class TeePopen:
def __init__(
self,
command: str,
log_file: Union[str, Path] = "",
env: Optional[dict] = None,
timeout: Optional[int] = None,
):
self.command = command
self.log_file_name = log_file
self.log_file = None
self.env = env or os.environ.copy()
self.process = None # type: Optional[subprocess.Popen]
self.timeout = timeout
self.timeout_exceeded = False
self.terminated_by_sigterm = False
self.terminated_by_sigkill = False
def _check_timeout(self) -> None:
if self.timeout is None:
return
time.sleep(self.timeout)
print(
f"WARNING: Timeout exceeded [{self.timeout}], send SIGTERM to [{self.process.pid}] and give a chance for graceful termination"
)
self.send_signal(signal.SIGTERM)
time_wait = 0
self.terminated_by_sigterm = True
self.timeout_exceeded = True
while self.process.poll() is None and time_wait < 100:
print("wait...")
wait = 5
time.sleep(wait)
time_wait += wait
while self.process.poll() is None:
print(f"WARNING: Still running, send SIGKILL to [{self.process.pid}]")
self.send_signal(signal.SIGKILL)
self.terminated_by_sigkill = True
time.sleep(2)
def __enter__(self) -> "TeePopen":
if self.log_file_name:
self.log_file = open(self.log_file_name, "w", encoding="utf-8")
self.process = subprocess.Popen(
self.command,
shell=True,
universal_newlines=True,
env=self.env,
start_new_session=True, # signal will be sent to all children
stderr=subprocess.STDOUT,
stdout=subprocess.PIPE,
bufsize=1,
errors="backslashreplace",
)
time.sleep(1)
print(f"Subprocess started, pid [{self.process.pid}]")
if self.timeout is not None and self.timeout > 0:
t = Thread(target=self._check_timeout)
t.daemon = True # does not block the program from exit
t.start()
return self
def __exit__(self, exc_type, exc_value, traceback):
self.wait()
if self.log_file:
self.log_file.close()
def wait(self) -> int:
if self.process.stdout is not None:
for line in self.process.stdout:
sys.stdout.write(line)
if self.log_file:
self.log_file.write(line)
return self.process.wait()
def poll(self):
return self.process.poll()
def send_signal(self, signal_num):
os.killpg(self.process.pid, signal_num)
if __name__ == "__main__":
@dataclasses.dataclass
class Test(MetaClasses.Serializable):
name: str
@staticmethod
def file_name_static(name):
return f"/tmp/{Utils.normalize_string(name)}.json"
Test(name="dsada").dump()
t = Test.from_fs("dsada")
print(t)
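A standalone sketch of two `Utils` helpers from the file above, written so their behaviour can be checked in isolation. It assumes the same replacement table and regular expression as the diff; the module-level names `_REPLACEMENTS` and `_DROPPED` are illustrative only.

```python
import re
from typing import Optional

# Characters mapped to "_" and characters dropped, mirroring Utils.normalize_string
_REPLACEMENTS = {" ": "_", "/": "_", "-": "_"}
_DROPPED = "(){}[]',:\""


def normalize_string(value: str) -> str:
    """Lower-case a name and make it safe for file names and YAML job ids."""
    res = value.lower()
    for old, new in _REPLACEMENTS.items():
        res = res.replace(old, new)
    for ch in _DROPPED:
        res = res.replace(ch, "")
    return res


def get_failed_tests_number(description: str) -> Optional[int]:
    """Return N from 'fail: N' when it is followed by a comma or end of string."""
    match = re.search(r"fail:\s*(\d+)\s*(?=,|$)", description.lower())
    return int(match.group(1)) if match else None
```

The lookahead `(?=,|$)` means a description such as `fail: 3 tests` yields `None`, since the number must be followed by a comma or the end of the string.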

208
ci/praktika/validator.py Normal file

@ -0,0 +1,208 @@
import glob
import sys
from itertools import chain
from pathlib import Path
from praktika import Workflow
from praktika._settings import GHRunners
from praktika.mangle import _get_workflows
from praktika.settings import Settings
from praktika.utils import ContextManager
class Validator:
@classmethod
def validate(cls):
print("---Start validating Pipeline and settings---")
workflows = _get_workflows()
for workflow in workflows:
print(f"Validating workflow [{workflow.name}]")
cls.validate_file_paths_in_run_command(workflow)
cls.validate_file_paths_in_digest_configs(workflow)
cls.validate_requirements_txt_files(workflow)
cls.validate_dockers(workflow)
if workflow.artifacts:
for artifact in workflow.artifacts:
if artifact.is_s3_artifact():
assert (
Settings.S3_ARTIFACT_PATH
), "Provide S3_ARTIFACT_PATH setting in any .py file in ./ci/settings/* to be able to use s3 for artifacts"
for job in workflow.jobs:
if job.requires and workflow.artifacts:
for require in job.requires:
if (
require in workflow.artifacts
and workflow.artifacts[require].is_s3_artifact()
):
assert not any(
[r in GHRunners for r in job.runs_on]
), f"GH runners [{job.name}:{job.runs_on}] must not be used with S3 as artifact storage"
if job.allow_merge_on_failure:
assert (
workflow.enable_merge_ready_status
), f"Job property allow_merge_on_failure must be used only with enabled workflow.enable_merge_ready_status, workflow [{workflow.name}], job [{job.name}]"
if workflow.enable_cache:
assert (
Settings.CI_CONFIG_RUNS_ON
), f"Runner label to run workflow config job must be provided via CI_CONFIG_RUNS_ON setting if enable_cache=True, workflow [{workflow.name}]"
assert (
Settings.CACHE_S3_PATH
), f"CACHE_S3_PATH Setting must be defined if enable_cache=True, workflow [{workflow.name}]"
if workflow.dockers:
cls.evaluate_check(
Settings.DOCKER_BUILD_RUNS_ON,
"DOCKER_BUILD_RUNS_ON setting must be defined if workflow has dockers",
workflow_name=workflow.name,
)
if workflow.enable_report:
assert (
Settings.HTML_S3_PATH
), f"HTML_S3_PATH Setting must be defined if enable_report=True, workflow [{workflow.name}]"
assert (
Settings.S3_BUCKET_TO_HTTP_ENDPOINT
), f"S3_BUCKET_TO_HTTP_ENDPOINT Setting must be defined if enable_report=True, workflow [{workflow.name}]"
assert (
Settings.HTML_S3_PATH.split("/")[0]
in Settings.S3_BUCKET_TO_HTTP_ENDPOINT
), f"S3_BUCKET_TO_HTTP_ENDPOINT Setting must include bucket name [{Settings.HTML_S3_PATH}] from HTML_S3_PATH, workflow [{workflow.name}]"
if workflow.enable_cache:
for artifact in workflow.artifacts or []:
assert (
artifact.is_s3_artifact()
), f"All artifacts must be of S3 type if enable_cache=True, artifact [{artifact.name}], type [{artifact.type}], workflow [{workflow.name}]"
if workflow.dockers:
assert (
Settings.DOCKERHUB_USERNAME
), f"Settings.DOCKERHUB_USERNAME must be provided if workflow has dockers, workflow [{workflow.name}]"
assert (
Settings.DOCKERHUB_SECRET
), f"Settings.DOCKERHUB_SECRET must be provided if workflow has dockers, workflow [{workflow.name}]"
assert workflow.get_secret(
Settings.DOCKERHUB_SECRET
), f"Secret [{Settings.DOCKERHUB_SECRET}] must have configuration in workflow.secrets, workflow [{workflow.name}]"
if (
workflow.enable_cache
or workflow.enable_report
or workflow.enable_merge_ready_status
):
for job in workflow.jobs:
assert not any(
runner in ("ubuntu-latest",) for runner in job.runs_on
), f"GitHub Runners must not be used for workflow with enabled: workflow.enable_cache, workflow.enable_report or workflow.enable_merge_ready_status as s3 access is required, workflow [{workflow.name}], job [{job.name}]"
if workflow.enable_cidb:
assert (
Settings.SECRET_CI_DB_URL
), f"Settings.SECRET_CI_DB_URL must be provided if workflow.enable_cidb=True, workflow [{workflow.name}]"
assert (
Settings.SECRET_CI_DB_PASSWORD
), f"Settings.SECRET_CI_DB_PASSWORD must be provided if workflow.enable_cidb=True, workflow [{workflow.name}]"
assert (
Settings.CI_DB_DB_NAME
), f"Settings.CI_DB_DB_NAME must be provided if workflow.enable_cidb=True, workflow [{workflow.name}]"
assert (
Settings.CI_DB_TABLE_NAME
), f"Settings.CI_DB_TABLE_NAME must be provided if workflow.enable_cidb=True, workflow [{workflow.name}]"
@classmethod
def validate_file_paths_in_run_command(cls, workflow: Workflow.Config) -> None:
if not Settings.VALIDATE_FILE_PATHS:
return
with ContextManager.cd():
for job in workflow.jobs:
run_command = job.command
command_parts = run_command.split(" ")
for part in command_parts:
if ">" in part:
return
if "/" in part:
assert (
Path(part).is_file() or Path(part).is_dir()
), f"Apparently run command [{run_command}] for job [{job.name}] has invalid path [{part}]. Setting to disable check: VALIDATE_FILE_PATHS"
@classmethod
def validate_file_paths_in_digest_configs(cls, workflow: Workflow.Config) -> None:
if not Settings.VALIDATE_FILE_PATHS:
return
with ContextManager.cd():
for job in workflow.jobs:
if not job.digest_config:
continue
for include_path in chain(
job.digest_config.include_paths, job.digest_config.exclude_paths
):
if "*" in include_path:
assert glob.glob(
include_path, recursive=True
), f"Apparently file glob [{include_path}] in job [{job.name}] digest_config [{job.digest_config}] invalid, workflow [{workflow.name}]. Setting to disable check: VALIDATE_FILE_PATHS"
else:
assert (
Path(include_path).is_file() or Path(include_path).is_dir()
), f"Apparently file path [{include_path}] in job [{job.name}] digest_config [{job.digest_config}] invalid, workflow [{workflow.name}]. Setting to disable check: VALIDATE_FILE_PATHS"
@classmethod
def validate_requirements_txt_files(cls, workflow: Workflow.Config) -> None:
with ContextManager.cd():
for job in workflow.jobs:
if job.job_requirements:
if job.job_requirements.python_requirements_txt:
path = Path(job.job_requirements.python_requirements_txt)
message = f"File with py requirement [{path}] does not exist"
if job.name in (
Settings.DOCKER_BUILD_JOB_NAME,
Settings.CI_CONFIG_JOB_NAME,
Settings.FINISH_WORKFLOW_JOB_NAME,
):
message += '\n If all requirements are already installed on your runners - add setting INSTALL_PYTHON_REQS_FOR_NATIVE_JOBS""'
message += "\n If requirements need to be installed - add requirements file (Settings.INSTALL_PYTHON_REQS_FOR_NATIVE_JOBS):"
message += "\n echo jwt==1.3.1 > ./ci/requirements.txt"
message += (
"\n echo requests==2.32.3 >> ./ci/requirements.txt"
)
message += "\n echo https://clickhouse-builds.s3.amazonaws.com/packages/praktika-0.1-py3-none-any.whl >> ./ci/requirements.txt"
cls.evaluate_check(
path.is_file(), message, workflow.name, job.name
)
@classmethod
def validate_dockers(cls, workflow: Workflow.Config):
names = []
for docker in workflow.dockers:
cls.evaluate_check(
docker.name not in names,
f"Non-unique docker name [{docker.name}]",
workflow_name=workflow.name,
)
names.append(docker.name)
for docker in workflow.dockers:
for docker_dep in docker.depends_on:
cls.evaluate_check(
docker_dep in names,
f"Docker [{docker.name}] has invalid dependency [{docker_dep}]",
workflow_name=workflow.name,
)
@classmethod
def evaluate_check(cls, check_ok, message, workflow_name, job_name=""):
messages = message.split("\n")
if check_ok:
return
else:
print(
f"ERROR: Config validation failed: workflow [{workflow_name}], job [{job_name}]:"
)
for message in messages:
print(" || " + message)
sys.exit(1)
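A minimal sketch of the two checks performed by `validate_dockers`: image names must be unique, and every `depends_on` entry must name a declared image. `DockerSpec` is a hypothetical stand-in for `Docker.Config`, and unlike the validator this version collects error messages instead of calling `sys.exit(1)` on the first failure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DockerSpec:  # hypothetical stand-in for Docker.Config
    name: str
    depends_on: List[str] = field(default_factory=list)


def find_docker_errors(dockers: List[DockerSpec]) -> List[str]:
    """Return human-readable problems found in a docker image list."""
    errors = []
    seen = set()
    # First pass: every image name may appear only once
    for docker in dockers:
        if docker.name in seen:
            errors.append(f"Non-unique docker name [{docker.name}]")
        seen.add(docker.name)
    # Second pass: dependencies must refer to declared images
    for docker in dockers:
        for dep in docker.depends_on:
            if dep not in seen:
                errors.append(
                    f"Docker [{docker.name}] has invalid dependency [{dep}]"
                )
    return errors
```

Running the name pass to completion before the dependency pass lets an image depend on one declared later in the list.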

1
ci/praktika/version.py Normal file

@ -0,0 +1 @@
VERSION = 1

68
ci/praktika/workflow.py Normal file

@ -0,0 +1,68 @@
from dataclasses import dataclass, field
from typing import List, Optional
from praktika import Artifact, Job
from praktika.docker import Docker
from praktika.secret import Secret
from praktika.utils import Utils
class Workflow:
class Event:
PULL_REQUEST = "pull_request"
PUSH = "push"
@dataclass
class Config:
"""
branches - List of branch names or patterns, for push trigger only
base_branches - List of base branches (target branch), for pull_request trigger only
"""
name: str
event: str
jobs: List[Job.Config]
branches: List[str] = field(default_factory=list)
base_branches: List[str] = field(default_factory=list)
artifacts: List[Artifact.Config] = field(default_factory=list)
dockers: List[Docker.Config] = field(default_factory=list)
secrets: List[Secret.Config] = field(default_factory=list)
enable_cache: bool = False
enable_report: bool = False
enable_merge_ready_status: bool = False
enable_cidb: bool = False
def is_event_pull_request(self):
return self.event == Workflow.Event.PULL_REQUEST
def is_event_push(self):
return self.event == Workflow.Event.PUSH
def get_job(self, name):
job = self.find_job(name)
if not job:
Utils.raise_with_error(
f"Failed to find job [{name}], workflow [{self.name}]"
)
return job
def find_job(self, name, lazy=False):
name = str(name)
for job in self.jobs:
if lazy:
if name.lower() in job.name.lower():
return job
else:
if job.name == name:
return job
return None
def get_secret(self, name) -> Optional[Secret.Config]:
name = str(name)
names = []
for secret in self.secrets:
if secret.name == name:
return secret
names.append(secret.name)
raise RuntimeError(
f"Failed to find secret [{name}], workflow secrets [{names}]"
)
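A short sketch of the exact versus "lazy" lookup used by `find_job`, reduced to plain job names so it runs on its own. `find_job_name` is a hypothetical helper; it assumes, as in the diff, that lazy mode is a case-insensitive substring match that returns the first hit.

```python
from typing import List, Optional


def find_job_name(names: List[str], query: str, lazy: bool = False) -> Optional[str]:
    """Return the first matching job name, or None when nothing matches."""
    query = str(query)
    for name in names:
        if lazy:
            # Case-insensitive substring match
            if query.lower() in name.lower():
                return name
        elif name == query:
            return name
    return None
```

Because lazy mode returns the first hit, the order of `jobs` in the workflow config decides which job wins when several names contain the query.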


@ -0,0 +1,350 @@
import dataclasses
from typing import List
from praktika import Artifact, Job, Workflow
from praktika.mangle import _get_workflows
from praktika.parser import WorkflowConfigParser
from praktika.runtime import RunConfig
from praktika.settings import Settings
from praktika.utils import ContextManager, Shell, Utils
class YamlGenerator:
class Templates:
TEMPLATE_PULL_REQUEST_0 = """\
# generated by praktika
name: {NAME}
on:
{EVENT}:
branches: [{BRANCHES}]
# Cancel the previous wf run in PRs.
concurrency:
group: ${{{{{{{{ github.workflow }}}}}}}}-${{{{{{{{ github.ref }}}}}}}}
cancel-in-progress: true
env:
# Force the stdout and stderr streams to be unbuffered
PYTHONUNBUFFERED: 1
GH_TOKEN: ${{{{{{{{ github.token }}}}}}}}
# Allow updating GH commit statuses and PR comments to post an actual job reports link
permissions: write-all
jobs:
{JOBS}\
"""
TEMPLATE_CALLABLE_WORKFLOW = """\
# generated by praktika
name: {NAME}
on:
workflow_call:
inputs:
config:
type: string
required: false
default: ''
secrets:
{SECRETS}
env:
PYTHONUNBUFFERED: 1
jobs:
{JOBS}\
"""
TEMPLATE_SECRET_CONFIG = """\
{SECRET_NAME}:
required: true
"""
TEMPLATE_MATRIX = """
strategy:
fail-fast: false
matrix:
params: {PARAMS_LIST}\
"""
TEMPLATE_JOB_0 = """
{JOB_NAME_NORMALIZED}:
runs-on: [{RUNS_ON}]
needs: [{NEEDS}]{IF_EXPRESSION}
name: "{JOB_NAME_GH}"
outputs:
data: ${{{{ steps.run.outputs.DATA }}}}
steps:
- name: Checkout code
uses: actions/checkout@v4
{JOB_ADDONS}
- name: Prepare env script
run: |
cat > {ENV_SETUP_SCRIPT} << 'ENV_SETUP_SCRIPT_EOF'
export PYTHONPATH=./ci:.
{SETUP_ENVS}
cat > {WORKFLOW_CONFIG_FILE} << 'EOF'
${{{{ needs.{WORKFLOW_CONFIG_JOB_NAME}.outputs.data }}}}
EOF
cat > {WORKFLOW_STATUS_FILE} << 'EOF'
${{{{ toJson(needs) }}}}
EOF
ENV_SETUP_SCRIPT_EOF
rm -rf {INPUT_DIR} {OUTPUT_DIR} {TEMP_DIR}
mkdir -p {TEMP_DIR} {INPUT_DIR} {OUTPUT_DIR}
{DOWNLOADS_GITHUB}
- name: Run
id: run
run: |
. {ENV_SETUP_SCRIPT}
set -o pipefail
{PYTHON} -m praktika run --job '''{JOB_NAME}''' --workflow "{WORKFLOW_NAME}" --ci |& tee {RUN_LOG}
{UPLOADS_GITHUB}\
"""
TEMPLATE_SETUP_ENV_SECRETS = """\
export {SECRET_NAME}=$(cat<<'EOF'
${{{{ secrets.{SECRET_NAME} }}}}
EOF
)\
"""
TEMPLATE_PY_INSTALL = """
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: {PYTHON_VERSION}
"""
TEMPLATE_PY_WITH_REQUIREMENTS = """
- name: Install dependencies
run: |
sudo apt-get update && sudo apt install -y python3-pip
# TODO: --break-system-packages? otherwise ubuntu's apt/apt-get complains
{PYTHON} -m pip install --upgrade pip --break-system-packages
{PIP} install -r {REQUIREMENT_PATH} --break-system-packages
"""
TEMPLATE_GH_UPLOAD = """
- name: Upload artifact {NAME}
uses: actions/upload-artifact@v4
with:
name: {NAME}
path: {PATH}
"""
TEMPLATE_GH_DOWNLOAD = """
- name: Download artifact {NAME}
uses: actions/download-artifact@v4
with:
name: {NAME}
path: {PATH}
"""
TEMPLATE_IF_EXPRESSION = """
if: ${{{{ !failure() && !cancelled() && !contains(fromJson(needs.{WORKFLOW_CONFIG_JOB_NAME}.outputs.data).cache_success_base64, '{JOB_NAME_BASE64}') }}}}\
"""
TEMPLATE_IF_EXPRESSION_SKIPPED_OR_SUCCESS = """
if: ${{ !failure() && !cancelled() }}\
"""
TEMPLATE_IF_EXPRESSION_NOT_CANCELLED = """
if: ${{ !cancelled() }}\
"""
def __init__(self):
self.py_workflows = [] # type: List[Workflow.Config]
@classmethod
def _get_workflow_file_name(cls, workflow_name):
return f"{Settings.WORKFLOW_PATH_PREFIX}/{Utils.normalize_string(workflow_name)}.yaml"
def generate(self, workflow_file="", workflow_config=None):
print("---Start generating yaml pipelines---")
if workflow_config:
self.py_workflows = [workflow_config]
else:
self.py_workflows = _get_workflows(file=workflow_file)
assert self.py_workflows
for workflow_config in self.py_workflows:
print(f"Generate workflow [{workflow_config.name}]")
parser = WorkflowConfigParser(workflow_config).parse()
if (
workflow_config.is_event_pull_request()
or workflow_config.is_event_push()
):
yaml_workflow_str = PullRequestPushYamlGen(parser).generate()
else:
assert (
False
), f"Workflow event not yet supported [{workflow_config.event}]"
with ContextManager.cd():
with open(self._get_workflow_file_name(workflow_config.name), "w") as f:
f.write(yaml_workflow_str)
with ContextManager.cd():
Shell.check("git add ./.github/workflows/*.yaml")
class PullRequestPushYamlGen:
def __init__(self, parser: WorkflowConfigParser):
self.workflow_config = parser.workflow_yaml_config
self.parser = parser
def generate(self):
job_items = []
for i, job in enumerate(self.workflow_config.jobs):
job_name_normalized = Utils.normalize_string(job.name)
needs = ", ".join(map(Utils.normalize_string, job.needs))
job_name = job.name
job_addons = []
for addon in job.addons:
if addon.install_python:
job_addons.append(
YamlGenerator.Templates.TEMPLATE_PY_INSTALL.format(
PYTHON_VERSION=Settings.PYTHON_VERSION
)
)
if addon.requirements_txt_path:
job_addons.append(
YamlGenerator.Templates.TEMPLATE_PY_WITH_REQUIREMENTS.format(
PYTHON=Settings.PYTHON_INTERPRETER,
PIP=Settings.PYTHON_PACKET_MANAGER,
PYTHON_VERSION=Settings.PYTHON_VERSION,
REQUIREMENT_PATH=addon.requirements_txt_path,
)
)
uploads_github = []
for artifact in job.artifacts_gh_provides:
uploads_github.append(
YamlGenerator.Templates.TEMPLATE_GH_UPLOAD.format(
NAME=artifact.name, PATH=artifact.path
)
)
downloads_github = []
for artifact in job.artifacts_gh_requires:
downloads_github.append(
YamlGenerator.Templates.TEMPLATE_GH_DOWNLOAD.format(
NAME=artifact.name, PATH=Settings.INPUT_DIR
)
)
config_job_name_normalized = Utils.normalize_string(
Settings.CI_CONFIG_JOB_NAME
)
if_expression = ""
if (
self.workflow_config.enable_cache
and job_name_normalized != config_job_name_normalized
):
if_expression = YamlGenerator.Templates.TEMPLATE_IF_EXPRESSION.format(
WORKFLOW_CONFIG_JOB_NAME=config_job_name_normalized,
JOB_NAME_BASE64=Utils.to_base64(job_name),
)
if job.run_unless_cancelled:
if_expression = (
YamlGenerator.Templates.TEMPLATE_IF_EXPRESSION_NOT_CANCELLED
)
secrets_envs = []
for secret in self.workflow_config.secret_names_gh:
secrets_envs.append(
YamlGenerator.Templates.TEMPLATE_SETUP_ENV_SECRETS.format(
SECRET_NAME=secret
)
)
job_item = YamlGenerator.Templates.TEMPLATE_JOB_0.format(
JOB_NAME_NORMALIZED=job_name_normalized,
WORKFLOW_CONFIG_JOB_NAME=config_job_name_normalized,
IF_EXPRESSION=if_expression,
RUNS_ON=", ".join(job.runs_on),
NEEDS=needs,
JOB_NAME_GH=job_name.replace('"', '\\"'),
JOB_NAME=job_name.replace(
"'", "'\\''"
), # ' must be escaped so that yaml commands are properly parsed
WORKFLOW_NAME=self.workflow_config.name,
ENV_SETUP_SCRIPT=Settings.ENV_SETUP_SCRIPT,
SETUP_ENVS="\n".join(secrets_envs),
WORKFLOW_CONFIG_FILE=RunConfig.file_name_static(
self.workflow_config.name
),
JOB_ADDONS="".join(job_addons),
DOWNLOADS_GITHUB="\n".join(downloads_github),
UPLOADS_GITHUB="\n".join(uploads_github),
RUN_LOG=Settings.RUN_LOG,
PYTHON=Settings.PYTHON_INTERPRETER,
WORKFLOW_STATUS_FILE=Settings.WORKFLOW_STATUS_FILE,
TEMP_DIR=Settings.TEMP_DIR,
INPUT_DIR=Settings.INPUT_DIR,
OUTPUT_DIR=Settings.OUTPUT_DIR,
)
job_items.append(job_item)
base_template = YamlGenerator.Templates.TEMPLATE_PULL_REQUEST_0
template_1 = base_template.strip().format(
NAME=self.workflow_config.name,
BRANCHES=", ".join(
[f"'{branch}'" for branch in self.workflow_config.branches]
),
EVENT=self.workflow_config.event,
JOBS="{}" * len(job_items),
)
res = template_1.format(*job_items)
return res
@dataclasses.dataclass
class AuxConfig:
# defines aux step to install dependencies
addon: Job.Requirements
# defines aux step(s) to upload GH artifacts
uploads_gh: List[Artifact.Config]
# defines aux step(s) to download GH artifacts
downloads_gh: List[Artifact.Config]
def get_aux_workflow_name(self):
suffix = ""
if self.addon.python_requirements_txt:
suffix += "_py"
for _ in self.uploads_gh:
suffix += "_uplgh"
for _ in self.downloads_gh:
suffix += "_dnlgh"
return f"{Settings.WORKFLOW_PATH_PREFIX}/aux_job{suffix}.yaml"
def get_aux_workflow_input(self):
res = ""
if self.addon.python_requirements_txt:
res += f" requirements_txt: {self.addon.python_requirements_txt}"
return res
if __name__ == "__main__":
WFS = [
Workflow.Config(
name="PR",
event=Workflow.Event.PULL_REQUEST,
jobs=[
Job.Config(
name="Hello World",
runs_on=["foo"],
command="bar",
job_requirements=Job.Requirements(
python_requirements_txt="./requirement.txt"
),
)
],
enable_cache=True,
)
]
YamlGenerator().generate(workflow_config=WFS[0])
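The templates above use eight braces around workflow-level GitHub expressions and four around job-level ones because the workflow template passes through `str.format` twice while each job template passes through it once. The sketch below reproduces that mechanism with two reduced, hypothetical templates (`TEMPLATE`, `JOB_TEMPLATE`); it is an illustration of the escaping, not the generator itself.

```python
# Workflow-level template: formatted twice, so "${{" needs 8 braces
TEMPLATE = """\
name: {NAME}
group: ${{{{{{{{ github.workflow }}}}}}}}
jobs:
{JOBS}"""

# Job-level template: formatted once, so "${{" needs 4 braces
JOB_TEMPLATE = """\
  {JOB_ID}:
    outputs:
      data: ${{{{ steps.run.outputs.DATA }}}}
"""


def render(name, job_ids):
    """Render a workflow the way PullRequestPushYamlGen.generate does."""
    jobs = [JOB_TEMPLATE.format(JOB_ID=job_id) for job_id in job_ids]
    # Pass 1: fill named fields and leave one positional "{}" per job
    first_pass = TEMPLATE.format(NAME=name, JOBS="{}" * len(jobs))
    # Pass 2: halve the remaining braces and splice in the rendered jobs
    return first_pass.format(*jobs)


print(render("PR", ["build"]))
```

Values substituted by `format` are not re-parsed, so the already-rendered job text keeps its `${{ ... }}` intact during the second pass. One caveat of this design: a workflow name containing `{` or `}` would be interpreted as a field in the second pass.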


@ -7,6 +7,7 @@ S3_BUCKET_HTTP_ENDPOINT = "clickhouse-builds.s3.amazonaws.com"
class RunnerLabels:
CI_SERVICES = "ci_services"
CI_SERVICES_EBS = "ci_services_ebs"
BUILDER = "builder"
BASE_BRANCH = "master"
@ -29,155 +30,134 @@ SECRETS = [
DOCKERS = [
# Docker.Config(
# name="clickhouse/binary-builder",
# path="./docker/packager/binary-builder",
# arm64=True,
# amd64=True,
# path="./ci/docker/packager/binary-builder",
# platforms=Docker.Platforms.arm_amd,
# depends_on=[],
# ),
# Docker.Config(
# name="clickhouse/cctools",
# path="./docker/packager/cctools",
# arm64=True,
# amd64=True,
# path="./ci/docker/packager/cctools",
# platforms=Docker.Platforms.arm_amd,
# depends_on=[],
# ),
# Docker.Config(
# name="clickhouse/test-old-centos",
# path="./docker/test/compatibility/centos",
# arm64=True,
# amd64=True,
# path="./ci/docker/test/compatibility/centos",
# platforms=Docker.Platforms.arm_amd,
# depends_on=[],
# ),
# Docker.Config(
# name="clickhouse/test-old-ubuntu",
# path="./docker/test/compatibility/ubuntu",
# arm64=True,
# amd64=True,
# path="./ci/docker/test/compatibility/ubuntu",
# platforms=Docker.Platforms.arm_amd,
# depends_on=[],
# ),
# Docker.Config(
# name="clickhouse/test-util",
# path="./docker/test/util",
# arm64=True,
# amd64=True,
# path="./ci/docker/test/util",
# platforms=Docker.Platforms.arm_amd,
# depends_on=[],
# ),
# Docker.Config(
# name="clickhouse/integration-test",
# path="./docker/test/integration/base",
# arm64=True,
# amd64=True,
# path="./ci/docker/test/integration/base",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
# Docker.Config(
# name="clickhouse/fuzzer",
# path="./docker/test/fuzzer",
# arm64=True,
# amd64=True,
# path="./ci/docker/test/fuzzer",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
# Docker.Config(
# name="clickhouse/performance-comparison",
# path="./docker/test/performance-comparison",
# arm64=True,
# amd64=True,
# path="./ci/docker/test/performance-comparison",
# platforms=Docker.Platforms.arm_amd,
# depends_on=[],
# ),
# Docker.Config(
# name="clickhouse/fasttest",
# path="./docker/test/fasttest",
# arm64=True,
# amd64=True,
# depends_on=["clickhouse/test-util"],
# ),
Docker.Config(
name="clickhouse/fasttest",
path="./ci/docker/fasttest",
platforms=Docker.Platforms.arm_amd,
depends_on=[],
),
# Docker.Config(
# name="clickhouse/test-base",
# path="./docker/test/base",
# arm64=True,
# amd64=True,
# path="./ci/docker/test/base",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-util"],
# ),
# Docker.Config(
# name="clickhouse/clickbench",
# path="./docker/test/clickbench",
# arm64=True,
# amd64=True,
# path="./ci/docker/test/clickbench",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
# Docker.Config(
# name="clickhouse/keeper-jepsen-test",
# path="./docker/test/keeper-jepsen",
# arm64=True,
# amd64=True,
# path="./ci/docker/test/keeper-jepsen",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
# Docker.Config(
# name="clickhouse/server-jepsen-test",
# path="./docker/test/server-jepsen",
# arm64=True,
# amd64=True,
# path="./ci/docker/test/server-jepsen",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
# Docker.Config(
# name="clickhouse/sqllogic-test",
# path="./docker/test/sqllogic",
# arm64=True,
# amd64=True,
# path="./ci/docker/test/sqllogic",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
# Docker.Config(
# name="clickhouse/sqltest",
# path="./docker/test/sqltest",
# arm64=True,
# amd64=True,
# path="./ci/docker/test/sqltest",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
# Docker.Config(
# name="clickhouse/stateless-test",
# path="./docker/test/stateless",
# arm64=True,
# amd64=True,
# path="./ci/docker/test/stateless",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
# Docker.Config(
# name="clickhouse/stateful-test",
# path="./docker/test/stateful",
# arm64=True,
# amd64=True,
# path="./ci/docker/test/stateful",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/stateless-test"],
# ),
# Docker.Config(
# name="clickhouse/stress-test",
# path="./docker/test/stress",
# arm64=True,
# amd64=True,
# path="./ci/docker/test/stress",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/stateful-test"],
# ),
# Docker.Config(
# name="clickhouse/unit-test",
# path="./docker/test/unit",
# arm64=True,
# amd64=True,
# path="./ci/docker/test/unit",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
# Docker.Config(
# name="clickhouse/integration-tests-runner",
# path="./docker/test/integration/runner",
# arm64=True,
# amd64=True,
# path="./ci/docker/test/integration/runner",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
Docker.Config(
name="clickhouse/style-test",
path="./ci_v2/docker/style-test",
path="./ci/docker/style-test",
platforms=Docker.Platforms.arm_amd,
depends_on=[],
),
# Docker.Config(
# name="clickhouse/docs-builder",
# path="./docker/docs/builder",
# arm64=True,
# amd64=True,
# path="./ci/docker/docs/builder",
# platforms=Docker.Platforms.arm_amd,
# depends_on=["clickhouse/test-base"],
# ),
]
@ -249,3 +229,5 @@ DOCKERS = [
class JobNames:
STYLE_CHECK = "Style Check"
FAST_TEST = "Fast test"
BUILD_AMD_DEBUG = "Build amd64 debug"
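
The `depends_on` fields in the `DOCKERS` list (including the commented-out entries) describe a dependency graph between images, e.g. `clickhouse/stress-test` depends on `clickhouse/stateful-test`, which depends on `clickhouse/stateless-test`. A minimal sketch of how a build order could be derived from such fields follows. `DockerConfig` and `build_order` are hypothetical stand-ins for illustration and are not part of praktika; how praktika actually orders image builds is not shown in this diff.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List


@dataclass
class DockerConfig:
    """Hypothetical stand-in for Docker.Config; NOT the praktika class."""

    name: str
    path: str
    depends_on: List[str] = field(default_factory=list)


def build_order(configs: List[DockerConfig]) -> List[str]:
    """Return image names so that every image comes after its dependencies."""
    # TopologicalSorter expects a mapping: node -> iterable of predecessors.
    graph: Dict[str, List[str]] = {c.name: list(c.depends_on) for c in configs}
    # static_order() raises graphlib.CycleError if the dependencies are cyclic.
    return list(TopologicalSorter(graph).static_order())


# Names, paths and dependencies copied from entries in the DOCKERS list.
configs = [
    DockerConfig("clickhouse/stress-test", "./ci/docker/test/stress",
                 ["clickhouse/stateful-test"]),
    DockerConfig("clickhouse/stateful-test", "./ci/docker/test/stateful",
                 ["clickhouse/stateless-test"]),
    DockerConfig("clickhouse/stateless-test", "./ci/docker/test/stateless",
                 ["clickhouse/test-base"]),
    DockerConfig("clickhouse/test-base", "./ci/docker/test/base",
                 ["clickhouse/test-util"]),
    DockerConfig("clickhouse/test-util", "./ci/docker/test/util"),
]

order = build_order(configs)
print(order)
```

With the two active entries in this diff (`clickhouse/fasttest` and `clickhouse/style-test`, both with `depends_on=[]`), any order would be valid; the ordering only matters once the commented-out images are re-enabled.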


@ -1,4 +1,4 @@
from ci_v2.settings.definitions import (
from ci.settings.definitions import (
S3_BUCKET_HTTP_ENDPOINT,
S3_BUCKET_NAME,
RunnerLabels,


@ -0,0 +1,94 @@
from typing import List
from praktika import Artifact, Job, Workflow
from praktika.settings import Settings
from ci.settings.definitions import (
BASE_BRANCH,
DOCKERS,
SECRETS,
JobNames,
RunnerLabels,
)
class ArtifactNames:
ch_debug_binary = "clickhouse_debug_binary"
style_check_job = Job.Config(
name=JobNames.STYLE_CHECK,
runs_on=[RunnerLabels.CI_SERVICES],
command="python3 ./ci/jobs/check_style.py",
run_in_docker="clickhouse/style-test",
)
fast_test_job = Job.Config(
name=JobNames.FAST_TEST,
runs_on=[RunnerLabels.BUILDER],
command="python3 ./ci/jobs/fast_test.py",
run_in_docker="clickhouse/fasttest",
digest_config=Job.CacheDigestConfig(
include_paths=[
"./ci/jobs/fast_test.py",
"./tests/queries/0_stateless/",
"./src",
],
),
)
job_build_amd_debug = Job.Config(
name=JobNames.BUILD_AMD_DEBUG,
runs_on=[RunnerLabels.BUILDER],
command="python3 ./ci/jobs/build_clickhouse.py amd_debug",
run_in_docker="clickhouse/fasttest",
digest_config=Job.CacheDigestConfig(
include_paths=[
"./src",
"./contrib/",
"./CMakeLists.txt",
"./PreLoad.cmake",
"./cmake",
"./base",
"./programs",
"./docker/packager/packager",
"./rust",
"./tests/ci/version_helper.py",
],
),
provides=[ArtifactNames.ch_debug_binary],
)
workflow = Workflow.Config(
name="PR",
event=Workflow.Event.PULL_REQUEST,
base_branches=[BASE_BRANCH],
jobs=[
style_check_job,
fast_test_job,
job_build_amd_debug,
],
artifacts=[
Artifact.Config(
name=ArtifactNames.ch_debug_binary,
type=Artifact.Type.S3,
path=f"{Settings.TEMP_DIR}/build/programs/clickhouse",
)
],
dockers=DOCKERS,
secrets=SECRETS,
enable_cache=True,
enable_report=True,
enable_merge_ready_status=True,
)
WORKFLOWS = [
workflow,
] # type: List[Workflow.Config]
if __name__ == "__main__":
# local job test inside praktika environment
from praktika.runner import Runner
Runner().run(workflow, fast_test_job, docker="fasttest", dummy_env=True)
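
The `digest_config` blocks above list `include_paths` whose contents decide whether a cached job result can be reused. How praktika computes that digest is not shown in this diff; the sketch below is only one plausible approach, hashing every file under the listed paths. The helper name `paths_digest` and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
import os
import tempfile
from typing import Iterable


def paths_digest(include_paths: Iterable[str]) -> str:
    """Hash the names and contents of all files under the given paths.

    Hypothetical helper: it does not reproduce praktika's real algorithm.
    """
    h = hashlib.sha256()
    for root in sorted(include_paths):
        if os.path.isfile(root):
            files = [root]
        else:
            # Sort so the digest does not depend on directory listing order.
            files = sorted(
                os.path.join(dirpath, name)
                for dirpath, _, names in os.walk(root)
                for name in names
            )
        for path in files:
            h.update(path.encode())
            with open(path, "rb") as f:
                h.update(f.read())
    return h.hexdigest()


# Demonstration on a throwaway directory instead of the real ./src tree.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src")
    os.makedirs(src)
    target = os.path.join(src, "main.cpp")
    with open(target, "w") as f:
        f.write("int main() { return 0; }\n")
    before = paths_digest([src])
    unchanged = paths_digest([src])
    with open(target, "w") as f:
        f.write("int main() { return 1; }\n")
    after = paths_digest([src])

print(before == unchanged, before != after)
```

Under this scheme, a job such as `job_build_amd_debug` would be rerun only when something under its `include_paths` changes, which is presumably the intent of listing `./src`, `./contrib/`, and the CMake files there.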


@ -1,4 +0,0 @@
requests==2.32.3
yamllint==1.26.3
codespell==2.2.1
https://clickhouse-builds.s3.amazonaws.com/packages/praktika-0.1-py3-none-any.whl


@ -1,44 +0,0 @@
from typing import List
from ci_v2.settings.definitions import (
BASE_BRANCH,
DOCKERS,
SECRETS,
JobNames,
RunnerLabels,
)
from praktika import Job, Workflow
style_check_job = Job.Config(
name=JobNames.STYLE_CHECK,
runs_on=[RunnerLabels.CI_SERVICES],
command="python3 ./ci_v2/jobs/check_style.py",
run_in_docker="clickhouse/style-test",
)
workflow = Workflow.Config(
name="PR",
event=Workflow.Event.PULL_REQUEST,
base_branches=[BASE_BRANCH],
jobs=[
style_check_job,
],
dockers=DOCKERS,
secrets=SECRETS,
enable_cache=True,
enable_report=True,
enable_merge_ready_status=True,
)
WORKFLOWS = [
workflow,
] # type: List[Workflow.Config]
if __name__ == "__main__":
# example: local job test inside praktika environment
from praktika.runner import Runner
Runner.generate_dummy_environment(workflow, style_check_job)
Runner().run(workflow, style_check_job)


@ -2,11 +2,11 @@
# NOTE: VERSION_REVISION has nothing in common with DBMS_TCP_PROTOCOL_VERSION,
# only DBMS_TCP_PROTOCOL_VERSION should be incremented on protocol changes.
SET(VERSION_REVISION 54491)
SET(VERSION_REVISION 54492)
SET(VERSION_MAJOR 24)
SET(VERSION_MINOR 10)
SET(VERSION_MINOR 11)
SET(VERSION_PATCH 1)
SET(VERSION_GITHASH b12a367741812f9e5fe754d19ebae600e2a2614c)
SET(VERSION_DESCRIBE v24.10.1.1-testing)
SET(VERSION_STRING 24.10.1.1)
SET(VERSION_GITHASH c82cf25b3e5864bcc153cbe45adb8c6527e1ec6e)
SET(VERSION_DESCRIBE v24.11.1.1-testing)
SET(VERSION_STRING 24.11.1.1)
# end of autochange
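
The version variables in this hunk are expected to move together: the bump from 24.10 to 24.11 changes `VERSION_MINOR`, `VERSION_DESCRIBE` and `VERSION_STRING` at once. A small sketch of a consistency check over the new values is given below. The helper names `parse_version_vars` and `is_consistent` are hypothetical, and the rule that `VERSION_STRING` starts with `MAJOR.MINOR.PATCH.` is an assumption inferred only from the values shown here.

```python
import re
from typing import Dict

# New values copied from the hunk above.
CMAKE_TEXT = """
SET(VERSION_REVISION 54492)
SET(VERSION_MAJOR 24)
SET(VERSION_MINOR 11)
SET(VERSION_PATCH 1)
SET(VERSION_GITHASH c82cf25b3e5864bcc153cbe45adb8c6527e1ec6e)
SET(VERSION_DESCRIBE v24.11.1.1-testing)
SET(VERSION_STRING 24.11.1.1)
"""


def parse_version_vars(text: str) -> Dict[str, str]:
    """Collect SET(VERSION_* value) pairs from CMake text."""
    pattern = r"^SET\((VERSION_\w+)\s+(\S+)\)\s*$"
    return dict(re.findall(pattern, text, flags=re.MULTILINE))


def is_consistent(v: Dict[str, str]) -> bool:
    """Check that the string forms agree with the numeric components."""
    prefix = f"{v['VERSION_MAJOR']}.{v['VERSION_MINOR']}.{v['VERSION_PATCH']}."
    return (
        v["VERSION_STRING"].startswith(prefix)
        and v["VERSION_DESCRIBE"].startswith("v" + v["VERSION_STRING"])
    )


versions = parse_version_vars(CMAKE_TEXT)
print(versions["VERSION_STRING"], is_consistent(versions))
```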


@ -178,35 +178,13 @@ add_contrib (sqlite-cmake sqlite-amalgamation)
add_contrib (s2geometry-cmake s2geometry)
add_contrib (c-ares-cmake c-ares)
if (OS_LINUX AND ARCH_AMD64 AND ENABLE_SSE42)
option (ENABLE_QPL "Enable Intel® Query Processing Library (QPL)" ${ENABLE_LIBRARIES})
elseif(ENABLE_QPL)
message (${RECONFIGURE_MESSAGE_LEVEL} "QPL library is only supported on x86_64 with SSE 4.2 or higher")
endif()
if (ENABLE_QPL)
add_contrib (idxd-config-cmake idxd-config)
add_contrib (qpl-cmake qpl) # requires: idxd-config
else()
message(STATUS "Not using QPL")
endif ()
if (OS_LINUX AND ARCH_AMD64 AND NOT NO_SSE3_OR_HIGHER)
option (ENABLE_QATLIB "Enable Intel® QuickAssist Technology Library (QATlib)" ${ENABLE_LIBRARIES})
elseif(ENABLE_QATLIB)
message (${RECONFIGURE_MESSAGE_LEVEL} "QATLib is only supported on x86_64")
endif()
if (ENABLE_QATLIB)
option (ENABLE_QAT_USDM_DRIVER "A User Space DMA-able Memory (USDM) component which allocates/frees DMA-able memory" OFF)
option (ENABLE_QAT_OUT_OF_TREE_BUILD "Using out-of-tree driver, user needs to customize ICP_ROOT variable" OFF)
set(ICP_ROOT "" CACHE STRING "ICP_ROOT variable to define the path of out-of-tree driver package")
if (ENABLE_QAT_OUT_OF_TREE_BUILD)
if (ICP_ROOT STREQUAL "")
message(FATAL_ERROR "Please define the path of out-of-tree driver package with -DICP_ROOT=xxx or disable out-of-tree build with -DENABLE_QAT_OUT_OF_TREE_BUILD=OFF; \
If you want out-of-tree build but have no package available, please download and build ICP package from: https://www.intel.com/content/www/us/en/download/765501.html")
endif ()
else()
add_contrib (qatlib-cmake qatlib) # requires: isa-l
endif ()
add_contrib (QAT-ZSTD-Plugin-cmake QAT-ZSTD-Plugin)
else()
message(STATUS "Not using QATLib")


@ -1,35 +1,5 @@
# Intel® QuickAssist Technology ZSTD Plugin (QAT ZSTD Plugin) is a plugin to Zstandard*(ZSTD*) for accelerating compression by QAT.
# ENABLE_QAT_OUT_OF_TREE_BUILD = 1 means the kernel doesn't have native support; the user builds and installs the driver from an external package: https://www.intel.com/content/www/us/en/download/765501.html
# In that case, the user also needs to set the ICP_ROOT environment variable, which points to the root directory of the QAT driver source tree.
# ENABLE_QAT_OUT_OF_TREE_BUILD = 0 means the kernel has a built-in QAT driver; QAT-ZSTD-PLUGIN only depends on qatlib.
if (ENABLE_QAT_OUT_OF_TREE_BUILD)
message(STATUS "Intel QATZSTD out-of-tree build, ICP_ROOT:${ICP_ROOT}")
set(QATZSTD_SRC_DIR "${ClickHouse_SOURCE_DIR}/contrib/QAT-ZSTD-Plugin/src")
set(QATZSTD_SRC "${QATZSTD_SRC_DIR}/qatseqprod.c")
set(ZSTD_LIBRARY_DIR "${ClickHouse_SOURCE_DIR}/contrib/zstd/lib")
set(QAT_INCLUDE_DIR "${ICP_ROOT}/quickassist/include")
set(QAT_DC_INCLUDE_DIR "${ICP_ROOT}/quickassist/include/dc")
set(QAT_AL_INCLUDE_DIR "${ICP_ROOT}/quickassist/lookaside/access_layer/include")
set(QAT_USDM_INCLUDE_DIR "${ICP_ROOT}/quickassist/utilities/libusdm_drv")
set(USDM_LIBRARY "${ICP_ROOT}/build/libusdm_drv_s.so")
set(QAT_S_LIBRARY "${ICP_ROOT}/build/libqat_s.so")
if (ENABLE_QAT_USDM_DRIVER)
add_definitions(-DENABLE_USDM_DRV)
endif()
add_library(_qatzstd_plugin ${QATZSTD_SRC})
target_link_libraries (_qatzstd_plugin PUBLIC ${USDM_LIBRARY} ${QAT_S_LIBRARY})
target_include_directories(_qatzstd_plugin
SYSTEM PUBLIC "${QATZSTD_SRC_DIR}"
PRIVATE ${QAT_INCLUDE_DIR}
${QAT_DC_INCLUDE_DIR}
${QAT_AL_INCLUDE_DIR}
${QAT_USDM_INCLUDE_DIR}
${ZSTD_LIBRARY_DIR})
target_compile_definitions(_qatzstd_plugin PRIVATE -DDEBUGLEVEL=0)
add_library (ch_contrib::qatzstd_plugin ALIAS _qatzstd_plugin)
else () # In-tree build
message(STATUS "Intel QATZSTD in-tree build")
set(QATZSTD_SRC_DIR "${ClickHouse_SOURCE_DIR}/contrib/QAT-ZSTD-Plugin/src")
set(QATZSTD_SRC "${QATZSTD_SRC_DIR}/qatseqprod.c")
@ -81,5 +51,3 @@ else () # In-tree build
target_compile_definitions(_qatzstd_plugin PRIVATE -DDEBUGLEVEL=0 PUBLIC -DINTREE)
target_include_directories(_qatzstd_plugin SYSTEM PUBLIC $<BUILD_INTERFACE:${QATZSTD_SRC_DIR}> $<INSTALL_INTERFACE:include>)
add_library (ch_contrib::qatzstd_plugin ALIAS _qatzstd_plugin)
endif ()

2
contrib/SimSIMD vendored

@ -1 +1 @@
Subproject commit ff51434d90c66f916e94ff05b24530b127aa4cff
Subproject commit fa60f1b8e3582c50978f0ae86c2ebb6c9af957f3


@ -1,4 +1,8 @@
# See contrib/usearch-cmake/CMakeLists.txt for why this is only enabled on x86
if (ARCH_AMD64)
set(SIMSIMD_PROJECT_DIR "${ClickHouse_SOURCE_DIR}/contrib/SimSIMD")
add_library(_simsimd INTERFACE)
target_include_directories(_simsimd SYSTEM INTERFACE "${SIMSIMD_PROJECT_DIR}/include")
set(SIMSIMD_SRCS ${SIMSIMD_PROJECT_DIR}/c/lib.c)
add_library(_simsimd ${SIMSIMD_SRCS})
target_include_directories(_simsimd SYSTEM PUBLIC "${SIMSIMD_PROJECT_DIR}/include")
target_compile_definitions(_simsimd PUBLIC SIMSIMD_DYNAMIC_DISPATCH)
endif()

2
contrib/arrow vendored

@ -1 +1 @@
Subproject commit 5cfccd8ea65f33d4517e7409815d761c7650b45d
Subproject commit 6e2574f5013a005c050c9a7787d341aef09d0063


@ -213,13 +213,19 @@ target_include_directories(_orc SYSTEM PRIVATE
set(LIBRARY_DIR "${ClickHouse_SOURCE_DIR}/contrib/arrow/cpp/src/arrow")
# arrow/cpp/src/arrow/CMakeLists.txt (ARROW_SRCS + ARROW_COMPUTE + ARROW_IPC)
# find . \( -iname \*.cc -o -iname \*.cpp -o -iname \*.c \) | sort | awk '{print "\"${LIBRARY_DIR}" substr($1,2) "\"" }' | grep -v 'test.cc' | grep -v 'json' | grep -v 'flight' \|
# grep -v 'csv' | grep -v 'acero' | grep -v 'dataset' | grep -v 'testing' | grep -v 'gpu' | grep -v 'engine' | grep -v 'filesystem' | grep -v 'benchmark.cc'
set(ARROW_SRCS
"${LIBRARY_DIR}/adapters/orc/adapter.cc"
"${LIBRARY_DIR}/adapters/orc/options.cc"
"${LIBRARY_DIR}/adapters/orc/util.cc"
"${LIBRARY_DIR}/array/array_base.cc"
"${LIBRARY_DIR}/array/array_binary.cc"
"${LIBRARY_DIR}/array/array_decimal.cc"
"${LIBRARY_DIR}/array/array_dict.cc"
"${LIBRARY_DIR}/array/array_nested.cc"
"${LIBRARY_DIR}/array/array_primitive.cc"
"${LIBRARY_DIR}/array/array_run_end.cc"
"${LIBRARY_DIR}/array/builder_adaptive.cc"
"${LIBRARY_DIR}/array/builder_base.cc"
"${LIBRARY_DIR}/array/builder_binary.cc"
@ -227,124 +233,26 @@ set(ARROW_SRCS
"${LIBRARY_DIR}/array/builder_dict.cc"
"${LIBRARY_DIR}/array/builder_nested.cc"
"${LIBRARY_DIR}/array/builder_primitive.cc"
"${LIBRARY_DIR}/array/builder_union.cc"
"${LIBRARY_DIR}/array/builder_run_end.cc"
"${LIBRARY_DIR}/array/array_run_end.cc"
"${LIBRARY_DIR}/array/builder_union.cc"
"${LIBRARY_DIR}/array/concatenate.cc"
"${LIBRARY_DIR}/array/data.cc"
"${LIBRARY_DIR}/array/diff.cc"
"${LIBRARY_DIR}/array/util.cc"
"${LIBRARY_DIR}/array/validate.cc"
"${LIBRARY_DIR}/builder.cc"
"${LIBRARY_DIR}/buffer.cc"
"${LIBRARY_DIR}/chunked_array.cc"
"${LIBRARY_DIR}/chunk_resolver.cc"
"${LIBRARY_DIR}/compare.cc"
"${LIBRARY_DIR}/config.cc"
"${LIBRARY_DIR}/datum.cc"
"${LIBRARY_DIR}/device.cc"
"${LIBRARY_DIR}/extension_type.cc"
"${LIBRARY_DIR}/memory_pool.cc"
"${LIBRARY_DIR}/pretty_print.cc"
"${LIBRARY_DIR}/record_batch.cc"
"${LIBRARY_DIR}/result.cc"
"${LIBRARY_DIR}/scalar.cc"
"${LIBRARY_DIR}/sparse_tensor.cc"
"${LIBRARY_DIR}/status.cc"
"${LIBRARY_DIR}/table.cc"
"${LIBRARY_DIR}/table_builder.cc"
"${LIBRARY_DIR}/tensor.cc"
"${LIBRARY_DIR}/tensor/coo_converter.cc"
"${LIBRARY_DIR}/tensor/csf_converter.cc"
"${LIBRARY_DIR}/tensor/csx_converter.cc"
"${LIBRARY_DIR}/type.cc"
"${LIBRARY_DIR}/visitor.cc"
"${LIBRARY_DIR}/builder.cc"
"${LIBRARY_DIR}/c/bridge.cc"
"${LIBRARY_DIR}/io/buffered.cc"
"${LIBRARY_DIR}/io/caching.cc"
"${LIBRARY_DIR}/io/compressed.cc"
"${LIBRARY_DIR}/io/file.cc"
"${LIBRARY_DIR}/io/hdfs.cc"
"${LIBRARY_DIR}/io/hdfs_internal.cc"
"${LIBRARY_DIR}/io/interfaces.cc"
"${LIBRARY_DIR}/io/memory.cc"
"${LIBRARY_DIR}/io/slow.cc"
"${LIBRARY_DIR}/io/stdio.cc"
"${LIBRARY_DIR}/io/transform.cc"
"${LIBRARY_DIR}/util/async_util.cc"
"${LIBRARY_DIR}/util/basic_decimal.cc"
"${LIBRARY_DIR}/util/bit_block_counter.cc"
"${LIBRARY_DIR}/util/bit_run_reader.cc"
"${LIBRARY_DIR}/util/bit_util.cc"
"${LIBRARY_DIR}/util/bitmap.cc"
"${LIBRARY_DIR}/util/bitmap_builders.cc"
"${LIBRARY_DIR}/util/bitmap_ops.cc"
"${LIBRARY_DIR}/util/bpacking.cc"
"${LIBRARY_DIR}/util/cancel.cc"
"${LIBRARY_DIR}/util/compression.cc"
"${LIBRARY_DIR}/util/counting_semaphore.cc"
"${LIBRARY_DIR}/util/cpu_info.cc"
"${LIBRARY_DIR}/util/decimal.cc"
"${LIBRARY_DIR}/util/delimiting.cc"
"${LIBRARY_DIR}/util/formatting.cc"
"${LIBRARY_DIR}/util/future.cc"
"${LIBRARY_DIR}/util/int_util.cc"
"${LIBRARY_DIR}/util/io_util.cc"
"${LIBRARY_DIR}/util/logging.cc"
"${LIBRARY_DIR}/util/key_value_metadata.cc"
"${LIBRARY_DIR}/util/memory.cc"
"${LIBRARY_DIR}/util/mutex.cc"
"${LIBRARY_DIR}/util/string.cc"
"${LIBRARY_DIR}/util/string_builder.cc"
"${LIBRARY_DIR}/util/task_group.cc"
"${LIBRARY_DIR}/util/tdigest.cc"
"${LIBRARY_DIR}/util/thread_pool.cc"
"${LIBRARY_DIR}/util/time.cc"
"${LIBRARY_DIR}/util/trie.cc"
"${LIBRARY_DIR}/util/unreachable.cc"
"${LIBRARY_DIR}/util/uri.cc"
"${LIBRARY_DIR}/util/utf8.cc"
"${LIBRARY_DIR}/util/value_parsing.cc"
"${LIBRARY_DIR}/util/byte_size.cc"
"${LIBRARY_DIR}/util/debug.cc"
"${LIBRARY_DIR}/util/tracing.cc"
"${LIBRARY_DIR}/util/atfork_internal.cc"
"${LIBRARY_DIR}/util/crc32.cc"
"${LIBRARY_DIR}/util/hashing.cc"
"${LIBRARY_DIR}/util/ree_util.cc"
"${LIBRARY_DIR}/util/union_util.cc"
"${LIBRARY_DIR}/vendored/base64.cpp"
"${LIBRARY_DIR}/vendored/datetime/tz.cpp"
"${LIBRARY_DIR}/vendored/musl/strptime.c"
"${LIBRARY_DIR}/vendored/uriparser/UriCommon.c"
"${LIBRARY_DIR}/vendored/uriparser/UriCompare.c"
"${LIBRARY_DIR}/vendored/uriparser/UriEscape.c"
"${LIBRARY_DIR}/vendored/uriparser/UriFile.c"
"${LIBRARY_DIR}/vendored/uriparser/UriIp4Base.c"
"${LIBRARY_DIR}/vendored/uriparser/UriIp4.c"
"${LIBRARY_DIR}/vendored/uriparser/UriMemory.c"
"${LIBRARY_DIR}/vendored/uriparser/UriNormalizeBase.c"
"${LIBRARY_DIR}/vendored/uriparser/UriNormalize.c"
"${LIBRARY_DIR}/vendored/uriparser/UriParseBase.c"
"${LIBRARY_DIR}/vendored/uriparser/UriParse.c"
"${LIBRARY_DIR}/vendored/uriparser/UriQuery.c"
"${LIBRARY_DIR}/vendored/uriparser/UriRecompose.c"
"${LIBRARY_DIR}/vendored/uriparser/UriResolve.c"
"${LIBRARY_DIR}/vendored/uriparser/UriShorten.c"
"${LIBRARY_DIR}/vendored/double-conversion/bignum.cc"
"${LIBRARY_DIR}/vendored/double-conversion/bignum-dtoa.cc"
"${LIBRARY_DIR}/vendored/double-conversion/cached-powers.cc"
"${LIBRARY_DIR}/vendored/double-conversion/double-to-string.cc"
"${LIBRARY_DIR}/vendored/double-conversion/fast-dtoa.cc"
"${LIBRARY_DIR}/vendored/double-conversion/fixed-dtoa.cc"
"${LIBRARY_DIR}/vendored/double-conversion/string-to-double.cc"
"${LIBRARY_DIR}/vendored/double-conversion/strtod.cc"
"${LIBRARY_DIR}/c/dlpack.cc"
"${LIBRARY_DIR}/chunk_resolver.cc"
"${LIBRARY_DIR}/chunked_array.cc"
"${LIBRARY_DIR}/compare.cc"
"${LIBRARY_DIR}/compute/api_aggregate.cc"
"${LIBRARY_DIR}/compute/api_scalar.cc"
"${LIBRARY_DIR}/compute/api_vector.cc"
"${LIBRARY_DIR}/compute/cast.cc"
"${LIBRARY_DIR}/compute/exec.cc"
"${LIBRARY_DIR}/compute/expression.cc"
"${LIBRARY_DIR}/compute/function.cc"
"${LIBRARY_DIR}/compute/function_internal.cc"
"${LIBRARY_DIR}/compute/kernel.cc"
@ -355,6 +263,7 @@ set(ARROW_SRCS
"${LIBRARY_DIR}/compute/kernels/aggregate_var_std.cc"
"${LIBRARY_DIR}/compute/kernels/codegen_internal.cc"
"${LIBRARY_DIR}/compute/kernels/hash_aggregate.cc"
"${LIBRARY_DIR}/compute/kernels/ree_util_internal.cc"
"${LIBRARY_DIR}/compute/kernels/row_encoder.cc"
"${LIBRARY_DIR}/compute/kernels/scalar_arithmetic.cc"
"${LIBRARY_DIR}/compute/kernels/scalar_boolean.cc"
@ -382,30 +291,139 @@ set(ARROW_SRCS
"${LIBRARY_DIR}/compute/kernels/vector_cumulative_ops.cc"
"${LIBRARY_DIR}/compute/kernels/vector_hash.cc"
"${LIBRARY_DIR}/compute/kernels/vector_nested.cc"
"${LIBRARY_DIR}/compute/kernels/vector_pairwise.cc"
"${LIBRARY_DIR}/compute/kernels/vector_rank.cc"
"${LIBRARY_DIR}/compute/kernels/vector_replace.cc"
"${LIBRARY_DIR}/compute/kernels/vector_run_end_encode.cc"
"${LIBRARY_DIR}/compute/kernels/vector_select_k.cc"
"${LIBRARY_DIR}/compute/kernels/vector_selection.cc"
"${LIBRARY_DIR}/compute/kernels/vector_sort.cc"
"${LIBRARY_DIR}/compute/kernels/vector_selection_internal.cc"
"${LIBRARY_DIR}/compute/kernels/vector_selection_filter_internal.cc"
"${LIBRARY_DIR}/compute/kernels/vector_selection_internal.cc"
"${LIBRARY_DIR}/compute/kernels/vector_selection_take_internal.cc"
"${LIBRARY_DIR}/compute/light_array.cc"
"${LIBRARY_DIR}/compute/registry.cc"
"${LIBRARY_DIR}/compute/expression.cc"
"${LIBRARY_DIR}/compute/kernels/vector_sort.cc"
"${LIBRARY_DIR}/compute/key_hash_internal.cc"
"${LIBRARY_DIR}/compute/key_map_internal.cc"
"${LIBRARY_DIR}/compute/light_array_internal.cc"
"${LIBRARY_DIR}/compute/ordering.cc"
"${LIBRARY_DIR}/compute/registry.cc"
"${LIBRARY_DIR}/compute/row/compare_internal.cc"
"${LIBRARY_DIR}/compute/row/encode_internal.cc"
"${LIBRARY_DIR}/compute/row/grouper.cc"
"${LIBRARY_DIR}/compute/row/row_internal.cc"
"${LIBRARY_DIR}/compute/util.cc"
"${LIBRARY_DIR}/config.cc"
"${LIBRARY_DIR}/datum.cc"
"${LIBRARY_DIR}/device.cc"
"${LIBRARY_DIR}/extension_type.cc"
"${LIBRARY_DIR}/integration/c_data_integration_internal.cc"
"${LIBRARY_DIR}/io/buffered.cc"
"${LIBRARY_DIR}/io/caching.cc"
"${LIBRARY_DIR}/io/compressed.cc"
"${LIBRARY_DIR}/io/file.cc"
"${LIBRARY_DIR}/io/hdfs.cc"
"${LIBRARY_DIR}/io/hdfs_internal.cc"
"${LIBRARY_DIR}/io/interfaces.cc"
"${LIBRARY_DIR}/io/memory.cc"
"${LIBRARY_DIR}/io/slow.cc"
"${LIBRARY_DIR}/io/stdio.cc"
"${LIBRARY_DIR}/io/transform.cc"
"${LIBRARY_DIR}/ipc/dictionary.cc"
"${LIBRARY_DIR}/ipc/feather.cc"
"${LIBRARY_DIR}/ipc/file_to_stream.cc"
"${LIBRARY_DIR}/ipc/message.cc"
"${LIBRARY_DIR}/ipc/metadata_internal.cc"
"${LIBRARY_DIR}/ipc/options.cc"
"${LIBRARY_DIR}/ipc/reader.cc"
"${LIBRARY_DIR}/ipc/stream_to_file.cc"
"${LIBRARY_DIR}/ipc/writer.cc"
"${LIBRARY_DIR}/memory_pool.cc"
"${LIBRARY_DIR}/pretty_print.cc"
"${LIBRARY_DIR}/record_batch.cc"
"${LIBRARY_DIR}/result.cc"
"${LIBRARY_DIR}/scalar.cc"
"${LIBRARY_DIR}/sparse_tensor.cc"
"${LIBRARY_DIR}/status.cc"
"${LIBRARY_DIR}/table.cc"
"${LIBRARY_DIR}/table_builder.cc"
"${LIBRARY_DIR}/tensor.cc"
"${LIBRARY_DIR}/tensor/coo_converter.cc"
"${LIBRARY_DIR}/tensor/csf_converter.cc"
"${LIBRARY_DIR}/tensor/csx_converter.cc"
"${LIBRARY_DIR}/type.cc"
"${LIBRARY_DIR}/type_traits.cc"
"${LIBRARY_DIR}/util/align_util.cc"
"${LIBRARY_DIR}/util/async_util.cc"
"${LIBRARY_DIR}/util/atfork_internal.cc"
"${LIBRARY_DIR}/util/basic_decimal.cc"
"${LIBRARY_DIR}/util/bit_block_counter.cc"
"${LIBRARY_DIR}/util/bit_run_reader.cc"
"${LIBRARY_DIR}/util/bit_util.cc"
"${LIBRARY_DIR}/util/bitmap.cc"
"${LIBRARY_DIR}/util/bitmap_builders.cc"
"${LIBRARY_DIR}/util/bitmap_ops.cc"
"${LIBRARY_DIR}/util/bpacking.cc"
"${LIBRARY_DIR}/util/byte_size.cc"
"${LIBRARY_DIR}/util/cancel.cc"
"${LIBRARY_DIR}/util/compression.cc"
"${LIBRARY_DIR}/util/counting_semaphore.cc"
"${LIBRARY_DIR}/util/cpu_info.cc"
"${LIBRARY_DIR}/util/crc32.cc"
"${LIBRARY_DIR}/util/debug.cc"
"${LIBRARY_DIR}/util/decimal.cc"
"${LIBRARY_DIR}/util/delimiting.cc"
"${LIBRARY_DIR}/util/dict_util.cc"
"${LIBRARY_DIR}/util/float16.cc"
"${LIBRARY_DIR}/util/formatting.cc"
"${LIBRARY_DIR}/util/future.cc"
"${LIBRARY_DIR}/util/hashing.cc"
"${LIBRARY_DIR}/util/int_util.cc"
"${LIBRARY_DIR}/util/io_util.cc"
"${LIBRARY_DIR}/util/key_value_metadata.cc"
"${LIBRARY_DIR}/util/list_util.cc"
"${LIBRARY_DIR}/util/logging.cc"
"${LIBRARY_DIR}/util/memory.cc"
"${LIBRARY_DIR}/util/mutex.cc"
"${LIBRARY_DIR}/util/ree_util.cc"
"${LIBRARY_DIR}/util/string.cc"
"${LIBRARY_DIR}/util/string_builder.cc"
"${LIBRARY_DIR}/util/task_group.cc"
"${LIBRARY_DIR}/util/tdigest.cc"
"${LIBRARY_DIR}/util/thread_pool.cc"
"${LIBRARY_DIR}/util/time.cc"
"${LIBRARY_DIR}/util/tracing.cc"
"${LIBRARY_DIR}/util/trie.cc"
"${LIBRARY_DIR}/util/union_util.cc"
"${LIBRARY_DIR}/util/unreachable.cc"
"${LIBRARY_DIR}/util/uri.cc"
"${LIBRARY_DIR}/util/utf8.cc"
"${LIBRARY_DIR}/util/value_parsing.cc"
"${LIBRARY_DIR}/vendored/base64.cpp"
"${LIBRARY_DIR}/vendored/datetime/tz.cpp"
"${LIBRARY_DIR}/vendored/double-conversion/bignum-dtoa.cc"
"${LIBRARY_DIR}/vendored/double-conversion/bignum.cc"
"${LIBRARY_DIR}/vendored/double-conversion/cached-powers.cc"
"${LIBRARY_DIR}/vendored/double-conversion/double-to-string.cc"
"${LIBRARY_DIR}/vendored/double-conversion/fast-dtoa.cc"
"${LIBRARY_DIR}/vendored/double-conversion/fixed-dtoa.cc"
"${LIBRARY_DIR}/vendored/double-conversion/string-to-double.cc"
"${LIBRARY_DIR}/vendored/double-conversion/strtod.cc"
"${LIBRARY_DIR}/vendored/musl/strptime.c"
"${LIBRARY_DIR}/vendored/uriparser/UriCommon.c"
"${LIBRARY_DIR}/vendored/uriparser/UriCompare.c"
"${LIBRARY_DIR}/vendored/uriparser/UriEscape.c"
"${LIBRARY_DIR}/vendored/uriparser/UriFile.c"
"${LIBRARY_DIR}/vendored/uriparser/UriIp4.c"
"${LIBRARY_DIR}/vendored/uriparser/UriIp4Base.c"
"${LIBRARY_DIR}/vendored/uriparser/UriMemory.c"
"${LIBRARY_DIR}/vendored/uriparser/UriNormalize.c"
"${LIBRARY_DIR}/vendored/uriparser/UriNormalizeBase.c"
"${LIBRARY_DIR}/vendored/uriparser/UriParse.c"
"${LIBRARY_DIR}/vendored/uriparser/UriParseBase.c"
"${LIBRARY_DIR}/vendored/uriparser/UriQuery.c"
"${LIBRARY_DIR}/vendored/uriparser/UriRecompose.c"
"${LIBRARY_DIR}/vendored/uriparser/UriResolve.c"
"${LIBRARY_DIR}/vendored/uriparser/UriShorten.c"
"${LIBRARY_DIR}/visitor.cc"
"${ARROW_SRC_DIR}/arrow/adapters/orc/adapter.cc"
"${ARROW_SRC_DIR}/arrow/adapters/orc/util.cc"
@ -465,22 +483,38 @@ set(PARQUET_SRCS
"${LIBRARY_DIR}/arrow/schema.cc"
"${LIBRARY_DIR}/arrow/schema_internal.cc"
"${LIBRARY_DIR}/arrow/writer.cc"
"${LIBRARY_DIR}/benchmark_util.cc"
"${LIBRARY_DIR}/bloom_filter.cc"
"${LIBRARY_DIR}/bloom_filter_reader.cc"
"${LIBRARY_DIR}/column_reader.cc"
"${LIBRARY_DIR}/column_scanner.cc"
"${LIBRARY_DIR}/column_writer.cc"
"${LIBRARY_DIR}/encoding.cc"
"${LIBRARY_DIR}/encryption/crypto_factory.cc"
"${LIBRARY_DIR}/encryption/encryption.cc"
"${LIBRARY_DIR}/encryption/encryption_internal.cc"
"${LIBRARY_DIR}/encryption/encryption_internal_nossl.cc"
"${LIBRARY_DIR}/encryption/file_key_unwrapper.cc"
"${LIBRARY_DIR}/encryption/file_key_wrapper.cc"
"${LIBRARY_DIR}/encryption/file_system_key_material_store.cc"
"${LIBRARY_DIR}/encryption/internal_file_decryptor.cc"
"${LIBRARY_DIR}/encryption/internal_file_encryptor.cc"
"${LIBRARY_DIR}/encryption/key_material.cc"
"${LIBRARY_DIR}/encryption/key_metadata.cc"
"${LIBRARY_DIR}/encryption/key_toolkit.cc"
"${LIBRARY_DIR}/encryption/key_toolkit_internal.cc"
"${LIBRARY_DIR}/encryption/kms_client.cc"
"${LIBRARY_DIR}/encryption/local_wrap_kms_client.cc"
"${LIBRARY_DIR}/encryption/openssl_internal.cc"
"${LIBRARY_DIR}/exception.cc"
"${LIBRARY_DIR}/file_reader.cc"
"${LIBRARY_DIR}/file_writer.cc"
"${LIBRARY_DIR}/page_index.cc"
"${LIBRARY_DIR}/level_conversion.cc"
"${LIBRARY_DIR}/level_comparison.cc"
"${LIBRARY_DIR}/level_comparison_avx2.cc"
"${LIBRARY_DIR}/level_conversion.cc"
"${LIBRARY_DIR}/level_conversion_bmi2.cc"
"${LIBRARY_DIR}/metadata.cc"
"${LIBRARY_DIR}/page_index.cc"
"${LIBRARY_DIR}/platform.cc"
"${LIBRARY_DIR}/printer.cc"
"${LIBRARY_DIR}/properties.cc"
@ -489,7 +523,6 @@ set(PARQUET_SRCS
"${LIBRARY_DIR}/stream_reader.cc"
"${LIBRARY_DIR}/stream_writer.cc"
"${LIBRARY_DIR}/types.cc"
"${LIBRARY_DIR}/bloom_filter_reader.cc"
"${LIBRARY_DIR}/xxhasher.cc"
"${GEN_LIBRARY_DIR}/parquet_constants.cpp"
@ -520,6 +553,9 @@ endif ()
add_definitions(-DPARQUET_THRIFT_VERSION_MAJOR=0)
add_definitions(-DPARQUET_THRIFT_VERSION_MINOR=16)
# As per https://github.com/apache/arrow/pull/35672 you need to enable it explicitly.
add_definitions(-DARROW_ENABLE_THREADING)
# === tools
set(TOOLS_DIR "${ClickHouse_SOURCE_DIR}/contrib/arrow/cpp/tools/parquet")
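
The comment added above `set(ARROW_SRCS` in this file gives a `find | sort | awk | grep -v` pipeline for regenerating the source list after an Arrow upgrade. A rough Python equivalent is sketched below, operating on a list of paths relative to `cpp/src/arrow`. The helper name `cmake_source_lines` is hypothetical, and the sketch simplifies two things: the `grep -v` patterns are treated as plain substrings, and Python's `sorted` may order entries differently from a locale-aware `sort`.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Substrings excluded by the grep -v stages of the pipeline in the comment.
EXCLUDED_SUBSTRINGS = (
    "test.cc", "json", "flight", "csv", "acero", "dataset",
    "testing", "gpu", "engine", "filesystem", "benchmark.cc",
)
# Extensions matched by the find stage (-iname is case-insensitive).
SOURCE_SUFFIXES = {".cc", ".cpp", ".c"}


def cmake_source_lines(relative_paths: Iterable[str]) -> List[str]:
    """Turn relative source paths into quoted "${LIBRARY_DIR}/..." entries."""
    lines = []
    for rel in sorted(relative_paths):
        if PurePosixPath(rel).suffix.lower() not in SOURCE_SUFFIXES:
            continue  # not a C/C++ source file
        if any(pattern in rel for pattern in EXCLUDED_SUBSTRINGS):
            continue  # filtered out, like grep -v
        lines.append(f'"${{LIBRARY_DIR}}/{rel}"')
    return lines


sample = [
    "util/utf8.cc",
    "array/data.cc",
    "array/array_test.cc",   # dropped: matches "test.cc"
    "json/parser.cc",        # dropped: matches "json"
    "README.md",             # dropped: not a source file
]
entries = cmake_source_lines(sample)
print("\n".join(entries))
```

The output is meant to be pasted between `set(ARROW_SRCS` and its closing parenthesis, which would also explain why this diff reorders the list alphabetically.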

2
contrib/flatbuffers vendored

@ -1 +1 @@
Subproject commit eb3f827948241ce0e701516f16cd67324802bce9
Subproject commit 0100f6a5779831fa7a651e4b67ef389a8752bd9b

1
contrib/idxd-config vendored

@ -1 +0,0 @@
Subproject commit a836ce0e42052a69bffbbc14239ab4097f3b77f1


@ -1,23 +0,0 @@
## accel_config is the utility library required by QPL-Deflate codec for controlling and configuring Intel® In-Memory Analytics Accelerator (Intel® IAA).
set (LIBACCEL_SOURCE_DIR "${ClickHouse_SOURCE_DIR}/contrib/idxd-config")
set (UUID_DIR "${ClickHouse_SOURCE_DIR}/contrib/qpl-cmake")
set (LIBACCEL_HEADER_DIR "${ClickHouse_SOURCE_DIR}/contrib/idxd-config-cmake/include")
set (SRCS
"${LIBACCEL_SOURCE_DIR}/accfg/lib/libaccfg.c"
"${LIBACCEL_SOURCE_DIR}/util/log.c"
"${LIBACCEL_SOURCE_DIR}/util/sysfs.c"
)
add_library(_accel-config ${SRCS})
target_compile_options(_accel-config PRIVATE "-D_GNU_SOURCE")
target_include_directories(_accel-config BEFORE
PRIVATE ${UUID_DIR}
PRIVATE ${LIBACCEL_HEADER_DIR}
PRIVATE ${LIBACCEL_SOURCE_DIR})
target_include_directories(_accel-config SYSTEM BEFORE
PUBLIC ${LIBACCEL_SOURCE_DIR}/accfg)
add_library(ch_contrib::accel-config ALIAS _accel-config)


@ -1,159 +0,0 @@
/* config.h. Generated from config.h.in by configure. */
/* config.h.in. Generated from configure.ac by autoheader. */
/* Define if building universal (internal helper macro) */
/* #undef AC_APPLE_UNIVERSAL_BUILD */
/* Debug messages. */
/* #undef ENABLE_DEBUG */
/* Documentation / man pages. */
/* #define ENABLE_DOCS */
/* System logging. */
#define ENABLE_LOGGING 1
/* accfg test support */
/* #undef ENABLE_TEST */
/* Define to 1 if big-endian-arch */
/* #undef HAVE_BIG_ENDIAN */
/* Define to 1 if you have the <dlfcn.h> header file. */
#define HAVE_DLFCN_H 1
/* Define to 1 if you have the <inttypes.h> header file. */
#define HAVE_INTTYPES_H 1
/* Define to 1 if you have the <linux/version.h> header file. */
#define HAVE_LINUX_VERSION_H 1
/* Define to 1 if little-endian-arch */
#define HAVE_LITTLE_ENDIAN 1
/* Define to 1 if you have the <memory.h> header file. */
#define HAVE_MEMORY_H 1
/* Define to 1 if you have the `secure_getenv' function. */
#define HAVE_SECURE_GETENV 1
/* Define to 1 if you have statement expressions. */
#define HAVE_STATEMENT_EXPR 1
/* Define to 1 if you have the <stdint.h> header file. */
#define HAVE_STDINT_H 1
/* Define to 1 if you have the <stdlib.h> header file. */
#define HAVE_STDLIB_H 1
/* Define to 1 if you have the <strings.h> header file. */
#define HAVE_STRINGS_H 1
/* Define to 1 if you have the <string.h> header file. */
#define HAVE_STRING_H 1
/* Define to 1 if you have the <sys/stat.h> header file. */
#define HAVE_SYS_STAT_H 1
/* Define to 1 if you have the <sys/types.h> header file. */
#define HAVE_SYS_TYPES_H 1
/* Define to 1 if typeof works with your compiler. */
#define HAVE_TYPEOF 1
/* Define to 1 if you have the <unistd.h> header file. */
#define HAVE_UNISTD_H 1
/* Define to 1 if using libuuid */
#define HAVE_UUID 1
/* Define to 1 if you have the `__secure_getenv' function. */
/* #undef HAVE___SECURE_GETENV */
/* Define to the sub-directory where libtool stores uninstalled libraries. */
#define LT_OBJDIR ".libs/"
/* Name of package */
#define PACKAGE "accel-config"
/* Define to the address where bug reports for this package should be sent. */
#define PACKAGE_BUGREPORT "linux-dsa@lists.01.org"
/* Define to the full name of this package. */
#define PACKAGE_NAME "accel-config"
/* Define to the full name and version of this package. */
#define PACKAGE_STRING "accel-config 3.5.2.gitf6605c41"
/* Define to the one symbol short name of this package. */
#define PACKAGE_TARNAME "accel-config"
/* Define to the home page for this package. */
#define PACKAGE_URL "https://github.com/xxx/accel-config"
/* Define to the version of this package. */
#define PACKAGE_VERSION "3.5.2.gitf6605c41"
/* Define to 1 if you have the ANSI C header files. */
#define STDC_HEADERS 1
/* Enable extensions on AIX 3, Interix. */
#ifndef _ALL_SOURCE
# define _ALL_SOURCE 1
#endif
/* Enable GNU extensions on systems that have them. */
#ifndef _GNU_SOURCE
# define _GNU_SOURCE 1
#endif
/* Enable threading extensions on Solaris. */
#ifndef _POSIX_PTHREAD_SEMANTICS
# define _POSIX_PTHREAD_SEMANTICS 1
#endif
/* Enable extensions on HP NonStop. */
#ifndef _TANDEM_SOURCE
# define _TANDEM_SOURCE 1
#endif
/* Enable general extensions on Solaris. */
#ifndef __EXTENSIONS__
# define __EXTENSIONS__ 1
#endif
/* Version number of package */
#define VERSION "3.5.2.gitf6605c41"
/* Define WORDS_BIGENDIAN to 1 if your processor stores words with the most
significant byte first (like Motorola and SPARC, unlike Intel). */
#if defined AC_APPLE_UNIVERSAL_BUILD
# if defined __BIG_ENDIAN__
# define WORDS_BIGENDIAN 1
# endif
#else
# ifndef WORDS_BIGENDIAN
/* # undef WORDS_BIGENDIAN */
# endif
#endif
/* Enable large inode numbers on Mac OS X 10.5. */
#ifndef _DARWIN_USE_64_BIT_INODE
# define _DARWIN_USE_64_BIT_INODE 1
#endif
/* Number of bits in a file offset, on hosts where this is settable. */
/* #undef _FILE_OFFSET_BITS */
/* Define for large files, on AIX-style hosts. */
/* #undef _LARGE_FILES */
/* Define to 1 if on MINIX. */
/* #undef _MINIX */
/* Define to 2 if the system does not provide POSIX.1 features except with
this defined. */
/* #undef _POSIX_1_SOURCE */
/* Define to 1 if you need to in order for `stat' and other things to work. */
/* #undef _POSIX_SOURCE */
/* Define to __typeof__ if your compiler spells it that way. */
/* #undef typeof */

2
contrib/krb5 vendored

@ -1 +1 @@
Subproject commit 71b06c2276009ae649c7703019f3b4605f66fd3d
Subproject commit c5b4b994c18db86933255907a97eee5993fd18fe

2
contrib/numactl vendored

@ -1 +1 @@
Subproject commit 8d13d63a05f0c3cd88bf777cbb61541202b7da08
Subproject commit ff32c618d63ca7ac48cce366c5a04bb3563683a0

2
contrib/orc vendored

@ -1 +1 @@
Subproject commit bcc025c09828c556f54cfbdf83a66b9acae7d17f
Subproject commit 223e1957ff308f2aec7793884ffb0d7692d48484

1
contrib/qpl vendored

@ -1 +0,0 @@
Subproject commit c2ced94c53c1ee22191201a59878e9280bc9b9b8

View File

@ -1,738 +0,0 @@
## Intel® QPL provides high-performance implementations of data processing functions for an existing hardware accelerator, and a software path for when no hardware accelerator is available.
set (UUID_DIR "${ClickHouse_SOURCE_DIR}/contrib/qpl-cmake")
set (QPL_PROJECT_DIR "${ClickHouse_SOURCE_DIR}/contrib/qpl")
set (QPL_SRC_DIR "${ClickHouse_SOURCE_DIR}/contrib/qpl/sources")
set (QPL_BINARY_DIR "${ClickHouse_BINARY_DIR}/build/contrib/qpl")
set (EFFICIENT_WAIT OFF)
set (LOG_HW_INIT OFF)
set (SANITIZE_MEMORY OFF)
set (SANITIZE_THREADS OFF)
set (LIB_FUZZING_ENGINE OFF)
set (DYNAMIC_LOADING_LIBACCEL_CONFIG OFF)
function(GetLibraryVersion _content _outputVar)
string(REGEX MATCHALL "QPL VERSION (.+) LANGUAGES" VERSION_REGEX "${_content}")
SET(${_outputVar} ${CMAKE_MATCH_1} PARENT_SCOPE)
endfunction()
set (QPL_VERSION 1.6.0)
message(STATUS "Intel QPL version: ${QPL_VERSION}")
# There are 5 source subdirectories under $QPL_SRC_DIR: c_api, core-iaa, core-sw, middle-layer and isal.
# Generate 8 library targets: qpl_c_api, core_iaa, qplcore_px, qplcore_avx512, qplcore_sw_dispatcher, middle_layer_lib, isal and isal_asm,
# which are then combined into static or shared qpl.
# Output ch_contrib::qpl by linking with 8 library targets.
# Note, QPL has integrated a customized version of ISA-L to meet specific needs.
# This version has been significantly modified and there are no plans to maintain compatibility with the upstream version
# or upgrade the current copy.
## cmake/CompileOptions.cmake and automatic wrappers generation
# ==========================================================================
# Copyright (C) 2022 Intel Corporation
#
# SPDX-License-Identifier: MIT
# ==========================================================================
set(QPL_LINUX_TOOLCHAIN_CPP_EMBEDDED_FLAGS "-fno-exceptions;-fno-rtti")
function(modify_standard_language_flag)
# Declaring function parameters
set(OPTIONS "")
set(ONE_VALUE_ARGS
LANGUAGE_NAME
FLAG_NAME
NEW_FLAG_VALUE)
set(MULTI_VALUE_ARGS "")
# Parsing function parameters
cmake_parse_arguments(MODIFY
"${OPTIONS}"
"${ONE_VALUE_ARGS}"
"${MULTI_VALUE_ARGS}"
${ARGN})
# Variables
set(FLAG_REGULAR_EXPRESSION "${MODIFY_FLAG_NAME}.*[ ]*")
set(NEW_VALUE "${MODIFY_FLAG_NAME}${MODIFY_NEW_FLAG_VALUE}")
# Replacing specified flag with new value
string(REGEX REPLACE
${FLAG_REGULAR_EXPRESSION} ${NEW_VALUE}
NEW_COMPILE_FLAGS
"${CMAKE_${MODIFY_LANGUAGE_NAME}_FLAGS}")
# Returning the value
set(CMAKE_${MODIFY_LANGUAGE_NAME}_FLAGS ${NEW_COMPILE_FLAGS} PARENT_SCOPE)
endfunction()
function(get_function_name_with_default_bit_width in_function_name bit_width out_function_name)
if(in_function_name MATCHES ".*_i")
string(REPLACE "_i" "" in_function_name ${in_function_name})
set(${out_function_name} "${in_function_name}_${bit_width}_i" PARENT_SCOPE)
else()
set(${out_function_name} "${in_function_name}_${bit_width}" PARENT_SCOPE)
endif()
endfunction()
macro(get_list_of_supported_optimizations PLATFORMS_LIST)
list(APPEND PLATFORMS_LIST "")
list(APPEND PLATFORMS_LIST "px")
list(APPEND PLATFORMS_LIST "avx512")
endmacro(get_list_of_supported_optimizations)
function(generate_unpack_kernel_arrays current_directory PLATFORMS_LIST)
list(APPEND UNPACK_POSTFIX_LIST "")
list(APPEND UNPACK_PRLE_POSTFIX_LIST "")
list(APPEND PACK_POSTFIX_LIST "")
list(APPEND PACK_INDEX_POSTFIX_LIST "")
list(APPEND SCAN_POSTFIX_LIST "")
list(APPEND DEFAULT_BIT_WIDTH_FUNCTIONS_LIST "")
list(APPEND DEFAULT_BIT_WIDTH_LIST "")
#create list of functions that use only 8u 16u 32u postfixes
list(APPEND DEFAULT_BIT_WIDTH_FUNCTIONS_LIST "unpack_prle")
list(APPEND DEFAULT_BIT_WIDTH_FUNCTIONS_LIST "extract")
list(APPEND DEFAULT_BIT_WIDTH_FUNCTIONS_LIST "extract_i")
list(APPEND DEFAULT_BIT_WIDTH_FUNCTIONS_LIST "select")
list(APPEND DEFAULT_BIT_WIDTH_FUNCTIONS_LIST "select_i")
list(APPEND DEFAULT_BIT_WIDTH_FUNCTIONS_LIST "expand")
#create default bit width list
list(APPEND DEFAULT_BIT_WIDTH_LIST "8u")
list(APPEND DEFAULT_BIT_WIDTH_LIST "16u")
list(APPEND DEFAULT_BIT_WIDTH_LIST "32u")
#create scan kernel postfixes
list(APPEND SCAN_COMPARATOR_LIST "")
list(APPEND SCAN_COMPARATOR_LIST "eq")
list(APPEND SCAN_COMPARATOR_LIST "ne")
list(APPEND SCAN_COMPARATOR_LIST "lt")
list(APPEND SCAN_COMPARATOR_LIST "le")
list(APPEND SCAN_COMPARATOR_LIST "gt")
list(APPEND SCAN_COMPARATOR_LIST "ge")
list(APPEND SCAN_COMPARATOR_LIST "range")
list(APPEND SCAN_COMPARATOR_LIST "not_range")
foreach(SCAN_COMPARATOR IN LISTS SCAN_COMPARATOR_LIST)
list(APPEND SCAN_POSTFIX_LIST "_${SCAN_COMPARATOR}_8u")
list(APPEND SCAN_POSTFIX_LIST "_${SCAN_COMPARATOR}_16u8u")
list(APPEND SCAN_POSTFIX_LIST "_${SCAN_COMPARATOR}_32u8u")
endforeach()
# create unpack kernel postfixes
foreach(input_width RANGE 1 32 1)
if(input_width LESS 8 OR input_width EQUAL 8)
list(APPEND UNPACK_POSTFIX_LIST "_${input_width}u8u")
elseif(input_width LESS 16 OR input_width EQUAL 16)
list(APPEND UNPACK_POSTFIX_LIST "_${input_width}u16u")
else()
list(APPEND UNPACK_POSTFIX_LIST "_${input_width}u32u")
endif()
endforeach()
# create pack kernel postfixes
foreach(output_width RANGE 1 8 1)
list(APPEND PACK_POSTFIX_LIST "_8u${output_width}u")
endforeach()
foreach(output_width RANGE 9 16 1)
list(APPEND PACK_POSTFIX_LIST "_16u${output_width}u")
endforeach()
foreach(output_width RANGE 17 32 1)
list(APPEND PACK_POSTFIX_LIST "_32u${output_width}u")
endforeach()
list(APPEND PACK_POSTFIX_LIST "_8u16u")
list(APPEND PACK_POSTFIX_LIST "_8u32u")
list(APPEND PACK_POSTFIX_LIST "_16u32u")
# create pack index kernel postfixes
list(APPEND PACK_INDEX_POSTFIX_LIST "_nu")
list(APPEND PACK_INDEX_POSTFIX_LIST "_8u")
list(APPEND PACK_INDEX_POSTFIX_LIST "_8u16u")
list(APPEND PACK_INDEX_POSTFIX_LIST "_8u32u")
# write to file
file(MAKE_DIRECTORY ${current_directory}/generated)
foreach(PLATFORM_VALUE IN LISTS PLATFORMS_LIST)
set(directory "${current_directory}/generated")
set(PLATFORM_PREFIX "${PLATFORM_VALUE}_")
#
# Write unpack table
#
file(WRITE ${directory}/${PLATFORM_PREFIX}unpack.cpp "#include \"qplc_api.h\"\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}unpack.cpp "#include \"dispatcher/dispatcher.hpp\"\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}unpack.cpp "namespace qpl::core_sw::dispatcher\n{\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}unpack.cpp "unpack_table_t ${PLATFORM_PREFIX}unpack_table = {\n")
#write LE kernels
foreach(UNPACK_POSTFIX IN LISTS UNPACK_POSTFIX_LIST)
file(APPEND ${directory}/${PLATFORM_PREFIX}unpack.cpp "\t${PLATFORM_PREFIX}qplc_unpack${UNPACK_POSTFIX},\n")
endforeach()
#write BE kernels
#get last element of the list
set(LAST_ELEMENT "")
list(GET UNPACK_POSTFIX_LIST -1 LAST_ELEMENT)
foreach(UNPACK_POSTFIX IN LISTS UNPACK_POSTFIX_LIST)
if(UNPACK_POSTFIX STREQUAL LAST_ELEMENT)
file(APPEND ${directory}/${PLATFORM_PREFIX}unpack.cpp "\t${PLATFORM_PREFIX}qplc_unpack_be${UNPACK_POSTFIX}};\n")
else()
file(APPEND ${directory}/${PLATFORM_PREFIX}unpack.cpp "\t${PLATFORM_PREFIX}qplc_unpack_be${UNPACK_POSTFIX},\n")
endif()
endforeach()
file(APPEND ${directory}/${PLATFORM_PREFIX}unpack.cpp "}\n")
#
# Write pack table
#
file(WRITE ${directory}/${PLATFORM_PREFIX}pack.cpp "#include \"qplc_api.h\"\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}pack.cpp "#include \"dispatcher/dispatcher.hpp\"\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}pack.cpp "namespace qpl::core_sw::dispatcher\n{\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}pack.cpp "pack_table_t ${PLATFORM_PREFIX}pack_table = {\n")
#write LE kernels
foreach(PACK_POSTFIX IN LISTS PACK_POSTFIX_LIST)
file(APPEND ${directory}/${PLATFORM_PREFIX}pack.cpp "\t${PLATFORM_PREFIX}qplc_pack${PACK_POSTFIX},\n")
endforeach()
#write BE kernels
#get last element of the list
set(LAST_ELEMENT "")
list(GET PACK_POSTFIX_LIST -1 LAST_ELEMENT)
foreach(PACK_POSTFIX IN LISTS PACK_POSTFIX_LIST)
if(PACK_POSTFIX STREQUAL LAST_ELEMENT)
file(APPEND ${directory}/${PLATFORM_PREFIX}pack.cpp "\t${PLATFORM_PREFIX}qplc_pack_be${PACK_POSTFIX}};\n")
else()
file(APPEND ${directory}/${PLATFORM_PREFIX}pack.cpp "\t${PLATFORM_PREFIX}qplc_pack_be${PACK_POSTFIX},\n")
endif()
endforeach()
file(APPEND ${directory}/${PLATFORM_PREFIX}pack.cpp "}\n")
#
# Write scan table
#
file(WRITE ${directory}/${PLATFORM_PREFIX}scan.cpp "#include \"qplc_api.h\"\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}scan.cpp "#include \"dispatcher/dispatcher.hpp\"\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}scan.cpp "namespace qpl::core_sw::dispatcher\n{\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}scan.cpp "scan_table_t ${PLATFORM_PREFIX}scan_table = {\n")
#get last element of the list
set(LAST_ELEMENT "")
list(GET SCAN_POSTFIX_LIST -1 LAST_ELEMENT)
foreach(SCAN_POSTFIX IN LISTS SCAN_POSTFIX_LIST)
if(SCAN_POSTFIX STREQUAL LAST_ELEMENT)
file(APPEND ${directory}/${PLATFORM_PREFIX}scan.cpp "\t${PLATFORM_PREFIX}qplc_scan${SCAN_POSTFIX}};\n")
else()
file(APPEND ${directory}/${PLATFORM_PREFIX}scan.cpp "\t${PLATFORM_PREFIX}qplc_scan${SCAN_POSTFIX},\n")
endif()
endforeach()
file(APPEND ${directory}/${PLATFORM_PREFIX}scan.cpp "}\n")
#
# Write scan_i table
#
file(WRITE ${directory}/${PLATFORM_PREFIX}scan_i.cpp "#include \"qplc_api.h\"\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}scan_i.cpp "#include \"dispatcher/dispatcher.hpp\"\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}scan_i.cpp "namespace qpl::core_sw::dispatcher\n{\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}scan_i.cpp "scan_i_table_t ${PLATFORM_PREFIX}scan_i_table = {\n")
#get last element of the list
set(LAST_ELEMENT "")
list(GET SCAN_POSTFIX_LIST -1 LAST_ELEMENT)
foreach(SCAN_POSTFIX IN LISTS SCAN_POSTFIX_LIST)
if(SCAN_POSTFIX STREQUAL LAST_ELEMENT)
file(APPEND ${directory}/${PLATFORM_PREFIX}scan_i.cpp "\t${PLATFORM_PREFIX}qplc_scan${SCAN_POSTFIX}_i};\n")
else()
file(APPEND ${directory}/${PLATFORM_PREFIX}scan_i.cpp "\t${PLATFORM_PREFIX}qplc_scan${SCAN_POSTFIX}_i,\n")
endif()
endforeach()
file(APPEND ${directory}/${PLATFORM_PREFIX}scan_i.cpp "}\n")
#
# Write pack_index table
#
file(WRITE ${directory}/${PLATFORM_PREFIX}pack_index.cpp "#include \"qplc_api.h\"\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}pack_index.cpp "#include \"dispatcher/dispatcher.hpp\"\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}pack_index.cpp "namespace qpl::core_sw::dispatcher\n{\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}pack_index.cpp "pack_index_table_t ${PLATFORM_PREFIX}pack_index_table = {\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}pack_index.cpp "\t${PLATFORM_PREFIX}qplc_pack_bits_nu,\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}pack_index.cpp "\t${PLATFORM_PREFIX}qplc_pack_index_8u,\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}pack_index.cpp "\t${PLATFORM_PREFIX}qplc_pack_index_8u16u,\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}pack_index.cpp "\t${PLATFORM_PREFIX}qplc_pack_index_8u32u,\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}pack_index.cpp "\t${PLATFORM_PREFIX}qplc_pack_bits_be_nu,\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}pack_index.cpp "\t${PLATFORM_PREFIX}qplc_pack_index_8u,\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}pack_index.cpp "\t${PLATFORM_PREFIX}qplc_pack_index_be_8u16u,\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}pack_index.cpp "\t${PLATFORM_PREFIX}qplc_pack_index_be_8u32u};\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}pack_index.cpp "}\n")
#
# Write default bit width functions
#
foreach(DEAULT_BIT_WIDTH_FUNCTION IN LISTS DEFAULT_BIT_WIDTH_FUNCTIONS_LIST)
file(WRITE ${directory}/${PLATFORM_PREFIX}${DEAULT_BIT_WIDTH_FUNCTION}.cpp "#include \"qplc_api.h\"\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}${DEAULT_BIT_WIDTH_FUNCTION}.cpp "#include \"dispatcher/dispatcher.hpp\"\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}${DEAULT_BIT_WIDTH_FUNCTION}.cpp "namespace qpl::core_sw::dispatcher\n{\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}${DEAULT_BIT_WIDTH_FUNCTION}.cpp "${DEAULT_BIT_WIDTH_FUNCTION}_table_t ${PLATFORM_PREFIX}${DEAULT_BIT_WIDTH_FUNCTION}_table = {\n")
#get last element of the list
set(LAST_ELEMENT "")
list(GET DEFAULT_BIT_WIDTH_LIST -1 LAST_ELEMENT)
foreach(BIT_WIDTH IN LISTS DEFAULT_BIT_WIDTH_LIST)
set(FUNCTION_NAME "")
get_function_name_with_default_bit_width(${DEAULT_BIT_WIDTH_FUNCTION} ${BIT_WIDTH} FUNCTION_NAME)
if(BIT_WIDTH STREQUAL LAST_ELEMENT)
file(APPEND ${directory}/${PLATFORM_PREFIX}${DEAULT_BIT_WIDTH_FUNCTION}.cpp "\t${PLATFORM_PREFIX}qplc_${FUNCTION_NAME}};\n")
else()
file(APPEND ${directory}/${PLATFORM_PREFIX}${DEAULT_BIT_WIDTH_FUNCTION}.cpp "\t${PLATFORM_PREFIX}qplc_${FUNCTION_NAME},\n")
endif()
endforeach()
file(APPEND ${directory}/${PLATFORM_PREFIX}${DEAULT_BIT_WIDTH_FUNCTION}.cpp "}\n")
endforeach()
#
# Write aggregates table
#
file(WRITE ${directory}/${PLATFORM_PREFIX}aggregates.cpp "#include \"qplc_api.h\"\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}aggregates.cpp "#include \"dispatcher/dispatcher.hpp\"\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}aggregates.cpp "namespace qpl::core_sw::dispatcher\n{\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}aggregates.cpp "aggregates_table_t ${PLATFORM_PREFIX}aggregates_table = {\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}aggregates.cpp "\t${PLATFORM_PREFIX}qplc_bit_aggregates_8u,\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}aggregates.cpp "\t${PLATFORM_PREFIX}qplc_aggregates_8u,\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}aggregates.cpp "\t${PLATFORM_PREFIX}qplc_aggregates_16u,\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}aggregates.cpp "\t${PLATFORM_PREFIX}qplc_aggregates_32u};\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}aggregates.cpp "}\n")
#
# Write mem_copy functions table
#
file(WRITE ${directory}/${PLATFORM_PREFIX}memory_copy.cpp "#include \"qplc_api.h\"\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}memory_copy.cpp "#include \"dispatcher/dispatcher.hpp\"\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}memory_copy.cpp "namespace qpl::core_sw::dispatcher\n{\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}memory_copy.cpp "memory_copy_table_t ${PLATFORM_PREFIX}memory_copy_table = {\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}memory_copy.cpp "\t${PLATFORM_PREFIX}qplc_copy_8u,\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}memory_copy.cpp "\t${PLATFORM_PREFIX}qplc_copy_16u,\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}memory_copy.cpp "\t${PLATFORM_PREFIX}qplc_copy_32u};\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}memory_copy.cpp "}\n")
#
# Write zero functions table
#
file(WRITE ${directory}/${PLATFORM_PREFIX}zero.cpp "#include \"qplc_api.h\"\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}zero.cpp "#include \"dispatcher/dispatcher.hpp\"\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}zero.cpp "namespace qpl::core_sw::dispatcher\n{\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}zero.cpp "zero_table_t ${PLATFORM_PREFIX}zero_table = {\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}zero.cpp "\t${PLATFORM_PREFIX}qplc_zero_8u};\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}zero.cpp "}\n")
#
# Write move functions table
#
file(WRITE ${directory}/${PLATFORM_PREFIX}move.cpp "#include \"qplc_api.h\"\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}move.cpp "#include \"dispatcher/dispatcher.hpp\"\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}move.cpp "namespace qpl::core_sw::dispatcher\n{\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}move.cpp "move_table_t ${PLATFORM_PREFIX}move_table = {\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}move.cpp "\t${PLATFORM_PREFIX}qplc_move_8u};\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}move.cpp "}\n")
#
# Write crc64 function table
#
file(WRITE ${directory}/${PLATFORM_PREFIX}crc64.cpp "#include \"qplc_api.h\"\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}crc64.cpp "#include \"dispatcher/dispatcher.hpp\"\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}crc64.cpp "namespace qpl::core_sw::dispatcher\n{\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}crc64.cpp "crc64_table_t ${PLATFORM_PREFIX}crc64_table = {\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}crc64.cpp "\t${PLATFORM_PREFIX}qplc_crc64};\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}crc64.cpp "}\n")
#
# Write xor_checksum function table
#
file(WRITE ${directory}/${PLATFORM_PREFIX}xor_checksum.cpp "#include \"qplc_api.h\"\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}xor_checksum.cpp "#include \"dispatcher/dispatcher.hpp\"\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}xor_checksum.cpp "namespace qpl::core_sw::dispatcher\n{\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}xor_checksum.cpp "xor_checksum_table_t ${PLATFORM_PREFIX}xor_checksum_table = {\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}xor_checksum.cpp "\t${PLATFORM_PREFIX}qplc_xor_checksum_8u};\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}xor_checksum.cpp "}\n")
#
# Write deflate functions table
#
file(WRITE ${directory}/${PLATFORM_PREFIX}deflate.cpp "#include \"deflate_slow_icf.h\"\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}deflate.cpp "#include \"deflate_hash_table.h\"\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}deflate.cpp "#include \"deflate_histogram.h\"\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}deflate.cpp "#include \"dispatcher/dispatcher.hpp\"\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}deflate.cpp "namespace qpl::core_sw::dispatcher\n{\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}deflate.cpp "deflate_table_t ${PLATFORM_PREFIX}deflate_table = {\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}deflate.cpp "\t reinterpret_cast<void *>(&${PLATFORM_PREFIX}slow_deflate_icf_body),\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}deflate.cpp "\t reinterpret_cast<void *>(&${PLATFORM_PREFIX}deflate_histogram_reset),\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}deflate.cpp "\t reinterpret_cast<void *>(&${PLATFORM_PREFIX}deflate_hash_table_reset)};\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}deflate.cpp "}\n")
#
# Write deflate fix functions table
#
file(WRITE ${directory}/${PLATFORM_PREFIX}deflate_fix.cpp "#include \"deflate_slow.h\"\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}deflate_fix.cpp "#include \"dispatcher/dispatcher.hpp\"\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}deflate_fix.cpp "namespace qpl::core_sw::dispatcher\n{\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}deflate_fix.cpp "deflate_fix_table_t ${PLATFORM_PREFIX}deflate_fix_table = {\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}deflate_fix.cpp "\t reinterpret_cast<void *>(&${PLATFORM_PREFIX}slow_deflate_body)};\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}deflate_fix.cpp "}\n")
#
# Write setup_dictionary functions table
#
file(WRITE ${directory}/${PLATFORM_PREFIX}setup_dictionary.cpp "#include \"deflate_slow_utils.h\"\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}setup_dictionary.cpp "#include \"dispatcher/dispatcher.hpp\"\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}setup_dictionary.cpp "namespace qpl::core_sw::dispatcher\n{\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}setup_dictionary.cpp "setup_dictionary_table_t ${PLATFORM_PREFIX}setup_dictionary_table = {\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}setup_dictionary.cpp "\t reinterpret_cast<void *>(&${PLATFORM_PREFIX}setup_dictionary)};\n")
file(APPEND ${directory}/${PLATFORM_PREFIX}setup_dictionary.cpp "}\n")
endforeach()
endfunction()
# [SUBDIR]isal
enable_language(ASM_NASM)
set(ISAL_C_SRC ${QPL_SRC_DIR}/isal/igzip/adler32_base.c
${QPL_SRC_DIR}/isal/igzip/huff_codes.c
${QPL_SRC_DIR}/isal/igzip/hufftables_c.c
${QPL_SRC_DIR}/isal/igzip/igzip.c
${QPL_SRC_DIR}/isal/igzip/igzip_base.c
${QPL_SRC_DIR}/isal/igzip/flatten_ll.c
${QPL_SRC_DIR}/isal/igzip/encode_df.c
${QPL_SRC_DIR}/isal/igzip/igzip_icf_base.c
${QPL_SRC_DIR}/isal/igzip/igzip_inflate.c
${QPL_SRC_DIR}/isal/igzip/igzip_icf_body.c
${QPL_SRC_DIR}/isal/crc/crc_base.c
${QPL_SRC_DIR}/isal/crc/crc64_base.c)
set(ISAL_ASM_SRC ${QPL_SRC_DIR}/isal/igzip/igzip_body.asm
${QPL_SRC_DIR}/isal/igzip/igzip_gen_icf_map_lh1_04.asm
${QPL_SRC_DIR}/isal/igzip/igzip_gen_icf_map_lh1_06.asm
${QPL_SRC_DIR}/isal/igzip/igzip_decode_block_stateless_04.asm
${QPL_SRC_DIR}/isal/igzip/igzip_finish.asm
${QPL_SRC_DIR}/isal/igzip/encode_df_04.asm
${QPL_SRC_DIR}/isal/igzip/encode_df_06.asm
${QPL_SRC_DIR}/isal/igzip/igzip_decode_block_stateless_01.asm
${QPL_SRC_DIR}/isal/igzip/proc_heap.asm
${QPL_SRC_DIR}/isal/igzip/igzip_icf_body_h1_gr_bt.asm
${QPL_SRC_DIR}/isal/igzip/igzip_icf_finish.asm
${QPL_SRC_DIR}/isal/igzip/igzip_inflate_multibinary.asm
${QPL_SRC_DIR}/isal/igzip/igzip_update_histogram_01.asm
${QPL_SRC_DIR}/isal/igzip/igzip_update_histogram_04.asm
${QPL_SRC_DIR}/isal/igzip/rfc1951_lookup.asm
${QPL_SRC_DIR}/isal/igzip/adler32_sse.asm
${QPL_SRC_DIR}/isal/igzip/adler32_avx2_4.asm
${QPL_SRC_DIR}/isal/igzip/igzip_deflate_hash.asm
${QPL_SRC_DIR}/isal/igzip/igzip_set_long_icf_fg_04.asm
${QPL_SRC_DIR}/isal/igzip/igzip_set_long_icf_fg_06.asm
${QPL_SRC_DIR}/isal/igzip/igzip_multibinary.asm
${QPL_SRC_DIR}/isal/crc/crc_multibinary.asm
${QPL_SRC_DIR}/isal/crc/crc32_gzip_refl_by8.asm
${QPL_SRC_DIR}/isal/crc/crc32_gzip_refl_by8_02.asm
${QPL_SRC_DIR}/isal/crc/crc32_gzip_refl_by16_10.asm
${QPL_SRC_DIR}/isal/crc/crc32_ieee_01.asm
${QPL_SRC_DIR}/isal/crc/crc32_ieee_02.asm
${QPL_SRC_DIR}/isal/crc/crc32_ieee_by4.asm
${QPL_SRC_DIR}/isal/crc/crc32_ieee_by16_10.asm
${QPL_SRC_DIR}/isal/crc/crc32_iscsi_00.asm
${QPL_SRC_DIR}/isal/crc/crc32_iscsi_01.asm
${QPL_SRC_DIR}/isal/crc/crc32_iscsi_by16_10.asm)
# Adding ISA-L library target
add_library(isal OBJECT ${ISAL_C_SRC})
add_library(isal_asm OBJECT ${ISAL_ASM_SRC})
set_property(GLOBAL APPEND PROPERTY QPL_LIB_DEPS
$<TARGET_OBJECTS:isal>)
set_property(GLOBAL APPEND PROPERTY QPL_LIB_DEPS
$<TARGET_OBJECTS:isal_asm>)
# Setting external and internal interfaces for ISA-L library
target_include_directories(isal
PUBLIC $<BUILD_INTERFACE:${QPL_SRC_DIR}/isal/include>
PUBLIC ${QPL_SRC_DIR}/isal/igzip)
set_target_properties(isal PROPERTIES
CXX_STANDARD 11
C_STANDARD 99)
# AS_FEATURE_LEVEL=10 means "Check SIMD capabilities of the target system at runtime and use up to AVX512 if available".
# HAVE_AS_KNOWS_AVX512 means rely on AVX512 being available on the target system.
target_compile_options(isal_asm PRIVATE "-I${QPL_SRC_DIR}/isal/include/"
PRIVATE "-I${QPL_SRC_DIR}/isal/igzip/"
PRIVATE "-I${QPL_SRC_DIR}/isal/crc/"
PRIVATE "-DHAVE_AS_KNOWS_AVX512"
PRIVATE "-DAS_FEATURE_LEVEL=10"
PRIVATE "-DQPL_LIB")
# "-fno-sanitize=undefined" must be removed from COMPILE_OPTIONS here.
# Otherwise nasm fails because it does not recognize "-fno-sanitize=undefined".
if (SANITIZE STREQUAL "undefined")
get_target_property(target_options isal_asm COMPILE_OPTIONS)
list(REMOVE_ITEM target_options "-fno-sanitize=undefined")
set_property(TARGET isal_asm PROPERTY COMPILE_OPTIONS ${target_options})
endif()
target_compile_definitions(isal PUBLIC
QPL_LIB
NDEBUG)
# [SUBDIR]core-sw
# Create a set of libraries, one per supported platform, for the SW fallback: one implemented with AVX512 instructions and one without.
# The upper-level QPL API checks the SIMD capabilities of the target system at runtime and decides whether to call the AVX512 or the non-AVX512 function.
# Hence, we don't need an ENABLE_AVX512 CMake switch here.
get_list_of_supported_optimizations(PLATFORMS_LIST)
foreach(PLATFORM_ID IN LISTS PLATFORMS_LIST)
# Find Core Sources
file(GLOB SOURCES
${QPL_SRC_DIR}/core-sw/src/checksums/*.c
${QPL_SRC_DIR}/core-sw/src/filtering/*.c
${QPL_SRC_DIR}/core-sw/src/other/*.c
${QPL_SRC_DIR}/core-sw/src/compression/*.c)
file(GLOB DATA_SOURCES
${QPL_SRC_DIR}/core-sw/src/data/*.c)
# Create library
add_library(qplcore_${PLATFORM_ID} OBJECT ${SOURCES})
set_property(GLOBAL APPEND PROPERTY QPL_LIB_DEPS
$<TARGET_OBJECTS:qplcore_${PLATFORM_ID}>)
target_include_directories(qplcore_${PLATFORM_ID}
PUBLIC $<BUILD_INTERFACE:${QPL_SRC_DIR}/core-sw>
PUBLIC $<BUILD_INTERFACE:${QPL_SRC_DIR}/core-sw/include>
PUBLIC $<BUILD_INTERFACE:${QPL_SRC_DIR}/core-sw/src/include>
PUBLIC $<BUILD_INTERFACE:${QPL_SRC_DIR}/core-sw/src/compression/include>
PRIVATE $<TARGET_PROPERTY:isal,INTERFACE_INCLUDE_DIRECTORIES>)
# Set specific compiler options and/or definitions based on a platform
if (${PLATFORM_ID} MATCHES "avx512")
target_compile_definitions(qplcore_${PLATFORM_ID} PRIVATE PLATFORM=2)
target_compile_options(qplcore_${PLATFORM_ID} PRIVATE -march=skylake-avx512)
else() # Create default px library
target_compile_definitions(qplcore_${PLATFORM_ID} PRIVATE PLATFORM=0)
endif()
target_link_libraries(qplcore_${PLATFORM_ID} isal)
endforeach()
#
# Create dispatcher between platforms and auto-generated wrappers
#
file(GLOB SW_DISPATCHER_SOURCES ${QPL_SRC_DIR}/core-sw/dispatcher/*.cpp)
add_library(qplcore_sw_dispatcher OBJECT ${SW_DISPATCHER_SOURCES})
set_property(GLOBAL APPEND PROPERTY QPL_LIB_DEPS
$<TARGET_OBJECTS:qplcore_sw_dispatcher>)
target_include_directories(qplcore_sw_dispatcher
PUBLIC $<BUILD_INTERFACE:${QPL_SRC_DIR}/core-sw/dispatcher>)
# Generate kernel wrappers
generate_unpack_kernel_arrays(${QPL_BINARY_DIR} "${PLATFORMS_LIST}")
foreach(PLATFORM_ID IN LISTS PLATFORMS_LIST)
file(GLOB GENERATED_${PLATFORM_ID}_TABLES_SRC ${QPL_BINARY_DIR}/generated/${PLATFORM_ID}_*.cpp)
target_sources(qplcore_sw_dispatcher PRIVATE ${GENERATED_${PLATFORM_ID}_TABLES_SRC})
# Set specific compiler options and/or definitions based on a platform
if (${PLATFORM_ID} MATCHES "avx512")
set_source_files_properties(${GENERATED_${PLATFORM_ID}_TABLES_SRC} PROPERTIES COMPILE_DEFINITIONS PLATFORM=2)
else()
set_source_files_properties(${GENERATED_${PLATFORM_ID}_TABLES_SRC} PROPERTIES COMPILE_DEFINITIONS PLATFORM=0)
endif()
target_include_directories(qplcore_sw_dispatcher
PUBLIC $<TARGET_PROPERTY:qplcore_${PLATFORM_ID},INTERFACE_INCLUDE_DIRECTORIES>)
endforeach()
set_target_properties(qplcore_sw_dispatcher PROPERTIES CXX_STANDARD 17)
# w/a for build compatibility with ISAL codebase
target_compile_definitions(qplcore_sw_dispatcher PUBLIC -DQPL_LIB)
target_compile_options(qplcore_sw_dispatcher
PRIVATE ${QPL_LINUX_TOOLCHAIN_CPP_EMBEDDED_FLAGS})
# [SUBDIR]core-iaa
file(GLOB HW_PATH_SRC ${QPL_SRC_DIR}/core-iaa/sources/aecs/*.c
${QPL_SRC_DIR}/core-iaa/sources/driver_loader/*.c
${QPL_SRC_DIR}/core-iaa/sources/descriptors/*.c
${QPL_SRC_DIR}/core-iaa/sources/*.c)
# Create library
add_library(core_iaa OBJECT ${HW_PATH_SRC})
set_property(GLOBAL APPEND PROPERTY QPL_LIB_DEPS
$<TARGET_OBJECTS:core_iaa>)
target_include_directories(core_iaa
PRIVATE ${UUID_DIR}
PUBLIC $<BUILD_INTERFACE:${QPL_SRC_DIR}/core-iaa/include>
PUBLIC $<BUILD_INTERFACE:${QPL_SRC_DIR}/core-iaa/sources/include>
PRIVATE $<BUILD_INTERFACE:${QPL_PROJECT_DIR}/include> # status.h in own_checkers.h
PRIVATE $<TARGET_PROPERTY:qpl_c_api,INTERFACE_INCLUDE_DIRECTORIES> # for own_checkers.h
PRIVATE $<TARGET_PROPERTY:qplcore_sw_dispatcher,INTERFACE_INCLUDE_DIRECTORIES>)
target_compile_features(core_iaa PRIVATE c_std_11)
target_compile_definitions(core_iaa PRIVATE QPL_BADARG_CHECK
PRIVATE $<$<BOOL:${LOG_HW_INIT}>:LOG_HW_INIT>
PRIVATE $<$<BOOL:${DYNAMIC_LOADING_LIBACCEL_CONFIG}>:DYNAMIC_LOADING_LIBACCEL_CONFIG>)
# [SUBDIR]middle-layer
file(GLOB MIDDLE_LAYER_SRC
${QPL_SRC_DIR}/middle-layer/accelerator/*.cpp
${QPL_SRC_DIR}/middle-layer/analytics/*.cpp
${QPL_SRC_DIR}/middle-layer/common/*.cpp
${QPL_SRC_DIR}/middle-layer/compression/*.cpp
${QPL_SRC_DIR}/middle-layer/compression/*/*.cpp
${QPL_SRC_DIR}/middle-layer/compression/*/*/*.cpp
${QPL_SRC_DIR}/middle-layer/dispatcher/*.cpp
${QPL_SRC_DIR}/middle-layer/other/*.cpp
${QPL_SRC_DIR}/middle-layer/util/*.cpp)
add_library(middle_layer_lib OBJECT
${MIDDLE_LAYER_SRC})
set_property(GLOBAL APPEND PROPERTY QPL_LIB_DEPS
$<TARGET_OBJECTS:middle_layer_lib>)
target_compile_options(middle_layer_lib
PRIVATE $<$<C_COMPILER_ID:GNU,Clang>:$<$<CONFIG:Release>:-O3;-U_FORTIFY_SOURCE;-D_FORTIFY_SOURCE=2>>
PRIVATE ${QPL_LINUX_TOOLCHAIN_CPP_EMBEDDED_FLAGS})
target_compile_definitions(middle_layer_lib
PUBLIC QPL_VERSION="${QPL_VERSION}"
PUBLIC $<$<BOOL:${LOG_HW_INIT}>:LOG_HW_INIT>
PUBLIC $<$<BOOL:${EFFICIENT_WAIT}>:QPL_EFFICIENT_WAIT>
PUBLIC QPL_BADARG_CHECK
PUBLIC $<$<BOOL:${DYNAMIC_LOADING_LIBACCEL_CONFIG}>:DYNAMIC_LOADING_LIBACCEL_CONFIG>)
set_target_properties(middle_layer_lib PROPERTIES CXX_STANDARD 17)
target_include_directories(middle_layer_lib
PRIVATE ${UUID_DIR}
PUBLIC $<BUILD_INTERFACE:${QPL_SRC_DIR}/middle-layer>
PUBLIC $<TARGET_PROPERTY:_qpl,INTERFACE_INCLUDE_DIRECTORIES>
PRIVATE $<TARGET_PROPERTY:qpl_c_api,INTERFACE_INCLUDE_DIRECTORIES>
PUBLIC $<TARGET_PROPERTY:qplcore_sw_dispatcher,INTERFACE_INCLUDE_DIRECTORIES>
PUBLIC $<TARGET_PROPERTY:isal,INTERFACE_INCLUDE_DIRECTORIES>
PUBLIC $<TARGET_PROPERTY:core_iaa,INTERFACE_INCLUDE_DIRECTORIES>)
target_compile_definitions(middle_layer_lib PUBLIC -DQPL_LIB)
# [SUBDIR]c_api
file(GLOB QPL_C_API_SRC
${QPL_SRC_DIR}/c_api/compression_operations/*.c
${QPL_SRC_DIR}/c_api/compression_operations/*.cpp
${QPL_SRC_DIR}/c_api/filter_operations/*.cpp
${QPL_SRC_DIR}/c_api/legacy_hw_path/*.c
${QPL_SRC_DIR}/c_api/legacy_hw_path/*.cpp
${QPL_SRC_DIR}/c_api/other_operations/*.cpp
${QPL_SRC_DIR}/c_api/serialization/*.cpp
${QPL_SRC_DIR}/c_api/*.cpp)
add_library(qpl_c_api OBJECT ${QPL_C_API_SRC})
target_include_directories(qpl_c_api
PUBLIC $<BUILD_INTERFACE:${QPL_SRC_DIR}/c_api/>
PUBLIC $<BUILD_INTERFACE:${QPL_SRC_DIR}/include/> $<INSTALL_INTERFACE:include>
PRIVATE $<TARGET_PROPERTY:middle_layer_lib,INTERFACE_INCLUDE_DIRECTORIES>)
set_target_properties(qpl_c_api PROPERTIES
C_STANDARD 17
CXX_STANDARD 17)
target_compile_options(qpl_c_api
PRIVATE $<$<C_COMPILER_ID:GNU,Clang>:$<$<CONFIG:Release>:-O3;-U_FORTIFY_SOURCE;-D_FORTIFY_SOURCE=2>>
PRIVATE $<$<COMPILE_LANG_AND_ID:CXX,GNU,Clang>:${QPL_LINUX_TOOLCHAIN_CPP_EMBEDDED_FLAGS}>)
target_compile_definitions(qpl_c_api
PUBLIC -DQPL_BADARG_CHECK # own_checkers.h
PUBLIC -DQPL_LIB # needed for middle_layer_lib
PUBLIC $<$<BOOL:${LOG_HW_INIT}>:LOG_HW_INIT>) # needed for middle_layer_lib
set_property(GLOBAL APPEND PROPERTY QPL_LIB_DEPS
$<TARGET_OBJECTS:qpl_c_api>)
# Final _qpl target
get_property(LIB_DEPS GLOBAL PROPERTY QPL_LIB_DEPS)
add_library(_qpl STATIC ${LIB_DEPS})
target_include_directories(_qpl
PUBLIC $<BUILD_INTERFACE:${QPL_PROJECT_DIR}/include/> $<INSTALL_INTERFACE:include>)
target_link_libraries(_qpl
PRIVATE ch_contrib::accel-config)
target_include_directories(_qpl SYSTEM BEFORE
PUBLIC "${QPL_PROJECT_DIR}/include"
PUBLIC ${UUID_DIR})
add_library (ch_contrib::qpl ALIAS _qpl)

View File

@ -1,4 +0,0 @@
#ifndef _QPL_UUID_UUID_H
#define _QPL_UUID_UUID_H
typedef unsigned char uuid_t[16];
#endif /* _QPL_UUID_UUID_H */

2
contrib/usearch vendored

@ -1 +1 @@
Subproject commit 1706420acafbd83d852c512dcf343af0a4059e48
Subproject commit 7efe8b710c9831bfe06573b1df0fad001b04a2b5

View File

@ -6,12 +6,63 @@ target_include_directories(_usearch SYSTEM INTERFACE ${USEARCH_PROJECT_DIR}/incl
target_link_libraries(_usearch INTERFACE _fp16)
target_compile_definitions(_usearch INTERFACE USEARCH_USE_FP16LIB)
# target_compile_definitions(_usearch INTERFACE USEARCH_USE_SIMSIMD)
# ^^ simsimd is not enabled at the moment. Reasons:
# - Vectorization is important for raw scans but not so much for HNSW. We use usearch only for HNSW.
# - Simsimd does compile-time dispatch (choice of SIMD kernels determined by capabilities of the build machine) or dynamic dispatch (SIMD
# kernels chosen at runtime based on cpuid instruction). Since current builds are limited to SSE 4.2 (x86) and NEON (ARM), the speedup of
# the former would be moderate compared to AVX-512 / SVE. The latter is at the moment too fragile with respect to portability across x86
# and ARM machines ... certain combinations of quantizations / distance functions / SIMD instructions are not implemented at the moment.
# Only x86 for now. On ARM, the linker goes down in flames. To make SimSIMD compile, I had to remove macro checks in SimSIMD
# for AVX512 (x86, worked nicely) and __ARM_BF16_FORMAT_ALTERNATIVE. It is probably because of that.
if (ARCH_AMD64)
target_link_libraries(_usearch INTERFACE _simsimd)
target_compile_definitions(_usearch INTERFACE USEARCH_USE_SIMSIMD)
target_compile_definitions(_usearch INTERFACE USEARCH_CAN_COMPILE_FLOAT16)
target_compile_definitions(_usearch INTERFACE USEARCH_CAN_COMPILE_BF16)
endif ()
add_library(ch_contrib::usearch ALIAS _usearch)
# Cf. https://github.com/llvm/llvm-project/issues/107810 (though it is not 100% the same stack)
#
# LLVM ERROR: Cannot select: 0x7996e7a73150: f32,ch = load<(load (s16) from %ir.22, !tbaa !54231), anyext from bf16> 0x79961cb737c0, 0x7996e7a1a500, undef:i64, ./contrib/SimSIMD/include/simsimd/dot.h:215:1
# 0x7996e7a1a500: i64 = add 0x79961e770d00, Constant:i64<-16>, ./contrib/SimSIMD/include/simsimd/dot.h:215:1
# 0x79961e770d00: i64,ch = CopyFromReg 0x79961cb737c0, Register:i64 %4, ./contrib/SimSIMD/include/simsimd/dot.h:215:1
# 0x7996e7a1ae10: i64 = Register %4
# 0x7996e7a1b5f0: i64 = Constant<-16>
# 0x7996e7a1a730: i64 = undef
# In function: _ZL23simsimd_dot_bf16_serialPKu6__bf16S0_yPd
# PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
# Stack dump:
# 0. Running pass 'Function Pass Manager' on module 'src/libdbms.a(MergeTreeIndexVectorSimilarity.cpp.o at 2312737440)'.
# 1. Running pass 'AArch64 Instruction Selection' on function '@_ZL23simsimd_dot_bf16_serialPKu6__bf16S0_yPd'
# #0 0x00007999e83a63bf llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0xda63bf)
# #1 0x00007999e83a44f9 llvm::sys::RunSignalHandlers() (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0xda44f9)
# #2 0x00007999e83a6b00 (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0xda6b00)
# #3 0x00007999e6e45320 (/lib/x86_64-linux-gnu/libc.so.6+0x45320)
# #4 0x00007999e6e9eb1c pthread_kill (/lib/x86_64-linux-gnu/libc.so.6+0x9eb1c)
# #5 0x00007999e6e4526e raise (/lib/x86_64-linux-gnu/libc.so.6+0x4526e)
# #6 0x00007999e6e288ff abort (/lib/x86_64-linux-gnu/libc.so.6+0x288ff)
# #7 0x00007999e82fe0c2 llvm::report_fatal_error(llvm::Twine const&, bool) (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0xcfe0c2)
# #8 0x00007999e8c2f8e3 (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0x162f8e3)
# #9 0x00007999e8c2ed76 llvm::SelectionDAGISel::SelectCodeCommon(llvm::SDNode*, unsigned char const*, unsigned int) (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0x162ed76)
# #10 0x00007999ea1adbcb (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0x2badbcb)
# #11 0x00007999e8c2611f llvm::SelectionDAGISel::DoInstructionSelection() (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0x162611f)
# #12 0x00007999e8c25790 llvm::SelectionDAGISel::CodeGenAndEmitDAG() (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0x1625790)
# #13 0x00007999e8c248de llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0x16248de)
# #14 0x00007999e8c22934 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0x1622934)
# #15 0x00007999e87826b9 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0x11826b9)
# #16 0x00007999e84f7772 llvm::FPPassManager::runOnFunction(llvm::Function&) (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0xef7772)
# #17 0x00007999e84fd2f4 llvm::FPPassManager::runOnModule(llvm::Module&) (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0xefd2f4)
# #18 0x00007999e84f7e9f llvm::legacy::PassManagerImpl::run(llvm::Module&) (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0xef7e9f)
# #19 0x00007999e99f7d61 (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0x23f7d61)
# #20 0x00007999e99f8c91 (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0x23f8c91)
# #21 0x00007999e99f8b10 llvm::lto::thinBackend(llvm::lto::Config const&, unsigned int, std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, llvm::Module&, llvm::ModuleSummaryIndex const&, llvm::DenseMap<llvm::StringRef, std::unordered_set<unsigned long, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<unsigned long>>, llvm::DenseMapInfo<llvm::StringRef, void
# >, llvm::detail::DenseMapPair<llvm::StringRef, std::unordered_set<unsigned long, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<unsigned long>>>> const&, llvm::DenseMap<unsigned long, llvm::GlobalValueSummary*, llvm::DenseMapInfo<unsigned long, void>, llvm::detail::DenseMapPair<unsigned long, llvm::GlobalValueSummary*>> const&, llvm::MapVector<llvm::StringRef, llvm::BitcodeModule, llvm::DenseMap<llvm::StringRef, unsigned int, llvm::DenseMapInfo<llvm::S
# tringRef, void>, llvm::detail::DenseMapPair<llvm::StringRef, unsigned int>>, llvm::SmallVector<std::pair<llvm::StringRef, llvm::BitcodeModule>, 0u>>*, std::vector<unsigned char, std::allocator<unsigned char>> const&) (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0x23f8b10)
# #22 0x00007999e99f248d (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0x23f248d)
# #23 0x00007999e99f1cd6 (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0x23f1cd6)
# #24 0x00007999e82c9beb (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0xcc9beb)
# #25 0x00007999e834ebe3 llvm::ThreadPool::processTasks(llvm::ThreadPoolTaskGroup*) (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0xd4ebe3)
# #26 0x00007999e834f704 (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0xd4f704)
# #27 0x00007999e6e9ca94 (/lib/x86_64-linux-gnu/libc.so.6+0x9ca94)
# #28 0x00007999e6f29c3c (/lib/x86_64-linux-gnu/libc.so.6+0x129c3c)
# clang++-18: error: unable to execute command: Aborted (core dumped)
# clang++-18: error: linker command failed due to signal (use -v to see invocation)
# ninja: build stopped: interrupted by user.

View File

@ -1,7 +1,7 @@
# The Dockerfile.ubuntu exists for the tests/ci/docker_server.py script
# If the image is built from Dockerfile.alpine, then the `-alpine` suffix is added automatically,
# so the only purpose of Dockerfile.ubuntu is to push `latest`, `head` and so on w/o suffixes
FROM ubuntu:20.04 AS glibc-donor
FROM ubuntu:22.04 AS glibc-donor
ARG TARGETARCH
RUN arch=${TARGETARCH:-amd64} \
@ -9,7 +9,11 @@ RUN arch=${TARGETARCH:-amd64} \
amd64) rarch=x86_64 ;; \
arm64) rarch=aarch64 ;; \
esac \
&& ln -s "${rarch}-linux-gnu" /lib/linux-gnu
&& ln -s "${rarch}-linux-gnu" /lib/linux-gnu \
&& case $arch in \
amd64) ln /lib/linux-gnu/ld-linux-x86-64.so.2 /lib/linux-gnu/ld-2.35.so ;; \
arm64) ln /lib/linux-gnu/ld-linux-aarch64.so.1 /lib/linux-gnu/ld-2.35.so ;; \
esac
FROM alpine
@ -20,7 +24,7 @@ ENV LANG=en_US.UTF-8 \
TZ=UTC \
CLICKHOUSE_CONFIG=/etc/clickhouse-server/config.xml
COPY --from=glibc-donor /lib/linux-gnu/libc.so.6 /lib/linux-gnu/libdl.so.2 /lib/linux-gnu/libm.so.6 /lib/linux-gnu/libpthread.so.0 /lib/linux-gnu/librt.so.1 /lib/linux-gnu/libnss_dns.so.2 /lib/linux-gnu/libnss_files.so.2 /lib/linux-gnu/libresolv.so.2 /lib/linux-gnu/ld-2.31.so /lib/
COPY --from=glibc-donor /lib/linux-gnu/libc.so.6 /lib/linux-gnu/libdl.so.2 /lib/linux-gnu/libm.so.6 /lib/linux-gnu/libpthread.so.0 /lib/linux-gnu/librt.so.1 /lib/linux-gnu/libnss_dns.so.2 /lib/linux-gnu/libnss_files.so.2 /lib/linux-gnu/libresolv.so.2 /lib/linux-gnu/ld-2.35.so /lib/
COPY --from=glibc-donor /etc/nsswitch.conf /etc/
COPY entrypoint.sh /entrypoint.sh
@ -34,7 +38,7 @@ RUN arch=${TARGETARCH:-amd64} \
# lts / testing / prestable / etc
ARG REPO_CHANNEL="stable"
ARG REPOSITORY="https://packages.clickhouse.com/tgz/${REPO_CHANNEL}"
ARG VERSION="24.9.2.42"
ARG VERSION="24.10.1.2812"
ARG PACKAGES="clickhouse-keeper"
ARG DIRECT_DOWNLOAD_URLS=""

View File

@ -1,21 +1,31 @@
#!/bin/bash
set +x
set -eo pipefail
shopt -s nullglob
DO_CHOWN=1
if [ "${CLICKHOUSE_DO_NOT_CHOWN:-0}" = "1" ]; then
if [[ "${CLICKHOUSE_RUN_AS_ROOT:=0}" = "1" || "${CLICKHOUSE_DO_NOT_CHOWN:-0}" = "1" ]]; then
DO_CHOWN=0
fi
CLICKHOUSE_UID="${CLICKHOUSE_UID:-"$(id -u clickhouse)"}"
CLICKHOUSE_GID="${CLICKHOUSE_GID:-"$(id -g clickhouse)"}"
# CLICKHOUSE_UID and CLICKHOUSE_GID are kept for backward compatibility, but deprecated
# One must use either "docker run --user" or CLICKHOUSE_RUN_AS_ROOT=1 to run the process as
# FIXME: Remove ALL CLICKHOUSE_UID CLICKHOUSE_GID before 25.3
if [[ "${CLICKHOUSE_UID:-}" || "${CLICKHOUSE_GID:-}" ]]; then
echo 'WARNING: Support for CLICKHOUSE_UID/CLICKHOUSE_GID will be removed in a couple of releases.' >&2
echo 'WARNING: Either use a proper "docker run --user=xxx:xxxx" argument instead of CLICKHOUSE_UID/CLICKHOUSE_GID' >&2
echo 'WARNING: or set "CLICKHOUSE_RUN_AS_ROOT=1" ENV to run the clickhouse-server as root:root' >&2
fi
# support --user
if [ "$(id -u)" = "0" ]; then
USER=$CLICKHOUSE_UID
GROUP=$CLICKHOUSE_GID
# support `docker run --user=xxx:xxxx`
if [[ "$(id -u)" = "0" ]]; then
if [[ "$CLICKHOUSE_RUN_AS_ROOT" = 1 ]]; then
USER=0
GROUP=0
else
USER="${CLICKHOUSE_UID:-"$(id -u clickhouse)"}"
GROUP="${CLICKHOUSE_GID:-"$(id -g clickhouse)"}"
fi
if command -v gosu &> /dev/null; then
gosu="gosu $USER:$GROUP"
elif command -v su-exec &> /dev/null; then
@ -82,11 +92,11 @@ if [[ $# -lt 1 ]] || [[ "$1" == "--"* ]]; then
# There is a config file. It is already tested with gosu (if it is readable by keeper user)
if [ -f "$KEEPER_CONFIG" ]; then
exec $gosu /usr/bin/clickhouse-keeper --config-file="$KEEPER_CONFIG" "$@"
exec $gosu clickhouse-keeper --config-file="$KEEPER_CONFIG" "$@"
fi
# There is no config file. Will use embedded one
exec $gosu /usr/bin/clickhouse-keeper --log-file="$LOG_PATH" --errorlog-file="$ERROR_LOG_PATH" "$@"
exec $gosu clickhouse-keeper --log-file="$LOG_PATH" --errorlog-file="$ERROR_LOG_PATH" "$@"
fi
# Otherwise, we assume the user wants to run their own process, for example a `bash` shell to explore this image

View File

@ -35,7 +35,7 @@ RUN arch=${TARGETARCH:-amd64} \
# lts / testing / prestable / etc
ARG REPO_CHANNEL="stable"
ARG REPOSITORY="https://packages.clickhouse.com/tgz/${REPO_CHANNEL}"
ARG VERSION="24.9.2.42"
ARG VERSION="24.10.1.2812"
ARG PACKAGES="clickhouse-client clickhouse-server clickhouse-common-static"
ARG DIRECT_DOWNLOAD_URLS=""

View File

@ -1,4 +1,4 @@
FROM ubuntu:20.04
FROM ubuntu:22.04
# see https://github.com/moby/moby/issues/4032#issuecomment-192327844
# It could be removed after we move on a version 23:04+
@ -28,7 +28,7 @@ RUN sed -i "s|http://archive.ubuntu.com|${apt_archive}|g" /etc/apt/sources.list
ARG REPO_CHANNEL="stable"
ARG REPOSITORY="deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb ${REPO_CHANNEL} main"
ARG VERSION="24.9.2.42"
ARG VERSION="24.10.1.2812"
ARG PACKAGES="clickhouse-client clickhouse-server clickhouse-common-static"
#docker-official-library:off
@ -88,10 +88,10 @@ RUN if [ -n "${single_binary_location_url}" ]; then \
#docker-official-library:on
# A fallback to installation from ClickHouse repository
RUN if ! clickhouse local -q "SELECT ''" > /dev/null 2>&1; then \
apt-get update \
# It works unless the clickhouse binary already exists
RUN clickhouse local -q 'SELECT 1' >/dev/null 2>&1 && exit 0 || : \
; apt-get update \
&& apt-get install --yes --no-install-recommends \
apt-transport-https \
dirmngr \
gnupg2 \
&& mkdir -p /etc/apt/sources.list.d \
@ -108,14 +108,12 @@ RUN if ! clickhouse local -q "SELECT ''" > /dev/null 2>&1; then \
&& for package in ${PACKAGES}; do \
packages="${packages} ${package}=${VERSION}" \
; done \
&& apt-get install --allow-unauthenticated --yes --no-install-recommends ${packages} || exit 1 \
&& apt-get install --yes --no-install-recommends ${packages} || exit 1 \
&& rm -rf \
/var/lib/apt/lists/* \
/var/cache/debconf \
/tmp/* \
&& apt-get autoremove --purge -yq libksba8 \
&& apt-get autoremove -yq \
; fi
&& apt-get autoremove --purge -yq dirmngr gnupg2
# post install
# we need to allow "others" access to clickhouse folder, because docker container
@ -126,8 +124,6 @@ RUN clickhouse-local -q 'SELECT * FROM system.build_options' \
RUN locale-gen en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8
ENV TZ UTC
RUN mkdir /docker-entrypoint-initdb.d

View File

@ -1,3 +1,11 @@
<!---
The README.md is generated by README.sh from the following sources:
- README.src/content.md
- README.src/license.md
If you want to change it, edit these files
-->
# ClickHouse Server Docker Image
## What is ClickHouse?
@ -8,6 +16,7 @@ ClickHouse works 100-1000x faster than traditional database management systems,
For more information and documentation see https://clickhouse.com/.
<!-- This is not related to the docker official library, remove it before commit to https://github.com/docker-library/docs -->
## Versions
- The `latest` tag points to the latest release of the latest stable branch.
@ -16,10 +25,12 @@ For more information and documentation see https://clickhouse.com/.
- The tag `head` is built from the latest commit to the default branch.
- Each tag has optional `-alpine` suffix to reflect that it's built on top of `alpine`.
<!-- REMOVE UNTIL HERE -->
### Compatibility
- The amd64 image requires support for [SSE3 instructions](https://en.wikipedia.org/wiki/SSE3). Virtually all x86 CPUs after 2005 support SSE3.
- The arm64 image requires support for the [ARMv8.2-A architecture](https://en.wikipedia.org/wiki/AArch64#ARMv8.2-A) and additionally the Load-Acquire RCpc register. The register is optional in version ARMv8.2-A and mandatory in [ARMv8.3-A](https://en.wikipedia.org/wiki/AArch64#ARMv8.3-A). Supported in Graviton >=2, Azure and GCP instances. Examples for unsupported devices are Raspberry Pi 4 (ARMv8.0-A) and Jetson AGX Xavier/Orin (ARMv8.2-A).
- Since ClickHouse 24.11, the Ubuntu images use `ubuntu:22.04` as their base image. This requires Docker version >= `20.10.10`, which contains this [patch](https://github.com/moby/moby/commit/977283509f75303bc6612665a04abf76ff1d2468). As a workaround, you could use `docker run --security-opt seccomp=unconfined` instead; however, that has security implications.
## How to use this image
@ -29,7 +40,7 @@ For more information and documentation see https://clickhouse.com/.
docker run -d --name some-clickhouse-server --ulimit nofile=262144:262144 clickhouse/clickhouse-server
```
By default, ClickHouse will be accessible only via the Docker network. See the [networking section below](#networking).
By default, ClickHouse will be accessible only via the Docker network. See the **networking** section below.
By default, the server instance started above runs as the `default` user without a password.
@ -46,7 +57,7 @@ More information about the [ClickHouse client](https://clickhouse.com/docs/en/in
### connect to it using curl
```bash
echo "SELECT 'Hello, ClickHouse!'" | docker run -i --rm --link some-clickhouse-server:clickhouse-server curlimages/curl 'http://clickhouse-server:8123/?query=' -s --data-binary @-
echo "SELECT 'Hello, ClickHouse!'" | docker run -i --rm --link some-clickhouse-server:clickhouse-server buildpack-deps:curl curl 'http://clickhouse-server:8123/?query=' -s --data-binary @-
```
More information about the [ClickHouse HTTP Interface](https://clickhouse.com/docs/en/interfaces/http/).
@ -69,7 +80,7 @@ echo 'SELECT version()' | curl 'http://localhost:18123/' --data-binary @-
`22.6.3.35`
or by allowing the container to use [host ports directly](https://docs.docker.com/network/host/) using `--network=host` (also allows achieving better network performance):
Or by allowing the container to use [host ports directly](https://docs.docker.com/network/host/) using `--network=host` (also allows achieving better network performance):
```bash
docker run -d --network=host --name some-clickhouse-server --ulimit nofile=262144:262144 clickhouse/clickhouse-server
@ -87,8 +98,8 @@ Typically you may want to mount the following folders inside your container to a
```bash
docker run -d \
-v $(realpath ./ch_data):/var/lib/clickhouse/ \
-v $(realpath ./ch_logs):/var/log/clickhouse-server/ \
-v "$PWD/ch_data:/var/lib/clickhouse/" \
-v "$PWD/ch_logs:/var/log/clickhouse-server/" \
--name some-clickhouse-server --ulimit nofile=262144:262144 clickhouse/clickhouse-server
```
@ -110,6 +121,8 @@ docker run -d \
--name some-clickhouse-server --ulimit nofile=262144:262144 clickhouse/clickhouse-server
```
Read more in the [knowledge base](https://clickhouse.com/docs/knowledgebase/configure_cap_ipc_lock_and_cap_sys_nice_in_docker).
## Configuration
The container exposes port 8123 for the [HTTP interface](https://clickhouse.com/docs/en/interfaces/http_interface/) and port 9000 for the [native client](https://clickhouse.com/docs/en/interfaces/tcp/).
@ -125,8 +138,8 @@ docker run -d --name some-clickhouse-server --ulimit nofile=262144:262144 -v /pa
### Start server as custom user
```bash
# $(pwd)/data/clickhouse should exist and be owned by current user
docker run --rm --user ${UID}:${GID} --name some-clickhouse-server --ulimit nofile=262144:262144 -v "$(pwd)/logs/clickhouse:/var/log/clickhouse-server" -v "$(pwd)/data/clickhouse:/var/lib/clickhouse" clickhouse/clickhouse-server
# $PWD/data/clickhouse should exist and be owned by current user
docker run --rm --user "${UID}:${GID}" --name some-clickhouse-server --ulimit nofile=262144:262144 -v "$PWD/logs/clickhouse:/var/log/clickhouse-server" -v "$PWD/data/clickhouse:/var/lib/clickhouse" clickhouse/clickhouse-server
```
When you use the image with local directories mounted, you probably want to specify the user to maintain the proper file ownership. Use the `--user` argument and mount `/var/lib/clickhouse` and `/var/log/clickhouse-server` inside the container. Otherwise, the image will complain and not start.
@ -134,7 +147,7 @@ When you use the image with local directories mounted, you probably want to spec
### Start server from root (useful in case of enabled user namespace)
```bash
docker run --rm -e CLICKHOUSE_UID=0 -e CLICKHOUSE_GID=0 --name clickhouse-server-userns -v "$(pwd)/logs/clickhouse:/var/log/clickhouse-server" -v "$(pwd)/data/clickhouse:/var/lib/clickhouse" clickhouse/clickhouse-server
docker run --rm -e CLICKHOUSE_RUN_AS_ROOT=1 --name clickhouse-server-userns -v "$PWD/logs/clickhouse:/var/log/clickhouse-server" -v "$PWD/data/clickhouse:/var/lib/clickhouse" clickhouse/clickhouse-server
```
### How to create default database and user on starting

38
docker/server/README.sh Executable file
View File

@ -0,0 +1,38 @@
#!/usr/bin/env bash
set -ueo pipefail
# A script to generate README.md, similar to how it is done in https://github.com/docker-library/docs
WORKDIR=$(dirname "$0")
SCRIPT_NAME=$(basename "$0")
CONTENT=README.src/content.md
LICENSE=README.src/license.md
cd "$WORKDIR"
R=README.md
cat > "$R" <<EOD
<!---
The $R is generated by $SCRIPT_NAME from the following sources:
- $CONTENT
- $LICENSE
If you want to change it, edit these files
-->
EOD
cat "$CONTENT" >> "$R"
cat >> "$R" <<EOD
## License
$(cat "$LICENSE")
EOD
# Remove %%LOGO%% from the file with one line below
sed -i '/^%%LOGO%%/,+1d' "$R"
# Replace each %%IMAGE%% with our `clickhouse/clickhouse-server`
sed -i '/%%IMAGE%%/s:%%IMAGE%%:clickhouse/clickhouse-server:g' "$R"

View File

@ -0,0 +1 @@
ClickHouse is the fastest and most resource efficient OSS database for real-time apps and analytics.

View File

@ -0,0 +1,170 @@
# ClickHouse Server Docker Image
## What is ClickHouse?
%%LOGO%%
ClickHouse is an open-source column-oriented DBMS (columnar database management system) for online analytical processing (OLAP) that allows users to generate analytical reports using SQL queries in real-time.
ClickHouse works 100-1000x faster than traditional database management systems, and processes hundreds of millions to over a billion rows and tens of gigabytes of data per server per second. With a widespread user base around the globe, the technology has received praise for its reliability, ease of use, and fault tolerance.
For more information and documentation see https://clickhouse.com/.
<!-- This is not related to the docker official library, remove it before commit to https://github.com/docker-library/docs -->
## Versions
- The `latest` tag points to the latest release of the latest stable branch.
- Branch tags like `22.2` point to the latest release of the corresponding branch.
- Full version tags like `22.2.3.5` point to the corresponding release.
- The tag `head` is built from the latest commit to the default branch.
- Each tag has optional `-alpine` suffix to reflect that it's built on top of `alpine`.
<!-- REMOVE UNTIL HERE -->
### Compatibility
- The amd64 image requires support for [SSE3 instructions](https://en.wikipedia.org/wiki/SSE3). Virtually all x86 CPUs after 2005 support SSE3.
- The arm64 image requires support for the [ARMv8.2-A architecture](https://en.wikipedia.org/wiki/AArch64#ARMv8.2-A) and additionally the Load-Acquire RCpc register. The register is optional in version ARMv8.2-A and mandatory in [ARMv8.3-A](https://en.wikipedia.org/wiki/AArch64#ARMv8.3-A). Supported in Graviton >=2, Azure and GCP instances. Examples for unsupported devices are Raspberry Pi 4 (ARMv8.0-A) and Jetson AGX Xavier/Orin (ARMv8.2-A).
- Since ClickHouse 24.11, the Ubuntu images use `ubuntu:22.04` as their base image. This requires Docker version >= `20.10.10`, which contains this [patch](https://github.com/moby/moby/commit/977283509f75303bc6612665a04abf76ff1d2468). As a workaround, you could use `docker run --security-opt seccomp=unconfined` instead; however, that has security implications.
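The Docker version requirement above can be checked before pulling the image. The following is a minimal sketch, not part of the image: the helper names `version_ge` and `check_docker_version` are hypothetical, and the comparison assumes GNU `sort -V` is available.

```bash
#!/usr/bin/env bash
# version_ge A B: succeed when dotted version A >= B (relies on GNU `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# check_docker_version VERSION: report whether VERSION meets the documented minimum.
check_docker_version() {
    local required="20.10.10"
    if version_ge "$1" "$required"; then
        echo "ok: docker $1 >= $required"
    else
        echo "too old: docker $1 < $required"
        return 1
    fi
}

# Typical call (needs a reachable Docker daemon):
#   check_docker_version "$(docker version --format '{{.Server.Version}}')"
```

Version sorting is used instead of plain string comparison because `20.10.9` would otherwise sort after `20.10.10`.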
## How to use this image
### start server instance
```bash
docker run -d --name some-clickhouse-server --ulimit nofile=262144:262144 %%IMAGE%%
```
By default, ClickHouse will be accessible only via the Docker network. See the **networking** section below.
By default, the server instance started above runs as the `default` user without a password.
### connect to it from a native client
```bash
docker run -it --rm --link some-clickhouse-server:clickhouse-server --entrypoint clickhouse-client %%IMAGE%% --host clickhouse-server
# OR
docker exec -it some-clickhouse-server clickhouse-client
```
More information about the [ClickHouse client](https://clickhouse.com/docs/en/interfaces/cli/).
### connect to it using curl
```bash
echo "SELECT 'Hello, ClickHouse!'" | docker run -i --rm --link some-clickhouse-server:clickhouse-server buildpack-deps:curl curl 'http://clickhouse-server:8123/?query=' -s --data-binary @-
```
More information about the [ClickHouse HTTP Interface](https://clickhouse.com/docs/en/interfaces/http/).
### stopping / removing the container
```bash
docker stop some-clickhouse-server
docker rm some-clickhouse-server
```
### networking
You can expose your ClickHouse running in docker by [mapping a particular port](https://docs.docker.com/config/containers/container-networking/) from inside the container using host ports:
```bash
docker run -d -p 18123:8123 -p19000:9000 --name some-clickhouse-server --ulimit nofile=262144:262144 %%IMAGE%%
echo 'SELECT version()' | curl 'http://localhost:18123/' --data-binary @-
```
`22.6.3.35`
Or by allowing the container to use [host ports directly](https://docs.docker.com/network/host/) using `--network=host` (also allows achieving better network performance):
```bash
docker run -d --network=host --name some-clickhouse-server --ulimit nofile=262144:262144 %%IMAGE%%
echo 'SELECT version()' | curl 'http://localhost:8123/' --data-binary @-
```
`22.6.3.35`
### Volumes
Typically you may want to mount the following folders inside your container to achieve persistence:
- `/var/lib/clickhouse/` - main folder where ClickHouse stores the data
- `/var/log/clickhouse-server/` - logs
```bash
docker run -d \
-v "$PWD/ch_data:/var/lib/clickhouse/" \
-v "$PWD/ch_logs:/var/log/clickhouse-server/" \
--name some-clickhouse-server --ulimit nofile=262144:262144 %%IMAGE%%
```
You may also want to mount:
- `/etc/clickhouse-server/config.d/*.xml` - files with server configuration adjustments
- `/etc/clickhouse-server/users.d/*.xml` - files with user settings adjustments
- `/docker-entrypoint-initdb.d/` - folder with database initialization scripts (see below).
### Linux capabilities
ClickHouse has some advanced functionality, which requires enabling several [Linux capabilities](https://man7.org/linux/man-pages/man7/capabilities.7.html).
They are optional and can be enabled using the following [docker command-line arguments](https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities):
```bash
docker run -d \
--cap-add=SYS_NICE --cap-add=NET_ADMIN --cap-add=IPC_LOCK \
--name some-clickhouse-server --ulimit nofile=262144:262144 %%IMAGE%%
```
Read more in the [knowledge base](https://clickhouse.com/docs/knowledgebase/configure_cap_ipc_lock_and_cap_sys_nice_in_docker).
## Configuration
The container exposes port 8123 for the [HTTP interface](https://clickhouse.com/docs/en/interfaces/http_interface/) and port 9000 for the [native client](https://clickhouse.com/docs/en/interfaces/tcp/).
ClickHouse configuration is represented with a file "config.xml" ([documentation](https://clickhouse.com/docs/en/operations/configuration_files/))
### Start server instance with custom configuration
```bash
docker run -d --name some-clickhouse-server --ulimit nofile=262144:262144 -v /path/to/your/config.xml:/etc/clickhouse-server/config.xml %%IMAGE%%
```
### Start server as custom user
```bash
# $PWD/data/clickhouse should exist and be owned by current user
docker run --rm --user "${UID}:${GID}" --name some-clickhouse-server --ulimit nofile=262144:262144 -v "$PWD/logs/clickhouse:/var/log/clickhouse-server" -v "$PWD/data/clickhouse:/var/lib/clickhouse" %%IMAGE%%
```
When you use the image with local directories mounted, you probably want to specify the user to maintain the proper file ownership. Use the `--user` argument and mount `/var/lib/clickhouse` and `/var/log/clickhouse-server` inside the container. Otherwise, the image will complain and not start.
### Start server from root (useful in case of enabled user namespace)
```bash
docker run --rm -e CLICKHOUSE_RUN_AS_ROOT=1 --name clickhouse-server-userns -v "$PWD/logs/clickhouse:/var/log/clickhouse-server" -v "$PWD/data/clickhouse:/var/lib/clickhouse" %%IMAGE%%
```
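How the container picks the user to run as can be sketched roughly as follows. This is an approximation modelled on the keeper entrypoint changed in the same commit, not the server entrypoint itself; the function name `resolve_run_as` and its arguments are hypothetical.

```bash
#!/usr/bin/env bash
# resolve_run_as CURRENT_UID DEFAULT_UID DEFAULT_GID
# Prints "keep" when the container was already started as a non-root user
# (for example via `docker run --user`), otherwise the uid:gid to switch to.
resolve_run_as() {
    local current_uid="$1" default_uid="$2" default_gid="$3"
    if [ "$current_uid" != "0" ]; then
        echo "keep"
    elif [ "${CLICKHOUSE_RUN_AS_ROOT:-0}" = "1" ]; then
        echo "0:0"
    else
        # Deprecated variables still win over the defaults, as in the diff.
        echo "${CLICKHOUSE_UID:-$default_uid}:${CLICKHOUSE_GID:-$default_gid}"
    fi
}

# Typical call inside a container (assumes a `clickhouse` user exists):
#   resolve_run_as "$(id -u)" "$(id -u clickhouse)" "$(id -g clickhouse)"
```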
### How to create default database and user on starting
Sometimes you may want to create a user (user named `default` is used by default) and database on a container start. You can do it using environment variables `CLICKHOUSE_DB`, `CLICKHOUSE_USER`, `CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT` and `CLICKHOUSE_PASSWORD`:
```bash
docker run --rm -e CLICKHOUSE_DB=my_database -e CLICKHOUSE_USER=username -e CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT=1 -e CLICKHOUSE_PASSWORD=password -p 9000:9000/tcp %%IMAGE%%
```
## How to extend this image
To perform additional initialization in an image derived from this one, add one or more `*.sql`, `*.sql.gz`, or `*.sh` scripts under `/docker-entrypoint-initdb.d`. After the entrypoint calls `initdb`, it will run any `*.sql` files, run any executable `*.sh` scripts, and source any non-executable `*.sh` scripts found in that directory to do further initialization before starting the service.
Also, you can provide environment variables `CLICKHOUSE_USER` & `CLICKHOUSE_PASSWORD` that will be used for clickhouse-client during initialization.
For example, to add an additional user and database, add the following to `/docker-entrypoint-initdb.d/init-db.sh`:
```bash
#!/bin/bash
set -e
clickhouse client -n <<-EOSQL
CREATE DATABASE docker;
CREATE TABLE docker.docker (x Int32) ENGINE = Log;
EOSQL
```
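The handling of files in `/docker-entrypoint-initdb.d` described above can be sketched as a small classifier. This is an illustrative approximation, not the entrypoint's actual code: the function names are hypothetical, and the separate `sql.gz` action (decompress, then pass to the client) is an assumption.

```bash
#!/usr/bin/env bash
# classify_init_file PATH: print the action the text above describes for PATH.
classify_init_file() {
    local f="$1"
    case "$f" in
        *.sh)
            # Executable scripts are run; non-executable ones are sourced.
            if [ -x "$f" ]; then echo "run"; else echo "source"; fi ;;
        *.sql)    echo "sql" ;;
        *.sql.gz) echo "sql.gz" ;;   # assumption: decompressed before execution
        *)        echo "ignore" ;;
    esac
}

# plan_initdb [DIR]: list the action for every file in DIR, in glob order.
plan_initdb() {
    local dir="${1:-/docker-entrypoint-initdb.d}" f
    for f in "$dir"/*; do
        [ -e "$f" ] || continue
        printf '%s\t%s\n' "$(classify_init_file "$f")" "$(basename "$f")"
    done
}
```

Running `plan_initdb ./initdb.d` on a local directory before building a derived image gives a quick preview of which scripts would be run and which would be sourced.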

View File

@ -0,0 +1 @@
https://github.com/ClickHouse/ClickHouse

View File

@ -0,0 +1 @@
View [license information](https://github.com/ClickHouse/ClickHouse/blob/master/LICENSE) for the software contained in this image.

View File

@ -0,0 +1,43 @@
<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" viewBox="0 0 616 616">
<defs>
<style>
.cls-1 {
clip-path: url(#clippath);
}
.cls-2 {
fill: none;
}
.cls-2, .cls-3, .cls-4 {
stroke-width: 0px;
}
.cls-3 {
fill: #1e1e1e;
}
.cls-4 {
fill: #faff69;
}
</style>
<clipPath id="clippath">
<rect class="cls-2" x="83.23" y="71.73" width="472.55" height="472.55"/>
</clipPath>
</defs>
<g id="Layer_2" data-name="Layer 2">
<rect class="cls-4" width="616" height="616"/>
</g>
<g id="Layer_1" data-name="Layer 1">
<g class="cls-1">
<g>
<path class="cls-3" d="m120.14,113.3c0-2.57,2.09-4.66,4.66-4.66h34.98c2.57,0,4.66,2.09,4.66,4.66v389.38c0,2.57-2.09,4.66-4.66,4.66h-34.98c-2.57,0-4.66-2.09-4.66-4.66V113.3Z"/>
<path class="cls-3" d="m208.75,113.3c0-2.57,2.09-4.66,4.66-4.66h34.98c2.57,0,4.66,2.09,4.66,4.66v389.38c0,2.57-2.09,4.66-4.66,4.66h-34.98c-2.57,0-4.66-2.09-4.66-4.66V113.3Z"/>
<path class="cls-3" d="m297.35,113.3c0-2.57,2.09-4.66,4.66-4.66h34.98c2.57,0,4.66,2.09,4.66,4.66v389.38c0,2.57-2.09,4.66-4.66,4.66h-34.98c-2.57,0-4.66-2.09-4.66-4.66V113.3Z"/>
<path class="cls-3" d="m385.94,113.3c0-2.57,2.09-4.66,4.66-4.66h34.98c2.57,0,4.66,2.09,4.66,4.66v389.38c0,2.57-2.09,4.66-4.66,4.66h-34.98c-2.57,0-4.66-2.09-4.66-4.66V113.3Z"/>
<path class="cls-3" d="m474.56,268.36c0-2.57,2.09-4.66,4.66-4.66h34.98c2.57,0,4.65,2.09,4.65,4.66v79.28c0,2.57-2.09,4.66-4.65,4.66h-34.98c-2.57,0-4.66-2.09-4.66-4.66v-79.28Z"/>
</g>
</g>
</g>
</svg>


Some files were not shown because too many files have changed in this diff