ClickHouse/src/Common/TLDListsHolder.h

#pragma once
#include <base/defines.h>
#include <base/StringRef.h>
#include <Common/HashTable/StringHashMap.h>
#include <Common/Arena.h>
#include <Poco/Util/AbstractConfiguration.h>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

namespace DB
{
enum TLDType
{
    /// Marker: the host is not in the list
    TLD_NONE,
    /// For regular lines
    TLD_REGULAR,
    /// For lines with an asterisk (*)
    TLD_ANY,
    /// For lines with an exclamation mark (!)
    TLD_EXCLUDE,
};

/// Custom TLD List
///
/// Unlike tldLookup (which uses gperf) this one uses plain StringHashMap.
class TLDList
{
public:
    using Container = StringHashMap<TLDType>;

    explicit TLDList(size_t size);

    void insert(const String & host, TLDType type);
    TLDType lookup(StringRef host) const;
    size_t size() const { return tld_container.size(); }

private:
    Container tld_container;
    std::unique_ptr<Arena> memory_pool;
};

class TLDListsHolder
{
public:
    using Map = std::unordered_map<std::string, TLDList>;

    static TLDListsHolder & getInstance();

    /// Parse the "top_level_domains_lists" section of the config
    /// and add each list found there.
    void parseConfig(const std::string & top_level_domains_path, const Poco::Util::AbstractConfiguration & config);

    /// Parse a file and add it as a set to the list of TLDs.
    ///
    /// Ignored:
    /// - "//" -- comment,
    /// - empty lines.
    ///
    /// Treats the following special symbols:
    /// - "*"
    /// - "!"
    ///
    /// Format : https://github.com/publicsuffix/list/wiki/Format
    /// Example: https://publicsuffix.org/list/public_suffix_list.dat
    ///
    /// Returns the size of the list.
    size_t parseAndAddTldList(const std::string & name, const std::string & path);

    /// Throws TLD_LIST_NOT_FOUND if the list does not exist.
    const TLDList & getTldList(const std::string & name);

protected:
    TLDListsHolder();

    std::mutex tld_lists_map_mutex;
    Map tld_lists_map TSA_GUARDED_BY(tld_lists_map_mutex);
};
}
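
The comments on `parseAndAddTldList` describe the input format only briefly, so a small standalone sketch may help show how each kind of line could map onto a `TLDType`. This is not the ClickHouse implementation. `SimpleTLDList` and `parseTldList` are hypothetical names, `std::unordered_map` stands in for `StringHashMap`, and the prefix handling (`*.` and `!` stripped before storing) is an assumption based on the Public Suffix List format linked above.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <string_view>
#include <unordered_map>

// Same values as the enum in the header.
enum TLDType { TLD_NONE, TLD_REGULAR, TLD_ANY, TLD_EXCLUDE };

// Hypothetical stand-in for TLDList, backed by std::unordered_map
// instead of StringHashMap + Arena.
class SimpleTLDList
{
public:
    void insert(const std::string & host, TLDType type) { entries[host] = type; }

    // Returns TLD_NONE when the host is not in the list.
    TLDType lookup(const std::string & host) const
    {
        auto it = entries.find(host);
        return it == entries.end() ? TLD_NONE : it->second;
    }

    std::size_t size() const { return entries.size(); }

private:
    std::unordered_map<std::string, TLDType> entries;
};

// Hypothetical helper: reads rules line by line and returns the list size.
// Assumption: "*." and "!" prefixes are stripped before the rule is stored.
inline std::size_t parseTldList(std::istream & in, SimpleTLDList & list)
{
    std::string line;
    while (std::getline(in, line))
    {
        std::string_view rule(line);

        // Drop trailing whitespace (including '\r' from CRLF files).
        while (!rule.empty() && (rule.back() == '\r' || rule.back() == ' ' || rule.back() == '\t'))
            rule.remove_suffix(1);

        // Skip empty lines and "//" comments.
        if (rule.empty() || rule.substr(0, 2) == "//")
            continue;

        if (rule.substr(0, 2) == "*.")
            list.insert(std::string(rule.substr(2)), TLD_ANY);
        else if (rule.front() == '!')
            list.insert(std::string(rule.substr(1)), TLD_EXCLUDE);
        else
            list.insert(std::string(rule), TLD_REGULAR);
    }
    return list.size();
}
```

Feeding the helper a `std::istringstream` that contains a comment, an empty line, `com`, `*.ck` and `!www.ck` stores three entries: `com` as `TLD_REGULAR`, `ck` as `TLD_ANY` and `www.ck` as `TLD_EXCLUDE`. Any other host looks up as `TLD_NONE`.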