Added draft texts [#METR-20000].

Alexey Milovidov 2016-03-14 00:31:01 +03:00
parent e4f23a53f1
commit c8aed08fdf
2 changed files with 236 additions and 0 deletions

doc/drafts/build.txt Normal file

@ -0,0 +1,151 @@
# How to build ClickHouse
#
# The build should work on Ubuntu Linux 14.04 or newer.
# With appropriate changes, it should work on any other Linux distribution.
# The build is not intended to work on Mac OS X.
sudo apt-get install git cmake
# Install GCC 5.
# There are several ways to do it.
#
# 1. If you run on Ubuntu 15.10 or newer, just do
# sudo apt-get install g++-5
#
# 2. Install from PPA package.
sudo apt-get install software-properties-common
sudo apt-add-repository ppa:ubuntu-toolchain-r/test
sudo apt-get update
sudo apt-get install gcc-5 g++-5
export THREADS=$(grep -c ^processor /proc/cpuinfo)
# 3. Install GCC 5 from sources.
#
# Download gcc from https://gcc.gnu.org/mirrors.html
# Example:
# wget ftp://ftp.fu-berlin.de/unix/languages/gcc/releases/gcc-5.3.0/gcc-5.3.0.tar.bz2
# tar xf gcc-5.3.0.tar.bz2
# cd gcc-5.3.0
# ./contrib/download_prerequisites
# cd ..
# mkdir gcc-build
# cd gcc-build
# ../gcc-5.3.0/configure --enable-languages=c,c++
# make -j $THREADS
# sudo make install
# hash gcc g++
# gcc --version
# sudo ln -s /usr/local/bin/gcc /usr/local/bin/gcc-5
# sudo ln -s /usr/local/bin/g++ /usr/local/bin/g++-5
# sudo ln -s /usr/local/bin/gcc /usr/local/bin/cc
# sudo ln -s /usr/local/bin/g++ /usr/local/bin/c++
# /usr/local/bin/ should be in $PATH
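#
# A minimal sketch for checking the $PATH requirement above.
# The helper name path_contains is hypothetical, not part of any tool.

```shell
# Succeeds when directory $1 is one of the colon-separated entries of $PATH.
# A trailing slash on either side is ignored.
path_contains() {
    case ":$PATH:" in
        *":${1%/}:"*|*":${1%/}/:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if path_contains /usr/local/bin; then
    echo "/usr/local/bin is in PATH"
else
    echo "/usr/local/bin is missing from PATH" >&2
fi
```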
#
# Note that these installation methods differ.
# When installing from the PPA, the "old C++ ABI" is used by default,
# and when installing from sources, the "new C++ ABI" is used.
# When switching to a different C++ ABI, you need to recompile all C++ libraries,
# otherwise the libraries will not link.
# ClickHouse works with both the old and the new C++ ABI,
# but production releases are built with the old C++ ABI.
export CC=gcc-5
export CXX=g++-5
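# The C++ ABI used by the selected compiler can be checked before building libraries.
# A minimal sketch, assuming libstdc++ from GCC 5 or newer, which defines the
# _GLIBCXX_USE_CXX11_ABI macro (0 = old ABI, 1 = new ABI).
# The helper name libstdcxx_abi is hypothetical.

```shell
# Reads a preprocessor macro dump on stdin and prints "old", "new" or "unknown".
libstdcxx_abi() {
    awk '$1 == "#define" && $2 == "_GLIBCXX_USE_CXX11_ABI" { v = $3 }
         END {
             if (v == "1") print "new"
             else if (v == "0") print "old"
             else print "unknown"
         }'
}

# Usage with the compiler selected above (requires $CXX to be installed):
#   echo '#include <string>' | $CXX -x c++ -dM -E - | libstdcxx_abi
```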
# Install required libraries from packages
sudo apt-get install libicu-dev libglib2.0-dev libreadline-dev libzookeeper-mt-dev libmysqlclient-dev libssl-dev unixodbc-dev
# Install a recent version of Boost. Version 1.57 or newer is OK.
wget http://downloads.sourceforge.net/project/boost/boost/1.60.0/boost_1_60_0.tar.bz2
tar xf boost_1_60_0.tar.bz2
cd boost_1_60_0
./bootstrap.sh
./b2 --toolset=gcc-5 -j $THREADS
sudo ./b2 install --toolset=gcc-5 -j $THREADS
cd ..
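# A minimal sketch for checking that the installed Boost satisfies the 1.57 requirement.
# Assumptions: GNU sort (for -V) is available, and Boost was installed under the
# default /usr/local prefix. The helper names version_ge and boost_version are hypothetical.

```shell
# True when version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Prints the version recorded in the BOOST_LIB_VERSION macro of a
# Boost version.hpp file, with underscores turned into dots.
boost_version() {
    sed -n 's/^#define BOOST_LIB_VERSION "\(.*\)"$/\1/p' "$1" | tr '_' '.'
}

header=/usr/local/include/boost/version.hpp   # assumed install location
if [ -f "$header" ]; then
    if version_ge "$(boost_version "$header")" 1.57; then
        echo "Boost is recent enough"
    else
        echo "Boost is older than 1.57" >&2
    fi
fi
```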
# Install tcmalloc. Patch is important.
wget https://googledrive.com/host/0B6NtGsLhIcf7MWxMMF9JdTN3UVk/gperftools-2.4.tar.gz
tar -xf gperftools-2.4.tar.gz
cd gperftools-2.4
patch src/static_vars.cc <<END
103c103
< TCMallocGetenvSafe("TCMALLOC_AGGRESSIVE_DECOMMIT"), true);
---
> TCMallocGetenvSafe("TCMALLOC_AGGRESSIVE_DECOMMIT"), false);
END
./configure --enable-minimal
make -j $THREADS
sudo make install
cd ..
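# Because the patch is important, it may be worth confirming that it was applied.
# A minimal sketch, assuming the source line looks like the one shown in the patch above.
# The helper name decommit_default is hypothetical.

```shell
# Prints the default value (true or false) found on the
# TCMALLOC_AGGRESSIVE_DECOMMIT line of the file given as $1.
decommit_default() {
    sed -n 's/.*TCMallocGetenvSafe("TCMALLOC_AGGRESSIVE_DECOMMIT"), *\([a-z]*\));.*/\1/p' "$1"
}

src=gperftools-2.4/src/static_vars.cc
if [ -f "$src" ]; then
    if [ "$(decommit_default "$src")" = "false" ]; then
        echo "tcmalloc patch is applied"
    else
        echo "tcmalloc patch is NOT applied" >&2
    fi
fi
```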
# Install mongoclient. This library is needed only for 'external dictionaries' with a MongoDB source. It is rarely used but enabled by default.
sudo apt-get install scons
git clone -b legacy https://github.com/mongodb/mongo-cxx-driver.git
cd mongo-cxx-driver
sudo scons --c++11 --release --cc=$CC --cxx=$CXX --disable-warnings-as-errors -j $THREADS --prefix=/usr/local install
cd ..
# Check out the ClickHouse sources.
git clone git@███████████.yandex-team.ru:Metrika/ClickHouse.git # TODO Change path.
cd ClickHouse
# There are two build variants.
# 1. Build the release package.
# Install the prerequisites for building Debian packages.
sudo apt-get install devscripts dupload fakeroot debhelper
# Install a recent version of clang. Clang is embedded into the ClickHouse package and used at runtime.
cd ..
sudo apt-get install subversion
mkdir llvm
cd llvm
svn co http://llvm.org/svn/llvm-project/llvm/tags/RELEASE_380/final llvm
cd llvm/tools
svn co http://llvm.org/svn/llvm-project/cfe/tags/RELEASE_380/final clang
cd ..
cd projects/
svn co http://llvm.org/svn/llvm-project/compiler-rt/tags/RELEASE_380/final compiler-rt
cd ../..
mkdir build
cd build/
cmake -D CMAKE_BUILD_TYPE:STRING=Release ../llvm
make -j $THREADS
sudo make install
hash clang
# You may also build ClickHouse with clang for development purposes.
# For production releases, GCC is used.
# Run the release script.
rm -f ../clickhouse*.deb
./release
# debsign and dupload will not work by default.
# That is OK. You will find the built packages in the parent directory.
# ls -l ../clickhouse*.deb
# Note that using Debian packages is not required.
# ClickHouse has no runtime dependencies except libc,
# so it can work on almost any Linux.
# Install the newly built packages on a development server.
sudo dpkg -i ../clickhouse*.deb
sudo service clickhouse-server start
# 2. Build for working with the code.
#
# mkdir build
# cd build
# cmake ..
# make -j $THREADS
# cd ..

doc/drafts/site.txt Normal file

@ -0,0 +1,85 @@
ClickHouse is a free column-oriented DBMS for big data.
ClickHouse powers Yandex.Metrica - the second largest web analytics system in the world.
In Yandex.Metrica, all incoming data is ingested into ClickHouse in real time (about 20 billion events each day).
Currently, Yandex.Metrica has more than 13 trillion records in its ClickHouse-powered database.
It is used for fully customizable reports that are generated on the fly, directly from non-aggregated data.
Yandex.Metrica allows customers to slice and dice data in every detail, even for huge traffic sites, with instant results.
ClickHouse is the only open-source system that is capable of doing this kind of thing.
Big Data
Linearly scalable
ClickHouse allows adding servers to a cluster when necessary.
For example, in Yandex.Metrica, the main cluster has grown from 60 to 394 servers in two years.
The servers are placed in six geographically distributed datacenters.
ClickHouse uses as much of the available hardware as it can to process queries as fast as possible.
We achieve peak performance of more than 2 terabytes per second for a single query (data after decompression, only used columns).
ClickHouse scales well both vertically and horizontally.
We have installations with more than two trillion rows per node, and other installations with 100 TB of storage per node.
Efficient use of hardware
ClickHouse is space and time efficient. All data is stored compressed. Compression works surprisingly well, thanks to the column store.
ClickHouse constantly maintains data locality for loaded data. This minimizes the number of seeks for range queries, so ClickHouse works fine on cheap rotational drives.
ClickHouse is also CPU efficient,
ClickHouse is using IO throughput in
Fast
We are proud of the high performance of ClickHouse. Query processing throughput per single server usually ranges from hundreds of millions to more than a billion rows per second, and up to more than tens of gigabytes per second. It's hard to believe that it is possible to process data at such high rates. But you don't need to believe, because ClickHouse actually does that.
In our performance tests, ClickHouse works a few times faster than the best available commercial column-oriented DBMS.
Fault tolerance
Feature rich
Simple and handy
Stable
Opens new possibilities
True column-oriented
Vectorized query execution
Data compression
Parallel and distributed query execution
Realtime data ingestion
On-disk data locality
Online query processing
Cross-datacenter replication
High availability
SQL support
Support for approximate query processing
Sketching data structures
Full support of IPv6
Features for web analytics
State-of-the-art algorithms
Clean documented code
Web and application analytics
Advertisement networks and RTB
Telecommunications analytics
E-commerce analytics
Analytics for information security
Monitoring and telemetry
Business intelligence
Analytics for Internet of Things