Merge pull request #55430 from arthurpassos/properly_split_remote_proxy_http_https

Properly split remote proxy http https
This commit is contained in:
Alexander Tokmakov 2023-10-20 21:15:34 +02:00 committed by GitHub
commit f899254e2c
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
14 changed files with 291 additions and 188 deletions

View File

@ -2630,3 +2630,114 @@ Possible values:
- 1 — Enabled.
Default value: 0.
## proxy {#proxy}
Define proxy servers for HTTP and HTTPS requests, currently supported by S3 storage, S3 table functions, and URL functions.
There are three ways to define proxy servers: environment variables, proxy lists, and remote proxy resolvers.
### Environment variables
The `http_proxy` and `https_proxy` environment variables allow you to specify a
proxy server for a given protocol. If you have it set on your system, it should work seamlessly.
This is the simplest approach if a given protocol has
only one proxy server and that proxy server doesn't change.
### Proxy lists
This approach allows you to specify one or more
proxy servers for a protocol. If more than one proxy server is defined,
ClickHouse uses the different proxies on a round-robin basis, balancing the
load across the servers. This is the simplest approach if there is more than
one proxy server for a protocol and the list of proxy servers doesn't change.
### Configuration template
``` xml
<proxy>
<http>
<uri>http://proxy1</uri>
<uri>http://proxy2:3128</uri>
</http>
<https>
<uri>http://proxy1:3128</uri>
</https>
</proxy>
```
`<proxy>` fields
* `<http>` - A list of one or more HTTP proxies
* `<https>` - A list of one or more HTTPS proxies
`<http>` and `<https>` fields
* `<uri>` - The URI of the proxy
### Remote proxy resolvers
It's possible that the proxy servers change dynamically. In that
case, you can define the endpoint of a resolver. ClickHouse sends
an empty GET request to that endpoint, the remote resolver should return the proxy host.
ClickHouse will use it to form the proxy URI using the following template: `{proxy_scheme}://{proxy_host}:{proxy_port}`
### Configuration template
``` xml
<proxy>
<http>
<resolver>
<endpoint>http://resolver:8080/hostname</endpoint>
<proxy_scheme>http</proxy_scheme>
<proxy_port>80</proxy_port>
<proxy_cache_time>10</proxy_cache_time>
</resolver>
</http>
<https>
<resolver>
<endpoint>http://resolver:8080/hostname</endpoint>
<proxy_scheme>http</proxy_scheme>
<proxy_port>3128</proxy_port>
<proxy_cache_time>10</proxy_cache_time>
</resolver>
</https>
</proxy>
```
`<proxy>` fields
* `<http>` - A list of one or more resolvers*
* `<https>` - A list of one or more resolvers*
`<http>` and `<https>` fields
* `<resolver>` - The endpoint and other details for a resolver.
You can have multiple `<resolver>` elements, but only the first
`<resolver>` for a given protocol is used. Any other `<resolver>`
elements for that protocol are ignored. That means load balancing
(if needed) should be implemented by the remote resolver.
`<resolver>` fields
* `<endpoint>` - The URI of the proxy resolver
* `<proxy_scheme>` - The protocol of the final proxy URI. This can be either `http` or `https`.
* `<proxy_port>` - The port number of the proxy resolver
* `<proxy_cache_time>` - The time in seconds that values from the resolver
should be cached by ClickHouse. Setting this value to `0` causes ClickHouse
to contact the resolver for every HTTP or HTTPS request.
### Precedence
Proxy settings are determined in the following order:
1. Remote proxy resolvers
2. Proxy lists
3. Environment variables
ClickHouse will check the highest priority resolver type for the request protocol. If it is not defined,
it will check the next highest priority resolver type, until it reaches the environment resolver.
This also allows a mix of resolver types can be used.

View File

@ -25,25 +25,12 @@ namespace
* getenv is safe to use here because ClickHouse code does not make any call to `setenv` or `putenv`
* aside from tests and a very early call during startup: https://github.com/ClickHouse/ClickHouse/blob/master/src/Daemon/BaseDaemon.cpp#L791
* */
if (protocol == DB::ProxyConfiguration::Protocol::HTTP)
switch (protocol)
{
return std::getenv(PROXY_HTTP_ENVIRONMENT_VARIABLE); // NOLINT(concurrency-mt-unsafe)
}
else if (protocol == DB::ProxyConfiguration::Protocol::HTTPS)
{
return std::getenv(PROXY_HTTPS_ENVIRONMENT_VARIABLE); // NOLINT(concurrency-mt-unsafe)
}
else
{
if (const char * http_proxy_host = std::getenv(PROXY_HTTP_ENVIRONMENT_VARIABLE)) // NOLINT(concurrency-mt-unsafe)
{
return http_proxy_host;
}
else
{
case ProxyConfiguration::Protocol::HTTP:
return std::getenv(PROXY_HTTP_ENVIRONMENT_VARIABLE); // NOLINT(concurrency-mt-unsafe)
case ProxyConfiguration::Protocol::HTTPS:
return std::getenv(PROXY_HTTPS_ENVIRONMENT_VARIABLE); // NOLINT(concurrency-mt-unsafe)
}
}
}
}

View File

@ -1,17 +1,23 @@
#pragma once
#include <string>
#include <Common/Exception.h>
namespace DB
{
namespace ErrorCodes
{
extern const int BAD_ARGUMENTS;
}
struct ProxyConfiguration
{
enum class Protocol
{
HTTP,
HTTPS,
ANY
HTTPS
};
static auto protocolFromString(const std::string & str)
@ -24,10 +30,8 @@ struct ProxyConfiguration
{
return Protocol::HTTPS;
}
else
{
return Protocol::ANY;
}
throw Exception(ErrorCodes::BAD_ARGUMENTS, "Unknown proxy protocol: {}", str);
}
static auto protocolToString(Protocol protocol)
@ -38,8 +42,6 @@ struct ProxyConfiguration
return "http";
case Protocol::HTTPS:
return "https";
case Protocol::ANY:
return "any";
}
}

View File

@ -20,12 +20,13 @@ namespace
std::shared_ptr<ProxyConfigurationResolver> getRemoteResolver(
const String & config_prefix, const Poco::Util::AbstractConfiguration & configuration)
{
auto endpoint = Poco::URI(configuration.getString(config_prefix + ".endpoint"));
auto proxy_scheme = configuration.getString(config_prefix + ".proxy_scheme");
auto resolver_prefix = config_prefix + ".resolver";
auto endpoint = Poco::URI(configuration.getString(resolver_prefix + ".endpoint"));
auto proxy_scheme = configuration.getString(resolver_prefix + ".proxy_scheme");
if (proxy_scheme != "http" && proxy_scheme != "https")
throw Exception(ErrorCodes::BAD_ARGUMENTS, "Only HTTP/HTTPS schemas allowed in proxy resolver config: {}", proxy_scheme);
auto proxy_port = configuration.getUInt(config_prefix + ".proxy_port");
auto cache_ttl = configuration.getUInt(config_prefix + ".proxy_cache_time", 10);
auto proxy_port = configuration.getUInt(resolver_prefix + ".proxy_port");
auto cache_ttl = configuration.getUInt(resolver_prefix + ".proxy_cache_time", 10);
LOG_DEBUG(&Poco::Logger::get("ProxyConfigurationResolverProvider"), "Configured remote proxy resolver: {}, Scheme: {}, Port: {}",
endpoint.toString(), proxy_scheme, proxy_port);
@ -33,31 +34,6 @@ namespace
return std::make_shared<RemoteProxyConfigurationResolver>(endpoint, proxy_scheme, proxy_port, cache_ttl);
}
std::shared_ptr<ProxyConfigurationResolver> getRemoteResolver(
ProxyConfiguration::Protocol protocol, const String & config_prefix, const Poco::Util::AbstractConfiguration & configuration)
{
std::vector<String> keys;
configuration.keys(config_prefix, keys);
std::vector<Poco::URI> uris;
for (const auto & key : keys)
{
if (startsWith(key, "resolver"))
{
auto prefix_with_key = config_prefix + "." + key;
auto proxy_scheme_config_string = prefix_with_key + ".proxy_scheme";
auto config_protocol = configuration.getString(proxy_scheme_config_string);
if (ProxyConfiguration::Protocol::ANY == protocol || config_protocol == ProxyConfiguration::protocolToString(protocol))
{
return getRemoteResolver(prefix_with_key, configuration);
}
}
}
return nullptr;
}
auto extractURIList(const String & config_prefix, const Poco::Util::AbstractConfiguration & configuration)
{
std::vector<String> keys;
@ -84,34 +60,7 @@ namespace
return uris;
}
std::shared_ptr<ProxyConfigurationResolver> getListResolverNewSyntax(
ProxyConfiguration::Protocol protocol,
const String & config_prefix,
const Poco::Util::AbstractConfiguration & configuration
)
{
std::vector<Poco::URI> uris;
bool include_http_uris = ProxyConfiguration::Protocol::ANY == protocol || ProxyConfiguration::Protocol::HTTP == protocol;
if (include_http_uris && configuration.has(config_prefix + ".http"))
{
auto http_uris = extractURIList(config_prefix + ".http", configuration);
uris.insert(uris.end(), http_uris.begin(), http_uris.end());
}
bool include_https_uris = ProxyConfiguration::Protocol::ANY == protocol || ProxyConfiguration::Protocol::HTTPS == protocol;
if (include_https_uris && configuration.has(config_prefix + ".https"))
{
auto https_uris = extractURIList(config_prefix + ".https", configuration);
uris.insert(uris.end(), https_uris.begin(), https_uris.end());
}
return uris.empty() ? nullptr : std::make_shared<ProxyListConfigurationResolver>(uris);
}
std::shared_ptr<ProxyConfigurationResolver> getListResolverOldSyntax(
std::shared_ptr<ProxyConfigurationResolver> getListResolver(
const String & config_prefix,
const Poco::Util::AbstractConfiguration & configuration
)
@ -121,29 +70,58 @@ namespace
return uris.empty() ? nullptr : std::make_shared<ProxyListConfigurationResolver>(uris);
}
std::shared_ptr<ProxyConfigurationResolver> getListResolver(
ProxyConfiguration::Protocol protocol, const String & config_prefix, const Poco::Util::AbstractConfiguration & configuration
bool hasRemoteResolver(const String & config_prefix, const Poco::Util::AbstractConfiguration & configuration)
{
return configuration.has(config_prefix + ".resolver");
}
bool hasListResolver(const String & config_prefix, const Poco::Util::AbstractConfiguration & configuration)
{
return configuration.has(config_prefix + ".uri");
}
/*
* New syntax requires protocol prefix "<http> or <https>"
* */
std::optional<std::string> getProtocolPrefix(
ProxyConfiguration::Protocol request_protocol,
const String & config_prefix,
const Poco::Util::AbstractConfiguration & configuration
)
{
std::vector<String> keys;
configuration.keys(config_prefix, keys);
auto protocol_prefix = config_prefix + "." + ProxyConfiguration::protocolToString(request_protocol);
if (!configuration.has(protocol_prefix))
{
return std::nullopt;
}
bool new_setting_syntax = std::find_if(
keys.begin(),
keys.end(),
[](const String & key)
{
return startsWith(key, "http") || startsWith(key, "https");
}) != keys.end();
return protocol_prefix;
}
return new_setting_syntax ? getListResolverNewSyntax(protocol, config_prefix, configuration)
: getListResolverOldSyntax(config_prefix, configuration);
template <bool new_syntax>
std::optional<std::string> calculatePrefixBasedOnSettingsSyntax(
ProxyConfiguration::Protocol request_protocol,
const String & config_prefix,
const Poco::Util::AbstractConfiguration & configuration
)
{
if (!configuration.has(config_prefix))
{
return std::nullopt;
}
if constexpr (new_syntax)
{
return getProtocolPrefix(request_protocol, config_prefix, configuration);
}
return config_prefix;
}
}
std::shared_ptr<ProxyConfigurationResolver> ProxyConfigurationResolverProvider::get(Protocol protocol, const Poco::Util::AbstractConfiguration & configuration)
{
if (auto resolver = getFromSettings(protocol, "", configuration))
if (auto resolver = getFromSettings(protocol, "proxy", configuration))
{
return resolver;
}
@ -151,34 +129,36 @@ std::shared_ptr<ProxyConfigurationResolver> ProxyConfigurationResolverProvider::
return std::make_shared<EnvironmentProxyConfigurationResolver>(protocol);
}
template <bool is_new_syntax>
std::shared_ptr<ProxyConfigurationResolver> ProxyConfigurationResolverProvider::getFromSettings(
Protocol protocol,
Protocol request_protocol,
const String & config_prefix,
const Poco::Util::AbstractConfiguration & configuration
)
{
auto proxy_prefix = config_prefix.empty() ? "proxy" : config_prefix + ".proxy";
auto prefix_opt = calculatePrefixBasedOnSettingsSyntax<is_new_syntax>(request_protocol, config_prefix, configuration);
if (configuration.has(proxy_prefix))
if (!prefix_opt)
{
std::vector<String> config_keys;
configuration.keys(proxy_prefix, config_keys);
return nullptr;
}
if (auto remote_resolver = getRemoteResolver(protocol, proxy_prefix, configuration))
{
return remote_resolver;
}
auto prefix = *prefix_opt;
if (auto list_resolver = getListResolver(protocol, proxy_prefix, configuration))
{
return list_resolver;
}
if (hasRemoteResolver(prefix, configuration))
{
return getRemoteResolver(prefix, configuration);
}
else if (hasListResolver(prefix, configuration))
{
return getListResolver(prefix, configuration);
}
return nullptr;
}
std::shared_ptr<ProxyConfigurationResolver> ProxyConfigurationResolverProvider::getFromOldSettingsFormat(
Protocol request_protocol,
const String & config_prefix,
const Poco::Util::AbstractConfiguration & configuration
)
@ -187,7 +167,7 @@ std::shared_ptr<ProxyConfigurationResolver> ProxyConfigurationResolverProvider::
* First try to get it from settings only using the combination of config_prefix and configuration.
* This logic exists for backward compatibility with old S3 storage specific proxy configuration.
* */
if (auto resolver = ProxyConfigurationResolverProvider::getFromSettings(Protocol::ANY, config_prefix, configuration))
if (auto resolver = ProxyConfigurationResolverProvider::getFromSettings<false>(request_protocol, config_prefix + ".proxy", configuration))
{
return resolver;
}
@ -196,7 +176,7 @@ std::shared_ptr<ProxyConfigurationResolver> ProxyConfigurationResolverProvider::
* In case the combination of config_prefix and configuration does not provide a resolver, try to get it from general / new settings.
* Falls back to Environment resolver if no configuration is found.
* */
return ProxyConfigurationResolverProvider::get(Protocol::ANY, configuration);
return ProxyConfigurationResolverProvider::get(request_protocol, configuration);
}
}

View File

@ -27,11 +27,13 @@ public:
* If no configuration is found, returns nullptr.
* */
static std::shared_ptr<ProxyConfigurationResolver> getFromOldSettingsFormat(
Protocol request_protocol,
const String & config_prefix,
const Poco::Util::AbstractConfiguration & configuration
);
private:
template <bool is_new_syntax = true>
static std::shared_ptr<ProxyConfigurationResolver> getFromSettings(
Protocol protocol,
const String & config_prefix,

View File

@ -120,4 +120,78 @@ TEST_F(ProxyConfigurationResolverProviderTests, ListBoth)
ASSERT_EQ(https_proxy_configuration.port, https_list_proxy_server.getPort());
}
TEST_F(ProxyConfigurationResolverProviderTests, RemoteResolverIsBasedOnProtocolConfigurationHTTP)
{
/*
* Since there is no way to call `ProxyConfigurationResolver::resolve` on remote resolver,
* it is hard to verify the remote resolver was actually picked. One hackish way to assert
* the remote resolver was OR was not picked based on the configuration, is to use the
* environment resolver. Since the environment resolver is always returned as a fallback,
* we can assert the remote resolver was not picked if `ProxyConfigurationResolver::resolve`
* succeeds and returns an environment proxy configuration.
* */
EnvironmentProxySetter setter(http_env_proxy_server, https_env_proxy_server);
ConfigurationPtr config = Poco::AutoPtr(new Poco::Util::MapConfiguration());
config->setString("proxy", "");
config->setString("proxy.https", "");
config->setString("proxy.https.resolver", "");
config->setString("proxy.https.resolver.endpoint", "http://resolver:8080/hostname");
// even tho proxy protocol / scheme is http, it should not be picked (prior to this PR, it would be picked)
config->setString("proxy.https.resolver.proxy_scheme", "http");
config->setString("proxy.https.resolver.proxy_port", "80");
config->setString("proxy.https.resolver.proxy_cache_time", "10");
context->setConfig(config);
auto http_proxy_configuration = DB::ProxyConfigurationResolverProvider::get(DB::ProxyConfiguration::Protocol::HTTP, *config)->resolve();
/*
* Asserts env proxy is used and not the remote resolver. If the remote resolver is picked, it is an error because
* there is no `http` specification for remote resolver
* */
ASSERT_EQ(http_proxy_configuration.host, http_env_proxy_server.getHost());
ASSERT_EQ(http_proxy_configuration.port, http_env_proxy_server.getPort());
ASSERT_EQ(http_proxy_configuration.protocol, DB::ProxyConfiguration::protocolFromString(http_env_proxy_server.getScheme()));
}
TEST_F(ProxyConfigurationResolverProviderTests, RemoteResolverIsBasedOnProtocolConfigurationHTTPS)
{
/*
* Since there is no way to call `ProxyConfigurationResolver::resolve` on remote resolver,
* it is hard to verify the remote resolver was actually picked. One hackish way to assert
* the remote resolver was OR was not picked based on the configuration, is to use the
* environment resolver. Since the environment resolver is always returned as a fallback,
* we can assert the remote resolver was not picked if `ProxyConfigurationResolver::resolve`
* succeeds and returns an environment proxy configuration.
* */
EnvironmentProxySetter setter(http_env_proxy_server, https_env_proxy_server);
ConfigurationPtr config = Poco::AutoPtr(new Poco::Util::MapConfiguration());
config->setString("proxy", "");
config->setString("proxy.http", "");
config->setString("proxy.http.resolver", "");
config->setString("proxy.http.resolver.endpoint", "http://resolver:8080/hostname");
// even tho proxy protocol / scheme is https, it should not be picked (prior to this PR, it would be picked)
config->setString("proxy.http.resolver.proxy_scheme", "https");
config->setString("proxy.http.resolver.proxy_port", "80");
config->setString("proxy.http.resolver.proxy_cache_time", "10");
context->setConfig(config);
auto http_proxy_configuration = DB::ProxyConfigurationResolverProvider::get(DB::ProxyConfiguration::Protocol::HTTPS, *config)->resolve();
/*
* Asserts env proxy is used and not the remote resolver. If the remote resolver is picked, it is an error because
* there is no `http` specification for remote resolver
* */
ASSERT_EQ(http_proxy_configuration.host, https_env_proxy_server.getHost());
ASSERT_EQ(http_proxy_configuration.port, https_env_proxy_server.getPort());
ASSERT_EQ(http_proxy_configuration.protocol, DB::ProxyConfiguration::protocolFromString(https_env_proxy_server.getScheme()));
}
// remote resolver is tricky to be tested in unit tests

View File

@ -57,40 +57,3 @@ TEST(EnvironmentProxyConfigurationResolver, TestHTTPsNoEnv)
ASSERT_EQ(configuration.protocol, DB::ProxyConfiguration::Protocol::HTTP);
ASSERT_EQ(configuration.port, 0u);
}
TEST(EnvironmentProxyConfigurationResolver, TestANYHTTP)
{
EnvironmentProxySetter setter(http_proxy_server, {});
DB::EnvironmentProxyConfigurationResolver resolver(DB::ProxyConfiguration::Protocol::ANY);
auto configuration = resolver.resolve();
ASSERT_EQ(configuration.host, http_proxy_server.getHost());
ASSERT_EQ(configuration.port, http_proxy_server.getPort());
ASSERT_EQ(configuration.protocol, DB::ProxyConfiguration::protocolFromString(http_proxy_server.getScheme()));
}
TEST(EnvironmentProxyConfigurationResolver, TestANYHTTPS)
{
EnvironmentProxySetter setter({}, https_proxy_server);
DB::EnvironmentProxyConfigurationResolver resolver(DB::ProxyConfiguration::Protocol::ANY);
auto configuration = resolver.resolve();
ASSERT_EQ(configuration.host, https_proxy_server.getHost());
ASSERT_EQ(configuration.port, https_proxy_server.getPort());
ASSERT_EQ(configuration.protocol, DB::ProxyConfiguration::protocolFromString(https_proxy_server.getScheme()));
}
TEST(EnvironmentProxyConfigurationResolver, TestANYNoEnv)
{
DB::EnvironmentProxyConfigurationResolver resolver(DB::ProxyConfiguration::Protocol::ANY);
auto configuration = resolver.resolve();
ASSERT_EQ(configuration.host, "");
ASSERT_EQ(configuration.protocol, DB::ProxyConfiguration::Protocol::HTTP);
ASSERT_EQ(configuration.port, 0u);
}

View File

@ -71,7 +71,11 @@ std::unique_ptr<S3::Client> getClient(
/*
* Override proxy configuration for backwards compatibility with old configuration format.
* */
auto proxy_config = DB::ProxyConfigurationResolverProvider::getFromOldSettingsFormat(config_prefix, config);
auto proxy_config = DB::ProxyConfigurationResolverProvider::getFromOldSettingsFormat(
ProxyConfiguration::protocolFromString(uri.uri.getScheme()),
config_prefix,
config
);
if (proxy_config)
{
client_configuration.per_request_configuration

View File

@ -5,4 +5,4 @@
<uri>http://proxy2</uri>
</http>
</proxy>
</clickhouse>
</clickhouse>

View File

@ -1,14 +0,0 @@
import random
import bottle
@bottle.route("/hostname")
def index():
if random.randrange(2) == 0:
return "proxy1"
else:
return "proxy2"
bottle.run(host="0.0.0.0", port=8080)

View File

@ -5,11 +5,13 @@
Proxy host is returned as string in response body.
Then S3 client uses proxy URL formed as proxy_scheme://proxy_host:proxy_port to make request.
-->
<resolver>
<endpoint>http://resolver:8080/hostname</endpoint>
<proxy_scheme>http</proxy_scheme>
<proxy_port>80</proxy_port>
<proxy_cache_time>10</proxy_cache_time>
</resolver>
<http>
<resolver>
<endpoint>http://resolver:8080/hostname</endpoint>
<proxy_scheme>http</proxy_scheme>
<proxy_port>80</proxy_port>
<proxy_cache_time>10</proxy_cache_time>
</resolver>
</http>
</proxy>
</clickhouse>

View File

@ -1,22 +1,12 @@
<clickhouse>
<proxy>
<!--
At each interaction with S3 resolver sends empty GET request to specified endpoint URL to obtain proxy host.
Proxy host is returned as string in response body.
Then S3 client uses proxy URL formed as proxy_scheme://proxy_host:proxy_port to make request.
-->
<resolver>
<endpoint>http://resolver:8080/hostname</endpoint>
<proxy_scheme>http</proxy_scheme>
<proxy_port>80</proxy_port>
<proxy_cache_time>10</proxy_cache_time>
</resolver>
<resolver>
<endpoint>http://resolver:8080/hostname</endpoint>
<proxy_scheme>https</proxy_scheme>
<proxy_port>443</proxy_port>
<proxy_cache_time>10</proxy_cache_time>
</resolver>
<https>
<resolver>
<endpoint>http://resolver:8080/hostname</endpoint>
<proxy_scheme>https</proxy_scheme>
<proxy_port>443</proxy_port>
<proxy_cache_time>10</proxy_cache_time>
</resolver>
</https>
</proxy>
</clickhouse>

View File

@ -41,6 +41,7 @@ def cluster():
env_variables={
"https_proxy": "https://proxy1",
},
instance_env_variables=True,
)
logging.info("Starting cluster...")

View File

@ -2147,6 +2147,7 @@ repo
representable
requestor
requireTLSv
resolvers
resharding
reshards
resultset