惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

The GitHub Blog
The GitHub Blog
Microsoft Azure Blog
Microsoft Azure Blog
让小产品的独立变现更简单 - ezindie.com
让小产品的独立变现更简单 - ezindie.com
V
Vulnerabilities – Threatpost
人人都是产品经理
人人都是产品经理
小众软件
小众软件
O
OpenAI News
量子位
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
C
Cyber Attacks, Cyber Crime and Cyber Security
N
Netflix TechBlog - Medium
J
Java Code Geeks
D
Darknet – Hacking Tools, Hacker News & Cyber Security
Martin Fowler
Martin Fowler
S
SegmentFault 最新的问题
I
InfoQ
V
Visual Studio Blog
C
Cisco Blogs
T
Threat Research - Cisco Blogs
K
Kaspersky official blog
CTFtime.org: upcoming CTF events
CTFtime.org: upcoming CTF events
OSCHINA 社区最新新闻
OSCHINA 社区最新新闻
Scott Helme
Scott Helme
大猫的无限游戏
大猫的无限游戏
C
Cybersecurity and Infrastructure Security Agency CISA
P
Proofpoint News Feed
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
T
The Exploit Database - CXSecurity.com
Y
Y Combinator Blog
宝玉的分享
宝玉的分享
酷 壳 – CoolShell
酷 壳 – CoolShell
T
The Blog of Author Tim Ferriss
T
Tor Project blog
Spread Privacy
Spread Privacy
T
Threatpost
S
Schneier on Security
A
Arctic Wolf
The Cloudflare Blog
cs.CL updates on arXiv.org
cs.CL updates on arXiv.org
C
CXSECURITY Database RSS Feed - CXSecurity.com
Simon Willison's Weblog
Simon Willison's Weblog
Apple Machine Learning Research
Apple Machine Learning Research
D
DataBreaches.Net
D
Docker
G
GRAHAM CLULEY
S
Securelist
P
Palo Alto Networks Blog
阮一峰的网络日志
阮一峰的网络日志
L
LINUX DO - 热门话题
P
Privacy International News Feed

CMTOPS.DEV

CMTOPS.DEV CMTOPS.DEV CMTOPS.DEV CMTOPS.DEV CMTOPS.DEV
CMTOPS.DEV
Timofey Chuchkanov · 2025-01-19 · via CMTOPS.DEV

A Cleaner Approach to IP-based Geolocation in ClickHouse

~4 min read

Updated on

This is a short note on what I found to be a better way of implementing IP-based geolocation in ClickHouse. It may be not for everyone.

Contents

  • Intro
  • Datasets Used
  • Implementation
    • Server Configuration
    • Dictionary
    • Custom Lookup Functions
  • Usage
    • lookupIPv4
    • lookupIPv4All
  • The Result

Intro

Since I use ClickHouse for building observability at my company, there are some situations when IP geolocation is needed.

There’s a really great article on the ClickHouse blog explaining in detail how this can be approached.

It’s a good start, but I decided to make my own flavor by adding country names from another dataset in addition to country codes, and squash everything into a single Dictionary definition to eliminate the hustle of maintaining any possible tables backing the dictionary.

Datasets Used

Implementation

Server Configuration

By default, ClickHouse expects the default user to be available, when it uses the CLICKHOUSE SOURCE for a dictionary.

For improved security, I usually disable this user, and create separate users as needed, each protected by a password.

  1. Disable the default user

    Create this file at /etc/clickhouse-server/users.d/default.xml

    <clickhouse>
      <users>
          <default remove="remove"></default>
      </users>
    </clickhouse>
  2. Create a user for dictionaries

    /etc/clickhouse-server/users.d/dict_updater.xml

    <clickhouse>
      <users>
          <dict_updater>
              <networks>
                  <ip>127.0.0.1/32</ip>
              </networks>
              <password_sha256_hex>REPLACEME</password_sha256_hex>
          </dict_updater>
      </users>
    </clickhouse>

    Where REPLACEME is a SHA256 hash. It can be generated using openssl:

    $

    (read P; echo -n $P | openssl dgst -sha256) # Use the `read` builtin with an `-s` option to prevent echoing if your shell supports this

Dictionary

I like to avoid using the default database, and use a separate database for geolocation and other supplementary data.

CREATE DATABASE IF NOT EXISTS meta;

Now, create the Dictionary itself. As of the time writing, I update geolocation data every 2 weeks, so LIFETIME is set to 1209600 seconds. Don’t forget to replace REPLACEME with your password. It should be the password itself this time, not its hash.

If you are using an interactive ClickHouse client, make sure it doesn’t store command history for the query below. Otherwise, it will leak your dict_updater password.

I usually just connect with the --history-file /dev/null option using clickhouse client for that purpose.

With city and coords

After using this for some time, I noticed that the geoip city database doesn’t match some clients like they don’t have a country at all.

If you need only the country codes, the diff for the new version is in the collapsible below this one.

CREATE DICTIONARY IF NOT EXISTS meta.ipv4_geo
(
    cidr          String,
    country_code  String,
    country_name  String,
    city          String,
    latitude      Float64,
    longitude     Float64
)
PRIMARY KEY cidr
SOURCE(CLICKHOUSE(
    user 'dict_updater'
    password 'REPLACEME'
    query '
        SELECT cidr, country_code, country_name, city, latitude, longitude FROM
        (
            WITH
                bitXor(ip_range_start, ip_range_end) AS xor,
                IF(xor != 0, CEIL(LOG2(xor)), 0) AS unmatched,
                32 - unmatched AS cidr_suffix,
                toIPv4(bitAnd(bitNot(POW(2, unmatched) - 1), ip_range_start)::UInt64) AS cidr_address
            SELECT
                *,
                CONCAT(toString(cidr_address), \'/\', toString(cidr_suffix)) AS cidr
            FROM
            (
                SELECT
                    *
                FROM
                    url(
                        \'https://raw.githubusercontent.com/sapics/ip-location-db/master/dbip-city/dbip-city-ipv4.csv.gz\',
                        \'CSV\',
                        \'
                            ip_range_start IPv4,
                            ip_range_end   IPv4,
                            country_code   String,
                            state1         String,
                            state2         String,
                            city           String,
                            postcode       String,
                            latitude       Float64,
                            longitude      Float64
                        \'
                    )
            )
        ) AS geoip
        JOIN
        (
            SELECT Code as country_code, Name as country_name
            FROM
            (
                SELECT
                    *
                FROM
                    url(
                        \'https://raw.githubusercontent.com/annexare/Countries/refs/heads/main/dist/countries.csv\',
                        \'CSVWithNames\',
                        \'
                            Code String,
                            Name String
                        \'
                    )
            )
        ) AS countries
        ON geoip.country_code = countries.country_code
    '
))
LAYOUT(ip_trie)
LIFETIME(1209600);
Country code and name only (diff)
5,8c5
<     country_name  String,
<     city          String,
<     latitude      Float64,
<     longitude     Float64
---
>     country_name  String
15c12
<         SELECT cidr, country_code, country_name, city, latitude, longitude FROM
---
>         SELECT cidr, country_code, country_name FROM
31c28
<                         \'https://raw.githubusercontent.com/sapics/ip-location-db/master/dbip-city/dbip-city-ipv4.csv.gz\',
---
>                         \'https://raw.githubusercontent.com/sapics/ip-location-db/refs/heads/main/asn-country/asn-country-ipv4.csv\',
36,42c33
<                             country_code   String,
<                             state1         String,
<                             state2         String,
<                             city           String,
<                             postcode       String,
<                             latitude       Float64,
<                             longitude      Float64
---
>                             country_code   String

Custom Lookup Functions

To make your life easier and queries cleaner, create a couple of custom functions for performing IPv4 lookups:

CREATE FUNCTION IF NOT EXISTS lookupIPv4 AS
    (address, selection) -> dictGet('meta.ipv4_geo', selection, tuple(IPv4StringToNumOrDefault(address)));

CREATE FUNCTION IF NOT EXISTS lookupIPv4All AS
    (address) -> dictGet('meta.ipv4_geo',
                         tuple('country_code', 'country_name', 'city', 'latitude', 'longitude'),
                         tuple(IPv4StringToNumOrDefault(address)));

Usage

lookupIPv4

If an IP address is invalid, or there’s no data for this address, default values of each data type are returned.

(String, String or Tuple) -> Tuple or any value

select lookupIPv4('172.67.199.23', 'country_code');
select lookupIPv4('172.67.199.23', ('country_name', 'longitude', 'latitude'));

lookupIPv4All

The behavior is same is above.

(String) -> Tuple

select lookupIPv4All('172.67.199.23');

The Result

Now you have an automatically updating in-memory dictionary that provides you with geoip data. Functions lookupIPv4 and lookupIPv4All can make queries cleaner.

The dictionary will be populated when you first try to access a value from it, so the query will take a long time. Consequent queries will be very fast.

If you restart the server, the dictionary will be wiped from memory.

Even though this covers IPv4 only, I think people will come up with a similar solution for IPv6.


Thank you for reading, and have a good rest of your day! (^ ~ ^ )

If you have any questions/suggestions, or found an error, contact me!