惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

F
Full Disclosure
WordPress大学
WordPress大学
小众软件
小众软件
Cloudbric
Cloudbric
AWS News Blog
AWS News Blog
腾讯CDC
量子位
人人都是产品经理
人人都是产品经理
大猫的无限游戏
大猫的无限游戏
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
V
Vulnerabilities – Threatpost
Scott Helme
Scott Helme
Hugging Face - Blog
Hugging Face - Blog
博客园_首页
C
CXSECURITY Database RSS Feed - CXSecurity.com
The Hacker News
The Hacker News
奇客Solidot–传递最新科技情报
奇客Solidot–传递最新科技情报
IT之家
IT之家
Jina AI
Jina AI
Attack and Defense Labs
Attack and Defense Labs
S
SegmentFault 最新的问题
Simon Willison's Weblog
Simon Willison's Weblog
The Cloudflare Blog
阮一峰的网络日志
阮一峰的网络日志
T
Tailwind CSS Blog
Last Week in AI
Last Week in AI
博客园 - 【当耐特】
Google Online Security Blog
Google Online Security Blog
美团技术团队
OSCHINA 社区最新新闻
OSCHINA 社区最新新闻
V
Visual Studio Blog
罗磊的独立博客
L
LINUX DO - 最新话题
博客园 - Franky
博客园 - 叶小钗
Apple Machine Learning Research
Apple Machine Learning Research
The Last Watchdog
The Last Watchdog
J
Java Code Geeks
AI
AI
C
Cisco Blogs
酷 壳 – CoolShell
酷 壳 – CoolShell
C
Cyber Attacks, Cyber Crime and Cyber Security
Cisco Talos Blog
Cisco Talos Blog
博客园 - 三生石上(FineUI控件)
雷峰网
雷峰网
Help Net Security
Help Net Security
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
云风的 BLOG
云风的 BLOG
I
Intezer
S
Securelist

Home on Alex Plescan

Just for fun: animating a mosaic of 90s GIFs Two computers, one monitor, zero fiddling Placeholder names should be bad and unique Rebuilding this site Okay, I really like WezTerm GNU Parallel, where have you been all my life? Easy SVG sparklines Using Declarative Shadow DOM to embed HTML emails on a web page Selling SaaS on Gumroad PDF: The Conjoined Triangles of Success Deploying Metabase to Fly.io The ".x" Files Xcode 8 managed signing: adding new device UUIDs to a provisioning profile Emojify your Wi-Fi (Netgear R6300 edition) How to use the San Francisco Mono typeface before macOS Sierra is released Disabling App Transport Security in your development environment Swift: A nicer way to tell if your app is running in Debug mode Development environment config overrides in Jekyll Setting up SwiftLint on Travis CI
Timeseries with PostgreSQL
2023-07-15 · via Home on Alex Plescan

After publishing Easy SVG Sparklines last week, I received a couple of questions about how the timeseries data displayed on Mailgrip’s charts is generated.

tl;dr: by querying PostgreSQL, leveraging its set returning functions, and taking a pragmatic stance on performance.

And now, for the longer answer:

Let’s go with the usecase I had for Mailgrip. Generating a timeseries representing how many emails were received by an inbox over a given time period.

You could reach for a dedicated timeseries database, or maybe a PostgreSQL extension to do this, but do you really need to take on that complexity?

If you’re reading this, PostgreSQL is likely where your application’s data already lives, and it comes with great support for generating timeseries built-in. Let’s look at how to build a query, and then how it may be used in an Elixir application.

The timeseries query

We’ll build a SQL query that’ll return the number of messages received to an inbox in the past 5 days. Let’s start by listing those days:

select
  cast(calendar.entry as date) as date_str
from
  generate_series(now(), now() - interval '4 day', '-1 day') as calendar (entry);
  date_str
------------
 2023-07-15
 2023-07-14
 2023-07-13
 2023-07-12
 2023-07-11
(5 rows)

Few noteworthy things happening here:

  • generate_series (docs) produces a set containing the days we’re after. In our query this is the same as having table named calendar with a column named entry.

    It accepts start, stop, step as arguments. In this case I’ve specified: “start today”, “stop 4 days ago”, “1 day per step”.

  • now() - interval '4 day' utilizes PostgreSQL’s interval type to subtract 4 days from today

  • cast(calendar.entry as date) drops the time from our datetime entries, as we don’t care about time precision in this case

Now that we’ve got the dates we’re interested in, we can join them with other data from our database:

select
  cast(calendar.entry as date) as date_str,
  messages.id
from
  generate_series(now(), now() - interval '4 day', '-1 day') as calendar (entry)
  left join messages on messages.inbox_id = 1
    and cast(messages.received_at as date) = cast(calendar.entry as date);
  date_str  |  id
------------+------
 2023-07-11 | 1200
 2023-07-11 | 1201
 2023-07-12 | 1202
 2023-07-13 | 1203
 2023-07-13 | 1204
 2023-07-14 | 1205
 2023-07-14 | 1206
 2023-07-15 |
(8 rows)

Using a left join to the messages table means that if there are any days with 0 messages received, we still include them (because “no emails received” is valid data that we want to include in our timeseries). Indeed you can see this in the example above for 2023-07-15.

Finally, let’s round this off by grouping the results and counting them:

select
  cast(calendar.entry as date) as date_str,
  count(messages.id)
from
  generate_series(now(), now() - interval '4 day', '-1 day') as calendar (entry)
  left join messages on messages.inbox_id = 1
    and cast(messages.received_at as date) = cast(calendar.entry as date)
group by calendar.entry
order by date_str asc;
  date_str  | count
------------+-------
 2023-07-11 |     2
 2023-07-12 |     1
 2023-07-13 |     2
 2023-07-14 |     2
 2023-07-15 |     0
(5 rows)

Beautiful! We can see how many messages were received on each day in our series. Now if we want to generate more data, we just change the interval '4 day' to whatever interval we need.

Query performance

This solution does a sequential scan of messages that belong to an inbox, for the period we’re querying. Sequential scans are one of the first things you’d typically look at optimising, however in this particular case I haven’t found it necessary.

My query window used to visualize sparklines is 30 days, for just one inbox. Even with a busy inbox that’s received ~800 messages in 30 days, and running on fly.io’s cheapest database machine type (1 shared CPU, 256MB RAM, $1.94/mo) - this query executes in 1.13 milliseconds.

Yes, this approach would not scale for an unbounded data set, but that’s something I definitely don’t need to worry about now.

Example implementation

In Elixir and using the Ecto package, the implementation looks like:

defmodule Mailgrip.Emails do
  def message_stats(%Inbox{} = inbox, num_days) do
    sql = """
      select
        cast(calendar.entry as date) as date_str,
        count(messages.id)
      from
        generate_series(now(), now() - $2::integer * interval '1 day', '-1 day') as calendar (entry)
        left join messages on messages.inbox_id = $1::integer
          and cast(messages.received_at as date) = cast(calendar.entry as date)
      group by calendar.entry
      order by date_str asc
    """

    Repo.query!(sql, [inbox.id, Enum.max([0, num_days - 1])])
    |> Map.fetch!(:rows)
    |> Enum.map(fn [date_str, count] ->
      %{date: date_str, count: count}
    end)
  end
end

Combining this with the sample Elixir/Phoenix code on Easy SVG Sparklines would give you an end to end solution.

(For simplicity I’ve omitted timezone support, as well as some other business logic relevant to Mailgrip. This is still ~90% of the code I use in production to generate timeseries data).

Acknowledgements

The Art of PostgreSQL was a very influential book in my use of PostgreSQL (and thinking about the role of a RDBMS in general). If you like the approach layed out in this post - it’s barely scratching the surface of what Dimitri Fontaine covers in the book.