How do databases store live per-second data?


So what I mean by live per-second data is something like the stock market, where every second new data is recorded against the specific stock it belongs to.

How would the data look in the database? Does it have a timestamp for each second? If so, wouldn't that cause the database to quickly fill up? Are there specific databases that manage this type of stuff?

Thank you!

CodePudding user response:

Given the sheer amount of money that gets thrown around in fintech, I'd be surprised if trading platforms even use traditional RDBMS databases to store their trading data, but I digress...

How would the data look in the database?

(Again, assuming they're even using a relational model in the first place) it would look something like this in SQL:

CREATE TABLE SymbolPrices (
    Symbol char(4)  NOT NULL, -- 4 bytes, or even 3 bytes given a symbol char only needs 5 bits per char.
    Utc    datetime NOT NULL, -- 8-byte timestamp (100-nanosecond precision)
    Price  int      NOT NULL  -- Assuming integer cents (not 4 decimal places), that's 4 bytes
)

...which has a fixed row length of 16 bytes.
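
As a quick sanity check on that row size, here's a minimal Python sketch (my own illustration, not anything a real platform necessarily uses) packing the same three fields with no padding:

import struct

# char[4] symbol + int64 timestamp ticks + int32 price in cents; "<" = little-endian, unpadded.
ROW_FORMAT = "<4sqi"

row = struct.pack(ROW_FORMAT, b"MSFT", 638_000_000_000_000_000, 33550)
print(struct.calcsize(ROW_FORMAT))  # 16 bytes per row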

Does it have a timestamp for each second?

It can do, but not per second - you'd need far greater granularity than that: I wouldn't be surprised if they were using at least 100-nanosecond resolution, which is a common unit for computer system clock "ticks" (e.g. .NET's DateTime.Ticks is a 64-bit integer count of 100-nanosecond units). Java and JavaScript both use milliseconds, though that resolution might be too coarse.
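
To make the unit concrete, here's a small sketch converting a Unix timestamp to 100-nanosecond ticks (the constant follows from the definition; note that .NET's DateTime.Ticks counts from year 0001 rather than the Unix epoch used here):

import time

TICKS_PER_SECOND = 10_000_000  # one tick = 100 nanoseconds

def to_ticks(unix_seconds: float) -> int:
    # 100-nanosecond ticks since the Unix epoch.
    return int(unix_seconds * TICKS_PER_SECOND)

print(to_ticks(time.time()))  # e.g. ~17,000,000,000,000,000 - comfortably within 64 bits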

Storage space requirements for changing numeric values can always be significantly reduced if you store deltas rather than absolute values: I reckon it could come down to 8 bytes per record (see the packing sketch after this list):

  • I reason that 3 bytes is sufficient to store trade timestamp deltas at ~1.5ms resolution, assuming 100,000 trades per day per stock: that's 16.7m values to represent a 7-hour (25,200s) trading window.

  • Price deltas can likely also be reduced to a 2-byte value (-$327.68 to $327.67).

  • And assuming symbols never exceed 4 uppercase Latin characters (A-Z), then that can be represented in 3 bytes.

  • Giving an improved fixed row length of 8 bytes (3 + 3 + 2).

    • Though you would now need to store "keyframe" data every few thousand rows, to avoid having to replay every trade from the very beginning to get the current price.
  • If data is physically partitioned by symbol (i.e. using a separate file on disk for each symbol) then you don't need to include the symbol in the record at all, bringing the row length down to merely 5 bytes.
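
Here's a minimal sketch of that packed 8-byte layout, under the field widths assumed above (the 5-bits-per-char symbol encoding, "A" padding for short symbols, and big-endian layout are my own illustration, not a real exchange format):

import struct

def pack_symbol(symbol: str) -> bytes:
    # Up to 4 uppercase A-Z chars at 5 bits each: 20 bits, stored in 3 bytes.
    bits = 0
    for ch in symbol.ljust(4, "A"):
        bits = (bits << 5) | (ord(ch) - ord("A"))
    return bits.to_bytes(3, "big")

def pack_record(symbol: str, ts_delta: int, price_delta_cents: int) -> bytes:
    # 3-byte symbol + 3-byte unsigned timestamp delta + 2-byte signed price delta = 8 bytes.
    return (pack_symbol(symbol)
            + ts_delta.to_bytes(3, "big")            # 0..16,777,215 steps (~1.5ms each over 7h)
            + struct.pack(">h", price_delta_cents))  # -32,768..32,767 cents

print(len(pack_record("MSFT", 1234, -42)))  # 8 bytes; drop pack_symbol() for the 5-byte variant

Every few thousand rows you'd interleave a "keyframe" record holding absolute values, as noted above, so reads don't have to replay the whole stream.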

If so, wouldn't that cause the database to quickly fill up?

No, not really (at least assuming you're using HDDs made since the early 2000s); consider the following (the arithmetic is reproduced as a short script after this list):

  • Major stock exchanges really don't have that many stocks, e.g. NASDAQ only lists a few thousand (5,015 apparently).

  • While high-profile stocks (AAPL, AMD, MSFT, etc.) typically have 30-day trading volumes on the order of 20-130m, that's only the most popular ~50 stocks; most stocks have 30-day volumes far below that.

    • Let's just assume all 5,000 stocks have a 30-day volume of 3m.
    • That's ~100,000 trades per day, per stock on average.
    • That would require 100,000 * 16 bytes per day per stock.
      • That's 1,600,000 bytes per day per stock.
      • Or 1.5MiB per day per stock.
        • 556MiB per year per stock.
    • For the entire exchange (of 5,000 stocks) that's 7.5GiB/day.
      • Or ~2.7TiB/year.
    • When using deltas instead of absolute values, the storage space requirements are halved to ~278MiB/year per stock, or ~1.4TB/year for the entire exchange.
  • In practice, historical information would likely be archived and compressed (likely using a column-major layout to make it more amenable to general-purpose compression schemes; and if data is grouped by symbol, that shaves off another 4 bytes per row).

    • Even without compression, partitioning by symbol and using deltas means needing only around 870GB/year for the entire exchange.
      • That's small enough to fit on a $40 HDD from Amazon.
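
For what it's worth, here's the back-of-the-envelope arithmetic above as a runnable sketch (all inputs are the assumed figures from this list, not measured data):

STOCKS         = 5_000
TRADES_PER_DAY = 100_000     # per stock, from the assumed 3m 30-day volume
FULL_ROW, DELTA_ROW = 16, 5  # bytes: absolute rows vs. symbol-partitioned delta rows
MIB, GIB, TIB = 2**20, 2**30, 2**40

per_stock_day = TRADES_PER_DAY * FULL_ROW
print(per_stock_day)                       # 1,600,000 bytes/day per stock (~1.5 MiB)
print(per_stock_day * 365 / MIB)           # ~557 MiB/year per stock
print(per_stock_day * STOCKS / GIB)        # ~7.5 GiB/day for the exchange
print(per_stock_day * STOCKS * 365 / TIB)  # ~2.7 TiB/year for the exchange

delta_year = TRADES_PER_DAY * DELTA_ROW * STOCKS * 365
print(delta_year / GIB)                    # ~850 GiB/year, the ballpark of the ~870GB above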

Are there specific databases that manage this type of stuff?

Undoubtedly, but I don't think they'd need to optimize for storage space specifically - more likely for write performance and security.

CodePudding user response:

They use different architectures, such as Kappa and Lambda, and data streaming platforms like Apache Kafka.
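
For example, here's a minimal sketch of publishing tick events with the kafka-python client (the broker address, topic name, and message shape are all assumptions for illustration):

import json
from kafka import KafkaProducer  # pip install kafka-python

# Assumed local broker and topic; a real deployment would differ.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# One event per trade; downstream consumers aggregate or persist the stream.
producer.send("stock-ticks", {"symbol": "MSFT", "ts": 1684300000000, "price_cents": 33550})
producer.flush()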
