If you’ve ever worked with real-time market data at a bank, hedge fund, or trading desk, there’s a good chance that data was captured, stored, and analyzed using kdb+.
Organizations now receive data at very high rates, from financial transactions to social media interactions. The challenge is clear: how do you process and analyze this data in real time?
Take financial markets, where even a few milliseconds of delay can mean missed opportunities or tangible losses. kdb+ was built to meet exactly this kind of demand. It’s widely adopted across banks, hedge funds, and other financial institutions, but also in manufacturing, aerospace, and telecommunications.
kdb+ is a column-oriented, in-memory database built for extreme performance on time-series data. It ships with q, a concise vector-processing language purpose-built for querying and manipulating large datasets with minimal syntax.
Together, they form a tightly integrated system where the language and the storage engine are designed as one — there’s no impedance mismatch between how you write queries and how data is stored. This is a key reason kdb+ consistently outperforms general-purpose databases on time-series workloads, handling millions of records per second with single-digit millisecond latency.
Each instance of the q interpreter is called a q process, and real-world kdb+ systems consist of many q processes, each assigned a specific role, all communicating with one another.
In this three-part series, we’ll build a complete kdb+ tick architecture from the ground up. We’ll start by exploring the core concepts and components, then set up and run a fully functioning architecture with a tickerplant, real-time database, historical database, and gateway. Finally, we’ll extend it by building Java services that query kdb+, subscribe to live data, and publish trades using the javakdb client library.
Anatomy of a Vanilla Tick Architecture
The diagram below illustrates what a vanilla tick architecture looks like. Let’s walk through each component and understand the role it plays within the ecosystem.
Data Source — The entry point of the architecture
This is a source of real-time data: financial quotes and trades from providers like Bloomberg or Refinitiv, sensor readings in a manufacturing environment, or any high-frequency data stream. These feeds typically arrive in a proprietary format specific to the provider.
Feed Handler — Sits between the data source and the kdb+ system
Its role is to parse incoming data from its proprietary format into a structure that kdb+ can ingest. In practice, multiple feed handlers can run in parallel, each collecting data from a different source and forwarding it to the system for storage and analysis. KX provides Fusion interfaces that connect kdb+ to a range of external technologies, including R, Apache Kafka, Java, Python, and C.
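To make the parsing step concrete, here is a minimal Python sketch of what a feed handler does before forwarding data. The pipe-delimited wire format, the field names, and the parse_trade function are all hypothetical, invented for illustration; real vendor feeds and the KX Fusion interfaces look different.

```python
from datetime import datetime, timezone

# Hypothetical vendor format: "T|<epoch millis>|<symbol>|<price>|<size>"
def parse_trade(raw: str) -> dict:
    """Convert one raw vendor message into a typed record."""
    kind, ts, sym, price, size = raw.strip().split("|")
    if kind != "T":
        raise ValueError(f"unsupported message type: {kind}")
    return {
        "time": datetime.fromtimestamp(int(ts) / 1000, tz=timezone.utc),
        "sym": sym,
        "price": float(price),
        "size": int(size),
    }

record = parse_trade("T|1700000000000|AAPL|189.25|100")
print(record["sym"], record["price"], record["size"])
```

A production feed handler would then send each typed record on to the tickerplant rather than print it.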
Tickerplant (TP) — The most critical component in the architecture
A q process acting as a tickerplant captures the initial data feed, writes every record to a log file, and publishes messages to all registered subscribers. The priority is minimal latency: in zero-latency mode it publishes each update as soon as it arrives, and it can alternatively run in batch mode, publishing accumulated updates on a timer for efficiency.
Beyond data relay, the tickerplant manages subscriptions — adding and removing subscribers, and sending them table definitions — and handles end-of-day (EOD) processing. The script tick.q provides a reference implementation and serves as a starting point for most environments.
A key design principle: tickerplants should remain lightweight. They should not store data and should consume minimal memory. For best resilience and to avoid resource contention, they should run on dedicated cores.
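The log-then-publish flow and the subscription handling described above can be modelled in a few lines of Python. This is a toy sketch, not the logic of tick.q: the Tickerplant class, its method names, and the in-memory list standing in for the log file are assumptions made for illustration.

```python
class Tickerplant:
    """Toy model: log first, then publish; never store table data."""

    def __init__(self, schemas):
        self.schemas = schemas      # table name -> column names
        self.log = []               # stands in for the on-disk TP log
        self.subscribers = {}       # table name -> list of callbacks

    def subscribe(self, table, callback):
        """Register a subscriber and hand back the table definition."""
        self.subscribers.setdefault(table, []).append(callback)
        return self.schemas[table]

    def unsubscribe(self, table, callback):
        """Remove a previously registered subscriber."""
        self.subscribers[table].remove(callback)

    def upd(self, table, row):
        """Called by the feed handler for each incoming record."""
        self.log.append((table, row))                  # 1. write to the log
        for callback in self.subscribers.get(table, []):
            callback(table, row)                       # 2. publish immediately


tp = Tickerplant({"trade": ["sym", "price", "size"]})
received = []
schema = tp.subscribe("trade", lambda table, row: received.append(row))
tp.upd("trade", ("AAPL", 189.25, 100))
print(schema, received, len(tp.log))
```

Note that the model keeps only the log and the subscriber list, which reflects the principle that a tickerplant relays data without holding tables in memory.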
TP Log — The tickerplant log file records every message the tickerplant receives from the feed handler
Its primary purpose is recovery: if the RDB crashes or needs to restart, the log file is replayed to reconstruct the current state and avoid data loss.
For optimal performance, the log file should be stored on a fast local disk to minimize publication delay and I/O wait times.
Real-Time Database (RDB)
A q process that subscribes to the tickerplant and stores all incoming messages in memory, making today’s data available for intraday queries.
At startup, the RDB contacts the tickerplant and receives the data schema, the location of the TP log file, and the number of messages already written to it. The RDB replays that many messages from the log, which ensures it catches up with any data received before it came online. From that point on, it receives live updates as the tickerplant publishes them.
At end of day, the RDB receives the EOD message from the tickerplant, writes its intraday data to disk as a new partition of the historical database, notifies the HDB so it can reload, and clears its memory for the next trading day. The script r.q provides a reference implementation as a starting point.
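The RDB lifecycle (replay at startup, live updates during the day, hand-off at end of day) can be sketched as follows. This is a simplified Python model rather than what r.q does: the Rdb class, the list standing in for the TP log, and the dictionary standing in for the HDB are hypothetical.

```python
class Rdb:
    """Toy in-memory store for today's data."""

    def __init__(self, table_names):
        self.tables = {name: [] for name in table_names}

    def upd(self, table, row):
        """Append one live or replayed record."""
        self.tables[table].append(row)

    def replay(self, log, count):
        """Catch up by replaying the first `count` logged messages."""
        for table, row in log[:count]:
            self.upd(table, row)

    def end_of_day(self, hdb, date):
        """Hand the day's data to the HDB store, then clear memory."""
        hdb[date] = {name: list(rows) for name, rows in self.tables.items()}
        for rows in self.tables.values():
            rows.clear()


# Messages the tickerplant logged before this RDB came online.
tp_log = [("trade", ("AAPL", 189.25, 100)), ("trade", ("MSFT", 370.10, 50))]

rdb = Rdb(["trade"])
rdb.replay(tp_log, 2)                      # startup: recover from the log
rdb.upd("trade", ("AAPL", 189.30, 200))    # then receive live updates
intraday_count = len(rdb.tables["trade"])

hdb = {}
rdb.end_of_day(hdb, "2024.01.02")
print(intraday_count, len(hdb["2024.01.02"]["trade"]), rdb.tables["trade"])
```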
Real-Time Subscriber (RTS) — Also known as a real-time engine (RTE) or complex event processor (CEP).
This q process subscribes to the tickerplant just like the RDB, but instead of simply storing raw data, it performs additional logic on each incoming message — for example, calculating an order book, maintaining a subtable with the latest price per instrument, or running streaming analytics and aggregations.
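As an illustration of the "latest price per instrument" case, the Python sketch below keeps one entry per symbol and overwrites it on every trade. The upd function name and the message layout are assumptions; a real RTS would be a q process subscribed to the tickerplant.

```python
latest = {}  # sym -> most recent trade price

def upd(table, row):
    """Handle one published message; ignore tables we do not care about."""
    if table != "trade":
        return
    sym, price, _size = row
    latest[sym] = price


messages = [
    ("trade", ("AAPL", 189.25, 100)),
    ("quote", ("AAPL", 189.20, 189.30)),
    ("trade", ("MSFT", 370.10, 50)),
    ("trade", ("AAPL", 189.30, 200)),
]
for table, row in messages:
    upd(table, row)

print(latest)
```

The state stays small regardless of how many messages arrive, which is what distinguishes this kind of subscriber from an RDB that stores everything.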
Historical Database (HDB)
A q process that provides a queryable store of historical data — everything that has been saved to disk by the RDB at end of day. Typical use cases include generating reports on order execution times or running sensor failure analyses.
Large tables are stored on disk partitioned by date, with each column saved as its own file. These date-based partitions are a core part of kdb+’s on-disk structure and a key contributor to its query performance: only the relevant partitions and columns are read into memory, minimizing I/O overhead on large datasets.
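The layout described above (one directory per date, one file per column) can be imitated in Python to show why a query touches so little data. This is only a sketch: JSON files stand in for kdb+'s binary column files, and the write_partition and read_columns helpers are hypothetical.

```python
import json
import tempfile
from pathlib import Path

def write_partition(root, date, table, columns):
    """Save each column of `table` as its own file under root/date/table/."""
    table_dir = Path(root) / date / table
    table_dir.mkdir(parents=True, exist_ok=True)
    for name, values in columns.items():
        (table_dir / name).write_text(json.dumps(values))

def read_columns(root, dates, table, names):
    """Read only the requested columns from the requested date partitions."""
    result = {name: [] for name in names}
    for date in dates:
        for name in names:
            path = Path(root) / date / table / name
            result[name].extend(json.loads(path.read_text()))
    return result


root = tempfile.mkdtemp()
write_partition(root, "2024.01.02", "trade",
                {"sym": ["AAPL", "MSFT"], "price": [189.25, 370.10], "size": [100, 50]})
write_partition(root, "2024.01.03", "trade",
                {"sym": ["AAPL"], "price": [190.00], "size": [75]})

# Only two files are opened: 2024.01.03/trade/sym and 2024.01.03/trade/price.
prices = read_columns(root, ["2024.01.03"], "trade", ["sym", "price"])
print(prices)
```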
Gateway (GW)
The single entry point into the kdb+ system for end users and external applications. It receives incoming queries, routes them to the appropriate processes — RDB, HDB, or RTS — and returns the results. In many setups, the gateway connects to both real-time and historical data, allowing users to query across both seamlessly without needing to know where the data resides. It can also combine results from multiple processes into a single response, and handle cross-cutting concerns like permissions and load balancing.
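A minimal Python sketch of the routing idea: the caller asks for a symbol over a date range, and the gateway decides which store to consult and stitches the results together. The make_gateway function, the in-memory stand-ins for the RDB and HDB, and the ISO date strings are assumptions for illustration; permissions and load balancing are left out.

```python
def make_gateway(rdb, hdb, today):
    """Return a query function that hides where the data lives."""
    def query(sym, start, end):
        rows = []
        for date in sorted(hdb):              # historical partitions
            if start <= date <= end:
                rows += [r for r in hdb[date] if r[0] == sym]
        if start <= today <= end:             # today's data lives in the RDB
            rows += [r for r in rdb if r[0] == sym]
        return rows
    return query


hdb = {"2024-01-02": [("AAPL", 189.25), ("MSFT", 370.10)]}
rdb = [("AAPL", 190.00)]
query = make_gateway(rdb, hdb, today="2024-01-03")

both = query("AAPL", "2024-01-02", "2024-01-03")       # spans HDB and RDB
only_hist = query("AAPL", "2024-01-01", "2024-01-02")  # HDB only
print(both, only_hist)
```

ISO-formatted date strings are used here because they compare correctly as plain strings.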
Beyond Vanilla: Alternative Tick Architectures
Chained Tickerplants
If the primary tickerplant runs in zero-latency mode — publishing every update immediately — subscribing a client that only needs to refresh a chart every few seconds becomes wasteful. A chained tickerplant solves this by subscribing to the primary tickerplant like any other consumer, then republishing data to its own subscribers at a lower frequency. Unlike the primary, it doesn’t maintain its own log file.
For example, a GUI-driven dashboard doesn’t need hundreds of updates per second. A chained tickerplant batching updates every 1000 milliseconds is more than sufficient. Multiple levels of chaining are also possible, allowing you to tailor update frequency to each consumer’s needs.
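The batching behaviour can be sketched in Python as a buffer that is drained on a timer. The ChainedTickerplant class and its method names are hypothetical, and the timer is replaced by explicit flush calls so the example stays deterministic.

```python
class ChainedTickerplant:
    """Toy model: buffer updates from the primary, publish them in batches."""

    def __init__(self):
        self.pending = []
        self.subscribers = []

    def upd(self, table, row):
        """Called for every update from the primary; just buffer it."""
        self.pending.append((table, row))

    def flush(self):
        """Called on a timer (for example every 1000 ms): publish one batch."""
        if not self.pending:
            return
        batch, self.pending = self.pending, []
        for callback in self.subscribers:
            callback(batch)


ctp = ChainedTickerplant()
batches = []
ctp.subscribers.append(batches.append)   # a dashboard would subscribe here

for price in (189.25, 189.26, 189.27):
    ctp.upd("trade", ("AAPL", price, 100))
ctp.flush()                               # first timer tick: one batch of 3
ctp.upd("trade", ("AAPL", 189.28, 100))
ctp.flush()                               # second tick: one batch of 1
ctp.flush()                               # nothing pending, nothing sent
print(len(batches), [len(b) for b in batches])
```

Four updates from the primary reach the dashboard as two messages, and there is no log in the model, in line with the description above.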
Chained RDBs
The same principle applies to the RDB. A chained RDB subscribes either to the primary tickerplant or to a chained tickerplant, but unlike the default RDB, it has no end-of-day processing beyond clearing its tables. The benefit is isolation: ordinary users query the chained RDB, keeping the primary RDB free to focus on data capture.
A chained RDB also doesn’t need to subscribe to the full dataset. It could hold only the instruments in a particular index, or subscribe to trades but not quotes. The trade-off is memory — each RDB is an independent in-memory process with no shared state.
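A partial subscription can be pictured as a filter on table and symbol. In the Python sketch below the filtering is done inside the subscriber to keep the example self-contained; the ChainedRdb class and its parameters are hypothetical, and in a real system the publisher would normally apply the filter so unwanted data never crosses the network.

```python
class ChainedRdb:
    """Toy RDB that keeps only selected tables and symbols."""

    def __init__(self, tables, syms):
        self.wanted_syms = set(syms)
        self.tables = {name: [] for name in tables}

    def upd(self, table, row):
        """Store the row only if both its table and its symbol are wanted."""
        if table in self.tables and row[0] in self.wanted_syms:
            self.tables[table].append(row)

    def end_of_day(self):
        """No write-down: a chained RDB only clears its tables."""
        for rows in self.tables.values():
            rows.clear()


rdb = ChainedRdb(tables=["trade"], syms=["AAPL", "MSFT"])
rdb.upd("trade", ("AAPL", 189.25, 100))
rdb.upd("trade", ("TSLA", 250.00, 10))       # symbol not subscribed
rdb.upd("quote", ("AAPL", 189.20, 189.30))   # table not subscribed
kept = list(rdb.tables["trade"])
rdb.end_of_day()
print(kept, rdb.tables)
```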
Write-Only RDB
In some setups, the RDB is never queried intraday — its only purpose is to persist data to the HDB at end of day. In that case, holding the entire day’s data in memory is unnecessarily expensive. A write-only RDB addresses this by buffering incoming records and flushing them to disk in batches once a configured row threshold is reached. At end of day, the remaining data is flushed, sorted on disk, and moved into the appropriate date partition in the HDB.
This approach dramatically reduces the memory footprint, though it comes with a limitation: the process only holds a small, variable-sized buffer at any given time, making it unsuitable for intraday queries.
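The buffer-and-flush cycle can be sketched in Python as follows. The WriteOnlyRdb class, the max_rows threshold name, and the list standing in for the on-disk file are assumptions for illustration; a real implementation would append to disk and sort there.

```python
class WriteOnlyRdb:
    """Toy model: hold at most `max_rows` in memory, then flush to 'disk'."""

    def __init__(self, max_rows):
        self.max_rows = max_rows
        self.buffer = []
        self.disk = []       # stands in for the on-disk intraday data
        self.flushes = 0

    def upd(self, row):
        """Buffer one record and flush once the row threshold is reached."""
        self.buffer.append(row)
        if len(self.buffer) >= self.max_rows:
            self.flush()

    def flush(self):
        """Append the buffered rows to disk and empty the buffer."""
        if self.buffer:
            self.disk.extend(self.buffer)
            self.buffer.clear()
            self.flushes += 1

    def end_of_day(self):
        """Flush what is left, then sort by symbol (stable, so arrival
        order is preserved within each symbol)."""
        self.flush()
        self.disk.sort(key=lambda row: row[0])
        return self.disk


w = WriteOnlyRdb(max_rows=2)
for row in [("MSFT", 1), ("AAPL", 2), ("AAPL", 3), ("MSFT", 4), ("AAPL", 5)]:
    w.upd(row)
buffered = len(w.buffer)      # only the last row is still in memory
saved = w.end_of_day()
print(buffered, w.flushes, saved)
```

At any moment a query against this process would see at most max_rows rows, which is why the design rules out intraday queries.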
Mastering Time-Series Data with kdb+
The world of high-frequency data requires more than just storage; it demands precision, speed, and a robust architecture. As we’ve seen in this first part, the kdb+ tick architecture is the backbone of modern electronic trading and real-time analytics.
Ready to implement your own market data system or optimize your existing kdb+ stack? Our experts at MARGO specialize in high-performance computing and financial technology.
Why is kdb+ preferred over traditional SQL databases for market data?
Traditional relational databases often struggle with the volume and velocity of time-series data. kdb+ combines column-oriented storage with a vector-processing language (q), which lets it process millions of records per second with single-digit millisecond latency.
What is the core difference between an RDB and an HDB?
The Real-Time Database (RDB) holds the current day’s data in RAM for instant access. The Historical Database (HDB) provides queryable access to data from previous days stored on disk, typically partitioned by date.
How does the Tickerplant (TP) prevent data loss during a crash?
The TP writes every incoming record to a TP log file on disk as well as publishing it. If a subscriber like the RDB fails, it can replay this log file upon restart to recover its state up to the last logged message.
When should I use a Chained Tickerplant instead of a primary one?
Use a Chained Tickerplant for consumers that don’t need tick-by-tick updates (like GUIs). It batches updates, for example once per second, so slow consumers subscribe to it instead of the primary Tickerplant, which keeps the primary’s subscriber list short and reduces network overhead for non-critical subscribers.
Is Java compatible with a kdb+ architecture?
Absolutely. Through the javakdb library, Java services can act as feed handlers, subscribe to the Tickerplant for real-time updates, or query the HDB via a Gateway, combining enterprise logic with kdb+ performance.