# Realtime Data Subscription

## 1. Overview
Kafka topic partitioning is deterministic:

```
partition = hash(key) % number_of_partitions
```

The topic partition count and the publisher's partitioning logic MUST match. If a topic is created with 10 partitions, messages with the same key will always go to the same partition.
All messages MUST have a key, and the key MUST be the symbol name.
This ensures:
- Ordering per symbol: All messages for a symbol are processed in order
- Deterministic routing: Same key always goes to same partition
- Even distribution: Keys distribute evenly across partitions
- Consistent replay: Reprocessing produces identical partition assignments
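The determinism property above can be sketched in Python. This is illustrative only: MD5 is used as a stable stand-in for Kafka's actual Murmur2 partitioner, and the partition count is an assumed example value.

```python
import hashlib

NUM_PARTITIONS = 10  # assumption: topic created with 10 partitions

def partition_for(key: str) -> int:
    # Stable stand-in hash; Kafka actually uses Murmur2 (see section 5)
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

# Deterministic: the same key always lands on the same partition,
# so all messages for one symbol stay in order on one partition.
assert partition_for("AAPL") == partition_for("AAPL")
```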
## 2. Keying Standard
All messages must populate the key field. The key must always be the symbol name.
Examples:
| Asset Class | Key Format | Examples |
|---|---|---|
| Equity | Symbol name | AAPL, TSLA, MSFT |
| Futures | Contract symbol | NIFTY_FUT, ESZ25, NQH25 |
| Options | Root symbol | SPXW, SPY, AAPL |
Key Properties:
- Must be non-empty: Empty keys cause uneven distribution
- Must be consistent: Same symbol = same key across all producers
- Case-sensitive: `AAPL` ≠ `aapl` (use consistent casing)
- Immutable: Key never changes for a given message
### Special Case: Options Keying

For options, the key is the root symbol, not the full OCC option symbol:

```
Key: "SPXW" (root symbol)
Payload: {
    "symbol": "SPXW",
    "chain": "SPXW 251028C06875000",   // full OCC symbol in payload
    "strike": 6875.0,
    "put_or_call": "C",
    ...
}
```

Rationale:
- Cache Locality: All options for a root (all strikes, expirations, calls/puts) stay together
- Simpler Subscriptions: Clients can subscribe to "all SPXW options" via single key
- Message Ordering: Maintains ordering for all contracts of a given root
- Implementation: See `src/publishers/mock_market_data_publisher.cpp:310`
Trade-off: High-volume roots (SPX, SPY) may bottleneck on single partition. For extreme volume scenarios, consider using full OCC symbol as key.
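Deriving the key from a full OCC option symbol can then be sketched as below. `occ_root` is a hypothetical helper, and the format assumption (the root is the token before the space) comes from the examples above.

```python
def occ_root(occ_symbol: str) -> str:
    # OCC symbols in this document are "<root> <yymmdd><C|P><strike>",
    # e.g. "SPXW 251028C06875000", so the root is the token before the space.
    return occ_symbol.split(" ", 1)[0]

key = occ_root("SPXW 251028C06875000")   # key = "SPXW"
```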
## 3. Partitioning Rules
Core Principles:
- Producer provides key: Kafka's client library computes the partition automatically
- Consistent hashing: The Java client and kafka-python default to the Murmur2 hash; librdkafka defaults to `consistent_random` (CRC32), so set `partitioner=murmur2_random` when cross-client consistency matters
- Consumers don't compute: Kafka assigns partitions to consumers via consumer groups
- Fixed partition count: Changing partition count after production begins breaks ordering guarantees
- Configuration must match: Topic partition count and producer configuration must align
What NOT to Do:
- ❌ Manually compute partition in application code
- ❌ Change partition count on existing topics with data
- ❌ Use different hashing algorithms across producers
- ❌ Send messages without keys (unless order doesn't matter)
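A producer-side guard for the "no empty keys" rule could look like the following. `validate_key` is a hypothetical helper, not part of the HiveQ codebase.

```python
def validate_key(key: object) -> str:
    # Enforce the keying standard: a non-empty string with no stray whitespace,
    # since empty keys lose per-symbol ordering and inconsistent casing splits
    # one symbol across partitions.
    if not isinstance(key, str) or not key or key != key.strip():
        raise ValueError(f"invalid message key: {key!r}")
    return key
```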
## 4. Topic Specifications

### Equity Topics

```
market_data.equity.tbbo      -> key = symbol, partition = hash(symbol) % N
market_data.equity.bars_1s   -> key = symbol, partition = hash(symbol) % N
market_data.equity.bars_1m   -> key = symbol, partition = hash(symbol) % N
market_data.equity.bars_1d   -> key = symbol, partition = hash(symbol) % N
market_data.equity.imbalance -> key = symbol, partition = hash(symbol) % N
```

### Futures Topics

```
market_data.futures.tbbo    -> key = symbol, partition = hash(symbol) % N
market_data.futures.bars_1s -> key = symbol, partition = hash(symbol) % N
market_data.futures.bars_1m -> key = symbol, partition = hash(symbol) % N
market_data.futures.bars_1d -> key = symbol, partition = hash(symbol) % N
```

### Options Topics

```
market_data.options.snaps_1s -> key = root symbol, partition = hash(root) % N
```

Options Keying Note:
- Key is the root symbol (e.g., `SPXW`, `SPY`, `AAPL`)
- NOT the full OCC symbol (e.g., `SPXW 251028C06875000`)
- Full OCC symbol is included in the message payload as the `chain` field
- This groups all strikes/expirations for a root in the same partition
- Ensures cache locality and a simpler subscription model (subscribe to "all SPXW options")
- Trade-off: Single partition per root (may bottleneck for very high-volume roots)

### Signals Topics

```
signals.khawk.quant_features -> key = symbol, partition = hash(symbol) % N
```

Note: N = number of partitions configured for the topic (e.g., 10, 20, 50)
## 5. Producer Implementation

### Best Practice (Recommended)

Let Kafka's producer library handle partition assignment by providing the key:
#### C++ (librdkafka)

```cpp
// Configuration
const std::string topic = "market_data.equity.tbbo";
const std::string key = symbol;         // e.g., "AAPL"
const std::string payload = json_data;

// Produce message - Kafka computes partition automatically
RdKafka::ErrorCode err = producer->produce(
    topic,
    RdKafka::Topic::PARTITION_UA,       // PARTITION_UA = Unassigned (let Kafka decide)
    RdKafka::Producer::RK_MSG_COPY,
    const_cast<void*>(static_cast<const void*>(payload.data())),
    payload.size(),
    key.c_str(),                        // Key for partitioning
    key.size(),
    0,
    nullptr);

// Kafka internally computes: partition = murmur2(key) % num_partitions
```

#### Python (kafka-python)
```python
from kafka import KafkaProducer

# Configuration
producer = KafkaProducer(
    bootstrap_servers=['localhost:9092'],
    key_serializer=lambda k: k.encode('utf-8'),
    value_serializer=lambda v: v.encode('utf-8'))

# Produce message - Kafka computes partition automatically
topic = "market_data.equity.tbbo"
key = symbol  # e.g., "AAPL"
payload = json_data

producer.send(
    topic,
    key=key,       # Kafka uses this to compute partition
    value=payload)

# Kafka internally computes: partition = murmur2(key) % num_partitions
```

### What Kafka Does Internally
When you provide a key with `PARTITION_UA`:

1. Kafka computes `hash = murmur2(key.bytes)`
2. Then `partition = hash % num_partitions`
3. The message is sent to the computed partition
4. Ordering is guaranteed for all messages with the same key
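For verification and debugging only (never for production routing), the computation can be reproduced in Python. This is a sketch ported from the Java client's `Utils.murmur2`; the sign-bit mask before the modulo mirrors the client's `toPositive` step.

```python
def murmur2(data: bytes) -> int:
    """Sketch port of the Kafka Java client's Utils.murmur2."""
    length = len(data)
    seed, m, r = 0x9747B28C, 0x5BD1E995, 24
    h = (seed ^ length) & 0xFFFFFFFF
    # Process the input four bytes at a time (little-endian words)
    for i in range(0, length - (length % 4), 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * m) & 0xFFFFFFFF
        k ^= k >> r
        k = (k * m) & 0xFFFFFFFF
        h = ((h * m) & 0xFFFFFFFF) ^ k
    # Fold in the 1-3 remaining bytes (mirrors the Java switch fall-through)
    tail, rem = length & ~3, length % 4
    if rem >= 3:
        h ^= data[tail + 2] << 16
    if rem >= 2:
        h ^= data[tail + 1] << 8
    if rem >= 1:
        h ^= data[tail]
        h = (h * m) & 0xFFFFFFFF
    # Final avalanche mixing
    h ^= h >> 13
    h = (h * m) & 0xFFFFFFFF
    h ^= h >> 15
    return h

def partition_for(key: str, num_partitions: int) -> int:
    # Mask the sign bit (Java's toPositive), then take the modulo
    return (murmur2(key.encode("utf-8")) & 0x7FFFFFFF) % num_partitions
```

Useful for spot-checking which partition a given symbol should land on when diagnosing distribution or ordering issues.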
Manual Partitioning (NOT Recommended)
// ❌ AVOID: Manual partition computationint partition = custom_hash(key) % num_partitions;
producer.produce(topic, partition, payload, key);
// Problems:
// - Must replicate Kafka's Murmur2 hash exactly
// - Breaks if Kafka changes hashing algorithm
// - Error-prone and difficult to maintain
// - Inconsistent with standard Kafka practicesKey Takeaways
✅ DO: Provide key, use PARTITION_UA, let Kafka compute partition
❌ DON'T: Manually compute partition in application code
✅ DO: Trust Kafka's built-in Murmur2 partitioner
❌ DON'T: Implement custom partitioning logic
## 6. Topic Creation Convention

### Development (Single Node)

```bash
kafka-topics.sh --create \
  --topic market_data.equity.tbbo \
  --bootstrap-server localhost:9092 \
  --partitions 10 \
  --replication-factor 1 \
  --if-not-exists
```
### Production (Cluster)
```bash
kafka-topics.sh --create \
  --topic market_data.equity.tbbo \
  --bootstrap-server kafka-broker-1:9092,kafka-broker-2:9092,kafka-broker-3:9092 \
  --partitions 20 \
  --replication-factor 3 \
  --if-not-exists
```
**Important Notes:**
- **Development**: `--replication-factor 1` (single broker)
- **Production**: `--replication-factor 3` (fault tolerance)
- **Partition count**: Choose based on throughput needs (10-50 typical)
- **Cannot decrease**: Partition count can only be increased; increasing it remaps keys to different partitions and breaks existing per-key ordering
### Partition Count Guidelines
| Expected Throughput | Symbols | Recommended Partitions |
|---------------------|---------|------------------------|
| Low (< 1k msg/s) | < 100 | 10 |
| Medium (1k-10k msg/s) | 100-1000 | 20 |
| High (10k-100k msg/s) | 1000+ | 50 |
| Very High (> 100k msg/s) | 10000+ | 100+ |
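The guideline table above can be expressed as a small lookup helper. The exact threshold boundaries are an assumption, since the table gives ranges rather than precise cutoffs.

```python
def recommended_partitions(msgs_per_sec: float) -> int:
    # Thresholds mirror the partition-count guideline table (assumed cutoffs)
    if msgs_per_sec < 1_000:
        return 10    # Low throughput
    if msgs_per_sec < 10_000:
        return 20    # Medium throughput
    if msgs_per_sec < 100_000:
        return 50    # High throughput
    return 100       # Very high throughput
```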
---
## 7. Consumer Behavior
### How Consumers Work
**Consumers DO NOT compute partitions.**
Kafka assigns partitions to consumers in a consumer group automatically.
### Consumer Group Mechanics

```
Topic: market_data.equity.tbbo (10 partitions)
Consumer Group: hiveq-dispatcher-group (3 consumers)
```
Kafka assigns:
- Consumer 1: partitions 0, 1, 2, 3
- Consumer 2: partitions 4, 5, 6
- Consumer 3: partitions 7, 8, 9
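The layout above is what a range-style assignment produces: partitions are split into contiguous blocks, and the first consumers absorb the remainder. An illustrative sketch of that logic (not the actual broker code):

```python
def range_assign(partitions: list, consumers: list) -> dict:
    # Split partitions into contiguous blocks; the first `extra`
    # consumers (in sorted member order) each get one extra partition.
    per, extra = divmod(len(partitions), len(consumers))
    assignment, start = {}, 0
    for i, consumer in enumerate(sorted(consumers)):
        count = per + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment
```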
### Consumer Code (Simplified)
```cpp
// Create consumer
auto consumer = std::make_shared<KafkaConsumer>(
    dispatcher_pool,
    "localhost:9092",
    "hiveq-dispatcher-group",
    num_kafka_partitions);  // For reference only, not used for partition assignment
// Subscribe to topics - Kafka assigns partitions automatically
consumer->subscribe({"market_data.equity.tbbo", "market_data.equity.bars_1s"});
// Start consuming - reads from assigned partitions
consumer->start();
// Consumer loop
while (running) {
    message = consumer->poll();
    // Extract topic, key, payload
    // Process message
}
```

### Rebalancing
When consumers join/leave the group:
- Kafka automatically rebalances partition assignments
- Each partition is assigned to exactly one consumer in the group
- Ordering per partition is maintained
## 8. HiveQ Architecture Notes

### Two-Level Routing
HiveQ uses two independent routing mechanisms:
**1. Kafka Partitioning (Broker-Level)**

```
Producer → hash(key) % num_kafka_partitions → Kafka Partition
```

- Determines which Kafka partition stores the message
- Computed by the producer's Kafka client library using the Murmur2 hash
- Ensures ordering per symbol within Kafka
**2. Dispatcher Routing (Application-Level)**

```
Consumer → hash(key) % num_dispatcher_workers → Worker Thread
```

- Determines which dispatcher worker processes the message
- Handled by the HiveQ application using `std::hash`
- Distributes processing load across threads
These are independent:
- Message in Kafka partition 5 might go to dispatcher worker 2
- Different hash functions (Murmur2 vs std::hash)
- Different modulos (num_partitions vs num_workers)
- Both contribute to overall system performance
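The independence of the two levels can be sketched as follows. MD5 and SHA-1 stand in for Murmur2 and `std::hash`, and both counts are assumed example values.

```python
import hashlib

NUM_KAFKA_PARTITIONS = 10   # assumption: broker-side topic configuration
NUM_DISPATCHER_WORKERS = 4  # assumption: HiveQ worker-thread count

def kafka_partition(key: str) -> int:
    # Broker-level routing; MD5 stands in for Murmur2 in this sketch
    h = int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")
    return h % NUM_KAFKA_PARTITIONS

def dispatcher_worker(key: str) -> int:
    # Application-level routing; SHA-1 stands in for std::hash in this sketch
    h = int.from_bytes(hashlib.sha1(key.encode()).digest()[:4], "big")
    return h % NUM_DISPATCHER_WORKERS

# Different hash functions, different modulos: the two indices are computed
# independently, yet each is deterministic per key, so per-symbol ordering
# is preserved at both levels.
```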
## 9. Summary Checklist
Before Production:
- Every topic has a fixed number of partitions
- All messages include a key (symbol name)
- Producers use
PARTITION_UAto let Kafka compute partitions - Topics created with appropriate replication factor (3 for production)
- Partition count matches expected throughput
- All producers in the ecosystem use standard Kafka clients
During Production:
- Monitor partition distribution (should be even)
- Monitor consumer lag per partition
- Never change partition count on active topics
- Ensure all messages have keys
- Verify ordering guarantees hold per symbol
Verification Commands:

```bash
# List topics and partition counts
kafka-topics.sh --list --bootstrap-server localhost:9092

# Describe topic details
kafka-topics.sh --describe --topic market_data.equity.tbbo --bootstrap-server localhost:9092

# Check consumer group status
kafka-consumer-groups.sh --describe --group hiveq-dispatcher-group --bootstrap-server localhost:9092

# Monitor message flow
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic market_data.equity.tbbo --property print.key=true
```

## 10. Troubleshooting
**Problem: Uneven partition distribution**
- Cause: Keys not distributed evenly (e.g., only a few symbols)
- Solution: Use more diverse keys or reduce the partition count

**Problem: Out-of-order messages for the same symbol**
- Cause: Producer not setting a key, or manually assigning the wrong partition
- Solution: Verify the key is set and `PARTITION_UA` is used

**Problem: "Unknown topic or partition" error**
- Cause: Topic not created before producing
- Solution: Use the `--if-not-exists` flag or auto-create topics (as implemented in HiveQ)

**Problem: Consumer lag increasing**
- Cause: Too few partitions for the throughput, or too few consumers
- Solution: Increase partitions (on a new topic) or add more consumers to the group
## References

- Kafka Partitioning Documentation
- librdkafka Configuration
- HiveQ Implementation: `src/kafka/kafka_publisher.cpp:256` (uses `PARTITION_UA`)
- HiveQ Implementation: `src/kafka/kafka_consumer.cpp:123` (subscribes to all partitions)

---

**Document Version:** 1.0
**Last Updated:** 2025-11-19
**Maintained By:** HiveQ Team