# Data Provider Configuration

Complete guide for configuring data sources in HiveQ Flow.

## Table of Contents

- Overview
- CSV Data Provider
- HiveQ Historical Data
- Databento Live Data
- Custom Data
- Multi-Provider Setup
## Overview

HiveQ Flow supports multiple data providers for backtesting and live trading:

- **CSV Files**: Local CSV files with bar or custom data
- **HiveQ Historical**: HiveQ's historical market data API
- **Databento**: Live market data streaming
- **Custom Data**: User-provided signals, indicators, or any time-series data

Data providers are configured via the `data_configs` parameter in `run_backtest()` or `run_live()`.
## CSV Data Provider

### Bar Data from CSV

Load OHLCV bar data from local CSV files.

**Configuration:**

```python
import hiveq.flow as hf

data_configs = [{
    'type': 'csv',
    'data_type': 'bars_1m',  # or bars_1d, etc.
    'id': '1_MIN_BAR',
    'path': 'bars/AAPL_bars.csv',
    'use_absolute': False  # Path relative to ~/.hiveq/data/
}]

report = hf.run_backtest(
    strategy_configs=[...],
    symbols=['AAPL'],
    start_date='2025-08-01',
    end_date='2025-08-02',
    data_configs=data_configs
)
```

**CSV Format:**

```
timestamp,symbol,open,high,low,close,volume
2025-08-01 09:30:00,AAPL,180.50,181.00,180.25,180.75,1000000
2025-08-01 09:31:00,AAPL,180.75,181.25,180.50,181.00,1100000
2025-08-01 09:32:00,AAPL,181.00,181.50,180.75,181.25,1050000
```

**Required Columns:**

- `timestamp`: DateTime in format 'YYYY-MM-DD HH:MM:SS'
- `symbol`: Trading symbol
- `open`: Opening price
- `high`: High price
- `low`: Low price
- `close`: Closing price
- `volume`: Trading volume
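A quick way to sanity-check a bar CSV before handing it to the provider is to load it and verify the required columns and ordering. A minimal sketch using pandas (the in-memory sample stands in for a file on disk):

```python
import io

import pandas as pd

REQUIRED = ["timestamp", "symbol", "open", "high", "low", "close", "volume"]

def validate_bar_csv(buf) -> pd.DataFrame:
    """Load a bar CSV and check the columns the CSV provider expects."""
    df = pd.read_csv(buf, parse_dates=["timestamp"])
    missing = [c for c in REQUIRED if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    if not df["timestamp"].is_monotonic_increasing:
        raise ValueError("timestamps must be in chronological order")
    return df

sample = io.StringIO(
    "timestamp,symbol,open,high,low,close,volume\n"
    "2025-08-01 09:30:00,AAPL,180.50,181.00,180.25,180.75,1000000\n"
    "2025-08-01 09:31:00,AAPL,180.75,181.25,180.50,181.00,1100000\n"
)
bars = validate_bar_csv(sample)
```

In practice you would pass a path under `~/.hiveq/data/` instead of the StringIO buffer.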
### Multiple CSV Files

Load data for multiple symbols:

```python
data_configs = [
    {
        'type': 'csv',
        'data_type': 'bars_1m',
        'id': 'AAPL_BARS',
        'path': 'bars/AAPL_bars.csv'
    },
    {
        'type': 'csv',
        'data_type': 'bars_1m',
        'id': 'GOOGL_BARS',
        'path': 'bars/GOOGL_bars.csv'
    }
]

report = hf.run_backtest(
    strategy_configs=[...],
    symbols=['AAPL', 'GOOGL'],
    data_configs=data_configs
)
```

**Note:** Multiple symbols can also be placed in a single CSV file, distinguished by the `symbol` column.
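If you keep several symbols in one file, splitting it into per-symbol files in the layout above is a one-liner with pandas. A sketch (the file paths you would write to are illustrative):

```python
import io

import pandas as pd

# A combined file with two symbols, as described in the note above
combined = pd.read_csv(io.StringIO(
    "timestamp,symbol,open,high,low,close,volume\n"
    "2025-08-01 09:30:00,AAPL,180.5,181.0,180.25,180.75,1000000\n"
    "2025-08-01 09:30:00,GOOGL,195.0,195.5,194.75,195.25,800000\n"
))

# One DataFrame per symbol, ready to write as bars/<SYMBOL>_bars.csv
per_symbol = {sym: grp for sym, grp in combined.groupby("symbol")}
```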
### Absolute Paths

Use absolute file paths:

```python
data_configs = [{
    'type': 'csv',
    'data_type': 'bars_1m',
    'id': '1_MIN_BAR',
    'path': '/home/user/data/AAPL_bars.csv',
    'use_absolute': True
}]
```

### Bar Intervals

Specify different bar intervals:

```python
# 1-minute bars (smallest available granularity)
{'type': 'csv', 'data_type': 'bars_1m', ...}

# Daily bars
{'type': 'csv', 'data_type': 'bars_1d', ...}
```

## HiveQ Historical Data
### Equity Data

Access HiveQ's historical US equity data.

**Configuration:**

```python
data_configs = [{
    'type': 'hiveq_historical',
    'dataset': 'HIVEQ_US_EQ',
    'schema': ['bars_1m'],
}]

report = hf.run_backtest(
    strategy_configs=[...],
    symbols=['AAPL', 'GOOGL', 'MSFT'],
    start_date='2025-08-01',
    end_date='2025-08-30',
    data_configs=data_configs
)
```

**Parameters:**

- `type`: `'hiveq_historical'`
- `dataset`: Dataset identifier (use the correct dataset for your asset type!)
  - `'HIVEQ_US_EQ'`: US Equities (stocks only)
  - `'HIVEQ_US_OPT'`: US Options
  - `'HIVEQ_US_FUT'`: US Futures
- `schema`: Data types to fetch
  - `['bars_1m']`: 1-minute bars (smallest bar granularity)
  - `['bars_1d']`: Daily bars
  - `['eq_trades']`: Equity trade ticks (HIVEQ_US_EQ only; see Equity Trade Data)
  - `['fut_trades']`: Futures trade ticks (HIVEQ_US_FUT only; see Futures Trade Data)
### Futures Data

Historical futures bar data:

```python
data_configs = [{
    'type': 'hiveq_historical',
    'dataset': 'HIVEQ_US_FUT',
    'schema': ['bars_1m'],
}]

report = hf.run_backtest(
    strategy_configs=[...],
    symbols=['ES.c.0'],  # Continuous front month
    start_date='2025-08-01',
    end_date='2025-08-30',
    data_configs=data_configs
)
```

### Equity Trade Data
The `eq_trades` schema provides individual trade tick data for equities. Unlike bar data (aggregated OHLCV), `eq_trades` delivers every trade execution with price, size, and aggressor side. This is the recommended schema for equity strategies that use executors (e.g., AlgoInstructionStrategy), as it provides the tick-level resolution needed for POV, TWAP, and other execution algorithms.

**Configuration:**

```python
data_configs = [{
    'type': 'hiveq_historical',
    'dataset': 'HIVEQ_US_EQ',
    'schema': ['eq_trades'],
}]

report = hf.run_backtest(
    strategy_configs=[...],
    symbols=['AAPL', 'MSFT'],
    start_date='2025-09-01',
    end_date='2025-09-05',
    data_configs=data_configs
)
```

**Trade tick fields delivered to the `on_trade` callback:**

| Field | Type | Description |
|---|---|---|
| `symbol` | string | Trading symbol (e.g., "AAPL") |
| `price` | float | Execution price |
| `size` | float | Trade quantity |
| `aggressor_side` | string | "BUY", "SELL", or "NO_AGGRESSOR" |
| `trade_id` | string | Unique trade identifier |
| `ts_event` | int | Event timestamp (nanoseconds) |

**Strategy callback:**

```python
def on_trade(self, ctx, event):
    trade = event.data()
    print(f"Trade: {trade.symbol} @ {trade.price} x {trade.size} ({trade.aggressor_side})")
```

**When to use `eq_trades` vs `bars_1m`:**

| Schema | Use Case | Callback |
|---|---|---|
| `eq_trades` | Executor-based strategies (POV, TWAP, VWAP), TCA, tick-level analysis | `on_trade()` + `on_quote()` |
| `bars_1m` | Bar-based strategies (SMA crossover, breakout), lower data volume | `on_bar()` |

**Note:** `eq_trades` also delivers quote data (best bid/ask), so strategies will receive both `on_trade()` and `on_quote()` callbacks.
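As an illustration of what tick-level resolution buys you, a running VWAP can be maintained directly from the fields above. A standalone sketch (the `Trade` dataclass is a stand-in for the event payload, not the framework's type):

```python
from dataclasses import dataclass

@dataclass
class Trade:
    """Stand-in for the eq_trades payload fields used here."""
    symbol: str
    price: float
    size: float

class RunningVWAP:
    """Accumulate sum(price * size) / sum(size) over incoming ticks."""
    def __init__(self):
        self.notional = 0.0
        self.volume = 0.0

    def on_trade(self, trade: Trade) -> float:
        self.notional += trade.price * trade.size
        self.volume += trade.size
        return self.notional / self.volume

vwap = RunningVWAP()
vwap.on_trade(Trade("AAPL", 100.0, 200))
last = vwap.on_trade(Trade("AAPL", 101.0, 100))  # (100*200 + 101*100) / 300
```

Bar data cannot support this kind of calculation exactly, since intra-bar trade sizes are already aggregated away.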
### Futures Trade Data

The `fut_trades` schema provides individual trade tick data for futures contracts. Unlike bar data (aggregated OHLCV), `fut_trades` delivers every trade execution with price, size, and aggressor side. This is the recommended schema for strategies that use executors (e.g., AlgoInstructionStrategy), as it provides the tick-level resolution needed for POV, TWAP, and other execution algorithms.

**Configuration:**

```python
data_configs = [{
    'type': 'hiveq_historical',
    'dataset': 'HIVEQ_US_FUT',
    'schema': ['fut_trades'],
}]

report = hf.run_backtest(
    strategy_configs=[...],
    symbols=['ES.c.0'],  # Continuous front month
    start_date='2025-09-01',
    end_date='2025-09-05',
    session_start='18:00',  # CME Globex: 6 PM ET previous day
    session_end='17:00',    # CME Globex: 5 PM ET
    data_configs=data_configs
)
```

**Trade tick fields delivered to the `on_trade` callback:**

| Field | Type | Description |
|---|---|---|
| `symbol` | string | Trading symbol (e.g., "ESH6") |
| `price` | float | Execution price |
| `size` | float | Trade quantity |
| `aggressor_side` | string | "BUY", "SELL", or "NO_AGGRESSOR" |
| `trade_id` | string | Unique trade identifier |
| `ts_event` | int | Event timestamp (nanoseconds) |

**Strategy callback:**

```python
def on_trade(self, ctx, event):
    trade = event.data()
    print(f"Trade: {trade.symbol} @ {trade.price} x {trade.size} ({trade.aggressor_side})")
```

**When to use `fut_trades` vs `bars_1m`:**

| Schema | Use Case | Callback |
|---|---|---|
| `fut_trades` | Executor-based strategies (POV, TWAP, VWAP), TCA, tick-level analysis | `on_trade()` + `on_quote()` |
| `bars_1m` | Bar-based strategies (SMA crossover, breakout), lower data volume | `on_bar()` |

**Note:** `fut_trades` also delivers quote data (best bid/ask), so strategies will receive both `on_trade()` and `on_quote()` callbacks.
### Futures with Auto-Rollover

For continuous contract trading with automatic position rollover:

```python
data_configs = [{
    'type': 'hiveq_historical',
    'dataset': 'HIVEQ_US_FUT',
    'schema': ['fut_trades'],
    'filter_mode': 'continuous',  # Auto-injected if enable_auto_rollover=True
}]

backtest_config = BacktestConfig(
    symbols=['ES.v.0'],  # Volume-weighted front month
    start_date='2025-09-16',
    end_date='2025-09-22',
    session_start='18:00',
    session_end='17:00',
    enable_auto_rollover=True,  # Auto-injects filter_mode and enableFuturesRollover
)
```

**Continuous contract symbols:**

| Symbol | Description |
|---|---|
| `ES.c.0` | Calendar front month (rolls by expiration date) |
| `ES.v.0` | Volume-weighted front month (rolls when the next contract has more volume) |
| `NQ.c.0` | Nasdaq futures calendar front month |
| `ESH6` | Specific contract (no rollover) |

When `enable_auto_rollover=True`, the framework automatically:

- Sets `filter_mode='continuous'` in data configs
- Sets `enableFuturesRollover=True` in strategy params
- Injects rollover events when the continuous contract switches contracts

See the AlgoInstructionStrategy Specification for full rollover lifecycle details.
### Futures Session Date Mapping (Overnight Sessions)

Futures use overnight sessions (e.g., CME Globex: 18:00 ET → 17:00 ET). When you specify a `start_date`, the framework automatically starts data from the previous calendar day's evening, not the start_date's evening.

**Example:** `start_date='2026-01-02'` with `session_start='18:00'`, `session_end='17:00'`

```
Trading day:     Jan 2 (Friday)
Session starts:  Jan 1 (Thursday) at 18:00 ET   <- previous calendar day
Session ends:    Jan 2 (Friday) at 17:00 ET
Data fetched:    2026-01-01 23:00 UTC -> 2026-01-02 22:00 UTC
```

**How it works:** The framework detects an overnight session when `session_end <= session_start` (e.g., 17:00 ≤ 18:00). It then shifts the session start to the previous calendar day.

| start_date | session_start | Actual data starts from | Actual data ends at |
|---|---|---|---|
| 2026-01-02 | 18:00 ET | Jan 1, 18:00 ET (prev day) | Jan 2, 17:00 ET |
| 2026-01-05 | 18:00 ET | Jan 4, 18:00 ET (prev day, Sunday) | Jan 5, 17:00 ET |

For multi-day backtests (`start_date='2026-01-02'`, `end_date='2026-01-05'`):

- Day 1 session: Jan 1, 18:00 ET → Jan 2, 17:00 ET
- Day 2 session: Jan 2, 18:00 ET → Jan 3, 17:00 ET
- Day 3 session: Jan 4, 18:00 ET → Jan 5, 17:00 ET (skips weekend)

If `session_start` and `session_end` are not set, the framework auto-detects futures datasets and defaults to CME Globex session times (18:00-17:00 ET).
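The date-shift rule above can be expressed directly: when `session_end <= session_start` the session is overnight, so its start falls on the previous calendar day. A sketch of the logic (not the framework's internal code):

```python
from datetime import date, datetime, time, timedelta

def session_window(trading_day: date, session_start: time, session_end: time):
    """Return (start, end) datetimes for one trading day, shifting overnight sessions."""
    if session_end <= session_start:  # overnight session, e.g. 18:00 -> 17:00
        start_dt = datetime.combine(trading_day - timedelta(days=1), session_start)
    else:
        start_dt = datetime.combine(trading_day, session_start)
    return start_dt, datetime.combine(trading_day, session_end)

# Reproduces the first row of the table above
start, end = session_window(date(2026, 1, 2), time(18, 0), time(17, 0))
```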
### Options Data

Historical options data:

```python
data_configs = [{
    'type': 'hiveq_historical',
    'dataset': 'HIVEQ_US_OPT',
    'schema': ['bars_1m'],
}]

report = hf.run_backtest(
    strategy_configs=[...],
    symbols=['SPY251014C00450000'],
    start_date='2025-10-01',
    end_date='2025-10-14',
    data_configs=data_configs
)
```

### Multiple Schemas
Fetch multiple data types:

```python
data_configs = [{
    'type': 'hiveq_historical',
    'dataset': 'HIVEQ_US_EQ',
    'schema': [
        'bars_1m',  # 1-minute bars
        'bars_1d',  # Daily bars
    ],
}]
```

Your strategy will receive all requested data types.

### Data Path Configuration

HiveQ historical data is cached locally.

**Default Path:** `~/.hiveq/data/`

**Custom Path:**
## Databento Live Data

### Live Equity Data

Stream real-time equity data.

**Configuration:**

```python
data_configs = [{
    'type': 'databento',
    'id': 'databento_client',
    'api_key': 'db-your_api_key',
    'venue_dataset_map': {
        'SIM': 'EQUS.MINI'  # Simulated equities
    }
}]

hf.run_live(
    strategy_configs=[...],
    symbols=['AAPL'],
    data_configs=data_configs
)
```

**Parameters:**

- `type`: `'databento'`
- `id`: Client identifier (unique name)
- `api_key`: Your Databento API key
- `venue_dataset_map`: Mapping of venue to dataset
  - `'SIM'`: Simulated venue for testing
  - `'NASDAQ'`, `'NYSE'`: Real venues (requires appropriate subscription)
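Avoid hardcoding the API key; reading it from an environment variable keeps it out of version control. A sketch (`DATABENTO_API_KEY` is a conventional variable name chosen here, not one the framework requires):

```python
import os

# Fall back to the documentation placeholder so the sketch runs without the variable set
api_key = os.environ.get("DATABENTO_API_KEY", "db-your_api_key")

data_configs = [{
    'type': 'databento',
    'id': 'databento_client',
    'api_key': api_key,
    'venue_dataset_map': {'SIM': 'EQUS.MINI'},
}]
```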
### Live Futures Data

Stream real-time futures data:

```python
data_configs = [{
    'type': 'databento',
    'id': 'databento_client',
    'api_key': 'db-your_api_key',
    'venue_dataset_map': {
        'XCME': 'GLBX.MDP3'  # CME Globex
    }
}]

hf.run_live(
    strategy_configs=[...],
    symbols=['ESZ5.XCME', 'NQZ5.XCME'],
    data_configs=data_configs
)
```

**Venue-Dataset Map:**

- `'XCME': 'GLBX.MDP3'` - CME Globex futures
- `'XCBT': 'GLBX.MDP3'` - CBOT futures
- `'XNYM': 'GLBX.MDP3'` - NYMEX futures
### Multiple Venues

Stream from multiple venues:

```python
data_configs = [{
    'type': 'databento',
    'id': 'databento_client',
    'api_key': 'db-your_api_key',
    'venue_dataset_map': {
        'SIM': 'EQUS.MINI',    # Simulated equities
        'XCME': 'GLBX.MDP3',   # CME futures
        'NASDAQ': 'XNAS.ITCH'  # NASDAQ Level 2
    }
}]
```

## Custom Data
### Custom Signals/Indicators

Integrate custom signals with market data.

**Configuration:**

```python
data_configs = [
    {
        'type': 'hiveq_historical',
        'dataset': 'HIVEQ_US_EQ',
        'schema': ['bars_1m'],
    },
    {
        'type': 'csv',
        'data_type': 'custom',
        'id': 'UserSignals',
        'path': 'userdata/signals.csv'
    }
]

report = hf.run_backtest(
    strategy_configs=[...],
    symbols=['AAPL'],
    start_date='2025-08-01',
    end_date='2025-08-30',
    data_configs=data_configs
)
```

**Custom Data CSV:**

```
timestamp,symbol,signal,value,confidence
2025-08-01 09:30:00,AAPL,BUY,1.5,0.85
2025-08-01 09:31:00,AAPL,HOLD,0.2,0.60
2025-08-01 09:32:00,AAPL,SELL,-1.2,0.90
```

**Required Columns:**

- `timestamp`: DateTime in format 'YYYY-MM-DD HH:MM:SS'
- `symbol`: Trading symbol
- Additional columns are user-defined (signal, value, confidence, etc.)
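Writing such a file needs nothing beyond the standard library. A sketch (the extra columns are the user-defined ones from the example above; the in-memory buffer stands in for a file under `~/.hiveq/data/userdata/`):

```python
import csv
import io

rows = [
    {"timestamp": "2025-08-01 09:30:00", "symbol": "AAPL",
     "signal": "BUY", "value": 1.5, "confidence": 0.85},
    {"timestamp": "2025-08-01 09:31:00", "symbol": "AAPL",
     "signal": "HOLD", "value": 0.2, "confidence": 0.60},
]

buf = io.StringIO()
writer = csv.DictWriter(
    buf, fieldnames=["timestamp", "symbol", "signal", "value", "confidence"]
)
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()
```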
### Accessing Custom Data

In your strategy:

```python
import hiveq.flow as hf
from hiveq.flow.config import EventType

class CustomDataStrategy:
    def on_hiveq_event(self, ctx: hf.Context, event):
        if event.type == EventType.START:
            # Subscribe to bars
            ctx.subscribe_bars(
                symbols=ctx.strategy_config.symbols,
                interval='1m'
            )
            # Subscribe to custom data
            ctx.subscribe_data(data_id='UserSignals')
        elif event.type == EventType.CUSTOM_DATA:
            # Process custom data
            data = event.data()
            # Access custom fields
            symbol = data.symbol
            signal = data.signal
            value = data.value
            confidence = data.confidence
            # Use in trading logic
            if signal == 'BUY' and confidence > 0.8:
                ctx.buy_order(symbol, quantity=100)
```

### Multiple Custom Data Sources
Use multiple custom data sources:

```python
data_configs = [
    {
        'type': 'hiveq_historical',
        'dataset': 'HIVEQ_US_EQ',
        'schema': ['bars_1m'],
    },
    {
        'type': 'csv',
        'data_type': 'custom',
        'id': 'MLSignals',
        'path': 'userdata/ml_signals.csv'
    },
    {
        'type': 'csv',
        'data_type': 'custom',
        'id': 'SentimentData',
        'path': 'userdata/sentiment.csv'
    }
]

# In strategy
def on_start(self, ctx: hf.Context, event):
    ctx.subscribe_data(data_id='MLSignals')
    ctx.subscribe_data(data_id='SentimentData')

def on_hiveq_event(self, ctx: hf.Context, event):
    if event.type == EventType.CUSTOM_DATA:
        data = event.data()
        # Distinguish by checking fields or metadata
        if hasattr(data, 'ml_signal'):
            # Process ML signals
            pass
        elif hasattr(data, 'sentiment_score'):
            # Process sentiment data
            pass
```

### HiveQ Quant Signals
Subscribe to signals from the HiveQ Quant Signals dataset.

**Option 1: Static symbols in data_config**

```python
data_configs = [
    {
        'type': 'hiveq_historical',
        'dataset': 'HIVEQ_US_EQ',
        'schema': ['bars_1m'],
    },
    {
        'type': 'hiveq_historical',
        'dataset': 'HIVEQ_QUANT_SIGNALS',
        'schema': ['signals'],
        'id': 'MySignals',
        'symbols': ['signal_id_1', 'signal_id_2']  # Static specification
    }
]

# In strategy
def on_start(self, ctx, event):
    ctx.subscribe_data(data_id='MySignals')
```

**Option 2: Dynamic subscription (signals in subscribe_data)**

```python
# No 'symbols' in data_config - specified dynamically in strategy
data_configs = [
    {
        'type': 'hiveq_historical',
        'dataset': 'HIVEQ_US_EQ',
        'schema': ['bars_1m'],
    },
    {
        'type': 'hiveq_historical',
        'dataset': 'HIVEQ_QUANT_SIGNALS',
        'schema': ['signals'],
        'id': 'MySignals'  # No 'symbols' here
    }
]

# In strategy - signals determined at runtime
def on_start(self, ctx, event):
    # Signals captured during prefetch and used to filter the API data fetch
    signal_ids = ['signal_id_1', 'signal_id_2']  # Can be dynamic
    ctx.subscribe_data(data_id='MySignals', signals=signal_ids)

def on_custom_data(self, ctx, event):
    data = event.data()
    symbol = data.column_data('symbol')            # Signal ID
    signal_json = data.column_data('signal_json')  # Signal payload
```

**Note:** If both `data_config['symbols']` and the `signals` parameter are specified, `data_config['symbols']` takes priority.
## Multi-Provider Setup

### Combining Providers

Mix different data providers.

**Example 1: CSV Bars + Custom Signals**

```python
data_configs = [
    {
        'type': 'csv',
        'data_type': 'bars_1m',
        'id': '1_MIN_BAR',
        'path': 'bars/AAPL_bars.csv'
    },
    {
        'type': 'csv',
        'data_type': 'custom',
        'id': 'UserSignals',
        'path': 'userdata/signals.csv'
    }
]
```

**Example 2: HiveQ Historical + Custom Data**
```python
data_configs = [
    {
        'type': 'hiveq_historical',
        'dataset': 'HIVEQ_US_EQ',
        'schema': ['bars_1m'],
    },
    {
        'type': 'csv',
        'data_type': 'custom',
        'id': 'AlphaSignals',
        'path': 'userdata/alpha_signals.csv'
    }
]
```

**Example 3: Multiple Asset Classes**
```python
data_configs = [
    {
        'type': 'hiveq_historical',
        'dataset': 'HIVEQ_US_EQ',
        'schema': ['bars_1m'],
    },
    {
        'type': 'hiveq_historical',
        'dataset': 'HIVEQ_US_FUT',
        'schema': ['bars_1m'],
    },
    {
        'type': 'csv',
        'data_type': 'custom',
        'id': 'CrossAssetSignals',
        'path': 'userdata/cross_asset.csv'
    }
]

report = hf.run_backtest(
    strategy_configs=[...],
    symbols=['AAPL', 'ES.c.0'],  # Equity + Futures
    data_configs=data_configs
)
```

### Data Synchronization
All data sources are automatically synchronized by timestamp:

- **Bar Data**: Delivered based on bar timestamp
- **Custom Data**: Matched to its bar by flooring the timestamp to the bar start
- **Events**: Processed in chronological order
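As the timeline below shows, matching a custom-data timestamp to a bar amounts to flooring it to the start of its bar interval. A small sketch of that logic (not the framework's internal code):

```python
from datetime import datetime, timedelta

def sync_to_bar(ts: datetime, interval: timedelta = timedelta(minutes=1)) -> datetime:
    """Floor a custom-data timestamp to the start of its bar interval."""
    epoch = datetime(1970, 1, 1)
    offset = (ts - epoch) % interval
    return ts - offset

# 09:31:15 floors to the 09:31:00 bar, as in the timeline below
synced = sync_to_bar(datetime(2025, 8, 1, 9, 31, 15))
```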
**Timeline:**

```
09:30:00 - Bar (AAPL)
09:30:00 - Custom Data (Signal)
09:31:00 - Bar (AAPL)
09:31:15 - Custom Data (Signal)  -> Synced to 09:31:00 bar
09:32:00 - Bar (AAPL)
```

## Configuration Examples
### Complete Backtest Configuration

```python
import hiveq.flow as hf
from hiveq.flow import StrategyConfig, BacktestConfig

# Strategy
strategy_configs = [
    StrategyConfig(
        name='MultiAssetStrategy',
        type='MyStrategy',
        params={
            'window': 20,
            'trade_size': 100
        }
    )
]

# Data
data_configs = [
    {
        'type': 'hiveq_historical',
        'dataset': 'HIVEQ_US_EQ',
        'schema': ['bars_1m'],
    },
    {
        'type': 'csv',
        'data_type': 'custom',
        'id': 'UserSignals',
        'path': 'userdata/signals.csv'
    }
]

# Backtest config
backtest_config = BacktestConfig(
    initial_capital=1000000.0,
    commission=0.001,  # 0.1%
    slippage=0.0005,   # 0.05%
    start_date='2025-08-01',
    end_date='2025-08-30'
)

report = hf.run_backtest(
    strategy_configs=strategy_configs,
    symbols=['AAPL', 'GOOGL'],
    data_configs=data_configs,
    backtest_config=backtest_config
)

print(report.return_stats.to_string())
```

### Complete Live Trading Configuration
```python
import hiveq.flow as hf
from hiveq.flow import StrategyConfig, LiveConfig

# Strategy
strategy_configs = [
    StrategyConfig(
        name='LiveStrategy',
        type='MyStrategy'
    )
]

# Live data
data_configs = [{
    'type': 'databento',
    'id': 'databento_main',
    'api_key': 'db-your_api_key',
    'venue_dataset_map': {
        'SIM': 'EQUS.MINI',
        'XCME': 'GLBX.MDP3'
    }
}]

# Live config
live_config = LiveConfig(
    venue='SIM',
    broker='sandbox',
    account_id='DEMO-001',
    starting_balances=['1000000 USD'],
    paper_trading=True
)

hf.run_live(
    strategy_configs=strategy_configs,
    symbols=['AAPL', 'ESZ5.XCME'],
    data_configs=data_configs,
    live_config=live_config
)
```

## Data Provider Comparison
| Provider | Use Case | Asset Types | Latency | Cost |
|---|---|---|---|---|
| CSV | Backtesting, prototyping | All | N/A | Free |
| HiveQ Historical | Production backtesting | Equities, Futures, Options | N/A | Subscription |
| Databento Live | Live trading | Equities, Futures, Options | Real-time | Usage-based |
| Custom Data | Signals, indicators | Any | N/A | Free |

**HiveQ Historical Datasets and Schemas:**

| Dataset | Asset Type | Available Schemas |
|---|---|---|
| `HIVEQ_US_EQ` | US Equities | `bars_1m`, `bars_1d`, `eq_trades` |
| `HIVEQ_US_FUT` | US Futures | `bars_1m`, `bars_1d`, `fut_trades` |
| `HIVEQ_US_OPT` | US Options | `bars_1m`, `bars_1d` |
| `HIVEQ_QUANT_SIGNALS` | Signals | `signals` |
## Best Practices

### 1. Data Quality

Ensure CSV data is clean:

- No missing values
- Chronological order
- Consistent timestamp format
- Valid OHLCV values (high >= low, close within the high-low range)
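These checks are mechanical enough to automate before a backtest. A sketch using pandas, assuming the CSV layout from the CSV Data Provider section:

```python
import io

import pandas as pd

def bar_quality_issues(df: pd.DataFrame) -> list:
    """Return a list of human-readable data-quality problems, empty if clean."""
    issues = []
    if df.isna().any().any():
        issues.append("missing values")
    if not df["timestamp"].is_monotonic_increasing:
        issues.append("not in chronological order")
    if (df["high"] < df["low"]).any():
        issues.append("high < low")
    if (~df["close"].between(df["low"], df["high"])).any():
        issues.append("close outside [low, high]")
    return issues

df = pd.read_csv(io.StringIO(
    "timestamp,symbol,open,high,low,close,volume\n"
    "2025-08-01 09:30:00,AAPL,180.5,181.0,180.25,180.75,1000000\n"
), parse_dates=["timestamp"])
issues = bar_quality_issues(df)
```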
### 2. File Paths

Use relative paths for portability:

```python
# Good
'path': 'bars/AAPL_bars.csv'

# Less portable
'path': '/home/user/data/bars/AAPL_bars.csv'
```

### 3. Data Location

HiveQ Flow data directory:

```
~/.hiveq/
├── data/
│   ├── bars/          # Bar data CSVs
│   ├── userdata/      # Custom data CSVs
│   └── universe/      # Universe files
├── flow/
│   └── resources/     # Strategy resources
└── logs/              # Log files
```

### 4. Custom Data Timestamps
Ensure custom data timestamps align with bars:

```python
# Custom data at 09:30:15 will sync to the bar at 09:30:00
# Custom data at 09:31:45 will sync to the bar at 09:31:00
```

### 5. Multiple Schemas

When using multiple schemas, the strategy receives all of them:

```python
data_configs = [{
    'type': 'hiveq_historical',
    'dataset': 'HIVEQ_US_EQ',
    'schema': ['bars_1m', 'bars_1d'],
}]

# Strategy receives:
# - 1-minute bars
# - Daily bars
# Distinguish by checking bar interval or timestamp
def on_bar(self, ctx: hf.Context, event):
    bar = event.data()
    # Process based on interval
```

## Troubleshooting
### CSV File Not Found

**Error:** `FileNotFoundError: bars/AAPL_bars.csv`

**Solution:**

- Check that the file exists in `~/.hiveq/data/bars/`
- Use an absolute path with `use_absolute: True`
- Verify the path separator (use `/`, not `\`)

### HiveQ Historical Data Not Loading

**Error:** `Failed to fetch historical data`

**Solution:**

- Verify the API key is valid
- Check the internet connection
- Ensure the date range has data available
- Verify the dataset name is correct

### Custom Data Not Appearing

**Issue:** Custom data events not firing

**Solution:**

- Verify `ctx.subscribe_data(data_id='...')` is called
- Check that `data_id` matches the config's `id` field
- Ensure timestamps overlap with bar data
- Verify the CSV format is correct

### Databento Connection Issues

**Error:** `Failed to connect to Databento`

**Solution:**

- Verify the API key is valid
- Check the venue-dataset mapping
- Ensure a proper subscription for the venue
- Test with the `'SIM'` venue first
## Summary

Data provider configuration in HiveQ Flow:

- **CSV**: Local files, great for prototyping
- **HiveQ Historical**: Production-grade historical data
- **Databento**: Real-time live data
- **Custom Data**: Integrate any signals or indicators

All providers work together seamlessly, synchronized by timestamp. Configure them via the `data_configs` parameter in `run_backtest()` or `run_live()`.