# Data Provider Configuration

Complete guide for configuring data sources in HiveQ Flow.

## Table of Contents

- Overview
- CSV Data Provider
- HiveQ Historical Data
- Databento Live Data
- Custom Data
- Multi-Provider Setup
## Overview

HiveQ Flow supports multiple data providers for backtesting and live trading:

- **CSV Files**: Local CSV files with bar or custom data
- **HiveQ Historical**: HiveQ's historical market data API
- **Databento**: Live market data streaming
- **Custom Data**: User-provided signals, indicators, or any time-series data

Data providers are configured via the `data_configs` parameter in `run_backtest()` or `run_live()`.
## CSV Data Provider

### Bar Data from CSV

Load OHLCV bar data from local CSV files.

**Configuration:**

```python
import hiveq.flow as hf

data_configs = [{
    'type': 'csv',
    'data_type': 'bars_1m',  # or bars_1d, etc.
    'id': '1_MIN_BAR',
    'path': 'bars/AAPL_bars.csv',
    'use_absolute': False  # Path relative to ~/.hiveq/data/
}]

report = hf.run_backtest(
    strategy_configs=[...],
    symbols=['AAPL'],
    start_date='2025-08-01',
    end_date='2025-08-02',
    data_configs=data_configs
)
```

**CSV Format:**

```
timestamp,symbol,open,high,low,close,volume
2025-08-01 09:30:00,AAPL,180.50,181.00,180.25,180.75,1000000
2025-08-01 09:31:00,AAPL,180.75,181.25,180.50,181.00,1100000
2025-08-01 09:32:00,AAPL,181.00,181.50,180.75,181.25,1050000
```

**Required Columns:**

- `timestamp`: DateTime in format 'YYYY-MM-DD HH:MM:SS'
- `symbol`: Trading symbol
- `open`: Opening price
- `high`: High price
- `low`: Low price
- `close`: Closing price
- `volume`: Trading volume
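A quick way to sanity-check a bar CSV before handing it to the provider is to load it and verify the required columns and ordering. A minimal sketch using pandas (the in-memory sample stands in for a file on disk):

```python
import io

import pandas as pd

REQUIRED = ["timestamp", "symbol", "open", "high", "low", "close", "volume"]

def validate_bar_csv(buf) -> pd.DataFrame:
    """Load a bar CSV and check the columns the CSV provider expects."""
    df = pd.read_csv(buf, parse_dates=["timestamp"])
    missing = [c for c in REQUIRED if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    if not df["timestamp"].is_monotonic_increasing:
        raise ValueError("timestamps must be in chronological order")
    return df

sample = io.StringIO(
    "timestamp,symbol,open,high,low,close,volume\n"
    "2025-08-01 09:30:00,AAPL,180.50,181.00,180.25,180.75,1000000\n"
    "2025-08-01 09:31:00,AAPL,180.75,181.25,180.50,181.00,1100000\n"
)
bars = validate_bar_csv(sample)
```

In practice you would pass a path under `~/.hiveq/data/` instead of the StringIO buffer.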
### Multiple CSV Files

Load data for multiple symbols:

```python
data_configs = [
    {
        'type': 'csv',
        'data_type': 'bars_1m',
        'id': 'AAPL_BARS',
        'path': 'bars/AAPL_bars.csv'
    },
    {
        'type': 'csv',
        'data_type': 'bars_1m',
        'id': 'GOOGL_BARS',
        'path': 'bars/GOOGL_bars.csv'
    }
]

report = hf.run_backtest(
    strategy_configs=[...],
    symbols=['AAPL', 'GOOGL'],
    data_configs=data_configs
)
```

**Note:** Multiple symbols can also be placed in a single CSV file, distinguished by the `symbol` column.
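If you keep several symbols in one file, splitting it into per-symbol files in the layout above is a one-liner with pandas. A sketch (the file paths you would write to are illustrative):

```python
import io

import pandas as pd

# A combined file with two symbols, as described in the note above
combined = pd.read_csv(io.StringIO(
    "timestamp,symbol,open,high,low,close,volume\n"
    "2025-08-01 09:30:00,AAPL,180.5,181.0,180.25,180.75,1000000\n"
    "2025-08-01 09:30:00,GOOGL,195.0,195.5,194.75,195.25,800000\n"
))

# One DataFrame per symbol, ready to write as bars/<SYMBOL>_bars.csv
per_symbol = {sym: grp for sym, grp in combined.groupby("symbol")}
```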
### Absolute Paths

Use absolute file paths:

```python
data_configs = [{
    'type': 'csv',
    'data_type': 'bars_1m',
    'id': '1_MIN_BAR',
    'path': '/home/user/data/AAPL_bars.csv',
    'use_absolute': True
}]
```

### Bar Intervals

Specify different bar intervals:

```python
# 1-minute bars (smallest available granularity)
{'type': 'csv', 'data_type': 'bars_1m', ...}

# Daily bars
{'type': 'csv', 'data_type': 'bars_1d', ...}
```

## HiveQ Historical Data
### Equity Data

Access HiveQ's historical US equity data.

**Configuration:**

```python
data_configs = [{
    'type': 'hiveq_historical',
    'dataset': 'HIVEQ_US_EQ',
    'schema': ['bars_1m'],
}]

report = hf.run_backtest(
    strategy_configs=[...],
    symbols=['AAPL', 'GOOGL', 'MSFT'],
    start_date='2025-08-01',
    end_date='2025-08-30',
    data_configs=data_configs
)
```

**Parameters:**

- `type`: `'hiveq_historical'`
- `dataset`: Dataset identifier (use the correct dataset for your asset type!)
  - `'HIVEQ_US_EQ'`: US Equities (stocks only)
  - `'HIVEQ_US_OPT'`: US Options
  - `'HIVEQ_US_FUT'`: US Futures
- `schema`: Data types to fetch
  - `['bars_1m']`: 1-minute bars (smallest bar granularity)
  - `['bars_1d']`: Daily bars
  - `['eq_trades']`: Equity trade ticks (HIVEQ_US_EQ only; see Equity Trade Data)
  - `['fut_trades']`: Futures trade ticks (HIVEQ_US_FUT only; see Futures Trade Data)
### Futures Data

Historical futures bar data:

```python
data_configs = [{
    'type': 'hiveq_historical',
    'dataset': 'HIVEQ_US_FUT',
    'schema': ['bars_1m'],
}]

report = hf.run_backtest(
    strategy_configs=[...],
    symbols=['ES.c.0'],  # Continuous front month
    start_date='2025-08-01',
    end_date='2025-08-30',
    data_configs=data_configs
)
```

### Equity Trade Data
The `eq_trades` schema provides individual trade tick data for equities. Unlike bar data (aggregated OHLCV), `eq_trades` delivers every trade execution with price, size, and aggressor side. This is the recommended schema for equity strategies that use executors (e.g., AlgoInstructionStrategy), as it provides the tick-level resolution needed for POV, TWAP, and other execution algorithms.

**Configuration:**

```python
data_configs = [{
    'type': 'hiveq_historical',
    'dataset': 'HIVEQ_US_EQ',
    'schema': ['eq_trades'],
}]

report = hf.run_backtest(
    strategy_configs=[...],
    symbols=['AAPL', 'MSFT'],
    start_date='2025-09-01',
    end_date='2025-09-05',
    data_configs=data_configs
)
```

**Trade tick fields delivered to the `on_trade` callback:**

| Field | Type | Description |
|---|---|---|
| `symbol` | string | Trading symbol (e.g., "AAPL") |
| `price` | float | Execution price |
| `size` | float | Trade quantity |
| `aggressor_side` | string | "BUY", "SELL", or "NO_AGGRESSOR" |
| `trade_id` | string | Unique trade identifier |
| `ts_event` | int | Event timestamp (nanoseconds) |

**Strategy callback:**

```python
def on_trade(self, ctx, event):
    trade = event.data()
    print(f"Trade: {trade.symbol} @ {trade.price} x {trade.size} ({trade.aggressor_side})")
```

**When to use `eq_trades` vs `bars_1m`:**

| Schema | Use Case | Callback |
|---|---|---|
| `eq_trades` | Executor-based strategies (POV, TWAP, VWAP), TCA, tick-level analysis | `on_trade()` + `on_quote()` |
| `bars_1m` | Bar-based strategies (SMA crossover, breakout), lower data volume | `on_bar()` |

**Note:** `eq_trades` also delivers quote data (best bid/ask), so strategies will receive both `on_trade()` and `on_quote()` callbacks.
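As an illustration of what tick-level resolution buys you, a running VWAP can be maintained directly from the fields above. A standalone sketch (the `Trade` dataclass is a stand-in for the event payload, not the framework's type):

```python
from dataclasses import dataclass

@dataclass
class Trade:
    """Stand-in for the eq_trades payload fields used here."""
    symbol: str
    price: float
    size: float

class RunningVWAP:
    """Accumulate sum(price * size) / sum(size) over incoming ticks."""
    def __init__(self):
        self.notional = 0.0
        self.volume = 0.0

    def on_trade(self, trade: Trade) -> float:
        self.notional += trade.price * trade.size
        self.volume += trade.size
        return self.notional / self.volume

vwap = RunningVWAP()
vwap.on_trade(Trade("AAPL", 100.0, 200))
last = vwap.on_trade(Trade("AAPL", 101.0, 100))  # (100*200 + 101*100) / 300
```

Bar data cannot support this kind of calculation exactly, since intra-bar trade sizes are already aggregated away.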
### Futures Trade Data

The `fut_trades` schema provides individual trade tick data for futures contracts. Unlike bar data (aggregated OHLCV), `fut_trades` delivers every trade execution with price, size, and aggressor side. This is the recommended schema for strategies that use executors (e.g., AlgoInstructionStrategy), as it provides the tick-level resolution needed for POV, TWAP, and other execution algorithms.

**Configuration:**

```python
data_configs = [{
    'type': 'hiveq_historical',
    'dataset': 'HIVEQ_US_FUT',
    'schema': ['fut_trades'],
}]

report = hf.run_backtest(
    strategy_configs=[...],
    symbols=['ES.c.0'],  # Continuous front month
    start_date='2025-09-01',
    end_date='2025-09-05',
    session_start='18:00',  # CME Globex: 6 PM ET previous day
    session_end='17:00',    # CME Globex: 5 PM ET
    data_configs=data_configs
)
```

**Trade tick fields delivered to the `on_trade` callback:**

| Field | Type | Description |
|---|---|---|
| `symbol` | string | Trading symbol (e.g., "ESH6") |
| `price` | float | Execution price |
| `size` | float | Trade quantity |
| `aggressor_side` | string | "BUY", "SELL", or "NO_AGGRESSOR" |
| `trade_id` | string | Unique trade identifier |
| `ts_event` | int | Event timestamp (nanoseconds) |

**Strategy callback:**

```python
def on_trade(self, ctx, event):
    trade = event.data()
    print(f"Trade: {trade.symbol} @ {trade.price} x {trade.size} ({trade.aggressor_side})")
```

**When to use `fut_trades` vs `bars_1m`:**

| Schema | Use Case | Callback |
|---|---|---|
| `fut_trades` | Executor-based strategies (POV, TWAP, VWAP), TCA, tick-level analysis | `on_trade()` + `on_quote()` |
| `bars_1m` | Bar-based strategies (SMA crossover, breakout), lower data volume | `on_bar()` |

**Note:** `fut_trades` also delivers quote data (best bid/ask), so strategies will receive both `on_trade()` and `on_quote()` callbacks.
### Futures with Auto-Rollover

For continuous contract trading with automatic position rollover:

```python
data_configs = [{
    'type': 'hiveq_historical',
    'dataset': 'HIVEQ_US_FUT',
    'schema': ['fut_trades'],
    'filter_mode': 'continuous',  # Auto-injected if enable_auto_rollover=True
}]

backtest_config = BacktestConfig(
    symbols=['ES.v.0'],  # Volume-weighted front month
    start_date='2025-09-16',
    end_date='2025-09-22',
    session_start='18:00',
    session_end='17:00',
    enable_auto_rollover=True,  # Auto-injects filter_mode and enableFuturesRollover
)
```

**Continuous contract symbols:**

| Symbol | Description |
|---|---|
| `ES.c.0` | Calendar front month (rolls by expiration date) |
| `ES.v.0` | Volume-weighted front month (rolls when the next contract has more volume) |
| `NQ.c.0` | Nasdaq futures calendar front month |
| `ESH6` | Specific contract (no rollover) |

When `enable_auto_rollover=True`, the framework automatically:

- Sets `filter_mode='continuous'` in data configs
- Sets `enableFuturesRollover=True` in strategy params
- Injects rollover events when the continuous contract switches contracts

See the AlgoInstructionStrategy Specification for full rollover lifecycle details.
### Futures Session Date Mapping (Overnight Sessions)

Futures use overnight sessions (e.g., CME Globex: 18:00 ET → 17:00 ET). When you specify a `start_date`, the framework automatically starts data from the previous calendar day's evening, not the start_date's evening.

**Example:** `start_date='2026-01-02'` with `session_start='18:00'`, `session_end='17:00'`

```
Trading day:     Jan 2 (Friday)
Session starts:  Jan 1 (Thursday) at 18:00 ET   <- previous calendar day
Session ends:    Jan 2 (Friday) at 17:00 ET
Data fetched:    2026-01-01 23:00 UTC -> 2026-01-02 22:00 UTC
```

**How it works:** The framework detects an overnight session when `session_end <= session_start` (e.g., 17:00 ≤ 18:00). It then shifts the session start to the previous calendar day.

| start_date | session_start | Actual data starts from | Actual data ends at |
|---|---|---|---|
| 2026-01-02 | 18:00 ET | Jan 1, 18:00 ET (prev day) | Jan 2, 17:00 ET |
| 2026-01-05 | 18:00 ET | Jan 4, 18:00 ET (prev day, Sunday) | Jan 5, 17:00 ET |

For multi-day backtests (`start_date='2026-01-02'`, `end_date='2026-01-05'`):

- Day 1 session: Jan 1, 18:00 ET → Jan 2, 17:00 ET
- Day 2 session: Jan 2, 18:00 ET → Jan 3, 17:00 ET
- Day 3 session: Jan 4, 18:00 ET → Jan 5, 17:00 ET (skips weekend)

If `session_start` and `session_end` are not set, the framework auto-detects futures datasets and defaults to CME Globex session times (18:00-17:00 ET).
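The date-shift rule above can be expressed directly: when `session_end <= session_start` the session is overnight, so its start falls on the previous calendar day. A sketch of the logic (not the framework's internal code):

```python
from datetime import date, datetime, time, timedelta

def session_window(trading_day: date, session_start: time, session_end: time):
    """Return (start, end) datetimes for one trading day, shifting overnight sessions."""
    if session_end <= session_start:  # overnight session, e.g. 18:00 -> 17:00
        start_dt = datetime.combine(trading_day - timedelta(days=1), session_start)
    else:
        start_dt = datetime.combine(trading_day, session_start)
    return start_dt, datetime.combine(trading_day, session_end)

# Reproduces the first row of the table above
start, end = session_window(date(2026, 1, 2), time(18, 0), time(17, 0))
```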
### Options Data

Historical options data:

```python
data_configs = [{
    'type': 'hiveq_historical',
    'dataset': 'HIVEQ_US_OPT',
    'schema': ['bars_1m'],
}]

report = hf.run_backtest(
    strategy_configs=[...],
    symbols=['SPY251014C00450000'],
    start_date='2025-10-01',
    end_date='2025-10-14',
    data_configs=data_configs
)
```

### Multiple Schemas
Fetch multiple data types:

```python
data_configs = [{
    'type': 'hiveq_historical',
    'dataset': 'HIVEQ_US_EQ',
    'schema': [
        'bars_1m',  # 1-minute bars
        'bars_1d',  # Daily bars
    ],
}]
```

Your strategy will receive all requested data types.

### Data Path Configuration

HiveQ historical data is cached locally.

**Default Path:** `~/.hiveq/data/`

**Custom Path:**
## Databento Live Data

### Live Equity Data

Stream real-time equity data.

**Configuration:**

```python
data_configs = [{
    'type': 'databento',
    'id': 'databento_client',
    'api_key': 'db-your_api_key',
    'venue_dataset_map': {
        'SIM': 'EQUS.MINI'  # Simulated equities
    }
}]

hf.run_live(
    strategy_configs=[...],
    symbols=['AAPL'],
    data_configs=data_configs
)
```

**Parameters:**

- `type`: `'databento'`
- `id`: Client identifier (unique name)
- `api_key`: Your Databento API key
- `venue_dataset_map`: Mapping of venue to dataset
  - `'SIM'`: Simulated venue for testing
  - `'NASDAQ'`, `'NYSE'`: Real venues (requires appropriate subscription)
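Avoid hardcoding the API key; reading it from an environment variable keeps it out of version control. A sketch (`DATABENTO_API_KEY` is a conventional variable name chosen here, not one the framework requires):

```python
import os

# Fall back to the documentation placeholder so the sketch runs without the variable set
api_key = os.environ.get("DATABENTO_API_KEY", "db-your_api_key")

data_configs = [{
    'type': 'databento',
    'id': 'databento_client',
    'api_key': api_key,
    'venue_dataset_map': {'SIM': 'EQUS.MINI'},
}]
```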
### Live Futures Data

Stream real-time futures data:

```python
data_configs = [{
    'type': 'databento',
    'id': 'databento_client',
    'api_key': 'db-your_api_key',
    'venue_dataset_map': {
        'XCME': 'GLBX.MDP3'  # CME Globex
    }
}]

hf.run_live(
    strategy_configs=[...],
    symbols=['ESZ5.XCME', 'NQZ5.XCME'],
    data_configs=data_configs
)
```

**Venue-Dataset Map:**

- `'XCME': 'GLBX.MDP3'` - CME Globex futures
- `'XCBT': 'GLBX.MDP3'` - CBOT futures
- `'XNYM': 'GLBX.MDP3'` - NYMEX futures
### Multiple Venues

Stream from multiple venues:

```python
data_configs = [{
    'type': 'databento',
    'id': 'databento_client',
    'api_key': 'db-your_api_key',
    'venue_dataset_map': {
        'SIM': 'EQUS.MINI',    # Simulated equities
        'XCME': 'GLBX.MDP3',   # CME futures
        'NASDAQ': 'XNAS.ITCH'  # NASDAQ Level 2
    }
}]
```

## Custom Data
### Custom Signals/Indicators

Integrate custom signals with market data.

**Configuration:**

```python
data_configs = [
    {
        'type': 'hiveq_historical',
        'dataset': 'HIVEQ_US_EQ',
        'schema': ['bars_1m'],
    },
    {
        'type': 'csv',
        'data_type': 'custom',
        'id': 'UserSignals',
        'path': 'userdata/signals.csv'
    }
]

report = hf.run_backtest(
    strategy_configs=[...],
    symbols=['AAPL'],
    start_date='2025-08-01',
    end_date='2025-08-30',
    data_configs=data_configs
)
```

**Custom Data CSV:**

```
timestamp,symbol,signal,value,confidence
2025-08-01 09:30:00,AAPL,BUY,1.5,0.85
2025-08-01 09:31:00,AAPL,HOLD,0.2,0.60
2025-08-01 09:32:00,AAPL,SELL,-1.2,0.90
```

**Required Columns:**

- `timestamp`: DateTime in format 'YYYY-MM-DD HH:MM:SS'
- `symbol`: Trading symbol
- Additional columns are user-defined (signal, value, confidence, etc.)
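Writing such a file needs nothing beyond the standard library. A sketch (the extra columns are the user-defined ones from the example above; the in-memory buffer stands in for a file under `~/.hiveq/data/userdata/`):

```python
import csv
import io

rows = [
    {"timestamp": "2025-08-01 09:30:00", "symbol": "AAPL",
     "signal": "BUY", "value": 1.5, "confidence": 0.85},
    {"timestamp": "2025-08-01 09:31:00", "symbol": "AAPL",
     "signal": "HOLD", "value": 0.2, "confidence": 0.60},
]

buf = io.StringIO()
writer = csv.DictWriter(
    buf, fieldnames=["timestamp", "symbol", "signal", "value", "confidence"]
)
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()
```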
### Accessing Custom Data

In your strategy:

```python
import hiveq.flow as hf
from hiveq.flow.config import EventType

class CustomDataStrategy:
    def on_hiveq_event(self, ctx: hf.Context, event):
        if event.type == EventType.START:
            # Subscribe to bars
            ctx.subscribe_bars(
                symbols=ctx.strategy_config.symbols,
                interval='1m'
            )
            # Subscribe to custom data
            ctx.subscribe_data(data_id='UserSignals')
        elif event.type == EventType.CUSTOM_DATA:
            # Process custom data
            data = event.data()
            # Access custom fields
            symbol = data.symbol
            signal = data.signal
            value = data.value
            confidence = data.confidence
            # Use in trading logic
            if signal == 'BUY' and confidence > 0.8:
                ctx.buy_order(symbol, quantity=100)
```

### Multiple Custom Data Sources
Use multiple custom data sources:

```python
data_configs = [
    {
        'type': 'hiveq_historical',
        'dataset': 'HIVEQ_US_EQ',
        'schema': ['bars_1m'],
    },
    {
        'type': 'csv',
        'data_type': 'custom',
        'id': 'MLSignals',
        'path': 'userdata/ml_signals.csv'
    },
    {
        'type': 'csv',
        'data_type': 'custom',
        'id': 'SentimentData',
        'path': 'userdata/sentiment.csv'
    }
]

# In strategy
def on_start(self, ctx: hf.Context, event):
    ctx.subscribe_data(data_id='MLSignals')
    ctx.subscribe_data(data_id='SentimentData')

def on_hiveq_event(self, ctx: hf.Context, event):
    if event.type == EventType.CUSTOM_DATA:
        data = event.data()
        # Distinguish by checking fields or metadata
        if hasattr(data, 'ml_signal'):
            # Process ML signals
            pass
        elif hasattr(data, 'sentiment_score'):
            # Process sentiment data
            pass
```

### HiveQ Quant Signals
Subscribe to signals from the HiveQ Quant Signals dataset.

**Option 1: Static symbols in data_config**

```python
data_configs = [
    {
        'type': 'hiveq_historical',
        'dataset': 'HIVEQ_US_EQ',
        'schema': ['bars_1m'],
    },
    {
        'type': 'hiveq_historical',
        'dataset': 'HIVEQ_QUANT_SIGNALS',
        'schema': ['signals'],
        'id': 'MySignals',
        'symbols': ['signal_id_1', 'signal_id_2']  # Static specification
    }
]

# In strategy
def on_start(self, ctx, event):
    ctx.subscribe_data(data_id='MySignals')
```

**Option 2: Dynamic subscription (signals in subscribe_data)**

```python
# No 'symbols' in data_config - specified dynamically in strategy
data_configs = [
    {
        'type': 'hiveq_historical',
        'dataset': 'HIVEQ_US_EQ',
        'schema': ['bars_1m'],
    },
    {
        'type': 'hiveq_historical',
        'dataset': 'HIVEQ_QUANT_SIGNALS',
        'schema': ['signals'],
        'id': 'MySignals'  # No 'symbols' here
    }
]

# In strategy - signals determined at runtime
def on_start(self, ctx, event):
    # Signals captured during prefetch and used to filter the API data fetch
    signal_ids = ['signal_id_1', 'signal_id_2']  # Can be dynamic
    ctx.subscribe_data(data_id='MySignals', signals=signal_ids)

def on_custom_data(self, ctx, event):
    data = event.data()
    symbol = data.column_data('symbol')            # Signal ID
    signal_json = data.column_data('signal_json')  # Signal payload
```

**Note:** If both `data_config['symbols']` and the `signals` parameter are specified, `data_config['symbols']` takes priority.
## Multi-Provider Setup

### Combining Providers

Mix different data providers.

**Example 1: CSV Bars + Custom Signals**

```python
data_configs = [
    {
        'type': 'csv',
        'data_type': 'bars_1m',
        'id': '1_MIN_BAR',
        'path': 'bars/AAPL_bars.csv'
    },
    {
        'type': 'csv',
        'data_type': 'custom',
        'id': 'UserSignals',
        'path': 'userdata/signals.csv'
    }
]
```

**Example 2: HiveQ Historical + Custom Data**
```python
data_configs = [
    {
        'type': 'hiveq_historical',
        'dataset': 'HIVEQ_US_EQ',
        'schema': ['bars_1m'],
    },
    {
        'type': 'csv',
        'data_type': 'custom',
        'id': 'AlphaSignals',
        'path': 'userdata/alpha_signals.csv'
    }
]
```

**Example 3: Multiple Asset Classes**
```python
data_configs = [
    {
        'type': 'hiveq_historical',
        'dataset': 'HIVEQ_US_EQ',
        'schema': ['bars_1m'],
    },
    {
        'type': 'hiveq_historical',
        'dataset': 'HIVEQ_US_FUT',
        'schema': ['bars_1m'],
    },
    {
        'type': 'csv',
        'data_type': 'custom',
        'id': 'CrossAssetSignals',
        'path': 'userdata/cross_asset.csv'
    }
]

report = hf.run_backtest(
    strategy_configs=[...],
    symbols=['AAPL', 'ES.c.0'],  # Equity + Futures
    data_configs=data_configs
)
```

### Data Synchronization
All data sources are automatically synchronized by timestamp:

- **Bar Data**: Delivered based on bar timestamp
- **Custom Data**: Matched to its bar by flooring the timestamp to the bar start
- **Events**: Processed in chronological order
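As the timeline below shows, matching a custom-data timestamp to a bar amounts to flooring it to the start of its bar interval. A small sketch of that logic (not the framework's internal code):

```python
from datetime import datetime, timedelta

def sync_to_bar(ts: datetime, interval: timedelta = timedelta(minutes=1)) -> datetime:
    """Floor a custom-data timestamp to the start of its bar interval."""
    epoch = datetime(1970, 1, 1)
    offset = (ts - epoch) % interval
    return ts - offset

# 09:31:15 floors to the 09:31:00 bar, as in the timeline below
synced = sync_to_bar(datetime(2025, 8, 1, 9, 31, 15))
```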
**Timeline:**

```
09:30:00 - Bar (AAPL)
09:30:00 - Custom Data (Signal)
09:31:00 - Bar (AAPL)
09:31:15 - Custom Data (Signal)  -> Synced to 09:31:00 bar
09:32:00 - Bar (AAPL)
```

## Configuration Examples
### Complete Backtest Configuration

```python
import hiveq.flow as hf
from hiveq.flow import StrategyConfig, BacktestConfig

# Strategy
strategy_configs = [
    StrategyConfig(
        name='MultiAssetStrategy',
        type='MyStrategy',
        params={
            'window': 20,
            'trade_size': 100
        }
    )
]

# Data
data_configs = [
    {
        'type': 'hiveq_historical',
        'dataset': 'HIVEQ_US_EQ',
        'schema': ['bars_1m'],
    },
    {
        'type': 'csv',
        'data_type': 'custom',
        'id': 'UserSignals',
        'path': 'userdata/signals.csv'
    }
]

# Backtest config
backtest_config = BacktestConfig(
    initial_capital=1000000.0,
    commission=0.001,  # 0.1%
    slippage=0.0005,   # 0.05%
    start_date='2025-08-01',
    end_date='2025-08-30'
)

report = hf.run_backtest(
    strategy_configs=strategy_configs,
    symbols=['AAPL', 'GOOGL'],
    data_configs=data_configs,
    backtest_config=backtest_config
)

print(report.return_stats.to_string())
```

### Complete Live Trading Configuration
```python
import hiveq.flow as hf
from hiveq.flow import StrategyConfig, LiveConfig

# Strategy
strategy_configs = [
    StrategyConfig(
        name='LiveStrategy',
        type='MyStrategy'
    )
]

# Live data
data_configs = [{
    'type': 'databento',
    'id': 'databento_main',
    'api_key': 'db-your_api_key',
    'venue_dataset_map': {
        'SIM': 'EQUS.MINI',
        'XCME': 'GLBX.MDP3'
    }
}]

# Live config
live_config = LiveConfig(
    venue='SIM',
    broker='sandbox',
    account_id='DEMO-001',
    starting_balances=['1000000 USD'],
    paper_trading=True
)

hf.run_live(
    strategy_configs=strategy_configs,
    symbols=['AAPL', 'ESZ5.XCME'],
    data_configs=data_configs,
    live_config=live_config
)
```

## Data Provider Comparison
| Provider | Use Case | Asset Types | Latency | Cost |
|---|---|---|---|---|
| CSV | Backtesting, prototyping | All | N/A | Free |
| HiveQ Historical | Production backtesting | Equities, Futures, Options | N/A | Subscription |
| Databento Live | Live trading | Equities, Futures, Options | Real-time | Usage-based |
| Custom Data | Signals, indicators | Any | N/A | Free |

**HiveQ Historical Datasets and Schemas:**

| Dataset | Asset Type | Available Schemas |
|---|---|---|
| `HIVEQ_US_EQ` | US Equities | `bars_1m`, `bars_1d`, `eq_trades` |
| `HIVEQ_US_FUT` | US Futures | `bars_1m`, `bars_1d`, `fut_trades` |
| `HIVEQ_US_OPT` | US Options | `bars_1m`, `bars_1d` |
| `HIVEQ_QUANT_SIGNALS` | Signals | `signals` |
## Best Practices

### 1. Data Quality

Ensure CSV data is clean:

- No missing values
- Chronological order
- Consistent timestamp format
- Valid OHLCV values (high >= low, close within the high-low range)
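These checks are mechanical enough to automate before a backtest. A sketch using pandas, assuming the CSV layout from the CSV Data Provider section:

```python
import io

import pandas as pd

def bar_quality_issues(df: pd.DataFrame) -> list:
    """Return a list of human-readable data-quality problems, empty if clean."""
    issues = []
    if df.isna().any().any():
        issues.append("missing values")
    if not df["timestamp"].is_monotonic_increasing:
        issues.append("not in chronological order")
    if (df["high"] < df["low"]).any():
        issues.append("high < low")
    if (~df["close"].between(df["low"], df["high"])).any():
        issues.append("close outside [low, high]")
    return issues

df = pd.read_csv(io.StringIO(
    "timestamp,symbol,open,high,low,close,volume\n"
    "2025-08-01 09:30:00,AAPL,180.5,181.0,180.25,180.75,1000000\n"
), parse_dates=["timestamp"])
issues = bar_quality_issues(df)
```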
### 2. File Paths

Use relative paths for portability:

```python
# Good
'path': 'bars/AAPL_bars.csv'

# Less portable
'path': '/home/user/data/bars/AAPL_bars.csv'
```

### 3. Data Location

HiveQ Flow data directory:

```
~/.hiveq/
├── data/
│   ├── bars/          # Bar data CSVs
│   ├── userdata/      # Custom data CSVs
│   └── universe/      # Universe files
├── flow/
│   └── resources/     # Strategy resources
└── logs/              # Log files
```

### 4. Custom Data Timestamps
Ensure custom data timestamps align with bars:

```python
# Custom data at 09:30:15 will sync to the bar at 09:30:00
# Custom data at 09:31:45 will sync to the bar at 09:31:00
```

### 5. Multiple Schemas

When using multiple schemas, the strategy receives all of them:

```python
data_configs = [{
    'type': 'hiveq_historical',
    'dataset': 'HIVEQ_US_EQ',
    'schema': ['bars_1m', 'bars_1d'],
}]

# Strategy receives:
# - 1-minute bars
# - Daily bars
# Distinguish by checking bar interval or timestamp
def on_bar(self, ctx: hf.Context, event):
    bar = event.data()
    # Process based on interval
```

## Troubleshooting
### CSV File Not Found

**Error:** `FileNotFoundError: bars/AAPL_bars.csv`

**Solution:**

- Check that the file exists in `~/.hiveq/data/bars/`
- Use an absolute path with `use_absolute: True`
- Verify the path separator (use `/`, not `\`)

### HiveQ Historical Data Not Loading

**Error:** `Failed to fetch historical data`

**Solution:**

- Verify the API key is valid
- Check the internet connection
- Ensure the date range has data available
- Verify the dataset name is correct

### Custom Data Not Appearing

**Issue:** Custom data events not firing

**Solution:**

- Verify `ctx.subscribe_data(data_id='...')` is called
- Check that `data_id` matches the config's `id` field
- Ensure timestamps overlap with bar data
- Verify the CSV format is correct

### Databento Connection Issues

**Error:** `Failed to connect to Databento`

**Solution:**

- Verify the API key is valid
- Check the venue-dataset mapping
- Ensure a proper subscription for the venue
- Test with the `'SIM'` venue first
## Summary

Data provider configuration in HiveQ Flow:

- **CSV**: Local files, great for prototyping
- **HiveQ Historical**: Production-grade historical data
- **Databento**: Real-time live data
- **Custom Data**: Integrate any signals or indicators

All providers work together seamlessly, synchronized by timestamp. Configure them via the `data_configs` parameter in `run_backtest()` or `run_live()`.