Introduction

Quantiacs provides historical data for three asset classes: stocks, futures (including Bitcoin futures), and cryptocurrencies:

  • Stocks: Market data for NASDAQ-listed and S&P 500 companies, both past and present.

  • Futures: Market data for liquid global futures contracts with various underlying assets.

  • Cryptocurrencies: Market data for top cryptocurrencies by market capitalization.

Additional Datasets:

Working with Data

The Quantiacs Python library provides the qnt.data module for loading and working with financial data. Each data type has its own loading function:

import qnt.data as qndata

# Load daily stock data for the Q22 S&P500 contest
stocks = qndata.stocks.load_spx_data(min_date="2005-06-01")

# Load daily stock data for the Q20 Nasdaq-100 contest
stocks_nasdaq = qndata.stocks.load_ndx_data(min_date="2005-06-01")

# Load cryptocurrency daily data for the Q16/Q17 contests
cryptodaily = qndata.cryptodaily.load_data(min_date="2005-06-01")

# Load futures data for the Q15 contest
futures = qndata.futures.load_data(min_date="2005-06-01")

# Load BTC futures data for the Q15 contest
crypto_futures = qndata.cryptofutures.load_data(min_date="2005-06-01")

print(stocks, stocks_nasdaq, cryptodaily, futures, crypto_futures)

All datasets share the same structure. Each data field, such as open, close, high, and low prices or trading volume, is selected by name with sel:

import qnt.data as qndata

data = qndata.stocks.load_spx_data(min_date="2005-06-01")

price_open = data.sel(field="open")
price_close = data.sel(field="close")
price_high = data.sel(field="high")
price_low = data.sel(field="low")
volume_day = data.sel(field="vol")
is_liquid = data.sel(field="is_liquid")

| Data field | Description | Code example |
|---|---|---|
| open | Daily open price, i.e. the price at which a security trades when the exchange opens. | price_open = data.sel(field="open") |
| close | Daily close price. | price_close = data.sel(field="close") |
| high | Daily high price. | price_high = data.sel(field="high") |
| low | Daily low price. | price_low = data.sel(field="low") |
| vol | Daily number of traded shares. | volume_day = data.sel(field="vol") |
| divs | Dividends from shares. | dividends = data.sel(field="divs") |
| split | Indicates a stock split. Split = 2.0 means that on this day there was a split of shares 2 to 1. | split = data.sel(field="split") |
| split_cumprod | The product of split values from the very beginning. It can be used for restoring original prices. | split_cumprod = data.sel(field="split_cumprod") |
| is_liquid | Determines whether this stock is liquid enough for trading. A stock is liquid if it is part of the top 500 stocks. | is_liquid = data.sel(field="is_liquid") |
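
The relationship between split and split_cumprod can be checked in a few lines of plain Python. This is a minimal sketch on made-up values: the split list below is hypothetical, not Quantiacs data, and it assumes that days without a split carry the value 1.0.

```python
from itertools import accumulate
from operator import mul

# Hypothetical daily split values: 1.0 = no split (an assumption),
# 2.0 = 2-to-1 split, 3.0 = 3-to-1 split.
split = [1.0, 1.0, 2.0, 1.0, 3.0]

# split_cumprod is the running product of split values from the first day.
split_cumprod = list(accumulate(split, mul))

print(split_cumprod)
```

The running product changes only on days where a split occurs, so it records the total share multiplication up to each date.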

Frequently used functions

| Description | Code example |
|---|---|
| View a list of all tickers | data.asset.to_pandas().to_list() |
| See which fields are available | data.field.to_pandas().to_list() |
| Load specific tickers | data = qndata.stocks.load_spx_data(min_date="2005-06-01", assets=["NAS:AAPL", "NAS:AMZN"]) |
| Select specific tickers after loading all data | get_data_filter(data, ["NAS:AAPL", "NAS:AMZN"]) (helper defined below the table) |
| Load a list of NASDAQ-listed stocks | stocks_list = qndata.stocks.load_ndx_list(min_date='2006-01-01') |
| Load a list of available futures contracts | future_list = qndata.futures.load_list() |
| List of sectors | sectors = [x['sector'] for x in stocks_list] |
| Filter a list of asset IDs for the specified sector | assets_for_sector = [x['id'] for x in stocks_list if x['sector'] == "Energy"] |
| Load specific tickers for a sector | data = qndata.stocks.load_ndx_data(min_date="2005-06-01", assets=assets_for_sector) |

The get_data_filter helper used in the table:

def get_data_filter(data, assets):
    filtered = data.sel(asset=assets)
    return filtered
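
The sector expressions above operate on a plain list of dictionaries, so they can be tried without loading any data. The sketch below uses a hypothetical three-entry stocks_list: the IDs and sector names are placeholders, and only the 'id' and 'sector' keys are taken from the examples above. It also removes duplicates from the sector list, which the one-line list comprehension does not do.

```python
# Hypothetical stand-in for the list returned by a call such as
# qndata.stocks.load_ndx_list(); the entries are made up.
stocks_list = [
    {"id": "NAS:AAA", "sector": "Energy"},
    {"id": "NAS:BBB", "sector": "Technology"},
    {"id": "NAS:CCC", "sector": "Energy"},
]

# Unique sector names, sorted for stable output.
sectors = sorted({x["sector"] for x in stocks_list})

# Asset IDs that belong to one sector.
assets_for_sector = [x["id"] for x in stocks_list if x["sector"] == "Energy"]

print(sectors)
print(assets_for_sector)
```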

Xarray

Quantiacs datasets are xarray DataArray objects, i.e. labeled multi-dimensional arrays. DataArrays support arithmetic operations, mathematical functions, broadcasting, aggregations, slicing, combining, and custom functions.

import qnt.data as qndata

# Load data
stocks = qndata.stocks.load_spx_data(min_date="2005-06-01")
print(stocks.dims)

# Example output: ('field', 'time', 'asset')

The main data structure in xarray is the DataArray. It has the following properties:

  • values: a numpy.ndarray containing the array’s values.

  • dims: dimension names for each axis.

  • coords: a container of arrays (coordinates) that label each point.

  • attrs: a dictionary for storing arbitrary metadata (attributes).
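
The four properties can be inspected on a small synthetic array without loading any market data. The sketch below is an illustration only: the array contents, the asset names "A" and "B", and the attrs entry are made up, and the dimension order simply mirrors the ('field', 'time', 'asset') layout shown earlier.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a loaded dataset: 2 fields x 3 days x 2 assets.
times = pd.date_range("2021-01-01", periods=3, freq="D")
data = xr.DataArray(
    np.arange(12, dtype=float).reshape(2, 3, 2),
    dims=("field", "time", "asset"),
    coords={"field": ["open", "close"], "time": times, "asset": ["A", "B"]},
    attrs={"source": "synthetic example"},
)

print(type(data.values))  # the underlying numpy.ndarray
print(data.dims)          # dimension names, one per axis
print(list(data.coords))  # names of the coordinate arrays
print(data.attrs)         # free-form metadata

# Label-based selection of one field drops the 'field' dimension.
close = data.sel(field="close")
print(close.dims)
```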

import qnt.data as qndata
import numpy as np
import qnt.ta as qnta
import qnt.xr_talib as talib

# Load data
data = qndata.stocks.load_spx_data(min_date="2005-06-01")

# Select close prices
price_close = data.sel(field="close")

# Scale close prices by a constant factor
price_close_100 = price_close / 100.0

# Calculate log of close prices
log_price = np.log(price_close)

# Select close prices for a specific time range
slice_price = price_close.sel(time=slice("2006-01-01", "2006-12-31"))

# Calculate simple moving average (SMA) using qnt.ta
close_price_sma = qnta.sma(price_close, 2)

# Calculate SMA using qnt.xr_talib
close_price_sma_talib = talib.SMA(price_close, 2)
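
The same kinds of operations can be tried on synthetic prices using only numpy, pandas, and xarray. This is a sketch under stated assumptions: the prices and asset names are made up, and the moving average uses xarray's own rolling(...).mean(), which is shown as a plain-xarray alternative and is not claimed to match the qnt.ta or qnt.xr_talib implementations in every detail.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Made-up close prices with dims (time, asset).
times = pd.date_range("2021-01-01", periods=4, freq="D")
price_close = xr.DataArray(
    np.array([[1.0, 10.0], [2.0, 20.0], [4.0, 40.0], [8.0, 80.0]]),
    dims=("time", "asset"),
    coords={"time": times, "asset": ["A", "B"]},
)

# NumPy functions applied to a DataArray return a DataArray with the same labels.
log_price = np.log(price_close)

# Label-based time slicing includes both endpoints.
window = price_close.sel(time=slice("2021-01-02", "2021-01-03"))

# 2-day moving average along time; the first value is NaN (incomplete window).
sma2 = price_close.rolling(time=2).mean()

print(window.sizes["time"])
print(sma2.sel(asset="A").values)
```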

Pandas

You can also convert xarray DataArrays into pandas DataFrames, run computations with standard pandas methods, and convert the results back to xarray.
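
The round trip can be sketched on synthetic data first. The prices and asset names below are made up. The point of the example is the dimension order: unstack() followed by to_xarray() returns the dimensions as ('asset', 'time'), so a transpose is needed to get back to ('time', 'asset').

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic close prices with dims (time, asset); values are made up.
times = pd.date_range("2021-01-01", periods=3, freq="D")
close = xr.DataArray(
    np.array([[10.0, 20.0], [11.0, 18.0], [12.1, 18.0]]),
    dims=("time", "asset"),
    coords={"time": times, "asset": ["A", "B"]},
)

# xarray -> pandas: rows are labeled by time, columns by asset.
close_df = close.to_pandas()

# Any pandas computation can go here; daily returns serve as an example.
returns_df = close_df.pct_change()

# pandas -> xarray: unstack() yields a Series indexed by (asset, time),
# so the resulting DataArray has dims ('asset', 'time').
returns = returns_df.unstack().to_xarray()

# Restore the original dimension order.
returns = returns.transpose("time", "asset")
print(returns.dims)
```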

Example 1

import qnt.data as qntdata

# Load data
data = qntdata.stocks.load_spx_data(min_date="2005-06-01")

# Calculate percentage change of close prices
def get_price_pct_change(prices):
    prices_pandas = prices.to_pandas()
    # Read the asset list from the argument instead of the global `data`
    assets = prices.coords["asset"].values
    for asset in assets:
        prices_pandas[asset] = prices_pandas[asset].pct_change()
    return prices_pandas

prices = data.sel(field="close") * 1.0
prices_pct_change = get_price_pct_change(prices).unstack().to_xarray()

Example 2

import qnt.data as qntdata

# Load data
data = qntdata.stocks.load_spx_data(min_date="2005-06-01")

# Convert close prices to pandas DataFrame
close = data.sel(field="close").to_pandas()

# Calculate the 30-day rolling mean of the 10-day relative price change
close_sma = ((close - close.shift(10)) / close.shift(10)).rolling(30).mean()

# Normalize so that the absolute values sum to 1 on each day
norm = abs(close_sma).sum(axis=1)
weights = close_sma.div(norm, axis=0)

# Convert weights back to xarray DataArray
final_weights = weights.unstack().to_xarray()
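
The normalization step in Example 2 can be verified by hand on a tiny table. The sketch below uses made-up signal values for two hypothetical assets and checks that, after dividing each row by its sum of absolute values, the absolute weights on every day add up to 1.

```python
import pandas as pd

# Made-up signal values: rows are days, columns are assets.
signal = pd.DataFrame(
    {"A": [0.5, -1.0], "B": [1.5, 3.0]},
    index=pd.to_datetime(["2021-01-01", "2021-01-02"]),
)

# Sum of absolute signal values per day (one number per row).
norm = signal.abs().sum(axis=1)

# Divide each row by its own norm; axis=0 aligns norm with the row index.
weights = signal.div(norm, axis=0)

print(weights)
print(weights.abs().sum(axis=1))
```

Negative signals stay negative after the division, so the sign of each weight is preserved while the total absolute exposure per day is fixed at 1.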

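How to create a dataset yourself

The following example builds a one-asset DataArray by hand from a few sample rows, using the field names listed above. The result has the dimensions asset, field, and time.

import pandas as pd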
import xarray as xr
import numpy as np


class Fields:
    OPEN = "open"
    LOW = "low"
    HIGH = "high"
    CLOSE = "close"
    VOL = "vol"
    DIVS = "divs"
    SPLIT = "split"
    SPLIT_CUMPROD = "split_cumprod"
    IS_LIQUID = 'is_liquid'


class Dimensions:
    TIME = 'time'
    FIELD = 'field'
    ASSET = 'asset'


def get_base_df():
    sample_data = {"schema": {"fields": [{"name": "time", "type": "datetime"}, {"name": "open", "type": "integer"},
                                         {"name": "close", "type": "integer"}, {"name": "low", "type": "integer"},
                                         {"name": "high", "type": "integer"}, {"name": "vol", "type": "integer"},
                                         {"name": "divs", "type": "integer"}, {"name": "split", "type": "integer"},
                                         {"name": "split_cumprod", "type": "integer"},
                                         {"name": "is_liquid", "type": "integer"}], "primaryKey": ["time"],
                              "pandas_version": "0.20.0"}, "data": [
        {"time": "2021-01-30T00:00:00.000Z", "open": 1, "close": 2, "low": 1, "high": 2, "vol": 1000, "divs": 0,
         "split": 0, "split_cumprod": 0, "is_liquid": 1},
        {"time": "2021-01-31T00:00:00.000Z", "open": 2, "close": 3, "low": 2, "high": 3, "vol": 1000, "divs": 0,
         "split": 0, "split_cumprod": 0, "is_liquid": 1},
        {"time": "2021-02-01T00:00:00.000Z", "open": 3, "close": 4, "low": 3, "high": 4, "vol": 1000, "divs": 0,
         "split": 0, "split_cumprod": 0, "is_liquid": 1},
    ]}
    columns = [Dimensions.TIME, Fields.OPEN, Fields.CLOSE, Fields.LOW, Fields.HIGH, Fields.VOL, Fields.DIVS,
               Fields.SPLIT, Fields.SPLIT_CUMPROD, Fields.IS_LIQUID]
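    rows = sample_data['data']
    for r in rows:
        # Drop the trailing 'Z': numpy.datetime64 has no time zone support
        r['time'] = np.datetime64(r['time'].rstrip('Z'))
    df = pd.DataFrame(columns=columns, data=rows)
    df.set_index(Dimensions.TIME, inplace=True)
    # Stack the per-field columns into one array with a 'field' dimension
    prices_array = df.to_xarray().to_array(Dimensions.FIELD)
    prices_array.name = "BTC_TEST"
    # Add an 'asset' dimension that holds the single asset 'BTC'
    prices_array_r = xr.concat([prices_array], pd.Index(['BTC'], name='asset'))
    return prices_array_r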


data = get_base_df()
print(data)