
Futures - BLS Macro Data

This template uses data from the Bureau of Labor Statistics for trading futures contracts.

You can clone and edit this example on the Quantiacs platform (Examples tab).


The U.S. Bureau of Labor Statistics is the principal fact-finding agency of the U.S. government in the field of labor economics and statistics. It provides macroeconomic data in several interesting categories: prices, employment and unemployment, compensation and working conditions, and productivity.

Quantiacs has implemented these datasets on its cloud and also makes them available for local use on your machine.

In this template we show how to use BLS data to create a trading algorithm.

Need help? Check the Documentation and find solutions/report problems in the Forum section.

More help with Jupyter? Check the official Jupyter page.

Check the BLS documentation on the Quantiacs macroeconomics help page.

Once you are done, click on Submit to the contest and take part in our competitions.

API reference:

  • data: check how to work with data;

  • backtesting: read how to run the simulation and check the results.

Need to use the optimizer function to automate tedious tasks?

  • optimization: read more in our article.
In [1]:
import pandas as pd
import numpy as np

import qnt.data as qndata
In [2]:
%%javascript
window.IPython && (IPython.OutputArea.prototype._should_scroll = function(lines) { return false; })
// disable widget scrolling

First of all, we list the 35 available datasets and inspect them:

In [3]:
dbs = qndata.blsgov.load_db_list()

display(pd.DataFrame(dbs)) # convert to pandas for better formatting
100% (3935 of 3935) |####################| Elapsed Time: 0:00:00 Time:  0:00:00
id modified name
0 EN 2019-10-15T11:01:00 Quarterly Census of Employment and Wages
1 CS 2022-02-15T12:49:00 Nonfatal cases involving days away from work: ...
2 OE 2022-03-31T10:09:00 Occupational Employment Statistics
3 FM 2022-04-20T10:04:00 Marital and family labor force statistics from...
4 TU 2022-06-23T11:29:00 American Time Use
5 EP 2022-09-08T10:03:00 Employment Projections by Industry
6 NB 2022-09-22T10:02:00 National Compensation Survey-Benefits
7 CX 2022-10-25T12:00:00 Consumer Expenditure Survey
8 IS 2022-11-09T10:00:00 Occupational injuries and illnesses industry data
9 OR 2022-11-17T10:00:00 Occupational Requirements
10 MP 2022-11-18T10:00:00 Major Sector Multifactor Productivity
11 WM 2022-12-08T10:00:00 Wage Modeling
12 CM 2022-12-15T10:00:00 Employer Costs for Employee Compensation
13 FW 2022-12-16T10:00:00 Census of Fatal Occupational Injuries (2011 fo...
14 IP 2023-01-05T10:00:00 Industry Productivity
15 WS 2023-01-10T10:00:00 Work Stoppage Data
16 AP 2023-01-12T08:30:00 Consumer Price Index - Average Price Data
17 CU 2023-01-12T08:30:00 Consumer Price Index - All Urban Consumers
18 CW 2023-01-12T08:30:00 Consumer Price Index - Urban Wage Earners and ...
19 SU 2023-01-12T08:30:00 Consumer Price Index - Chained Consumer Price ...
20 EI 2023-01-13T08:30:00 Import/Export Price Indexes
21 ND 2023-01-18T08:30:00 Producer Price Index Industry Data
22 PC 2023-01-18T08:30:00 Producer Price Index Industry Data
23 WD 2023-01-18T08:30:00 Producer Price Index Commodity-Discontinued Se...
24 WP 2023-01-18T08:30:00 Producer Price Index-Commodities
25 LE 2023-01-19T10:00:00 Weekly and hourly earnings data from the Curre...
26 LU 2023-01-19T10:00:00 Union affiliation data from the Current Popula...
27 SM 2023-01-24T10:00:00 State and Area Employment, Hours, and Earnings
28 BD 2023-01-25T10:00:00 Business Employment Dynamics
29 CI 2023-01-31T08:30:00 Employment Cost Index
30 JT 2023-02-01T10:00:00 Job Openings and Labor Turnover Survey
31 LA 2023-02-01T10:00:00 Local Area Unemployment Statistics
32 PR 2023-02-02T08:30:00 Major Sector Productivity and Costs
33 CE 2023-02-03T08:30:00 Employment, Hours, and Earnings from the Curre...
34 LN 2023-02-03T08:30:00 Labor Force Statistics from the Current Popula...

For each dataset you can see the identifier, the name and the date of the last available update. Each dataset contains several time series which can be used as indicators.
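Since dbs is a plain list of dictionaries with id, modified and name fields (as the table above shows), it can be filtered directly in Python. A minimal sketch listing the five most recently updated databases:

# ISO timestamps sort chronologically, so we can sort the 'modified' strings directly
recent = sorted(dbs, key=lambda d: d['modified'], reverse=True)
for d in recent[:5]:
    print(d['id'], d['modified'], d['name'])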

In this example we use AP: average consumer prices are calculated for household fuel, motor fuel and food items from prices collected for the Consumer Price Index (CPI). The full description is available in the metadata.

Let us load and display the time series contained in the AP dataset:

In [4]:
series_list = list(qndata.blsgov.load_series_list('AP'))

display(pd.DataFrame(series_list).set_index('id')) # convert to pandas for better formatting
100% (478963 of 478963) |################| Elapsed Time: 0:00:00 Time:  0:00:00
100% (2 of 2) |##########################| Elapsed Time: 0:00:00 Time:  0:00:00
area_code item_code series_title footnote_codes begin_year begin_period end_year end_period
id
APU0000701111 0000 701111 Flour, white, all purpose, per lb. (453.6 gm) ... 1980 M01 2022 M12
APU0000701311 0000 701311 Rice, white, long grain, precooked (cost per p... 1980 M01 1981 M12
APU0000701312 0000 701312 Rice, white, long grain, uncooked, per lb. (45... 1980 M01 2022 M12
APU0000701321 0000 701321 Spaghetti (cost per pound/453.6 grams) in U.S.... 1980 M01 1981 M03
APU0000701322 0000 701322 Spaghetti and macaroni, per lb. (453.6 gm) in ... 1984 M01 2022 M12
... ... ... ... ... ... ... ... ...
APUS49G74713 S49G 74713 Gasoline, leaded premium (cost per gallon/3.8 ... 1978 M01 1981 M04
APUS49G74714 S49G 74714 Gasoline, unleaded regular, per gallon/3.785 l... 1978 M01 2022 M12
APUS49G74715 S49G 74715 Gasoline, unleaded midgrade, per gallon/3.785 ... 2021 M06 2022 M12
APUS49G74716 S49G 74716 Gasoline, unleaded premium, per gallon/3.785 l... 1981 M09 2022 M12
APUS49G7471A S49G 7471A Gasoline, all types, per gallon/3.785 liters i... 1978 M01 2022 M12

1482 rows × 8 columns

As you see, the AP Average Price Data dataset contains 1482 time series.

Let us see how we can learn the meaning of the 8 columns. Some of them are obvious, like series_title, begin_year or end_year, but others, like area_code, item_code, begin_period and end_period, are not.

Inspect the metadata

The Quantiacs toolbox allows you to inspect the meaning of all fields:

In [5]:
meta = qndata.blsgov.load_db_meta('AP')

for k in meta.keys():
    print('### ' + k + " ###")
    m = meta[k]
    
    if isinstance(m, str):
        # Show only the first line if this is a text entry.
        print(m.split('\n')[0])
        print('...')
        # Uncomment the next line to see the full text. It will give you more details about the database.
        # print(m) 

    if isinstance(m, dict):
        # convert dictionaries to pandas DataFrame for better formatting:
        df = pd.DataFrame(meta[k].values())
        df = df.set_index(np.array(list(meta[k].keys())))
        display(df)
100% (26925 of 26925) |##################| Elapsed Time: 0:00:00 Time:  0:00:00
### area ###
0
0000 U.S. city average
0100 Northeast
0110 New England
0120 Middle Atlantic
0200 Midwest
... ...
S49C Riverside-San Bernardino-Ontario, CA
S49D Seattle-Tacoma-Bellevue WA
S49E San Diego-Carlsbad, CA
S49F Urban Hawaii
S49G Urban Alaska

74 rows × 1 columns

### footnote ###
0
footnote_code footnote_text
### item ###
0
701111 Flour, white, all purpose, per lb. (453.6 gm)
701311 Rice, white, long grain, precooked (cost per p...
701312 Rice, white, long grain, uncooked, per lb. (45...
701321 Spaghetti (cost per pound/453.6 grams)
701322 Spaghetti and macaroni, per lb. (453.6 gm)
... ...
FJ4101 Yogurt, per 8 oz. (226.8 gm)
FL2101 Lettuce, romaine, per lb. (453.6 gm)
FN1101 All soft drinks, per 2 liters (67.6 oz)
FN1102 All soft drinks, 12 pk, 12 oz., cans, per 12 o...
FS1101 Butter, stick, per lb. (453.6 gm)

160 rows × 1 columns

### period ###
period period_abbr period_name
M01 M01 JAN January
M02 M02 FEB February
M03 M03 MAR March
M04 M04 APR April
M05 M05 MAY May
M06 M06 JUN June
M07 M07 JUL July
M08 M08 AUG August
M09 M09 SEP September
M10 M10 OCT October
M11 M11 NOV November
M12 M12 DEC December
M13 M13 AN AV Annual Average
### seasonal ###
0
S Seasonally Adjusted
U Not Seasonally Adjusted
### contacts ###
Consumer Price Indexes Contacts
...
### txt ###
				Average Price Data (AP)
...

These tables allow you to quickly understand the meaning of the fields for each time series in the Average Price Data.

The area_code column reflects the U.S. area connected to the time series, for example 0000 for the entire U.S.
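Since the area table above shows plain code-to-name pairs, meta['area'] can presumably be used for direct lookups as well; a minimal sketch, assuming the values are plain strings:

# look up the name of an area code directly in the metadata dictionary
print(meta['area']['0000'])  # expected output: U.S. city average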

Let us select only time series related to the entire U.S.:

In [6]:
us_series_list = [s for s in series_list if s['area_code'] == '0000']

display(pd.DataFrame(us_series_list).set_index('id')) # convert to pandas for better formatting
area_code item_code series_title footnote_codes begin_year begin_period end_year end_period
id
APU0000701111 0000 701111 Flour, white, all purpose, per lb. (453.6 gm) ... 1980 M01 2022 M12
APU0000701311 0000 701311 Rice, white, long grain, precooked (cost per p... 1980 M01 1981 M12
APU0000701312 0000 701312 Rice, white, long grain, uncooked, per lb. (45... 1980 M01 2022 M12
APU0000701321 0000 701321 Spaghetti (cost per pound/453.6 grams) in U.S.... 1980 M01 1981 M03
APU0000701322 0000 701322 Spaghetti and macaroni, per lb. (453.6 gm) in ... 1984 M01 2022 M12
... ... ... ... ... ... ... ... ...
APU0000FJ4101 0000 FJ4101 Yogurt, per 8 oz. (226.8 gm) in U.S. city aver... 2018 M04 2022 M12
APU0000FL2101 0000 FL2101 Lettuce, romaine, per lb. (453.6 gm) in U.S. c... 2006 M01 2022 M12
APU0000FN1101 0000 FN1101 All soft drinks, per 2 liters (67.6 oz) in U.S... 2018 M04 2022 M12
APU0000FN1102 0000 FN1102 All soft drinks, 12 pk, 12 oz., cans, per 12 o... 2018 M04 2022 M12
APU0000FS1101 0000 FS1101 Butter, stick, per lb. (453.6 gm) in U.S. city... 2018 M04 2022 M12

160 rows × 8 columns

We have 160 time series out of the original 1482. These are U.S.-wide time series, which are more relevant for forecasting global financial markets. Let us select the time series which are currently being updated and have at least 20 years of history:

In [7]:
# series whose end_year is the latest available year (2022) count as currently updated
actual_us_series_list = [s for s in us_series_list if s['begin_year'] <= '2000' and s['end_year'] == '2022']

display(pd.DataFrame(actual_us_series_list).set_index('id')) # convert to pandas for better formatting
area_code item_code series_title footnote_codes begin_year begin_period end_year end_period
id
APU0000701111 0000 701111 Flour, white, all purpose, per lb. (453.6 gm) ... 1980 M01 2022 M12
APU0000701312 0000 701312 Rice, white, long grain, uncooked, per lb. (45... 1980 M01 2022 M12
APU0000701322 0000 701322 Spaghetti and macaroni, per lb. (453.6 gm) in ... 1984 M01 2022 M12
... ... ... ... ... ... ... ...

55 rows × 8 columns
In [8]:
len(actual_us_series_list)
Out[8]:
55

We have 55 time series whose history is long enough for our purposes. Now we can load one of these series and use it for our strategy. Let us focus on energy markets and consider the fuel oil series APU000072511 on a monthly basis:

In [9]:
series_data = qndata.blsgov.load_series_data('APU000072511', tail = 30*365)

# convert to pandas.DataFrame
series_data = pd.DataFrame(series_data)
series_data = series_data.set_index('pub_date')

# remove yearly average data, see period dictionary
series_data = series_data[series_data['period'] != 'M13']

series_data
100% (38265 of 38265) |##################| Elapsed Time: 0:00:00 Time:  0:00:00
Out[9]:
year period footnote_codes value
pub_date
1994-05-14 1994 M04 [] 0.935
1994-06-14 1994 M05 [] 0.919
1994-07-14 1994 M06 [] 0.906
1994-08-14 1994 M07 [] 0.898
1994-09-14 1994 M08 [] 0.894
... ... ... ... ...
2022-09-14 2022 M08 [] 4.953
2022-10-14 2022 M09 [] 4.815
2022-11-14 2022 M10 [] 5.786
2022-12-14 2022 M11 [] 5.240
2023-01-14 2022 M12 [] 4.344

345 rows × 4 columns
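Note that the table is indexed by the publication date pub_date, not by the reference month encoded in the year and period columns: the April 1994 value (period M04), for example, becomes available only on 1994-05-14. Indexing by publication date avoids lookahead bias in the backtest. A minimal sketch measuring the publication lag, assuming the year and period columns parse as shown above:

# reconstruct the reference month from 'year' and 'period' ('M04' -> month 04)
ref_month = pd.to_datetime(series_data['year'].astype(str) + series_data['period'].str[1:], format='%Y%m')

# compare the publication dates with the start of the reference months
lag = pd.to_datetime(series_data.index) - pd.DatetimeIndex(ref_month)
print(lag.mean())  # roughly a month and a half, judging from the rows above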

Next, let us consider Futures contracts in the Energy sector:

In [10]:
futures_list = qndata.futures_load_list()

energy_futures_list = [f for f in futures_list if f['sector'] == 'Energy']

pd.DataFrame(energy_futures_list)
100% (7168 of 7168) |####################| Elapsed Time: 0:00:00 Time:  0:00:00
Out[10]:
id name sector point_value
0 F_BC Crude Oil Brent Energy $1,000
1 F_BG Gasoil Low Sulphur Energy $100
2 F_HO Heating Oil Energy $42,000
3 F_NG UK Natural Gas Energy GBP 1,000
4 F_RB JPX Gasoline Energy JPY 50
5 F_CL United States Oil Fund Energy 1

We consider Brent Crude Oil, F_BC, and define a strategy using a multi-pass approach:

In [11]:
import xarray as xr
import numpy as np
import pandas as pd

import qnt.ta as qnta
import qnt.backtester as qnbt
import qnt.data as qndata


def load_data(period):
    
    futures = qndata.futures_load_data(assets=['F_BC'], tail=period, dims=('time','field','asset'))
    
    ap = qndata.blsgov.load_series_data('APU000072511', tail=period)
    
    # convert to pandas.DataFrame
    ap = pd.DataFrame(ap) 
    ap = ap.set_index('pub_date') 

    # remove yearly average data, see period dictionary
    ap = ap[ap['period'] != 'M13']
    
    # convert to xarray
    ap = ap['value'].to_xarray().rename(pub_date='time').assign_coords(time=pd.to_datetime(ap.index.values))
    
    # return both time series
    return dict(ap=ap, futures=futures), futures.time.values


def window(data, max_date: np.datetime64, lookback_period: int):
    # the window function isolates the data needed for one iteration
    # of the backtester call
    
    min_date = max_date - np.timedelta64(lookback_period, 'D')
    
    return dict(
        futures = data['futures'].sel(time=slice(min_date, max_date)),
        ap = data['ap'].sel(time=slice(min_date, max_date))
    )


def strategy(data, state):
    
    close = data['futures'].sel(field='close')
    ap = data['ap']
    
    # the strategy complements indicators based on the Futures price with macro data
    # and goes long/short or takes no exposure:
    
    if ap.isel(time=-1) > ap.isel(time=-2) \
            and close.isel(time=-1) > close.isel(time=-20):
        return xr.ones_like(close.isel(time=-1)), 1
    
    elif ap.isel(time=-1) < ap.isel(time=-2) \
            and ap.isel(time=-2) < ap.isel(time=-3) \
            and ap.isel(time=-3) < ap.isel(time=-4) \
            and close.isel(time=-1) < close.isel(time=-40):
        return -xr.ones_like(close.isel(time=-1)), 1 
    
    # When the state is None, we are at the beginning and no weights have been generated yet.
    # We use buy'n'hold to fill these first days.
    elif state is None: 
        return xr.ones_like(close.isel(time=-1)), None
    
    else:
        return xr.zeros_like(close.isel(time=-1)), 1


weights, state = qnbt.backtest(
    competition_type='futures',
    load_data=load_data,
    window=window,
    lookback_period=365,
    start_date="2006-01-01",
    strategy=strategy,
    analyze=True,
    build_plots=True
)
Run last pass...
Load data...
100% (35558112 of 35558112) |############| Elapsed Time: 0:00:00 Time:  0:00:00
100% (2 of 2) |##########################| Elapsed Time: 0:00:00 Time:  0:00:00
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-11-dfba7cfcdb6d> in <module>
     75     strategy=strategy,
     76     analyze=True,
---> 77     build_plots=True
     78 )

/usr/local/lib/python3.7/site-packages/qnt/backtester.py in backtest(competition_type, strategy, load_data, lookback_period, test_period, start_date, end_date, window, step, analyze, build_plots, collect_all_states)
    273     log_info("Run last pass...")
    274     log_info("Load data...")
--> 275     data = load_data(lookback_period)
    276     try:
    277         if data.name == 'stocks' and competition_type != 'stocks' and competition_type != 'stocks_long'\

<ipython-input-11-dfba7cfcdb6d> in load_data(period)
     16     # convert to pandas.DataFrame
     17     ap = pd.DataFrame(ap)
---> 18     ap = ap.set_index('pub_date')
     19 
     20     # remove yearly average data, see period dictionary

/usr/local/lib/python3.7/site-packages/pandas/core/frame.py in set_index(self, keys, drop, append, inplace, verify_integrity)
   4725 
   4726         if missing:
-> 4727             raise KeyError(f"None of {missing} are in the columns")
   4728 
   4729         if inplace:

KeyError: "None of ['pub_date'] are in the columns"