statsmodels Library

Statistical modeling library for time series analysis, forecasting, and hypothesis testing. Essential for pairs trading (cointegration), ARIMA price forecasting, and market analysis.

Difficulty: Advanced

Category: Statistical Analysis

Installation

$ pip install statsmodels

Key Modules for Trading

ARIMA / SARIMAX

Time series forecasting with autoregressive integrated moving average models

Cointegration Tests

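Test whether two price series (for example, two currency pairs) share a stable long-run relationship, so that their spread is stationary. This is the statistical basis for pairs trading.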

Stationarity Tests (ADF)

Test whether a price series has a unit root (trending, random-walk-like) or is stationary (mean-reverting)

OLS Regression

Linear regression for factor analysis and beta calculations

GARCH Models

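Volatility modeling for risk management and option pricing. Note that statsmodels itself does not provide a supported GARCH estimator; GARCH models are usually fitted with the separate arch package (pip install arch).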

Granger Causality

Test whether past values of one time series help predict another

Code Examples

Installation

Install statsmodels

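Shell

pip install statsmodels

# The examples below also download market data with yfinance
pip install yfinance

# Verify the installed version
python -c "import statsmodels; print(statsmodels.__version__)"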

Stationarity Test (ADF)

Test if price data is stationary — critical for choosing the right strategy

Python

from statsmodels.tsa.stattools import adfuller
import yfinance as yf

df = yf.download("EURUSD=X", period="1y")
# dropna() guards against missing quotes, which adfuller cannot handle
close = df['Close'].dropna().values.flatten()

# Augmented Dickey-Fuller test (null hypothesis: the series has a unit root)
result = adfuller(close)
print(f"ADF Statistic: {result[0]:.4f}")
print(f"p-value: {result[1]:.4f}")
print(f"Used lags: {result[2]}")

if result[1] < 0.05:
    print("Result: STATIONARY (unit root rejected), mean-reversion strategies may work")
else:
    print("Result: NON-STATIONARY (unit root not rejected), trend-following strategies preferred")

# Test on returns instead (usually stationary)
returns = close[1:] / close[:-1] - 1
result_returns = adfuller(returns)
print(f"\nReturns ADF p-value: {result_returns[1]:.4f}")
print(f"Returns are {'stationary' if result_returns[1] < 0.05 else 'non-stationary'}")

Price Forecasting with ARIMA

Forecast the next few daily returns with an ARIMA model fitted to log returns

Python

from statsmodels.tsa.arima.model import ARIMA
import yfinance as yf
import numpy as np

df = yf.download("EURUSD=X", period="1y")
close = df['Close'].dropna().values.flatten()

# Fit the ARIMA model on log returns (stationary), so d = 0
returns = np.diff(np.log(close))
model = ARIMA(returns, order=(5, 0, 2))  # (p, d, q)
fitted = model.fit()

# Print the coefficient table from the model summary
print(fitted.summary().tables[1])

# Forecast returns for the next 5 days
forecast = fitted.forecast(steps=5)
print("\nForecasted returns (next 5 days):")
for i, f in enumerate(forecast):
    direction = "UP" if f > 0 else "DOWN"
    print(f"  Day {i+1}: {f:.6f} ({direction})")

Cointegration Test for Pairs Trading

Find pairs of currencies that move together

Python

from statsmodels.tsa.stattools import coint
from statsmodels.regression.linear_model import OLS
from statsmodels.tools import add_constant
import yfinance as yf
import pandas as pd

# Download two correlated pairs (squeeze() turns a one-column frame into a Series)
eurusd = yf.download("EURUSD=X", period="1y")['Close'].squeeze()
gbpusd = yf.download("GBPUSD=X", period="1y")['Close'].squeeze()

# Align on dates (trimming to a common length can pair up different days)
prices = pd.concat([eurusd, gbpusd], axis=1, keys=['EURUSD', 'GBPUSD']).dropna()
eurusd = prices['EURUSD'].values
gbpusd = prices['GBPUSD'].values

# Engle-Granger test; GBPUSD is the dependent series, matching the regression below
score, pvalue, _ = coint(gbpusd, eurusd)
print(f"Cointegration Score: {score:.4f}")
print(f"p-value: {pvalue:.4f}")

if pvalue < 0.05:
    print("COINTEGRATED: pairs trading viable!")

    # Estimate the hedge ratio by regressing GBPUSD on EURUSD
    X = add_constant(eurusd)
    model = OLS(gbpusd, X).fit()
    hedge_ratio = model.params[1]

    # Calculate spread
    spread = gbpusd - hedge_ratio * eurusd
    print(f"Hedge Ratio: {hedge_ratio:.4f}")
    print(f"Current Spread: {spread[-1]:.5f}")
    print(f"Spread Mean: {spread.mean():.5f}")
    print(f"Z-Score: {(spread[-1] - spread.mean()) / spread.std():.2f}")
else:
    print("NOT cointegrated: pairs trading not recommended")

Granger Causality Test

Test if one pair can predict another

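Python

from statsmodels.tsa.stattools import grangercausalitytests
import yfinance as yf
import pandas as pd

# Download data (squeeze() turns a one-column frame into a Series)
eurusd = yf.download("EURUSD=X", period="1y")['Close'].squeeze()
usdjpy = yf.download("USDJPY=X", period="1y")['Close'].squeeze()

# Align on dates
combined = pd.DataFrame({
    'EURUSD': eurusd,
    'USDJPY': usdjpy
}).dropna()

# Use returns (stationary)
combined = combined.pct_change().dropna()

print("Testing: Does USDJPY Granger-cause EURUSD?")
print("=" * 50)

# The test asks whether the second column helps predict the first.
# The verbose argument is omitted because recent statsmodels releases deprecate it.
result = grangercausalitytests(combined[['EURUSD', 'USDJPY']], maxlag=5)

# Summarize the F-test p-value at each lag
for lag, (tests, _) in result.items():
    print(f"Lag {lag}: p-value = {tests['ssr_ftest'][1]:.4f}")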

Factor Regression Analysis

Analyze how different factors affect currency returns

Python

from statsmodels.regression.linear_model import OLS
from statsmodels.tools import add_constant
import yfinance as yf
import pandas as pd

# Download the target and the factor series (squeeze() yields a Series)
eurusd = yf.download("EURUSD=X", period="1y")['Close'].squeeze()
dxy = yf.download("DX-Y.NYB", period="1y")['Close'].squeeze()
gold = yf.download("GC=F", period="1y")['Close'].squeeze()

# Combine on common dates, then convert prices to daily returns
factors = pd.DataFrame({
    'EURUSD': eurusd,
    'DXY': dxy,
    'GOLD': gold
}).dropna().pct_change().dropna()

# Run regression: EURUSD = α + β1*DXY + β2*GOLD + ε
X = add_constant(factors[['DXY', 'GOLD']])
model = OLS(factors['EURUSD'], X).fit()

print(model.summary())
print(f"\nR-squared: {model.rsquared:.4f}")
print(f"DXY Beta: {model.params['DXY']:.4f}")
print(f"GOLD Beta: {model.params['GOLD']:.4f}")

Best Practices

Test Stationarity First

Always run an ADF test before fitting ARIMA: most price series are non-stationary and need differencing (d of at least 1) or conversion to returns

Use Returns, Not Prices

Work with log returns or percentage returns, which are far more likely to be stationary than raw prices

ARIMA Limitations

ARIMA forecast uncertainty grows quickly with the horizon, so rely on these forecasts only a few periods ahead

Cointegration Changes

Cointegration relationships can break down, so retest regularly on recent data

Next Steps

SciPy - Signal Processing

scikit-learn - ML Models