PipsGrowth

statsmodels Library

Statistical modeling library for time series analysis, forecasting, and hypothesis testing. Essential for pairs trading (cointegration), ARIMA price forecasting, and market analysis.

Difficulty: Advanced
Category: Statistical Analysis

Installation

$ pip install statsmodels

Key Modules for Trading

ARIMA / SARIMAX

Time series forecasting with autoregressive integrated moving average models

Cointegration Tests

Test whether two currency pairs share a stable long-run relationship, so that the spread between them is stationary. Essential for pairs trading

Stationarity Tests (ADF)

Test whether a price series has a unit root (non-stationary, trending) or is stationary (mean-reverting)

OLS Regression

Linear regression for factor analysis and beta calculations

GARCH Models

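Volatility modeling for risk management and option pricing. statsmodels does not ship a GARCH implementation; the separate arch package is the usual companion for this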

Granger Causality

Test whether past values of one time series help predict another

Code Examples

Installation

Install statsmodels

Shell
pip install statsmodels
# Verify
python -c "import statsmodels; print(statsmodels.__version__)"

Stationarity Test (ADF)

Test if price data is stationary — critical for choosing the right strategy

Python
from statsmodels.tsa.stattools import adfuller
import yfinance as yf

df = yf.download("EURUSD=X", period="1y")
# Drop missing rows first: adfuller cannot handle NaN values
close = df['Close'].dropna().values.flatten()

# Augmented Dickey-Fuller test (null hypothesis: the series has a unit root)
result = adfuller(close)
print(f"ADF Statistic: {result[0]:.4f}")
print(f"p-value: {result[1]:.4f}")
print(f"Used lags: {result[2]}")
if result[1] < 0.05:
    print("Result: STATIONARY - mean-reversion strategies may work")
else:
    print("Result: NON-STATIONARY - trend-following strategies preferred")

# Test on simple returns instead (usually stationary)
returns = close[1:] / close[:-1] - 1
result_returns = adfuller(returns)
print(f"\nReturns ADF p-value: {result_returns[1]:.4f}")
print(f"Returns are {'stationary' if result_returns[1] < 0.05 else 'non-stationary'}")

Price Forecasting with ARIMA

Forecast next-period log returns with an ARIMA model fitted to the return series (prices themselves are usually non-stationary)

Python
from statsmodels.tsa.arima.model import ARIMA
import yfinance as yf
import numpy as np

df = yf.download("EURUSD=X", period="1y")
close = df['Close'].dropna().values.flatten()

# Fit the model on log returns (stationary), so no differencing is needed (d=0)
returns = np.diff(np.log(close))  # Log returns
model = ARIMA(returns, order=(5, 0, 2))  # (p, d, q)
fitted = model.fit()

# Print the coefficient table from the model summary
print(fitted.summary().tables[1])

# Forecast the next 5 daily log returns
forecast = fitted.forecast(steps=5)
print("\nForecasted returns (next 5 days):")
for i, f in enumerate(forecast):
    direction = "UP" if f > 0 else "DOWN"
    print(f"  Day {i+1}: {f:.6f} ({direction})")
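
Because the model above forecasts log returns rather than prices, a price path has to be rebuilt by compounding those returns onto the last observed close. The sketch below shows that step in isolation. The function name `returns_to_prices` and the sample numbers are hypothetical placeholders, not real forecasts.

```python
import numpy as np

def returns_to_prices(last_price, log_returns):
    """Compound a sequence of log returns onto the last observed price."""
    return last_price * np.exp(np.cumsum(log_returns))

# Illustrative numbers only, not output from a fitted model
example_forecast = np.array([0.001, -0.002, 0.0005])
path = returns_to_prices(1.1000, example_forecast)
print(np.round(path, 5))
```

With the ARIMA example, the call would be `returns_to_prices(close[-1], forecast)`.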

Cointegration Test for Pairs Trading

Find currency pairs whose spread is stationary, so that deviations from it tend to revert

Python
from statsmodels.tsa.stattools import coint
from statsmodels.regression.linear_model import OLS
from statsmodels.tools import add_constant
import yfinance as yf
import pandas as pd

# Download two correlated pairs; squeeze() turns a single-column frame into a Series
eurusd = yf.download("EURUSD=X", period="1y")['Close'].squeeze()
gbpusd = yf.download("GBPUSD=X", period="1y")['Close'].squeeze()

# Align on dates (not just on length) and drop days missing in either series
prices = pd.DataFrame({'EURUSD': eurusd, 'GBPUSD': gbpusd}).dropna()
eurusd = prices['EURUSD'].values
gbpusd = prices['GBPUSD'].values

# Engle-Granger cointegration test; GBPUSD is the dependent series,
# matching the hedge-ratio regression below
score, pvalue, _ = coint(gbpusd, eurusd)
print(f"Cointegration Score: {score:.4f}")
print(f"p-value: {pvalue:.4f}")
if pvalue < 0.05:
    print("COINTEGRATED: pairs trading may be viable")
    # Estimate the hedge ratio by regressing GBPUSD on EURUSD
    X = add_constant(eurusd)
    model = OLS(gbpusd, X).fit()
    hedge_ratio = model.params[1]
    spread = gbpusd - hedge_ratio * eurusd
    print(f"Hedge Ratio: {hedge_ratio:.4f}")
    print(f"Current Spread: {spread[-1]:.5f}")
    print(f"Spread Mean: {spread.mean():.5f}")
    print(f"Z-Score: {(spread[-1] - spread.mean()) / spread.std():.2f}")
else:
    print("NOT cointegrated: pairs trading not recommended")
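
The z-score above uses the mean and standard deviation of the whole sample, which includes data that would not have been known at each earlier point in time. One possible variant computes the z-score against a trailing window instead. This is a sketch only: the helper name `rolling_zscore`, the 60-bar window, and the random placeholder spread are all assumptions.

```python
import numpy as np
import pandas as pd

def rolling_zscore(spread, window=60):
    """Z-score of each point against the trailing window ending at that point."""
    s = pd.Series(spread, dtype=float)
    mean = s.rolling(window).mean()
    std = s.rolling(window).std()      # sample standard deviation (ddof=1)
    return (s - mean) / std

rng = np.random.default_rng(0)
demo_spread = rng.normal(0.0, 0.01, 250)   # placeholder data, not a real spread
z = rolling_zscore(demo_spread, window=60)
print(z.dropna().tail())
```

The first `window - 1` values are NaN because the window is not yet full.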

Granger Causality Test

Test if one pair can predict another

Python
from statsmodels.tsa.stattools import grangercausalitytests
import yfinance as yf
import pandas as pd

# Download data; squeeze() turns a single-column frame into a Series
eurusd = yf.download("EURUSD=X", period="1y")['Close'].squeeze()
usdjpy = yf.download("USDJPY=X", period="1y")['Close'].squeeze()

# Align on dates
combined = pd.DataFrame({
    'EURUSD': eurusd,
    'USDJPY': usdjpy
}).dropna()

# Use returns (stationary)
combined = combined.pct_change().dropna()
print("Testing: Does USDJPY Granger-cause EURUSD?")
print("=" * 50)
# The second column is tested as a predictor of the first
result = grangercausalitytests(
    combined[['EURUSD', 'USDJPY']],
    maxlag=5
)
# Summarize the F-test p-value for each lag
for lag, (tests, _) in result.items():
    print(f"lag {lag}: F-test p-value = {tests['ssr_ftest'][1]:.4f}")

Factor Regression Analysis

Analyze how different factors affect currency returns

Python
from statsmodels.regression.linear_model import OLS
from statsmodels.tools import add_constant
import yfinance as yf
import pandas as pd

# Download the target and the factors; squeeze() yields a Series for a single ticker
eurusd = yf.download("EURUSD=X", period="1y")['Close'].squeeze()
dxy = yf.download("DX-Y.NYB", period="1y")['Close'].squeeze()
gold = yf.download("GC=F", period="1y")['Close'].squeeze()

# Align prices on common dates first, then compute returns
prices = pd.DataFrame({
    'EURUSD': eurusd,
    'DXY': dxy,
    'GOLD': gold
}).dropna()
factors = prices.pct_change().dropna()

# Run regression: EURUSD = α + β1*DXY + β2*GOLD + ε
X = add_constant(factors[['DXY', 'GOLD']])
model = OLS(factors['EURUSD'], X).fit()
print(model.summary())
print(f"\nR-squared: {model.rsquared:.4f}")
print(f"DXY Beta: {model.params['DXY']:.4f}")
print(f"GOLD Beta: {model.params['GOLD']:.4f}")

Best Practices

Test Stationarity First

Run an ADF test before fitting ARIMA. Most raw price series are non-stationary, so they need differencing or conversion to returns first

Use Returns, Not Prices

Work with log returns or percentage returns which are more likely stationary

ARIMA Limitations

ARIMA forecasts are unreliable beyond a few periods — use for short-term only

Cointegration Changes

Cointegration relationships can break down — retest regularly
