Detecting Trend Changes in Time-Series Data: A Frequentist and Parametric Approach#
Introduction#
Detecting trend changes in time-series can offer valuable insights for various applications. There are multiple ways to approach this problem, and each method comes with its own set of assumptions and intricacies.
This post aims to explore one such method—employing frequentist and parametric techniques to identify change-points in time-series data. While there are numerous non-parametric and Bayesian methods available, the focus here is on methods that utilize linear regression as a foundation.
We’ll walk you through two different model specifications: one allowing for changes in both the intercept and the slope, and another that focuses solely on the slope. By the end of this article, you will have learned how to construct hypothesis tests to assess the presence of structural changes and how to carry out these tests using Ordinary Least Squares (OLS) estimators.
Prerequisite knowledge in linear regression will be helpful to get the most out of this post.
Now, let’s delve into the specifics and learn how to identify those crucial turning points in your time-series data.
Illustration of different types of trend changes (source: [Zuo et al., 2020])
How to detect trend changes in time series#
Consider a time-series $(y_t)_{t=1,\dots,T}$ in which we want to test whether the trend changes at a known candidate date $\tau$, with $1 < \tau < T$.
Step 1: Define your linear model: unknown parameters + design matrix#
Model 1 (Structural change in both intercept and slope)#
This specification allows for a simultaneous change in the intercept and slope at time $\tau$:

$$y_t = \beta_1 + \beta_2\, t + \beta_3\, \mathbf{1}_{\{t > \tau\}} + \beta_4\, (t - \tau)\, \mathbf{1}_{\{t > \tau\}} + u_t, \qquad t = 1, \dots, T$$
Unknown Parameters#

$$\beta = (\beta_1, \beta_2, \beta_3, \beta_4)^\top$$

where $\beta_1$ and $\beta_2$ are the pre-change intercept and slope, $\beta_3$ is the shift in the intercept, and $\beta_4$ is the shift in the slope after time $\tau$.
Design Matrix#

$$X = \begin{pmatrix} 1 & 1 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & \tau & 0 & 0 \\ 1 & \tau + 1 & 1 & 1 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & T & 1 & T - \tau \end{pmatrix}$$

so that the model reads $y = X\beta + u$ with $u = (u_1, \dots, u_T)^\top$.
Model 2 (Structural change in slope)#
This specification allows for a one-time change in the slope of the trend without affecting the level. In mathematical terms, the model can be represented as:

$$y_t = \beta_1 + \beta_2\, t + \beta_3\, (t - \tau)\, \mathbf{1}_{\{t > \tau\}} + u_t$$

with unknown parameters $\beta = (\beta_1, \beta_2, \beta_3)^\top$ and design matrix rows $x_t = \big(1,\ t,\ (t - \tau)\mathbf{1}_{\{t > \tau\}}\big)$.
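To make the setup concrete, here is a minimal numpy sketch of the two design matrices (the function name `design_matrix` and the column ordering are my own, matching the parameter vectors above):

```python
import numpy as np

def design_matrix(T: int, tau: int, model: int = 1) -> np.ndarray:
    """Design matrix for a trend-change model with change point tau."""
    t = np.arange(1, T + 1)
    after = (t > tau).astype(float)  # indicator 1{t > tau}
    if model == 1:
        # Columns: intercept, trend, intercept shift, slope shift
        return np.column_stack([np.ones(T), t, after, (t - tau) * after])
    # Model 2: same but without the intercept-shift column
    return np.column_stack([np.ones(T), t, (t - tau) * after])

X1 = design_matrix(T=100, tau=60, model=1)  # shape (100, 4)
X2 = design_matrix(T=100, tau=60, model=2)  # shape (100, 3)
```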
Step 2: Define your testing hypothesis#
There are several tests that can be formulated to assess whether a structural change has occurred in the time series. Depending on the specific aspect you want to test, the null hypothesis (H0) could take various forms. In the case of Model 1, for example, you could consider:
$$H_0: \beta_4 = 0$$

which tests for equality between the slopes before and after the change point. Another natural choice is

$$H_0: \beta_3 = \beta_4 = 0$$

which is a joint hypothesis testing for equality of both the slope and the intercept at time $\tau$.

If either of these null hypotheses is rejected, it would imply that a structural change has occurred at the change point $\tau$.
In a more generalized form, these hypotheses can be expressed as:

$$H_0: R\beta = r$$

Here, $R$ is a known $q \times k$ matrix (with $k$ the number of parameters, $k = 4$ for Model 1, and $q \le k$ the number of restrictions) and $r$ is a known $q \times 1$ vector.

- For the first test $H_0: \beta_4 = 0$, $R$ could be the row vector $(0, 0, 0, 1)$ and $r$ would be $0$.
- For the second joint hypothesis $H_0: \beta_3 = \beta_4 = 0$, $R$ could be the matrix $\begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$ and $r = (0, 0)^\top$.
This flexible framework allows for various types of hypotheses to be tested, depending on your specific research questions and the data at hand.
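For instance, for Model 1 the two hypotheses above translate into the following $(R, r)$ pairs (a sketch using the column ordering of the design matrix defined earlier):

```python
import numpy as np

# H0: beta_4 = 0 (no change in slope) -- one restriction, q = 1
R_slope = np.array([[0.0, 0.0, 0.0, 1.0]])
r_slope = np.zeros(1)

# H0: beta_3 = beta_4 = 0 (no change in intercept nor slope) -- q = 2
R_joint = np.array([[0.0, 0.0, 1.0, 0.0],
                    [0.0, 0.0, 0.0, 1.0]])
r_joint = np.zeros(2)
```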
Step 3: Construct your test#
For a hypothesis expressed in the generalized form above, we are going to see how to construct a test using OLS estimators.

To test the null hypothesis $H_0: R\beta = r$, we use the Wald statistic:

$$W = T\,\big(R\hat{\beta} - r\big)^\top \big(R\,\widehat{V}\,R^\top\big)^{-1} \big(R\hat{\beta} - r\big)$$

Here, $\hat{\beta}$ is the OLS estimator of $\beta$ and $\widehat{V}$ is a consistent estimator of $V$, the asymptotic variance of $\sqrt{T}\,(\hat{\beta} - \beta)$.

We are going to see that under the null hypothesis $H_0$, $W$ converges in distribution to a $\chi^2_q$ distribution, where $q$ is the number of restrictions (the number of rows of $R$).

In finite samples, under additional assumptions (notably normally distributed errors), $W / q$ follows an exact $F$ distribution, as we will see below.
Wald test#
OLS reminder
As a reminder, OLS estimation consists in solving:

$$\hat{\beta} = \arg\min_{\beta}\ (y - X\beta)^\top (y - X\beta)$$

We obtain the estimator:

$$\hat{\beta} = (X^\top X)^{-1} X^\top y$$

From the expression above, we obtain the following decomposition:

$$\sqrt{T}\,(\hat{\beta} - \beta) = \left(\frac{X^\top X}{T}\right)^{-1} \frac{X^\top u}{\sqrt{T}}$$

By applying a Law of Large Numbers to the first factor, a Central Limit Theorem to the second, and Slutsky’s theorem, we get:

$$\sqrt{T}\,(\hat{\beta} - \beta) \xrightarrow{d} \mathcal{N}\big(0,\ A^{-1} B A^{-1}\big), \qquad A = \operatorname{plim} \frac{X^\top X}{T}, \quad B = \lim_{T \to \infty} \operatorname{Var}\!\left(\frac{X^\top u}{\sqrt{T}}\right)$$

Let $V$ denote $A^{-1} B A^{-1}$, the asymptotic variance-covariance matrix of $\sqrt{T}\,(\hat{\beta} - \beta)$.

It follows that, under $H_0: R\beta = r$:

$$\sqrt{T}\,\big(R\hat{\beta} - r\big) \xrightarrow{d} \mathcal{N}\big(0,\ R V R^\top\big)$$

If we can find a consistent estimator of $V$, i.e. $\widehat{V} \xrightarrow{p} V$, then under $H_0$ the Wald statistic $W$ converges in distribution to a $\chi^2_q$.
For a detailed explanation, please refer to chapter 3 and 4 of Wooldridge’s “Econometric Analysis of Cross Section and Panel Data” [Wooldridge, 2010] or chapter 5 of [Greene, 2003].
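Putting this together, here is a minimal sketch of the Wald test; it takes the covariance estimator $\widehat{V}$ as an argument, since choosing $\widehat{V}$ is precisely the topic of the next section (the helper name `wald_test` is mine):

```python
import numpy as np
from scipy import stats

def wald_test(X, y, R, r, V_hat):
    """Wald test of H0: R @ beta = r, given an estimator V_hat of the
    asymptotic variance V of sqrt(T) * (beta_hat - beta)."""
    T = X.shape[0]
    q = R.shape[0]  # number of restrictions
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # OLS estimator
    diff = R @ beta_hat - r
    W = T * diff @ np.linalg.solve(R @ V_hat @ R.T, diff)
    p_value = stats.chi2.sf(W, df=q)  # asymptotic chi2(q) p-value
    return W, p_value
```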
Let’s recap. For the test to be valid, we need the following:

- Assumption 1: $\mathbb{E}[u_t \mid x_t] = 0$ (often referred to as OLS.1, e.g. in [Wooldridge, 2010]).
- Some form of Law of Large Numbers and Central Limit Theorem can be applied. Please refer to [White, 2014] to look at the appropriate ones to use.
- We need to find a consistent estimator $\widehat{V}$ of $V$. As a reminder, $V = A^{-1} B A^{-1}$. The main work lies in making the right assumptions on the structure of $B$ to estimate $V$.

Under $H_0$, we then have $W \xrightarrow{d} \chi^2_q$.
What could be a good covariance matrix estimator?#
Let’s start simple: Homoscedasticity assumption#
In linear regression we often make the homoscedasticity (and no serial correlation) assumption:

$$\mathbb{E}\big[u u^\top \mid X\big] = \sigma^2 I_T$$

It follows that $B = \sigma^2 A$, hence $V = \sigma^2 A^{-1}$, which we can estimate by:

$$\widehat{V} = \hat{\sigma}^2 \left(\frac{X^\top X}{T}\right)^{-1}$$

A natural unbiased estimator of $\sigma^2$ is based on the OLS residuals $\hat{u} = y - X\hat{\beta}$.

Let’s show that $\hat{\sigma}^2 = \frac{\hat{u}^\top \hat{u}}{T - k}$ is unbiased, where $k$ is the number of parameters.

Since $\hat{u} = y - X(X^\top X)^{-1} X^\top y$, by denoting $M = I_T - X(X^\top X)^{-1} X^\top$ (the symmetric and idempotent “residual maker” matrix, with $\operatorname{tr}(M) = T - k$), we have $\hat{u} = M u$.

Because the trace operator is linear and invariant under cyclic permutations:

$$\mathbb{E}\big[\hat{u}^\top \hat{u} \mid X\big] = \mathbb{E}\big[u^\top M u \mid X\big] = \operatorname{tr}\big(M\, \mathbb{E}[u u^\top \mid X]\big) = \sigma^2 \operatorname{tr}(M)$$

And we get that $\mathbb{E}\big[\hat{u}^\top \hat{u} \mid X\big] = \sigma^2 (T - k)$.

Hence the unbiased estimator:

$$\hat{\sigma}^2 = \frac{\hat{u}^\top \hat{u}}{T - k}$$
If we assume further that the error terms are normally distributed, we can leverage the following result.

Chi square property

If $u \mid X \sim \mathcal{N}(0, \sigma^2 I_T)$ and $M$ is symmetric and idempotent with $\operatorname{rank}(M) = T - k$, then $\frac{u^\top M u}{\sigma^2} \sim \chi^2_{T - k}$.

We obtain that:

$$(T - k)\,\frac{\hat{\sigma}^2}{\sigma^2} = \frac{\hat{u}^\top \hat{u}}{\sigma^2} \sim \chi^2_{T - k}$$

In finite samples, since $\hat{\beta}$ and $\hat{u}$ are independent under these assumptions, we get exactly, under $H_0$:

$$\frac{W}{q} \sim F(q,\ T - k)$$
F-distribution definition

If $U_1 \sim \chi^2_{d_1}$ and $U_2 \sim \chi^2_{d_2}$ are independent, then $\frac{U_1 / d_1}{U_2 / d_2} \sim F(d_1, d_2)$.
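Under these assumptions, the covariance estimator is straightforward to code. A minimal sketch, which plugs directly into the hypothetical `wald_test` helper above (the exact finite-sample F p-value is shown in the last comment):

```python
import numpy as np
from scipy import stats

def homoscedastic_V(X, y):
    """Estimate V = sigma^2 * A^{-1} under homoscedastic, uncorrelated errors."""
    T, k = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    u_hat = y - X @ beta_hat
    sigma2_hat = (u_hat @ u_hat) / (T - k)  # unbiased estimator of sigma^2
    return sigma2_hat * np.linalg.inv(X.T @ X / T)

# With normal errors, use the exact finite-sample distribution instead:
# p_value = stats.f.sf(W / q, q, T - k)
```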
Note 1: Relation with t-test
You might be more familiar with t-tests.
If $q = 1$ (a single restriction, such as $H_0: \beta_4 = 0$), $W$ is exactly the square of the usual t statistic, and since $F(1, T - k)$ is the distribution of a squared $t_{T - k}$ variable, the Wald/F test and the two-sided t-test are equivalent.
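A quick numerical check of this equivalence:

```python
from scipy import stats

t_stat, df = 2.3, 50
p_t = 2 * stats.t.sf(abs(t_stat), df)  # two-sided t-test p-value
p_f = stats.f.sf(t_stat ** 2, 1, df)   # F-test p-value with q = 1
assert abs(p_t - p_f) < 1e-10          # identical up to floating-point error
```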
Note 2: on the normality assumption
One point to note is that the necessity for the normality assumption decreases as the sample size increases. If we have a very large sample size, the Wald statistic can be used directly without turning it into an approximate F-distribution.
Newey-West Estimator#
In time-series analysis, however, it’s generally unrealistic to assume that error terms are uncorrelated across time. Wooldridge illustrates in Chapter 12 of his book how traditional estimators can be misleading under these circumstances. Specifically, he provides an example where the error term follows an AR(1) process with a positive coefficient. In such a scenario, conventional estimators that assume uncorrelated errors would underestimate the true standard errors, leading to incorrect inferences [Wooldridge, 2015].
We would like an estimator $\widehat{B}$ of $B$ that is:

- robust to heteroscedasticity and autocorrelation of the error terms;
- positive semidefinite (otherwise some linear combination of the elements of $\hat{\beta}$ would be asserted to have a negative variance, which is problematic for an estimator of a variance-covariance matrix).
To meet these requirements, Newey and West suggested the following estimator ([Newey and West, 1987]), with a lag-truncation parameter $L$ and Bartlett weights $w_\ell = 1 - \frac{\ell}{L + 1}$:

$$\widehat{B}_{NW} = \frac{1}{T} \sum_{t=1}^{T} \hat{u}_t^2\, x_t x_t^\top + \frac{1}{T} \sum_{\ell=1}^{L} w_\ell \sum_{t=\ell+1}^{T} \hat{u}_t \hat{u}_{t-\ell} \big(x_t x_{t-\ell}^\top + x_{t-\ell} x_t^\top\big)$$

where $x_t$ denotes the $t$-th row of $X$ written as a column vector. We therefore get:

$$\widehat{V}_{NW} = \left(\frac{X^\top X}{T}\right)^{-1} \widehat{B}_{NW} \left(\frac{X^\top X}{T}\right)^{-1}$$
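Here is a minimal sketch of this estimator; in practice, statsmodels computes the same quantity via `OLS(y, X).fit(cov_type='HAC', cov_kwds={'maxlags': L})`:

```python
import numpy as np

def newey_west_V(X, y, L):
    """Newey-West (HAC) estimator of V with lag-truncation parameter L."""
    T, k = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    u_hat = y - X @ beta_hat
    xu = X * u_hat[:, None]            # rows: x_t * u_hat_t
    B_hat = xu.T @ xu / T              # lag-0 (heteroscedasticity) term
    for lag in range(1, L + 1):
        w = 1.0 - lag / (L + 1.0)      # Bartlett kernel weight
        gamma = xu[lag:].T @ xu[:-lag] / T
        B_hat += w * (gamma + gamma.T)
    A_hat_inv = np.linalg.inv(X.T @ X / T)
    return A_hat_inv @ B_hat @ A_hat_inv  # sandwich estimator of V
```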
Application: Example with Model 1#
Running Slope Difference (RSD) Test#
In [Zuo et al., 2019], the following test statistic is suggested:

$$t_{RSD} = \frac{\big|\hat{b}_r - \hat{b}_l\big|}{\sqrt{\dfrac{SSE_l + SSE_r}{C\,(n_l + n_r - 4)}}}, \qquad C = \frac{C_l\, C_r}{C_l + C_r}, \qquad C_i = \frac{n_i (n_i^2 - 1)}{12}$$

where $\hat{b}_l$ and $\hat{b}_r$ are the OLS slopes fitted separately on the $n_l$ points before and the $n_r$ points after the change point, and $SSE_l$ and $SSE_r$ are the corresponding residual sums of squares. Under the null hypothesis of no slope change, $t_{RSD}$ follows a $t$ distribution with $n_l + n_r - 4$ degrees of freedom.
While the paper does not use a matrix formulation, we are going to see how we can obtain this result.
Problem reformulation#
The paper fits two separate regressions, one on each segment, but their problem can be reformulated using Model 1 (structural change in both intercept and slope): the slope difference $\hat{b}_r - \hat{b}_l$ is exactly the OLS estimate $\hat{\beta}_4$, and the null hypothesis of no slope change is $H_0: \beta_4 = 0$, i.e. $R = (0, 0, 0, 1)$ and $r = 0$, tested under the homoscedasticity assumption.

Let’s see if we obtain the same results.

All that is left to compute is the standard error of $\hat{\beta}_4$, i.e. $\hat{\sigma}^2 \big[(X^\top X)^{-1}\big]_{44}$.

Since Model 1 is a reparametrization of two unrelated regression lines, the full-sample OLS fit coincides with the two segment-wise fits: the residuals are the same (so $\hat{u}^\top \hat{u} = SSE_l + SSE_r$), the variance of an OLS slope over $n_i$ equally spaced points is $\sigma^2 / \sum_t (t - \bar{t})^2 = \sigma^2 / C_i$, and the two segments are uncorrelated, so:

$$\operatorname{Var}\big(\hat{\beta}_4 \mid X\big) = \sigma^2 \left(\frac{1}{C_l} + \frac{1}{C_r}\right) = \frac{\sigma^2}{C}$$

We get, using the unbiased estimator $\hat{\sigma}^2 = \frac{SSE_l + SSE_r}{n_l + n_r - 4}$ ($k = 4$ parameters):

$$t = \frac{\big|\hat{\beta}_4\big|}{\sqrt{\hat{\sigma}^2 / C}} = \frac{\big|\hat{b}_r - \hat{b}_l\big|}{\sqrt{\dfrac{SSE_l + SSE_r}{C\,(n_l + n_r - 4)}}}$$

Hence, the general framework recovers exactly the RSD statistic, together with its $t_{n_l + n_r - 4}$ null distribution.
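We can also verify this numerically by comparing the matrix formulation with the segment-wise computation, reusing the hypothetical `design_matrix`, `homoscedastic_V` and `wald_test` helpers sketched earlier:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
T, tau = 120, 60
t = np.arange(1, T + 1)
y = 1.0 + 0.05 * t + 0.1 * (t - tau) * (t > tau) + rng.normal(size=T)

# Matrix form: Wald test of H0: beta_4 = 0 under homoscedasticity
X = design_matrix(T, tau, model=1)
W, _ = wald_test(X, y, np.array([[0.0, 0.0, 0.0, 1.0]]), np.zeros(1),
                 homoscedastic_V(X, y))

# Segment-wise RSD form
l, r = y[:tau], y[tau:]
n_l, n_r = len(l), len(r)
fit_l = stats.linregress(np.arange(n_l), l)
fit_r = stats.linregress(np.arange(n_r), r)
sse = (np.sum((l - fit_l.intercept - fit_l.slope * np.arange(n_l)) ** 2)
       + np.sum((r - fit_r.intercept - fit_r.slope * np.arange(n_r)) ** 2))
C_l, C_r = n_l * (n_l ** 2 - 1) / 12, n_r * (n_r ** 2 - 1) / 12
C = C_l * C_r / (C_l + C_r)
t_rsd = abs(fit_r.slope - fit_l.slope) / np.sqrt(sse / (C * (T - 4)))

assert np.isclose(np.sqrt(W), t_rsd)  # the two formulations agree
```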
Bonus: simple code to implement the RSD-test#
Code snippet
from typing import Dict, List, Union

import numpy as np
from scipy import stats
from statsmodels.tsa.stattools import acf


def slope_difference_test(
    univariate_timeseries: Union[List[float], np.ndarray],
    separation_point: int,
    robust: bool = True,
) -> Dict:
    """
    Statistical test for the slope difference described in the following paper:
    Zuo, Bin, et al. "A new statistical method for detecting trend turning."
    Theoretical and Applied Climatology 138.1 (2019): 201-213.

    The null hypothesis is: no slope difference between the two segments
    of the time series.

    Parameters
    ----------
    univariate_timeseries : 1d array of shape (length_timeseries,)
    separation_point : point at which we test whether there is a trend shift
    robust : if True, adjust the degrees of freedom for autocorrelation

    Returns
    -------
    Dict containing the p-value, the t statistic, and the slopes and
    intercepts fitted on each segment.
    """
    min_float = 1e-10
    univariate_timeseries = np.asarray(univariate_timeseries, dtype=float)
    # Create the left and the right window around the candidate change point
    l_window = univariate_timeseries[:separation_point]
    r_window = univariate_timeseries[separation_point:]
    l_window_length = len(l_window)
    r_window_length = len(r_window)
    ts_length = l_window_length + r_window_length
    # Four parameters are estimated: two intercepts and two slopes
    eff_df = 4
    if robust:
        # Heuristic effective-degrees-of-freedom adjustment for
        # autocorrelated errors (cf. [Zuo et al., 2020])
        auto_corr = acf(univariate_timeseries, nlags=ts_length - 1)
        auto_corr[np.isnan(auto_corr)] = 1  # conservative fallback
        for i in range(1, ts_length):
            eff_df += ((ts_length - i) / float(ts_length)) * auto_corr[i]
        eff_df = max(
            1,
            int(
                np.rint(
                    1
                    / (
                        (1 / float(ts_length))
                        + ((2 / float(ts_length)) * eff_df)
                    )
                )
            ),
        )
    # Linear regression on the left and the right window
    l_x = np.arange(l_window_length)
    r_x = np.arange(r_window_length)
    l_fit = stats.linregress(l_x, l_window)
    r_fit = stats.linregress(r_x, r_window)
    # Residual sums of squares of the two segment-wise fits
    l_sse = np.sum((l_window - (l_fit.slope * l_x + l_fit.intercept)) ** 2)
    r_sse = np.sum((r_window - (r_fit.slope * r_x + r_fit.intercept)) ** 2)
    # C_i = n_i * (n_i^2 - 1) / 12 = sum_t (t - t_bar)^2 on each segment
    l_const = (
        l_window_length * (l_window_length + 1) * (l_window_length - 1) / 12
    )
    r_const = (
        r_window_length * (r_window_length + 1) * (r_window_length - 1) / 12
    )
    C = (l_const * r_const) / (l_const + r_const)
    # Standard error of the slope difference under homoscedasticity
    std_err = max(
        np.sqrt((l_sse + r_sse) / (C * (ts_length - 4))),
        min_float,
    )
    # Two-sided t-test for the slope shift
    t_stat = abs(l_fit.slope - r_fit.slope) / std_err
    p_value = 2 * stats.t.sf(t_stat, df=ts_length - eff_df)
    if np.isnan(p_value):
        p_value = 1.0
    return {
        "p_value": p_value,
        "t_stat": t_stat,
        "l_slope": l_fit.slope,
        "l_intercept": l_fit.intercept,
        "r_slope": r_fit.slope,
        "r_intercept": r_fit.intercept,
        "separation_point": int(separation_point),
        "l_r_value": l_fit.rvalue,
        "r_r_value": r_fit.rvalue,
    }
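For example, on a synthetic series with a slope break (illustrative numbers only):

```python
import numpy as np

rng = np.random.default_rng(42)
t = np.arange(120)
y = 0.02 * t + 0.15 * np.clip(t - 60, 0, None) + rng.normal(scale=0.5, size=120)

result = slope_difference_test(y, separation_point=60)
print(result["p_value"], result["l_slope"], result["r_slope"])
```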
Conclusion#
The test framework we’ve established is valid under the assumption that our error term is stationary.
For extensions beyond the stationary error term assumption, consider consulting the work by Pierre Perron, which proposes a test that accommodates an I(1) error term ([Perron and Yabu, 2009]).
References#
- Gre03
William H Greene. Econometric analysis. Pearson Education India, 2003.
- Ham20
James D Hamilton. Time series analysis. Princeton University Press, 2020.
- NW87
Whitney K Newey and Kenneth D West. Hypothesis testing with efficient method of moments estimation. International Economic Review, pages 777–787, 1987.
- PY09
Pierre Perron and Tomoyoshi Yabu. Testing for shifts in trend with an integrated or stationary noise component. Journal of Business & Economic Statistics, 27(3):369–396, 2009.
- Whi14
Halbert White. Asymptotic theory for econometricians. Academic Press, 2014.
- Woo10
Jeffrey M Wooldridge. Econometric analysis of cross section and panel data. MIT Press, 2010.
- Woo15
Jeffrey M Wooldridge. Introductory econometrics: A modern approach. Cengage Learning, 2015.
- ZHZ+20
Bin Zuo, Zhaolu Hou, Fei Zheng, Lifang Sheng, Yang Gao, and Jianping Li. Robustness assessment of the RSD t-test for detecting trend turning in a time series. Earth and Space Science, 7(5):e2019EA001042, 2020.
- ZLSZ19
Bin Zuo, Jianping Li, Cheng Sun, and Xin Zhou. A new statistical method for detecting trend turning. Theoretical and Applied Climatology, 138(1):201–213, 2019.