Wed

07

Nov

2012

When it comes to causality tests, the typical Granger-causality test can be problematic. Testing for Granger-causality using *F*-statistics when one or both time series are non-stationary can lead to spurious causality (He & Maekawa, 1999).

Professor Giles gives an excellent example of how the TY method can be implemented.

More formal explanations can be found in the original TY (1995) paper or for example here.

In this post, I will show how Professor Giles' example can be implemented in R.

The procedure is based on the following steps:

1. Test for integration (structural breaks need to be taken into account). Determine max order of integration (*m*). If none of the series in
integrated, the usual Granger-causality test can be done.

2. Set up a VAR-model in the levels (do not difference the data).

3. Determine lag length. Let the lag length be p. The VAR model is thus VAR(p).

4. Carry out tests for misspecification, especially for residual serial correlation.

5. Add the maximum order of integration to the number of lags. This is the augmented VAR-model, VAR(p+m).

6. Carry out a Wald test for the ** first p variables only** with p degrees of freedom.

You may want to do a test of cointegration. If series are cointegrated, there must be a causality. However, Toda and Yamamoto (1995) noted that one advantage of the TY-method is that you don't have to test for cointegration and, therefore, a pretest bias can be avoided.

__________

The example is about causalities between prices in Robusta and Arabica coffee. The excel-file can be downloaded here. But in order to be loaded into R, the data should be put in the csv. format. The csv. file is available here.

Update: If you want examine the data interactively, have a look here.

The script below tests for causality between these two time series. The script is annotated, but let me know if I can clarify anything or if there is room for improvement.

________________________________________________________________________

**Update (28.9.14): Since several people
seemed to get errors with the old code, I made some changes which made the code shorter, a little easier to follow and hopefully more stable. I am currently very busy with other projects,
so apologies for taking so long to respond****.**

```
library(fUnitRoots)
library(urca)
library(vars)
library(aod)
library(zoo)
library(tseries)
#Load data
cof <- read.csv("http://www.christophpfeiffer.org/app/download/6938079586/coffee_data.csv", header=T,sep=";")
names(cof)
#Adjust Date format
cof["Date"]<-paste(sub("M","-",cof$Date),"-01",sep="")
#Visualize
plot(as.Date(cof$Date),cof$Arabica,type="l",col="black",lwd=2)
lines(as.Date(cof$Date),cof$Robusta,col="blue",lty=2,lwd=1)
legend("topleft",c("Arabica","Robusta"),col=c("black","blue"),lty=c(1,2),lwd=c(2,1),bty="n")
#Possible structural break in 1970s. Therefore only values from 1976:01 onwards are regarded
cof1<-cof[193:615,]
#Visualize
plot(as.Date(cof1$Date),cof1$Arabica,type="l",col="black",lwd=2,ylim=range(cof1$Robusta))
lines(as.Date(cof1$Date),cof1$Robusta,col="blue",lty=2,lwd=1)
legend("topright",c("Arabica","Robusta"),col=c("black","blue"),lty=c(1,2),lwd=c(2,1),bty="n")
#Test for unit roots
adf.test(cof$Arabica)
adf.test(cof$Robusta)
kpss.test(cof$Arabica)
kpss.test(cof$Arabica)
adf.test(diff(cof$Arabica,1))
adf.test(diff(cof$Robusta,1))
kpss.test(diff(cof$Arabica,1))
kpss.test(diff(cof$Robusta,1))
# Since first order differencing eliminates the unit root, the maximum order of integration
# is concluded to be I(1).
#Set up VAR-Model
#select lag order // either 2 or 6
VARselect(cof1[,2:3],lag=20,type="both")
#VAR Model, lag=2
V.2<-VAR(cof1[,2:3],p=2,type="both")
serial.test(V.2)
#VAR-Model, lag=6
V.6<-VAR(cof1[,2:3],p=6,type="both")
serial.test(V.6)
#Stability analysis
1/roots(V.6)[[1]] # ">1"
1/roots(V.6)[[2]] # ">1"
#Alternative stability analyis
plot(stability(V.6)) ## looks fine
# Model with p=6 is less likely to be serially correlated. Thus model with p=6 is selected.
# Wald-test for the first 6 lags
# The test can be directly done with the VAR model, however using the correct
# variables is a little more tricky
#VAR-Model, lag=7 (additional lag, though not tested)
V.7<-VAR(cof1[,2:3],p=7,type="both")
V.7$varresult
summary(V.7)
#Wald-test (H0: Robusta does not Granger-cause Arabica)
wald.test(b=coef(V.7$varresult[[1]]), Sigma=vcov(V.7$varresult[[1]]), Terms=c(2,4,6,8,10,12))
# Could not be rejected (X2=8.6; p=0.2)
#Wald.test (H0: Arabica does not Granger-cause Robusta)
wald.test(b=coef(V.7$varresult[[2]]), Sigma=vcov(V.7$varresult[[2]]), Terms= c(1,3,5,7,9,11))
# Could be rejected at 10% (X2=12.3; p=0.056)
# It seems that Arabica Granger-causes Robusta prices, but not the other way around.
```

You can download the R-code as well as the csv. file in "Files".

Let me know if you have any suggestions.

--- C

**References**

He, Z.; Maekawa, K. (1999). On spurious Granger causality. *Economic letters*, 73(3), 307–313.

Toda H.Y.; Yamamoto T. (1995). Statistical inference in vector autoregressions with possibly integrated
processes. *Journal of Econometrics*, 66, 225–250.

Write a comment

98#1Dave Giles (

Wednesday, 07 November 2012 23:42)Christoph. This is just great! Thanks for sharing this.

#2Claudio D. Shikida (

Thursday, 08 November 2012 16:10)Just change this for the tests.

It's not adf.test, but ur.df. For example:

ur.df(cof$Arabica)

ur.df(cof$Robusta)

ur.kpss(cof$Arabica)

ur.kpss(cof$Arabica)

Great job! Thanks!

#3Christoph(Thursday, 08 November 2012 17:38)Claudio, thanks for pointing out.

adf.test(), kpss.test() work es well, but we need the "tseries" package loaded. Should be fine now.

#4Cindy M.(Thursday, 06 December 2012 17:53)Thanks for sharing this! Two questions for you.

In lm1<-lm(arab~arab.l[,2:8]+robu.l[,2:8]+index(arab)) , can you please explain why you need to have the index(arab) term in the regression?

Also in:

>#Wald.test (H0: Arabica does not Granger-cause Robusta)

>vcov(lm2)

>wald.test(b=coef(lm1), Sigma=vcov(lm1), Terms= c(2:7),df=6)

Should this wald.test be using lm2 instead of lm1?

#5christophpfeiffer (

Thursday, 06 December 2012 18:08)Hi Cindy,

thanks for the catch, it is indeed lm2 and not lm1.

index(arab) caputres the trend index(robu) would of course also work and yields the same result.

-- Christoph

#6Cindy M.(Thursday, 06 December 2012 18:23)Thanks for the quick response!

#7london(Thursday, 21 March 2013 20:12)Dear Christoph,

I used the code on a data set which has 22 observations and 9 variables all variables entering into the vAR model. I determined p=k+dmax=3. this gives me 27 coefficient estimates. I have NAs in the results. Just wondering if I am doing something wrong or overfitting?

#8christophpfeiffer (

Thursday, 21 March 2013 21:52)Hi,

it seems that you have relatively few observations which makes statistical analysis difficult. Any chance obtain more data? Have you considered bootstrapping?

-- Christoph

#9london(Thursday, 21 March 2013 23:31)Dear Christoph,

I am trying to run a bootstrap regression, however, the regression is producing NAs in place of coefficients for the last 2 coefficients after every bootstrap.

would appreciate any suggestion!

#10christophpfeiffer (

Friday, 22 March 2013 17:32)If you like, you can send me your R-code and the data and I'll have a look.

christophpp@gmail.com

#11london(Friday, 22 March 2013 23:54)Dear Christoph,

Many thanks, I will email you my code and data set.

#12Aviral Kumar Tiwari(Sunday, 19 May 2013 10:08)I tried to use your R-codes but in my computer it it shows "Error: could not find function "wald.test""

when i used

> wald.test(b=coef(lm1), Sigma=vcov(lm1), Terms= c(9:14),df=6)

Please help me ..

#13christophpfeiffer (

Sunday, 19 May 2013 12:14)Hi Aviral,

make sure you have all packages installed. So you shouldn't get any error messages for the first 6 lines. You can install packages with the function

install.packages("...")

where ... is the name of the package. Specifically, for the function wald.test() you need the package "aod" to be installed. For this you need to execute the following line:

install.packages("aod")

once.

Hope this helps

Christoph

#14Aviral Kumar Tiwari(Sunday, 19 May 2013 13:55)thankssssss it worked out ...

#15Mike(Friday, 07 June 2013 00:49)Christoph,

The output for wald.test also gives the F-statistic, but your comments show the test is determined by the X2 results. What is the interpretation of the F output results in this case? Thanks!

#16christophpfeiffer (

Sunday, 16 June 2013 12:05)Mike,

sorry for the late answer, I just got back from vacation.

Toda and Yamamoto have shown that if you add an additional lag to a correctly specified VAR-model for which at least one time-series is integrated, the parameters asymptotically follow a chi-squared distribution. Assuming these two conditions are met, looking at the F-statistic for the augmented model would simply yield a meaningless answer.

#17Rick(Sunday, 14 July 2013 02:35)In your wald.test, why are you using Terms c(9:14) and c(2:7)? I would have thought it would be c(8:13) and (1:6) because the last lag is supposed to be exogenous and just there for the asymptotics?

#18Christoph(Monday, 15 July 2013 09:36)Hi Rick,

for the first test we need to assess if the coefficients of Robusta are significant for the price of Arabica. For the second test we are testing if the coefficients of Arabica are significant for the price of Robusta. These are the models:

Arabica = Intercept+A*X_1+B*X_2 + Trend # lm1

No. of terms: (1) (7) (7) (1) = 16

Robusta = Intercept+A*X_1+B*X_2 + Trend # lm2

No. of terms: (1) (7) (7) (1) = 16

With A being the coefficients for the values of Arabica, B coefficients for Robusta, X_1 and X_2 the respective lagged values. For each matrix of lagged values we have 7 terms but only want to test the first 6. Then, for the first Wald test we need to look at terms 9 to 14 and for the second model at terms 2 to 7. You can look at the coefficients of each model with coef(lm1) or coef(lm2).

#19Martin(Monday, 28 October 2013 13:21)I'm having trouble interpreting the serial.tests. What's the significance of the degrees of freedom and the p-values? (The p-value for serial.test(V.6) is .5.)

#20Bob(Wednesday, 11 December 2013 19:44)Any way to do this without the packages?

#21christophpfeiffer (

Saturday, 14 December 2013 15:14)Bob,

of course, but why would you do it without the packages? If you want to look deeper, you can just look at the functions provided by the packages.

#22christophpfeiffer (

Saturday, 14 December 2013 15:20)Martin,

sorry for the late reply. The serial test used here is a portmanteau test with the null hypothesis that residual errors are not serially correlated. A higher p-value indicates a favorable model which is why a model with 6 vs. 2 lags is chosen.

#23Yolande(Friday, 14 March 2014 20:18)I was wondering how the method and the R code would change if one of the series has unit root and the other does not.

#24Christoph(Friday, 14 March 2014 21:36)Yolande,

good question. The first step says:

1. Test for integration (structural breaks need to be taken into account). Determine max order of integration (m). If none of the series in integrated, the usual Granger-causality test can be done.

So, if one series has an order of integration of 1 and the other is not integrated (order of integration is 0), the maximum order of integration would be m=1.

Jumping to step 5:

5. Add the maximum order of integration to the number of lags. This is the augmented VAR-model, VAR(p+m).

This means we would still have to use the augmented model even if one series is not integrated at all. It's the maximum order of integration that counts. The only case in which you can use the standard Granger test for non-causality is when neither of the series are integrated.

#25Cory (

Monday, 17 March 2014 18:00)Christoph,

I was wondering how I could change the code in my blog to follow the procedure you outline above. I don't understand the idea of including an additional lag, but not testing for it.

http://r-datameister.blogspot.com/2014_02_01_archive.html

#26Yolande(Monday, 17 March 2014 18:00)Thank you for the prompt reply.

#27Yolande(Tuesday, 18 March 2014 15:06)I do have an additional question: Where in the code and algorithm the model v.7 was used after it is set

#28Pablo(Thursday, 03 April 2014 22:38)Hi,

I was wondering if you've ever used the function "causality" from the package "vars"? Are the results from this function compatible with the ones from your code?

Thanks a lot

#29Christoph(Saturday, 05 April 2014 16:17)Cory,

there are some issues with differencing the data first for integrated time series and then using the standard Granger causality test (e.g. Enders, W. Applied Econometric Times Series ).

You can think of the idea to add an additional lag and then not using it for the test as a trick to get rid of the problems involved in testing integrated time series for causality.

#30Christoph(Saturday, 05 April 2014 16:18)Yolande,

you are very welcome. :)

Good point, model v.7 is actually not necessary for the following tests.

#31Christoph(Saturday, 05 April 2014 16:20)Hi Pablo,

as far as I know, the causality function from 'vars' only uses the standard Granger causality test and would not work in this context.

#32Yolande(Monday, 07 April 2014 21:33)Hi Christoph,

This is a general question.

Why would I got -Infinity in my varselect. The lag length is 3 based on all criteria.

> VARselect(M,lag.max=20,type="both")

$selection

AIC(n) HQ(n) SC(n) FPE(n)

3 3 3 3

$criteria

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

AIC(n) 5.517709 4.919171 -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf

HQ(n) 5.139389 4.351692 -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf

SC(n) 5.693020 5.182138 -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf

FPE(n) 284.562815 237.788270 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

18 19 20

AIC(n) -Inf -Inf -Inf

HQ(n) -Inf -Inf -Inf

SC(n) -Inf -Inf -Inf

FPE(n) 0 0 0

Thanks

#33christophpfeiffer (

Friday, 18 April 2014 19:31)Hi Yolande,

this probably happens because the number of lags available in your dataset is too low to calculate the criteria for lags higher than two. If this does not explain it, you can send me the dataset and I will have a look.

#34Lohit(Friday, 02 May 2014 11:50)Hi Christoph,

When I execute the code I get the following error. How to debug this. Can you please help me with it. I am new R software. Thank you.

> #Wald-test (H0: Robusta does not Granger-cause Arabica)

> vcov(lm1)

Error in if (is.finite(resvar) && resvar < (mean(f)^2 + var(f)) * 1e-30) warning("essentially perfect fit: summary may be unreliable") :

missing value where TRUE/FALSE needed

#35Erdogan CEVHER(Saturday, 17 May 2014 00:32)Hi Christoph,

Your presentation is very nice. By the way, allowing the others to ask questions and your immediate answers are very nice. I grasped the topic. I want to gift you my econometry book as soon as it is finished.

#36Erdogan CEVHER(Saturday, 17 May 2014 15:45)Hi Christoph,

First, thanks millions of times for the above R code.

In #29, you said: "there are some issues with differencing the data first for integrated time series and then using the standard Granger causality test (e.g. Enders, W. Applied Econometric Times Series )."

However, what W. Enders' say in "Applied Econometric Times Series" 3E (2010) p.321 (equivalent of above 347 in the below link) that:

"The issue of differencing is important. If the VAR can be written entirely in first differences, hypothesis tests can be performed on any equation or any set of equations using t-tests or F-tests. This follows because all of the variables are stationary. As you will see in the next chapter, it is possible to write the VAR in first differences if the variables are I(1) and are not cointegrated. If the variables in question are cointegrated, the VAR cannot be written in first differences; hence, causality tests cannot be performed using t-tests or F-tests."

Book: http://tr.scribd.com/doc/215018030/180318659-Applied-Econometric-Time-Series-3rd-Edition-Walter-Enders-PDF

That is to say, the "issues" you mentioned are restricted to both of the variables are nonstationary AND COINTEGRATED.

When only one of the variables in a system of 2 variables is non-stationary, and the variables are not cointegrated, then Enders (2010) does not mention anything about the problematic nature of classical G-causality(1969).

Hence all in all: in a system of 2 variables:

Cases of Variables Solution Source

--------------- ----------------------- ------------------------

1. Both Stationary .. Classical G-causality(1969) .. Granger (1969)

2. Only one nonstationary ..(AFAIK:) Toda-Yamamoto(1995) .. ??????

3. Both nonstationary and not cointegrated .. TY(1995) .. TY(1995)

4. Both nonstationary and cointegrated (VAR-->VECM) .. (I am not sure:)TY(1995) .. ??????

I will be glad if you clarify my "AFAIK"s, "I am not sure"s and "????"s.

Best and Warm Regards,

#37Erdogan CEVHER(Saturday, 17 May 2014 16:16)Hi Chris, again me!

Look what Enders 3E 2010 p. 397 say (for my cases of Variables, 3rd case):

"There are three consequences if the I(1) variables are not cointegrated and you estimate the VAR in levels...:

For a VAR in levels, test for Granger causality conducted on the I(1) variables do not have a standard F-distribution. If you use first differences, you can use the standard F-distribution to test for Granger causality".

Hence, as far as I understand, (in the case of non-cointegrated non-stationary I(1) variables) Enders2010 suggest classical G-causality(1969) on 1st differences. However, many books says, "in a 2-variable system in which AT LEAST ONE OF THE VARIABLES IS NONSTATIONARY, TY1995 is applied. Hence, for the above 3rd case, does it seem that Both G1969 (on 1st differences) and TY1995 (on levels) be applied?

Also, what do you think for my above 4th case?

Regards...

#38Erdogan Cevher(Saturday, 17 May 2014 17:39)As if it seems no matter (stationary/non-stationary, integrated/cointegrated) what the variables in 2-variable system, TY1995 correctly finds G-causality:

Eiji Kurozumi, Khashbaatar Dashtseren 2011: "Statistical Inference in Possibly Integrated/Cointegrated Vector Autoregressions: Application to Testing for Structural Changes":

".....Toda and Yamamoto (1995) propose to estimate a model with intentionally augmented lags, and show that the estimated parameter of interest has a limiting normal distribution IRRESPECTIVE OF WHETHER THE VARIABLES ARE (TREND) STATIONARY, INTEGRATED, OR COINTEGRATED...."

Anyway, Chris, I wanna learn what you think as well.

#39Erdogan CEVHER(Saturday, 17 May 2014 22:06)An offer to the above code: The revealing of why the relevant packages was loaded; that is to say, for which function... would be better:

library(fUnitRoots)

library(urca)

library(vars) # VARselect, serial.test are in vars

library(aod) # wald.test is in aod

library(zoo)

library(tseries) #adf.test, kpss.test are in tseries

#40Erdogan CEVHER(Sunday, 18 May 2014 00:35)Another offer to the above code: I looked at the above code and observed the following code snippet:

# Model with additional lag is set up.

V.7<-VAR(cof1[,2:3],p=7,type="both")

is upset since it is not being used. I decided the above snippet to make happy! Notice that the difference between your R code and Dave Giles' Eviews code is the absence of STABILITY ANALYSIS OF THE VAR!

Here, you can add the following lines to the code to perform VAR stability analysis, and completely equalize your R code with Giles' Eviews:

# Stability analysis of VAR(7)

1/roots(V.7)[[1]] # result became ">1"

1/roots(V.7)[[2]] # result became ">1"

# Since the above two value exceeds (in absolute value) 1, V.7 is stable.

# Fact: VAR is stable if the eigenvalues of RHO part of VAR exceeds 1

equivalently you can use:

roots(V.7)[[1]] # result became "<1"

roots(V.7)[[2]] # result became "<1"

and get the same stability of the VAR(7).

Best and Warm Regards,

Erdogan CEVHER

#41Erdogan CEVHER(Sunday, 18 May 2014 01:03)Third Offer to the above code:

Though p values of serial.test's are enough to decide, it would be nicer to show the ACFs of the residuals of the VARs as well. Hence, I offer to add the followings just after the relevant tests:

plot(serial.test(V.2))

plot(serial.test(V.6))

#42Erdogan CEVHER(Sunday, 18 May 2014 01:07)In my #39,

Instead of:

library(vars) # VARselect, serial.test are in vars

....etc......

the following is nicer:

library(vars) # for VARselect, serial.test

....etc......

#43christophpfeiffer (

Saturday, 24 May 2014 17:16)Hi Erdogan,

thanks for making my argument more precise. The issues in differencing integrated time series first and then applying the standard Granger-causality test are that the following two conditions have to be met:

1. The time-series have to be integrated by the same order.

2. The time-series must not be conintegrated.

Personally, I would simply stick to the TY-method as soon as one series is integrated.

The TY-method also works when time series are cointegrated. There is no need to test for cointegration first (see TY 1995).

Thanks for the stability analysis.

All the best and good luck for your book!

#44christophpfeiffer (

Saturday, 24 May 2014 17:39)Lohit,

make sure you have installed all the packages (you only need to due this once):

install.packages("fUnitRoots")

install.packages("...")

Let me know if this helped.

#45agapi Somwaru(Sunday, 08 June 2014 03:11)Hi Christoph,

I installed all packages but I still get the error below

>vcov(lm1)

Error in if (is.finite(resvar) && resvar < (mean(f)^2 + var(f)) * 1e-30) warning("essentially perfect fit: summary may be unreliable") :

missing value where TRUE/FALSE needed

best regards

Agapi

#46christophpfeiffer (

Sunday, 08 June 2014 12:53)Agapi,

if you just copy and paste the entire code and successfully download the data this should not happen.

A few things to look at:

1. Have you copied the entire code?

2. Do you get any previous warnings?

3. Have you installed R correctly?

Looking at the warning, it seems that there is something wrong with the data you were supposed to download ('essentially perfect fit').

#Load data

cof <- read.csv("http://www.christophpfeiffer.org/app/download/6938079586/coffee_data.csv", header=T,sep=";")

#47NicholasG(Tuesday, 10 June 2014 01:47)Hi Christopher,

Thanks for the great post. I'm testing for Granger Causality under the TY approach you outline here but when using the function wald.test with d.o.f=p the response when the function is run is a Chi-squared test with df=p+m.

If you look up the help for the package wald.test {aod}, under the arguments for df it says:

"A numeric vector giving the degrees of freedom to be used in an F test, i.e. the degrees of freedom of the residuals of the model from which b and Sigma were fitted"

Given that we want an asymptotically chi-square statistic distributed with p df under the null, does the package wald.test have the ability to specify the degrees of freedom for a Chi-squared test, or does df only work for an F test?

If this is the case do you know any other Wald test package for R that would allow you to specify the degrees of freedom specifically for the Chi-Squared statistic?

#48Alexei V(Tuesday, 10 June 2014 22:27)Hi Christoph, thanks for the post. Question - looking at these two lines:

lm1<-lm(arab~arab.l[,2:8]+robu.l[,2:8]+index(arab))

lm2<-lm(robu~arab.l[,2:8]+robu.l[,2:8]+index(arab))

In the expression for lm2, shouldn't it be index(robu)? Seems that comment #18 above suggests that also?

Thanks!

#49christophpfeiffer (

Wednesday, 11 June 2014 00:34)Hi Alexei,

it does not matter for the end result. index(arab) and index(robu) should be identical.

#50agapi(Wednesday, 11 June 2014 02:34)Hi Christopher,

One of your points were if R was correctly installed.

I had to re-installed 32bit version of R.

Many thanks!

Agapi

#51Alexei V(Wednesday, 11 June 2014 19:49)Thanks.

Would you be able to point me in the direction of how to build a calibration dataset for this kind of modeling in general? E.g., generate two dataseries with specific lags, coefficients, amounts of noise, etc?

#52christophpfeiffer (

Monday, 16 June 2014 23:25)Hello Alexei,

sure. But I need some more specifics on your dataset.

What kind of data do you have and what is your outcome?

#53christophpfeiffer (

Tuesday, 17 June 2014 23:24)Hi Nicholas,

the df parameter in the Wald test only refers to the F-distribtuion and is actually not necessary for the TY-test.

I am not aware of any other wald test function in R. But in any case have a look here:

http://www.statlect.com/Wald_test.htm

and here:

http://www.utstat.toronto.edu/~brunner/oldclass/appliedf12/lectures/2101f12WaldWith

#54ADIL MUJTABA(Tuesday, 08 July 2014 20:35)Hi Christopher,

Fist of all thanks for the code & amazing explanations. It really makes it easy for someone like me who is a beginner to understand things.

You mentioned about the Portmanteau test & the fact that it is used to see if residuals are serially related.

Well for my data i am getting very low p values when i am doing serial test however when i do the stability analysis i have values >1.

Does that mean residuals are serially correlated & if yes what additional steps do i need to do ?

#55Greg W(Wednesday, 23 July 2014 16:52)I'm having the exact same problem as Lohit.

I have all of the packages installed and I have downloaded the correct data. I am directly copying and pasting your code.

The problem lies in `lm1`. If you simply run `summary(lm1)` you will get the same error. By the way, the error does not indicate a perfect fit: it indicates that one of `summary.lm`'s internal checks (which just happens to be the check for perfect fit) is failing due to an unexpected missing value. If you go back and try to run the offending part of `summary` manually, you will find that the problem lies in computing `(mean(f)^2 + var(f))`. This is because there are `NA` values in `f`, which apparently shouldn't be there, since this check is not equipped to handle them.

`f` in this case is a copy of `lm1$fitted.values`, which is calculated deep inside R. Note that there should, in fact, _never_ be `NA` values in the `fitted.values` element of an `lm` object. To see this, you can run:

ir <- head(iris); ir[1,1] = NA; l <- lm(ir$Sepal.Length~ir$Sepal.Width)

and see that `l$residuals` and `l$fitted.values` do not have `NA` values, but are simply shortened. Moreover, the `NA` values in `lm1$fitted.values` do not correspond to the `NA` values in the model matrix (which are due to the lags).

So I'm completely stumped as to what's going wrong.

The workaround is to manually compute fitted values and replace the line `f <- z$fitted.values` (in `summary.lm`) with `f <- fitted_values_you_computed_manually`.

#56Greg W(Wednesday, 23 July 2014 19:57)As an addendum to my last comment, this actually does have to do with how R (or at least my installation of R) handles dropping NA values. I'm a bit surprised this is an issue here, I've never had a problem with NA values in `lm` before, and I'm tempted to file a bug report on it.

In any case, this is a workaround.

lm1 = lm(arab.l[,1] ~ arab.l[,-1] + robu.l[,-1] + index(X), subset=8:arab))

lm2 = lm(robu.l[,1] ~ arab.l[,-1] + robu.l[,-1] + index(X), subset=8:length(robu))

Or something along the lines of

omit = 1:7

lm1 = lm(arab.l[-omit,1] ~ arab.l[-omit,-1] + robu.l[-omit,-1] + index(X)[-omit])

lm2 = lm(robu.l[-omit,1] ~ arab.l[-omit,-1] + robu.l[-omit,-1] + index(X)[-omit])

#57Greg W(Wednesday, 23 July 2014 21:30)And sorry to triple-comment, but the entire issue is avoided by simply accessing the `lm` objects _inside_ the VAR object rather than re-computing them.

`V.6$varresult` is actually just a list of `lm` objects that are effectively identical to lm1 and lm2. Then you can just do run:

wald.test(b=coef(V.6$varresult[[1]]), Sigma=vcov(V.6$varresult[[1]]), Terms= c(8:13),df=6)

Maybe a tiny bit less readable, but it a) avoids the strange problem with NAs and b) leads to shorter code with fewer distinct variables and (in my opinion) fewer chances to make mistakes. Note, however, that I changed the `Terms` argument because the order of the columns in `vcov(V.6$varresult[[1]])` is different, although it's arguably more intuitive.

#58Marsi(Thursday, 14 August 2014 19:29)hello

I have the case of the presence of three variables. To study the non-causality in the following equations be sufficient add the third variable, and the analysis is the same ?? and in the case when we have the presence of structural breaks in our series how you can proceed?

#59Jonathan(Thursday, 28 August 2014)Hello Chris and thank you for this useful post.

I am wondering if you could point to a paper that employs a straightforward implementation of the TY Granger, that does not focus on the method, per se, but more on the specific application. I am an ecologist, not an economist, so am trying to get a feel for what should be reported.

Second I was wondering if you had an opinion on setting a maximum value for the lag term? Or, should the data just speak for themselves (i.e. minimum AIC)

thanks so much.

Jonathan

#60ambooj(Saturday, 13 September 2014 12:23)I downloaded coffee_data.csv and then while running the code, I get the following error

> vcov(lm1)

Error in if (is.finite(resvar) && resvar < (mean(f)^2 + var(f)) * 1e-30) warning("essentially perfect fit: summary may be unreliable") :

missing value where TRUE/FALSE needed

I reinstalled the packages and loaded all the libraries but still getting the error. except I didnt reinstalled R.

#61ambooj(Monday, 15 September 2014)I downloaded coffee csv from files section and ran the above code but still getting the error as belowError in if (is.finite(resvar) && resvar < (mean(f)^2 + var(f)) * 1e-30) warning("essentially perfect fit: summary may be unreliable") :

>vcov(lm1)

missing value where TRUE/FALSE needed.

#62james(Monday, 22 September 2014 20:20)Hi Christopher,

What is the best way to build a predictive model once you have some indication of Granger causality? Would you use:

lm.predict1 <- lm(robu~arab.l[,2:7]+)

lm.predict2 <- lm(robu~arab.l[,2:7]+index(arab))

lm.predict3 <- lm(robu~robu.l[,2:7]+arab.l[,2:7]+index(arab))

or some other method?

Thanks,

James

#63Ashutosh Kumar(Saturday, 27 September 2014 17:15)Hi Christoph,

Error in if (is.finite(resvar) && resvar < (mean(f)^2 + var(f)) * 1e-30) warning("essentially perfect fit: summary may be unreliable") :

I installed all packages but I still get the error below

>vcov(lm1)

missing value where TRUE/FALSE needed

When I showed it to my senior he suggested me " the error message means the lhs and rhs of your regression model are pretty much equal with very little stochastic variation. Check your regression specification and data."

Kindly explain to correct this error.

best regards

Ashutosh Kumar

#64christophpfeiffer (

Sunday, 28 September 2014 18:11)Hi everyone,

I made some changes to the code which should take care of the NA error you were getting. Let me know if it works.

#65christophpfeiffer (

Sunday, 28 September 2014 18:13)Greg,

thanks for the input and sorry for my delay in responding. Using the existing VAR model is indeed more straightforward than the manual computation.

We only need to change the V.6 model to a V.7 model and use the correct terms which can be a bit tedious.

#66Ashutosh Kumar(Thursday, 02 October 2014 15:48)Hi Christopher,

Thanks a lot... Your new code is working alright...

#67lingaraj(Friday, 17 October 2014 22:16)Thanks a lot Sir for giving such codes to estimate TY causality . Thank You Very much.

#68Marwa Bouzidi(Monday, 20 October 2014 20:16)First I want to thank you for this code of TY Granger causality. I have a problem while trying to set the VAR model through VARselect, I get no output "Error in y - z$residuals " What should I do please? Thanks in advance

#69Christoph(Monday, 27 October 2014 16:31)Marwa,

are you copying & pasting the above code or are you working on your own dataset? If you work with your own code, please share it.

#70Marwa Bouzidi(Wednesday, 05 November 2014 12:06)Dear Christoph it has worked thank you!

But I want to know whether the adf.test and the kpss.test are including the trend and the drift and how can I test their significance!

thank you

#71renard(Friday, 07 November 2014 23:15)Thanks for this nice code example!

However, I have two questons:

1) Why do we separately check for serial correlation in the error terms. Could this point not be addressed using a Newy-West estimator for the covariance matrix in the WaldTest?

2) How would it be possible to test whether the cumulative impact is zero (i.e. the sum of the coefficients is zero).

Thanks in advance for your reply!

#72Christoph(Saturday, 07 March 2015 10:53)Renard,

apologies for my late reply.

1) I am sure there are other methods that work as well.

2) The Wald test is pretty close to the cumulative test. At least for the no-impact version. If all coefficients are zero, the cumulative impact is zero as well. For the other case it may be that effects cancel each other out. To assess this, you could have a look at the coefficients and see if they can possibly cancel each other out.

#73Basile(Tuesday, 02 June 2015 05:43)Thank you very much for sharing the code Christoph.

One quick question: why is "cof" and not "cof1" used in the UR tests?

#74Christoph(Saturday, 06 June 2015 17:34)Hi Basile,

you are right, cof1 probably makes more sense, since that is the data we are building the model for. The results are probably not affected, but still a good point.

#75Braulio Quintero (

Monday, 22 June 2015 22:35)Hi Christoph:

Thanks for the code in R! I appreciate it greatly. However, (1) could you share a code snippet on how to do the Wald Test with the VAR? You comment: " # The test can be directly done with the VAR model, however using the correct & # variables is a little more tricky" and (2) why is it that you do the Wald Test with one additional lag?

#76Braulio Quintero(Friday, 26 June 2015 22:52)I have three variables: GDP, CO2 emissions, and Energy Consumption for Puerto Rico. I have chosen the VAR model (it has 9 lags, don't know if having that many lags is affects my analysis), but since I have three variables I am a bit confused on how to run the Wald Test. Especially on how to choose the "Terms" argument for each test. I hope that you can help me with my question.

#77sam(Friday, 17 July 2015 17:42)what if the two time series that you have used namely " arabica" and "robusta " have different order of integration (d1 and d2 respectively )then which one should be used for the augmented VAR model ..

Thanks

#78sam(Friday, 17 July 2015 19:47)I went through your code and looked very promising , i really appreciate the amount of time that you are spending in replying to everybody's query its laudatory . I tried to run your code but is showing a problem ,

Error in if (is.finite(resvar) && resvar < (mean(f)^2 + var(f)) * 1e-30) warning("essentially perfect fit: summary may be unreliable") :

Error in if (is.finite(resvar) && resvar < (mean(f)^2 + var(f)) * 1e-30) warning("essentially perfect fit: summary may be unreliable") :

Error in if (is.finite(resvar) && resvar < (mean(f)^2 + var(f)) * 1e-30) warning("essentially perfect fit: summary may be unreliable") :

Error in if (is.finite(resvar) && resvar < (mean(f)^2 + var(f)) * 1e-30) warning("essentially perfect fit: summary may be unreliable") :

#Wald-test (H0: Robusta does not Granger-cause Arabica)

> vcov(lm1)

missing value where TRUE/FALSE needed

> wald.test(b=coef(lm1), Sigma=vcov(lm1), Terms= c(9:14),df=6)

missing value where TRUE/FALSE needed

> # Could not be rejected (X2=8.6; p=0.2)

>

> #Wald.test (H0: Arabica does not Granger-cause Robusta)

> vcov(lm2)

missing value where TRUE/FALSE needed

> wald.test(b=coef(lm2), Sigma=vcov(lm2), Terms= c(2:7),df=6)

missing value where TRUE/FALSE needed ....

I went through all the questions and answers posted up there and came across people who had faced similar kind of problem and you have provided the remedy in the form of changes in the code but I am still facing that problem ... I have installed R properly and have loaded all the required packages ...

One more question is regarding the values that you have taken

#####

lm1<-lm(arab~arab.l[,2:8]+robu.l[,2:8]+index(arab))

lm2<-lm(robu~arab.l[,2:8]+robu.l[,2:8]+index(arab))

Why you have taken arab[,2:8] and robu too ??

#Wald-test (H0: Robusta does not Granger-cause Arabica)

vcov(lm1)

wald.test(b=coef(lm1), Sigma=vcov(lm1), Terms= c(9:14),df=6)

In Terms=c(9:14) , what's the reason behind taking 9:14 value , would you elaborate it please

# Could not be rejected (X2=8.6; p=0.2)

######

Thank you

#79VARS Newby(Friday, 07 August 2015 15:35)Do you have script for var model using exogenous variables for scenario anlysis?

#80Christoph(Sunday, 09 August 2015 14:46)Hi Braulio,

the alternative code for the Wald test is:

#Wald-test (H0: Robusta does not Granger-cause Arabica)

vcov(lm1)

wald.test(b=coef(lm1), Sigma=vcov(lm1), Terms= c(9:14),df=6)

#Wald.test (H0: Arabica does not Granger-cause Robusta)

vcov(lm2)

wald.test(b=coef(lm2), Sigma=vcov(lm2), Terms= c(2:7),df=6)

#81Christoph(Sunday, 09 August 2015 14:47)Using the additional lag in the VAR model but not testing for it is the trick of the TY-method.

#82Christoph(Sunday, 09 August 2015 14:49)Hi Sam,

you always use the highest order of integration for the augmented var model.

#83Christoph(Sunday, 09 August 2015 14:54)Hi VARS Newby,

you can include exogenous variables directly in the definition of the VAR model.

for example: http://stackoverflow.com/questions/22513336/var-with-exogenous-variables

#84Christoph(Sunday, 09 August 2015 14:57)Sam,

I just tested the code. It seems to be working fine for me. Have you figured out a solution in the meanwhile? If not you can send me the code you are actually using and I will have a look.

#85Christoph(Sunday, 09 August 2015 14:59)Looks like we are getting close to a hundred comments. Seems about time to organize all comments into a FAQ to make them more accessible. If anyone is willing to help out with this, let me know!

#86Rodrigo(Wednesday, 19 August 2015 22:36)Christoph,

Why did you use the following term?

Terms=c(2,4,6,8,10,12), Terms= c(1,3,5,7,9,11)

#Wald-test (H0: Robusta does not Granger-cause Arabica)

wald.test(b=coef(V.7$varresult[[1]]), Sigma=vcov(V.7$varresult[[1]]), Terms=c(2,4,6,8,10,12))

# Could not be rejected (X2=8.6; p=0.2)

#Wald.test (H0: Arabica does not Granger-cause Robusta)

wald.test(b=coef(V.7$varresult[[2]]), Sigma=vcov(V.7$varresult[[2]]), Terms= c(1,3,5,7,9,11))

# Could be rejected at 10% (X2=12.3; p=0.056)

#87Rodrigo(Wednesday, 19 August 2015 22:38)Is there a way to calculate the terms range?

#88Chris(Wednesday, 16 September 2015 01:36)Hello Christoph,

could you explain what you mean by:

#Alternative stability analyis

plot(stability(V.6)) ## looks fine

Is a rather flat line the best of rather a high fluctuation?

#89Thiago.MAF(Wednesday, 09 December 2015 00:07)Hey,

Thanks a lot for this awesome article.

I have similar data. A bit bigger; several time series calculated from the polynomial fit of my experimental data.

Every time I try to run wald.test() it returns: "Error in L %*% V : non-conformable arguments"

Any suggestion where I can find info?

Best wishes,

thiago

#90Chris(Wednesday, 12 October 2016 21:23)Hi Chrostoph,

this is really great work! One question. What if

#Stability analysis

1/roots(V.6)[[1]] # ">1"

1/roots(V.6)[[2]] # ">1"

the test always turns our pairs where at least one of the values is <1. Still choosing the lag with the lowest distortion or giving up the analysis?

#91Christoph(Thursday, 13 October 2016 09:47)Chris,

I would also look at the stability analysis from

plot(stability(V.6))

This can show you when the results are stable and when they are not. You may then want to exclude certain periods from your analysis to account for that and redo the test after the adjustment.

#92Christoph(Thursday, 13 October 2016 09:52)Thiago,

I realize my response is quite delayed to say the least. Shoot me an email, if the problem persists.

#93Chris(Saturday, 15 October 2016 19:10)Dear Christoph,

thank you for your quick reply! Yes, I already did that and that's exactly how I would also approach it now. I concluded that I could solve the problem by excluding one very extreme outlier from the dataset.

One last question: This paper here http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.198.9108&rep=rep1&type=pdf mentions a bootstrap simulations for the MWALD test. Do you have any recommendation how to do this? That one didn't really help: http://stackoverflow.com/questions/32057573/wald-testing-bootstrapped-estimates-in-r

#94Manuel(Wednesday, 19 October 2016 07:37)Dear Christoph,

great work! Only one question: Why is there a constant+trend ("both") considered for the VAR model, but not for the Wald test?

#95Christoph(Wednesday, 26 October 2016 22:13)Thank you Manuel. I see what you are getting at.

The trend is implicitly included in the Wald test since that is the model that is used for the Wald test, however for the test we only want to focus on whether there is a causality between Robusta and Arabica. So only these variables are included.

#96Nick(Tuesday, 27 December 2016 22:30)Hello Christoph,

In case that I have more than 2 variables how I can check if x is caused by y and z.

I have tried to adj a bit your code from the aor library but it does not work.

If you have hints that would be helpful.

All the best

Nick

#97Jozefina(Wednesday, 04 January 2017 11:03)Hi Christoph,

Thank you very much for the amazing script! I would like to ask you that in the case of more than 1 structural break, let´s say 3 or 4 how should we proceed with the following process?

Thank you very much.

Jozefina

#98Christoph(Wednesday, 04 January 2017 12:12)Hi Jozefina,

I would recommend finding the structural breaks using Bai-Perron and then testing for each phase separately. You can find the Bai-Perron test in the strucchange package.

Thank you

Christoph