In this post I will explore how correlations between long- and short-term stock movements can be visualized using R and ggplot2. I will look at the current German composite index DAX, but any other set of stocks for which data are available would work just as well.
First, the data need to be downloaded. For simplicity, I'll download the necessary data twice: once for the long-term changes and once for the short-term changes. The data are first stored in a list, which greatly facilitates calculating returns when the time series differ in length, and then transformed into a data frame, since that is what ggplot2 requires. The market share relative to the composite index was downloaded from an external source and then imported manually. Finally, the data are visualized using the ggplot2 package, which allows mapping multiple variables (aesthetics) in a single plot. The y-axis and x-axis reflect the short-term and long-term change in the valuation of the stock, respectively, and the size of each bubble indicates the stock's market share within the composite index at current valuations. The color of a bubble indicates its long-term movement.
The benefit of this approach is that short-term and long-term changes in valuation can easily be compared. Only the two variables, t1 and t2, need to be adjusted to look at different timeframes.
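A minimal sketch of this workflow might look as follows. It is not the original script: the prices are simulated rather than downloaded, the tickers (`AAA`, `BBB`, `CCC`), the index weights, and the helper `pct_change` are hypothetical, and I assume that `t1` and `t2` are window lengths in trading days.

```r
library(ggplot2)

# Simulated price series of different lengths (NOT real DAX data)
set.seed(42)
sim_prices <- function(n) 100 * cumprod(1 + rnorm(n, mean = 0.0005, sd = 0.01))
prices <- list(AAA = sim_prices(300), BBB = sim_prices(250), CCC = sim_prices(120))

t1 <- 100  # long-term window (assumed to be in trading days)
t2 <- 5    # short-term window (assumed to be in trading days)

# Percentage change over the last `window` observations of one series
pct_change <- function(p, window) {
  n <- length(p)
  100 * (p[n] / p[n - window] - 1)
}

# Working on a list means each series can have its own length
returns <- data.frame(
  stock = names(prices),
  long  = sapply(prices, pct_change, window = t1),
  short = sapply(prices, pct_change, window = t2),
  share = c(0.5, 0.3, 0.2)  # hypothetical index weights
)

# x = long-term change, y = short-term change,
# bubble size = index weight, colour = long-term change
p <- ggplot(returns, aes(x = long, y = short, size = share, colour = long)) +
  geom_point(alpha = 0.7) +
  geom_text(aes(label = stock), size = 3, vjust = -1.5, colour = "black") +
  labs(x = "Long-term change (%)", y = "Short-term change (%)",
       size = "Index weight", colour = "Long-term change (%)")

print(p)
```

Changing `t1` and `t2` and re-running the block is enough to compare other timeframes.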
In this blog entry, I will use data from the Global Terrorism Database to briefly explore some aspects of how the composition of terrorist events has changed over time.
The overall number of terrorist attacks increased steadily from 1970 until the end of the Cold War. It then declined until the beginning of the new millennium, after which a sharp increase followed.
When it comes to causality tests, the typical Granger-causality test can be problematic. Testing for Granger-causality using F-statistics when one or both time series are non-stationary can lead to spurious causality (He & Maekawa, 1999).
More formal explanations can be found in the original Toda and Yamamoto (TY, 1995) paper or, for example, here.
In this post, I will show how Professor Giles' example can be implemented in R.
The procedure is based on the following steps:
1. Test for integration (structural breaks need to be taken into account). Determine the maximum order of integration (m). If none of the series is integrated, the usual Granger-causality test can be done.
2. Set up a VAR-model in the levels (do not difference the data).
3. Determine lag length. Let the lag length be p. The VAR model is thus VAR(p).
4. Carry out tests for misspecification, especially for residual serial correlation.
5. Add the maximum order of integration to the number of lags. This is the augmented VAR-model, VAR(p+m).
6. Carry out a Wald test on the coefficients of the first p lags only (not the m additional lags). The test statistic has p degrees of freedom.
You may also want to test for cointegration. If two series are cointegrated, there must be Granger causality in at least one direction. However, Toda and Yamamoto (1995) noted that one advantage of the TY method is that you don't have to test for cointegration, so a pretest bias can be avoided.
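To make steps 5 and 6 concrete, here is a minimal base-R sketch of the augmented regression and the Wald test. It makes several assumptions: the data are simulated random walks rather than the coffee prices, p = 2 and m = 1 are simply fixed instead of being determined by lag selection and unit-root tests (steps 1 to 4 are skipped), and `ty_wald` is a hypothetical helper name, not a function from any package.

```r
# Simulated I(1) series in which x leads y by one period (NOT the coffee data)
set.seed(1)
n <- 200
x <- cumsum(rnorm(n))                       # random walk
y <- c(0, 0.8 * x[-n]) + cumsum(rnorm(n))   # depends on lagged x plus a random walk

# Wald test in the spirit of Toda and Yamamoto: fit the `effect` equation of a
# VAR(p + m) in levels, then test only the first p lags of `cause`.
ty_wald <- function(cause, effect, p, m = 1) {
  k  <- p + m
  le <- embed(effect, k + 1)  # columns: effect_t, effect_{t-1}, ..., effect_{t-k}
  lc <- embed(cause,  k + 1)  # columns: cause_t,  cause_{t-1},  ..., cause_{t-k}
  own_lags   <- le[, -1, drop = FALSE]
  cause_lags <- lc[, -1, drop = FALSE]
  colnames(own_lags)   <- paste0("own_l", 1:k)
  colnames(cause_lags) <- paste0("cause_l", 1:k)
  d   <- data.frame(y_t = le[, 1], own_lags, cause_lags)
  fit <- lm(y_t ~ ., data = d)

  # Restrict the test to the first p lags; the extra m lags stay unrestricted
  idx <- paste0("cause_l", 1:p)
  b   <- coef(fit)[idx]
  V   <- vcov(fit)[idx, idx, drop = FALSE]
  W   <- as.numeric(crossprod(b, solve(V, b)))
  c(statistic = W, df = p, p.value = pchisq(W, df = p, lower.tail = FALSE))
}

res_xy <- ty_wald(cause = x, effect = y, p = 2, m = 1)  # H0: x does not cause y
res_yx <- ty_wald(cause = y, effect = x, p = 2, m = 1)  # H0: y does not cause x
print(res_xy)
print(res_yx)
```

A small p-value for `res_xy` would reject the null hypothesis that x does not Granger-cause y. With real data, p and m should come from steps 1 to 4 rather than being hard-coded.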
The example is about causality between the prices of Robusta and Arabica coffee. The Excel file can be downloaded here. To be loaded into R, however, the data should be saved in CSV format. The CSV file is available here.
Update: If you want to examine the data interactively, have a look here.
The script below tests for causality between these two time series. The script is annotated, but let me know if I can clarify anything or if there is room for improvement.
Empirical contributions that focus on terrorism and its reception by the media face the fundamental difficulty of quantifying media coverage of terrorism, which has led to a relatively thin body of empirical research on the subject:
“Most scholarly research on terrorism and the media is not empirical […]. This is understandable, for a few reasons. First of all, measuring media attention is not precise. One must compare and combine front page coverage, top stories of a newscast, general commentaries, stories which entangle multiple incidents […] and the like, in an effort to be precise about the quantity of media coverage. Second, measuring media coverage is often an exhausting task involving measuring column inches in a newspaper or timing television broadcasts.” (Scott, 2001, p. 222)
Using data generated by social networks (Lewis et al., 2008) or search engines has become increasingly popular in the social sciences. For example, the number of Google searches has been used to monitor the spread of influenza (Ginsberg et al., 2009) and to estimate public interest in science (Baram-Tsabari & Segev, 2009). Schietle (2012) shows that estimates of public sentiment based on the use of search engines closely correlate with the results from questionnaires. Besides being simple to obtain, data generated by the use of an internet search engine have the advantage of measuring public attention almost directly.
Data on Google searches for ‘terrorism’ are highly correlated with the number of articles in the NYT that contain the word ‘terrorism’.