\documentclass[]{book}
\usepackage{lmodern}
\usepackage{amssymb,amsmath}
\usepackage{ifxetex,ifluatex}
\usepackage{fixltx2e} % provides \textsubscript
\ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\else % if luatex or xelatex
\ifxetex
\usepackage{mathspec}
\else
\usepackage{fontspec}
\fi
\defaultfontfeatures{Ligatures=TeX,Scale=MatchLowercase}
\fi
% use upquote if available, for straight quotes in verbatim environments
\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
% use microtype if available
\IfFileExists{microtype.sty}{%
\usepackage{microtype}
\UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts
}{}
\usepackage[margin=1in]{geometry}
\usepackage{hyperref}
\hypersetup{unicode=true,
pdftitle={Applied Time Series Analysis with R},
pdfauthor={Stéphane Guerrier, Roberto Molinari, Haotian Xu and Yuming Zhang},
pdfborder={0 0 0},
breaklinks=true}
\urlstyle{same} % don't use monospace font for urls
\usepackage{natbib}
\bibliographystyle{acm}
\usepackage{color}
\usepackage{fancyvrb}
\newcommand{\VerbBar}{|}
\newcommand{\VERB}{\Verb[commandchars=\\\{\}]}
\DefineVerbatimEnvironment{Highlighting}{Verbatim}{commandchars=\\\{\}}
% Add ',fontsize=\small' for more characters per line
\usepackage{framed}
\definecolor{shadecolor}{RGB}{248,248,248}
\newenvironment{Shaded}{\begin{snugshade}}{\end{snugshade}}
\newcommand{\KeywordTok}[1]{\textcolor[rgb]{0.13,0.29,0.53}{\textbf{#1}}}
\newcommand{\DataTypeTok}[1]{\textcolor[rgb]{0.13,0.29,0.53}{#1}}
\newcommand{\DecValTok}[1]{\textcolor[rgb]{0.00,0.00,0.81}{#1}}
\newcommand{\BaseNTok}[1]{\textcolor[rgb]{0.00,0.00,0.81}{#1}}
\newcommand{\FloatTok}[1]{\textcolor[rgb]{0.00,0.00,0.81}{#1}}
\newcommand{\ConstantTok}[1]{\textcolor[rgb]{0.00,0.00,0.00}{#1}}
\newcommand{\CharTok}[1]{\textcolor[rgb]{0.31,0.60,0.02}{#1}}
\newcommand{\SpecialCharTok}[1]{\textcolor[rgb]{0.00,0.00,0.00}{#1}}
\newcommand{\StringTok}[1]{\textcolor[rgb]{0.31,0.60,0.02}{#1}}
\newcommand{\VerbatimStringTok}[1]{\textcolor[rgb]{0.31,0.60,0.02}{#1}}
\newcommand{\SpecialStringTok}[1]{\textcolor[rgb]{0.31,0.60,0.02}{#1}}
\newcommand{\ImportTok}[1]{#1}
\newcommand{\CommentTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textit{#1}}}
\newcommand{\DocumentationTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{#1}}}}
\newcommand{\AnnotationTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{#1}}}}
\newcommand{\CommentVarTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{#1}}}}
\newcommand{\OtherTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{#1}}
\newcommand{\FunctionTok}[1]{\textcolor[rgb]{0.00,0.00,0.00}{#1}}
\newcommand{\VariableTok}[1]{\textcolor[rgb]{0.00,0.00,0.00}{#1}}
\newcommand{\ControlFlowTok}[1]{\textcolor[rgb]{0.13,0.29,0.53}{\textbf{#1}}}
\newcommand{\OperatorTok}[1]{\textcolor[rgb]{0.81,0.36,0.00}{\textbf{#1}}}
\newcommand{\BuiltInTok}[1]{#1}
\newcommand{\ExtensionTok}[1]{#1}
\newcommand{\PreprocessorTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textit{#1}}}
\newcommand{\AttributeTok}[1]{\textcolor[rgb]{0.77,0.63,0.00}{#1}}
\newcommand{\RegionMarkerTok}[1]{#1}
\newcommand{\InformationTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{#1}}}}
\newcommand{\WarningTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{#1}}}}
\newcommand{\AlertTok}[1]{\textcolor[rgb]{0.94,0.16,0.16}{#1}}
\newcommand{\ErrorTok}[1]{\textcolor[rgb]{0.64,0.00,0.00}{\textbf{#1}}}
\newcommand{\NormalTok}[1]{#1}
\usepackage{longtable,booktabs}
\usepackage{graphicx,grffile}
\makeatletter
\def\maxwidth{\ifdim\Gin@nat@width>\linewidth\linewidth\else\Gin@nat@width\fi}
\def\maxheight{\ifdim\Gin@nat@height>\textheight\textheight\else\Gin@nat@height\fi}
\makeatother
% Scale images if necessary, so that they will not overflow the page
% margins by default, and it is still possible to overwrite the defaults
% using explicit options in \includegraphics[width, height, ...]{}
\setkeys{Gin}{width=\maxwidth,height=\maxheight,keepaspectratio}
\IfFileExists{parskip.sty}{%
\usepackage{parskip}
}{% else
\setlength{\parindent}{0pt}
\setlength{\parskip}{6pt plus 2pt minus 1pt}
}
\setlength{\emergencystretch}{3em} % prevent overfull lines
\providecommand{\tightlist}{%
\setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
\setcounter{secnumdepth}{5}
% Redefines (sub)paragraphs to behave more like sections
\ifx\paragraph\undefined\else
\let\oldparagraph\paragraph
\renewcommand{\paragraph}[1]{\oldparagraph{#1}\mbox{}}
\fi
\ifx\subparagraph\undefined\else
\let\oldsubparagraph\subparagraph
\renewcommand{\subparagraph}[1]{\oldsubparagraph{#1}\mbox{}}
\fi
%%% Use protect on footnotes to avoid problems with footnotes in titles
\let\rmarkdownfootnote\footnote%
\def\footnote{\protect\rmarkdownfootnote}
%%% Change title format to be more compact
\usepackage{titling}
% Create subtitle command for use in maketitle
\providecommand{\subtitle}[1]{
\posttitle{
\begin{center}\large#1\end{center}
}
}
\setlength{\droptitle}{-2em}
\title{Applied Time Series Analysis with R}
\pretitle{\vspace{\droptitle}\centering\huge}
\posttitle{\par}
\author{Stéphane Guerrier, Roberto Molinari, Haotian Xu and Yuming Zhang}
\preauthor{\centering\large\emph}
\postauthor{\par}
\predate{\centering\large\emph}
\postdate{\par}
\date{August 21 2019}
\usepackage{booktabs}
\usepackage{amsthm}
\makeatletter
\def\thm@space@setup{%
\thm@preskip=8pt plus 2pt minus 4pt
\thm@postskip=\thm@preskip
}
\makeatother
\usepackage{amsthm}
\newtheorem{theorem}{Theorem}[chapter]
\newtheorem{lemma}{Lemma}[chapter]
\theoremstyle{definition}
\newtheorem{definition}{Definition}[chapter]
\newtheorem{corollary}{Corollary}[chapter]
\newtheorem{proposition}{Proposition}[chapter]
\theoremstyle{definition}
\newtheorem{example}{Example}[chapter]
\theoremstyle{definition}
\newtheorem{exercise}{Remark}[chapter]
\theoremstyle{remark}
\newtheorem*{remark}{Remark}
\newtheorem*{solution}{Solution}
\let\BeginKnitrBlock\begin \let\EndKnitrBlock\end
\begin{document}
\maketitle
{
\setcounter{tocdepth}{1}
\tableofcontents
}
\part{Foundation}\label{part-foundation}
\chapter{Introduction}\label{introduction}
Welcome to ``Applied Time Series Analysis with \texttt{R}''. This book
is intended as a support for the course of STAT 463 (Applied Time Series
Analysis) given at Penn State University. It contains an overview of the
basic procedures to adequately approach a time series analysis, with
insight into more advanced analysis of time series. It first introduces
the basic concepts and theory needed to appropriately use the applied tools
that are presented in the second (and main) part of the book. In the
latter part the reader will learn how to use descriptive analysis to
identify the important characteristics of a time series and then employ
modelling and inference techniques (made available through \texttt{R}
functions) that allow one to describe a time series and make predictions. The
last part of the book gives introductory notions on more advanced
analysis of time series, through which the reader will gain a basic
understanding of the tools available to analyse more complex
characteristics of time series.
\BeginKnitrBlock{rmdimportant}
This document is \textbf{under development} and it is therefore
preferable to always access the text online to be sure you are using the
most up-to-date version. Due to its current development, you may
encounter errors ranging from broken code to typos or poorly explained
topics. If you do, please let us know! Simply add an issue to the GitHub
repository used for this document (which can be accessed here
\url{https://github.com/SMAC-Group/ts/issues}) and we will make the
changes as soon as possible. In addition, if you know RMarkdown and are
familiar with GitHub, make a pull request and fix an issue yourself.
\EndKnitrBlock{rmdimportant}
\section{Conventions}\label{conventions}
Throughout this book, \texttt{R} code will be typeset using a
\texttt{monospace} font which is syntax highlighted. For example:
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{a =}\StringTok{ }\NormalTok{pi}
\NormalTok{b =}\StringTok{ }\FloatTok{0.5}
\KeywordTok{sin}\NormalTok{(a}\OperatorTok{*}\NormalTok{b)}
\end{Highlighting}
\end{Shaded}
Similarly, \texttt{R} output lines (which usually appear in your Console)
will begin with \texttt{\#\#} and will not be syntax highlighted. The
output of the above example is the following:
\begin{verbatim}
## [1] 1
\end{verbatim}
Aside from \texttt{R} code and its outputs, this book will also insert
some boxes that draw the reader's attention to important, curious
or otherwise informative details. An example of these boxes was seen at
the beginning of this introduction, where an important aspect was pointed
out to the reader regarding the ``under construction'' nature of this
book. The following boxes are used to convey different types of
information:
\BeginKnitrBlock{rmdimportant}
This is an important piece of information.
\EndKnitrBlock{rmdimportant}
\BeginKnitrBlock{rmdnote}
This is some additional information that could be useful to the reader.
\EndKnitrBlock{rmdnote}
\BeginKnitrBlock{rmdcaution}
This is something that the reader should pay attention to but that should
not create major problems if not considered.
\EndKnitrBlock{rmdcaution}
\BeginKnitrBlock{rmdwarning}
This is a warning which should be heeded by the reader to avoid problems
of various kinds.
\EndKnitrBlock{rmdwarning}
\BeginKnitrBlock{rmdtip}
This is a tip for the reader when following or developing something
based on this book.
\EndKnitrBlock{rmdtip}
Using the same convention as in \citet{friedman2001elements}, the symbol
😱 indicates a technically difficult section which may be skipped without
interrupting the flow of the discussion.
\section{Bibliographic Note}\label{bibliographic-note}
This is not the first (or the last) book that has been written on time
series analysis. Indeed, this book brings together and reorganizes
information and material from other sources, structuring and tailoring
it to a course in basic time series analysis. The main (and excellent)
references, which are far from being an exhaustive review of the
literature, that can be used to gain a more in-depth view of the different
aspects treated in this book are \citet{cochrane2005time},
\citet{hamilton1994time} and \citet{shumway2010time}.
\section{Acknowledgements}\label{acknowledgements}
The text has benefited greatly from the contributions of many people who
have provided extremely useful comments, suggestions and corrections.
These are:
\begin{itemize}
\tightlist
\item
\href{https://github.com/zionward}{Ziying Wang}
\item
\href{https://github.com/Lyle-Haoxian}{Haoxian Zhong}
\item
\href{https://www.linkedin.com/in/zhihan-xiong-988152114}{Zhihan
Xiong}
\item
\href{https://github.com/Nathanael-Claussen}{Nathanael Claussen}
\item
\href{https://github.com/munsheet}{Justin Lee}
\end{itemize}
The authors are particularly grateful to James Balamuta who introduced
them to the use of the different tools provided by the RStudio
environment and greatly contributed to an earlier version of this book:
\begin{itemize}
\tightlist
\item
\href{https://github.com/coatless}{James Balamuta}
\end{itemize}
\section{License}\label{license}
You can redistribute and/or modify this book under the terms of the
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International
License (CC BY-NC-SA 4.0).
\chapter{Basic Elements of Time Series}\label{introtimeseries}
\begin{quote}
``\emph{Prévoir consiste à projeter dans l'avenir ce qu'on a perçu dans
le passé.}'' -- Henri Bergson
\end{quote}
\BeginKnitrBlock{rmdimportant}
To make use of the R code within this chapter you will need to install
(if not already done) and load the following libraries:
\begin{itemize}
\tightlist
\item
\href{http://simts.smac-group.com/}{simts};
\item
\href{https://cran.r-project.org/web/packages/astsa/index.html}{astsa};
\item
\href{https://cran.r-project.org/web/packages/mgcv/index.html}{mgcv}.
\end{itemize}
These libraries can be installed as follows:
\EndKnitrBlock{rmdimportant}
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{install.packages}\NormalTok{(}\KeywordTok{c}\NormalTok{(}\StringTok{"devtools"}\NormalTok{, }\StringTok{"astsa"}\NormalTok{, }\StringTok{"mgcv"}\NormalTok{))}
\NormalTok{devtools}\OperatorTok{::}\KeywordTok{install_github}\NormalTok{(}\StringTok{"SMAC-Group/simts"}\NormalTok{)}
\end{Highlighting}
\end{Shaded}
and simply load them using:
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{library}\NormalTok{(astsa)}
\KeywordTok{library}\NormalTok{(mgcv)}
\KeywordTok{library}\NormalTok{(simts)}
\end{Highlighting}
\end{Shaded}
We can start the discussion on the basic elements of time series by
using a practical example from real data made available through the
\texttt{R} software. The data represent the global mean land--ocean
temperature shifts from 1880 to 2015 (with base index being the average
temperatures from 1951 to 1980) and this time series is represented in
the plot below.
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{# Load data}
\KeywordTok{data}\NormalTok{(globtemp, }\DataTypeTok{package =} \StringTok{"astsa"}\NormalTok{)}
\CommentTok{# Construct gts object}
\NormalTok{globtemp =}\StringTok{ }\KeywordTok{gts}\NormalTok{(globtemp, }\DataTypeTok{start =} \DecValTok{1880}\NormalTok{, }\DataTypeTok{freq =} \DecValTok{1}\NormalTok{, }\DataTypeTok{unit_ts =} \StringTok{"C"}\NormalTok{, }\DataTypeTok{name_ts =} \StringTok{"Global Temperature Deviations"}\NormalTok{, }\DataTypeTok{data_name =} \StringTok{"Evolution of Global Temperatures"}\NormalTok{)}
\CommentTok{# Plot time series}
\KeywordTok{plot}\NormalTok{(globtemp)}
\end{Highlighting}
\end{Shaded}
\begin{center}\includegraphics{ts_files/figure-latex/glotempExample-1} \end{center}
These data have been used as support for the argument that
global temperatures are increasing and that global warming has
occurred over the last half of the twentieth century. The first approach
that one would take is to try to measure the average increase by
fitting a model of the form:
\[
X_t = f(t) + \varepsilon_t,
\] where \(X_t\) denotes the global temperatures deviation and
\(f(\cdot)\) is a ``smooth'' function such that
\(\mathbb{E}[X_t] - f(t) = 0\) for all \(t\). In general,
\(\varepsilon_t\) is assumed to follow a normal distribution for
simplicity. The goal in this context would therefore be to evaluate if
\(f(t)\) (or a suitable estimator of this function) is an increasing
function (especially over the last decades). In order to do so, we would
require the residuals from the fitted model to be independently and
identically distributed (iid). Let us fit a (nonparametric) model with
the years (time) as explanatory variable using the code below:
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{time =}\StringTok{ }\KeywordTok{gts_time}\NormalTok{(globtemp)}
\NormalTok{fit =}\StringTok{ }\KeywordTok{gam}\NormalTok{(globtemp }\OperatorTok{~}\StringTok{ }\KeywordTok{s}\NormalTok{(time))}
\end{Highlighting}
\end{Shaded}
and check the residuals from this model using:
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{check}\NormalTok{(fit, }\DataTypeTok{simple =} \OtherTok{TRUE}\NormalTok{)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## Warning in if (class(model) == "fitsimts") {: the condition has length > 1
## and only the first element will be used
\end{verbatim}
\begin{verbatim}
## Warning in check(fit, simple = TRUE): If 'lm' model is considered, only the
## full diagnostic plots can be provided, not the simple version.
\end{verbatim}
\begin{center}\includegraphics{ts_files/figure-latex/gamresid-1} \end{center}
It can be seen from the upper left plot that the trend appears to be
removed and, if looking at the residuals as one would usually do in a
regression framework, the residual plots seem to suggest that the
modelling has done a relatively good job since no particular pattern
seems to emerge and their distribution is quite close to being Gaussian.
However, is it possible to conclude from the plots that the data are
\emph{iid} (i.e.~independent and identically distributed)? More
specifically, can we assume that the residuals are independent? This is
a fundamental question in order for inference procedures to be carried
out in an appropriate manner and to limit false conclusions. Let us
illustrate this with a simulated data set in which we know that there
is an upward trend over time, and where our goal is to show that this
trend exists. To do so we consider a model where
\(f(t)\) has a simple parametric form, i.e. \(f(t) = \beta \cdot t\), and
we employ the following data generating process:
\[X_t = \beta \cdot t + Y_t,\] where
\[Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \varepsilon_t,\] and where
\(\varepsilon_t \sim \mathcal{N}(0, \sigma^2)\). Intuitively, \(Y_t\) is
not an \emph{iid} sequence of random variables except in the case where
\(\phi_1 = \phi_2 = 0\). In the following chapters we shall see that
this intuition is correct and that this model is known as an AR(2)
model. Considering this, we simulate two cases where, in the first, the
residuals are actually \emph{iid} Gaussian while, in the second, the
residuals are Gaussian but are dependent over time. In the first case,
the only parameters that explain \(X_t\) are \(\beta = 5 \cdot 10^{-3}\)
and \(\sigma^2 = 1\) since the residuals \(Y_t\) are \emph{iid} (i.e.
\(\phi_1 = \phi_2 = 0\)). In the second case however, aside from the
mentioned parameters we also have \(\phi_1 = 0.95\) and
\(\phi_2 = -0.5\) (the values used in the code below). In both cases, we perform the hypothesis test:
\[
\begin{aligned}
\text{H}_0:& \;\;\; \beta = 0\\
\text{H}_1:& \;\;\; \beta > 0
\end{aligned}
\] as our aim is to show, similarly to the global temperature
deviation example, that \(f(t)\) is an increasing function. Our synthetic
data are simulated as follows:
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{# Set seed for reproducibility}
\KeywordTok{set.seed}\NormalTok{(}\DecValTok{9}\NormalTok{)}
\CommentTok{# Define sample size}
\NormalTok{n =}\StringTok{ }\DecValTok{100}
\CommentTok{# Define beta}
\NormalTok{beta =}\StringTok{ }\FloatTok{0.005}
\CommentTok{# Define sigma2}
\NormalTok{sigma2 =}\StringTok{ }\DecValTok{1}
\CommentTok{# Simulation of Yt}
\NormalTok{Yt_case1 =}\StringTok{ }\KeywordTok{gen_gts}\NormalTok{(}\KeywordTok{WN}\NormalTok{(}\DataTypeTok{sigma2 =}\NormalTok{ sigma2), }\DataTypeTok{n =}\NormalTok{ n)}
\NormalTok{Yt_case2 =}\StringTok{ }\KeywordTok{gen_gts}\NormalTok{(}\KeywordTok{AR}\NormalTok{(}\DataTypeTok{phi =} \KeywordTok{c}\NormalTok{(}\FloatTok{0.95}\NormalTok{, }\OperatorTok{-}\FloatTok{0.5}\NormalTok{), }\DataTypeTok{sigma2 =}\NormalTok{ sigma2), }\DataTypeTok{n =}\NormalTok{ n)}
\CommentTok{# Define explanatory variable (time)}
\NormalTok{time =}\StringTok{ }\DecValTok{1}\OperatorTok{:}\NormalTok{n}
\CommentTok{# Simulation of Xt}
\NormalTok{Xt_case1 =}\StringTok{ }\NormalTok{beta}\OperatorTok{*}\NormalTok{time }\OperatorTok{+}\StringTok{ }\NormalTok{Yt_case1}
\NormalTok{Xt_case2 =}\StringTok{ }\NormalTok{beta}\OperatorTok{*}\NormalTok{time }\OperatorTok{+}\StringTok{ }\NormalTok{Yt_case2}
\CommentTok{# Fit linear models}
\NormalTok{model1 <-}\StringTok{ }\KeywordTok{lm}\NormalTok{(Xt_case1 }\OperatorTok{~}\StringTok{ }\NormalTok{time }\OperatorTok{+}\StringTok{ }\DecValTok{0}\NormalTok{)}
\NormalTok{model2 <-}\StringTok{ }\KeywordTok{lm}\NormalTok{(Xt_case2 }\OperatorTok{~}\StringTok{ }\NormalTok{time }\OperatorTok{+}\StringTok{ }\DecValTok{0}\NormalTok{)}
\end{Highlighting}
\end{Shaded}
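The call to \texttt{gen\_gts()} above hides the recursion that defines the
AR(2) process. As an illustration only (this is not the \texttt{simts}
implementation), the recursion
\(Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \varepsilon_t\) can be iterated
directly in base \texttt{R}. The helper \texttt{simulate\_ar2()}, its
burn-in length and its zero start-up values are our own assumptions; the
burn-in is discarded so that the arbitrary start-up values have a
negligible effect on the returned series.

\begin{verbatim}
```r
# Hypothetical helper (not part of simts): simulate an AR(2) process
#   Y_t = phi[1] * Y_{t-1} + phi[2] * Y_{t-2} + eps_t,  eps_t ~ N(0, sigma2)
# by direct recursion, discarding a burn-in period so that the arbitrary
# start-up values Y_1 = Y_2 = 0 have (approximately) no influence.
simulate_ar2 <- function(n, phi, sigma2, burn_in = 200) {
  stopifnot(length(phi) == 2, n + burn_in >= 3)
  total <- n + burn_in
  eps <- rnorm(total, mean = 0, sd = sqrt(sigma2))
  y <- numeric(total)                 # y[1] = y[2] = 0 are start-up values
  for (t in 3:total) {
    y[t] <- phi[1] * y[t - 1] + phi[2] * y[t - 2] + eps[t]
  }
  y[(burn_in + 1):total]              # keep only the last n values
}

# Usage with the parameters of the second case
# (the random draws will not match those produced by gen_gts())
set.seed(1)
n    <- 100
beta <- 0.005
time <- 1:n
Yt_sketch <- simulate_ar2(n, phi = c(0.95, -0.5), sigma2 = 1)
Xt_sketch <- beta * time + Yt_sketch
```
\end{verbatim}

With \(\phi_1 = \phi_2 = 0\) the helper simply returns \emph{iid}
Gaussian draws, which corresponds to the first case above.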
The ``summary'' of our model on the first dataset is given by
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{summary}\NormalTok{(model1)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
##
## Call:
## lm(formula = Xt_case1 ~ time + 0)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.5985 -0.7023 -0.1398 0.4444 2.7098
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## time 0.003930 0.001647 2.386 0.019 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9583 on 99 degrees of freedom
## Multiple R-squared: 0.05436, Adjusted R-squared: 0.04481
## F-statistic: 5.691 on 1 and 99 DF, p-value: 0.01895
\end{verbatim}
As can be seen, in the first case the estimated slope (\(\approx\)
0.004) is close to the true slope (0.005) and is significant (i.e.~the
p-value is smaller than the common rejection level 0.05). Indeed, since
\texttt{summary()} reports two-sided p-values and the estimated slope is
positive, the p-value of the one-sided test above is half of the reported
value, i.e.~0.019/2 \(\approx\) 0.0095. Hence, from this
inference procedure we can conclude at the 5\% significance level that
the slope is significantly larger than zero and is roughly equal to
0.004 (which is relatively close to the truth). However, let us perform
the same analysis when the residuals are not independent (the second
case) by examining its ``summary'':
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{summary}\NormalTok{(model2)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
##
## Call:
## lm(formula = Xt_case2 ~ time + 0)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.6916 -1.1184 0.2323 1.1253 2.6198
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## time 0.0009877 0.0026435 0.374 0.709
##
## Residual standard error: 1.538 on 99 degrees of freedom
## Multiple R-squared: 0.001408, Adjusted R-squared: -0.008679
## F-statistic: 0.1396 on 1 and 99 DF, p-value: 0.7095
\end{verbatim}
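Since \texttt{summary()} reports two-sided p-values, the one-sided
p-values for the alternative \(\beta > 0\) can be recovered from the
printed \(t\) statistics and the 99 residual degrees of freedom. A quick
base-\texttt{R} check, using the rounded \(t\) values shown in the two
summaries:

\begin{verbatim}
```r
# summary() reports two-sided p-values, P(|T| > |t|). For H1: beta > 0 the
# one-sided p-value is P(T > t), with T ~ Student t on 99 degrees of freedom.
# The t values below are the (rounded) ones printed by summary() above.
t_case1 <- 2.386
t_case2 <- 0.374
df_res  <- 99
p_one_sided <- pt(c(case1 = t_case1, case2 = t_case2),
                  df = df_res, lower.tail = FALSE)
print(round(p_one_sided, 4))
```
\end{verbatim}

Because both \(t\) statistics are positive, each one-sided p-value is
exactly half of the corresponding two-sided one.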
In this case we can observe that the p-value of the above mentioned test
is given by 0.3547 (again half of the two-sided p-value reported by
\texttt{summary()}) and is therefore greater than the conventional level of
0.05. Consequently, we don't have evidence to conclude that the slope
coefficient is larger than zero (i.e.~we fail to reject H\(_0\)),
even though the true slope is positive. Therefore, inference
procedures can be misleading when they do not take into account other
significant variables or, as in this case, forms of dependence that can
hide true underlying effects. This is only one example, and there
are cases where, despite dependence in the residuals, the
estimated slope would be deemed significant even when this
dependence structure is ignored. However, if we repeated this
experiment over a large number of simulated samples, we would
probably see that we fail to reject the null hypothesis much more
frequently when dependence is present but ignored.
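The claim about repeating the experiment can be checked with a small
Monte Carlo simulation. The sketch below is not the authors' code: it uses
base \texttt{R} (\texttt{stats::arima.sim()} in place of
\texttt{gen\_gts()}), keeps \(n = 100\), \(\beta = 0.005\),
\(\sigma^2 = 1\) and \((\phi_1, \phi_2) = (0.95, -0.5)\) from the example,
and uses an arbitrary number of replications \(B = 1000\). For each
simulated sample it fits the same no-intercept regression and records
whether the one-sided test rejects H\(_0\) at the 5\% level.

\begin{verbatim}
```r
# Monte Carlo sketch (base R only): how often does the one-sided t-test on
# the slope reject H0: beta = 0 at the 5% level when the errors are iid
# versus AR(2)? stats::arima.sim() stands in for simts::gen_gts(), and
# B = 1000 replications is an arbitrary choice.
set.seed(123)
n      <- 100
beta   <- 0.005
sigma2 <- 1
phi    <- c(0.95, -0.5)
B      <- 1000
time   <- 1:n

# One-sided p-value for H1: slope > 0 in the no-intercept model x ~ time + 0
one_sided_pvalue <- function(x, time) {
  fit  <- lm(x ~ time + 0)
  tval <- summary(fit)$coefficients["time", "t value"]
  pt(tval, df = fit$df.residual, lower.tail = FALSE)
}

reject <- matrix(FALSE, nrow = B, ncol = 2,
                 dimnames = list(NULL, c("iid", "AR2")))
for (b in seq_len(B)) {
  y_iid <- rnorm(n, sd = sqrt(sigma2))
  y_ar2 <- as.numeric(arima.sim(model = list(ar = phi), n = n,
                                sd = sqrt(sigma2)))
  reject[b, "iid"] <- one_sided_pvalue(beta * time + y_iid, time) < 0.05
  reject[b, "AR2"] <- one_sided_pvalue(beta * time + y_ar2, time) < 0.05
}

# Proportion of samples in which H0 is rejected (the slope is detected)
rejection_rate <- colMeans(reject)
print(rejection_rate)
```
\end{verbatim}

Under these assumptions, the rejection frequency obtained with AR(2)
errors should come out noticeably lower than with \emph{iid} errors, even
though the true slope is the same in both cases.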
These examples therefore highlight how the approach to analysing time
series does not rely only on finding an appropriate model that describes
the evolution of a variable as a function of time (which is
deterministic). Indeed, one of the main focuses of time series analysis
is modelling the dependence structure that describes how random
variables impact each other as a function of time. In other words, a
time series is a collection of random variables whose interaction and
dependence structure is indexed by time. Based on this structure, one of
the main goals of time series analysis is to correctly estimate the
dependence mechanism and consequently deliver forecasts that are as
accurate as possible, considering the deterministic functions of time
(and other variables) as well as the random dependence structure.
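One informal way to see this dependence in the simulated example is to
compute the lag-1 sample autocorrelation of the regression residuals. The
sketch below (base \texttt{R}, with \texttt{arima.sim()} standing in for
\texttt{gen\_gts()} and an arbitrary seed) compares residuals obtained
with \emph{iid} and with AR(2) errors; the sample autocorrelation is only
used informally here.

\begin{verbatim}
```r
# Lag-1 sample autocorrelation of regression residuals: iid vs AR(2) errors.
# Base R only; arima.sim() stands in for gen_gts(), and the seed is arbitrary.
set.seed(42)
n    <- 100
beta <- 0.005
time <- 1:n
x_iid <- beta * time + rnorm(n)                                   # case 1
x_ar2 <- beta * time +
  as.numeric(arima.sim(model = list(ar = c(0.95, -0.5)), n = n))  # case 2
res_iid <- residuals(lm(x_iid ~ time + 0))
res_ar2 <- residuals(lm(x_ar2 ~ time + 0))
lag1 <- c(iid = acf(res_iid, lag.max = 1, plot = FALSE)$acf[2],
          AR2 = acf(res_ar2, lag.max = 1, plot = FALSE)$acf[2])
print(round(lag1, 3))
```
\end{verbatim}

For the AR(2) errors the theoretical lag-1 autocorrelation is
\(\phi_1/(1-\phi_2) = 0.95/1.5 \approx 0.63\), so the residuals of the
second case should show a clearly positive value, while those of the
first case should be close to zero.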
\section{The Wold Decomposition}\label{the-wold-decomposition}
The previous discussion highlighted how a time series can be decomposed
into a deterministic component and a random component. Leaving aside
technical rigour, this characteristic of time series was put forward by
Wold's Decomposition Theorem, which states that a time series \((Y_t)\)
(where \(t = 1,...,n\) represents the time index) can be very
generically represented as follows:
\[Y_t = D_t + W_t,\]
where \(D_t\) represents the deterministic part (or \emph{signal}) that
can be modelled through the standard modelling techniques (e.g.~linear
regression) and \(W_t\) that, restricting ourselves to a general class
of processes, represents the random part (\emph{noise}) that requires
the analytical and modelling approaches that will be tackled in this
book.
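As a minimal sketch of this decomposition (not taken from the text: the
linear signal \(D_t = 2 + 0.05\,t\) and the AR(1) noise with coefficient
0.7 are hypothetical choices), one can simulate \(Y_t = D_t + W_t\) in
base \texttt{R}, estimate the signal by linear regression and treat the
residuals as an estimate of the noise:

\begin{verbatim}
```r
# Minimal sketch of Y_t = D_t + W_t (base R only). The linear signal and the
# AR(1) coefficient 0.7 are hypothetical choices made for illustration.
set.seed(123)
n          <- 200
time_index <- 1:n
D_t <- 2 + 0.05 * time_index                                 # deterministic signal
W_t <- as.numeric(arima.sim(model = list(ar = 0.7), n = n))  # zero-mean noise
Y_t <- D_t + W_t                                             # observed series

# Estimate the signal by linear regression; the residuals estimate the noise
fit_signal <- lm(Y_t ~ time_index)
D_hat <- fitted(fit_signal)
W_hat <- residuals(fit_signal)

coef(fit_signal)   # estimated intercept and slope of D_t
mean(W_hat)        # numerically zero: OLS residuals with an intercept average to zero
```
\end{verbatim}

The estimated noise \(\hat{W}_t\) is still autocorrelated here, since
\(W_t\) is an AR(1) process: regression only removes the signal, while
the dependence structure of \(W_t\) calls for the approaches discussed in
this book.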
Typically, we have \(\mathbb{E}[Y_t] \neq 0\) while
\(\mathbb{E}[W_t] = 0\) (although we may have
\(\mathbb{E}[W_t | W_{t-1}, ..., W_1] \neq 0\)). Such models impose some
parametric structure which represents a convenient and flexible way of
studying time series as well as a means to evaluate \emph{future} values
of the series through forecasting. As we will see, predicting future
values is one of the main aspects of time series analysis. However,
making predictions is often a daunting task or, as famously stated by
Niels Bohr:
\begin{quote}
``\emph{Prediction is very difficult, especially about the future.}''
\end{quote}
There are plenty of examples of predictions that turned out to be
completely erroneous. For example, three days before the 1929 crash,
Irving Fisher, Professor of Economics at Yale University, famously
predicted:
\begin{quote}
``\emph{Stock prices have reached what looks like a permanently high
plateau}''.
\end{quote}
Another example is given by Thomas Watson, president of IBM, who said in
1943:
\begin{quote}
``\emph{I think there is a world market for maybe five computers.}''
\end{quote}
Let us now briefly discuss the two components of a time series.
\subsection{The Deterministic Component
(Signal)}\label{the-deterministic-component-signal}
Before shifting our focus to the random component of time series, we
will first underline the main features that should be taken into
account for the deterministic component. The first feature that should
be analysed is the \emph{trend} that characterises the time series, more
specifically the behaviour of the variable of interest as a specific
function of time (as in the global temperature time series seen earlier).
Let us consider another example of a time series based on real data,
borrowed from \citet{shumway2010time}: the quarterly earnings of Johnson
\& Johnson between 1960 and 1980, represented below.
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{# Load data}
\KeywordTok{data}\NormalTok{(jj, }\DataTypeTok{package =} \StringTok{"astsa"}\NormalTok{)}
\CommentTok{# Construct gts object}
\NormalTok{jj =}\StringTok{ }\KeywordTok{gts}\NormalTok{(jj, }\DataTypeTok{start =} \DecValTok{1960}\NormalTok{, }\DataTypeTok{freq =} \DecValTok{4}\NormalTok{, }\DataTypeTok{unit_ts =} \StringTok{"$"}\NormalTok{, }\DataTypeTok{name_ts =} \StringTok{"Quarterly Earnings per Share"}\NormalTok{, }\DataTypeTok{data_name =} \StringTok{"Johnson & Johnson Quarterly Earnings"}\NormalTok{)}
\CommentTok{# Plot time series}
\KeywordTok{plot}\NormalTok{(jj)}
\end{Highlighting}
\end{Shaded}
\begin{center}\includegraphics{ts_files/figure-latex/jjexample-1} \end{center}
As can be seen from the plot, the earnings appear to grow over time;
therefore, we can imagine fitting a straight line to these data to
describe their behaviour by considering the following model:
\begin{equation}
X_t = \alpha + \beta t + \varepsilon_t,
\label{eq:modeljjexample}
\end{equation}
where \(\varepsilon_t\) is iid Gaussian. The results are presented in
the graph below:
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{# Fit linear regression}
\NormalTok{time_jj =}\StringTok{ }\KeywordTok{gts_time}\NormalTok{(jj)}
\NormalTok{fit_jj1 =}\StringTok{ }\KeywordTok{lm}\NormalTok{(}\KeywordTok{as.vector}\NormalTok{(jj) }\OperatorTok{~}\StringTok{ }\NormalTok{time_jj)}
\CommentTok{# Plot results and add regression line}
\KeywordTok{plot}\NormalTok{(jj)}
\KeywordTok{lines}\NormalTok{(time_jj, }\KeywordTok{predict}\NormalTok{(fit_jj1), }\DataTypeTok{col =} \StringTok{"red"}\NormalTok{)}
\KeywordTok{legend}\NormalTok{(}\StringTok{"bottomright"}\NormalTok{, }\KeywordTok{c}\NormalTok{(}\StringTok{"Time series"}\NormalTok{, }\StringTok{"Regression line"}\NormalTok{), }
\DataTypeTok{col =} \KeywordTok{c}\NormalTok{(}\StringTok{"blue4"}\NormalTok{, }\StringTok{"red"}\NormalTok{), }\DataTypeTok{bty =} \StringTok{"n"}\NormalTok{, }\DataTypeTok{lwd =} \DecValTok{1}\NormalTok{)}
\end{Highlighting}
\end{Shaded}
\begin{center}\includegraphics{ts_files/figure-latex/jjexample2-1} \end{center}
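For reference, the least-squares estimates of \(\alpha\) and \(\beta\) in
\eqref{eq:modeljjexample}, which \texttt{lm()} returns (computing them
numerically through a QR decomposition), have the familiar closed form
\begin{equation*}
\hat{\beta} = \frac{\sum_{t=1}^{T} (t - \bar{t})(X_t - \bar{X})}{\sum_{t=1}^{T} (t - \bar{t})^2}, \qquad
\hat{\alpha} = \bar{X} - \hat{\beta}\, \bar{t},
\end{equation*}
where \(\bar{t} = T^{-1}\sum_{t=1}^{T} t\) and
\(\bar{X} = T^{-1}\sum_{t=1}^{T} X_t\). The formulas are written for
\(t = 1, \ldots, T\); the same expressions hold when \(t\) is replaced by
calendar time, as in \texttt{time\_jj}, which leaves the fitted line
unchanged but rescales the slope and changes the intercept.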
Although the line captures part of this behaviour, the trend of the time
series is clearly not linear, as the diagnostic plots below confirm:
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{check}\NormalTok{(fit_jj1)}
\end{Highlighting}
\end{Shaded}
\begin{center}\includegraphics{ts_files/figure-latex/lmresid-1} \end{center}
A more flexible function of time therefore seems appropriate. Adding a
quadratic term in time to the model in \eqref{eq:modeljjexample}, we
obtain:
\begin{equation}
X_t = \alpha + \beta_1 t + \beta_2 t^2 + \varepsilon_t.
\label{eq:modeljjexample2}
\end{equation}
The fitted quadratic trend is shown below, followed by the corresponding
diagnostic plots:
\begin{center}\includegraphics{ts_files/figure-latex/unnamed-chunk-16-1} \end{center}
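The code producing the quadratic fit is not displayed above. The
following is a minimal base-R sketch of how a model such as
\eqref{eq:modeljjexample2} can be fitted; it is our reconstruction, not
the authors' hidden code. It assumes that the \texttt{JohnsonJohnson}
series from R's \texttt{datasets} package contains the same quarterly
earnings as \texttt{jj} (both cover 1960--1980), and it centres time
before squaring it to avoid two almost collinear regressors.

```r
# Sketch (assumption): datasets::JohnsonJohnson contains the same data as astsa's jj
y = as.vector(JohnsonJohnson)               # quarterly earnings per share
time_jj = as.vector(time(JohnsonJohnson))   # 1960.00, 1960.25, ..., 1980.75
tc = time_jj - mean(time_jj)                # centred time (numerical stability)

# Quadratic trend: X_t = alpha + beta_1 t + beta_2 t^2 + eps_t (t centred here)
fit_jj2 = lm(y ~ tc + I(tc^2))
summary(fit_jj2)$coefficients

# Overlay the fitted quadratic trend on the observed series
plot(time_jj, y, type = "l", col = "blue4",
     xlab = "Time", ylab = "Earnings per share ($)")
lines(time_jj, fitted(fit_jj2), col = "red")
```

Centring only reparameterizes \(\alpha\) and \(\beta_1\): the estimate of
\(\beta_2\) and the fitted curve are the same as with uncentred time.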
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{check}\NormalTok{(fit_jj2)}
\end{Highlighting}
\end{Shaded}
\begin{center}\includegraphics{ts_files/figure-latex/lmresid2-1} \end{center}
The quadratic function of time fits the observed time series better and
follows the observations more closely. However, there still appears to
be a pattern in the data that this quadratic model does not capture:
peaks and valleys that seem to recur at regular intervals along the
series. This behaviour is known as \emph{seasonality} which, in this
case, can be explained by the effect of the quarter on the earnings.
Indeed, it is reasonable to assume that the seasons affect many
variables measured over time (e.g.~temperatures, or earnings linked to
sales that vary with the seasons). We therefore add the quarter as an
explanatory variable to the model in \eqref{eq:modeljjexample2}, which
becomes:
\begin{equation}
X_t = \alpha + \beta_1 t + \beta_2 t^2 + \sum_{i = 1}^4 \gamma_i I_{t \in \mathcal{A}_i} + \varepsilon_t,
\label{eq:modeljjexample3}
\end{equation}
where
\begin{equation*}
I_{t \in \mathcal{A}} \equiv \left\{
\begin{array}{ll}
1 & \mbox{if } t \in \mathcal{A} \\
0 & \mbox{if } t \not\in \mathcal{A}
\end{array}
\right. ,
\end{equation*}
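and where
\[
\mathcal{A}_i \equiv \left\{x \in \mathbb{N} \mid x \equiv i \pmod{4}\right\}.
\]
Since every \(t\) belongs to exactly one of the sets
\(\mathcal{A}_1, \ldots, \mathcal{A}_4\), the four indicators sum to one,
so \(\alpha\) and the \(\gamma_i\) cannot be estimated separately. In
practice one constraint is imposed, for example \(\gamma_1 = 0\), in
which case each \(\gamma_i\) measures the effect of quarter \(i\)
relative to the first quarter.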
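To make the indicators concrete, take \(t = 1\) to be the first quarter
of 1960, the first observation of the series. The indicators then cycle
through the following values:
\begin{equation*}
\left(I_{t \in \mathcal{A}_1}, I_{t \in \mathcal{A}_2}, I_{t \in \mathcal{A}_3}, I_{t \in \mathcal{A}_4}\right) =
\left\{
\begin{array}{ll}
(1, 0, 0, 0) & \mbox{if } t = 1, 5, 9, \ldots \\
(0, 1, 0, 0) & \mbox{if } t = 2, 6, 10, \ldots \\
(0, 0, 1, 0) & \mbox{if } t = 3, 7, 11, \ldots \\
(0, 0, 0, 1) & \mbox{if } t = 4, 8, 12, \ldots
\end{array}
\right.
\end{equation*}
At each time point the seasonal term in \eqref{eq:modeljjexample3}
therefore reduces to a single \(\gamma_i\). For example, at \(t = 6\) (a
second quarter) the mean of \(X_t\) is
\(\alpha + 6\beta_1 + 36\beta_2 + \gamma_2\).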
The results are presented below:
\begin{center}\includegraphics{ts_files/figure-latex/unnamed-chunk-17-1} \end{center}
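As for the quadratic model, the code behind this fit is not displayed.
A minimal base-R sketch of \eqref{eq:modeljjexample3} follows, under the
assumption that \texttt{JohnsonJohnson} from R's \texttt{datasets}
package matches \texttt{jj}; again, this is our reconstruction, not the
authors' code. The quarter is encoded as a factor, and R's default
treatment contrasts then drop the first level, which corresponds to the
constraint \(\gamma_1 = 0\).

```r
# Sketch (assumption): datasets::JohnsonJohnson contains the same data as astsa's jj
y = as.vector(JohnsonJohnson)
time_jj = as.vector(time(JohnsonJohnson))
tc = time_jj - mean(time_jj)                         # centred time
quarter = factor(as.vector(cycle(JohnsonJohnson)))   # 1, 2, 3, 4, 1, 2, ...

# Quadratic trend plus quarterly effects; with the default treatment contrasts
# the first quarter is the baseline (gamma_1 = 0), so the intercept is identifiable
fit_jj3 = lm(y ~ tc + I(tc^2) + quarter)
coef(fit_jj3)

# Adding the seasonal terms can only lower the residual sum of squares (nested models)
fit_jj2 = lm(y ~ tc + I(tc^2))
c(trend_only = deviance(fit_jj2), trend_and_quarter = deviance(fit_jj3))
```

A formal comparison of the two nested fits could then be carried out
with \texttt{anova(fit\_jj2, fit\_jj3)}.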
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{check}\NormalTok{(fit_jj3)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## Warning in if (class(model) == "fitsimts") {: the condition has length > 1
## and only the first element will be used
\end{verbatim}
\begin{center}\includegraphics{ts_files/figure-latex/lmresid3-1} \end{center}
This final fit appears to describe the behaviour of the earnings well,
although there still appears to be some heteroskedasticity (i.e.~a
change in variance) as well as random seasonality, both of which will be
treated later in this text. These examples show that \emph{trend} and
\emph{seasonality} are the main features characterizing the
deterministic component of a time series. However, as discussed earlier,
these deterministic components often do not fully explain the observed
time series, since data measured over time typically also contain a
random component. Ignoring this random component can considerably affect
inference procedures (as seen earlier), so it is important to analyse it
adequately, which is the purpose of the next section.
\subsection{The Random Component
(Noise)}\label{the-random-component-noise}
From this section onwards we will refer to \emph{time series as being
solely the random noise component}. Keeping this in mind, a \emph{time
series} is a particular kind of \emph{stochastic process} which,
generally speaking, is a collection of random variables indexed by a set
of numbers. Not surprisingly, the index of reference for a time series
is given by \emph{time} and, consequently, a time series is a collection
of random variables indexed (or ``measured'') over time such as, for
example, the daily price of a financial asset or the monthly average
temperature in a given location. In terms of notation, a time series is
often represented as
\[\left(X_1, X_2, ..., X_T \right) \;\;\; \text{ or } \;\;\; \left(X_t\right)_{t = 1,...,T}.\]
The time index \(t\) is contained within either the set of reals,
\(\mathbb{R}\), or integers, \(\mathbb{Z}\). When \(t \in \mathbb{R}\),
the time series becomes a \emph{continuous-time} stochastic process such
as a Brownian motion, a model used to represent the random movement of
particles suspended in a liquid or gas. However, within this book,
we will limit ourselves to the cases where \(t \in \mathbb{Z}\), better
known as \emph{discrete-time} processes. Discrete-time processes are
measured sequentially at fixed and equally spaced intervals in time.
This implies that we will uphold two general assumptions for the time
series considered in this book:
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\tightlist
\item
\(t\) is not random, i.e.~the time at which each observation is
measured is known, and
\item
the time between two consecutive observations is constant.
\end{enumerate}
This book will also focus on certain representations of time series
based on parametric probabilistic models. For example, one of the
fundamental probability models used in time series analysis is called
the \emph{white noise} model and is defined as
\[X_t \mathop \sim \limits^{iid} N(0, \sigma^2).\]
This statement simply means that the \(X_t\) are independent over time
and identically distributed according to a normal distribution with mean
zero and variance \(\sigma^2\). Ideally, this is the type of process we
would like to observe in the residuals once a statistical model has been
fitted. Moreover, although it may appear too simple to be a useful model
for time series, it is a crucial building block for a wide range of more
complex time series models (see Chapter \ref{fundtimeseries}). These
more complex models are needed because, unlike the white noise process,
time series are typically \emph{not} independent over time. For example,
if the temperature in State College is unusually low on a given day,
then it is reasonable to expect that the temperature on the following
day will also be low.
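To see the difference between white noise and a dependent series, the
sketch below simulates both and compares their lag-one sample
autocorrelations. The dependent series,
\(X_t = 0.9 X_{t-1} + W_t\) with \(W_t\) white noise, is chosen purely
for contrast and generated with \texttt{arima.sim()}; the seed, the
sample size and the coefficient \(0.9\) are arbitrary choices.

```r
set.seed(123)                                  # arbitrary seed, for reproducibility
n = 1000
sigma2 = 1
wn = rnorm(n, mean = 0, sd = sqrt(sigma2))    # white noise: X_t iid N(0, sigma^2)

# Dependent series for contrast: X_t = 0.9 X_{t-1} + W_t
dep = as.vector(arima.sim(model = list(ar = 0.9), n = n))

# Sample autocorrelation at lag 1 (element 1 of $acf is lag 0)
lag1 = function(x) acf(x, lag.max = 1, plot = FALSE)$acf[2]
c(white_noise = lag1(wn), dependent = lag1(dep))
```

For white noise the lag-one autocorrelation is close to zero, whereas
for the dependent series it is close to \(0.9\): a low value today makes
a low value tomorrow likely, much as in the temperature example.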
With this in mind, let us now give a quick overview of the information
about a time series that can be retrieved from a simple descriptive
representation.
\section{Exploratory Data Analysis for Time Series}\label{eda}
When dealing with relatively small time series (e.g.~a few thousand
observations or fewer), it is often useful to look at a graph of the
original data. A graph can be an informative tool for ``detecting''
features of a time series such as trends or the presence of outliers.
This is indeed what was done in the previous paragraphs when analysing
the global temperature data and the Johnson \& Johnson data.
More precisely, a trend is typically assumed to be present in a time
series when the data exhibit some form of long-term increase or
decrease, or a combination of increases and decreases. Such trends can
be linear or non-linear and represent an important part of the
``signal'' of a model (as seen for the Johnson \& Johnson time series).
Here are a few examples of non-linear trends:
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\item
\textbf{Seasonal trends} (periodic): These are cyclical patterns that
repeat after a fixed/regular time period. They could, for example, be
due to business cycles (e.g.~bust/recession, recovery).
\item
\textbf{Non-seasonal trends} (periodic): These patterns cannot be
associated with seasonal variation and can, for example, be due to an
external variable such as the impact of economic indicators on stock
returns. Note that such trends are often hard to detect based on a
graphical analysis of the data.
\item
\textbf{``Other'' trends}: These trends typically have no regular
pattern and occur over a segment of time, known as a ``window'', during
which the statistical properties of the time series change. A common
example of such trends is given by the vibrations observed before,
during and after an earthquake.
\end{enumerate}
Moreover, when observing ``raw'' time series data it is also interesting
to evaluate if some of the following phenomena occur:
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\tightlist
\item
\textbf{Change in Mean:} Does the mean of the process shift over time?
\item
\textbf{Change in Variance:} Does the variance of the process evolve
with time?
\item
\textbf{Change in State:} Does the time series appear to change
between ``states'' having distinct statistical properties?
\item
\textbf{Outliers:} Does the time series contain some ``extreme''
observations? (Note that this is typically difficult to assess
visually.)
\end{enumerate}
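A crude numerical companion to the first two questions is to compute
summary statistics over consecutive, non-overlapping windows. The sketch
below does this for the mean and the standard deviation of a
hypothetical simulated series whose standard deviation doubles halfway
through. The window length of 100 is an arbitrary choice, and this is an
exploratory aid rather than a formal test.

```r
set.seed(42)                                   # arbitrary seed
# Hypothetical series: sd = 1 for t = 1..500, sd = 2 for t = 501..1000
x = c(rnorm(500, sd = 1), rnorm(500, sd = 2))

window_size = 100                              # arbitrary window length
window_id = rep(seq_len(length(x) / window_size), each = window_size)

# One mean and one standard deviation per window
window_mean = tapply(x, window_id, mean)
window_sd = tapply(x, window_id, sd)
round(rbind(mean = window_mean, sd = window_sd), 2)
```

The window means stay close to zero (no change in mean), while the
window standard deviations jump from about 1 to about 2 after the fifth
window, pointing to a change in variance.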
\BeginKnitrBlock{example}
\protect\hypertarget{exm:earthquake}{}{\label{exm:earthquake} }In the figures
below, we present the displacement recorded during an earthquake and
during an explosion.
\EndKnitrBlock{example}
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{data}\NormalTok{(EQ5, }\DataTypeTok{package =} \StringTok{"astsa"}\NormalTok{)}
\KeywordTok{data}\NormalTok{(EXP6, }\DataTypeTok{package =} \StringTok{"astsa"}\NormalTok{)}
\CommentTok{# Construct gts object}
\NormalTok{eq5 <-}\StringTok{ }\KeywordTok{gts}\NormalTok{(EQ5, }\DataTypeTok{start =} \DecValTok{0}\NormalTok{, }\DataTypeTok{freq =} \DecValTok{1}\NormalTok{, }\DataTypeTok{unit_ts =} \StringTok{"p/s"}\NormalTok{, }\DataTypeTok{name_ts =} \StringTok{"Earthquake Arrival Phases"}\NormalTok{, }\DataTypeTok{data_name =} \StringTok{"Earthquake Arrival Phases"}\NormalTok{)}
\NormalTok{exp6 <-}\StringTok{ }\KeywordTok{gts}\NormalTok{(EXP6, }\DataTypeTok{start =} \DecValTok{0}\NormalTok{, }\DataTypeTok{freq =} \DecValTok{1}\NormalTok{, }\DataTypeTok{unit_ts =} \StringTok{"p/s"}\NormalTok{, }\DataTypeTok{name_ts =} \StringTok{"Explosion Arrival Phases"}\NormalTok{, }\DataTypeTok{data_name =} \StringTok{"Explosion Arrival Phases"}\NormalTok{)}
\CommentTok{# Plot time series}
\KeywordTok{plot}\NormalTok{(eq5)}
\end{Highlighting}
\end{Shaded}
\begin{center}\includegraphics{ts_files/figure-latex/example_EQ-1} \end{center}
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{plot}\NormalTok{(exp6)}
\end{Highlighting}
\end{Shaded}
\begin{center}\includegraphics{ts_files/figure-latex/example_EQ-2} \end{center}
From the graphs, it can be observed that the statistical properties of
the time series appear to change over time. For instance, the variance
of both series shifts at around \(t = 1150\). This shift in variance
also opens ``windows'' in which there appear to be distinct states. In
the case of the explosion data, this is particularly relevant around
\(t = 50, \cdots, 250\) and then again for \(t = 1200, \cdots, 1500\).
Even within these windows, there are ``spikes'' that could be considered
as outliers, most notably around \(t = 1200\) in the explosion series.
Extreme observations or outliers are commonly observed in real time
series data, as illustrated in the following example.
\BeginKnitrBlock{example}
\protect\hypertarget{exm:precipitation}{}{\label{exm:precipitation} }We
consider here a data set from the domain of hydrology. The data concern
monthly precipitation (in mm) from 1907 to 1972 and are of interest to
scientists studying water cycles. The data are presented in the graph
below:
\EndKnitrBlock{example}
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{# Load hydro dataset}
\KeywordTok{data}\NormalTok{(}\StringTok{"hydro"}\NormalTok{)}
\CommentTok{# Construct gts object}
\NormalTok{hydro =}\StringTok{ }\KeywordTok{gts}\NormalTok{(}\KeywordTok{as.vector}\NormalTok{(hydro), }\DataTypeTok{start =} \DecValTok{1907}\NormalTok{, }\DataTypeTok{freq =} \DecValTok{12}\NormalTok{, }\DataTypeTok{unit_ts =} \StringTok{"in."}\NormalTok{, }
\DataTypeTok{name_ts =} \StringTok{"Precipitation"}\NormalTok{, }\DataTypeTok{data_name =} \StringTok{"Hydrology data"}\NormalTok{)}
\CommentTok{# Plot hydro }
\KeywordTok{plot}\NormalTok{(hydro)}
\end{Highlighting}
\end{Shaded}
\begin{center}\includegraphics{ts_files/figure-latex/example_hydro-1} \end{center}
Most observations lie below 2mm, but several observations are
considerably larger than the others. These could be outliers, which can
greatly affect the estimation procedure if not adequately taken into
account.
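One simple way to screen for such observations, shown here only as an
illustration and not as the method used in this book, is to flag values
lying more than \(k\) median absolute deviations (MAD) from the median;
unlike the mean and the standard deviation, these summaries are barely
affected by a few extreme values. The series below is hypothetical
(simulated, with three injected extremes) and the threshold \(k = 5\) is
an arbitrary choice.

```r
set.seed(7)                                     # arbitrary seed
# Hypothetical non-negative monthly series with three injected extreme values
x = abs(rnorm(240, mean = 1, sd = 0.4))
x[c(30, 120, 200)] = c(4.5, 5, 6)

# Flag observations more than k MADs away from the median.
# mad() is scaled by default to estimate the standard deviation under normality.
flag_outliers = function(x, k = 5) {
  which(abs(x - median(x)) > k * mad(x))
}
flag_outliers(x)
```

Flagged observations are candidates for closer inspection rather than
values to discard automatically.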
Next, we consider an example from high-frequency finance. The figure
below presents the returns or price innovations (i.e.~the changes in
price from one observation to the next) of Starbucks' stock on July 1,
2011 over about 150 seconds (left panel) and about 400 minutes (right
panel).
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{# Load "high-frequency" Starbucks returns for July 01 2011}
\KeywordTok{data}\NormalTok{(sbux.xts, }\DataTypeTok{package =} \StringTok{"highfrequency"}\NormalTok{)}
\CommentTok{# Plot returns}
\KeywordTok{par}\NormalTok{(}\DataTypeTok{mfrow =} \KeywordTok{c}\NormalTok{(}\DecValTok{1}\NormalTok{,}\DecValTok{2}\NormalTok{))}
\KeywordTok{plot}\NormalTok{(}\KeywordTok{gts}\NormalTok{(sbux.xts[}\DecValTok{1}\OperatorTok{:}\DecValTok{89}\NormalTok{]), }
\DataTypeTok{main =} \StringTok{"Starbucks: 150 Seconds"}\NormalTok{, }
\DataTypeTok{ylab =} \StringTok{"Returns"}\NormalTok{) }
\KeywordTok{plot}\NormalTok{(}\KeywordTok{gts}\NormalTok{(sbux.xts), }
\DataTypeTok{main =} \StringTok{"Starbucks: 400 Minutes"}\NormalTok{, }
\DataTypeTok{ylab =} \StringTok{"Returns"}\NormalTok{)}
\end{Highlighting}
\end{Shaded}