Home > Net >  How do I convert a data frame dataset to time series?
How do I convert a data frame dataset to time series?

Time:01-26

I seem to have some trouble converting my data frame data into a time series. I have a typical data set consisting of date, export quantity, GDP, FDI etc.

# A tibble: 252 x 10
   Date                `Maize Exports (m/t)` `Rainfall (mm)` `Temperature ©` `Exchange rate (R/$)` `Maize price (R)` `FDI (Million R)` GDP (Million~1 Oil p~2 Infla~3
   <dttm>                              <dbl>           <dbl>           <dbl>                 <dbl>             <dbl>             <dbl>          <dbl>   <dbl>   <dbl>
 1 2000-05-01 00:00:00                 21000            30.8            14.4                0.144               678.              4337           9056    192.     5.1
 2 2000-06-01 00:00:00                 54000            14.9            14.0                0.147               583.             -4229           9056    205.     5.1
 3 2000-07-01 00:00:00                134000            11.1            12.6                0.144               518.             -4229           8841    196.     5.9
 4 2000-08-01 00:00:00                213000             6.1            15.3                0.143               526.             -4229           8841    205.     6.8
 5 2000-09-01 00:00:00                123000            38.5            17.8                0.138               576.              6315           8841    234.     6.8
 6 2000-10-01 00:00:00                 94000            61.9            20.1                0.132               636.              6315           4487    231.     7.1
 7 2000-11-01 00:00:00                192000            93.9            19.9                0.129               685.              6315           4487    250.     7.1
 8 2000-12-01 00:00:00                134000            85.6            22.3                0.132               747.             -2143           4487    192.     7  
 9 2001-01-01 00:00:00                133000            92.4            23.4                0.0875             1066.             -5651           7365    226.     5  
10 2001-02-01 00:00:00                168000            51              22.0                0.0879             1042.             -5651           7365    233.     5.9

I've installed the right packages (readxl), I've used the as.Date function so ensure my Date is recognized as such, and I've used the as.ts function to convert the dataset. However, after using the as.ts function, the date column is all muddled up into a random number and not a date anymore. What am I doing wrong? Please help!

   Date Maize Exports (m/t) Rainfall (mm) Temperature © Exchange rate (R/$) Maize price (R) FDI (Million R) GDP (Million R) Oil prices (R/barrel)
[1,] 957139200               21000          30.8         14.36           0.1435235          677.88            4337            9056                192.35
[2,] 959817600               54000          14.9         13.96           0.1474926          583.48           -4229            9056                205.36
[3,] 962409600              134000          11.1         12.61           0.1437298          518.10           -4229            8841                196.38
[4,] 965088000              213000           6.1         15.27           0.1433075          525.59           -4229            8841                204.66
[5,] 967766400              123000          38.5         17.83           0.1382170          576.08            6315            8841                233.64
[6,] 970358400               94000          61.9         20.10           0.1322751          635.79            6315            4487                231.27

CodePudding user response:

In short nothing is wrong - and while this response should really be a comment, I wanted to use a full answer to have a bit more space to explain.

Behind each date is a numeric value tethered to an origin, so this is just R's way of handling it. And since you imported from excel originally, those origins may not line up if you tried to cross check it (see below).

You didn't make your question reproducible, but I put some similar data together to demonstrate what's going on:

Data

df <- data.frame(date = as.Date(c("2000-05-01", 
                          "2000-06-01",
                          "2000-07-01",
                          "2000-08-01",
                          "2000-09-01",
                          "2000-10-01",
                          "2000-11-01")),
                 maize = c(21, 54, 132, 213, 123, 94, 192) * 1000,
                 rainfall = c(30, 14, 11, 6, 38, 61, 93))
tb <- tidyr::as_tibble(df)

Turning this into a time series object using as.ts()

tb_ts <- as.ts(tb)

# Time Series:
# Start = 1 
# End = 7 
# Frequency = 1 
#    date  maize rainfall
# 1 11078  21000       30
# 2 11109  54000       14
# 3 11139 132000       11
# 4 11170 213000        6
# 5 11201 123000       38
# 6 11231  94000       61
# 7 11262 192000       93

Since I created these data in R, the "origin" is January 1, 1970, and we can see this in numerical dates from the time series object and convert them back into date formats:

as.Date(tb_ts[1:7], origin = '1970-01-01')

# [1] "2000-05-01" "2000-06-01" "2000-07-01" "2000-08-01"
# [5] "2000-09-01" "2000-10-01" "2000-11-01"

Note that if you import data from Excel, Excel's origin is December 30th, 1899 (i.e., as.Date(xx, origin = "1899-12-30")), so if you tried that you get the wrong dates:

as.Date(tb_ts[1:7], origin = "1899-12-30")

# [1] "1930-04-30" "1930-05-31" "1930-06-30" "1930-07-31"
# [5] "1930-08-31" "1930-09-30" "1930-10-31

CodePudding user response:

The function worked as it's supposed to. Keeping the date format you're familiar with isn't practical for execution, so it converts the dates to a different value, usually something like the number of days (or minutes or seconds) since a certain year, usually Jan. 1 1970. For example, here is a little set to make the point:

# a test vector of dates
> del1 <- seq(as.Date("2012-04-01"), length.out=4, by=30)
# looks like
> del1
[1] "2012-04-01" "2012-05-01" "2012-05-31" "2012-06-30"
# use the as.ts
> as.ts(del1)
Time Series:
Start = 1 
End = 4 
Frequency = 1 
[1] 15431 15461 15491 15521

So you can see the dates, which are 30 days apart, are converted to a series of values that are 30 integers apart.

  • Related