Home > Back-end >  How to transpose times series data in R so that the date-variable ends up as the new variables names
How to transpose times series data in R so that the date-variable ends up as the new variables names

Time:10-06

I have a huge times series dataset (over 500 variables) consisting of monthly observations for several years. See a simplified example of the input data below (in Excel):

simplified example of input_data

Now, what I need to do (in R) is to transpose this input data set so that the variable date ends up as the new variable names and the current variable names end up in the column date. See a simplified example of the desired output data below (in Excel):

desired output in R data frame

Note that is important that the dates in the variable names in the desired end result need to be in descending order (newest closest to the left).

I do manage to transpose the data using the function t(), however, the problem comes after that.

  1. First problem: The variable date is automatically set as rownames after transposing, which is fine, I just run the function rownames_to_column() to fix this. However, is there a way to avoid this maneuver?

  2. Second problem: All the new variable names end up in the first row, but when I run the function row_to_names(1) to change this the dates in the variable names end up weird. The only solution to this that I have made work (almost all the way) is to first convert all variables to character format before running row_to_names(1). After that I need to change my date-variables back to numeric format which I need to do in order to proceed with further calculations. However, when I do this on my real data set (which is the same just much bigger) I get the error message: "NAs introduced by coercion". Why does this happen? I can't seem to find out why. If you find a different approach to do this that is also fine.

# Example data to copy and paste into R for easy reproduction of problem:

df <- data.frame (date  = c("2018-11-30", "2018-12-31", "2019-01-31", "2019-02-28", "2019-03-31", "2019-04-30"),
fruit_apples = c(333, 337, 341, 345, 349, 353),
fruit_pears = c(86, 87, 88, 89, 90, 91),
fruit_grapes    = c(255, 258, 261, 264, 267, 270),
veggies_tomatoes    = c(2024, 2025, 2026, 2027, 2028, 2029),
veggies_carrots = c(92, 95, 98, 101, 104, 107),
veggies_cucumber    = c(52, 54, 56, 58, 60, 62),
pets_rabbit = c(25, 26, 27, 28, 29, 30),
pets_cat    = c(51, 53, 55, 57, 59, 61),
pets_dog    = c(117, 121, 125, 129, 133, 137))

# The code I have been using so far:

df_output <- df %>%
  arrange(desc(date)) %>% # after transposing I want the newest date to be closest to the left, hence this step
  t() %>%
  as.data.frame() %>%
  rownames_to_column() %>%
  mutate(across(everything(), as.character)) %>%
  row_to_names(1) %>%
  mutate_at(vars(-date), as.numeric) %>% # causes error message "NAs introduced by coercion"
  adorn_totals()

CodePudding user response:

I hope this is what you have in mind:

library(tidyr)

df %>%
  pivot_longer(!date, names_to = "Date", values_to = "value") %>%
  pivot_wider(names_from = date, values_from = value)

# A tibble: 9 x 7
  Date             `2018-11-30` `2018-12-31` `2019-01-31` `2019-02-28` `2019-03-31` `2019-04-30`
  <chr>                   <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>
1 fruit_apples              333          337          341          345          349          353
2 fruit_pears                86           87           88           89           90           91
3 fruit_grapes              255          258          261          264          267          270
4 veggies_tomatoes         2024         2025         2026         2027         2028         2029
5 veggies_carrots            92           95           98          101          104          107
6 veggies_cucumber           52           54           56           58           60           62
7 pets_rabbit                25           26           27           28           29           30
8 pets_cat                   51           53           55           57           59           61
9 pets_dog                  117          121          125          129          133          137

CodePudding user response:

A data.table way -

library(data.table)

dcast(melt(setDT(df), id.vars = 'date'), variable~date, value.var = 'value')

#           variable 2018-11-30 2018-12-31 2019-01-31 2019-02-28 2019-03-31 2019-04-30
#1:     fruit_apples        333        337        341        345        349        353
#2:      fruit_pears         86         87         88         89         90         91
#3:     fruit_grapes        255        258        261        264        267        270
#4: veggies_tomatoes       2024       2025       2026       2027       2028       2029
#5:  veggies_carrots         92         95         98        101        104        107
#6: veggies_cucumber         52         54         56         58         60         62
#7:      pets_rabbit         25         26         27         28         29         30
#8:         pets_cat         51         53         55         57         59         61
#9:         pets_dog        117        121        125        129        133        137
  • Related