Home > Software design >  advanced reshaping / pivoting in a r dataframe
advanced reshaping / pivoting in a r dataframe

Time:07-08

I am struggling to reshape a dataframe in R. My starting point is a dataframe, which has the following structure:

df_given <- data.frame (first_column  = c("NA", "NA", "NA", "Country1", "Country2", "Country3"),
               second_column  = c("Consumption", "real", "2021", 10, 11, 23),
              third_column = c("Consumption", "real", "2022", 20, 22, 12),
              fourth_column = c("Inflation", "expected", "2021", 1, 1.2, 2.5),
              fifth_column = c("Inflation", "expected", "2022", 5, 3, 2)

)

Now my problem is the following: I would like to have the 2021 and 2022 only as two columns, instead of repeating the sequence for two times. This therefore involves transforming the "description" of this time series (e.g. consumption real and inflation expected) from a row to a column. For this reason, my final target dataframe would look somehow like this:

 df_target <- data.frame (first_column = c("type", "Country1 Consumption real", "Country2 Consumption real", 
                                "Country3 Consumption real", "Country1 Inflation expected", 
                                "Country2 Inflation expected","Country3 Inflation expected"),
               second_column = c(2021, 10, 11, 23, 1, 1.2, 2.5),
               third_column = c(2022, 20, 22, 12, 5, 3, 2)

)

I assume that pivoting to wider or longer would do the trick. However my problem is, that I can't really tell if my current dataframe is actually in long or wide format, because I think it is kind of both. Can anyone tell me how to approach this problem? Thanks in advance

CodePudding user response:

You can use data table, after dropping the extra info in the first couple of rows which aren't really data.

names(df_given) <- c("country","Real C 2021", "Real C 2022", "Inf 2021", "Inf 2022")
df_given <- df_given[-c(1:3),]
library(data.table)
setDT(df_given)
melt(df_given, measure = patterns("^Real C","^Inf"), value.name = c("2021","2022"))

    country variable 2021 2022
1: Country1        1   10    1
2: Country2        1   11  1.2
3: Country3        1   23  2.5
4: Country1        2   20    5
5: Country2        2   22    3
6: Country3        2   12    2

Documentation

CodePudding user response:

Easiest way is manual, using Base R:

# Transpose: ir => data.frame
ir <- data.frame(t(df_given))

# Derive metrics: ir2 => character vector
ir2 <- apply(ir[,1:3], 1, paste, collapse = " ")[-1]

# Derive countries: ir3 => character vector
ir3 <- unlist(ir[1,4:ncol(ir), drop = TRUE])

# Derive values: ir4 => data.frame
ir4 <- unlist(ir[2:nrow(ir), 4:ncol(ir)])

# Reshape into long df: ir5 => data.frame
ir5 <- within(
  data.frame(
    cbind(
      stat = ir2, 
      country = rep(ir3, each = length(ir2)),
      val = ir4
    ),
    row.names = NULL
  ),
  {
    year <- substring(stat, nchar(stat)-4)
    stat <- trimws(gsub(paste0(year, collapse = "|"), "", stat))
  }
)

# Pivot: data.frame => stdout(console)
reshape(
  ir5, 
  idvar=c("country", "stat"),
  timevar="year", 
  v.names="val", 
  direction="wide"
)
  • Related