Home > front end >  Reshaping a non column name dataframe in r
Reshaping a non column name dataframe in r

Time:05-18

I have a data frame like this:

structure(list(...1 = c(NA, NA, "name_var1", "obs1_var1", "obs2_var1"
), ...2 = c(NA, NA, "name_var2", "obs1_var2", "obs2_var2"), ...3 = c(NA, 
NA, "name_var3", "obs1_var3", "obs2_var3"), ...4 = c("Dimension", 
"Subdimension", "name_var4", "obs1_var4", "obs2_var4"), ...5 = c("Dimension1", 
"Subdimension1", "question1.1.1", "1", "4"), ...6 = c("Dimension1", 
"Subdimension1", "question1.1.2", "3", "2"), ...5.1 = c("Dimension1", 
"Subdimension2", "question1.2.1", "1", "2"), ...5.2 = c("Dimension1", 
"Subdimension2", "question1.2.2", "4", "1"), ...5.3 = c("Dimension2", 
"Subdimension1", "question2.1.1", "1", "4"), ...6.1 = c("Dimension2", 
"Subdimension1", "question2.1.2", "3", "2"), ...5.4 = c("Dimension2", 
"Subdimension2", "question2.2.1", "1", "2"), ...5.5 = c("Dimension2", 
"Subdimension2", "question2.2.2", "4", "1")), class = "data.frame", row.names = c(NA, 
-5L))

and I'd like to transform into this one

structure(list(name_var1 = c("obs1_var1", "obs1_var1", "obs1_var1", 
"obs1_var1", "obs1_var1", "obs1_var1", "obs1_var1", "obs1_var1"
), name_var2 = c("obs1_var2", "obs1_var2", "obs1_var2", "obs1_var2", 
"obs1_var2", "obs1_var2", "obs1_var2", "obs1_var2"), name_var3 = c("obs1_var3", 
"obs1_var3", "obs1_var3", "obs1_var3", "obs1_var3", "obs1_var3", 
"obs1_var3", "obs1_var3"), name_var4 = c("obs1_var4", "obs1_var4", 
"obs1_var4", "obs1_var4", "obs1_var4", "obs1_var4", "obs1_var4", 
"obs1_var4"), Dimension = c("Dimension1", "Dimension1", "Dimension1", 
"Dimension1", "Dimension2", "Dimension2", "Dimension2", "Dimension2"
), Subdimension = c("Subdimension1", "Subdimension1", "Subdimension2", 
"Subdimension2", "Subdimension1", "Subdimension1", "Subdimension2", 
"Subdimension2"), Question = c("question1.1.1", "question1.1.2", 
"question1.2.1", "question1.2.2", "question2.1.1", "question2.1.2", 
"question2.2.1", "question2.2.2"), Value = c(1, 3, 1, 4, 1, 3, 
1, 4)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-8L))

...And continuing with all observations there are in the original dataframe. Any idea how to do this?

Thank you in advance for your comments and assistance.

CodePudding user response:

If you have to do a lot of this kind of data transformation, then I recommend the unpivotr package. There is a related package called tidyxl that is useful for reading Excel files in a raw, cell-by-cell way. This can be useful when column headers are in text, but the cells below are numbers, dates, logical, etc. Also, you can even use formatting information which is sometimes necessary for properly extracting information from an xlsx file.

The package author made a free online book Spreadsheet Munging Strategies that goes through many situations. In your case, you could use

library(tidyverse)
library(unpivotr)

start %>% 
    as_cells() %>% 
    behead("up", Dimension) %>% 
    behead("up", Subdimension) %>%
    behead("up", Question) %>%
    behead("left", name_var1) %>% 
    behead("left", name_var2) %>% 
    behead("left", name_var3) %>% 
    behead("left", name_var4) %>% 
    select(name_var1:name_var4, Dimension:Question, Value = chr)

where start is your starting data frame. Actually, it's best to read the raw excel file using the tidyxl package, but it's not necessary.

  • Related