I have a data frame like this:
structure(list(...1 = c(NA, NA, "name_var1", "obs1_var1", "obs2_var1"
), ...2 = c(NA, NA, "name_var2", "obs1_var2", "obs2_var2"), ...3 = c(NA,
NA, "name_var3", "obs1_var3", "obs2_var3"), ...4 = c("Dimension",
"Subdimension", "name_var4", "obs1_var4", "obs2_var4"), ...5 = c("Dimension1",
"Subdimension1", "question1.1.1", "1", "4"), ...6 = c("Dimension1",
"Subdimension1", "question1.1.2", "3", "2"), ...5.1 = c("Dimension1",
"Subdimension2", "question1.2.1", "1", "2"), ...5.2 = c("Dimension1",
"Subdimension2", "question1.2.2", "4", "1"), ...5.3 = c("Dimension2",
"Subdimension1", "question2.1.1", "1", "4"), ...6.1 = c("Dimension2",
"Subdimension1", "question2.1.2", "3", "2"), ...5.4 = c("Dimension2",
"Subdimension2", "question2.2.1", "1", "2"), ...5.5 = c("Dimension2",
"Subdimension2", "question2.2.2", "4", "1")), class = "data.frame", row.names = c(NA,
-5L))
and I'd like to transform into this one
structure(list(name_var1 = c("obs1_var1", "obs1_var1", "obs1_var1",
"obs1_var1", "obs1_var1", "obs1_var1", "obs1_var1", "obs1_var1"
), name_var2 = c("obs1_var2", "obs1_var2", "obs1_var2", "obs1_var2",
"obs1_var2", "obs1_var2", "obs1_var2", "obs1_var2"), name_var3 = c("obs1_var3",
"obs1_var3", "obs1_var3", "obs1_var3", "obs1_var3", "obs1_var3",
"obs1_var3", "obs1_var3"), name_var4 = c("obs1_var4", "obs1_var4",
"obs1_var4", "obs1_var4", "obs1_var4", "obs1_var4", "obs1_var4",
"obs1_var4"), Dimension = c("Dimension1", "Dimension1", "Dimension1",
"Dimension1", "Dimension2", "Dimension2", "Dimension2", "Dimension2"
), Subdimension = c("Subdimension1", "Subdimension1", "Subdimension2",
"Subdimension2", "Subdimension1", "Subdimension1", "Subdimension2",
"Subdimension2"), Question = c("question1.1.1", "question1.1.2",
"question1.2.1", "question1.2.2", "question2.1.1", "question2.1.2",
"question2.2.1", "question2.2.2"), Value = c(1, 3, 1, 4, 1, 3,
1, 4)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-8L))
...And continuing with all observations there are in the original dataframe. Any idea how to do this?
Thank you in advance for your comments and assistance.
CodePudding user response:
If you have to do a lot of this kind of data transformation, then I recommend the unpivotr
package. There is a related package called tidyxl
that is useful for reading Excel files in a raw, cell-by-cell way. This can be useful when column headers are in text, but the cells below are numbers, dates, logical, etc. Also, you can even use formatting information which is sometimes necessary for properly extracting information from an xlsx
file.
The package author made a free online book Spreadsheet Munging Strategies that goes through many situations. In your case, you could use
library(tidyverse)
library(unpivotr)
start %>%
as_cells() %>%
behead("up", Dimension) %>%
behead("up", Subdimension) %>%
behead("up", Question) %>%
behead("left", name_var1) %>%
behead("left", name_var2) %>%
behead("left", name_var3) %>%
behead("left", name_var4) %>%
select(name_var1:name_var4, Dimension:Question, Value = chr)
where start
is your starting data frame. Actually, it's best to read the raw excel file using the tidyxl
package, but it's not necessary.