My csv file has two headers, like this:
Run #1,Run #1,Run #2,Run #2
Angle,Light,Angle,Light
a,b,c,d
e,f,g,h
The first header gives the measurement number, the second the type of measurement. I would like my data to look like this:
Run Angle Light
1 a b
1 e f
2 c d
2 g h
To read the table into R, I had to read in the headers separately using scan and then merge them into one a single header:
header <- scan(file, nlines = 1, sep=",",what = character())
header2 <- scan(file, skip = 1, nlines = 1, sep = ",", what = character())
df<- read.table(file, sep=",", header=F, skip=2)
names(df) <- paste(header, header2, sep = "_")
So I ended up with this:
structure(list(`Run #1_Angle` = c(0, 0.01, 0.02, 0.03), `Run #1_Light` = c(0,
0, 0, 0), `Run #2_Angle` = c(NA, 0, 0, 0),
`Run #2_Light` = c(NA, NA, 0, 0)), row.names = c(NA,
4L), class = "data.frame")
I assumed that I would have to gather, then spread:
df_fix<-df %>%
gather()%>%
separate(key, into = c('run', 'variable'), sep = "_") %>%
mutate(variable=as.factor(variable)) %>%
mutate(run=as.factor(run)) %>%
group_by(run) %>%
spread(variable, value)
Which gives me this error:
Error: Each row of output must be identified by a unique combination of keys. Keys are shared for 526500 rows.
526500 rows is my entire dataset, so I'm not quite sure what this means and how to avoid it? Alternatively, is there a different way of keeping part of the header and transforming the other part into a column?
CodePudding user response:
Instead of gather/spread
, we may use pivot_longer
to convert the 'wide' format to 'long' format
library(dplyr)
library(tidyr)
df1 %>%
pivot_longer(cols = everything(), names_to = c("Run", ".value"),
names_pattern = ".*\\s #(\\d )_(\\w )", values_drop_na = TRUE)
# A tibble: 7 × 3
Run Angle Light
<chr> <dbl> <dbl>
1 1 0 0
2 1 0.01 0
3 2 0 NA
4 1 0.02 0
5 2 0 0
6 1 0.03 0
7 2 0 0