Home > Mobile >  My data has two headers (variable type and grouping factor). How can I split the header and turn the
My data has two headers (variable type and grouping factor). How can I split the header and turn the

Time:10-22

My csv file has two headers, like this:

 Run #1,Run #1,Run #2,Run #2 
 Angle,Light,Angle,Light 
 a,b,c,d    
 e,f,g,h   

The first header gives the measurement number, the second the type of measurement. I would like my data to look like this:

Run Angle Light
1    a      b
1    e      f
2    c      d
2    g      h     

To read the table into R, I had to read in the headers separately using scan and then merge them into one a single header:

header <- scan(file, nlines = 1, sep=",",what = character())
header2 <- scan(file, skip = 1, nlines = 1, sep = ",", what = character())

df<- read.table(file, sep=",", header=F, skip=2)
names(df) <- paste(header, header2, sep = "_")

So I ended up with this:

structure(list(`Run #1_Angle` = c(0, 0.01, 0.02, 0.03), `Run #1_Light` = c(0, 
0, 0, 0), `Run #2_Angle` = c(NA, 0, 0, 0), 
    `Run #2_Light` = c(NA, NA, 0, 0)), row.names = c(NA, 
4L), class = "data.frame")

I assumed that I would have to gather, then spread:

df_fix<-df %>%
  gather()%>%
  separate(key, into = c('run', 'variable'), sep = "_") %>% 
  mutate(variable=as.factor(variable)) %>% 
  mutate(run=as.factor(run)) %>% 
  group_by(run) %>% 
  spread(variable, value)

Which gives me this error:

Error: Each row of output must be identified by a unique combination of keys. Keys are shared for 526500 rows.

526500 rows is my entire dataset, so I'm not quite sure what this means and how to avoid it? Alternatively, is there a different way of keeping part of the header and transforming the other part into a column?

CodePudding user response:

Instead of gather/spread, we may use pivot_longer to convert the 'wide' format to 'long' format

library(dplyr)
library(tidyr)
df1 %>% 
   pivot_longer(cols = everything(), names_to = c("Run", ".value"), 
      names_pattern = ".*\\s #(\\d )_(\\w )", values_drop_na = TRUE)
# A tibble: 7 × 3
  Run   Angle Light
  <chr> <dbl> <dbl>
1 1      0        0
2 1      0.01     0
3 2      0       NA
4 1      0.02     0
5 2      0        0
6 1      0.03     0
7 2      0        0
  •  Tags:  
  • r
  • Related