I am trying to read in multiple excel spreadsheets in which the data is organized into tables. (Example of what the spreadsheet looks like currently in the image):
Data in Excel to be transformed in R, has repeated tables
I need the experimental unit (cow) as its own column in analysis, but it is only located in the header of each table. Any ideas of the best way to rearrange the data in R? The first row of the spreadsheet is also read in as the column names, which I obviously don't really want either. Here is a reproducible example of what the sheet looks like when read into R:
r1<-c("id","Cow 5590","...2","...3","Control","...4","...5","...6","...7","...8")
r2<-c("","","date","day","intake1","intake2","eattime","visits","dmi","vtime")
r3<-c("1","","43173","-21","24384","14.6","214","10","1.46","21.4")
r4<-c("1"," ","43174","-20","24374","17.6","217","17","1.76","27.4")
r5<-c("id","Cow 5591","...2","...3","Control","...4","...5","...6","...7","...8")
r6<-c("","","date","day","intake1","intake2","eattime","visits","dmi","vtime")
r7<-c("1"," ","43173","-21","24364","15.6","234","12","1.36","22.4")
r8<-c("1"," ","43174","-20","24884","18.6","284","18","1.86","28.4")
r9<-c("id","Cow 5592","...2","...3","Control","...4","...5","...6","...7","...8")
r10<-c("","","date","day","intake1","intake2","eattime","visits","dmi","vtime")
r11<-c("1"," ","43173","-21","24564","15.5","234","15","1.56","24.4")
r12<-c("1"," ","43174","-20","24584","15.6","254","18","1.85","34.4")
df<-data.frame(matrix(ncol=10,nrow=11))
colnames(df)<-r1
df[1,]<-r2
df[2,]<-r3
df[3,]<-r4
df[4,]<-r5
df[5,]<-r6
df[6,]<-r7
df[7,]<-r8
df[8,]<-r9
df[9,]<-r10
df[10,]<-r11
df[11,]<-r12
This is what I want it to look like:
c1<-c("id","Cow", "date","day","intake1","intake2","eattime","visits","dmi","vtime")
c2<-c("1","Cow 5590","43173","-21","24384","14.6","214","10","1.46","21.4")
c3<-c("1","Cow 5590","43174","-20","24374","17.6","217","17","1.76","27.4")
c4<-c("1","Cow 5591","43173","-21","24364","15.6","234","12","1.36","22.4")
c5<-c("1","Cow 5591","43174","-20","24884","18.6","284","18","1.86","28.4")
c6<-c("1","Cow 5592","43173","-21","24564","15.5","234","15","1.56","24.4")
c7<-c("1","Cow 5592","43174","-20","24584","15.6","254","18","1.85","34.4")
df2<-data.frame(matrix(ncol=10,nrow=6))
colnames(df2)<-c1
df2[1,]<-c2
df2[2,]<-c3
df2[3,]<-c4
df2[4,]<-c5
df2[5,]<-c6
df2[6,]<-c7
First time poster. I appreciate any help.
CodePudding user response:
We may do
library(dplyr)
library(tidyr)
library(janitor)
df %>%
mutate(`Cow 5590` = na_if(trimws(`Cow 5590`), "")) %>%
fill(`Cow 5590`) %>%
mutate(`Cow 5590` = replace_na(`Cow 5590`, "Cow 5590")) %>%
mutate(id = replace(id, 1, 'id')) %>%
row_to_names(1) %>%
filter(!id %in% c("id", "")) %>%
rename(Cow = `Cow 5590`) %>%
type.convert(as.is = TRUE)
-output
id Cow date day intake1 intake2 eattime visits dmi vtime
1 1 Cow 5590 43173 -21 24384 14.6 214 10 1.46 21.4
2 1 Cow 5590 43174 -20 24374 17.6 217 17 1.76 27.4
3 1 Cow 5591 43173 -21 24364 15.6 234 12 1.36 22.4
4 1 Cow 5591 43174 -20 24884 18.6 284 18 1.86 28.4
5 1 Cow 5592 43173 -21 24564 15.5 234 15 1.56 24.4
6 1 Cow 5592 43174 -20 24584 15.6 254 18 1.85 34.4