This is the extended version of the previous question. I'm trying to make column names from the rows with date. The problem is the dataset has mostly two patterns with one or multiple empty columns.
# Pattern 1: date in single row
df1 <- data.frame(student=c('', 'B', 'C', 'D', 'E'),
score1=c('', '', '', '', ''),
score2=c('May 30, 2021', '', '31', '39', '35'),
score3=c('June 30, 2022', '', '33', '37', '34'))
df1
student score1 score2 score3
<chr> <chr> <chr> <chr>
May 30, 2021 June 30, 2022
B
C 31 33
D 39 37
E 35 34
# Pattern 2: date in two rows
df2 <- data.frame(student=c('', '', 'C', 'D', 'E'),
score1=c('', '', '', '', ''),
score2=c('May 30,', '2021', '', '39', '35'),
score3=c('June 30,', '2022', '', '37', '34'))
df2
student score1 score2 score3
<chr> <chr> <chr> <chr>
May 30, June 30,
2021 2022
C
D 39 37
E 35 34
I want to change the row/rows with date -- sometime one row, sometime two rows -- from score
columns and changed it into column name and then remove the entire row. @jpsmith gave a great suggestion, which nicely works for the pattern 1, except pattern 2. It works for the pattern 2 only if the 2nd row isn't empty.
ex_fun <- function(x){
if(is.na(as.numeric(x[2,2])) > 1000){
x[1,2] <- paste(x[1:2, 2], collapse = ", ")
}
x <- x %>% janitor::row_to_names(row_number = 1)
names(x)[1] <- "student"
x[-1,]
}
ex_fun(df2)
student May 30, June 30,
<chr> <chr> <chr> <chr>
3 C
4 D 39 37
5 E 35 34
Desired Output
student May 30, 2021 June 30, 2022
<chr> <chr> <chr> <chr>
3 C
4 D 39 37
5 E 35 34
Any suggestions would be appreciated. Thanks!
CodePudding user response:
Try this:
names(df1)[grepl("score", names(df1))] <- ""
names(df1)[-1] <- sapply(subset(df1, !nzchar(student), select = -1), paste, collapse = " ")
df1 <- subset(df1, nzchar(student))
df1
# student May 30, 2021 June 30, 2022
# 2 B
# 3 C 31 33
# 4 D 39 37
# 5 E 35 34
names(df2)[grepl("score", names(df2))] <- ""
names(df2)[-1] <- sapply(subset(df2, !nzchar(student), select = -1), paste, collapse = " ")
df2 <- subset(df2, nzchar(student))
df2
# student May 30, 2021 June 30, 2022
# 3 C
# 4 D 39 37
# 5 E 35 34
CodePudding user response:
Try this function, it works with your two df.
ex_fun <- function(x){
for(i in 2:4)
{
if(str_detect(x[1,i],","))
{
if(length(str_split(x[1,i],", ")[[1]]) == 1)
names(x)[i] = x[1,i]
else
names(x)[i] <- paste(x[1:2,i], collapse = " ")
}
}
x[-c(1,2),]
}