I'm trying to create column names from rows that contain a date. Take the following dataset, for instance:
df <- data.frame(student=c('', '', 'C', 'D', 'E'),
                 scores=c('May 30, 2022', 35, 31, 39, 35))
df
  student       scores
1         May 30, 2022
2                   35
3       C           31
4       D           39
5       E           35
I want to take row 1 of the scores column (the date), use it as the column name, and then remove that entire row. I'm using the following script (row_to_names from the janitor package) to set the column name:
library(janitor)  # row_to_names()
library(dplyr)    # %>%
df %>%
  row_to_names(row_number = 1)
        May 30, 2022
  <chr> <chr>
2       35
3 C     31
4 D     39
5 E     35
This works perfectly here. However, sometimes the date value is broken across 2 rows:
student scores
<chr>   <chr>
        May, 30
        2022
C       31
D       39
E       35
The previous script doesn't work here. What would be the ideal way to build the column name from the rows with a function, whether the date sits in a single row or is split across two rows?
Desired Output
  student May 30, 2022
1       C           31
2       D           39
3       E           35
Any suggestions would be appreciated. Thanks!
CodePudding user response:
Using your janitor::row_to_names approach, you can generalize this with a function:
df_1line <- data.frame(student=c('', '', 'C', 'D', 'E'),
                       scores=c('May 30, 2022', 35, 31, 39, 35))
df_2lines <- data.frame(student=c('', '', 'C', 'D', 'E'),
                        scores=c('May 30', 2022, 31, 39, 35))
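ex_fun <- function(x){
  # If row 2 holds the year, merge it into row 1
  # (quick approach which assumes "true" scores < 1000)
  if(as.numeric(x[2,2]) > 1000){
    x[1,2] <- paste(x[1:2, 2], collapse = ", ")
  }
  # Call row_to_names directly so no pipe needs to be attached
  x <- janitor::row_to_names(x, row_number = 1)
  names(x)[1] <- "student"
  x[-1,]  # drop row 2 (the year row, or the row with a blank student)
}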
ex_fun(df_1line)
#   student May 30, 2022
# 3       C           31
# 4       D           39
# 5       E           35
ex_fun(df_2lines)
#   student May 30, 2022
# 3       C           31
# 4       D           39
# 5       E           35
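The numeric threshold above relies on real scores staying below 1000, and as.numeric() would return NA (and the if() would fail) if row 2 ever held text. As a rough sketch only, here is a base R variant that instead checks whether row 2 looks like a four-digit year. The function name promote_date_header, its arguments, and the year pattern (1900-2099) are all assumptions for illustration, not part of janitor or of the answer above.

```r
# Hypothetical helper: name, arguments and year pattern are assumptions.
promote_date_header <- function(x, col = 2, first_col_name = "student") {
  values <- as.character(x[[col]])
  # Treat row 2 as the second half of the date only if it looks like a year
  two_rows <- nrow(x) >= 2 && grepl("^(19|20)[0-9]{2}$", trimws(values[2]))
  header <- if (two_rows) paste(values[1], values[2], sep = ", ") else values[1]
  names(x)[col] <- header
  names(x)[1] <- first_col_name
  # Drop the header row(s)
  out <- x[-seq_len(if (two_rows) 2 else 1), , drop = FALSE]
  # Drop leftover rows whose first column is blank
  keep <- !is.na(out[[1]]) & nzchar(trimws(out[[1]]))
  out <- out[keep, , drop = FALSE]
  rownames(out) <- NULL  # renumber rows from 1
  out
}

df_1line <- data.frame(student = c('', '', 'C', 'D', 'E'),
                       scores = c('May 30, 2022', 35, 31, 39, 35))
df_2lines <- data.frame(student = c('', '', 'C', 'D', 'E'),
                        scores = c('May 30', 2022, 31, 39, 35))

res_1line <- promote_date_header(df_1line)
res_2lines <- promote_date_header(df_2lines)
```

With the two example data frames, both calls should give the same three rows (C, D, E) under the columns student and May 30, 2022, with row numbers restarting at 1. Filtering on a blank first column is a design choice that suits this data; it would need adjusting if a real student could have an empty name.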
CodePudding user response:
Maybe this code will be helpful.
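if(as.numeric(df[2,2]) > 1000){
  # Date is split over rows 1 and 2: join them and drop both rows
  names(df)[2] <- paste0(df[1,2], ", ", df[2,2])
  df <- df[-1:-2,]
}else{
  # Date is on row 1 only: use it as the name and drop that row
  names(df)[2] <- df[1,2]
  df <- df[-1,]
}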