Home > database >  Take Dates from Row Values and Change Into Column Names Using R
Take Dates from Row Values and Change Into Column Names Using R

Time:12-28

This is the extended version of the previous question. I'm trying to make column names from the rows with date. The problem is the dataset has mostly two patterns with one or multiple empty columns.

# Pattern 1: date in single row
df1 <- data.frame(student=c('', 'B', 'C', 'D', 'E'),
                score1=c('', '', '', '', ''),
                score2=c('May 30, 2021', '', '31', '39', '35'),
                score3=c('June 30, 2022', '', '33', '37', '34'))
df1                                                                              ​                     ​
student   score1     score2       score3
<chr>     <chr>      <chr>        <chr>
                 May 30, 2021  June 30, 2022
B           
C                     31            33
D                     39            37
E                     35            34 

# Pattern 2: date in two rows
df2 <- data.frame(student=c('', '', 'C', 'D', 'E'),
                score1=c('', '', '', '', ''),
                score2=c('May 30,', '2021', '', '39', '35'),
                score3=c('June 30,',  '2022', '', '37', '34'))
df2

student       score1        score2       score3
<chr>          <chr>         <chr>       <chr>
                             May 30,    June 30,
                              2021       2022
C           
D                              39         37
E                              35         34 

I want to change the row/rows with date -- sometime one row, sometime two rows -- from score columns and changed it into column name and then remove the entire row. @jpsmith gave a great suggestion, which nicely works for the pattern 1, except pattern 2. It works for the pattern 2 only if the 2nd row isn't empty.

ex_fun <- function(x){
  if(is.na(as.numeric(x[2,2])) > 1000){
    x[1,2] <- paste(x[1:2, 2], collapse = ", ")
  } 
  x <- x %>% janitor::row_to_names(row_number = 1) 
  names(x)[1] <- "student"
  x[-1,]
}

ex_fun(df2)

  student           May 30,       June 30,
  <chr>    <chr>    <chr>          <chr>
3   C           
4   D                39             37
5   E                35             34

Desired Output

    student         May 30, 2021    June 30, 2022
    <chr>   <chr>     <chr>            <chr>
3    C          
4    D                 39               37
5    E                 35               34

Any suggestions would be appreciated. Thanks!

CodePudding user response:

Try this:

names(df1)[grepl("score", names(df1))] <- ""
names(df1)[-1] <- sapply(subset(df1, !nzchar(student), select = -1), paste, collapse = " ")
df1 <- subset(df1, nzchar(student))
df1
#   student  May 30, 2021 June 30, 2022
# 2       B                            
# 3       C            31            33
# 4       D            39            37
# 5       E            35            34

names(df2)[grepl("score", names(df2))] <- ""
names(df2)[-1] <- sapply(subset(df2, !nzchar(student), select = -1), paste, collapse = " ")
df2 <- subset(df2, nzchar(student))
df2
#   student   May 30, 2021 June 30, 2022
# 3       C                             
# 4       D             39            37
# 5       E             35            34

CodePudding user response:

Try this function, it works with your two df.

  ex_fun <- function(x){
  for(i in 2:4)
  {
    if(str_detect(x[1,i],","))
    {
      if(length(str_split(x[1,i],", ")[[1]]) == 1)
        names(x)[i] = x[1,i]
      else
        names(x)[i] <- paste(x[1:2,i], collapse = " ")
    }
  }
  x[-c(1,2),]
}
  • Related