Home > Software design >  R: reshaping data frame - add column(s), while keeping row index values consistent
R: reshaping data frame - add column(s), while keeping row index values consistent

Time:09-14

I would like to reshape the following data frame using dplyr::mutate(): The goal is to create a new column variable percent but ensuring that the values correspond to the correct subject. The data is currently in the "long" format. Detailed: I would like to extract rows pertaining the pattern "percentQ" and create a new column called percent based on each subject to ensure that the score aligns with the corresponding subject.

df_long <- structure(list(id = c(NA,NA, "scoreQ1", NA, "scoreQ2", NA, NA,"percentQ1", "percentQ2", NA, "GPA"), 
                          subject = c(NA,NA, "Chris", NA, "Liz", NA, NA, "Chris","Liz", NA, NA),
                          grade = c(NA,NA, 45L, NA, 60L, NA, NA, 75L, 100L, NA)), row.names = c(NA,-11L), class = c("data.table", "data.frame"))

print(df_long)

#id       subject   grade
#<NA>      <NA>      NA
#<NA>      <NA>      NA
#scoreQ1   Chris     45
#<NA>      <NA>      NA
#scoreQ2   Liz       60
#<NA>      <NA>      NA
#<NA>      <NA>      NA
#percentQ1 Chris     75
#percentQ2 Liz       100
#<NA>      <NA>      NA
#GPA       <NA>      NA

Please suggest R scripts that would allow to reshape the data frame to the following: As can be seen, the percent value corresponds to the correct subject, in this instance 75 for Chris and 100 for Liz. I have been running into issues, where I was unable to assign the percent values to the correct subject.

df_wide <- structure(list(id = c(NA,NA, "scoreQ1", NA, "scoreQ2", NA, NA, NA, "GPA"), 
                          subject = c(NA,NA, "Chris", NA, "Liz", NA, NA, NA, NA),
                          grade = c(NA,NA, 45L, NA, 60L, NA, NA, NA, NA),
                          percent = c(NA,NA, 75L, NA, 100L, NA, NA, NA, NA)), row.names =  c(NA,-9L), class = c("data.table", "data.frame"))

print(df_wide)

#id       subject  grade   percent
#<NA>     <NA>     NA      NA
#<NA>     <NA>     NA      NA
#scoreQ1  Chris    45      75
#<NA>     <NA>     NA      NA
#scoreQ2  Liz      60     100
#<NA>     <NA>     NA      NA
#<NA>     <NA>     NA      NA
#<NA>     <NA>     NA      NA
#GPA     <NA>      NA      NA

CodePudding user response:

I think your task can be completed with a join process.

Here, we can filter out the values with percent in the id columns, then left_join it to the data_frame after using rename.

library(dplyr)

percent <- grepl("percent", df_long$id)

df_long |> 
  filter(!percent) |> 
  left_join(
    filter(df_long, percent) |> 
      rename(percent = "grade") |> 
      select(-id), by = "subject"
    )
       id subject grade percent
1:    <NA>    <NA>    NA      NA
2:    <NA>    <NA>    NA      NA
3: scoreQ1   Chris    45      75
4:    <NA>    <NA>    NA      NA
5: scoreQ2     Liz    60     100
6:    <NA>    <NA>    NA      NA
7:    <NA>    <NA>    NA      NA
8:    <NA>    <NA>    NA      NA
9:     GPA    <NA>    NA      NA

Data

df_long <- structure(list(id = c(NA, NA, "scoreQ1", NA, "scoreQ2", NA, NA, 
"percentQ1", "percentQ2", NA, "GPA"), subject = c(NA, NA, "Chris", 
NA, "Liz", NA, NA, "Chris", "Liz", NA, NA), grade = c(NA, NA, 
45L, NA, 60L, NA, NA, 75L, 100L, NA, NA)), row.names = c(NA, 
-11L), class = c("data.table", "data.frame"))

CodePudding user response:

I may be missing something, but you aren't really moving from long form to wide form, you are just adding an additional column. Assuming that the top mark is 60 then:

library(tidyverse)

df2 <-
  df_long |>
  mutate(percent = (grade/60)*100)

should do the trick.

  • Related