Transform columns to rows with a condition in r-CodePudding

I m trying to merge two columns into one by considering different values and place them into another row if there is. Here is what my dataset looks like.

df <- data.frame(
  id = c(1,2,3),
  role = c("A","B","C"),
  grade.1 = c(3,4,5),
  state.1 = c(1,NA,1),
  grade.2 = c(4,4,5),
  state.2 = c(1,1,NA),
  grade.3 = c(3,4,5),
  state.3 = c(1,1,NA))

> df
  id role grade.1 state.1 grade.2 state.2 grade.3 state.3
1  1    A       3       1       4       1       3       1
2  2    B       4      NA       4       1       4       1
3  3    C       5       1       5      NA       5      NA

I need to merge these grade.1, grade.2 and grade.3 columns into one Grade column. I tried coalesce but it lost id=1 information since it has two grades across grade. columns. ALso, state. mapping did not work as well.

df <- df %>%
  mutate(Grade = coalesce(grade.1, grade.2, grade.3))

> df
  id role grade.1 state.1 grade.2 state.2 grade.3 state.3 Grade
1  1    A       3       1       4       1       3       1     3
2  2    B       4      NA       4       1       4       1     4
3  3    C       5       1       5      NA       5      NA     5

What I would like to have is add another row for id=1 and place the second row by adding second grade. My expected dataset is:

> df.2
  id role Grade state.1 state.2 state.3
1  1    A     3       1      NA       1
2  1    A     4      NA       1      NA
3  2    B     4      NA       1       1
4  3    C     5       1      NA      NA

So when there are multiple grades for an id, it needs to be placed in a different row, and state. mapping should be based on that grade. Any ideas? Thanks!

CodePudding user response：

One way to do this -

Get the data in long format.
For each id add NA to every value except the current column number in state.
For every unique value in grade column get the non-NA value.

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(cols = starts_with('grade'), 
               values_to = 'grade', names_to = NULL) %>%
  group_by(id) %>%
  mutate(across(starts_with('state'), 
                ~replace(., -as.numeric(sub('state.', '', cur_column(), fixed = TRUE)), NA))) %>%
  group_by(id, role, grade) %>%
  summarise(across(starts_with('state'), ~.x[!is.na(.x)][1]), .groups = 'drop')

#     id role  grade state.1 state.2 state.3
#  <dbl> <chr> <dbl>   <dbl>   <dbl>   <dbl>
#1     1 A         3       1      NA       1
#2     1 A         4      NA       1      NA
#3     2 B         4      NA       1       1
#4     3 C         5       1      NA      NA

CodePudding user response：

I'm not sure if this is exactly what you're looking for. Or if this is the best way to do it. But here's what I have so far:

# read them in as 3 tables
df1 <- df[,.(id, role, grade = grade.1, state.1)]
df2 <- df[,.(id, role, grade = grade.2, state.2)]
df3 <- df[,.(id, role, grade = grade.3, state.3)]
# set the keys to do joins
setkey(df1, id, role, grade)
setkey(df2, id, role, grade)
setkey(df3, id, role, grade)
df_res <- rbind(
  df1[df2[df3]],
  df1[df3[df2]],
  df2[df3[df1]],
  df2[df1[df3]],
  df3[df1[df2]],
  df3[df2[df1]],
  fill = T
)
unique(df_res)[order(id)]
>    id role grade state.1 state.2 state.3
1:  1    A     3       1      NA       1
2:  1    A     4      NA       1      NA
3:  2    B     4      NA       1       1
4:  3    C     5       1      NA      NA