Home > Net >  Transform columns to rows with a condition in r
Transform columns to rows with a condition in r

Time:10-09

I m trying to merge two columns into one by considering different values and place them into another row if there is. Here is what my dataset looks like.

df <- data.frame(
  id = c(1,2,3),
  role = c("A","B","C"),
  grade.1 = c(3,4,5),
  state.1 = c(1,NA,1),
  grade.2 = c(4,4,5),
  state.2 = c(1,1,NA),
  grade.3 = c(3,4,5),
  state.3 = c(1,1,NA))

> df
  id role grade.1 state.1 grade.2 state.2 grade.3 state.3
1  1    A       3       1       4       1       3       1
2  2    B       4      NA       4       1       4       1
3  3    C       5       1       5      NA       5      NA

I need to merge these grade.1, grade.2 and grade.3 columns into one Grade column. I tried coalesce but it lost id=1 information since it has two grades across grade. columns. ALso, state. mapping did not work as well.

df <- df %>%
  mutate(Grade = coalesce(grade.1, grade.2, grade.3))

> df
  id role grade.1 state.1 grade.2 state.2 grade.3 state.3 Grade
1  1    A       3       1       4       1       3       1     3
2  2    B       4      NA       4       1       4       1     4
3  3    C       5       1       5      NA       5      NA     5

What I would like to have is add another row for id=1 and place the second row by adding second grade. My expected dataset is:

> df.2
  id role Grade state.1 state.2 state.3
1  1    A     3       1      NA       1
2  1    A     4      NA       1      NA
3  2    B     4      NA       1       1
4  3    C     5       1      NA      NA

So when there are multiple grades for an id, it needs to be placed in a different row, and state. mapping should be based on that grade. Any ideas? Thanks!

CodePudding user response:

One way to do this -

  • Get the data in long format.
  • For each id add NA to every value except the current column number in state.
  • For every unique value in grade column get the non-NA value.
library(dplyr)
library(tidyr)

df %>%
  pivot_longer(cols = starts_with('grade'), 
               values_to = 'grade', names_to = NULL) %>%
  group_by(id) %>%
  mutate(across(starts_with('state'), 
                ~replace(., -as.numeric(sub('state.', '', cur_column(), fixed = TRUE)), NA))) %>%
  group_by(id, role, grade) %>%
  summarise(across(starts_with('state'), ~.x[!is.na(.x)][1]), .groups = 'drop')

#     id role  grade state.1 state.2 state.3
#  <dbl> <chr> <dbl>   <dbl>   <dbl>   <dbl>
#1     1 A         3       1      NA       1
#2     1 A         4      NA       1      NA
#3     2 B         4      NA       1       1
#4     3 C         5       1      NA      NA
  

CodePudding user response:

I'm not sure if this is exactly what you're looking for. Or if this is the best way to do it. But here's what I have so far:

# read them in as 3 tables
df1 <- df[,.(id, role, grade = grade.1, state.1)]
df2 <- df[,.(id, role, grade = grade.2, state.2)]
df3 <- df[,.(id, role, grade = grade.3, state.3)]
# set the keys to do joins
setkey(df1, id, role, grade)
setkey(df2, id, role, grade)
setkey(df3, id, role, grade)
df_res <- rbind(
  df1[df2[df3]],
  df1[df3[df2]],
  df2[df3[df1]],
  df2[df1[df3]],
  df3[df1[df2]],
  df3[df2[df1]],
  fill = T
)
unique(df_res)[order(id)]
>    id role grade state.1 state.2 state.3
1:  1    A     3       1      NA       1
2:  1    A     4      NA       1      NA
3:  2    B     4      NA       1       1
4:  3    C     5       1      NA      NA
  • Related