I m trying to merge two columns into one by considering different values and place them into another row if there is. Here is what my dataset looks like.
df <- data.frame(
id = c(1,2,3),
role = c("A","B","C"),
grade.1 = c(3,4,5),
state.1 = c(1,NA,1),
grade.2 = c(4,4,5),
state.2 = c(1,1,NA),
grade.3 = c(3,4,5),
state.3 = c(1,1,NA))
> df
id role grade.1 state.1 grade.2 state.2 grade.3 state.3
1 1 A 3 1 4 1 3 1
2 2 B 4 NA 4 1 4 1
3 3 C 5 1 5 NA 5 NA
I need to merge these grade.1
, grade.2
and grade.3
columns into one Grade
column. I tried coalesce
but it lost id
=1
information since it has two grades across grade.
columns. ALso, state.
mapping did not work as well.
df <- df %>%
mutate(Grade = coalesce(grade.1, grade.2, grade.3))
> df
id role grade.1 state.1 grade.2 state.2 grade.3 state.3 Grade
1 1 A 3 1 4 1 3 1 3
2 2 B 4 NA 4 1 4 1 4
3 3 C 5 1 5 NA 5 NA 5
What I would like to have is add another row for id
=1
and place the second row by adding second grade. My expected dataset is:
> df.2
id role Grade state.1 state.2 state.3
1 1 A 3 1 NA 1
2 1 A 4 NA 1 NA
3 2 B 4 NA 1 1
4 3 C 5 1 NA NA
So when there are multiple grades for an id
, it needs to be placed in a different row, and state.
mapping should be based on that grade.
Any ideas?
Thanks!
CodePudding user response:
One way to do this -
- Get the data in long format.
- For each
id
addNA
to every value except the current column number instate
. - For every unique value in
grade
column get the non-NA value.
library(dplyr)
library(tidyr)
df %>%
pivot_longer(cols = starts_with('grade'),
values_to = 'grade', names_to = NULL) %>%
group_by(id) %>%
mutate(across(starts_with('state'),
~replace(., -as.numeric(sub('state.', '', cur_column(), fixed = TRUE)), NA))) %>%
group_by(id, role, grade) %>%
summarise(across(starts_with('state'), ~.x[!is.na(.x)][1]), .groups = 'drop')
# id role grade state.1 state.2 state.3
# <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
#1 1 A 3 1 NA 1
#2 1 A 4 NA 1 NA
#3 2 B 4 NA 1 1
#4 3 C 5 1 NA NA
CodePudding user response:
I'm not sure if this is exactly what you're looking for. Or if this is the best way to do it. But here's what I have so far:
# read them in as 3 tables
df1 <- df[,.(id, role, grade = grade.1, state.1)]
df2 <- df[,.(id, role, grade = grade.2, state.2)]
df3 <- df[,.(id, role, grade = grade.3, state.3)]
# set the keys to do joins
setkey(df1, id, role, grade)
setkey(df2, id, role, grade)
setkey(df3, id, role, grade)
df_res <- rbind(
df1[df2[df3]],
df1[df3[df2]],
df2[df3[df1]],
df2[df1[df3]],
df3[df1[df2]],
df3[df2[df1]],
fill = T
)
unique(df_res)[order(id)]
> id role grade state.1 state.2 state.3
1: 1 A 3 1 NA 1
2: 1 A 4 NA 1 NA
3: 2 B 4 NA 1 1
4: 3 C 5 1 NA NA