r transfer values from one dataset to another by ID-CodePudding

I have two datasets , the first dataset is like this

   ID     Weight     State
   1      12.34      NA
   2      11.23      IA
   2      13.12      IN
   3      12.67      MA 
   4      10.89      NA
   5      14.12      NA

The second dataset is a lookup table for state values by ID

   ID    State
   1     WY
   2     IA
   3     MA
   4     OR
   4     CA
   5     FL

As you can see there are two different state values for ID 4, which is normal.

What I want to do is replace the NAs in dataset1 State column with State values from dataset 2. Expected dataset

  ID     Weight     State
   1      12.34      WY
   2      11.23      IA
   2      13.12      IN
   3      12.67      MA 
   4      10.89      OR,CA
   5      14.12      FL

Since ID 4 has two state values in dataset2 , these two values are collapsed and separated by , and used to replace the NA in dataset1. Any suggestion on accomplishing this is much appreciated. Thanks in advance.

CodePudding user response：

Collapse df2 value and join it with df1 by 'ID'. Use coalesce to use non-NA value from the two state columns.

library(dplyr)

df1 %>%
  left_join(df2 %>%
              group_by(ID) %>%
              summarise(State = toString(State)), by = 'ID') %>%
  mutate(State = coalesce(State.x, State.y)) %>%
  select(-State.x, -State.y)

#  ID Weight  State
#1  1   12.3     WY
#2  2   11.2     IA
#3  2   13.1     IN
#4  3   12.7     MA
#5  4   10.9 OR, CA
#6  5   14.1     FL

In base R with merge and transform.

merge(df1, aggregate(State~ID, df2, toString), by = 'ID') |>
  transform(State = ifelse(is.na(State.x), State.y, State.x))

CodePudding user response：

Tidyverse way:

library(tidyverse)
df1 %>%
  left_join(df2 %>%
              group_by(ID) %>%
              summarise(State = toString(State)) %>%
              ungroup(), by = 'ID') %>%
  transmute(ID, Weight, State = coalesce(State.x, State.y))

Base R alternative:

na_idx <- which(is.na(df1$State))
df1$State[na_idx] <- with(
  aggregate(State ~ ID, df2, toString),
  State[match(df1$ID, ID)]
)[na_idx]

Data:

df1 <- structure(list(ID = c(1L, 2L, 2L, 3L, 4L, 5L), Weight = c(12.34, 
11.23, 13.12, 12.67, 10.89, 14.12), State = c("WY", "IA", "IN", 
"MA", "OR, CA", "FL")), row.names = c(NA, -6L), class = "data.frame")

df2 <- structure(list(ID = c(1L, 2L, 3L, 4L, 4L, 5L), State = c("WY", 
"IA", "MA", "OR", "CA", "FL")), class = "data.frame", row.names = c(NA, 
-6L))