Home > Mobile >  r transfer values from one dataset to another by ID
r transfer values from one dataset to another by ID

Time:10-12

I have two datasets , the first dataset is like this

   ID     Weight     State
   1      12.34      NA
   2      11.23      IA
   2      13.12      IN
   3      12.67      MA 
   4      10.89      NA
   5      14.12      NA

The second dataset is a lookup table for state values by ID

   ID    State
   1     WY
   2     IA
   3     MA
   4     OR
   4     CA
   5     FL

As you can see there are two different state values for ID 4, which is normal.

What I want to do is replace the NAs in dataset1 State column with State values from dataset 2. Expected dataset

  ID     Weight     State
   1      12.34      WY
   2      11.23      IA
   2      13.12      IN
   3      12.67      MA 
   4      10.89      OR,CA
   5      14.12      FL

Since ID 4 has two state values in dataset2 , these two values are collapsed and separated by , and used to replace the NA in dataset1. Any suggestion on accomplishing this is much appreciated. Thanks in advance.

CodePudding user response:

Collapse df2 value and join it with df1 by 'ID'. Use coalesce to use non-NA value from the two state columns.

library(dplyr)

df1 %>%
  left_join(df2 %>%
              group_by(ID) %>%
              summarise(State = toString(State)), by = 'ID') %>%
  mutate(State = coalesce(State.x, State.y)) %>%
  select(-State.x, -State.y)

#  ID Weight  State
#1  1   12.3     WY
#2  2   11.2     IA
#3  2   13.1     IN
#4  3   12.7     MA
#5  4   10.9 OR, CA
#6  5   14.1     FL

In base R with merge and transform.

merge(df1, aggregate(State~ID, df2, toString), by = 'ID') |>
  transform(State = ifelse(is.na(State.x), State.y, State.x))

CodePudding user response:

Tidyverse way:

library(tidyverse)
df1 %>%
  left_join(df2 %>%
              group_by(ID) %>%
              summarise(State = toString(State)) %>%
              ungroup(), by = 'ID') %>%
  transmute(ID, Weight, State = coalesce(State.x, State.y))

Base R alternative:

na_idx <- which(is.na(df1$State))
df1$State[na_idx] <- with(
  aggregate(State ~ ID, df2, toString),
  State[match(df1$ID, ID)]
)[na_idx]

Data:

df1 <- structure(list(ID = c(1L, 2L, 2L, 3L, 4L, 5L), Weight = c(12.34, 
11.23, 13.12, 12.67, 10.89, 14.12), State = c("WY", "IA", "IN", 
"MA", "OR, CA", "FL")), row.names = c(NA, -6L), class = "data.frame")

df2 <- structure(list(ID = c(1L, 2L, 3L, 4L, 4L, 5L), State = c("WY", 
"IA", "MA", "OR", "CA", "FL")), class = "data.frame", row.names = c(NA, 
-6L))
  • Related