Home > Mobile >  Adding new column to dataframe using values of multiple columns and querying in another dataframe
Adding new column to dataframe using values of multiple columns and querying in another dataframe

Time:08-29

I have two dataframes like this:

First Second Third Fourth
12345 23423 12342 43214
23453 22342 23421 43214
23453 NA NA NA
Code Type
12345 A
23453 B
23423 Other
12342 A

I would like to add a fifth column to the first data frame based on the codes in each row. Using the second dataframe, if all the codes in a row are Type A, I would like to insert "A" into the fifth column.

If all the codes in a row are Type B, I would like to insert "B" in the fifth column.

If all the codes in a row are Type "Other", I would like to insert "Other".

If there are codes that are both Type A and B, I would like to insert "Both".

If there are codes that are both Type A, B, or A and B and Other, I would like to insert A, B, or Both, respectively.

I would also like to ignore all the NAs in a row.

Thank you so much!

CodePudding user response:

A bit (probably over-)complicated. First join all columns with their respective code, then unite columns and remove NA. Then check for unique values with the aid of map and finally add the rule for "both".

library(tidyverse)

# add row to show whether rule "Both" works
df <- df %>% add_row(First = 12345, Second = 23453)

df %>%
  left_join(type_df, by = c("First" = "Code")) %>%
  left_join(type_df, by = c("Second" = "Code")) %>%
  left_join(type_df, by = c("Third" = "Code")) %>%
  left_join(type_df, by = c("Fourth" = "Code")) %>%
  tidyr::unite(contains("Type"), col = "type", na.rm = TRUE, sep = ",") %>%
  mutate(type = stringr::str_split(type, ",") %>%
    map(., ~ unique(.x)) %>%
    map_chr(., ~ paste(.x, collapse = ", "))) %>%
  mutate(type = case_when(
    str_detect(type, "A") & str_detect(type, "B") ~ "Both",
    TRUE ~ type
  ))

Output:

  First Second Third Fourth     type
1 12345  23423 12342  43214 A, Other
2 23453  22342 23421  43214        B
3 23453     NA    NA     NA        B
4 12345  23453    NA     NA     Both

Data:

df <- read.table(text = "First  Second  Third   Fourth
12345   23423   12342   43214
23453   22342   23421   43214
23453   NA  NA  NA", h=T)

type_df <- read.table(text = "Code  Type
12345   A
23453   B
23423   Other
12342   A", h = T)
  •  Tags:  
  • r
  • Related