Adding new information to a table upon matching rows-CodePudding

I have very basic knowledge of R. I have two tabs (A and B) with rows I want to compare - some values match and some don't. I want R to find the matching elements and add the text value "E" to a pre-existing row in tab A if this is the case.

Example:

Tab A

   ID   Existing?
1  A   
2  B  
3  C   
4  D  
5  E

Tab B

Upon match:

Tab A

   ID   Existing?
1  A       E
2  B       E
3  C   
4  D       E
5  E

I have found information online on how to match tables but none on how to write new information when the match takes place.

Please explain like I'm 5... I have no programming background.

Thank you in advance!

CodePudding user response：

Use match to get the elements in df1$ID that are also in df2$ID, and ifelse to recode the values that are both in df1 and in df2 with "E", and NA otherwise.

df1 <- data.frame(ID = LETTERS[1:5])
df2 <- data.frame(ID = c("D", "B", "Y", "A", "W"))

df1$Existing <- ifelse(match(df1$ID, df2$ID), "E", NA)

  ID Existing
1  A        E
2  B        E
3  C     <NA>
4  D        E
5  E     <NA>

CodePudding user response：

Another solution - using dplyr - would be to join the two dataframes, where you have added the column Existing to the one being joined:

library(dplyr, warn.conflicts = FALSE)

df1 <- tibble(ID = LETTERS[1:5])
df2 <- tibble(ID = c("D", "B", "Y", "A", "W"))

df1 %>% 
  left_join(df2 %>% mutate(Existing = "E"))
#> Joining, by = "ID"
#> # A tibble: 5 x 2
#>   ID    Existing
#>   <chr> <chr>   
#> 1 A     E       
#> 2 B     E       
#> 3 C     <NA>    
#> 4 D     E       
#> 5 E     <NA>

This will set all matching IDs to E and all non-matching to NA.

CodePudding user response：

# data

tab1 <- structure(list(ID = c("A", "B", "C", "D", "E"), Existing = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_)), class = "data.frame", row.names = c(NA,
-5L))

tab2 <- structure(list(ID = c("D", "B", "Y", "A", "W")), class = "data.frame", row.names = c(NA,
-5L))

There are many ways to skin this cat. In base-R, you could try, e.g.,

tab1$Existing[tab1$ID %in% tab2$ID] <- 'E'

In practise, for anything more complicated than tables with 6 rows, you could try dplyr:

library(dplyr)
tab1 %>% mutate(Existing = ifelse(ID %in% tab2$ID, 'E',NA))

Another useful tool -- with a slightly differing syntax -- is data.table.

library(data.table)
setDT(tab1) -> tab1
setDT(tab2) -> tab2
tab1[,Existing := ifelse(tab1$ID %in% tab2$ID, 'E',NA)]

Note that, here mutate and := play roughly the same role. Probably, if you work more with R, you will develop an affinity with one of the "dialects" above.

EDIT: To drop the rows NA values values (in dplyr), you could either do:

    tab1 %>% mutate(Existing = ifelse(ID %in% tab2$ID, 'E',NA)) %>% 
filter(!is.na(Existing))

Or piggy-backing on @jpiversen's solution:

df1 %>% 
  inner_join(df2 %>% mutate(Existing = "E"))