R: adding matching vector values from two dataframes in one column

Time:06-30

I have a data frame which is configured roughly like this:

df <- cbind(c('hello', 'yes', 'example'),c(7,8,5),c(0,0,0))
words frequency count
hello 7 0
yes 8 0
example 5 0

What I'm trying to do is add values to the third column from a different data frame, which is similar but looks like this:

df2 <- cbind(c('example','hello') ,c(5,6))
words frequency
example 5
hello 6

My goal is to find matching values for the first column in both data frames (they have the same column name) and add matching values from the second data frame to the third column of the first data frame.

The result should look like this:

df <- cbind(c('hello', 'yes', 'example'),c(7,8,5),c(6,0,5))
words frequency count
hello 7 6
yes 8 0
example 5 5

What I've tried so far is:

df <- merge(df,df2, by = "words", all.x=TRUE) 

However, it doesn't work.

I could use some help understanding how this can be done. Any help is welcome.

CodePudding user response:

This is an "update join". My favorite way to do it is in dplyr:

library(dplyr)
df %>% rows_update(rename(df2, count = frequency), by = "words")

In base R you could do the same thing like this:

names(df2)[2] = "count2"
df = merge(df, df2, by = "words", all.x=TRUE)
df$count = ifelse(is.na(df$count2), df$count, df$count2)
df$count2 = NULL
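
One thing to keep in mind with the merge() route is that merge() sorts the result by the key column, so the rows come back in a different order than they went in. A minimal sketch of the same update that also restores the original order, assuming the data from this question; the names `upd`, `out` and the helper column `row_id` are made up for illustration:

```r
df <- data.frame(words = c("hello", "yes", "example"),
                 frequency = c(7, 8, 5),
                 count = c(0, 0, 0))
df2 <- data.frame(words = c("example", "hello"), frequency = c(5, 6))

# Rename the incoming column so it does not clash with df$frequency
upd <- setNames(df2, c("words", "count2"))

# Hypothetical helper column that remembers the original row order
df$row_id <- seq_len(nrow(df))

# Left join: every row of df is kept, count2 is NA where df2 has no match
out <- merge(df, upd, by = "words", all.x = TRUE)

# Take the new value where one exists, otherwise keep the old count
out$count <- ifelse(is.na(out$count2), out$count, out$count2)

# Restore the original order and drop the helper columns
out <- out[order(out$row_id), c("words", "frequency", "count")]
rownames(out) <- NULL
```

If row order does not matter for your use case, the shorter version above is enough.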

CodePudding user response:

Here is an option with data.table:

library(data.table)

setDT(df)[setDT(df2), on = "words", count := i.frequency]

Output

     words frequency count
    <char>     <num> <num>
1:   hello         7     6
2:     yes         8     0
3: example         5     5

Or using match in base R:

df$count[match(df2$words, df$words)] <- df2$frequency
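
The match() one-liner above relies on every word in df2 also being present in df; a word that is missing from df would produce an NA index, which R generally rejects in a subscripted assignment. A hedged sketch that looks the words up in the other direction and skips the misses (the extra "absent" row and the names `idx` and `hit` are only there for illustration):

```r
df <- data.frame(words = c("hello", "yes", "example"),
                 frequency = c(7, 8, 5),
                 count = c(0, 0, 0))
# "absent" is a made-up extra row that has no counterpart in df
df2 <- data.frame(words = c("example", "hello", "absent"),
                  frequency = c(5, 6, 9))

# For each word in df, its position in df2 (NA when df2 lacks it)
idx <- match(df$words, df2$words)
hit <- !is.na(idx)

# Update only the rows of df that have a partner in df2
df$count[hit] <- df2$frequency[idx[hit]]
```

Rows without a match, such as "yes", keep their existing count.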

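Or another option with tidyverse using left_join and pmax, which keeps the larger of the old and new count (sufficient here because the existing counts are all 0):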

library(tidyverse)

left_join(df, df2 %>% rename(count.y = frequency), by = "words") %>%
  mutate(count = pmax(count.y, count, na.rm = TRUE)) %>%
  select(-count.y)

Data

df <- structure(list(words = c("hello", "yes", "example"), frequency = c(7, 
8, 5), count = c(0, 0, 0)), class = "data.frame", row.names = c(NA, 
-3L))

df2 <- structure(list(words = c("example", "hello"), frequency = c(5, 6)), class = "data.frame", row.names = c(NA, 
-2L))
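
The same inputs can also be written with data.frame() directly, which may be easier to read than the structure() dump. As a side note, and as a sketch rather than a diagnosis of the original problem: cbind() on a character vector and numeric vectors, as in the question, returns a character matrix without column names, not a data frame. The name `m` is just for illustration.

```r
# Equivalent construction of the example data with named columns
df <- data.frame(words = c("hello", "yes", "example"),
                 frequency = c(7, 8, 5),
                 count = c(0, 0, 0))
df2 <- data.frame(words = c("example", "hello"),
                  frequency = c(5, 6))

# cbind() on plain vectors builds a matrix; mixing in strings
# coerces every value, including the numbers, to character
m <- cbind(c("hello", "yes", "example"), c(7, 8, 5), c(0, 0, 0))

is.data.frame(df)   # data frame with a numeric frequency column
is.matrix(m)        # matrix, not a data frame
is.character(m)     # all cells are strings
```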