I have a data frame that is structured roughly like this:
df <- data.frame(words = c('hello', 'yes', 'example'), frequency = c(7, 8, 5), count = c(0, 0, 0))
words | frequency | count |
---|---|---|
hello | 7 | 0 |
yes | 8 | 0 |
example | 5 | 0 |
What I'm trying to do is fill the third column (count) with values from a different data frame, which is similar but looks like this:
df2 <- data.frame(words = c('example', 'hello'), frequency = c(5, 6))
words | frequency |
---|---|
example | 5 |
hello | 6 |
My goal is to match rows on the first column (words), which has the same name in both data frames, and copy the matching frequency values from the second data frame into the third column (count) of the first data frame.
The result should look like this:
df <- data.frame(words = c('hello', 'yes', 'example'), frequency = c(7, 8, 5), count = c(6, 0, 5))
words | frequency | count |
---|---|---|
hello | 7 | 6 |
yes | 8 | 0 |
example | 5 | 5 |
What I've tried so far is:
df <- merge(df, df2, by = "words", all.x = TRUE)
However, this doesn't produce the result above.
I'd appreciate some help understanding how this could be done.
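For reference, here is a quick sketch of what the merge() attempt actually returns, assuming df and df2 are data frames built as in the Data section at the end of this thread. Because both tables have a frequency column, merge() keeps both (suffixed .x and .y) and adds a new column instead of filling count; it also sorts the rows by words.

```r
# Sample data, mirroring the Data section of this thread
df <- data.frame(words = c("hello", "yes", "example"),
                 frequency = c(7, 8, 5),
                 count = c(0, 0, 0))
df2 <- data.frame(words = c("example", "hello"),
                  frequency = c(5, 6))

m <- merge(df, df2, by = "words", all.x = TRUE)
print(m)
# The shared non-key column 'frequency' is split into frequency.x / frequency.y,
# 'count' is left untouched, and rows come back sorted by 'words'.
```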
Answer:
This is an "update join". My favorite way to do it is in dplyr:
library(dplyr)
df %>% rows_update(rename(df2, count = frequency), by = "words")
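One thing to watch with rows_update(): by default it stops with an error if the second table contains a key that is missing from the first. A sketch of skipping such rows instead, assuming dplyr 1.1.0 or later (where the unmatched argument is available) and a hypothetical df2_extra that contains one extra word:

```r
library(dplyr)

df <- data.frame(words = c("hello", "yes", "example"),
                 frequency = c(7, 8, 5),
                 count = c(0, 0, 0))
# Hypothetical lookup table: "world" does not appear in df
df2_extra <- data.frame(words = c("example", "hello", "world"),
                        frequency = c(5, 6, 9))

# unmatched = "ignore" drops lookup rows whose key is not in df
# (assumes dplyr >= 1.1.0; the default, "error", would stop here)
res <- df %>%
  rows_update(rename(df2_extra, count = frequency),
              by = "words", unmatched = "ignore")
print(res)
```

Unlike a join, rows_update() keeps the rows of df in their original order and only overwrites the columns supplied in the second table.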
In base R you could do the same thing like this:
names(df2)[2] = "count2"
df = merge(df, df2, by = "words", all.x=TRUE)
df$count = ifelse(is.na(df$count2), df$count, df$count2)
df$count2 = NULL
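Note that merge() returns rows sorted by words rather than in the original order. If the order matters, one option (a sketch; row_id is a hypothetical helper column, not part of the original answer) is to record the order before merging and restore it afterwards:

```r
df <- data.frame(words = c("hello", "yes", "example"),
                 frequency = c(7, 8, 5),
                 count = c(0, 0, 0))
df2 <- data.frame(words = c("example", "hello"),
                  frequency = c(5, 6))

names(df2)[2] <- "count2"
df$row_id <- seq_len(nrow(df))  # hypothetical helper: remember original order

out <- merge(df, df2, by = "words", all.x = TRUE)
# Take the looked-up value where there is one, otherwise keep the old count
out$count <- ifelse(is.na(out$count2), out$count, out$count2)
out$count2 <- NULL

out <- out[order(out$row_id), ]  # restore the original row order
out$row_id <- NULL
rownames(out) <- NULL
print(out)
```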
Answer:
Here is an option with data.table:
library(data.table)
setDT(df)[setDT(df2), on = "words", count := i.frequency]
Output
words frequency count
<char> <num> <num>
1: hello 7 6
2: yes 8 0
3: example 5 5
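The := update in data.table modifies the table by reference, so df itself changes (and setDT() has already converted it to a data.table in place). If you want to keep the original untouched, a sketch using copy() (the names dt, lookup and updated are illustrative):

```r
library(data.table)

dt <- data.table(words = c("hello", "yes", "example"),
                 frequency = c(7, 8, 5),
                 count = c(0, 0, 0))
lookup <- data.table(words = c("example", "hello"),
                     frequency = c(5, 6))

# copy() makes a deep copy, so the by-reference update below leaves dt alone
updated <- copy(dt)
# Update join: for each matching 'words', set count to lookup's frequency
# (i.frequency refers to the frequency column of the table in i, i.e. lookup)
updated[lookup, on = "words", count := i.frequency]

print(dt)       # count is still 0, 0, 0
print(updated)  # count is 6, 0, 5
```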
Or using match in base R:
df$count[match(df2$words, df$words)] <- df2$frequency
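The match() version assumes every word in df2 also exists in df. If some don't, match() returns NA for them and the subscripted assignment fails. A sketch that simply skips unmatched words (df2_extra is a hypothetical lookup table with one extra word):

```r
df <- data.frame(words = c("hello", "yes", "example"),
                 frequency = c(7, 8, 5),
                 count = c(0, 0, 0))
# Hypothetical lookup table: "world" is not in df
df2_extra <- data.frame(words = c("example", "hello", "world"),
                        frequency = c(5, 6, 9))

idx <- match(df2_extra$words, df$words)  # row positions in df; NA if absent
keep <- !is.na(idx)                      # only update words found in df
df$count[idx[keep]] <- df2_extra$frequency[keep]
print(df)
```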
Or another option with tidyverse, using left_join and coalesce:
library(tidyverse)
left_join(df, df2 %>% rename(count.y = frequency), by = "words") %>%
  mutate(count = coalesce(count.y, count)) %>%
  select(-count.y)
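Why coalesce() rather than pmax() for the update step: coalesce(new, old) takes the new value wherever one exists, while pmax() keeps whichever value is larger, so the two only agree when the existing counts are all 0, as in this example. A small comparison with made-up vectors:

```r
library(dplyr)

old <- c(9, 0, 0)    # hypothetical existing counts
new <- c(5, NA, 6)   # hypothetical looked-up values (NA = no match)

coalesce(new, old)              # 5 0 6: new value wins where present
pmax(new, old, na.rm = TRUE)    # 9 0 6: larger value wins, so 9 is not updated
```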
Data
df <- structure(list(words = c("hello", "yes", "example"), frequency = c(7,
8, 5), count = c(0, 0, 0)), class = "data.frame", row.names = c(NA,
-3L))
df2 <- structure(list(words = c("example", "hello"), frequency = c(5, 6)), class = "data.frame", row.names = c(NA,
-2L))
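The Data block above is the kind of text dput() prints. A sketch of how to produce it for your own data frame and check that it round-trips:

```r
df <- data.frame(words = c("hello", "yes", "example"),
                 frequency = c(7, 8, 5),
                 count = c(0, 0, 0))

dput(df)  # prints a structure(list(...)) expression you can paste into a question

# Rebuild the data frame from that text to confirm it reproduces df
txt <- paste(capture.output(dput(df)), collapse = "\n")
rebuilt <- eval(parse(text = txt))
```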