I'm trying to replace string values in a Dataframe by numeric values in another one. Let me create an example:
df1 <- data.frame("Time" = seq(1, 3, 1),
"Words" = c("dog cat", "bird fish", "elephant shark turtle"))
This is my main Dataframe. It's pretty simple: The first column refers to the Time spent and the second column shows every word registered on that specific time. The other Dataframe is:
df2 <- data.frame("Words" = c("turtle", "bird", "dog", "elephant", "cat", "shark", "fish"),
"Points" = seq(1, 7, 1))
This Dataframe refers to the Points that each Word accomplishes. My goal is to calculate the median of points on each time period. For instance, when Time = 1, "dog" equals 3 and "cat" equals 5, so the median will be 4.
I'm having a lot of trouble with this. The first thing is how to separate the words within a string. Then, I think that I have to create a loop to search in df2 the values that appears in df1 and substitute them by their values. Finally, I have to reach the median of each Time period to finish the task.
Any ideas?
CodePudding user response:
Using tidy approach
library(tidyverse)
df1 %>%
mutate(Words = strsplit(df1$Words, " ")) %>%
unnest(Words) %>%
inner_join(df2, by="Words") %>%
group_by(Time) %>%
summarize(Score = median(Points))
Using data.table
library(data.table)
setDT(df1)[,.(Words=strsplit(Words, " ")[[1]]),by=Time] %>%
.[setDT(df2), on=.(Words)] %>%
.[,(Score = median(Points)), by=Time]
Output:
Time Score
<dbl> <dbl>
1 1 4
2 2 4.5
3 3 4