I have the following dataframe:
df <- data.frame(q = c("a, b, c", "a, b, d"), combined = c("big big sentence","I like sentences"))
        q         combined
1 a, b, c big big sentence
2 a, b, d I like sentences
I am looking to count the frequency of each unique word per unique q. The desired output looks like:
      words freq V1 V2 V3
1       big    2  a  b  c
2  sentence    1  a  b  c
3         I    1  a  b  d
4      like    1  a  b  d
5 sentences    1  a  b  d
I managed to write some code that does this for the first row of df only. How can I turn this code into a loop, so that it performs the same data manipulation steps for every row in df?
The code I wrote for one row, which works:
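library(stringr)  # provides str_split_fixed()
df_1 <- df[1,]
countdf <- data.frame(table(unlist(strsplit(tolower(df_1$combined), " "))))
countsplit <- str_split_fixed(df_1$q, ", ", 3)  # split on ", " so the pieces carry no leading space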
countsplit <- as.data.frame(countsplit)
countdf$V1 <- countsplit$V1
countdf$V2 <- countsplit$V2
countdf$V3 <- countsplit$V3
Answer:
library(tidyverse)
df <- data.frame(q = c("a, b, c", "a, b, d"), combined = c("big big sentence","I like sentences"))
df %>%
as_tibble() %>%
transmute(q, words = combined %>% map(~ .x %>% str_split(" ") %>% simplify)) %>%
unnest(words) %>%
separate(q, into = c("V1", "V2", "V3")) %>%
count(V1, V2, V3, words, name = "freq")
#> # A tibble: 5 x 5
#>   V1    V2    V3    words      freq
#>   <chr> <chr> <chr> <chr>     <int>
#> 1 a     b     c     big           2
#> 2 a     b     c     sentence      1
#> 3 a     b     d     I             1
#> 4 a     b     d     like          1
#> 5 a     b     d     sentences     1
Created on 2022-02-22 by the reprex package (v2.0.0)
Answer:
You can use separate_rows and separate:
library(tidyr)
library(dplyr)
df %>%
separate_rows(combined) %>%
group_by(q, words = combined) %>%
summarise(freq = n(), .groups = "drop") %>%
separate(q, into = c("V1", "V2", "V3"))
# A tibble: 5 x 5
  V1    V2    V3    words      freq
  <chr> <chr> <chr> <chr>     <int>
1 a     b     c     big           2
2 a     b     c     sentence      1
3 a     b     d     I             1
4 a     b     d     like          1
5 a     b     d     sentences     1
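For completeness, the explicit loop the question asks about can also be sketched in base R, without any packages. This is only a sketch under a few assumptions: count_row is a hypothetical helper name (it is not from the question or from any package), the labels in q are assumed to be separated by ", ", and the original case of the words is kept, as in the desired output (wrap the words in tolower() if you prefer the question's behaviour).

```r
df <- data.frame(q = c("a, b, c", "a, b, d"),
                 combined = c("big big sentence", "I like sentences"))

# Hypothetical helper: word counts for a single row of df,
# with the pieces of q attached as columns V1, V2, ...
count_row <- function(row) {
  words <- strsplit(as.character(row$combined), " ")[[1]]
  counts <- as.data.frame(table(words), stringsAsFactors = FALSE)
  names(counts) <- c("words", "freq")
  # Assumes the labels in q are separated by ", "
  parts <- strsplit(as.character(row$q), ", ", fixed = TRUE)[[1]]
  counts[paste0("V", seq_along(parts))] <- as.list(parts)
  counts
}

# Loop over the rows, collect the per-row results, then stack them
pieces <- vector("list", nrow(df))
for (i in seq_len(nrow(df))) {
  pieces[[i]] <- count_row(df[i, ])
}
result <- do.call(rbind, pieces)
result
```

Collecting the pieces in a pre-allocated list and calling rbind once at the end avoids growing a data frame inside the loop. Note that rbind requires every row of df to yield the same number of pieces in q.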