R - how to create a loop for a unique word frequency count

Time:02-23

I have the following dataframe:

df <- data.frame(q = c("a, b, c", "a, b, d"), combined = c("big big sentence","I like sentences"))


        q           combined
1       a, b, c     big big sentence
2       a, b, d     I like sentences

I am looking to count the frequency of each unique word per unique q. The desired output looks like:

      words freq V1 V2 V3
1       big    2  a  b  c
2  sentence    1  a  b  c
3         I    1  a  b  d
4      like    1  a  b  d
5 sentences    1  a  b  d

I managed to write some code that does this for the first row of df only. How can I turn this code into a loop so that it performs the same data manipulation steps for every row of df?

The code I wrote for one row, which works:

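library(stringr)  # for str_split_fixed()

df_1 <- df[1, ]

# Count how often each (lower-cased) word occurs in the sentence
countdf <- data.frame(table(unlist(strsplit(tolower(df_1$combined), " "))))
names(countdf) <- c("words", "freq")

# Split "a, b, c" on ", " (not just ",") so V2 and V3 have no leading space
countsplit <- as.data.frame(str_split_fixed(df_1$q, ", ", 3))

countdf$V1 <- countsplit$V1
countdf$V2 <- countsplit$V2
countdf$V3 <- countsplit$V3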
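To answer the literal question, the one-row steps can be wrapped in a function and called once per row inside a for loop, collecting the pieces in a list and stacking them at the end. This is a minimal sketch, not the asker's code: `count_row()` is a hypothetical helper name, and `tolower()` is left out so that "I" keeps its capital as in the desired output.

```r
library(stringr)  # str_split_fixed()

df <- data.frame(q = c("a, b, c", "a, b, d"),
                 combined = c("big big sentence", "I like sentences"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: the question's one-row steps applied to a single row.
# tolower() is omitted so "I" stays capitalised, as in the desired output.
count_row <- function(row) {
  words <- unlist(strsplit(row$combined, " "))
  # table(words) names its column "words"; responseName names the counts
  countdf <- as.data.frame(table(words), responseName = "freq",
                           stringsAsFactors = FALSE)
  parts <- str_split_fixed(row$q, ", ", 3)  # 1 x 3 character matrix
  countdf$V1 <- parts[1, 1]
  countdf$V2 <- parts[1, 2]
  countdf$V3 <- parts[1, 3]
  countdf
}

# Loop over the rows, keep each small data frame, then stack them
results <- vector("list", nrow(df))
for (i in seq_len(nrow(df))) {
  results[[i]] <- count_row(df[i, ])
}
result <- do.call(rbind, results)
result
```

Growing a list and calling `do.call(rbind, ...)` once avoids repeatedly re-copying a data frame inside the loop; `lapply(seq_len(nrow(df)), function(i) count_row(df[i, ]))` would build the same list without an explicit for loop.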

Answer:

library(tidyverse)

df <- data.frame(q = c("a, b, c", "a, b, d"), combined = c("big big sentence","I like sentences"))

df %>%
  as_tibble() %>%
  transmute(q, words = str_split(combined, " ")) %>%  # str_split() is vectorised: list column of words per row
  unnest(words) %>%
  separate(q, into = c("V1", "V2", "V3")) %>%
  count(V1, V2, V3, words, name = "freq")
#> # A tibble: 5 x 5
#>   V1    V2    V3    words      freq
#>   <chr> <chr> <chr> <chr>     <int>
#> 1 a     b     c     big           2
#> 2 a     b     c     sentence      1
#> 3 a     b     d     I             1
#> 4 a     b     d     like          1
#> 5 a     b     d     sentences     1

Created on 2022-02-22 by the reprex package (v2.0.0)
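The pipeline above follows an unnest, count, separate pattern. As a point of comparison, the same three steps can be sketched in base R without a loop or extra packages. This swaps in `rep()`/`lengths()` for `unnest()` and `aggregate()` for `count()`; it is an illustrative alternative, not the answerer's code.

```r
# Base-R sketch of unnest -> count -> separate (an alternative technique)
df <- data.frame(q = c("a, b, c", "a, b, d"),
                 combined = c("big big sentence", "I like sentences"),
                 stringsAsFactors = FALSE)

# "Unnest": one row per word, with q repeated for every word of its sentence
words_by_row <- strsplit(df$combined, " ")
long <- data.frame(q = rep(df$q, lengths(words_by_row)),
                   words = unlist(words_by_row),
                   freq = 1L,
                   stringsAsFactors = FALSE)

# "Count": sum the 1s within each (q, words) pair, then sort by q and word
counts <- aggregate(freq ~ q + words, data = long, FUN = sum)
counts <- counts[order(counts$q, counts$words), ]

# "Separate": split q on ", " into three columns
parts <- do.call(rbind, strsplit(as.character(counts$q), ", "))
result <- data.frame(words = as.character(counts$words),
                     freq = counts$freq,
                     V1 = parts[, 1], V2 = parts[, 2], V3 = parts[, 3],
                     stringsAsFactors = FALSE)
result
```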

Answer:

You can use separate_rows and separate:

library(tidyr)
library(dplyr)

df %>% 
  separate_rows(combined) %>% 
  group_by(q, words = combined) %>% 
  summarise(freq = n()) %>% 
  separate(q, into = c("V1", "V2", "V3"))

# A tibble: 5 x 5
  V1    V2    V3    words      freq
  <chr> <chr> <chr> <chr>     <int>
1 a     b     c     big           2
2 a     b     c     sentence      1
3 a     b     d     I             1
4 a     b     d     like          1
5 a     b     d     sentences     1
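The `group_by()` + `summarise(freq = n())` step can also be written with dplyr's `count()`, which accepts a named expression for the new grouping column and returns ungrouped output. A minimal sketch, assuming the same `df`; the explicit `sep` values are added here for clarity and are not part of the answer above:

```r
library(tidyr)
library(dplyr)

df <- data.frame(q = c("a, b, c", "a, b, d"),
                 combined = c("big big sentence", "I like sentences"))

result <- df %>%
  separate_rows(combined, sep = " ") %>%               # one row per word
  count(q, words = combined, name = "freq") %>%        # ungrouped (q, words) counts
  separate(q, into = c("V1", "V2", "V3"), sep = ", ")  # "a, b, c" -> V1, V2, V3
result
```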