Find the response with the most words

Time:11-13

I'm trying to find the respondent in g.list with the most words in their response (g.list is a list of responses, named by respondent ID). However, the actual responses end up nested inside inner lists, so lapply() with length returns 1 for every respondent. I'm struggling to deal with this. Ideally I'd like to do this using the lapply() and strsplit() functions. Here's my code:

head(g.list)

Output:

$`1590553444`
[1] "Nothing"

$`1590610566`
[1] "Couldn't sit in a lot of them"

$`1590609253`
[1] "N/a"

Code:

g.split <- lapply(unlist(g.list), strsplit, " ")
head(g.split)

Output:

$`1590553444`
$`1590553444`[[1]]
[1] "Nothing"


$`1590610566`
$`1590610566`[[1]]
[1] "Couldn't" "sit"      "in"       "a"        "lot"      "of"       "them"    


$`1590609253`
$`1590609253`[[1]]
[1] "N/a"

Code:

g.count <- lapply(unlist(g.split), length)
head(g.count)

Output:

$`1590553444`
[1] 1

$`1590610566`
[1] 1

$`1590609253`
[1] 1

Code:

max(unlist(g.count))

I was expecting g.count <- lapply(unlist(g.split), length) to give the number of words in each response. However, all of the counts are 1. I know this is super basic; I've only just started learning R.

CodePudding user response:

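The issue is where to place the [[1]]. strsplit() returns a list with one element per input string, so calling length() on its result gives 1. Index into that list with [[1]] first to get the character vector of words, then take its length.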
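To see the nesting directly, here is a minimal sketch using a single string (the text is taken from the question; the variable names x and parts are just for illustration):

```r
# One response, copied from the question's data
x <- "Couldn't sit in a lot of them"

# strsplit() returns a list with one element per input string
parts <- strsplit(x, " ")

length(parts)        # counts input strings, not words
length(parts[[1]])   # counts the words of the first (only) string

stopifnot(length(parts) == 1, length(parts[[1]]) == 7)
```

The first length() call measures the outer list, which is why every count in the question came out as 1.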

# create data
g.list = list(
    `1590553444` = "Nothing",
    `1590610566` = "Couldn't sit in a lot of them",
    `1590609253` = "N/a"
)

# solution
get_len = function(string) {
    length(strsplit(string, " ")[[1]])
}

lapply(g.list, get_len)

$`1590553444`
[1] 1

$`1590610566`
[1] 7

$`1590609253`
[1] 1

To get the max:

max(unlist(lapply(g.list, get_len)))

[1] 7
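If a named vector is more convenient than a list, the same idea can be sketched with vapply() instead of lapply(). This is only one possible variant; word_counts is a name introduced here for illustration, and the data and get_len() are repeated so the snippet runs on its own:

```r
# Data from the question
g.list <- list(
  `1590553444` = "Nothing",
  `1590610566` = "Couldn't sit in a lot of them",
  `1590609253` = "N/a"
)

# Same helper as above: split on spaces, then count the pieces
get_len <- function(string) {
  length(strsplit(string, " ")[[1]])
}

# vapply() returns a named integer vector rather than a list
word_counts <- vapply(g.list, get_len, integer(1))

max(word_counts)                            # largest word count
names(word_counts)[which.max(word_counts)]  # ID of that respondent
```

Because word_counts is a plain vector, max() and which.max() work on it without unlist().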

CodePudding user response:

If each list element holds only one entry, unlist() the list into a character vector, strsplit() that vector (which returns a list of word vectors), and then use lengths() to count the words in each:

out <- lengths(strsplit(unlist(g.list), " "))
out
1590553444 1590610566 1590609253 
         1          7          1 

Then use which.max to get the index and extract the element with the max count

g.list[which.max(out)]
$`1590610566`
[1] "Couldn't sit in a lot of them"
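Note that which.max() returns only the first maximum. If several respondents might tie for the longest response, a sketch like the following keeps all of them. The fourth respondent here is hypothetical, added only to create a tie:

```r
# Data from the question plus one made-up respondent with a 7-word response
g.list <- list(
  `1590553444` = "Nothing",
  `1590610566` = "Couldn't sit in a lot of them",
  `1590609253` = "N/a",
  `0000000001` = "This one also has seven words total"  # hypothetical
)

out <- lengths(strsplit(unlist(g.list), " "))

# Logical indexing keeps every element whose count equals the maximum
tied <- g.list[out == max(out)]
names(tied)
```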

Another option is str_count() from the stringr package, counting runs of non-whitespace characters:

library(stringr)
str_count(g.list, "\\S+")
[1] 1 7 1
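One caveat with splitting on a single space: consecutive spaces produce empty strings that get counted as words. A base R sketch of a more tolerant split, assuming responses may contain stray whitespace (the input string below is made up for illustration):

```r
# Made-up response with two spaces after the first word
x <- "Couldn't  sit in a lot of them"

# Splitting on one space yields an empty string between the two spaces
naive <- length(strsplit(x, " ")[[1]])

# Trim the ends, then split on runs of whitespace
robust <- length(strsplit(trimws(x), "\\s+")[[1]])

c(naive = naive, robust = robust)
```

The trimws() call matters because a leading space would otherwise still yield an empty first element.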

data

g.list <- list(`1590553444` = "Nothing",
               `1590610566` = "Couldn't sit in a lot of them",
               `1590609253` = "N/a")