How to count the occurrences of "c(\" in a string in a data frame in R?

Time:11-20

I have a data frame where certain columns contain the error and warning messages from Mplus. The text is stored in an awkward format, so rather than trying to parse each message, I was hoping to count the messages by counting the occurrences of c(\ in each cell, since that character combination appears before every warning or error.

For example, one cell contains the messages:

[[1]]
[1] "c(\"All variables are uncorrelated with all other variables within class.\""
[2] " \"Check that this is what is intended.\""                                  
[3] " \"1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS\")"                         
[4] " c(\"WARNING:  THE BEST LOGLIKELIHOOD VALUE WAS NOT REPLICATED.  THE\""     
[5] " \"SOLUTION MAY NOT BE TRUSTWORTHY DUE TO LOCAL MAXIMA.  INCREASE THE\""    
[6] " \"NUMBER OF RANDOM STARTS.\")" 

while another contains a shorter message like this:

[[1]]
[1] "c(\"All variables are uncorrelated with all other variables within class.\""
[2] " \"Check that this is what is intended.\""                                  
[3] " \"1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS\")" 

I've tried using str_count several different ways, including my most recent attempt:

    str_count(test#, '//c(\//')

but I get the error: Error: '\/' is an unrecognized escape in character string starting "'//c(\/". Ideally, this would return 2 for the first example, and 1 for the second example.

How can I count the occurrences of this unique string when it contains characters that throw off most ways of encapsulating it or escaping?

Here's some easy to use test-code to try it on!

test1 <- '"c(\"All variables are uncorrelated with all other variables within class.\"" " \"Check that this is what is intended.\"" " \"1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS\")"'

test2 <- '"c(\"All variables are uncorrelated with all other variables within class.\"" " \"Check that this is what is intended.\"" " \"1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS\")" " c(\"WARNING:  THE BEST LOGLIKELIHOOD VALUE WAS NOT REPLICATED.  THE\"" " \"SOLUTION MAY NOT BE TRUSTWORTHY DUE TO LOCAL MAXIMA.  INCREASE THE\"" " \"NUMBER OF RANDOM STARTS.\")"'

CodePudding user response:

You can either shorten the pattern to c( and escape the parenthesis, since ( is a regex metacharacter:

library(stringr)

str_count(test1, "c\\(")

or you can lengthen the pattern to c(\" and wrap it in fixed(), so that it is matched literally rather than as a regular expression:

str_count(test1, fixed('c(\"'))

As you can see, both approaches give the correct count:

string1 <- 'c(\"All variables are uncorrelated with all other variables within class.\"" 
             " \"Check that this is what is intended.\"" 
             " \"1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS\")" 
             " c(\"WARNING:  THE BEST LOGLIKELIHOOD VALUE WAS NOT REPLICATED. 
             THE\"" " \"SOLUTION MAY NOT BE TRUSTWORTHY DUE TO LOCAL MAXIMA.  INCREASE THE\""
             " \"NUMBER OF RANDOM STARTS.\")'

> str_count(string1, fixed('c(\"'))
[1] 2
> str_count(string1, "c\\(")
[1] 2
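
Since the question is about a data frame column rather than a single string, here is a minimal sketch of the same literal count applied to every cell at once. It uses base R only so that it runs without extra packages. The data frame `df`, its columns `id` and `warnings`, and the helper `count_messages()` are hypothetical names invented for illustration, and the sample messages are shortened stand-ins for real Mplus output.

```r
# Hypothetical data frame: `id` and `warnings` are illustrative names,
# and the messages are shortened stand-ins, not real Mplus output.
df <- data.frame(
  id = 1:3,
  warnings = c(
    'c(\"All variables are uncorrelated.\" \"1 WARNING(S) FOUND\")',
    'c(\"First message.\") c(\"WARNING: second message.\")',
    ''
  ),
  stringsAsFactors = FALSE
)

# Count literal occurrences of c(" in each element of a character vector.
# fixed = TRUE turns off regex interpretation, so ( needs no escaping.
count_messages <- function(x) {
  hits <- gregexpr('c("', x, fixed = TRUE)
  # regmatches() returns character(0) where nothing matched,
  # so lengths() reports 0 for those cells.
  lengths(regmatches(x, hits))
}

df$n_messages <- count_messages(df$warnings)
df$n_messages
```

With stringr loaded, `str_count(df$warnings, fixed('c(\"'))` should be the vectorised equivalent, since `str_count()` accepts a character vector.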

CodePudding user response:

You could try gregexpr().

test1 <- '"c(\" foo bar baz'
test2 <- '"c(\" foo bar baz "c(\" baz bar foo'

length(unlist(gregexpr('c\\(', test1)))
# [1] 1
length(unlist(gregexpr('c\\(', test2)))
# [1] 2
length(unlist(gregexpr('c\\(', list(test1, test2))))
# [1] 3
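
One caveat with this approach: `gregexpr()` reports "no match" as a single `-1`, so `length(unlist(...))` returns 1 for a cell that contains no messages at all. A minimal sketch of one way around this is to count only the positive match positions. The helper `count_matches()` and the variables `no_match` and `one_match` are hypothetical names used for illustration.

```r
no_match  <- 'no messages here'
one_match <- '"c(\" foo bar baz'

# gregexpr() signals "no match" with -1, so the length is still 1 here.
length(unlist(gregexpr('c\\(', no_match)))

# Counting only positive match positions gives 0 for the no-match case.
count_matches <- function(pattern, text) {
  sum(unlist(gregexpr(pattern, text)) > 0)
}

count_matches('c\\(', no_match)
count_matches('c\\(', one_match)
```

Like the `list(test1, test2)` call above, this returns one total across all inputs rather than a count per element.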