Is there another alternative to grep?

Time:05-20

I have two data frames. One of them stores question numbers as text, and I use grep() to match those numbers against the column names of the other data frame.

The problem is that part of my code doesn't work, because grep() returns more matches than I want.

Basically, my two data frames are as follows:

DF1:

Question Group
11 Redmeat
100 Chicken
56 Vegetables
210 Dairy

DF2 (values don't matter, only the column names):

1.Question 2.Question ... 101.Question ... 250.Question
Yes No ... ... ... ...
Yes Yes ... ... ... ...
No Yes ... ... ... ...
No Yes ... ... ... ...

I use the following code:

i <- n ## I change n according to the row of DF1 that I want
grep(DF1$Question[i], colnames(DF2), fixed = T)

If I do:

i <- 2  ## (Question number 100)
grep(DF1$Question[i], colnames(DF2), fixed = T)

My code returns 100, which is correct since it's the column that corresponds to "100.Question"

But if I do:

i <- 1  ## (Question number 1)
grep(DF1$Question[i], colnames(DF2), fixed = T)

My code returns 1, 11, 21 ... 101 ... 201

The same happens if I do:

i <- 3  ## (Question number 56)
grep(DF1$Question[i], colnames(DF2), fixed = T)

It returns 56, 156

I only want the column whose number matches exactly. Even with the argument fixed = TRUE it doesn't work.

Is there a solution or an alternative?

CodePudding user response:

Two options:

1) Include the . in the grep pattern: grep(paste0("^", DF1$Question[i], "\\."), colnames(DF2)), or
2) paste the full ".Question" on and use exact matching without any grep at all: paste0(DF1$Question, ".Question"). This will likely be more efficient than regex.

Since your code has [i]s all over the place, I assume you're using a loop. grep and paste are vectorized, so if you provide more context we may be able to help you avoid the loop entirely.
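
A minimal sketch of the second option, assuming match() is used for the exact lookup (the answer does not name a function) and using made-up stand-ins for DF1 and DF2 built from the tables in the question:

```r
# Hypothetical stand-ins for the data frames in the question.
DF1 <- data.frame(
  Question = c("11", "100", "56", "210"),
  Group    = c("Redmeat", "Chicken", "Vegetables", "Dairy")
)
DF2 <- as.data.frame(matrix("Yes", nrow = 2, ncol = 250))
colnames(DF2) <- paste0(1:250, ".Question")

# Build the full column names, then look them up exactly for all rows at once.
targets <- paste0(DF1$Question, ".Question")
idx <- match(targets, colnames(DF2))
print(idx)
```

match() returns one position per row of DF1 (NA where no column has that exact name), so no loop over i is needed.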

CodePudding user response:

What about specifying in the pattern that you want from the start ^ and you want it to be followed by .Q?

i=3
grep(paste0("^",DF1$Question[i],".Q"), colnames(DF2))

Output:

[1] 56
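
One caveat worth knowing: in a regex, an unescaped . matches any single character, not only a literal dot. With column names like the ones in the question this makes no difference, but a quick check on a made-up name (hypothetical, not from the question) shows the distinction:

```r
# "56xQuestion" is an invented name used only to illustrate the difference.
any_char    <- grepl("^56.Q",   "56xQuestion")  # "." matches the "x"
literal_dot <- grepl("^56\\.Q", "56xQuestion")  # "\\." requires a real dot
print(c(any_char, literal_dot))
```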

CodePudding user response:

You need each pattern to match exactly one column, so anchor it to the start of the string with ^, followed by your number and a literal dot (\\.). In this case you cannot use the fixed = T argument, since you are using a regex to match.

grep(paste0("^", DF1$Question[i], "\\."), colnames(DF2))
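
To see the difference side by side, here is a small sketch that assumes DF2 has columns 1.Question through 250.Question (only the names are modeled, in a hypothetical vector cols):

```r
cols <- paste0(1:250, ".Question")  # hypothetical column names
question <- "56"

# fixed = TRUE searches for a literal substring, so "156.Question" also hits.
loose <- grep(question, cols, fixed = TRUE)

# "^" pins the match to the start; "\\." demands a literal dot after the number.
strict <- grep(paste0("^", question, "\\."), cols)

print(loose)
print(strict)
```

The anchored pattern returns only column 56, while the literal substring search returns both 56 and 156, as described in the question.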