I have two data frames. One of them contains question numbers as text, and I use the grep() function to match those numbers to the column names of my other data frame.
The problem is that part of my code doesn't work because grep() is not doing the trick.
Basically, my two data frames are as follows:
DF1:
Question | Group |
---|---|
11 | Redmeat |
100 | Chicken |
56 | Vegetables |
210 | Dairy |
DF2 (values don't matter, only the column names):
1.Question | 2.Question | ... | 101.Question | ... | 250.Question |
---|---|---|---|---|---|
Yes | No | ... | ... | ... | ... |
Yes | Yes | ... | ... | ... | ... |
No | Yes | ... | ... | ... | ... |
No | Yes | ... | ... | ... | ... |
I use the following code:
i <- n ## I change n according to the row of DF1 that I want
grep(DF1$Question[i], colnames(DF2), fixed = T)
If I do:
i <- 2 ## (Question number 100)
grep(DF1$Question[i], colnames(DF2), fixed = T)
My code returns 100, which is correct since it's the column that corresponds to "100.Question"
But if I do:
i <- 1 ## (Question number 1)
grep(DF1$Question[i], colnames(DF2), fixed = T)
My code returns 1, 11, 21 ... 101 ... 201
Same if I do:
i <- 3 ## (Question number 56)
grep(DF1$Question[i], colnames(DF2), fixed = T)
It returns 56, 156
I only want the column whose number matches exactly. Even if I use the argument fixed = TRUE, it doesn't work.
Is there a solution or an alternative?
CodePudding user response:
Two options: 1) include the . in the grep pattern, grep(paste0("^", DF1$Question[i], "\\."), colnames(DF2)), or 2) paste the full ".Question" on and use exact matching (for example with match() or %in%) without any grep at all: paste0(DF1$Question, ".Question"). This will likely be more efficient than regex. Since your code has these i's all over the place, I assume you're using a loop. paste0 and match are vectorized (grep is vectorized over the strings it searches, but takes only a single pattern), so if you provide more context we may be able to help you avoid the loop entirely.
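A minimal sketch of option 2, assuming DF2's column names follow the "<number>.Question" pattern shown in the question. The toy DF1 and DF2 built here are stand-ins, not the asker's real data, and col_idx is a hypothetical column name. match() compares whole strings, so "56.Question" cannot hit "156.Question", and it handles every row of DF1 in one call:

```r
# Toy stand-ins for the data frames in the question (assumed shapes).
DF1 <- data.frame(
  Question = c("11", "100", "56", "210"),
  Group    = c("Redmeat", "Chicken", "Vegetables", "Dairy"),
  stringsAsFactors = FALSE
)
DF2 <- as.data.frame(matrix("Yes", nrow = 4, ncol = 250),
                     stringsAsFactors = FALSE)
colnames(DF2) <- paste0(1:250, ".Question")

# Build the full column names and look them up exactly: no loop, no regex.
# col_idx is a hypothetical helper column holding the matching positions.
DF1$col_idx <- match(paste0(DF1$Question, ".Question"), colnames(DF2))
DF1$col_idx
#> [1]  11 100  56 210
```

If a question number has no corresponding column, match() returns NA for that row rather than failing, which makes missing columns easy to spot.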
CodePudding user response:
What about specifying in the pattern that you want to match from the start (^) and that the number must be followed by .Q?
i <- 3
# "\\." matches a literal dot; a bare "." would match any character
grep(paste0("^", DF1$Question[i], "\\.Q"), colnames(DF2))
Output:
[1] 56
CodePudding user response:
You need to grep for unique values, so you should match the start of the string (^), together with your number and the dot (.). In this case, you cannot use the fixed = T argument, since you are using regex to match.
grep(paste0("^", DF1$Question[i], "\\."), colnames(DF2))
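To see why the anchor and the escaped dot matter, here is a small sketch on made-up column names (1.Question through 250.Question, assumed from the question; cols, questions and idx are illustrative names). With fixed = TRUE, grep() still searches for the pattern anywhere in the string; the argument only switches off regex interpretation.

```r
cols <- paste0(1:250, ".Question")  # assumed column names

# Substring match: "56" also occurs inside "156.Question".
grep("56", cols, fixed = TRUE)
#> [1]  56 156

# Anchored regex: start of string, the number, then a literal dot.
grep(paste0("^", "56", "\\."), cols)
#> [1] 56

# grep() takes a single pattern, so vapply() is used to cover several numbers.
questions <- c("11", "100", "56", "210")
idx <- vapply(questions,
              function(q) grep(paste0("^", q, "\\."), cols),
              integer(1))
unname(idx)
#> [1]  11 100  56 210
```

The integer(1) template makes vapply() stop with an error if any number matches zero or several columns, which is a reasonable safeguard when exactly one match is expected.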