I have a data frame that gets updated frequently, and some rows need to be removed from it if they contain certain strings. Previously I have done this with -grep, dropping the rows that contain the string in question, e.g.:
dataframe[-grep('some string', dataframe$column),]
However, at times that string doesn't appear in the data frame, in which case the -grep subset returns an empty data frame. Here's a minimal reproducible example:
> test.df <- data.frame(number = c(1:10), letter = letters[1:10])
> test.df
   number letter
1       1      a
2       2      b
3       3      c
4       4      d
5       5      e
6       6      f
7       7      g
8       8      h
9       9      i
10     10      j
> test.df[-grep('h', test.df$letter), ]
   number letter
1       1      a
2       2      b
3       3      c
4       4      d
5       5      e
6       6      f
7       7      g
9       9      i
10     10      j
> test.df[-grep('k', test.df$letter), ]
[1] number letter
<0 rows> (or 0-length row.names)
I could wrap the 'test.df[-grep...' in an 'if' test to check whether the search string is found before removing it, e.g.:
if(any(grepl('k',test.df$letter))){test.df<-test.df[-grep('k', test.df$letter),]}
...but it seems to me that this should be implicit in the -grep command. Is there a better (more efficient) way to accomplish row removal that doesn't threaten to remove all my data if the search string is absent from the data frame?
Answer:
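Use grepl() with logical negation instead of -grep(). grep() returns the positions of the matches, and when nothing matches it returns integer(0); negating that is still integer(0), so the subset selects zero rows. grepl() returns one TRUE/FALSE per row, so !grepl(...) keeps every row that does not match, including the case where nothing matches at all. You could do: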
test.df <- data.frame(number = c(1:10), letter = letters[1:10])
test.df[!grepl("h", test.df$letter), ]
#>    number letter
#> 1       1      a
#> 2       2      b
#> 3       3      c
#> 4       4      d
#> 5       5      e
#> 6       6      f
#> 7       7      g
#> 9       9      i
#> 10     10      j
test.df[!grepl("k", test.df$letter), ]
#>    number letter
#> 1       1      a
#> 2       2      b
#> 3       3      c
#> 4       4      d
#> 5       5      e
#> 6       6      f
#> 7       7      g
#> 8       8      h
#> 9       9      i
#> 10     10      j
Created on 2023-01-19 with reprex v2.0.2
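To make the difference concrete, here is a small sketch using the same test.df as above. It checks what grep() returns when there is no match, and it also shows one possible alternative, the invert = TRUE argument of base R's grep(), which returns the positions of the non-matching elements directly. The variable names (idx, none_kept, all_kept, without_h) are made up for illustration.

```r
# Same example data as in the question
test.df <- data.frame(number = 1:10, letter = letters[1:10])

# grep() returns the positions of the matches; with no match that is integer(0)
idx <- grep("k", test.df$letter)
length(idx)

# Negating an empty index is still an empty index,
# so "drop nothing" silently turns into "keep nothing"
none_kept <- test.df[-idx, ]
nrow(none_kept)

# grepl() returns one TRUE/FALSE per row, so the negated mask
# always has full length and keeps every non-matching row
all_kept <- test.df[!grepl("k", test.df$letter), ]
nrow(all_kept)

# Alternative: ask grep() for the non-matching positions directly
without_h <- test.df[grep("h", test.df$letter, invert = TRUE), ]
nrow(without_h)
```

Either form avoids the explicit 'if' test from the question. The logical mask is probably the easier one to combine with other conditions using & and |.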