Removing rows from a data.frame with -grep removes all rows if no matches are found (-- how to preve

Time:01-19

I have a data frame that is updated frequently, and rows need to be removed from it when certain strings appear in them. I have previously done this with -grep, dropping the rows that contain the string in question, e.g.:

dataframe[-grep('some string', dataframe$column),]

However, sometimes the string doesn't appear in the data frame at all, in which case the -grep subset returns an empty data frame. Here's a minimal reproducible example:

> test.df<-data.frame(number=c(1:10), letter=letters[1:10])

> test.df
   number letter
1       1      a
2       2      b
3       3      c
4       4      d
5       5      e
6       6      f
7       7      g
8       8      h
9       9      i
10     10      j

> test.df[-grep('h', test.df$letter),]
   number letter
1       1      a
2       2      b
3       3      c
4       4      d
5       5      e
6       6      f
7       7      g
9       9      i
10     10      j

> test.df[-grep('k', test.df$letter),]
[1] number letter
<0 rows> (or 0-length row.names)
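The empty result follows from how R treats zero-length indices. A minimal sketch on the same test.df (the name idx is introduced here only for illustration):

```r
test.df <- data.frame(number = c(1:10), letter = letters[1:10])

# grep() returns the integer positions of the matches; when nothing
# matches, it returns a zero-length integer vector
idx <- grep("k", test.df$letter)
length(idx)

# Negating a zero-length vector gives another zero-length vector, so the
# subset asks for "no rows" rather than "every row except none"
identical(-idx, integer(0))
nrow(test.df[-idx, ])
```

In other words, the negative sign has nothing to negate, and the subset falls back to plain indexing with an empty vector.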

I could wrap the 'test.df[-grep...' in an 'if' test that checks whether the search string is present before removing it, e.g.:

if (any(grepl('k', test.df$letter))) { test.df <- test.df[-grep('k', test.df$letter), ] }

...but it seems to me that this should be implicit in the -grep command. Is there a better (more efficient) way to accomplish row removal that doesn't threaten to remove all my data if the search string is absent from the data frame?

CodePudding user response:

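Use grepl instead. It returns a logical vector with one element per row, so negating it with ! keeps every row when nothing matches: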

test.df <- data.frame(number = c(1:10), letter = letters[1:10])

test.df[!grepl("h", test.df$letter), ]
#>    number letter
#> 1       1      a
#> 2       2      b
#> 3       3      c
#> 4       4      d
#> 5       5      e
#> 6       6      f
#> 7       7      g
#> 9       9      i
#> 10     10      j

test.df[!grepl("k", test.df$letter), ]
#>    number letter
#> 1       1      a
#> 2       2      b
#> 3       3      c
#> 4       4      d
#> 5       5      e
#> 6       6      f
#> 7       7      g
#> 8       8      h
#> 9       9      i
#> 10     10      j

Created on 2023-01-19 with reprex v2.0.2
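If integer indices are preferred, grep() also has an invert argument that returns the positions of the non-matching elements directly, which avoids the negative index altogether. A small sketch under the same setup; the helper name drop_matching_rows is hypothetical and not part of base R:

```r
test.df <- data.frame(number = c(1:10), letter = letters[1:10])

# Hypothetical helper: keep only the rows whose `column` value does NOT
# match `pattern`. With invert = TRUE, grep() returns the indices of the
# non-matching elements, so no match means every index is returned.
drop_matching_rows <- function(df, column, pattern) {
  keep <- grep(pattern, df[[column]], invert = TRUE)
  df[keep, , drop = FALSE]
}

kept_some <- drop_matching_rows(test.df, "letter", "h")  # drops the "h" row
kept_all  <- drop_matching_rows(test.df, "letter", "k")  # nothing to drop

nrow(kept_some)
nrow(kept_all)
```

Either form works for this case; the !grepl version is usually the easier one to read inline, while a helper like this may suit code that removes rows in several places.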
