How do I remove rows whose data doesn't begin with 'xa'?

I have some data which looks like:

I want to remove rows 4 and 5 because they don't contain data beginning with "xa".

My code is:

is.na(x$column)

But it returns TRUE for row 4 and FALSE for row 5. This is just an example; the column contains many other stray values, and I want a way to keep only the rows where the value is "xaN", with N representing any integer.

CodePudding user response：

We can detect the rows whose column value starts with "xa", then keep only those rows.

library(tidyverse)

df %>% 
  filter(str_detect(column, "^xa"))

Or in base R:

df[grepl("^xa", df$column),]

Output

  Row column
1   1  xa123
2   2  xa456
3   3 xa5555
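
The pattern "^xa" only checks the prefix. Since the question asks for "xaN" with N an integer, a stricter pattern may be closer to what is wanted. The following is a minimal sketch under that assumption; the `extra` vector and the name `strict` are hypothetical and exist only to show how the two patterns differ.

```r
# Same example data as in this answer.
df <- data.frame(Row = 1:5,
                 column = c("xa123", "xa456", "xa5555", ".", "-"))

# Hypothetical extra values, used only to compare the two patterns.
extra <- c("xa", "xa12b", "xa7")

# Prefix check only: every value that starts with "xa" matches.
prefix_match <- grepl("^xa", extra)

# "xa" followed by one or more digits and nothing else.
strict_match <- grepl("^xa[0-9]+$", extra)

print(prefix_match)
print(strict_match)

# Apply the stricter pattern to the data frame.
strict <- df[grepl("^xa[0-9]+$", df$column), ]
print(strict)
```

On the example data both patterns keep the same three rows; they only diverge on values such as "xa" or "xa12b".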

Data

df <- structure(list(Row = 1:5, column = c("xa123", "xa456", "xa5555", 
".", "-")), class = "data.frame", row.names = c(NA, -5L))

CodePudding user response：

A data.table approach using the %like% operator, which matches a regular expression against each value:

library("data.table")  
df[df$column %like% "^xa", ]

    Row column
  <dbl> <chr> 
1     1 xa1   
2     2 xa456 
3     3 xa555

CodePudding user response：

From base R:

grepl() tests whether each string begins with "xa" (the regex anchor ^ marks the start of the string) and returns a logical vector, which subset() then uses to select the rows.

subset(df, grepl("^xa", column))

  Row column
1   1  xa123
2   2  xa456
3   3 xa5555
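
To make the intermediate step visible, the sketch below prints the logical vector that grepl() produces before subset() uses it. The extra NA row is an assumption added for illustration and is not part of this answer's data; `keep` and `result` are illustrative names.

```r
# Example data from this answer, plus one NA row (an assumption, for illustration).
df <- data.frame(Row = 1:6,
                 column = c("xa123", "xa456", "xa5555", ".", "-", NA))

# One logical value per row: TRUE where the string starts with "xa".
keep <- grepl("^xa", df$column)
print(keep)

# grepl() yields FALSE rather than NA for a missing value,
# so the NA row is dropped along with "." and "-".
result <- subset(df, keep)
print(result)
```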

Data

df <- structure(list(Row = 1:5, column = c("xa123", "xa456", "xa5555", 
".", "-")), class = "data.frame", row.names = c(NA, -5L))

CodePudding user response：

df <- data.frame(Row = 1:5,
                 column = c("xa123", "xa456", "xa5555", NA, "-"))

df[!is.na(df$column) & substr(df$column, 1, 2) == "xa", ]

Result:

  Row column
1   1  xa123
2   2  xa456
3   3 xa5555
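
The !is.na() guard matters here because comparing a missing value gives NA, and indexing a data frame with NA produces a row filled with NA instead of dropping it. A small sketch of that behaviour, using a shortened, hypothetical version of the data (`small`, `unguarded`, and `guarded` are illustrative names):

```r
# Shortened, hypothetical data: one match, one NA, one non-match.
small <- data.frame(Row = 1:3,
                    column = c("xa123", NA, "-"))

# Without the guard the comparison is TRUE, NA, FALSE.
cmp <- substr(small$column, 1, 2) == "xa"
print(cmp)

# Indexing with NA keeps a placeholder row of NAs.
unguarded <- small[cmp, ]
print(unguarded)

# With the guard, NA becomes FALSE and the row is dropped.
guarded <- small[!is.na(small$column) & cmp, ]
print(guarded)
```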