I have some data which looks like:
I want to remove rows 4 and 5 because their values don't start with "xa".
My code is:
is.na(x$column)
But it returns TRUE for row 4 and FALSE for row 5. This is just an example; the column contains many other arbitrary values, and I want a way to select only the rows where the value is "xaN", with N representing any integer.
CodePudding user response:
We can detect the rows that start with "xa", then keep only those rows.
library(tidyverse)
df %>%
  filter(str_detect(column, "^xa"))
Or in base R:
df[grepl("^xa", df$column),]
Output
Row column
1 1 xa123
2 2 xa456
3 3 xa5555
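The pattern "^xa" keeps anything that begins with "xa". If the goal is strictly "xa" followed only by digits, as the question describes, a tighter pattern may be a better fit. The following is a minimal sketch in base R, not part of the original answer; the extra values "xabc" and NA are hypothetical and added only to show the difference.

```r
# Hypothetical sample data: the first five values match the answer's data;
# "xabc" and NA are added here purely for illustration.
df <- data.frame(Row = 1:7,
                 column = c("xa123", "xa456", "xa5555", ".", "-", "xabc", NA))

# "^xa" anchors the prefix, "[0-9]+" requires one or more digits,
# and "$" anchors the end of the string.
strict <- df[grepl("^xa[0-9]+$", df$column), ]

# The looser prefix-only pattern also keeps "xabc".
loose <- df[grepl("^xa", df$column), ]

print(strict)
print(loose)
```

grepl() treats NA as a non-match, so missing values are dropped by both patterns without a separate is.na() check.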
Data
df <- structure(list(Row = 1:5, column = c("xa123", "xa456", "xa5555",
".", "-")), class = "data.frame", row.names = c(NA, -5L))
CodePudding user response:
A data.table approach using the %like% operator with partial match:
library("data.table")
df[df$column %like% "^xa", ]
Row column
<dbl> <chr>
1 1 xa1
2 2 xa456
3 3 xa555
CodePudding user response:
From base R:
grepl() searches for "xa" at the beginning of the string (via the regex anchor ^) and returns a logical vector, which subset() can use to keep the matching rows.
subset(df, grepl("^xa", column))
Row column
1 1 xa123
2 2 xa456
3 3 xa5555
Data
df <- structure(list(Row = 1:5, column = c("xa123", "xa456", "xa5555",
".", "-")), class = "data.frame", row.names = c(NA, -5L))
CodePudding user response:
df <- data.frame(Row = 1:5,
column = c("xa123", "xa456", "xa5555", NA, "-"))
df[!is.na(df$column) & substr(df$column, 1, 2) == "xa", ]
Result:
Row column
1 1 xa123
2 2 xa456
3 3 xa5555
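The !is.na() guard in the answer above matters because comparing NA with == yields NA rather than FALSE, and an NA in a logical row index produces a row of NAs. It probably also explains the behaviour in the question: is.na() is TRUE only for a truly missing value, not for a placeholder string such as "-". A small sketch on a hypothetical vector x:

```r
# Hypothetical vector: one match, one missing value, one placeholder string.
x <- c("xa123", NA, "-")

is.na(x)                  # TRUE only for the missing value
substr(x, 1, 2) == "xa"   # the NA propagates through the comparison
grepl("^xa", x)           # grepl() reports NA as a non-match (FALSE)

# Combining the guard with the comparison removes the NA from the index.
keep <- !is.na(x) & substr(x, 1, 2) == "xa"
print(keep)
```

With the guard in place, the substr() approach and grepl() select the same elements.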