Home > Enterprise >  R dplyr subset rows that contain any character value
R dplyr subset rows that contain any character value

Time:07-16

I have a dataset in which a particular column (x in this case) has certain rows with character values.

How can I slice/subset these rows which contain any character value for data exploration purposes?

Please note that I don't want to hardcode all characters in rows to tell the code if the rows have these characters then subset these rows. Because my original dataset is huge, so this column could have any character values in any number of rows.

So, that is the purpose to see all those rows in that column that have a character value

Sample data and code:

library(dplyr)

x = c("1000", "1001", "1003", "14484R", "1004", "1005",  "12241alternet", "12634TAB", "12644R", "END", NA, NA)
y = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)

df = data.frame(x, y)

# subset/slice rows in column x that contain any character value 

CodePudding user response:

One option to capture any letters using [:alpha:]:

library(dplyr)
library(stringr)

df %>% filter(str_detect(x, "[:alpha:]"))
              x  y
1        14484R  4
2 12241alternet  7
3      12634TAB  8
4        12644R  9
5           END 10

or since you said your dataset is huge:

library(data.table)
setDT(df)
df[str_detect(x, "[:alpha:]")]

CodePudding user response:

Easiest would be to convert to numeric with as.numeric, which will return NA for all non-numeric elements, then use is.na to find the NA elements as a logical vector for subsetting (there would be a warning as part of the forced conversion to numeric)

subset(df, is.na(as.numeric(x)) & !is.na(x))
             x  y
4         14484R  4
7  12241alternet  7
8       12634TAB  8
9         12644R  9
10           END 10

The & !is.na(x) make sure that it drops all rows with NAs in the original 'x' column

CodePudding user response:

If you want to use base R and no packages :

df[grepl("[:a-z:]|[:A-Z:]", df$x),]
  • Related