I have a dataset in which a particular column (x in this case) has some rows containing character values.
How can I slice/subset the rows that contain any character value, for data exploration purposes?
Please note that I don't want to hardcode the specific characters to search for. My original dataset is huge, so this column could contain any character values in any number of rows.
The goal is to see all the rows in which that column has a character value.
Sample data and code:
library(dplyr)
x = c("1000", "1001", "1003", "14484R", "1004", "1005", "12241alternet", "12634TAB", "12644R", "END", NA, NA)
y = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
df = data.frame(x, y)
# subset/slice rows in column x that contain any character value
CodePudding user response:
One option is to capture any letters using [:alpha:]:
library(dplyr)
library(stringr)
df %>% filter(str_detect(x, "[:alpha:]"))
              x  y
1        14484R  4
2 12241alternet  7
3      12634TAB  8
4        12644R  9
5           END 10
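A side note on the NA rows in the sample data (a small check, assuming stringr is installed): str_detect() propagates NA, and dplyr::filter() treats an NA condition as FALSE, so those rows are dropped without any extra handling.

```r
library(stringr)

# NA input gives NA output; filter() then drops those rows
str_detect(c("1000", "14484R", NA), "[:alpha:]")
# [1] FALSE  TRUE    NA
```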
or since you said your dataset is huge:
library(data.table)
setDT(df)
df[str_detect(x, "[:alpha:]")]
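If you'd rather not load stringr at all, base R's grepl() works the same way inside data.table — a sketch using the sample data from the question:

```r
library(data.table)

# sample data from the question
df <- data.frame(
  x = c("1000", "1001", "1003", "14484R", "1004", "1005",
        "12241alternet", "12634TAB", "12644R", "END", NA, NA),
  y = 1:12
)
setDT(df)

# grepl() returns FALSE for NA, so NA rows are excluded automatically
df[grepl("[[:alpha:]]", x)]
```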
CodePudding user response:
Easiest would be to convert to numeric with as.numeric, which returns NA for all non-numeric elements, and then use is.na to build a logical vector for subsetting (the forced conversion to numeric emits a warning):
subset(df, is.na(as.numeric(x)) & !is.na(x))
               x  y
4         14484R  4
7  12241alternet  7
8       12634TAB  8
9         12644R  9
10           END 10
The & !is.na(x) makes sure that rows with NA in the original 'x' column are dropped as well.
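If the coercion warning is noisy (e.g. inside a pipeline), it can be silenced with suppressWarnings() — a minimal sketch on a subset of the sample data:

```r
df <- data.frame(
  x = c("1000", "14484R", "END", NA),
  y = 1:4
)

# same logic as above, with the coercion warning suppressed
subset(df, suppressWarnings(is.na(as.numeric(x))) & !is.na(x))
```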
CodePudding user response:
If you want to use base R and no packages:
df[grepl("[[:alpha:]]", df$x), ]
(Note the correct POSIX class syntax is [[:alpha:]] with double brackets; a pattern like [:a-z:]|[:A-Z:] only matches by accident, since the single brackets are read as ordinary character classes containing ':' and the letter ranges.)
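Like the as.numeric() approach, this handles NA for free: grepl() returns FALSE for NA input, so those rows are never selected.

```r
grepl("[[:alpha:]]", c("1000", "14484R", NA))
# [1] FALSE  TRUE FALSE
```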