Below you can see my data frame.
df<-data.frame(
items=c("1 Food Item 1",
"1.1 Food Item 2",
"01.1.1 Food Item 3",
"01.1.2 Food Item 4",
"01.1.3 Food Item 5",
"2 Food Item 6",
"2.1 Food Item 7",
"02.1.1 Food Item 8",
"10 Food Item 9",
"10.1 Food Item 10",
"10.1.1 Food Item 11",
"10.1.2 Food Item 12")
)
df
This df contains items that begin with numeric codes of one to four digits. Now I want to filter this df so that the final output contains only the items whose code has four digits:
"01.1.1 Food Item 3",
"01.1.2 Food Item 4",
"01.1.3 Food Item 5",
"02.1.1 Food Item 8",
"10.1.1 Food Item 11",
"10.1.2 Food Item 12"
So can anybody help me with how to solve this problem?
CodePudding user response:
Use subset with grepl in base R. The pattern matches two digits (\\d{2}) at the start of the string, followed by a dot, a digit, another dot, another digit, and then one or more whitespace characters (\\s+):
subset(df, grepl("^\\d{2}\\.\\d\\.\\d\\s+", items))
-output
items
3 01.1.1 Food Item 3
4 01.1.2 Food Item 4
5 01.1.3 Food Item 5
8 02.1.1 Food Item 8
11 10.1.1 Food Item 11
12 10.1.2 Food Item 12
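If the goal is literally "codes with four digits" rather than the exact two-digit/one-digit/one-digit layout, a rough alternative is to count the digits in the leading code. This is only a sketch in base R, assuming the code is everything before the first whitespace character; the names code, n_digits and result are made up for illustration.

```r
df <- data.frame(
  items = c("1 Food Item 1", "1.1 Food Item 2", "01.1.1 Food Item 3",
            "01.1.2 Food Item 4", "01.1.3 Food Item 5", "2 Food Item 6",
            "2.1 Food Item 7", "02.1.1 Food Item 8", "10 Food Item 9",
            "10.1 Food Item 10", "10.1.1 Food Item 11", "10.1.2 Food Item 12")
)

# Assumption: the numeric code is everything before the first whitespace
code <- sub("\\s.*$", "", df$items)

# Remove every non-digit (the dots) and count what is left
n_digits <- nchar(gsub("\\D", "", code))

# Keep rows whose code has exactly four digits; drop = FALSE keeps a data frame
result <- df[n_digits == 4, , drop = FALSE]
result
```

Unlike the anchored pattern above, this would also keep a code such as 1.1.1.1, so it only fits if any four-digit code should pass.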
CodePudding user response:
library(stringr)
df<-data.frame(
items=c("1 Food Item 1",
"1.1 Food Item 2",
"01.1.1 Food Item 3",
"01.1.2 Food Item 4",
"01.1.3 Food Item 5",
"2 Food Item 6",
"2.1 Food Item 7",
"02.1.1 Food Item 8",
"10 Food Item 9",
"10.1 Food Item 10",
"10.1.1 Food Item 11",
"10.1.2 Food Item 12")
)
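# Row indices whose items start with a two-digit, dot, digit, dot, digit code
idx <- df$items |> str_detect('^\\d{2}\\.\\d\\.\\d') |> which()
# drop = FALSE keeps the result a data frame even though df has a single column
df[idx, , drop = FALSE] |> print()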
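One thing to watch when indexing rows of a one-column data frame: by default the result is simplified to a plain vector. The following is a small base R sketch of that behaviour, using a shortened version of the data and grepl in place of str_detect so it runs without extra packages; keep, as_vector and as_frame are illustrative names.

```r
df <- data.frame(
  items = c("1 Food Item 1", "01.1.1 Food Item 3", "10.1.2 Food Item 12")
)

# Logical vector: TRUE where the item starts with the dd.d.d code
keep <- grepl("^\\d{2}\\.\\d\\.\\d", df$items)

# With a single column, the default drop = TRUE collapses the result to a character vector
as_vector <- df[keep, ]

# drop = FALSE keeps the data frame structure and the original row names
as_frame <- df[keep, , drop = FALSE]

str(as_vector)
str(as_frame)
```

Whether the vector or the data frame is wanted depends on what happens next; for the output shown in the first answer, the data frame form is the one that matches.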