Filtering data frame in R

Below you can see my data frame.

df <- data.frame(
  items = c("1 Food Item 1",
            "1.1 Food Item 2",
            "01.1.1 Food Item 3",
            "01.1.2 Food Item 4",
            "01.1.3 Food Item 5",
            "2 Food Item 6",
            "2.1 Food Item 7",
            "02.1.1 Food Item 8",
            "10 Food Item 9",
            "10.1 Food Item 10",
            "10.1.1 Food Item 11",
            "10.1.2 Food Item 12")
)

df

This df contains items that begin with numeric codes of different lengths. I want to filter df so that the final output contains only the items whose code has four digits (for example 01.1.1):

"01.1.1 Food Item 3",
"01.1.2 Food Item 4",
"01.1.3 Food Item 5",
"02.1.1 Food Item 8",
"10.1.1 Food Item 11",
"10.1.2 Food Item 12"

Can anybody help me solve this problem?

CodePudding user response：

Use subset with grepl in base R. The pattern matches two digits (\\d{2}) at the start of the string, followed by a dot, a digit, another dot, another digit, and then one or more whitespace characters (\\s+).

subset(df, grepl("^\\d{2}\\.\\d\\.\\d\\s+", items))

-output

                 items
3   01.1.1 Food Item 3
4   01.1.2 Food Item 4
5   01.1.3 Food Item 5
8   02.1.1 Food Item 8
11 10.1.1 Food Item 11
12 10.1.2 Food Item 12
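
The pattern above is tied to the exact shape NN.N.N. If the requirement is literally "four digits in the leading code", a rough alternative is to pull out the code and count its digits. This is only a sketch: it assumes the code is everything before the first whitespace, and the names code, n_digits and result are made up for illustration.

```r
df <- data.frame(
  items = c("1 Food Item 1", "1.1 Food Item 2", "01.1.1 Food Item 3",
            "01.1.2 Food Item 4", "01.1.3 Food Item 5", "2 Food Item 6",
            "2.1 Food Item 7", "02.1.1 Food Item 8", "10 Food Item 9",
            "10.1 Food Item 10", "10.1.1 Food Item 11", "10.1.2 Food Item 12")
)

# Assumption: the numeric code is everything before the first whitespace.
code <- sub("\\s.*$", "", df$items)

# Remove every non-digit character (the dots) and count what remains.
n_digits <- nchar(gsub("[^0-9]", "", code))

# Keep rows whose code has exactly four digits.
# drop = FALSE keeps the result a data frame even though df has one column.
result <- df[n_digits == 4, , drop = FALSE]
result
```

On this data it selects the same six rows as the grepl pattern, but it would also accept other four-digit layouts such as 1.1.1.1, so it only fits if that is what "four digits" should mean.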

CodePudding user response：

library(stringr)
df<-data.frame( 
  items=c("1 Food Item 1",
          "1.1 Food Item 2",
          "01.1.1 Food Item 3",
          "01.1.2 Food Item 4",
          "01.1.3 Food Item 5",
          "2 Food Item 6",
          "2.1 Food Item 7",
          "02.1.1 Food Item 8",
          "10 Food Item 9",
          "10.1 Food Item 10",
          "10.1.1 Food Item 11",
          "10.1.2 Food Item 12")
)

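# Row indices of the items that start with two digits, a dot, a digit, a dot and a digit
# (the native pipe |> requires R >= 4.1)
idx <- df$items |> str_detect('^\\d{2}\\.\\d{1}\\.\\d{1}') |> which()
# Print the matching items
df[ idx, ] |> print()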
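
One thing to watch with this approach: df has a single column, so indexing it with df[idx, ] returns a character vector rather than a data frame. The variation below is a minimal sketch, assuming stringr is installed, that keeps the data frame by passing drop = FALSE. It also requires whitespace right after the code, so that a hypothetical item such as "10.1.12 ..." would not match. It uses a shortened copy of the data, and the names keep and result are illustrative.

```r
library(stringr)

# Shortened copy of the data, for illustration only.
df <- data.frame(items = c("1 Food Item 1", "10.1 Food Item 10",
                           "01.1.1 Food Item 3", "10.1.2 Food Item 12"))

# Logical vector: TRUE where the item starts with NN.N.N followed by whitespace.
keep <- str_detect(df$items, "^\\d{2}\\.\\d\\.\\d\\s")

# drop = FALSE keeps the result a data frame even though df has one column.
result <- df[keep, , drop = FALSE]
print(result)
```

Indexing with the logical vector directly makes the which() step unnecessary, although both forms select the same rows.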