Using column index to drop row, rather than being reliant on name

Time:10-16

I've got a dataframe of the following pattern:

tibble [9 x 2] (S3: tbl_df/tbl/data.frame)
 $ Date: chr [1:9] "Tuesday 4 October 2022" "Wednesday 5 October 2022" "Thursday 6 October 2022" "Note that:"
 $ EVENTS CALENDAR       : chr [1:9] "A61" "A32" "A51" "29 Jan 2029"

I'd like to drop the entire row containing "Note that:" in the first column and "29 Jan 2029" in the second column (it is the last row of the dataframe).

I've been able to achieve it quite easily by using:

df <- df[!grepl("Note that:", df$`Date: 15-Oct-2022`),]

However, given that the "Date: 15-Oct-2022" title changes from day to day, I'd like a more dynamic way of removing this redundant row.

Attempting to drop by column index using grepl does not work, and seems to blank the entire dataframe.

Once I've removed that final row, I've attempted to convert the date field to a more traditional format using:

df$`Date: 15-Oct-2022` <- as.Date(df$`Date: 15-Oct-2022`, format = "%A %d %B %Y")

Though again trying to use column indexing to do that conversion leads to an error for what I imagine to be similar reasons.

Any advice would be most appreciated.

CodePudding user response:

Use df[[1]] to refer to the column:

df <- df[!grepl("^Note that", df[[1]]),]
df[[1]] <- as.Date(df[[1]], format = "%A %d %B %Y")
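The blank dataframe described in the question is probably the result of indexing with single brackets, although the question does not show the exact call, so that is an assumption. A minimal sketch of the difference, using toy data that mirrors the question and a base data.frame rather than a tibble:

```r
# Toy data mirroring the question; the first column name is assumed to change daily.
df <- data.frame(
  `Date: 15-Oct-2022` = c("Tuesday 4 October 2022", "Wednesday 5 October 2022",
                          "Thursday 6 October 2022", "Note that:"),
  `EVENTS CALENDAR` = c("A61", "A32", "A51", "29 Jan 2029"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# df[1] is a one-column data frame (a list), so grepl() sees the whole column
# as a single deparsed string and returns a single logical value.
whole_column <- grepl("Note that:", df[1])
length(whole_column)  # 1

# df[[1]] is the character vector itself, so grepl() returns one value per row.
per_row <- grepl("Note that:", df[[1]])
length(per_row)       # 4

empty <- df[!whole_column, ]  # the single FALSE is recycled, so no rows are kept
kept  <- df[!per_row, ]       # only the "Note that:" row is dropped
```

Here `empty` has zero rows and `kept` has three, which matches the behaviour reported in the question.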

If you don't know that the target column is column 1, then you could find out which column it is, like this:

target_column <- grep("^Date", names(df))

and then use that instead:

df <- df[!grepl("^Note that", df[[target_column]]),]
df[[target_column]] <- as.Date(df[[target_column]], format = "%A %d %B %Y")
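The name-based lookup assumes exactly one column name starts with "Date". As a rough sketch, the two steps could be wrapped in a helper that checks this first. The function name `clean_calendar` and its arguments are made up for illustration, and the toy data only mirrors the question:

```r
# Hypothetical helper: find the single column whose name starts with "Date",
# drop the note row, and parse the remaining strings as dates.
clean_calendar <- function(df, name_pattern = "^Date", note_pattern = "^Note that") {
  target_column <- grep(name_pattern, names(df))
  # Fail loudly if the header matches no column or more than one
  stopifnot(length(target_column) == 1)

  df <- df[!grepl(note_pattern, df[[target_column]]), , drop = FALSE]
  # %A and %B are locale-dependent; this assumes English day and month names
  df[[target_column]] <- as.Date(df[[target_column]], format = "%A %d %B %Y")
  df
}

df <- data.frame(
  `Date: 15-Oct-2022` = c("Tuesday 4 October 2022", "Wednesday 5 October 2022",
                          "Thursday 6 October 2022", "Note that:"),
  `EVENTS CALENDAR` = c("A61", "A32", "A51", "29 Jan 2029"),
  check.names = FALSE, stringsAsFactors = FALSE
)

cleaned <- clean_calendar(df)
```

In a non-English locale the date strings will not parse and the column will come back as NA, so it is worth checking the result with `anyNA()` before relying on it.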

CodePudding user response:

With dplyr, we can just use a simple filter operation:

library(dplyr)

df %>%
    filter(!grepl("Note that:", Date))

# A tibble: 3 × 2
  Date                     EVENTS_CALENDAR
  <chr>                    <chr>          
1 Tuesday 4 October 2022   A61            
2 Wednesday 5 October 2022 A32            
3 Thursday 6 October 2022  A51  

If we need to drop a row whenever any column matches the pattern, we can use if_any():

df %>% 
    filter(!if_any(everything(), ~grepl("Note that:", .x)))
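Since the question asks to avoid the changing column name, the same dplyr verbs can also select the column by position. A minimal sketch, assuming the date column is always first and that dplyr 1.0.4 or later is installed (for `if_any()`); the toy data only mirrors the question:

```r
library(dplyr)

df <- data.frame(
  `Date: 15-Oct-2022` = c("Tuesday 4 October 2022", "Wednesday 5 October 2022",
                          "Thursday 6 October 2022", "Note that:"),
  `EVENTS CALENDAR` = c("A61", "A32", "A51", "29 Jan 2029"),
  check.names = FALSE, stringsAsFactors = FALSE
)

cleaned <- df %>%
  # Position 1 is a tidyselect selection, so the column name never appears
  filter(!if_any(1, ~ grepl("^Note that", .x))) %>%
  # Assumes English day and month names (%A and %B are locale-dependent)
  mutate(across(1, ~ as.Date(.x, format = "%A %d %B %Y")))
```

Selecting by position keeps the pipeline independent of the daily header, at the cost of breaking silently if the column order ever changes.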