I am aiming to filter my data if I can find the string Code
in any of the columns. It should filter at the first occurrence of this word.
I am working with a list and the columns containing Code
change by their arrangement. So I need a general way to do this like the attempt.
Some example data:
# A tibble: 11 × 10
`Title:` `A&E weekly activity statistics, NHS…` ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Contact: Paul Steele - [email protected] NA NA NA NA NA NA NA NA
2 NA NA NA NA NA NA NA NA NA NA
3 Provider Level Data NA NA NA NA NA NA NA NA NA
4 NA NA NA A&E … NA NA NA A&E … NA NA
5 SHA Code Name Type… Type… Type… Tota… Type… Type… Type…
6 - - Engl… 2831… 12906 1369… 4330… 15347 34 172
7 NA NA NA NA NA NA NA NA NA NA
8 Q30 RLN City… 1423 669 297 2389 202 0 18
9 Q30 RXP Coun… 2473 0 2088 4561 89 0 0
10 Q30 5J9 Darl… 0 0 0 0 0 0 0
11 Q30 RR7 Gate… 1251 0 0 1251 24 0 0
My attempt:
vf %>% filter(row_number() >= across(everything(), ~ .x %>% which(. == 'Code')))
I get these error:
Error in `filter()`:
! Problem while computing `..1 = row_number() >= ...`.
Caused by error in `across()`:
! Problem while computing column `Title:`.
Caused by error in `which()`:
! argument to 'which' is not logical
Run `rlang::last_error()` to see where the error occurred.
Expected output:
# A tibble: 7 × 10
`Title:` `A&E weekly activity statistics, NHS and indepen…` ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 SHA Code Name Type… Type… Type… Tota… Type… Type… Type…
2 - - Engl… 2831… 12906 1369… 4330… 15347 34 172
3 NA NA NA NA NA NA NA NA NA NA
4 Q30 RLN City… 1423 669 297 2389 202 0 18
5 Q30 RXP Coun… 2473 0 2088 4561 89 0 0
6 Q30 5J9 Darl… 0 0 0 0 0 0 0
7 Q30 RR7 Gate… 1251 0 0 1251 24 0 0
Reproducible code:
structure(list(`Title:` = c("Contact:", NA, "Provider Level Data",
NA, "SHA", "-", NA, "Q30", "Q30", "Q30", "Q30"), `A&E weekly activity statistics, NHS and independent sector organisations in England` = c("Paul Steele - [email protected]",
NA, NA, NA, "Code", "-", NA, "RLN", "RXP", "5J9", "RR7"), ...3 = c(NA,
NA, NA, NA, "Name", "England", NA, "City Hospitals Sunderland NHS Foundation Trust",
"County Durham And Darlington NHS Foundation Trust", "Darlington PCT",
"Gateshead Health NHS Foundation Trust"), ...4 = c(NA, NA, NA,
"A&E attendances", "Type 1 Departments - Major A&E", "283175",
NA, "1423", "2473", "0", "1251"), ...5 = c(NA, NA, NA, NA, "Type 2 Departments - Single Specialty",
"12906", NA, "669", "0", "0", "0"), ...6 = c(NA, NA, NA, NA,
"Type 3 Departments - Other A&E/Minor Injury Unit", "136985",
NA, "297", "2088", "0", "0"), ...7 = c(NA, NA, NA, NA, "Total attendances",
"433066", NA, "2389", "4561", "0", "1251"), ...8 = c(NA, NA,
NA, "A&E attendances > 4 hours from arrival to admission, transfer or discharge",
"Type 1 Departments - Major A&E", "15347", NA, "202", "89", "0",
"24"), ...9 = c(NA, NA, NA, NA, "Type 2 Departments - Single Specialty",
"34", NA, "0", "0", "0", "0"), ...10 = c(NA, NA, NA, NA, "Type 3 Departments - Other A&E/Minor Injury Unit",
"172", NA, "18", "0", "0", "0")), row.names = c(NA, -11L), class = c("tbl_df",
"tbl", "data.frame"))
CodePudding user response:
The following attempt seems to get what I am after:
data %>% filter(row_number() >= which(keep(.,~any(which(.=='Code'))) == 'Code'))
however, I am sure there are cleaner attempts than this!
What I have done:
- filter for row numbers greater than or equal to the logical vector containing
Code
in any of the columns. keep
retains the column that hasCode
in it, sofilter
can do its work.
Output:
# A tibble: 7 × 10
`Title:` `A&E weekly activity statistics, NHS and indepen…` ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 SHA Code Name Type… Type… Type… Tota… Type… Type… Type…
2 - - Engl… 2831… 12906 1369… 4330… 15347 34 172
3 NA NA NA NA NA NA NA NA NA NA
4 Q30 RLN City… 1423 669 297 2389 202 0 18
5 Q30 RXP Coun… 2473 0 2088 4561 89 0 0
6 Q30 5J9 Darl… 0 0 0 0 0 0 0
7 Q30 RR7 Gate… 1251 0 0 1251 24 0 0
CodePudding user response:
There is a function cumany
that can accomplish your task for you.
It returns TRUE for all cases after the first appearance of "Code"
df |>
filter(if_any(everything(), ~ cumany(.x %in% "Code")))
`Title:` `A&E weekly activity statistics, NHS and independent sector organisations in England` ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 SHA Code Name Type… Type… Type… Tota… Type… Type… Type…
2 - - England 2831… 12906 1369… 4330… 15347 34 172
3 NA NA NA NA NA NA NA NA NA NA
4 Q30 RLN City Hospita… 1423 669 297 2389 202 0 18
5 Q30 RXP County Durha… 2473 0 2088 4561 89 0 0
6 Q30 5J9 Darlington P… 0 0 0 0 0 0 0
7 Q30 RR7 Gateshead He… 1251 0 0 1251 24 0 0
CodePudding user response:
You can use str_detect
to detect "Code" across
all columns in combination with if_any
to see if there is any column with the string and remove all rows before that specific row like this (Thanks to @Darren Tsai):
library(stringr)
library(dplyr)
vf %>%
filter(row_number() >= which.max(if_any(everything(), str_detect, "Code")))
#> # A tibble: 7 × 10
#> `Title:` `A&E weekly activit…` ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 SHA Code Name Type… Type… Type… Tota… Type… Type… Type…
#> 2 - - Engl… 2831… 12906 1369… 4330… 15347 34 172
#> 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 4 Q30 RLN City… 1423 669 297 2389 202 0 18
#> 5 Q30 RXP Coun… 2473 0 2088 4561 89 0 0
#> 6 Q30 5J9 Darl… 0 0 0 0 0 0 0
#> 7 Q30 RR7 Gate… 1251 0 0 1251 24 0 0
Created on 2022-07-01 by the reprex package (v2.0.1)
Old answer:
You can use str_detect
to detect "Code" across
all columns and remove all rows before that specific row like this:
library(stringr)
library(dplyr)
vf %>%
filter(row_number() > if_all(everything(), ~!str_detect(., "Code")))
#> # A tibble: 6 × 10
#> `Title:` `A&E weekly activit…` ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 SHA Code Name Type… Type… Type… Tota… Type… Type… Type…
#> 2 - - Engl… 2831… 12906 1369… 4330… 15347 34 172
#> 3 Q30 RLN City… 1423 669 297 2389 202 0 18
#> 4 Q30 RXP Coun… 2473 0 2088 4561 89 0 0
#> 5 Q30 5J9 Darl… 0 0 0 0 0 0 0
#> 6 Q30 RR7 Gate… 1251 0 0 1251 24 0 0
Created on 2022-07-01 by the reprex package (v2.0.1)