I have around 5k CSV files named hospitalXXXX_info.csv, where XXXX is the hospital's numeric code. The codes vary in length and do not follow a clean sequence.
I would like to create a vector with all the hospital codes that appear in these file names.
Consider the simple example below.
library(data.table)
hospital1_info <- data.table(a=1:100, b=1:100)
hospital68_info <- data.table(a=1:100, b=1:100)
hospital999_info <- data.table(a=1:100, b=1:100)
fwrite(hospital1_info,"hospital1_info.csv")
fwrite(hospital68_info,"hospital68_info.csv")
fwrite(hospital999_info,"hospital999_info.csv")
My desired output is below:
> output <- c(1, 68, 999)
> output
[1]   1  68 999
PS: The same folder contains other CSV files with different name patterns (e.g., hospitalXXXXotherinfo.csv), which I would like to ignore.
CodePudding user response:
Here is a base R option
# Sample data
fn <- c("hospital1_info.csv", "hospital68_info.csv", "hospital999_info.csv", "hospital1000otherinfo.csv")
pattern <- "hospital(\\d+)_info\\.csv"
sub(pattern, "\\1", fn[grep(pattern, fn)])
# [1] "1" "68" "999"
I assume that fn is the result of a call such as list.files(), listing all CSV files in the relevant folder.
Explanation: Use grep to filter for the CSV files that match the pattern; then use sub to remove everything except the digit part.
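To connect this to files on disk and to the numeric output asked for in the question, here is a minimal sketch. It assumes a hypothetical temporary demo folder with empty files, and the variable names dir and codes are illustrative only. It passes the pattern straight to list.files(), anchors it with ^ and $ so only exact matches are kept, and converts the captured digits with as.integer().

```r
# Hypothetical setup: a temporary folder holding a few empty CSV files
dir <- file.path(tempdir(), "hospital_demo")
dir.create(dir, showWarnings = FALSE)
file.create(file.path(dir, c("hospital1_info.csv", "hospital68_info.csv",
                             "hospital999_info.csv", "hospital1000otherinfo.csv")))

# Anchored pattern: only names of the exact form hospital<digits>_info.csv
# ([0-9]+ is used here in place of \\d+; both match one or more digits)
pattern <- "^hospital([0-9]+)_info\\.csv$"

# list.files() filters by the regex, so the other CSV files are never listed
fn <- list.files(dir, pattern = pattern)

# Keep only the captured digits, convert to integer, and sort numerically
codes <- sort(as.integer(sub(pattern, "\\1", fn)))
codes
```

With the demo files above, codes holds the integers 1, 68 and 999, and hospital1000otherinfo.csv is skipped. The explicit sort() is there because list.files() orders names alphabetically, which would place hospital1000 before hospital68 in a larger folder.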
CodePudding user response:
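Alternatively, you could use parse_number from the readr package:
readr::parse_number(fn)
Result:
1 68 999 1000
Note that this also returns 1000 from hospital1000otherinfo.csv, so fn would need to be filtered first to ignore such files.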