Create R vector with numeric codes informed in .csv file name

I have around 5k csv files named hospitalXXXX_info.csv, where XXXX is the hospital's numeric code. The codes vary in number of digits and do not follow a clean sequence.

I would like to create a vector with all the hospital codes that appear in these csv file names.

Consider the simple example below.

library(data.table)  # provides data.table() and fwrite()

# Three dummy tables, written to csv files named after their hospital code
hospital1_info <- data.table(a=1:100, b=1:100)
hospital68_info <- data.table(a=1:100, b=1:100)
hospital999_info <- data.table(a=1:100, b=1:100)
fwrite(hospital1_info,"hospital1_info.csv")
fwrite(hospital68_info,"hospital68_info.csv")
fwrite(hospital999_info,"hospital999_info.csv")

My desired output is below:

> output <- c(1, 68, 999)
> output
[1]   1  68 999

PS: In the same folder I have other csv files with different name patterns (e.g., hospitalXXXXotherinfo.csv) which I would like to ignore.

CodePudding user response：

Here is a base R option

# Sample data
fn <- c("hospital1_info.csv", "hospital68_info.csv", "hospital999_info.csv", "hospital1000otherinfo.csv")

pattern <- "hospital(\\d+)_info\\.csv"
sub(pattern, "\\1", fn[grep(pattern, fn)])
# [1] "1"   "68"  "999"

I assume that fn is the result of a call such as list.files(), listing all CSV files in the relevant folder.

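Explanation: Use grep to filter for those CSV files that match the pattern; then use sub to remove everything except for the digit part. The result is a character vector, so wrap it in as.numeric() if you need numbers as in the desired output.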
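For completeness, here is a minimal sketch of the same idea applied to files on disk rather than a hand-written vector. The temporary folder and the empty dummy files are assumptions made only so the example runs on its own; in practice you would point list.files() at your real folder. The pattern is anchored with ^ and $ and uses [0-9] in place of \\d, which matches the same digits here.

```r
# Hypothetical folder with empty dummy files, created only for this sketch
dir <- tempfile("hospitals")
dir.create(dir)
file.create(file.path(dir, c("hospital1_info.csv", "hospital68_info.csv",
                             "hospital999_info.csv", "hospital1000otherinfo.csv")))

# Anchored pattern: matches hospital<digits>_info.csv and nothing else
pattern <- "^hospital([0-9]+)_info\\.csv$"

# list.files() can filter by regular expression directly
fn <- list.files(dir, pattern = pattern)

# Keep only the captured digits and convert them to numeric
output <- sort(as.numeric(sub(pattern, "\\1", fn)))
output
# [1]   1  68 999
```

Passing the pattern to list.files() makes the separate grep step unnecessary, and hospital1000otherinfo.csv is never picked up.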

CodePudding user response：

Alternatively, you could use parse_number from the readr package

readr::parse_number(fn)

Result

[1]    1   68  999 1000
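Note that this result includes 1000, which comes from hospital1000otherinfo.csv, a file the question asks to ignore. A possible workaround, sketched below, is to filter the file names first and only then parse the numbers. It assumes the readr package is installed and reuses the sample fn vector from the first answer.

```r
# Sample file names, as in the first answer
fn <- c("hospital1_info.csv", "hospital68_info.csv",
        "hospital999_info.csv", "hospital1000otherinfo.csv")

# Keep only the names of the form hospital<digits>_info.csv
keep <- grep("^hospital[0-9]+_info\\.csv$", fn, value = TRUE)

# parse_number() extracts the first number found in each string
output <- readr::parse_number(keep)
output
# [1]   1  68 999
```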