I tried the following command (which works in a PowerShell window):
system('powershell -command \"Get-ChildItem -Filter "*.html" | Where-Object { $_.LastWriteTime -ge "11/12/2021 09:10:00" }\"')
However, from the R console I get the error:
At line:1 char:79
... er *.html | Where-Object { $_.LastWriteTime -ge 11/12/2021 09:10:00 }
~~~~~~~~
Unexpected token '09:10:00' in expression or statement.
CategoryInfo : ParserError: (:) [], ParentContainsErrorRecordException
FullyQualifiedErrorId : UnexpectedToken
[1] 1
EDIT
Using Mikael Jagan's idea I managed to get a result:
system(paste('powershell -command ',shQuote('Get-ChildItem -Filter "*.html" | Where-Object { $_.LastWriteTime -ge "11/30/2021 09:10:00" }')))
However, as you can see below, the result is a character vector rather than a data frame. Is there a way to get a vector containing only the file names?
[1] ""
[2] ""
[3] " Directory: C:\\Users\\user\\Documents\\R_Data\\texk"
[4] ""
[5] ""
[6] "Mode LastWriteTime Length Name "
[7] "---- ------------- ------ ---- "
[8] "-a---- 30/11/2021 10:31 386751 auth_cash_flow.html "
[9] "-a---- 30/11/2021 10:31 189370 auth_cash_flow_total.html "
[10] "-a---- 30/11/2021 10:31 552947 auth_symv_gantt.html "
[11] "-a---- 30/11/2021 10:31 93238 auth_tender_schedule.html "
[12] "-a---- 30/11/2021 10:30 683088 dev_constr_pivot.html "
[13] "-a---- 30/11/2021 10:30 70224 form_org_chart.html "
[14] "-a---- 30/11/2021 10:31 199907 form_org_chart2.html "
[15] "-a---- 30/11/2021 10:30 618821 form_workload.html "
[16] "-a---- 30/11/2021 10:30 109127 index.html "
[17] ""
[18] ""
This is my session info:
> sessionInfo()
R version 4.0.5 (2021-03-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)
Matrix products: default
locale:
[1] LC_COLLATE=Greek_Greece.1253 LC_CTYPE=Greek_Greece.1253 LC_MONETARY=Greek_Greece.1253
[4] LC_NUMERIC=C LC_TIME=Greek_Greece.1253
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.0.5 tools_4.0.5
CodePudding user response:
x <- system(paste('powershell -command ',shQuote('Get-ChildItem -Filter "*.html" | Where-Object { $_.LastWriteTime -ge "11/30/2021 09:10:00" } | Select -exp Name')), intern = TRUE)
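To see why this works, it may help to look at the string that R actually hands to the shell. This is a minimal sketch, assuming the Windows quoting style: type = "cmd" is passed explicitly so the result is the same on any platform, and the variable names ps and cmd are only illustrative.

```r
# PowerShell pipeline, written as it would be typed in a PowerShell window
ps <- 'Get-ChildItem -Filter "*.html" | Where-Object { $_.LastWriteTime -ge "11/30/2021 09:10:00" } | Select -exp Name'

# shQuote(type = "cmd") wraps the string in double quotes and
# backslash-escapes the double quotes inside it
cmd <- paste("powershell -command", shQuote(ps, type = "cmd"))
cat(cmd, "\n")

# On Windows, running it with intern = TRUE returns the output lines
# as a character vector instead of printing them to the console:
# x <- system(cmd, intern = TRUE)
```

Select -exp Name (short for Select-Object -ExpandProperty Name) makes PowerShell emit only the file names, so no parsing of the formatted table is needed.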
CodePudding user response:
Alternatively, this can be done using R's dedicated function file.info(), which is about 6 times faster than the PowerShell answer provided by GeorgeDontas.
Benchmarking was done on a folder with 1228 files, of which 553 match "*.txt"; filtering on date then leaves 5 files:
microbenchmark::microbenchmark(
R = {
files <- file.info(list.files(pattern = "\\.txt$"))  # pattern is a regular expression, not a glob
rownames(files[ files$mtime > as.POSIXct("01/01/2021 09:10:00", format = "%m/%d/%Y %H:%M:%S"), ])
},
PS = {
system(paste('powershell -command ',
shQuote('Get-ChildItem -Filter "*.txt" | Where-Object { $_.LastWriteTime -ge "01/01/2021 09:10:00" } | Select -exp Name')),
intern = TRUE)
})
# Unit: milliseconds
# expr min lq mean median uq max neval
# R 147.7614 161.6208 173.5587 168.8470 181.1235 261.7563 100
# PS 801.9531 959.1979 997.8903 999.4841 1024.4515 1259.1713 100
# ~6x (ratio of medians)
# 999.4841 / 168.8470 ≈ 5.92
Note: I understand the OP might have valid reasons to use the PowerShell approach; I am posting this answer as a benchmark.
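For reference, the file.info() approach can be wrapped in a small function. This is only a sketch: files_modified_since() is a hypothetical helper name, and the demonstration creates two files in a temporary directory and sets their modification times with Sys.setFileTime() so that it runs anywhere.

```r
# Hypothetical helper: names of files in `dir` matching `pattern`
# (a regular expression) that were modified at or after `since`
files_modified_since <- function(dir, pattern, since) {
  paths <- list.files(dir, pattern = pattern, full.names = TRUE)
  info <- file.info(paths)
  basename(paths[info$mtime >= since])
}

# Self-contained demonstration in a fresh temporary sub-directory
d <- file.path(tempdir(), "mtime_demo")
dir.create(d, showWarnings = FALSE)
old <- file.path(d, "old.html")
new <- file.path(d, "new.html")
file.create(old, new)
Sys.setFileTime(old, as.POSIXct("2021-11-01 08:00:00"))
Sys.setFileTime(new, as.POSIXct("2021-11-30 10:31:00"))

# Same cutoff as in the question, parsed with an explicit format
cutoff <- as.POSIXct("11/30/2021 09:10:00", format = "%m/%d/%Y %H:%M:%S")
files_modified_since(d, "\\.html$", cutoff)
# [1] "new.html"
```

Passing an explicit format to as.POSIXct() avoids any dependence on the session locale when interpreting the month/day order.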
CodePudding user response:
How about this? I am assuming from your sample output (a) that file names only occur at the end of a line of shell output, followed by zero or more spaces, and (b) that file names contain only alphanumeric characters and underscores prior to the .html suffix. If that is not the case, then the regular expressions in the second and third lines would need to be generalized slightly.
## Get shell output as character vector
x <- system(paste('powershell -command', shQuote('Get-ChildItem -Filter "*.html" | Where-Object { $_.LastWriteTime -ge "11/30/2021 09:10:00" }')), intern = TRUE)
## Find lines ending in ".html" followed by zero or more spaces
i <- grep("\\.html\\s*$", x)
## Extract file names from those lines
fn <- sub("^.* (\\w*\\.html)\\s*$", "\\1", x[i])
Here is a test with the lines from your shell output:
x <- c(
"",
"",
" Directory: C:\\Users\\user\\Documents\\R_Data\\texk",
"",
"",
"Mode LastWriteTime Length Name ",
"---- ------------- ------ ---- ",
"-a---- 30/11/2021 10:31 386751 auth_cash_flow.html ",
"-a---- 30/11/2021 10:31 189370 auth_cash_flow_total.html ",
"-a---- 30/11/2021 10:31 552947 auth_symv_gantt.html ",
"-a---- 30/11/2021 10:31 93238 auth_tender_schedule.html ",
"-a---- 30/11/2021 10:30 683088 dev_constr_pivot.html ",
"-a---- 30/11/2021 10:30 70224 form_org_chart.html ",
"-a---- 30/11/2021 10:31 199907 form_org_chart2.html ",
"-a---- 30/11/2021 10:30 618821 form_workload.html ",
"-a---- 30/11/2021 10:30 109127 index.html ",
"",
""
)
i <- grep("\\.html\\s*$", x)
i
# [1] 8 9 10 11 12 13 14 15 16
fn <- sub("^.* (\\w*\\.html)\\s*$", "\\1", x[i])
fn
# [1] "auth_cash_flow.html" "auth_cash_flow_total.html"
# [3] "auth_symv_gantt.html" "auth_tender_schedule.html"
# [5] "dev_constr_pivot.html" "form_org_chart.html"
# [7] "form_org_chart2.html" "form_workload.html"
# [9] "index.html"
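If assumption (b) does not hold, one possible generalization is to capture any run of non-whitespace characters before the suffix. The sketch below uses made-up file names containing a hyphen and an extra dot; it still assumes that file names contain no spaces.

```r
# Illustrative lines in the same layout as the shell output above;
# the first file name is made up and contains a hyphen and an extra dot
x <- c(
  "Mode LastWriteTime Length Name ",
  "---- ------------- ------ ---- ",
  "-a---- 30/11/2021 10:31 386751 cash-flow.v2.html ",
  "-a---- 30/11/2021 10:30 109127 index.html ",
  ""
)

## Find lines ending in ".html" followed by zero or more spaces
i <- grep("\\.html\\s*$", x)

## \\S+ matches any run of non-whitespace characters, not only word characters
fn <- sub("^.*\\s(\\S+\\.html)\\s*$", "\\1", x[i])
fn
# [1] "cash-flow.v2.html" "index.html"
```

File names containing spaces cannot be recovered reliably this way, because the table columns are themselves separated by spaces.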
And in case you do want a data frame in the end:
## Read lines of shell output containing file names into table
tc <- textConnection(x[i])
dd <- read.table(tc)
names(dd) <- c("mode", "date", "time", "length", "name")
close(tc)
## Coerce character date and time to useful date-time class
dd$time <- as.POSIXct(paste(dd$date, dd$time), format = "%d/%m/%Y %H:%M")
dd$date <- NULL
dd
# mode time length name
# 1 -a---- 2021-11-30 10:31:00 386751 auth_cash_flow.html
# 2 -a---- 2021-11-30 10:31:00 189370 auth_cash_flow_total.html
# 3 -a---- 2021-11-30 10:31:00 552947 auth_symv_gantt.html
# 4 -a---- 2021-11-30 10:31:00 93238 auth_tender_schedule.html
# 5 -a---- 2021-11-30 10:30:00 683088 dev_constr_pivot.html
# 6 -a---- 2021-11-30 10:30:00 70224 form_org_chart.html
# 7 -a---- 2021-11-30 10:31:00 199907 form_org_chart2.html
# 8 -a---- 2021-11-30 10:30:00 618821 form_workload.html
# 9 -a---- 2021-11-30 10:30:00 109127 index.html
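Another route to a data frame, sketched here under assumptions, is to let PowerShell emit CSV via ConvertTo-Csv and read that with read.csv(). The csv vector below is an illustrative stand-in for the captured output, not real output: the actual LastWriteTime format depends on the system's regional settings, so the format string would need to be adjusted accordingly.

```r
# PowerShell pipeline that emits CSV instead of a formatted table
ps <- 'Get-ChildItem -Filter "*.html" | Select-Object Name, Length, LastWriteTime | ConvertTo-Csv -NoTypeInformation'
# On Windows:
# csv <- system(paste("powershell -command", shQuote(ps)), intern = TRUE)

# Illustrative stand-in for the captured lines (assumed date format)
csv <- c(
  '"Name","Length","LastWriteTime"',
  '"index.html","109127","30/11/2021 10:30:00"',
  '"form_workload.html","618821","30/11/2021 10:30:00"'
)

## Parse the CSV lines into a data frame
dd <- read.csv(text = csv, stringsAsFactors = FALSE)

## Coerce the character date-time to a date-time class
dd$LastWriteTime <- as.POSIXct(dd$LastWriteTime, format = "%d/%m/%Y %H:%M:%S")
str(dd)
```

Because the fields are quoted and comma-separated, this does not depend on file names being free of spaces.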
My opinion is that this is all quite a lot of work just to get around using file.info, which is designed specifically for this task. It may be worth doing a benchmark to determine whether the file.info approach is actually as slow as you think.