Running powershell command from R: Unexpected token in expression or statement

I tried the following command (which works in a PowerShell window):

system('powershell -command \"Get-ChildItem -Filter "*.html" | Where-Object { $_.LastWriteTime -ge "11/12/2021 09:10:00" }\"')

However, from the R console I get the error:

At line:1 char:79
  ... er *.html | Where-Object { $_.LastWriteTime -ge 11/12/2021 09:10:00 }
                                                                 ~~~~~~~~
Unexpected token '09:10:00' in expression or statement.
      CategoryInfo          : ParserError: (:) [], ParentContainsErrorRecordException
      FullyQualifiedErrorId : UnexpectedToken
 
[1] 1

EDIT
Using Mikael Jagan's idea, I managed to get a result:

system(paste('powershell -command ',shQuote('Get-ChildItem -Filter "*.html" | Where-Object { $_.LastWriteTime -ge "11/30/2021 09:10:00" }')))
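As far as I can tell, this works because of how the quotes reach PowerShell. In the first attempt, each `\"` is just `"` once R parses the string, so the nested double quotes are consumed by Windows command-line parsing before PowerShell sees the script. That is why the error shows `11/12/2021 09:10:00` unquoted. On Windows, `shQuote()` defaults to `type = "cmd"`, which wraps the string in double quotes and escapes every embedded `"` as `\"`. The sketch below only builds the command string. `ps_command()` is a hypothetical helper, and `type = "cmd"` is passed explicitly so the result is the same on any OS:

```r
# Hypothetical helper: wrap a PowerShell script for system() the way shQuote() does on Windows
ps_command <- function(script) {
  # type = "cmd" wraps the script in double quotes and escapes each inner " as \"
  paste("powershell -command", shQuote(script, type = "cmd"))
}

cmd <- ps_command('Get-ChildItem -Filter "*.html" | Where-Object { $_.LastWriteTime -ge "11/30/2021 09:10:00" }')
cat(cmd, "\n")
# powershell -command "Get-ChildItem -Filter \"*.html\" | Where-Object { $_.LastWriteTime -ge \"11/30/2021 09:10:00\" }"
```

Passing `cmd` to `system()` then runs it, as in the edit above.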

However, as you can see below, the result is a character vector rather than a data frame. Is there a way to get just a vector of the file names?

 [1] ""
 [2] ""
 [3] "    Directory: C:\\Users\\user\\Documents\\R_Data\\texk"
 [4] ""
 [5] ""
 [6] "Mode                 LastWriteTime         Length Name                                                                 "
 [7] "----                 -------------         ------ ----                                                                 "
 [8] "-a----        30/11/2021     10:31         386751 auth_cash_flow.html                                                  "
 [9] "-a----        30/11/2021     10:31         189370 auth_cash_flow_total.html                                            "
[10] "-a----        30/11/2021     10:31         552947 auth_symv_gantt.html                                                 "
[11] "-a----        30/11/2021     10:31          93238 auth_tender_schedule.html                                            "
[12] "-a----        30/11/2021     10:30         683088 dev_constr_pivot.html                                                "
[13] "-a----        30/11/2021     10:30          70224 form_org_chart.html                                                  "
[14] "-a----        30/11/2021     10:31         199907 form_org_chart2.html                                                 "
[15] "-a----        30/11/2021     10:30         618821 form_workload.html                                                   "
[16] "-a----        30/11/2021     10:30         109127 index.html                                                           "
[17] ""
[18] ""

This is my session info:

> sessionInfo()
R version 4.0.5 (2021-03-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=Greek_Greece.1253  LC_CTYPE=Greek_Greece.1253    LC_MONETARY=Greek_Greece.1253
[4] LC_NUMERIC=C                  LC_TIME=Greek_Greece.1253    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_4.0.5 tools_4.0.5

Answer:

x <- system(paste('powershell -command ',shQuote('Get-ChildItem -Filter "*.html" | Where-Object { $_.LastWriteTime -ge "11/30/2021 09:10:00" } | Select -exp Name')), intern = TRUE)
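The same answer can be parameterized. Below is a sketch with a hypothetical `ps_recent_files()` function and a hypothetical `run` switch that formats the cutoff from a POSIXct. `Select -exp Name` is shorthand for `Select-Object -ExpandProperty Name`, which emits only the names, and `intern = TRUE` makes `system()` return them as a character vector. The month/day/year format mirrors the question. I am assuming PowerShell parses that string independently of the Windows locale, which is worth checking on your machine:

```r
# Hypothetical helper: files matching `filter` written on or after `since`,
# using the PowerShell pipeline from the answer above (needs Windows + PowerShell to run).
ps_recent_files <- function(filter = "*.html", since, run = TRUE) {
  script <- sprintf(
    paste('Get-ChildItem -Filter "%s"',
          '| Where-Object { $_.LastWriteTime -ge "%s" }',
          '| Select-Object -ExpandProperty Name'),
    filter, format(since, "%m/%d/%Y %H:%M:%S"))
  # escape the inner double quotes so PowerShell receives them intact
  cmd <- paste("powershell -command", shQuote(script, type = "cmd"))
  if (!run) return(cmd)        # only return the command string
  system(cmd, intern = TRUE)   # one file name per element
}
```

On Windows, `ps_recent_files("*.html", as.POSIXct("2021-11-30 09:10:00"))` should then return the names directly.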

Answer:

Alternatively, this can be done with R's dedicated function file.info(), which is about 6 times faster than the PowerShell answer provided by GeorgeDontas.

The benchmark was run on a folder with 1228 files, of which 553 match "*.txt"; filtering on date then leaves 5 files:

microbenchmark::microbenchmark(
  R = {
    # list.files() takes a regular expression, not a glob, so anchor the extension
    files <- file.info(list.files(pattern = "\\.txt$"))
    # >= matches PowerShell's -ge comparison
    rownames(files[files$mtime >= as.POSIXct("01/01/2021 09:10:00", format = "%m/%d/%Y %H:%M:%S"), ])
  },
  PS = {
    system(paste('powershell -command ',
                 shQuote('Get-ChildItem -Filter "*.txt" | Where-Object { $_.LastWriteTime -ge "01/01/2021 09:10:00" } | Select -exp Name')),
           intern = TRUE)
  })

# Unit: milliseconds
#  expr      min       lq     mean   median        uq       max neval
#     R 147.7614 161.6208 173.5587 168.8470  181.1235  261.7563   100
#    PS 801.9531 959.1979 997.8903 999.4841 1024.4515 1259.1713   100

# ~6x (median: 999.48 / 168.85 ≈ 5.92; mean: 997.89 / 173.56 ≈ 5.75)

Note: I understand the OP may have valid reasons to use the PowerShell approach; I am posting this answer as a benchmark.
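Outside the benchmark, the file.info() approach could be wrapped as a small function that returns just the file names the question asks for. This is a sketch: `recent_files()` is a hypothetical name, `pattern` is a regular expression, and `since` is assumed to be a POSIXct in local time:

```r
# Hypothetical helper: names of files in `path` matching the regex `pattern`
# whose modification time is on or after `since` (pure R, no PowerShell).
recent_files <- function(path = ".", pattern = "\\.html$", since) {
  f <- list.files(path, pattern = pattern, full.names = TRUE)
  # extra_cols = FALSE keeps only size, isdir, mode, mtime, ctime, atime
  info <- file.info(f, extra_cols = FALSE)
  keep <- !is.na(info$mtime) & info$mtime >= since
  basename(f[keep])
}
```

For the question's case this would be `recent_files(pattern = "\\.html$", since = as.POSIXct("2021-11-30 09:10:00"))`.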

Answer:

How about this? From your sample output I am assuming (a) that file names occur only at the end of a line of shell output, followed by zero or more spaces, and (b) that file names contain only alphanumeric characters and underscores before the .html suffix. If that is not the case, the regular expressions in the second and third lines of code would need to be generalized slightly.

## Get the full table output of the shell command as a character vector
x <- system(paste('powershell -command', shQuote('Get-ChildItem -Filter "*.html" | Where-Object { $_.LastWriteTime -ge "11/30/2021 09:10:00" }')), intern = TRUE)

## Find lines ending in ".html" followed by zero or more spaces
i <- grep("\\.html\\s*$", x)

## Extract file names from those lines
fn <- sub("^.* (\\w*\\.html)\\s*$", "\\1", x[i])

Here is a test with the lines from your shell output:

x <- c(
  "",
  "",
  "    Directory: C:\\Users\\user\\Documents\\R_Data\\texk",
  "",
  "",
  "Mode                 LastWriteTime         Length Name                                                                 ",
  "----                 -------------         ------ ----                                                                 ",
  "-a----        30/11/2021     10:31         386751 auth_cash_flow.html                                                  ",
  "-a----        30/11/2021     10:31         189370 auth_cash_flow_total.html                                            ",
  "-a----        30/11/2021     10:31         552947 auth_symv_gantt.html                                                 ",
  "-a----        30/11/2021     10:31          93238 auth_tender_schedule.html                                            ",
  "-a----        30/11/2021     10:30         683088 dev_constr_pivot.html                                                ",
  "-a----        30/11/2021     10:30          70224 form_org_chart.html                                                  ",
  "-a----        30/11/2021     10:31         199907 form_org_chart2.html                                                 ",
  "-a----        30/11/2021     10:30         618821 form_workload.html                                                   ",
  "-a----        30/11/2021     10:30         109127 index.html                                                           ",
  "",
  ""
)

i <- grep("\\.html\\s*$", x)
i
# [1]  8  9 10 11 12 13 14 15 16

fn <- sub("^.* (\\w*\\.html)\\s*$", "\\1", x[i])
fn
# [1] "auth_cash_flow.html"       "auth_cash_flow_total.html"
# [3] "auth_symv_gantt.html"      "auth_tender_schedule.html"
# [5] "dev_constr_pivot.html"     "form_org_chart.html"      
# [7] "form_org_chart2.html"      "form_workload.html"       
# [9] "index.html"
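If file names can contain spaces or characters that `\\w` does not match, one possible generalization is to cut each row at the column where the `Name` header starts, instead of matching the name with a regular expression. This is a sketch that assumes a single directory listing in PowerShell's default table layout, where names are left-aligned under their header. `ps_names()` is a hypothetical helper:

```r
# Hypothetical helper: pull file names out of Get-ChildItem table output by column position.
# Assumes one directory block whose "Name" column is left-aligned under its header.
ps_names <- function(x) {
  hdr_row <- grep("^Mode\\s", x)[1]                                # header line of the table
  start <- as.integer(regexpr("Name", x[hdr_row], fixed = TRUE))  # column where names begin
  body <- x[-seq_len(hdr_row + 1)]                                 # drop everything up to the "----" underline
  body <- body[nzchar(trimws(body))]                               # drop blank lines
  trimws(substring(body, start), which = "right")                  # keep the name, strip padding
}
```

With the sample `x` above, `ps_names(x)` should give the same nine names, provided the copied lines keep PowerShell's column alignment.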

And in case you do want a data frame in the end:

## Read lines of shell output containing file names into table
tc <- textConnection(x[i])
dd <- read.table(tc)
names(dd) <- c("mode", "date", "time", "length", "name")
close(tc)

## Coerce character date and time to useful date-time class
dd$time <- as.POSIXct(paste(dd$date, dd$time), format = "%d/%m/%Y %H:%M")
dd$date <- NULL
dd
#     mode                time length                      name
# 1 -a---- 2021-11-30 10:31:00 386751       auth_cash_flow.html
# 2 -a---- 2021-11-30 10:31:00 189370 auth_cash_flow_total.html
# 3 -a---- 2021-11-30 10:31:00 552947      auth_symv_gantt.html
# 4 -a---- 2021-11-30 10:31:00  93238 auth_tender_schedule.html
# 5 -a---- 2021-11-30 10:30:00 683088     dev_constr_pivot.html
# 6 -a---- 2021-11-30 10:30:00  70224       form_org_chart.html
# 7 -a---- 2021-11-30 10:31:00 199907      form_org_chart2.html
# 8 -a---- 2021-11-30 10:30:00 618821        form_workload.html
# 9 -a---- 2021-11-30 10:30:00 109127                index.html

In my opinion, this is a lot of work just to avoid file.info(), which is designed specifically for this task. It may be worth running a benchmark to check whether the file.info() approach is really as slow as you think.
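Following that suggestion, here is a sketch of a data frame like `dd`, built with file.info() instead of parsed shell output. `recent_files_df()` is a hypothetical helper. The `mode` column is left out because file.info() reports permission bits there, not PowerShell's attribute string:

```r
# Hypothetical helper: time/length/name data frame from file.info(), filtered on mtime
recent_files_df <- function(path = ".", pattern = "\\.html$", since) {
  f <- list.files(path, pattern = pattern, full.names = TRUE)
  info <- file.info(f, extra_cols = FALSE)
  out <- data.frame(time = info$mtime, length = info$size, name = basename(f),
                    stringsAsFactors = FALSE)
  out <- out[!is.na(out$time) & out$time >= since, , drop = FALSE]
  rownames(out) <- NULL   # renumber rows after filtering
  out
}
```

The `time` column is already POSIXct, so no date parsing is needed.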