I have a data frame that looks like this:
files
MVNDR_10055_WESTERN FOREST PRODUCTS.PDF
MVNDR_102182_BLACK AND DECKER US INC..PDF
MVNDR_10363_SPRAY AND FORGET .PDF
How do I extract the "PDF" into another column in the data frame and then lowercase it? My plan is to concatenate the pieces afterwards, which shouldn't be an issue using paste0.
The ideal result would look like this:
files                                      Lowercase
MVNDR_10055_WESTERN FOREST PRODUCTS.PDF    pdf
MVNDR_102182_BLACK AND DECKER US INC..PDF  pdf
MVNDR_10363_SPRAY AND FORGET .PDF          pdf
Eventually the goal is for the table to look like this.
files
MVNDR_10055_WESTERN FOREST PRODUCTS.pdf
MVNDR_102182_BLACK AND DECKER US INC..pdf
MVNDR_10363_SPRAY AND FORGET .pdf
Note the lowercase "pdf"
CodePudding user response:
R already has functions that work with file name extensions:
tools::file_path_sans_ext
tools::file_ext
So you can combine them:
library(dplyr)

df |>
  mutate(files = paste0(tools::file_path_sans_ext(files), ".", tolower(tools::file_ext(files))))
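However, these functions have some quirks. For example, tools::file_path_sans_ext only strips an extension that directly follows a non-dot character, so a name with a doubled dot such as "MVNDR_102182_BLACK AND DECKER US INC..PDF" is left unchanged and the code above would append a second extension to it. If you’re already using the Tidyverse, I suggest using the ‘fs’ functions instead: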
df |>
  mutate(files = fs::path_ext_set(files, tolower(fs::path_ext(files))))
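If you would rather follow the two-step approach from the question (extract the extension into its own column, lowercase it, then paste it back), the following is a minimal base R sketch. It is not part of either package above: the pattern `ext_pattern` and the helper variables `has_ext` and `stem` are names made up for this illustration, and it assumes that "extension" means everything after the last dot.

```r
df <- data.frame(
  files = c("MVNDR_10055_WESTERN FOREST PRODUCTS.PDF",
            "MVNDR_102182_BLACK AND DECKER US INC..PDF",
            "MVNDR_10363_SPRAY AND FORGET .PDF"),
  stringsAsFactors = FALSE
)

# Assumption: the extension is a literal dot followed by non-dot
# characters at the very end of the name.
ext_pattern <- "\\.([^.]+)$"
has_ext <- grepl(ext_pattern, df$files)

# Step 1: put the lowercased extension in its own column
# (empty string when a name has no extension).
df$Lowercase <- ifelse(
  has_ext,
  tolower(sub(paste0(".*", ext_pattern), "\\1", df$files)),
  ""
)

# Step 2: strip the old extension and paste the lowercased one back on.
stem <- sub("\\.[^.]+$", "", df$files)
df$files <- ifelse(has_ext, paste0(stem, ".", df$Lowercase), df$files)

df
```

The `has_ext` guard keeps names without any dot untouched, so no stray trailing "." is appended to them.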
CodePudding user response:
You could use gsubfn from the gsubfn package to apply tolower to everything from the last dot onwards, like this:
df <- data.frame(files = c("MVNDR_10055_WESTERN FOREST PRODUCTS.PDF",
"MVNDR_102182_BLACK AND DECKER US INC..PDF",
"MVNDR_10363_SPRAY AND FORGET .PDF"))
library(gsubfn)
#> Loading required package: proto
# Match the last dot plus everything after it, and lowercase that match
df$files <- gsubfn("\\.[^.]+$", tolower, df$files)
df
#> files
#> 1 MVNDR_10055_WESTERN FOREST PRODUCTS.pdf
#> 2 MVNDR_102182_BLACK AND DECKER US INC..pdf
#> 3 MVNDR_10363_SPRAY AND FORGET .pdf
Created on 2022-08-22 with reprex v2.0.2
As you can see, the extension (.pdf) is now in lowercase.
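If you would rather not add a dependency, the same idea can probably be expressed in base R. The sketch below swaps gsubfn for `sub()` with `perl = TRUE`, where `\\L` in the replacement lowercases the captured group. The vector names `files` and `lowered` are made up for this illustration, and the "README" entry is added only to show what happens to a name without an extension.

```r
files <- c("MVNDR_10055_WESTERN FOREST PRODUCTS.PDF",
           "MVNDR_102182_BLACK AND DECKER US INC..PDF",
           "MVNDR_10363_SPRAY AND FORGET .PDF",
           "README")  # hypothetical name with no extension

# Capture the last dot and everything after it, then lowercase the
# capture with the Perl-style \\L modifier in the replacement.
lowered <- sub("(\\.[^.]+)$", "\\L\\1", files, perl = TRUE)

lowered
```

Names that contain no dot do not match the pattern, so they are returned unchanged.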