How to lowercase string value in R?

Time:08-24

I have a data frame that looks as such:

                                          files
       MVNDR_10055_WESTERN FOREST PRODUCTS.PDF
     MVNDR_102182_BLACK AND DECKER US INC..PDF
      MVNDR_10363_SPRAY AND FORGET        .PDF

How do I extract the "PDF" into another column in the data frame and then lowercase it? My idea is to concatenate the pieces afterwards, which shouldn't be an issue using paste0.

The ideal result would look like this:

                                      files     Lowercase 
   MVNDR_10055_WESTERN FOREST PRODUCTS.PDF         pdf
 MVNDR_102182_BLACK AND DECKER US INC..PDF         pdf
  MVNDR_10363_SPRAY AND FORGET        .PDF         pdf

Eventually the goal is for the table to look like this.

                                      files
   MVNDR_10055_WESTERN FOREST PRODUCTS.pdf
 MVNDR_102182_BLACK AND DECKER US INC..pdf
  MVNDR_10363_SPRAY AND FORGET        .pdf

Note the lowercase "pdf"

CodePudding user response:

R already has functions that work with file name extensions:

  • tools::file_path_sans_ext
  • tools::file_ext

So you can combine them:

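library(dplyr)

df |>
    mutate(files = paste0(tools::file_path_sans_ext(files), ".", tolower(tools::file_ext(files))))

However, these functions handle some file names poorly (for instance, they only recognise extensions made up of alphanumeric characters), so if you’re already using the Tidyverse, I suggest using the ‘fs’ functions instead: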

df |>
    mutate(files = fs::path_ext_set(files, tolower(fs::path_ext(files))))
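If you want the intermediate Lowercase column described in the question, here is a minimal base R sketch of that two-step approach. It assumes every file name contains at least one dot and treats everything after the last dot as the extension; the variable name `stem` is only illustrative.

```r
# Sample data from the question
df <- data.frame(files = c("MVNDR_10055_WESTERN FOREST PRODUCTS.PDF",
                           "MVNDR_102182_BLACK AND DECKER US INC..PDF",
                           "MVNDR_10363_SPRAY AND FORGET        .PDF"))

# Everything after the last dot, lowercased, stored in its own column
df$Lowercase <- tolower(sub(".*\\.", "", df$files))

# Everything before the last dot (illustrative helper variable)
stem <- sub("\\.[^.]*$", "", df$files)

# Concatenate the stem and the lowercased extension
df$files <- paste0(stem, ".", df$Lowercase)
df
```

A name without any dot would come back from the first `sub()` unchanged, so you may want to check for that case before relying on this.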

CodePudding user response:

You could use gsubfn() from the gsubfn package to apply tolower to everything from the last dot onwards, like this:

df <- data.frame(files = c("MVNDR_10055_WESTERN FOREST PRODUCTS.PDF",
                           "MVNDR_102182_BLACK AND DECKER US INC..PDF",
                           "MVNDR_10363_SPRAY AND FORGET        .PDF"))

library(gsubfn)
#> Loading required package: proto
df$files <- gsubfn("\\.[^.]+$", tolower, df$files)
df
#>                                       files
#> 1   MVNDR_10055_WESTERN FOREST PRODUCTS.pdf
#> 2 MVNDR_102182_BLACK AND DECKER US INC..pdf
#> 3  MVNDR_10363_SPRAY AND FORGET        .pdf

Created on 2022-08-22 with reprex v2.0.2

As you can see, the extension (.pdf) is now in lowercase.
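If you would rather not load an extra package, the same idea can probably be sketched in base R. This assumes the extension is whatever follows the last dot; it relies on the `\\L` case-conversion escape, which `sub()` only supports in the replacement when `perl = TRUE`. The vector names `files` and `files_lower` are just for illustration.

```r
# Sample file names from the question
files <- c("MVNDR_10055_WESTERN FOREST PRODUCTS.PDF",
           "MVNDR_102182_BLACK AND DECKER US INC..PDF",
           "MVNDR_10363_SPRAY AND FORGET        .PDF")

# Capture the last dot plus everything after it, and lowercase that group.
# \\L in the replacement requires perl = TRUE.
files_lower <- sub("(\\.[^.]+)$", "\\L\\1", files, perl = TRUE)
files_lower
```

Names without a dot do not match the pattern and are returned unchanged.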
