Regular Expression: Extract number out of long string and place in a new column in R?

Time:07-28

I have a data frame in R, and I would like to extract the number that follows the "M" and precedes the ")" in each file name. For example, from the first row I would like to get 60102148.

Current data

file_name 
(P102180.R2858.M60102148)SupplierPerformanceDashboard.PDF
(P10424.R2858.M60010424)SupplierPerformanceDashboard.PDF
(P14479.R2858.M60004820)SupplierPerformanceDashboard.PDF
(P14479.R2858.M60031167)SupplierPerformanceDashboard.PDF
(P14479.R2858.M60032342)SupplierPerformanceDashboard.PDF

Desired output, with a new column named MVNDR_NBR

 file_name                                                            MVNDR_NBR
(P102180.R2858.M60102148)SupplierPerformanceDashboard.PDF               60102148
(P10424.R2858.M60010424)SupplierPerformanceDashboard.PDF                60010424
(P14479.R2858.M60004820)SupplierPerformanceDashboard.PDF                60004820
(P14479.R2858.M60031167)SupplierPerformanceDashboard.PDF                60031167
(P14479.R2858.M60032342)SupplierPerformanceDashboard.PDF                60032342

Answer:

A possible solution, based on stringr::str_extract and regex lookaround assertions:

EXPLANATION

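The pattern "(?<=M)\\d+(?=\\))" matches one or more digits (\\d+) that are immediately preceded by "M" (the lookbehind (?<=M)) and immediately followed by ")" (the lookahead (?=\\))). Lookarounds check the neighbouring characters without including them in the match, so only the digits are returned.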
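The same pattern can be tried on a single file name with base R alone. stringr's regex engine (ICU, via stringi) supports lookbehind out of the box, but base R's default engine does not, so this minimal sketch passes perl = TRUE to regexpr(). The variable names x, pattern and m are just illustrative.

```r
# One file name taken from the question
x <- "(P102180.R2858.M60102148)SupplierPerformanceDashboard.PDF"

# Same pattern as the answer: digits preceded by "M" and followed by ")"
pattern <- "(?<=M)\\d+(?=\\))"

# perl = TRUE is required for lookbehind/lookahead in base R;
# regexpr() locates the first match and regmatches() returns its text
m <- regexpr(pattern, x, perl = TRUE)
regmatches(x, m)
#> [1] "60102148"
```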

library(tidyverse)

df %>% 
  # (?<=M): preceded by "M"; \\d+: one or more digits; (?=\\)): followed by ")"
  mutate(MVNDR_NBR = str_extract(file_name, "(?<=M)\\d+(?=\\))"))

#>                                                   file_name MVNDR_NBR
#> 1 (P102180.R2858.M60102148)SupplierPerformanceDashboard.PDF  60102148
#> 2  (P10424.R2858.M60010424)SupplierPerformanceDashboard.PDF  60010424
#> 3  (P14479.R2858.M60004820)SupplierPerformanceDashboard.PDF  60004820
#> 4  (P14479.R2858.M60031167)SupplierPerformanceDashboard.PDF  60031167
#> 5  (P14479.R2858.M60032342)SupplierPerformanceDashboard.PDF  60032342
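Note that str_extract() returns a character column. If the number is needed as a numeric value, or if you would rather avoid the tidyverse, a minimal base R sketch is shown here. It assumes the column is named file_name, as in the question, and rebuilds the five example rows so it runs on its own. sub() with a capture group keeps only the digits between "M" and ")", and as.numeric() converts them.

```r
# Rebuild the example data from the question (column name file_name assumed)
df <- data.frame(
  file_name = c(
    "(P102180.R2858.M60102148)SupplierPerformanceDashboard.PDF",
    "(P10424.R2858.M60010424)SupplierPerformanceDashboard.PDF",
    "(P14479.R2858.M60004820)SupplierPerformanceDashboard.PDF",
    "(P14479.R2858.M60031167)SupplierPerformanceDashboard.PDF",
    "(P14479.R2858.M60032342)SupplierPerformanceDashboard.PDF"
  )
)

# Replace the whole string with the captured digits (group 1) that sit
# between "M" and ")", then convert the result from text to a number
df$MVNDR_NBR <- as.numeric(sub(".*M(\\d+)\\).*", "\\1", df$file_name, perl = TRUE))
df$MVNDR_NBR
```

If a file name has no "M<digits>)" part, sub() returns it unchanged and as.numeric() turns it into NA with a coercion warning, which makes such rows easy to spot.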