I have a data frame in R and I would like to extract the number that comes
after the "M" and before the ")" in each file name.
For example, in the first row below I would like the number 60102148.
Current data
file_name
(P102180.R2858.M60102148)SupplierPerformanceDashboard.PDF
(P10424.R2858.M60010424)SupplierPerformanceDashboard.PDF
(P14479.R2858.M60004820)SupplierPerformanceDashboard.PDF
(P14479.R2858.M60031167)SupplierPerformanceDashboard.PDF
(P14479.R2858.M60032342)SupplierPerformanceDashboard.PDF
Desired output with a new column named MVNDR_NBR
file_name MVNDR_NBR
(P102180.R2858.M60102148)SupplierPerformanceDashboard.PDF 60102148
(P10424.R2858.M60010424)SupplierPerformanceDashboard.PDF 60010424
(P14479.R2858.M60004820)SupplierPerformanceDashboard.PDF 60004820
(P14479.R2858.M60031167)SupplierPerformanceDashboard.PDF 60031167
(P14479.R2858.M60032342)SupplierPerformanceDashboard.PDF 60032342
CodePudding user response:
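A possible solution, based on stringr::str_extract
and lookaround. In the pattern, (?<=M) requires an "M" immediately before the match,
\\d+ matches one or more digits, and (?=\\)) requires a ")" immediately after the match,
so only the digits themselves are returned.
library(tidyverse)
df %>%
  mutate(MVNDR_NBR = str_extract(file_name, "(?<=M)\\d+(?=\\))"))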
#> file_name MVNDR_NBR
#> 1 (P102180.R2858.M60102148)SupplierPerformanceDashboard.PDF 60102148
#> 2 (P10424.R2858.M60010424)SupplierPerformanceDashboard.PDF 60010424
#> 3 (P14479.R2858.M60004820)SupplierPerformanceDashboard.PDF 60004820
#> 4 (P14479.R2858.M60031167)SupplierPerformanceDashboard.PDF 60031167
#> 5 (P14479.R2858.M60032342)SupplierPerformanceDashboard.PDF 60032342
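If you would rather avoid a package dependency, here is a minimal base R sketch of the same idea. It rebuilds the sample data from the question (the name df is an assumption, as above) and uses sub with a capture group instead of lookaround. Note one difference in behaviour: when the pattern does not match, sub returns the original string unchanged, whereas str_extract returns NA.

```r
# Sample data copied from the question; `df` is an assumed name
df <- data.frame(
  file_name = c(
    "(P102180.R2858.M60102148)SupplierPerformanceDashboard.PDF",
    "(P10424.R2858.M60010424)SupplierPerformanceDashboard.PDF",
    "(P14479.R2858.M60004820)SupplierPerformanceDashboard.PDF",
    "(P14479.R2858.M60031167)SupplierPerformanceDashboard.PDF",
    "(P14479.R2858.M60032342)SupplierPerformanceDashboard.PDF"
  )
)

# Capture the digits between "M" and ")" and replace the whole string with them
df$MVNDR_NBR <- sub(".*M(\\d+)\\).*", "\\1", df$file_name)

df$MVNDR_NBR
```

The result is a character column. If you need numbers, wrap the call in as.numeric().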