How to extract 2 dates from a long string and put them in two different columns?

I have a data frame df:

df = data.frame(text = "M2O__LS___P_20160727T165346_20160808T165347_VV_ts_pan_fgdo_inte_dbrrt_IW9")

I want a new data frame df2 containing the first (2016.07.27) and second (2016.08.08) dates extracted from the text column of df.

The desired output data frame is:

     begin      end
1 20160727 20160808

I am particularly interested in a tidyverse approach to this problem (unless base R is much more efficient).

Answer:

1) Base R. Read the text as underscore-separated fields, keep the 7th and 8th columns, give them the desired names, and remove the T and everything after it.

DF <- read.table(text = df$text, sep = "_")[7:8]  # fields 7 and 8 hold the two timestamps
names(DF) <- c("begin", "end")
DF[] <- lapply(DF, sub, pattern = "T.*", replacement = "")  # drop "T" and everything after it
DF
##      begin      end
## 1 20160727 20160808
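To see why columns 7 and 8 are the right ones, it can help to split the string by hand. This is only an illustrative sketch: strsplit() keeps the empty strings that appear between consecutive underscores, and read.table() counts those empty fields as columns too (filling them with NA), which is why the timestamps end up in positions 7 and 8.

```r
# stringsAsFactors = FALSE only matters for R < 4.0, where it was TRUE by default
df <- data.frame(text = "M2O__LS___P_20160727T165346_20160808T165347_VV_ts_pan_fgdo_inte_dbrrt_IW9",
                 stringsAsFactors = FALSE)

# Split on the literal underscore; "" marks each pair of consecutive underscores
fields <- strsplit(df$text, "_", fixed = TRUE)[[1]]
fields[7:8]
## [1] "20160727T165346" "20160808T165347"

# Same clean-up as in (1): remove "T" and the time part
dates <- sub("T.*", "", fields[7:8])
dates
## [1] "20160727" "20160808"
```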

1a) Another base R solution, using strcapture() with a regular expression that captures the 8 digits before each T.

pat <- "_(\\d{8})T.*_(\\d{8})T"
strcapture(pat, df$text, list(begin = character(0), end = character(0)))
##      begin      end
## 1 20160727 20160808
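Because strcapture() is vectorised over its x argument, the same call handles a whole column at once. In this sketch the second string is made-up (hypothetical) input with no timestamps; strcapture() returns NA for strings that do not match instead of raising an error.

```r
pat <- "_(\\d{8})T.*_(\\d{8})T"

x <- c("M2O__LS___P_20160727T165346_20160808T165347_VV_ts_pan_fgdo_inte_dbrrt_IW9",
       "no_dates_here")  # hypothetical non-matching row

# One output row per element of x; non-matches become NA in every column
res <- strcapture(pat, x, list(begin = character(0), end = character(0)))
res
```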

2) read.pattern (from the gsubfn package). pat is from (1a).

library(gsubfn)
read.pattern(text = df$text, pattern = pat, col.names = c("begin", "end"))
##      begin      end
## 1 20160727 20160808

3) tidyr. pat is from (1a); extract() puts each capture group of pat into its own column.

library(tidyr)
extract(df, text, c("begin", "end"), pat)
##      begin      end
## 1 20160727 20160808
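In tidyr 1.3.0 and later, extract() is superseded by separate_wider_regex(), which takes a named vector of patterns: named pieces become columns and unnamed pieces are matched but dropped. A sketch, assuming tidyr >= 1.3.0 is installed; the pieces are written so that, taken together, they match the whole string:

```r
library(tidyr)

df <- data.frame(text = "M2O__LS___P_20160727T165346_20160808T165347_VV_ts_pan_fgdo_inte_dbrrt_IW9")

out <- separate_wider_regex(
  df, text,
  patterns = c(
    ".*?_",            # prefix up to the underscore before the first date (dropped)
    begin = "\\d{8}",  # first date, YYYYMMDD
    "T\\d{6}_",        # time of the first timestamp plus the separator (dropped)
    end = "\\d{8}",    # second date, YYYYMMDD
    ".*"               # everything after the second date (dropped)
  )
)
out
```

By default the text column is removed from the result, so only begin and end remain.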

Answer:

Another solution, using stringr with lookbehind patterns:

library(tidyverse)

df %>% 
  transmute(begin = str_extract(text, "(?<=_)\\d{8}"),   # first 8 digits preceded by "_"
            end = str_extract(text, "(?<=\\d_)\\d{8}"))  # first 8 digits preceded by a digit and "_"
#>      begin      end
#> 1 20160727 20160808
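The lookbehind patterns rely on the layout of this particular string (the first 8-digit run after an underscore, and the first 8-digit run after a digit and an underscore). An alternative sketch reuses the single regular expression from the first answer with stringr::str_match(), which captures both dates in one pass. The optional conversion to Date with as.Date() goes beyond what the question asks for:

```r
library(dplyr)
library(stringr)

df <- data.frame(text = "M2O__LS___P_20160727T165346_20160808T165347_VV_ts_pan_fgdo_inte_dbrrt_IW9")

# Column 1 of the match matrix is the whole match; columns 2 and 3 are the groups
m <- str_match(df$text, "_(\\d{8})T.*_(\\d{8})T")
df2 <- data.frame(begin = m[, 2], end = m[, 3])

# Optional: turn the YYYYMMDD strings into Date objects
df2_dates <- df2 %>%
  mutate(across(c(begin, end), ~ as.Date(.x, format = "%Y%m%d")))
df2_dates$end - df2_dates$begin
#> Time difference of 12 days
```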