I have a data frame df:
df = data.frame(text = "M2O__LS___P_20160727T165346_20160808T165347_VV_ts_pan_fgdo_inte_dbrrt_IW9")
I want to have a new data frame df2 that extracts the first (2016.07.27) and second (2016.08.08) dates from the text column of df.
The desired output data frame is:
     begin      end
1 20160727 20160808
I would specifically be interested in a tidyverse approach to this problem (unless the base R approach is much more efficient).
CodePudding user response:
1) Base R. Read the text, splitting it on underscores, keep the 7th and 8th columns, assign the desired names, and remove the T and everything after it.
DF <- read.table(text = df$text, sep = "_")[7:8]
names(DF) <- c("begin", "end")
DF[] <- lapply(DF, sub, pattern = "T.*", replacement = "")
DF
##      begin      end
## 1 20160727 20160808
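To see why columns 7 and 8 are the right ones, the sketch below splits the same string with strsplit and prints the fields. It is only an illustration of the splitting step, not a replacement for the code above; the variable names fields and dates are made up for this example.

```r
# Same input as in the question
df <- data.frame(text = "M2O__LS___P_20160727T165346_20160808T165347_VV_ts_pan_fgdo_inte_dbrrt_IW9")

# Split on every underscore; consecutive underscores produce empty fields,
# which is why the timestamps land in positions 7 and 8
# (as.character guards against text being a factor in older R versions)
fields <- strsplit(as.character(df$text), "_", fixed = TRUE)[[1]]
print(fields[1:8])

# Drop the "T" and the time part that follows it
dates <- sub("T.*", "", fields[7:8])
print(dates)
```

Note that this positional approach assumes every row has the same number of underscores before the timestamps.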
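1a) Another base R solution. Capture the two 8-digit runs that precede a T with a regular expression and let strcapture build the data frame.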
pat <- "_(\\d{8})T.*_(\\d{8})T"
strcapture(pat, df$text, list(begin = character(0), end = character(0)))
##      begin      end
## 1 20160727 20160808
2) read.pattern. This uses read.pattern from the gsubfn package; pat is from (1a).
library(gsubfn)
read.pattern(text = df$text, pattern = pat, col.names = c("begin", "end"))
##      begin      end
## 1 20160727 20160808
3) tidyr. This uses extract from tidyr; pat is from (1a).
library(tidyr)
extract(df, text, c("begin", "end"), pat)
##      begin      end
## 1 20160727 20160808
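All of the solutions above return the dates as plain digits (text or numbers), which matches the requested output. If real Date values are needed later, one possible follow-up is sketched below. It assumes the digits are always in YYYYMMDD order and reuses pat from (1a); the conversion step is an addition, not part of the original question.

```r
df <- data.frame(text = "M2O__LS___P_20160727T165346_20160808T165347_VV_ts_pan_fgdo_inte_dbrrt_IW9")
pat <- "_(\\d{8})T.*_(\\d{8})T"

# Extract both dates as character columns, as in (1a)
df2 <- strcapture(pat, as.character(df$text),
                  list(begin = character(0), end = character(0)))

# Convert every column to Date, assuming the YYYYMMDD layout
df2[] <- lapply(df2, as.Date, format = "%Y%m%d")
print(df2)

# Date columns support arithmetic, e.g. the length of the interval in days
print(df2$end - df2$begin)
```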
CodePudding user response:
Another solution:
library(tidyverse)
df %>%
  transmute(begin = str_extract(text, "(?<=_)\\d{8}"),
            end = str_extract(text, "(?<=\\d_)\\d{8}"))
#>      begin      end
#> 1 20160727 20160808
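The two patterns differ only in their lookbehind: the first takes the earliest 8 digits preceded by an underscore, while the second requires a digit before that underscore, which skips the first date and lands on the second. The sketch below checks this with base R's regexpr and perl = TRUE as a stand-in for str_extract, so it runs without loading the tidyverse; the names x, first and second are made up for this example.

```r
x <- "M2O__LS___P_20160727T165346_20160808T165347_VV_ts_pan_fgdo_inte_dbrrt_IW9"

# First run of 8 digits that directly follows an underscore
first <- regmatches(x, regexpr("(?<=_)\\d{8}", x, perl = TRUE))

# First run of 8 digits that follows "<digit>_", i.e. comes after a timestamp
second <- regmatches(x, regexpr("(?<=\\d_)\\d{8}", x, perl = TRUE))

print(c(begin = first, end = second))
```

This relies on the field before the first date ending in a non-digit (here the P); if that field could end in a digit, the second pattern would match the first date instead.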