Having a dataframe like this:
df <- data.frame(id = c(1,2,3), date1 = c("2014-Dec 2018","2009-2010","Jan 2009-Aug 2010"), date2 = c("Feb 2016-Dec 2018","2014-Dec 2018","Oct 2013-Dec 2018"))
  id             date1             date2
1  1     2014-Dec 2018 Feb 2016-Dec 2018
2  2         2009-2010     2014-Dec 2018
3  3 Jan 2009-Aug 2010 Oct 2013-Dec 2018
Is there a command that checks, for every row, whether any date is in a format other than "Jan 2009-Aug 2010", and keeps those rows in a new data frame? In other words, it should check whether each date has 17 characters, including the spaces between month and year.
Example of expected output
data.frame(id = c(1,2), date1 = c("2014-Dec 2018","2009-2010"), date2 = c("Feb 2016-Dec 2018","2014-Dec 2018"))
  id         date1             date2
1  1 2014-Dec 2018 Feb 2016-Dec 2018
2  2     2009-2010     2014-Dec 2018
CodePudding user response:
A safer option is to split your data, using grepl to check whether each date respects the format:
pattern = "[A-Za-z]{3} \\d{4}-[A-Za-z]{3} \\d{4}"
split(df, rowSums(sapply(df[-1], grepl, pattern = pattern)) == 2)
output
$`FALSE`
  id         date1             date2
1  1 2014-Dec 2018 Feb 2016-Dec 2018
2  2     2009-2010     2014-Dec 2018

$`TRUE`
  id             date1             date2
3  3 Jan 2009-Aug 2010 Oct 2013-Dec 2018
Explanation
The pattern is not that complicated: three letters ([A-Za-z]{3}), a space, and four digits (\\d{4}), twice, with the two halves separated by -.
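One thing to be aware of: the pattern above is not anchored, so grepl also returns TRUE when the format only appears somewhere inside a longer string. If you want the whole cell to match, a minimal sketch could look like the following. The names anchored, unanchored, ok and bad_rows are just illustrative, and stringsAsFactors = FALSE is my assumption so that it behaves the same on R versions before 4.0.

```r
# Same data as in the question; stringsAsFactors = FALSE is an assumption
# so the columns are character vectors on older R versions too.
df <- data.frame(
  id = c(1, 2, 3),
  date1 = c("2014-Dec 2018", "2009-2010", "Jan 2009-Aug 2010"),
  date2 = c("Feb 2016-Dec 2018", "2014-Dec 2018", "Oct 2013-Dec 2018"),
  stringsAsFactors = FALSE
)

unanchored <- "[A-Za-z]{3} \\d{4}-[A-Za-z]{3} \\d{4}"
# ^ and $ force the entire string to match, not just a part of it
anchored <- paste0("^", unanchored, "$")

# A made-up string that merely contains the format
grepl(unanchored, "xJan 2009-Aug 20101")  # TRUE
grepl(anchored, "xJan 2009-Aug 20101")    # FALSE

# Logical matrix: one row per data frame row, one column per date column
ok <- sapply(df[-1], grepl, pattern = anchored)

# Keep the rows where at least one date does not match the full format
bad_rows <- df[rowSums(ok) < ncol(ok), ]
```

For the data in the question this keeps rows 1 and 2, the same as the FALSE group of the split above.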
CodePudding user response:
Check the number of characters for all dates, then keep the rows where fewer than 2 of them have 17 characters:
df[ rowSums(sapply(df[ -1 ], nchar) == 17, na.rm = TRUE) < 2, ]
#   id         date1             date2
# 1  1 2014-Dec 2018 Feb 2016-Dec 2018
# 2  2     2009-2010     2014-Dec 2018
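Note that counting characters only checks the length, not the layout. A small sketch of the difference, using a made-up vector x (the second value is hypothetical, it is not in the question's data) and an anchored version of the regex from the other answer:

```r
# Second element is a hypothetical value: 17 characters, but wrong layout
x <- c("Jan 2009-Aug 2010", "2009-2010 to 2011", "2009-2010")

# Length check: passes for anything that is 17 characters long
len_ok <- nchar(x) == 17

# Format check: three letters, space, four digits, twice, separated by -
fmt_ok <- grepl("^[A-Za-z]{3} \\d{4}-[A-Za-z]{3} \\d{4}$", x)

# Side-by-side comparison of the two checks
data.frame(x, len_ok, fmt_ok)
```

If your real data can contain other 17-character strings, the regex check is probably the safer of the two; for the data shown in the question both give the same rows.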