Is there any command to count the length of rows?

Having a dataframe like this:

df <- data.frame(id = c(1,2,3), date1 = c("2014-Dec 2018","2009-2010","Jan 2009-Aug 2010"), date2 = c("Feb 2016-Dec 2018","2014-Dec 2018","Oct 2013-Dec 2018"))

  id             date1             date2
1  1     2014-Dec 2018 Feb 2016-Dec 2018
2  2         2009-2010     2014-Dec 2018
3  3 Jan 2009-Aug 2010 Oct 2013-Dec 2018

Is there any command that could check every row for a date that does not follow the format "Jan 2009-Aug 2010" and keep those rows in a new dataframe? In other words, check whether each date has 17 characters, including the spaces between month and year.

Example of expected output

data.frame(id = c(1,2), date1 = c("2014-Dec 2018","2009-2010"), date2 = c("Feb 2016-Dec 2018","2014-Dec 2018"))
  id         date1             date2
1  1 2014-Dec 2018 Feb 2016-Dec 2018
2  2     2009-2010     2014-Dec 2018

CodePudding user response：

A safer option is to split your data and use grepl to check whether each date follows the format:

pattern = "[A-Za-z]{3} \\d{4}-[A-Za-z]{3} \\d{4}"
split(df, rowSums(sapply(df[-1], grepl, pattern = pattern)) == 2)

output

$`FALSE`
  id         date1             date2
1  1 2014-Dec 2018 Feb 2016-Dec 2018
2  2     2009-2010     2014-Dec 2018

$`TRUE`
  id             date1             date2
3  3 Jan 2009-Aug 2010 Oct 2013-Dec 2018
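
Since the question asks for the non-conforming rows as a new dataframe, the FALSE element of that list is the piece of interest. A minimal sketch of pulling it out; the names parts and off_format are only illustrative, and stringsAsFactors = FALSE is added so the columns are character vectors on older R versions too:

```r
df <- data.frame(
  id = c(1, 2, 3),
  date1 = c("2014-Dec 2018", "2009-2010", "Jan 2009-Aug 2010"),
  date2 = c("Feb 2016-Dec 2018", "2014-Dec 2018", "Oct 2013-Dec 2018"),
  stringsAsFactors = FALSE
)

pattern <- "[A-Za-z]{3} \\d{4}-[A-Za-z]{3} \\d{4}"

# split() returns a list named after the grouping values ("FALSE", "TRUE")
parts <- split(df, rowSums(sapply(df[-1], grepl, pattern = pattern)) == 2)

# Rows where at least one date does not follow the format
off_format <- parts[["FALSE"]]
```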

Explanation

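The pattern is not that complicated: three letters ([A-Za-z]{3}), a space, and four digits (\\d{4}), repeated twice and separated by -. A row ends up in the TRUE group only when both date columns match, which is what rowSums(...) == 2 checks.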
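
One thing to keep in mind: the pattern as written is not anchored, so grepl also returns TRUE for a longer string that merely contains a matching piece. If the whole cell must match, adding ^ and $ is one possible tightening. A small sketch on a made-up vector (x and the padded second value are assumptions for illustration, not data from the question):

```r
x <- c("Jan 2009-Aug 2010", "xxJan 2009-Aug 2010xx", "2014-Dec 2018")

unanchored <- "[A-Za-z]{3} \\d{4}-[A-Za-z]{3} \\d{4}"
anchored   <- "^[A-Za-z]{3} \\d{4}-[A-Za-z]{3} \\d{4}$"

# Matches anywhere inside the string
hit_unanchored <- grepl(unanchored, x)
# TRUE TRUE FALSE

# Requires the entire string to follow the format
hit_anchored <- grepl(anchored, x)
# TRUE FALSE FALSE
```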

CodePudding user response：

Count the characters in every date, then keep the rows where fewer than two of the dates are 17 characters long:

df[ rowSums(sapply(df[ -1 ], nchar) == 17, na.rm = TRUE) < 2, ]
#   id         date1             date2
# 1  1 2014-Dec 2018 Feb 2016-Dec 2018
# 2  2     2009-2010     2014-Dec 2018
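
Note that a length check only looks at the number of characters, not at their layout. The sketch below adds a hypothetical fourth row (not in the original data) whose date1 is 17 characters long but has year and month swapped, and compares the length check with an anchored regex check; df2, by_length and by_pattern are illustrative names:

```r
df2 <- data.frame(
  id = c(1, 2, 3, 4),
  date1 = c("2014-Dec 2018", "2009-2010", "Jan 2009-Aug 2010", "2009 Jan-2010 Aug"),
  date2 = c("Feb 2016-Dec 2018", "2014-Dec 2018", "Oct 2013-Dec 2018", "Oct 2013-Dec 2018"),
  stringsAsFactors = FALSE
)

# Length check: row 4 passes because both dates have 17 characters
by_length <- df2[rowSums(sapply(df2[-1], nchar) == 17, na.rm = TRUE) < 2, ]

# Format check: row 4 is flagged because date1 is not "Mon YYYY-Mon YYYY"
pattern <- "^[A-Za-z]{3} \\d{4}-[A-Za-z]{3} \\d{4}$"
by_pattern <- df2[rowSums(sapply(df2[-1], grepl, pattern = pattern)) < 2, ]

by_length$id
# 1 2
by_pattern$id
# 1 2 4
```

If the data can only ever contain the layouts shown in the question, the shorter nchar version is enough; the regex version is the stricter of the two.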