I have addresses that have duplicated information from Kingwood and Humble addresses. I am trying to combine these entries, preserving the the minimum first reported date, and the max last reported dates, using this code:
df <- df %>% group_by(id, street) %>%
summarise(firstReportedDate = min(as.Date(firstReportedDate))) %>%
summarise(lastReportedDate = max(as.Date(lastReportedDate)))
However, for some reason, id == 1000 is giving me the error:
Error: Problem with `summarise()` column `firstReportedDate`.
i `firstReportedDate = min(as.Date(firstReportedDate))`.
x character string is not in a standard unambiguous format
i The error occurred in group 3: id = "1000", street = "Po Box 203"
Can anyone help me understand this error? Sample of data below:
dput(df)
structure(list(street = c("2200 Lake Village Dr", "1040 Marina Dr",
"2200 Lake Village Dr", "1040 Marina Dr", "22302 Rustic Bridge Ln",
"22302 Rustic Bridge Ln", "1060 Marina Dr", "3211 Laurel Point Ct",
"Po Box 203", "19703 Highway 59 N", "6714 Dorylee Ln", "3511 Forest Row Dr",
"3511 Forest Row Dr", "Acorn Ln"), city = c("Kingwood", "Humble",
"Kingwood", "Kingwood", "Kingwood", "Humble", "Humble", "Kingwood",
"Humble", "Humble", "Humble", "Kingwood", "Humble", "Humble"),
state = c("TX", "TX", "TX", "TX", "TX", "TX", "TX", "TX",
"TX", "TX", "TX", "TX", "TX", "TX"), zip = c("77339", "77339",
"77339", "77339", "77339", "77339", "77339", "77339", "77347",
"77338", "77396", "77345", "77345", "77345"), firstReportedDate = c("5/25/2019",
"1/1/2015", "9/30/2017", "11/30/2015", "10/18/2017", "6/15/2017",
"9/30/2009", "10/12/2002", "9/22/2017", "1/1/2009", "3/5/2004",
"4/8/2012", "9/30/2009", "1/1/2009"), lastReportedDate = c("4/1/2022",
"1/1/2021", "9/30/2017", "11/30/2015", "4/1/2022", "6/15/2018",
"9/30/2009", "3/3/2004", "4/1/2022", "1/1/2011", "3/5/2004",
"4/1/2022", "9/30/2009", "1/1/2013"), id = c("357", "357",
"357", "357", "359", "359", "359", "359", "1000", "1000",
"1000", "1431", "1431", "1431")), row.names = c(NA, -14L), class = c("tbl_df",
"tbl", "data.frame"))
CodePudding user response:
Embed everything in the same summarise call. Also, you should specify the format of your date in the format
argument of as.Date
when your data is not in the international date format.
dat %>%
mutate(across(ends_with("Date"), as.Date, format = "%m/%d/%Y")) %>%
group_by(id, street) %>%
summarise(firstReportedDate = min(firstReportedDate),
lastReportedDate = max(lastReportedDate))
output
# A tibble: 10 × 4
# Groups: id [4]
id street firstReportedDate lastReportedDate
<chr> <chr> <date> <date>
1 1000 19703 Highway 59 N 2009-01-01 2011-01-01
2 1000 6714 Dorylee Ln 2004-03-05 2004-03-05
3 1000 Po Box 203 2017-09-22 2022-04-01
4 1431 3511 Forest Row Dr 2009-09-30 2022-04-01
5 1431 Acorn Ln 2009-01-01 2013-01-01
6 357 1040 Marina Dr 2015-01-01 2021-01-01
7 357 2200 Lake Village Dr 2017-09-30 2022-04-01
8 359 1060 Marina Dr 2009-09-30 2009-09-30
9 359 22302 Rustic Bridge Ln 2017-06-15 2022-04-01
10 359 3211 Laurel Point Ct 2002-10-12 2004-03-03