Home > Software engineering >  trying to group_by and then summarize max and min - running into error for unambiguous format
trying to group_by and then summarize max and min - running into error for unambiguous format

Time:05-18

I have addresses that have duplicated information from Kingwood and Humble addresses. I am trying to combine these entries, preserving the the minimum first reported date, and the max last reported dates, using this code:

df <- df %>% group_by(id, street) %>% 
  summarise(firstReportedDate = min(as.Date(firstReportedDate))) %>% 
  summarise(lastReportedDate = max(as.Date(lastReportedDate)))

However, for some reason, id == 1000 is giving me the error:

Error: Problem with `summarise()` column `firstReportedDate`.
i `firstReportedDate = min(as.Date(firstReportedDate))`.
x character string is not in a standard unambiguous format
i The error occurred in group 3: id = "1000", street = "Po Box 203"

Can anyone help me understand this error? Sample of data below:

dput(df)
structure(list(street = c("2200 Lake Village Dr", "1040 Marina Dr", 
"2200 Lake Village Dr", "1040 Marina Dr", "22302 Rustic Bridge Ln", 
"22302 Rustic Bridge Ln", "1060 Marina Dr", "3211 Laurel Point Ct", 
"Po Box 203", "19703 Highway 59 N", "6714 Dorylee Ln", "3511 Forest Row Dr", 
"3511 Forest Row Dr", "Acorn Ln"), city = c("Kingwood", "Humble", 
"Kingwood", "Kingwood", "Kingwood", "Humble", "Humble", "Kingwood", 
"Humble", "Humble", "Humble", "Kingwood", "Humble", "Humble"), 
    state = c("TX", "TX", "TX", "TX", "TX", "TX", "TX", "TX", 
    "TX", "TX", "TX", "TX", "TX", "TX"), zip = c("77339", "77339", 
    "77339", "77339", "77339", "77339", "77339", "77339", "77347", 
    "77338", "77396", "77345", "77345", "77345"), firstReportedDate = c("5/25/2019", 
    "1/1/2015", "9/30/2017", "11/30/2015", "10/18/2017", "6/15/2017", 
    "9/30/2009", "10/12/2002", "9/22/2017", "1/1/2009", "3/5/2004", 
    "4/8/2012", "9/30/2009", "1/1/2009"), lastReportedDate = c("4/1/2022", 
    "1/1/2021", "9/30/2017", "11/30/2015", "4/1/2022", "6/15/2018", 
    "9/30/2009", "3/3/2004", "4/1/2022", "1/1/2011", "3/5/2004", 
    "4/1/2022", "9/30/2009", "1/1/2013"), id = c("357", "357", 
    "357", "357", "359", "359", "359", "359", "1000", "1000", 
    "1000", "1431", "1431", "1431")), row.names = c(NA, -14L), class = c("tbl_df", 
"tbl", "data.frame"))

CodePudding user response:

Embed everything in the same summarise call. Also, you should specify the format of your date in the format argument of as.Date when your data is not in the international date format.

dat %>% 
  mutate(across(ends_with("Date"), as.Date, format = "%m/%d/%Y")) %>% 
  group_by(id, street) %>% 
  summarise(firstReportedDate = min(firstReportedDate),
            lastReportedDate = max(lastReportedDate))

output

# A tibble: 10 × 4
# Groups:   id [4]
   id    street                 firstReportedDate lastReportedDate
   <chr> <chr>                  <date>            <date>          
 1 1000  19703 Highway 59 N     2009-01-01        2011-01-01      
 2 1000  6714 Dorylee Ln        2004-03-05        2004-03-05      
 3 1000  Po Box 203             2017-09-22        2022-04-01      
 4 1431  3511 Forest Row Dr     2009-09-30        2022-04-01      
 5 1431  Acorn Ln               2009-01-01        2013-01-01      
 6 357   1040 Marina Dr         2015-01-01        2021-01-01      
 7 357   2200 Lake Village Dr   2017-09-30        2022-04-01      
 8 359   1060 Marina Dr         2009-09-30        2009-09-30      
 9 359   22302 Rustic Bridge Ln 2017-06-15        2022-04-01      
10 359   3211 Laurel Point Ct   2002-10-12        2004-03-03      
  • Related