Home > Software design >  Average multiple columns by FIPS
Average multiple columns by FIPS

Time:11-08

Based on the data below how can I average the columns InIndividuals, OutIndividuals, InAGI and OutAGI by FIPS? I am using FIPS because there are several counties with the same name. So the column Key is no longer needed as it's a concatenate of FIPS and Year.

Desired output schema/columns:

County State FIPS AvgInIndividuals AvgOutIndividuals AvgInAGI AvgOutAGI 

Sample data:

df = structure(list(FIPS = c(12001L, 8001L, 16001L, 12001L, 
    8001L, 16001L), State = c("FL", "CO", "ID", "FL", 
    "CO", "ID"), County = c("Alachua County", "Adams County", 
    "Ada County", "Alachua County", "Adams County", "Ada County"), 
        InIndividuals = c(433L, 30L, 16L, 381L, 42L, 21L), OutIndividuals = c(426L, 33L, 12L, 382L, 47L, 25L), InAGI = c(111L, 222L, 333L, 444L, 555L, 666L), NetMigration = c(7L, -3L, 4L, -1L, -5L, -4L), OutAGI = c(570L, 246L, 135L, 123L, 456L, 789L), FiscalYear = c("2011 - 2012", 
        "2011 - 2012", "2011 - 2012", "2011 - 2012", "2010 - 2011", 
        "2010 - 2011"), Year = c(2012L, 2012L, 2012L, 2011L, 2011L, 
        2011L), Key = c(120012012L, 80012012L, 160012012L, 120012011L, 
        80012011L, 160012011L)), class = "data.frame", row.names = c(NA, 
    -6L))

CodePudding user response:

Try

library(dplyr)
df1 %>% 
   group_by(Key, County = Origin_Place, 
      FIPS = Origin_FIPS, State = Origin_StateName) %>% 
   summarise(across(matches('(Individuals|AGI)$'), mean, 
    na.rm = TRUE, .names = "Avg{.col}"), .groups = "drop")

-output

# A tibble: 6 × 8
        Key County          FIPS State AvgInIndividuals AvgOutIndividuals AvgInAGI AvgOutAGI
      <int> <chr>          <int> <chr>            <dbl>             <dbl>    <dbl>     <dbl>
1  80012011 Adams County    8001 CO                  42                47      555       456
2  80012012 Adams County    8001 CO                  30                33      222       246
3 120012011 Alachua County 12001 FL                 381               382      444       123
4 120012012 Alachua County 12001 FL                 433               426      111       570
5 160012011 Ada County     16001 ID                  21                25      666       789
6 160012012 Ada County     16001 ID                  16                12      333       135

CodePudding user response:

aggregate(cbind(InIndividuals, OutIndividuals, InAGI, OutAGI) ~ Key   Origin_Place   Origin_StateName   Origin_FIPS, 
          data = df, 
          mean) |> 
  `colnames<-`(c('Key', 'County', 'State', 'FIPS', 'AvgOutIndividuals', 'AvgInIndividuals', 'AvgInAGI', 'AvgOutAGI'))
  • Related