Home > Blockchain >  Group df by two columns then count repeated values
Group df by two columns then count repeated values

Time:10-18

I have a df that houses both addresses and zipcodes. For each zipcode there are multiple addresses, and occasionally, there are repeat addresses within each zipcode (Apartment building and high risers). I am interested in finding the number of unique Addresses within each zipcode. I thought about using group_by from dplyr but I cannot get the unique counts.

Data structure is:

df = data.frame(zipcode= c(94613,94613,94613,94615,94615, 94615, 94615, 94615, 95901, 95901, 95901, 95901), Address= c('5000 MACARTHUR BLVD','5000 MACARTHUR BLVD','4000 MACARTHUR BLVD','7000 MACARTHUR BLVD','7000 MACARTHUR BLVD','7000 MACARTHUR BLVD', '9000 MACARTHUR BLVD','8000 MACARTHUR BLVD', '1675 9TH ST', '1675 9TH ST','1675 7TH ST','1670 7TH ST'), buildingName = c('THis', 'THat', 'The', 'Other','Blah','Blah','THat','THat','THat','THat','THat','THat'))

This is the start of my script:

dff = df %>% group_by(zipcode) %>% group_by(Address) %>% summarise(HELP)

CodePudding user response:

We can count() zipcode and Address to get the number of each combination. Is this your desired output?

library(dplyr)

df %>% 
  count(zipcode, Address)
#>   zipcode             Address n
#> 1   94613 4000 MACARTHUR BLVD 1
#> 2   94613 5000 MACARTHUR BLVD 2
#> 3   94615 7000 MACARTHUR BLVD 3
#> 4   94615 8000 MACARTHUR BLVD 1
#> 5   94615 9000 MACARTHUR BLVD 1
#> 6   95901         1670 7TH ST 1
#> 7   95901         1675 7TH ST 1
#> 8   95901         1675 9TH ST 2

Created on 2022-10-17 with reprex v2.0.2

  • Related