Home > database >  How to get the number of unique values in a column considering another column values with R?
How to get the number of unique values in a column considering another column values with R?

Time:09-06

I have this data frame:

df <- data.frame(id = c(918, 919, 920, 921, 922),
            city = c("a", "c", "b", "c", "a"),
            mosquitoes = c(9, 13, 8, 25, 10))

What I want to do is to get the number of unique ID values for each city and then create a new dataframe that should looks like:

newdf <- data.frame(city = c("a", "b", "c"),
                id = c(2,1,2),
                mosquitoes = c(19, 8, 38))

I know how to do half of that using

newdf <- aggregate(mosquitoes ~ city, data = df, sum)

But no matter how I try, I can't get the range for unique values of ID according to the cities that I have. I've been trying

newdf$id <- aggregate(length(id) ~ city, data = df, sum)

And I also tried a loop (because my original data has way more than 3 cities), but only got disaster and can't make it work at all:

 x <- unique(df$city)
unique_ID <- 
  for (x in df$city) {
    city = unique(df$city)
    mosquitoes = ?
    ID = ?
  }

This topic was the most similar to mine I could found, but apparently it only works with numeric values. At least I couldn't make it work with my character columns.

Can someone please help me?

CodePudding user response:

You could do:

library(tidyverse)

df <- data.frame(id = c(918, 919, 920, 921, 922),
                 city = c("a", "c", "b", "c", "a"),
                 mosquitoes = c(9, 13, 8, 25, 10))

df %>%
  group_by(city) %>%
  summarise(id = n(), mosquitoes = sum(mosquitoes))
#> # A tibble: 3 x 3
#>   city     id mosquitoes
#>   <chr> <int>      <dbl>
#> 1 a         2         19
#> 2 b         1          8
#> 3 c         2         38

Created on 2022-09-05 with reprex v2.0.2

  •  Tags:  
  • r
  • Related