Home > Software engineering >  group_by unique counts across multiple columns
group_by unique counts across multiple columns

Time:11-24

I have a table with the information about shops and fruits (apples and oranges). Each shop has an inventory where these fruits are recorded by their IDs.

  • Apples and oranges can be of different types e.g., pink lady, royal gala, apple eve, apple jazz etc. When they are of different types, they are stored with different IDs.
  • But when two apples are of the same type, they will have the same ID.

I am interested in counting, for each shop, how many different types of apples and how many different types of oranges are there.

My input is:

Shop     Apple_id   Orange_id
Coles    12         201
Woolies  20         51
Walmart  13         16
Woolies  20         52
Coles    14         202
Target   19         81
M&S      75         99
Coles    16         203
M&S      71         99
Dunnes   56         101
M&S      72         91

My expected output is:

Shop    Apples  Oranges
Coles   3       3
Dunnes  1       1
M&S     3       2
Target  1       1
Walmart 1       1
Woolies 1       2

I can run the code one by one for each fruit using the dplyr() package:

# Extract information on Apples only
library(dplyr)

apples_by_shop = raw %>%
  group_by(shop) %>%
  distinct(Apple_id) %>%
  count()

Similarly, I can write code to extract information for oranges only:

# Extract information on Oranges only
oranges_by_shop = raw %>%
  group_by(shop) %>%
  distinct(Orange_id) %>%
  count()

My question is, can I merge the above two in a single line of code e.g., through the summarise function?

CodePudding user response:

You may try

library(dplyr)

raw %>%
  group_by(Shop) %>%
  summarise(Apples = length(unique(Apple_id)),
            Oranges = length(unique(Orange_id)))

  Shop    Apples Oranges
  <chr>    <int>   <int>
1 Coles        3       3
2 Dunnes       1       1
3 M&S          3       2
4 Target       1       1
5 Walmart      1       1
6 Woolies      1       2

df %>%
  group_by(Shop) %>%
  summarise(Apples = n_distinct(Apple_id),
            Oranges = n_distinct(Orange_id))

CodePudding user response:

You can do that in one line in base R.

aggregate(. ~ Shop, dat, function(x) length(unique(x)))
#      Shop Apple_id Orange_id
# 1   Coles        3         3
# 2  Dunnes        1         1
# 3     M&S        3         2
# 4  Target        1         1
# 5 Walmart        1         1
# 6 Woolies        1         2

Data:

dat <- structure(list(Shop = c("Coles", "Woolies", "Walmart", "Woolies", 
"Coles", "Target", "M&S", "Coles", "M&S", "Dunnes", "M&S"), Apple_id = c(12L, 
20L, 13L, 20L, 14L, 19L, 75L, 16L, 71L, 56L, 72L), Orange_id = c(201L, 
51L, 16L, 52L, 202L, 81L, 99L, 203L, 99L, 101L, 91L)), class = "data.frame", row.names = c(NA, 
-11L))
  • Related