I have a table with the information about shops and fruits (apples and oranges). Each shop has an inventory where these fruits are recorded by their IDs.
- Apples and oranges can be of different types e.g., pink lady, royal gala, apple eve, apple jazz etc. When they are of different types, they are stored with different IDs.
- But when two apples are of the same type, they will have the same ID.
I am interested in counting, for each shop, how many different types of apples and how many different types of oranges are there.
My input is:
Shop Apple_id Orange_id
Coles 12 201
Woolies 20 51
Walmart 13 16
Woolies 20 52
Coles 14 202
Target 19 81
M&S 75 99
Coles 16 203
M&S 71 99
Dunnes 56 101
M&S 72 91
My expected output is:
Shop Apples Oranges
Coles 3 3
Dunnes 1 1
M&S 3 2
Target 1 1
Walmart 1 1
Woolies 1 2
I can run the code one by one for each fruit using the dplyr()
package:
# Extract information on Apples only
library(dplyr)
apples_by_shop = raw %>%
group_by(shop) %>%
distinct(Apple_id) %>%
count()
Similarly, I can write code to extract information for oranges only:
# Extract information on Oranges only
oranges_by_shop = raw %>%
group_by(shop) %>%
distinct(Orange_id) %>%
count()
My question is, can I merge the above two in a single line of code e.g., through the summarise function?
CodePudding user response:
You may try
library(dplyr)
raw %>%
group_by(Shop) %>%
summarise(Apples = length(unique(Apple_id)),
Oranges = length(unique(Orange_id)))
Shop Apples Oranges
<chr> <int> <int>
1 Coles 3 3
2 Dunnes 1 1
3 M&S 3 2
4 Target 1 1
5 Walmart 1 1
6 Woolies 1 2
df %>%
group_by(Shop) %>%
summarise(Apples = n_distinct(Apple_id),
Oranges = n_distinct(Orange_id))
CodePudding user response:
You can do that in one line in base R.
aggregate(. ~ Shop, dat, function(x) length(unique(x)))
# Shop Apple_id Orange_id
# 1 Coles 3 3
# 2 Dunnes 1 1
# 3 M&S 3 2
# 4 Target 1 1
# 5 Walmart 1 1
# 6 Woolies 1 2
Data:
dat <- structure(list(Shop = c("Coles", "Woolies", "Walmart", "Woolies",
"Coles", "Target", "M&S", "Coles", "M&S", "Dunnes", "M&S"), Apple_id = c(12L,
20L, 13L, 20L, 14L, 19L, 75L, 16L, 71L, 56L, 72L), Orange_id = c(201L,
51L, 16L, 52L, 202L, 81L, 99L, 203L, 99L, 101L, 91L)), class = "data.frame", row.names = c(NA,
-11L))