Here is a sample of the dataset that I have. I am looking to find the state that has the maximum number of stores. In this case, CA and also see how many IDs come from that state
| ID | | State | | Stores|
| -- | |------ | | ----- |
|a11 | | CA | | 16585 |
|a12 | | CA | | 45552 |
|a13 | | AK | | 7811 |
|a14 | | MA | | 4221 |
I have this code using dplyr
max_state <- df %>%
group_by(State) %>%
summarise(total_stores = sum(Stores)) %>%
top_n(1) %>%
select(State)
This gives me "CA"
Can I use this variable "max(state)" to pass through a filter and use summarise(n()) to count the number of Ids for CA?
CodePudding user response:
Here is a tidyverse
approach.
If your Stores
column is numeric, we can arrange this column in reverse order, so that the maximum would be the first row. Then filter State
that match the first row, and do the summarise
afterwards.
library(tidyverse)
df %>% arrange(desc(Stores)) %>%
filter(State == first(State)) %>%
group_by(State) %>%
summarize(Count_ID = n())
Output
# A tibble: 1 x 2
State Count_ID
<chr> <int>
1 CA 2
Your dataset for reference
# A tibble: 4 x 3
ID State Stores
<chr> <chr> <dbl>
1 a11 CA 16585
2 a12 CA 45552
3 a13 AK 7811
4 a14 MA 4221
CodePudding user response:
A few ways:
# this takes your max_state (CA) and brings in the parts of
# your original table that have the same State
max_state %>%
left_join(df) %>%
summarize(n = n())
# filter the State in df to match the State in max_state
df %>%
filter(State == max_state$State) %>%
summarize(n = n())
# Add Stores_total for each State, only keep the State rows which
# match that of the max State, and count the # of IDs therein
df %>%
group_by(State) %>%
mutate(Stores_total = sum(Stores)) %>%
filter(Stores_total == max(Stores_total)) %>%
count(ID)
CodePudding user response:
You can combine more operations into one summarize
call that will be applied to the same group:
df |>
group_by(State) |>
summarize(gsum = sum(Stores), nids = n()) |>
filter(gsum == max(gsum))
##> # A tibble: 1 × 3
##> State gsum nids
##> <chr> <dbl> <int>
##>1 CA 62137 2
Where the dataset df
is obtained by:
df <- data.frame(ID = c("a11", "a12","a13", "a14"),
State = c("CA", "CA", "AK", "MA"),
Stores = c(16585, 45552, 7811, 4221))