Home > Enterprise >  Pass a variable into a filter - R dplyr
Pass a variable into a filter - R dplyr

Time:02-07

Here is a sample of the dataset that I have. I am looking to find the state that has the maximum number of stores. In this case, CA and also see how many IDs come from that state

| ID | | State | | Stores| 
| -- | |------ | | ----- | 
|a11 | | CA    | | 16585 | 
|a12 | | CA    | | 45552 | 
|a13 | | AK    | | 7811  |
|a14 | | MA    | | 4221  |

I have this code using dplyr

max_state <- df  %>%
    group_by(State)  %>%
    summarise(total_stores = sum(Stores))  %>%
    top_n(1)  %>%
    select(State)

This gives me "CA"

Can I use this variable "max(state)" to pass through a filter and use summarise(n()) to count the number of Ids for CA?

CodePudding user response:

Here is a tidyverse approach.

If your Stores column is numeric, we can arrange this column in reverse order, so that the maximum would be the first row. Then filter State that match the first row, and do the summarise afterwards.

library(tidyverse)

df %>% arrange(desc(Stores)) %>% 
  filter(State == first(State)) %>% 
  group_by(State) %>% 
  summarize(Count_ID = n())

Output

# A tibble: 1 x 2
  State Count_ID
  <chr>    <int>
1 CA           2

Your dataset for reference

# A tibble: 4 x 3
  ID    State Stores
  <chr> <chr>  <dbl>
1 a11   CA     16585
2 a12   CA     45552
3 a13   AK      7811
4 a14   MA      4221

CodePudding user response:

A few ways:

# this takes your max_state (CA) and brings in the parts of 
# your original table that have the same State
max_state %>% 
  left_join(df) %>%
  summarize(n = n())

# filter the State in df to match the State in max_state
df %>%
  filter(State == max_state$State) %>%
  summarize(n = n())


# Add Stores_total for each State, only keep the State rows which
# match that of the max State, and count the # of IDs therein
df %>%
  group_by(State) %>%
  mutate(Stores_total = sum(Stores)) %>%
  filter(Stores_total == max(Stores_total)) %>% 
  count(ID)
  

CodePudding user response:

You can combine more operations into one summarize call that will be applied to the same group:

df |>
  group_by(State) |>
  summarize(gsum = sum(Stores), nids = n()) |>
  filter(gsum == max(gsum))

##>  # A tibble: 1 × 3
##>  State  gsum  nids
##>  <chr> <dbl> <int>
##>1 CA    62137     2

Where the dataset df is obtained by:

df <- data.frame(ID = c("a11", "a12","a13", "a14"),
                 State = c("CA", "CA", "AK", "MA"),
                 Stores = c(16585, 45552, 7811, 4221))
  •  Tags:  
  • Related