How do I use summarise properly in R for this simple analysis?

Time:07-06

Haven't used RStudio in a while, so I am quite rusty.

I want to create a bar chart showing the countries shipping the most freight weight in ascending order.

I have made this simple script that does the job:

df_new %>% 
  filter(!is.na(Freight_weight)) %>% 
  filter(!is.na(origin_name)) %>% 
  select(origin_name, Freight_weight) %>% 
  ggplot(aes(x = reorder(origin_name, Freight_weight, FUN = sum), y = Freight_weight)) +
  geom_col() +
  labs(x = "") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

However, when I try to do more with it, like adding a top-10 filter to keep only the countries with the highest shipments, it doesn't work: it takes the 10 highest individual shipments rather than the 10 highest totals per country.

Instead, I have tried something like this:

df_new %>% 
  group_by(origin_name) %>% 
  summarise(n = sum(Freight_weight, na.rm = TRUE)) %>% 
  ungroup() %>% 
  mutate(share = n /sum(n) %>% factor() %>% fct_reorder(share)) %>% 
  ggplot(aes(x = origin_name, y = n)) +
  geom_col() +
  labs(x = "") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

But here, I can't get the share calculation to work. What am I doing wrong?

Greatly appreciate your input - if I get this down I should be able to do most of the subsequent analyses!

CodePudding user response:

If you want the top 10 countries ranked by each country's single largest Freight_weight, one possible solution is the following.

(Note that I have generated more countries, denoted by letters, and more data.)

Hope this helps.

library(dplyr)

set.seed(123)
df_new <- structure(
  list(
    Freight_weight = runif(200, min = 1, max = 50),
    origin_name = sample(LETTERS[1:15], size = 200, replace = TRUE)
  ),
  row.names = c(NA,-200L),
  class = c("tbl_df", "tbl",
            "data.frame")
)


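df_new %>% 
  group_by(origin_name) %>% 
  # keep the single heaviest shipment of each country
  slice_max(order_by = Freight_weight, n = 1) %>%
  ungroup() %>% 
  # rank countries by that shipment and keep the first 10
  arrange(desc(Freight_weight)) %>% 
  slice(1:10)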

#> # A tibble: 10 × 2
#>    Freight_weight origin_name
#>             <dbl> <chr>      
#>  1           49.7 N          
#>  2           49.3 I          
#>  3           49.2 J          
#>  4           49.0 F          
#>  5           47.9 M          
#>  6           47.8 K          
#>  7           47.8 E          
#>  8           47.4 O          
#>  9           47.1 H          
#> 10           46.9 G

Created on 2022-07-06 by the reprex package (v2.0.1)
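
If what you are after is the top 10 by total weight per country (which is how I read the question), the following is a minimal sketch rather than a definitive solution. It assumes the column names from the question and reuses the simulated data above; total_weight, top_countries and p are names I made up for illustration. Regarding the share line in the question: %>% binds more tightly than /, so n / sum(n) %>% factor() %>% fct_reorder(share) is parsed as n / (sum(n) %>% factor() %>% fct_reorder(share)), and share does not exist yet at that point. Since the x axis shows origin_name, that is also the column whose levels need reordering.

```r
library(dplyr)
library(ggplot2)
library(forcats)

# Simulated data in the same shape as above (column names taken from the question)
set.seed(123)
df_new <- data.frame(
  Freight_weight = runif(200, min = 1, max = 50),
  origin_name = sample(LETTERS[1:15], size = 200, replace = TRUE)
)

top_countries <- df_new %>%
  filter(!is.na(origin_name)) %>%
  group_by(origin_name) %>%
  # one row per country with its total shipped weight
  summarise(total_weight = sum(Freight_weight, na.rm = TRUE), .groups = "drop") %>%
  # share of the overall total, computed before the top 10 are selected
  mutate(share = total_weight / sum(total_weight)) %>%
  # keep the 10 countries with the largest totals
  slice_max(order_by = total_weight, n = 10, with_ties = FALSE) %>%
  # reorder the factor levels so the bars are drawn in ascending order
  mutate(origin_name = fct_reorder(origin_name, total_weight))

p <- ggplot(top_countries, aes(x = origin_name, y = total_weight)) +
  geom_col() +
  labs(x = "", y = "Total freight weight") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

p
```

To plot shares instead of totals, map y = share in aes(). Note that share here is relative to all countries, not only the 10 shown, so the bars will not add up to 1.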
