Adding a column of totals using dplyr in a dataframe

How would you add a column to this dataset showing the number of individuals of each species?

install.packages("dplyr")  # the starwars dataset ships with dplyr
library(dplyr)
library(ggplot2)

starwars

So far this is what I have tried:

num <- starwars %>% group_by(species)
num

CodePudding user response：

If all you need is the number of individuals (records) in each group, you can use dplyr::count(). After counting, I arrange() the groups in decreasing (desc()) order of n.

library(dplyr)
starwars %>%
  group_by(species) %>%
  count() %>%
  arrange(desc(n))
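
As a possible shorthand for the pipeline above, count() can take the grouping column directly and sort the result itself. This is a minimal sketch; the name species_n is only an illustrative choice.

```r
library(dplyr)

# count(species, sort = TRUE) groups by species, counts the rows,
# and orders the result by decreasing n in a single call.
species_n <- starwars %>%
  count(species, sort = TRUE)

species_n
```

Unlike the group_by() version, the result here is an ungrouped tibble, which is usually more convenient for further steps.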

More generally, in place of count() you can use summarize() with n(), for instance if you need to do some other calculation with the counts or summarize other columns at the same time.

Here, with summarize(), I divide the count by the number of rows in the original dataset to get a proportion, stored in a column named foo:

library(dplyr)
starwars %>%
  group_by(species) %>%
  summarize(foo = n() / nrow(starwars)) %>%
  arrange(desc(foo))
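
If both the count and the proportion are wanted, summarize() can compute them in one call. The sketch below assumes the illustrative column names n_individuals and proportion, which are not from the original answer.

```r
library(dplyr)

species_summary <- starwars %>%
  group_by(species) %>%
  summarize(
    n_individuals = n(),                  # rows in each species group
    proportion = n() / nrow(starwars)     # share of the whole dataset
  ) %>%
  arrange(desc(n_individuals))

species_summary
```

Note that this still returns one row per species, so it is a summary table and not an extra column on the original data.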

CodePudding user response：

Use add_count

library(dplyr)

starwars %>% 
  add_count(species, name = "Total_species")

add_count() does the same as the following (note the mutate()), which keeps every row and adds the count as a new column:

starwars %>% 
  group_by(species) %>% 
  mutate(n = n())

In contrast, count() does the following (note the summarise()), which collapses the data to one row per species:

starwars %>% 
  group_by(species) %>% 
  summarise(n = n())

Output of the add_count() call above (first 10 rows):

   name     height  mass hair_color skin_color eye_color birth_year sex   gender homeworld species films vehicles starships Total_species
   <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr>  <chr>     <chr>   <lis> <list>   <list>            <int>
 1 Luke Sk~    172    77 blond      fair       blue            19   male  mascu~ Tatooine  Human   <chr> <chr>    <chr [2]>            35
 2 C-3PO       167    75 NA         gold       yellow         112   none  mascu~ Tatooine  Droid   <chr> <chr>    <chr [0]>             6
 3 R2-D2        96    32 NA         white, bl~ red             33   none  mascu~ Naboo     Droid   <chr> <chr>    <chr [0]>             6
 4 Darth V~    202   136 none       white      yellow          41.9 male  mascu~ Tatooine  Human   <chr> <chr>    <chr [1]>            35
 5 Leia Or~    150    49 brown      light      brown           19   fema~ femin~ Alderaan  Human   <chr> <chr>    <chr [0]>            35
 6 Owen La~    178   120 brown, gr~ light      blue            52   male  mascu~ Tatooine  Human   <chr> <chr>    <chr [0]>            35
 7 Beru Wh~    165    75 brown      light      blue            47   fema~ femin~ Tatooine  Human   <chr> <chr>    <chr [0]>            35
 8 R5-D4        97    32 NA         white, red red             NA   none  mascu~ Tatooine  Droid   <chr> <chr>    <chr [0]>             6
 9 Biggs D~    183    84 black      light      brown           24   male  mascu~ Tatooine  Human   <chr> <chr>    <chr [1]>            35
10 Obi-Wan~    182    77 auburn, w~ fair       blue-gray       57   male  mascu~ Stewjon   Human   <chr> <chr>    <chr [5]>            35
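
One way to see the difference between the two functions is to compare row counts. This is a small sketch; with_totals and per_species are illustrative names.

```r
library(dplyr)

with_totals <- starwars %>% add_count(species, name = "Total_species")
per_species <- starwars %>% count(species)

nrow(starwars)     # one row per character
nrow(with_totals)  # same number of rows, with one extra column
nrow(per_species)  # one row per distinct species (NA forms its own group)
```

Since the question asks for a new column on the existing data, add_count() is the variant that matches it directly.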

CodePudding user response：

First, load the dataset.

library(dplyr)
head(starwars)

Next, remove the rows that contain NA values. This step is optional; note that na.omit() drops every row with an NA in any column, so the counts below cover only the rows that remain. We also create a tibble containing the count of each species.

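# Drop rows with missing values (optional)
starwars_clean <- starwars %>% na.omit()

# One row per species; name the count column directly
# instead of copying n into a second column
species_counts <- starwars_clean %>%
  count(species, name = "species_count")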

Afterwards, we merge the two tibbles on the species column, much like a left join in a relational database. Setting all.x = TRUE makes merge() keep every row of x, which is what makes it a left join.

joined_tibble <- merge(
  x = starwars_clean,
  y = species_counts,
  by = "species",
  all.x = TRUE
)

head(joined_tibble)

Each row of the cleaned starwars data now has a species_count column.
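
The same join can also be sketched with dplyr::left_join() instead of base merge(). This is an alternative to the approach above, not part of the original answer; the name joined is illustrative.

```r
library(dplyr)

# Same preparation as above: drop incomplete rows, then count per species
starwars_clean <- starwars %>% na.omit()

species_counts <- starwars_clean %>%
  count(species, name = "species_count")

# left_join() keeps every row of starwars_clean and appends species_count
joined <- starwars_clean %>%
  left_join(species_counts, by = "species")

head(joined)
```

left_join() keeps the rows in their original order and returns a tibble, whereas merge() sorts the result by the key column by default.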