How would you add a column to this dataset showing the number of individuals of each species?
# install.packages("dplyr")  # run once if dplyr is not installed
library(dplyr)  # provides the starwars dataset, %>% and group_by()
library(ggplot2)
starwars
So far this is what I have tried:
num <- starwars %>% group_by(species)
num
CodePudding user response:
When all you need is the number of individuals (records) within each group, you can use dplyr::count(). After counting the groups, I arrange() them in decreasing (desc()) order of n.
library(dplyr)
starwars %>%
group_by(species) %>%
count() %>%
arrange(desc(n))
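As a shorter variant of the pipeline above, count() also accepts the grouping column directly and has a sort argument, so the explicit group_by() and arrange() steps can be folded into one call. A minimal sketch, assuming a reasonably recent dplyr; the name species_n is only illustrative:

```r
library(dplyr)

# count() groups by species internally; sort = TRUE orders by descending n
species_n <- starwars %>%
  count(species, sort = TRUE)

species_n
```

The result has one row per species and two columns, species and n.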
More generally, in place of count() you can use summarize() with n(), for instance if you need to do some other calculation with the counts or summarize other data elements.
Here, with summarize(), I divide each count by the number of rows in the original dataset to get a proportion, stored in a column named foo:
library(dplyr)
starwars %>%
group_by(species) %>%
summarize(foo = n() / nrow(starwars)) %>%
arrange(desc(foo))
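If both the raw count and the proportion are wanted, they can be computed in the same summarize() call. The sketch below is one way to do that; the column names prop and mean_height are made up for illustration, and the height average is only there to show that other summaries can sit alongside the count:

```r
library(dplyr)

species_summary <- starwars %>%
  group_by(species) %>%
  summarize(
    n = n(),                                  # individuals per species
    prop = n / nrow(starwars),                # share of all rows (uses n from the line above)
    mean_height = mean(height, na.rm = TRUE)  # illustrative extra summary
  ) %>%
  arrange(desc(n))

species_summary
```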
CodePudding user response:
Use add_count(), which adds the count as a new column while keeping every row:
library(dplyr)
starwars %>%
add_count(species, name = "Total_species")
add_count() does the same as the following (note the mutate()), except that this version leaves the result grouped by species:
starwars %>%
group_by(species) %>%
mutate(n = n())
In contrast, count() does the following (note the summarise()), collapsing the data to one row per species:
starwars %>%
group_by(species) %>%
summarise(n = n())
Output of the add_count() call above (first 10 rows):
name height mass hair_color skin_color eye_color birth_year sex gender homeworld species films vehicles starships Total_species
<chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <list> <list> <list> <int>
1 Luke Sk~ 172 77 blond fair blue 19 male mascu~ Tatooine Human <chr> <chr> <chr [2]> 35
2 C-3PO 167 75 NA gold yellow 112 none mascu~ Tatooine Droid <chr> <chr> <chr [0]> 6
3 R2-D2 96 32 NA white, bl~ red 33 none mascu~ Naboo Droid <chr> <chr> <chr [0]> 6
4 Darth V~ 202 136 none white yellow 41.9 male mascu~ Tatooine Human <chr> <chr> <chr [1]> 35
5 Leia Or~ 150 49 brown light brown 19 fema~ femin~ Alderaan Human <chr> <chr> <chr [0]> 35
6 Owen La~ 178 120 brown, gr~ light blue 52 male mascu~ Tatooine Human <chr> <chr> <chr [0]> 35
7 Beru Wh~ 165 75 brown light blue 47 fema~ femin~ Tatooine Human <chr> <chr> <chr [0]> 35
8 R5-D4 97 32 NA white, red red NA none mascu~ Tatooine Droid <chr> <chr> <chr [0]> 6
9 Biggs D~ 183 84 black light brown 24 male mascu~ Tatooine Human <chr> <chr> <chr [1]> 35
10 Obi-Wan~ 182 77 auburn, w~ fair blue-gray 57 male mascu~ Stewjon Human <chr> <chr> <chr [5]> 35
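To check that add_count() and the group_by() plus mutate() version produce the same counts, the two results can be compared directly. This is a small sketch; with_counts and via_mutate are illustrative names, and ungroup() is added so the second result is no longer grouped:

```r
library(dplyr)

# add_count(): one extra column, same rows, result is not grouped
with_counts <- starwars %>%
  add_count(species, name = "Total_species")

# group_by() + mutate(): same counts, but grouped until ungroup() is called
via_mutate <- starwars %>%
  group_by(species) %>%
  mutate(Total_species = n()) %>%
  ungroup()

# Both keep the original row order, so the columns can be compared element-wise
all.equal(with_counts$Total_species, via_mutate$Total_species)
```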
CodePudding user response:
Loading in the dataset.
library(dplyr)
head(starwars)
First, drop the rows that contain NAs. This step is optional, but I do it here. Note that na.omit() removes every row with an NA in any column, so the counts below cover only the complete rows. We then create a tibble containing the count for each species.
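# Keep only rows with no NA in any column
starwars_clean <- starwars %>% na.omit()
# One row per species; name the count column directly instead of copying n
species_counts <- starwars_clean %>%
count(species, name = "species_count")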
Afterwards, we merge the two tibbles on the species column (similar to a left join in a relational database). In merge(), a left join is requested with the all.x = TRUE argument.
joined_tibble <- merge(
x = starwars_clean,
y = species_counts,
by = "species",
all.x = TRUE
)
head(joined_tibble)
The result has a species_count column for each row of starwars_clean.
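The same join can also be written with dplyr's own left_join(), which returns a tibble rather than the plain data frame that merge() produces. A minimal sketch of that alternative, repeating the setup so it runs on its own:

```r
library(dplyr)

# Keep only complete rows (optional, as above)
starwars_clean <- starwars %>% na.omit()

# One row per species with its count
species_counts <- starwars_clean %>%
  count(species, name = "species_count")

# left_join() keeps every row of the left table and adds species_count
joined_tibble <- starwars_clean %>%
  left_join(species_counts, by = "species")

head(joined_tibble)
```

Since the goal is only to add a count column, add_count() from the earlier answer reaches the same result without an explicit join.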