I have a database like this:
Individual  Year  ID
A           1     R
A           1     S
A           1     T
A           2     T
B           1     T
B           5     T
C           7     S
D           9     K
D           8     H
E           1     S
There are thousands of individuals in the database.
Each individual is associated with zero, one, or several IDs per year (e.g. individual A has 3 different IDs for year 1, and individual D has only one ID for year 10 and no other information).
I am trying to study how ID "S" evolves over the years and plot a line graph in which:
the x axis shows the years
the y axis shows the percentage (number of S IDs / total number of IDs that year, across all individuals)
In this example my output should be:
Year Percentage of S
1 0,5
2 0
5 0
7 1
8 0
9 0
The value for year one is obtained by dividing 3 (total ID S in year 1) by 6 (total IDs registered for year 1).
Thanks
Answer:
You could use
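library(dplyr)

# df is assumed to hold the question's data (columns Individual, Year, ID)
df %>%
  group_by(Year) %>%                             # one group per year
  summarise(perc_of_s = sum(ID == "S") / n())    # S rows / all rows in that year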
This returns
# A tibble: 6 x 2
   Year perc_of_s
  <dbl>     <dbl>
1     1       0.4
2     2       0
3     5       0
4     7       1
5     8       0
6     9       0
Note that the sample data contains only five IDs in year 1, and two of them are S, so the share is 2/5 = 0.4 rather than the 3/6 = 0.5 given in the question.
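To also get the line graph the question asks for, the summary can be passed to ggplot2. The following is a minimal sketch, not a definitive solution: it rebuilds the question's sample data as a data frame, and the object names `df`, `share_s` and `p` as well as the axis labels are my own choices for illustration.

```r
library(dplyr)
library(ggplot2)

# Sample data copied from the question (assumed to be representative)
df <- data.frame(
  Individual = c("A", "A", "A", "A", "B", "B", "C", "D", "D", "E"),
  Year       = c(1, 1, 1, 2, 1, 5, 7, 9, 8, 1),
  ID         = c("R", "S", "T", "T", "T", "T", "S", "K", "H", "S")
)

# Share of rows with ID "S" within each year
share_s <- df %>%
  group_by(Year) %>%
  summarise(perc_of_s = sum(ID == "S") / n())

# Line graph: years on the x axis, share of S on the y axis
p <- ggplot(share_s, aes(x = Year, y = perc_of_s)) +
  geom_line() +
  geom_point() +
  labs(x = "Year", y = "Share of ID S")

print(p)
```

Years without any rows (3, 4 and 6 in the sample) do not appear in the summary, so the line simply connects the years that were observed.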