Percentage of occurrence of a specific string throughout time in R

Time:11-03

I have a database like this:

Individual      Year     ID
A                1       R
A                1       S
A                1       T
A                2       T
B                1       T
B                5       T
C                7       S
D                9       K
D                8       H
E                1       S

There are thousands of individuals in the database.

Each individual is associated with zero, one, or several IDs per year (e.g. individual A has 3 different IDs for year 1, while individual D has only one ID each for years 8 and 9 and no other information).

I am trying to study the evolution of ID "S" over the years and plot a line graph in which:
the x axis contains the years
the y axis shows the percentage (number of S IDs / total number of IDs recorded that year across all individuals)

In this example my output should be:

Year       Percentage of S
1           0,5
2           0
5           0
7           1
8           0
9           0

The value for year one is obtained by dividing 3 (total ID S in year 1) by 6 (total IDs registered for year 1).

Thanks

Answer:

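You could use dplyr (assuming your data frame is called df), grouping by year and dividing the number of "S" rows by the number of rows in each group: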

library(dplyr)

df %>% 
  group_by(Year) %>% 
  summarise(perc_of_s = sum(ID == "S") / n())

This returns

# A tibble: 6 x 2
   Year perc_of_s
  <dbl>     <dbl>
1     1       0.4
2     2       0  
3     5       0  
4     7       1  
5     8       0  
6     9       0  

In the sample data there are only five IDs in Year 1, and two of them are S, so the proportion is 2/5 = 0.4 rather than the 0.5 stated in the question.
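
If you would rather not depend on dplyr, here is a minimal base R sketch of the same calculation, followed by the line graph asked for in the question. The data frame is rebuilt from the sample rows in the question. The names prop_s and result are only illustrative, and the plot uses base graphics as one possible choice.

```r
# Sample data copied from the question (column names as shown there).
df <- data.frame(
  Individual = c("A", "A", "A", "A", "B", "B", "C", "D", "D", "E"),
  Year       = c(1, 1, 1, 2, 1, 5, 7, 9, 8, 1),
  ID         = c("R", "S", "T", "T", "T", "T", "S", "K", "H", "S")
)

# mean() of a logical vector is the share of TRUE values, so this gives
# the proportion of rows with ID "S" within each year.
prop_s <- tapply(df$ID == "S", df$Year, mean)

# Collect the result in a data frame, ordered by year.
result <- data.frame(
  Year      = as.numeric(names(prop_s)),
  perc_of_s = as.vector(prop_s)
)
print(result)

# Line graph: years on the x axis, proportion of "S" on the y axis.
plot(result$Year, result$perc_of_s, type = "b",
     xlab = "Year", ylab = "Proportion of ID S", ylim = c(0, 1))
```

Note that years with no rows at all (3, 4 and 6 in the sample) do not appear in the result, so the line simply connects the years that are present. Multiply perc_of_s by 100 if you want a true percentage on the y axis.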
