Home > Net >  How to loop through the dataset to group and summarise data?
How to loop through the dataset to group and summarise data?

Time:01-25

I have dataset with 5900 rows as follows: Merged_char column is a merged column of 4 different columns with socio-demo characteristics and sum column is the sum of how many observations with each set of characteristics belong to one of 4 groups.

Period Group Sum Merged_char
01.2021 1st 100 2_medium_North_55yo
01.2021 2nd 210 2_medium_North_55yo
01.2021 3rd 350 2_medium_North_55yo
01.2021 4th 170 2_medium_North_55yo
01.2021 1st 12 2_medium_North_20yo
01.2021 2nd 190 2_medium_North_20yo
01.2021 3rd 250 2_medium_North_20yo
01.2021 4th 266 2_medium_North_20yo
01.2021 1st 450 2_medium_South_55yo

I want to obtain a summary with information what percentage each group represents for each combination of socio-demo. So firstly we have to calculate the sum of col "Sum" for every 4 groups: 100 210 350 170 = 830 and then use the formula for percentages like: 100/830100 = 12% for 1st row , 210/830100= 25,3 % etc. For the next set of Merged_char the sum of the sum column will be different, as 12 190 250 266= 718. So the output should look like this:

Period Group Sum Merged_char Percentiles
01.2021 1st 100 2_medium_North_55yo 12,0
01.2021 2nd 210 2_medium_North_55yo 25,3
01.2021 3rd 350 2_medium_North_55yo 42,2
01.2021 4th 170 2_medium_North_55yo 20,4
01.2021 1st 12 2_medium_North_20yo 1,7
01.2021 2nd 190 2_medium_North_20yo 26,4
01.2021 3rd 250 2_medium_North_20yo 37,0
01.2021 4th 266 2_medium_North_20yo 37,5
01.2021 1st 450 2_medium_South_55yo ...

To do that I tried to filter through group list, which have only 4 elements: 1st, 2nd, 3rd, 4th. Problem is with Merged_char column, because in my dataset I have 144 different combinations of it and the making the list with 144 elements is not the best way.

Thank you for helping

CodePudding user response:

library(dplyr)

df%>%
  group_by(Merged_char)%>%
  mutate(Percentile=Sum/sum(Sum,na.rm=T)*100)%>%
  ungroup


  Period Group   Sum Merged_char         Percentile
   <dbl> <fct> <int> <fct>                    <dbl>
1   1.20 1st     100 2_medium_North_55yo      12.0 
2   1.20 2nd     210 2_medium_North_55yo      25.3 
3   1.20 3rd     350 2_medium_North_55yo      42.2 
4   1.20 4th     170 2_medium_North_55yo      20.5 
5   1.20 1st      12 2_medium_North_20yo       1.67
6   1.20 2nd     190 2_medium_North_20yo      26.5 
7   1.20 3rd     250 2_medium_North_20yo      34.8 
8   1.20 4th     266 2_medium_North_20yo      37.0 
9   1.20 1st     450 2_medium_South_55yo     100 
  • Related