Home > database >  How to add values of multiple rows during/after "groupby"?
How to add values of multiple rows during/after "groupby"?

Time:01-06

I have a assignment dataset which consists of students' grades sorted by academic groups for various modules. In my assignment, I used the following code to acquire an output which I later hardcoded it into a table:

math_comparison_size = data.groupby(["Academic Group","Math.SemGrade"]).size()

enter image description here

Is there a way where I can combine/merge certain rows together to increase their values?

Such as:

A (combination of A and A values for each academic group E.g. for Group A it will be 28 instead of 7 and 21)

so on and forth...

CodePudding user response:

Update

>>> df.unstack(level=0).groupby(df.index.levels[1].map(mapping)).sum()

Academic Group  Grp A  Grp B  Grp C  Grp D  Grp E
Math.SemGrade                                    
A                  43     49     93     82     39
B                  24     69     65     59     57
C                  20      8      5     23     13

Input data:

Academic Group  Math.SemGrade
Grp A           A                13
                A                17
                AD                6
                B                 1
                B                 6
                C                 9
                C                 0
                D                 9
                D                 6
                F                20
Grp B           A                 6
                A                 8
                AD               19
                B                 6
                B                10
                C                24
                C                 5
                D                11
                D                29
                F                 8
Grp C           A                22
                A                27
                AD               16
                B                14
                B                14
                C                26
                C                14
                D                 0
                D                25
                F                 5
Grp D           A                29
                A                23
                AD                2
                B                11
                B                17
                C                 1
                C                27
                D                 3
                D                28
                F                23
Grp E           A                 2
                A                 9
                AD                4
                B                 9
                B                15
                C                18
                C                10
                D                 5
                D                24
                F                13
dtype: int64

Old answer

If you want to group by the first letter (A -> A, A -> A, ...), you can use:

>>> (df.groupby(df['Maths Semester Grades'].str[0])
       .sum(numeric_only=True).reset_index())

  Maths Semester Grades  Grp A  Grp B  Grp C  Grp D  Grp E
0                     A     36     33     65     54     15
1                     B      7     16     28     28     24
2                     C      9     29     40     28     28
3                     D     15     40     25     31     29
4                     F     20      8      5     23     13

If you want to control the groups, use a mapping dict:

mapping = {'A': 'A', 'A ': 'A', 'AD': 'A', 'B': 'A', 'B ': 'A',
           'C': 'B', 'C ': 'B', 'D': 'B', 'D ': 'B',
           'E': 'C', 'F': 'C'}

>>> (df.groupby(df['Maths Semester Grades'].map(mapping))
       .sum(numeric_only=True).reset_index()

  Maths Semester Grades  Grp A  Grp B  Grp C  Grp D  Grp E
0                     A     43     49     93     82     39
1                     B     24     69     65     59     57
2                     C     20      8      5     23     13

Input dataframe:

>>> df
  Maths Semester Grades  Grp A  Grp B  Grp C  Grp D  Grp E
0                     A     13      6     22     29      2
1                    A      17      8     27     23      9
2                    AD      6     19     16      2      4
3                     B      1      6     14     11      9
4                    B       6     10     14     17     15
5                     C      9     24     26      1     18
6                    C       0      5     14     27     10
7                     D      9     11      0      3      5
8                    D       6     29     25     28     24
9                     F     20      8      5     23     13

CodePudding user response:

Use DataFrame.groupby().sum() to group rows based on one or multiple columns and calculate sum agg function. groupby() function returns a DataFrameGroupBy object which contains an aggregate function sum() to calculate a sum of a given column for each group.

In this article, I will explain how to use groupby() and sum() functions together with examples. group by & sum on single & multiple columns is accomplished by multiple ways in pandas, some among them are groupby(), pivot(), transform(), and aggregate() functions. enter link description here

  • Related