Home > Blockchain >  Calculating total flags against each ID in a dataframe
Calculating total flags against each ID in a dataframe

Time:09-19

Suppose I have a dataframe(grouped by ID-A and ID-B) like this-

ID-A  ID-B    time     sum   flag
A       1   09:30:00    5     Y
            09:33:00    8     N
            09:36:00    5     Y
        2   09:36:00    10    Y
            09:39:00    15    Y
            09:42:00    2     Y
B       1   09:30:00    10    Y
            09:33:00    12    N
            09:36:00    5     Y

I want to calculate the total 'Y' flags against each ID-A and ID-B.

Desired Dataframe-

ID-A  ID-B    time     sum   flag(Total)
A       1   09:30:00    5     2
            09:33:00    8     
            09:36:00    5     
        2   09:36:00    10    3
            09:39:00    15    
            09:42:00    2     
B       1   09:30:00    10    2
            09:33:00    12    
            09:36:00    5     

CodePudding user response:

You can groupby the two index levels, count the Ys per group and assign to the original df.

Prepare data:

import pandas as pd

data = {'time': ['09:30:00',
                 '09:33:00',
                 '09:36:00',
                 '09:36:00',
                 '09:39:00',
                 '09:42:00',
                 '09:30:00',
                 '09:33:00',
                 '09:36:00'],
         'sum': [5, 8, 5, 10, 15, 2, 10, 12, 5],
         'flag': ['Y', 'N', 'Y', 'Y', 'Y', 'Y', 'Y', 'N', 'Y']}

my_index = pd.MultiIndex.from_arrays([["A"]*6   ["B"]*3, [1, 1, 1, 2, 2, 2, 1, 1, 1]], names=["ID-A", "ID-B"])
df = pd.DataFrame(data, index=my_index)

Result:

               time  sum flag
ID-A ID-B                    
A    1     09:30:00    5    Y
     1     09:33:00    8    N
     1     09:36:00    5    Y
     2     09:36:00   10    Y
     2     09:39:00   15    Y
     2     09:42:00    2    Y
B    1     09:30:00   10    Y
     1     09:33:00   12    N
     1     09:36:00    5    Y

The rest:

>>> df.assign(flag_total=df.flag.eq("Y").groupby(level=[0, 1]).sum())
               time  sum flag  flag_total
ID-A ID-B                                
A    1     09:30:00    5    Y           2
     1     09:33:00    8    N           2
     1     09:36:00    5    Y           2
     2     09:36:00   10    Y           3
     2     09:39:00   15    Y           3
     2     09:42:00    2    Y           3
B    1     09:30:00   10    Y           2
     1     09:33:00   12    N           2
     1     09:36:00    5    Y           2
  • Related