Home > Back-end >  Pandas grouping with filtering on other columns
Pandas grouping with filtering on other columns

Time:03-10

I have the following dataframe in Pandas:

name value in out
A 50 1 0
A -20 0 1
B 150 1 0
C 10 1 0
D 500 1 0
D -250 0 1
E 800 1 0

There are maximally only 2 observations for each name: one for in and one for out. If there is only in for a name there is only one observation for it.

You can create this dataset with this code:

data = {
        'name': ['A','A','B','C','D','D','E'],
        'values': [50,-20,150,10,500,-250,800],
        'in': [1,0,1,1,1,0,1],
        'out': [0,1,0,0,0,1,0]
       }

df = pd.DataFrame.from_dict(data)

I want to sum the value column for each name but only if name has both in and out record. In other words, only when one unique name has exactly 2 rows.

The result should look like this:

name value
A 30
D 250

If I run the following code I got all the results without filtering based on in and out.

df.groupby('name').sum()

name value
A 30
B 150
C 10
D 250
E 800

How to add the beforementioned filtering based on columns?

CodePudding user response:

IIUC, you could filter before aggregation:

# check that we have exactly 1 in and 1 out per group
mask = df.groupby('name')[['in', 'out']].transform('sum').eq([1,1]).all(1)
# slice the correct groups and aggregate
out = df[mask].groupby('name', as_index=False)['values'].sum()

Or, you could filter afterwards (maybe less efficient if you have a lot of groups that would be filtered out):

(df.groupby('name', as_index=False).sum()
   .loc[lambda d: d['in'].eq(1) & d['out'].eq(1), ['name', 'values']]
)

output:

  name  values
0    A      30
1    D     250

CodePudding user response:

Maybe you can try something with groupby, agg, and query (like below):

df.groupby('name').agg({'name':'count', 'values': 'sum'}).query('name>1')[['values']]

Output:

        values
name        
A         30
D         250

You could also make .query('name==2') in above if you like but assuming it can occur max at 2 .query('name>1') would also return same.

  • Related