My first column is 'year', my second column is 'bank', and my last column is a value ('amt' below). I want to get the sum of that value for year=2003 for each 'bank' and display it in a new column, i.e. if my starting dataframe can be represented by the following code:
import pandas as pd

df = pd.DataFrame({'year': [2001, 2002, 2003, 2001, 2002, 2003, 2001, 2002, 2003, 2001, 2002, 2003],
                   'bank': ['sbi', 'sbi', 'sbi', 'sbi', 'sbi', 'sbi', 'icici', 'icici', 'icici', 'icici', 'icici', 'icici'],
                   'amt': [1000, 2000, 3000, 4000, 5000, 6000, 1, 2, 3, 4, 5, 6]
                   })
Then the final output can be represented by the following code:
df = pd.DataFrame({'year': [2001, 2002, 2003, 2001, 2002, 2003, 2001, 2002, 2003, 2001, 2002, 2003],
                   'bank': ['sbi', 'sbi', 'sbi', 'sbi', 'sbi', 'sbi', 'icici', 'icici', 'icici', 'icici', 'icici', 'icici'],
                   'amt': [1000, 2000, 3000, 4000, 5000, 6000, 1, 2, 3, 4, 5, 6],
                   'amt_2003': [9000, 9000, 9000, 9000, 9000, 9000, 9, 9, 9, 9, 9, 9]
                   })
i.e. for 'sbi', the total 'amt' in year 2003 is 3000 + 6000 = 9000, which is displayed against all rows for 'sbi'. Similarly, I get 3 + 6 = 9 against all rows of 'icici'.
I am unable to use the conditional sum after using a groupby('bank') statement.
CodePudding user response:
You might want to perform a groupby.sum, then unstack and merge:
# sum per (year, bank), then pivot the years into columns
df2 = df.groupby(['year', 'bank']).sum().unstack('year')
# flatten the MultiIndex columns into names like 'amt_2003'
df2.columns = df2.columns.map(lambda x: f'{x[0]}_{x[1]}')
# merge the per-bank totals back onto the original rows (on 'bank')
out = df.merge(df2.reset_index())
Output:
year bank amt amt_2001 amt_2002 amt_2003
0 2001 sbi 1000 5000 7000 9000
1 2002 sbi 2000 5000 7000 9000
2 2003 sbi 3000 5000 7000 9000
3 2001 sbi 4000 5000 7000 9000
4 2002 sbi 5000 5000 7000 9000
5 2003 sbi 6000 5000 7000 9000
6 2001 icici 1 5 7 9
7 2002 icici 2 5 7 9
8 2003 icici 3 5 7 9
9 2001 icici 4 5 7 9
10 2002 icici 5 5 7 9
11 2003 icici 6 5 7 9
NB. this works for more than just "amt" if you have several input columns (see the sketch after the intermediate table below).
Intermediate df2:
amt_2001 amt_2002 amt_2003
bank
icici 5 7 9
sbi 5000 7000 9000
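As noted above, the same pipeline handles several value columns at once; here is a minimal sketch, assuming a hypothetical extra column 'cnt' next to 'amt' (not part of the original example):
df_multi = pd.DataFrame({'year': [2001, 2003, 2001, 2003],
                         'bank': ['sbi', 'sbi', 'icici', 'icici'],
                         'amt': [1000, 3000, 1, 3],
                         'cnt': [10, 30, 5, 7]})

df2 = df_multi.groupby(['year', 'bank']).sum().unstack('year')
# columns are flattened to amt_2001, amt_2003, cnt_2001, cnt_2003
df2.columns = df2.columns.map(lambda x: f'{x[0]}_{x[1]}')
out = df_multi.merge(df2.reset_index())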
To limit the years to keep, slice with loc:
keep = [2003]
df2 = df.groupby(['year', 'bank']).sum().loc[keep].unstack('year')
df2.columns = df2.columns.map(lambda x: f'{x[0]}_{x[1]}')
out = df.merge(df2.reset_index())
Output:
year bank amt amt_2003
0 2001 sbi 1000 9000
1 2002 sbi 2000 9000
2 2003 sbi 3000 9000
3 2001 sbi 4000 9000
4 2002 sbi 5000 9000
5 2003 sbi 6000 9000
6 2001 icici 1 9
7 2002 icici 2 9
8 2003 icici 3 9
9 2001 icici 4 9
10 2002 icici 5 9
11 2003 icici 6 9
CodePudding user response:
Use Series.map with the sum of column amt aggregated per bank over the rows filtered to year 2003:
df['amt_2003'] = df['bank'].map(df[df['year'].eq(2003)].groupby('bank')['amt'].sum())
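For reference, the intermediate Series being mapped (computed from the example data above) gives each bank's 2003 total:
print(df[df['year'].eq(2003)].groupby('bank')['amt'].sum())
bank
icici       9
sbi      9000
Name: amt, dtype: int64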
Or use Series.where to replace non-2003 values with 0, and build the new column with GroupBy.transform:
df['amt_2003'] = df['amt'].where(df['year'].eq(2003), 0).groupby(df['bank']).transform('sum')
print(df)
year bank amt amt_2003
0 2001 sbi 1000 9000
1 2002 sbi 2000 9000
2 2003 sbi 3000 9000
3 2001 sbi 4000 9000
4 2002 sbi 5000 9000
5 2003 sbi 6000 9000
6 2001 icici 1 9
7 2002 icici 2 9
8 2003 icici 3 9
9 2001 icici 4 9
10 2002 icici 5 9
11 2003 icici 6 9
For all columns use DataFrame.pivot_table with DataFrame.add_prefix and DataFrame.join:
df1 = (df.join(df.pivot_table(index='bank', columns='year', values='amt', aggfunc='sum')
                 .add_prefix('amt_'), on='bank'))
print(df1)
year bank amt amt_2001 amt_2002 amt_2003
0 2001 sbi 1000 5000 7000 9000
1 2002 sbi 2000 5000 7000 9000
2 2003 sbi 3000 5000 7000 9000
3 2001 sbi 4000 5000 7000 9000
4 2002 sbi 5000 5000 7000 9000
5 2003 sbi 6000 5000 7000 9000
6 2001 icici 1 5 7 9
7 2002 icici 2 5 7 9
8 2003 icici 3 5 7 9
9 2001 icici 4 5 7 9
10 2002 icici 5 5 7 9
11 2003 icici 6 5 7 9
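If only the 2003 totals are needed with this approach, a minimal sketch (assuming the same df as above) filters the rows before pivoting:
df1 = (df.join(df[df['year'].eq(2003)]
                 .pivot_table(index='bank', columns='year', values='amt', aggfunc='sum')
                 .add_prefix('amt_'), on='bank'))
# df1 then only gains the 'amt_2003' column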