I have the dataframe as follow:
dataframe generator:
df = pd.DataFrame({
'year':[2000,2001,2002]*3,
'id':['a']*3 ['b']*3 ['c']*3,
'othernulcol': ['xyz']*3 [np.nan]*4 ['tyu']*2,
'val':[np.nan,2,3,4,5,6,7,8,9]
})
data looks like:
year id othernulcol val
0 2000 a xyz NaN
1 2001 a xyz 2.0
2 2002 a xyz 3.0
3 2000 b NaN 4.0
4 2001 b NaN 5.0
5 2002 b NaN 6.0
6 2000 c NaN 7.0
7 2001 c tyu 8.0
8 2002 c tyu 9.0
I want to create new 3 rows from 2000 to 2002 that is the sum
of row with id = a
and b
in the same year. othernulcol
is just other column in dataframe. When creating new rows, just set those cols as np.NaN
Expected output:
year id othernulcol val
0 2000 a xyz NaN
1 2001 a xyz 2.0
2 2002 a xyz 3.0
3 2000 b NaN 4.0
4 2001 b NaN 5.0
5 2002 b NaN 6.0
6 2000 c NaN 7.0
7 2001 c tyu 8.0
8 2002 c tyu 9.0
9 2000 ab NaN NaN
10 2001 ab NaN 10.0
11 2002 ab NaN 12.0
Thank you for reading
CodePudding user response:
Filter values by categories and convert year
to index for align same years from another DataFrame, sum values by DataFrame.add
and append to original DataFrame by concat
:
cols = ['id','val']
df1 = df[df['id'].eq('a')].set_index('year')[cols]
df2 = df[df['id'].eq('b')].set_index('year')[cols]
df = pd.concat([df, df1.add(df2).reset_index()], ignore_index=True)
print (df)
year id othernulcol val
0 2000 a xyz NaN
1 2001 a xyz 2.0
2 2002 a xyz 3.0
3 2000 b NaN 4.0
4 2001 b NaN 5.0
5 2002 b NaN 6.0
6 2000 c NaN 7.0
7 2001 c tyu 8.0
8 2002 c tyu 9.0
9 2000 ab NaN NaN
10 2001 ab NaN 7.0
11 2002 ab NaN 9.0
CodePudding user response:
Another solution could be as follows:
- Select rows from
df
withdf.id.isin(['a','b']
(seeSeries.isin
) and applydf.groupby
toyear
. - For the aggregration, use
sum
for columnid
. For columnval
use a lambda function to applySeries.sum
, which allowsskipna=False
. - Finally, use
pd.concat
to add the result to the originaldf
with ignoring the index.
out = pd.concat([df,df[df.id.isin(['a','b'])]\
.groupby('year', as_index=False)\
.agg({'id':'sum',
'val':lambda x: x.sum(skipna=False)})],
ignore_index=True)
print(out)
year id othernulcol val
0 2000 a xyz NaN
1 2001 a xyz 2.0
2 2002 a xyz 3.0
3 2000 b NaN 4.0
4 2001 b NaN 5.0
5 2002 b NaN 6.0
6 2000 c NaN 7.0
7 2001 c tyu 8.0
8 2002 c tyu 9.0
9 2000 ab NaN NaN
10 2001 ab NaN 7.0
11 2002 ab NaN 9.0