import pandas as pd
d = {'a': [0,1,0,1,1,1], 'b': [1,1,1,1,1,1], 'c': [0,0,0,0,0,1], 'd': [1,0,1,1,1,0]}
dct = {'a':'top','b':'bot','c':'top','d':'bot'}
df = pd.DataFrame(d)
So I have a df and a dictionary that assigns each column to a group. I'm having trouble writing a function that creates a total (sum) column for each group. This would be the output for the first 2 rows:
a | b | c | d | total_top | total_bot |
---|---|---|---|---|---|
0 | 1 | 0 | 1 | 0 | 2 |
1 | 1 | 0 | 0 | 1 | 1 |
And so on. Ideally, the function would take the DataFrame and the name of the group as inputs.
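A minimal sketch of the function shape the question describes, taking the DataFrame, the column-to-group mapping and one group name. `add_group_total` is a hypothetical helper name, and the sketch assumes the mapping is a plain dict like `dct`:

```python
import pandas as pd


def add_group_total(df: pd.DataFrame, mapping: dict, group: str) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with a total_<group> column."""
    # Columns that the mapping assigns to the requested group.
    cols = [col for col in df.columns if mapping.get(col) == group]
    out = df.copy()
    # Row-wise sum over only those columns.
    out[f"total_{group}"] = df[cols].sum(axis=1)
    return out


d = {'a': [0, 1, 0, 1, 1, 1], 'b': [1, 1, 1, 1, 1, 1],
     'c': [0, 0, 0, 0, 0, 1], 'd': [1, 0, 1, 1, 1, 0]}
dct = {'a': 'top', 'b': 'bot', 'c': 'top', 'd': 'bot'}
df = pd.DataFrame(d)

# Call once per group to get both total columns, in the question's order.
result = add_group_total(add_group_total(df, dct, "top"), dct, "bot")
print(result.head(2))
```

The first two rows of `result` match the table above, and `df` itself is left unchanged because the helper works on a copy.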
Answer:
Try the following:
sums = df.rename(columns=dct).stack().groupby(level=(0,1)).sum().unstack()
df_new = df.join(sums)
df_new
a b c d bot top
0 0 1 0 1 2 0
1 1 1 0 0 1 1
2 0 1 0 1 2 0
3 1 1 0 1 2 1
4 1 1 0 1 2 1
5 1 1 1 0 1 2
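Because `df.rename(columns=dct)` leaves several columns with the same label, selecting a label with a list returns all of those columns at once. A sketch of a shortcut built on that (an alternative to this answer's stack/groupby/unstack chain, not part of it); `renamed` and `totals` are illustrative names:

```python
import pandas as pd

d = {'a': [0, 1, 0, 1, 1, 1], 'b': [1, 1, 1, 1, 1, 1],
     'c': [0, 0, 0, 0, 0, 1], 'd': [1, 0, 1, 1, 1, 0]}
dct = {'a': 'top', 'b': 'bot', 'c': 'top', 'd': 'bot'}
df = pd.DataFrame(d)

renamed = df.rename(columns=dct)
# rename keeps all four columns, so the group labels are now duplicated.
print(renamed.columns.tolist())  # ['top', 'bot', 'top', 'bot']

# renamed[[group]] (a list, not a bare label) always returns a DataFrame with
# every column carrying that label, even if a group has only one column.
totals = pd.DataFrame(
    {f"total_{group}": renamed[[group]].sum(axis=1) for group in sorted(set(dct.values()))}
)
df_new = df.join(totals)
print(df_new)
```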
Explanation:
- `df.rename(columns=dct)` renames the columns to `top` and `bot` based on the dictionary `dct`.
df.rename(columns=dct)
top bot top bot
0 0 1 0 1
1 1 1 0 0
2 0 1 0 1
3 1 1 0 1
4 1 1 0 1
5 1 1 1 0
- Then `stack()` moves the column labels into the row index, so you get a single column (a Series) of 1s and 0s with a two-level index: the original row number and the group name.
df.rename(columns=dct).stack()
0 top 0
bot 1
top 0
bot 1
1 top 1
bot 1
top 0
bot 0
2 top 0
bot 1
top 0
bot 1
3 top 1
bot 1
top 0
bot 1
4 top 1
bot 1
top 0
bot 1
5 top 1
bot 1
top 1
bot 0
dtype: int64
- `groupby` with `sum()` groups on level 0 (the row index) and level 1 (the `top`/`bot` labels) and adds up the 1s in each group.
df.rename(columns=dct).stack().groupby(level=(0,1)).sum()
0 bot 2
top 0
1 bot 1
top 1
2 bot 2
top 0
3 bot 2
top 1
4 bot 2
top 1
5 bot 1
top 2
dtype: int64
- `unstack()` turns the group level back into columns, giving the two columns `top` and `bot`.
df.rename(columns=dct).stack().groupby(level=(0,1)).sum().unstack()
bot top
0 2 0
1 1 1
2 2 0
3 2 1
4 2 1
5 1 2
- Last, join the original `df` with this new `sums` DataFrame on the index.
df.join(sums)
a b c d bot top
0 0 1 0 1 2 0
1 1 1 0 0 1 1
2 0 1 0 1 2 0
3 1 1 0 1 2 1
4 1 1 0 1 2 1
5 1 1 1 0 1 2
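The stack, groupby and unstack steps together amount to "sum the columns that share a label". A sketch of the same computation using a transpose instead of `stack` (an alternative, not the answer's code), plus `add_prefix` so the new columns get the `total_` names the question asked for:

```python
import pandas as pd

d = {'a': [0, 1, 0, 1, 1, 1], 'b': [1, 1, 1, 1, 1, 1],
     'c': [0, 0, 0, 0, 0, 1], 'd': [1, 0, 1, 1, 1, 0]}
dct = {'a': 'top', 'b': 'bot', 'c': 'top', 'd': 'bot'}
df = pd.DataFrame(d)

# After renaming, transposing turns the duplicated group labels into the row
# index; grouping on that single level sums the columns of each group.
sums = df.rename(columns=dct).T.groupby(level=0).sum().T

# Same bot/top frame as the answer's `sums`; prefix it to get the
# total_bot / total_top names from the question, then join on the index.
df_new = df.join(sums.add_prefix("total_"))
print(df_new)
```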
Answer:
- `map` the column names to groups
- `groupby` the resulting values along `axis=1` and `sum`
- `add_prefix` to the columns for naming
- `join` to `df`
>>> df.join(df.groupby(df.columns.map(dct), axis=1).sum().add_prefix("total_"))
a b c d total_bot total_top
0 0 1 0 1 2 0
1 1 1 0 0 1 1
2 0 1 0 1 2 0
3 1 1 0 1 2 1
4 1 1 0 1 2 1
5 1 1 1 0 1 2
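`groupby(..., axis=1)` was deprecated in pandas 2.1, with transposing and grouping the rows suggested instead. A sketch of the same one-liner in that form, assuming `df` and `dct` as defined in the question:

```python
import pandas as pd

d = {'a': [0, 1, 0, 1, 1, 1], 'b': [1, 1, 1, 1, 1, 1],
     'c': [0, 0, 0, 0, 0, 1], 'd': [1, 0, 1, 1, 1, 0]}
dct = {'a': 'top', 'b': 'bot', 'c': 'top', 'd': 'bot'}
df = pd.DataFrame(d)

# df.columns.map(dct) gives the group name of each original column, in order;
# it lines up with the rows of df.T, so grouping those rows sums each group.
totals = df.T.groupby(df.columns.map(dct)).sum().T

# Transposed back, the groups are columns again; prefix and join as before.
df_new = df.join(totals.add_prefix("total_"))
print(df_new)
```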
Answer:
Group the columns by `dct` and concatenate the totals back onto `df`:
grouped = df.groupby(dct, axis=1).sum().add_prefix('total_')
pd.concat([df, grouped], axis=1)
a b c d total_bot total_top
0 0 1 0 1 2 0
1 1 1 0 0 1 1
2 0 1 0 1 2 0
3 1 1 0 1 2 1
4 1 1 0 1 2 1
5 1 1 1 0 1 2
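A different technique from the three answers, sketched under the assumption that every column of `df` has an entry in `dct`: encode the mapping as a 0/1 membership matrix (one row per original column, one column per group) with `pd.get_dummies`, then a matrix product sums each group's columns. `membership` is an illustrative name:

```python
import pandas as pd

d = {'a': [0, 1, 0, 1, 1, 1], 'b': [1, 1, 1, 1, 1, 1],
     'c': [0, 0, 0, 0, 0, 1], 'd': [1, 0, 1, 1, 1, 0]}
dct = {'a': 'top', 'b': 'bot', 'c': 'top', 'd': 'bot'}
df = pd.DataFrame(d)

# Index a..d, columns bot/top; membership.loc[col, grp] == 1 when dct[col] == grp.
membership = pd.get_dummies(pd.Series(dct)).astype(int)

# df.dot aligns df's columns with membership's index, so each output column is
# the sum of the df columns that belong to that group.
totals = df.dot(membership).add_prefix("total_")
df_new = df.join(totals)
print(df_new)
```

`get_dummies` sorts the group names, so the totals come out as `total_bot`, `total_top`, the same order as in the answers above.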