I have this df:
pd.DataFrame([
[50000,'01',991,5000],
[50000,'01',992,5000],
[50000,'01','tot',10000],
[50000,'02',777,2500],
[50000,'02','tot',2500],
[66666,'01',777,7000],
[66666,'01',788,8000],
[66666,'01','tot',15000]],
columns=['ID','group_code','section_code','budget'])
representing budgets available divided into groups, where the lines with total in the section_code
are the sum of the whole budgets for the specific group_code
.
I was trying to produce this df:
pd.DataFrame([
[50000,'01',991,5000, 10000],
[50000,'01',992,5000, 10000],
[50000,'01','tot',10000, np.NaN],
[50000,'02',777,2500, 2500],
[50000,'02','tot',2500, np.NaN],
[66666,'01',777,7000, 15000],
[66666,'01',788,8000, 15000],
[66666,'01','tot',15000, np.NaN]],
columns=['ID','group_code','section_code','budget','total available'])
Which is the same as the first one but with the sum of the whole group_code
next to each item in a new total available
column.
I tried accessing the budget
column at every line with the string 'total'
in the section_code
, but I couldn't manage to assign it into a new column using:
df.loc[(df['section_code']== 'tot') & (df['group_code']== YYY), 'budget']
Where YYY is a specific value ( I want it to be independent and the only solutions come to my mind is iterating through the whole rows)
I'm new to pandas so please pardon me if I don't get the things right.
Can anyone please give me a direction for the right solution?
CodePudding user response:
You could use a different approach: mask
the rows with "tot" and apply a groupby
transform('sum')
.
df['total_available'] = (df.mask(df['section_code'].eq('tot'))
.groupby(['ID', 'group_code'])
['budget'].transform('sum')
)
output:
ID group_code section_code budget total_available
0 50000 01 991 5000 10000.0
1 50000 01 992 5000 10000.0
2 50000 01 tot 10000 NaN
3 50000 02 777 2500 2500.0
4 50000 02 tot 2500 NaN
5 66666 01 777 7000 15000.0
6 66666 01 788 8000 15000.0
7 66666 01 tot 15000 NaN