Home > OS >  Assign a new column to DataFrame according to a value in another row
Assign a new column to DataFrame according to a value in another row

Time:10-13

I have this df:

pd.DataFrame([
    [50000,'01',991,5000],
    [50000,'01',992,5000],
    [50000,'01','tot',10000],
    [50000,'02',777,2500],
    [50000,'02','tot',2500],
    [66666,'01',777,7000],
    [66666,'01',788,8000],
    [66666,'01','tot',15000]],
    columns=['ID','group_code','section_code','budget'])

representing budgets available divided into groups, where the lines with total in the section_code are the sum of the whole budgets for the specific group_code.

I was trying to produce this df:

pd.DataFrame([
    [50000,'01',991,5000, 10000],
    [50000,'01',992,5000, 10000],
    [50000,'01','tot',10000, np.NaN],
    [50000,'02',777,2500, 2500],
    [50000,'02','tot',2500, np.NaN],
    [66666,'01',777,7000, 15000],
    [66666,'01',788,8000, 15000],
    [66666,'01','tot',15000, np.NaN]],
    columns=['ID','group_code','section_code','budget','total available'])

Which is the same as the first one but with the sum of the whole group_code next to each item in a new total available column.

I tried accessing the budget column at every line with the string 'total' in the section_code, but I couldn't manage to assign it into a new column using:

df.loc[(df['section_code']== 'tot') & (df['group_code']== YYY), 'budget']

Where YYY is a specific value ( I want it to be independent and the only solutions come to my mind is iterating through the whole rows)

I'm new to pandas so please pardon me if I don't get the things right.

Can anyone please give me a direction for the right solution?

CodePudding user response:

You could use a different approach: mask the rows with "tot" and apply a groupby transform('sum').

df['total_available'] = (df.mask(df['section_code'].eq('tot'))
                           .groupby(['ID', 'group_code'])
                           ['budget'].transform('sum')
                        )

output:

      ID group_code section_code  budget  total_available
0  50000         01          991    5000          10000.0
1  50000         01          992    5000          10000.0
2  50000         01          tot   10000              NaN
3  50000         02          777    2500           2500.0
4  50000         02          tot    2500              NaN
5  66666         01          777    7000          15000.0
6  66666         01          788    8000          15000.0
7  66666         01          tot   15000              NaN
  • Related