Create a pandas column based on grouping

Time:12-02

I have a data frame like this:

Group Col A Col B
  1    A1    B1
  1    A2    B2
  2    A3    B3
  2    A4    B4

I want to create new columns Per A and Per B that respect the groups, so the outcome should be

Group Col A Col B     Per A             Per B
  1    A1    B1   100*A1/(A1+A2)  100*B1/(B1+B2)
  1    A2    B2   100*A2/(A1+A2)  100*B2/(B1+B2)
  2    A3    B3   100*A3/(A3+A4)  100*B3/(B3+B4)
  2    A4    B4   100*A4/(A3+A4)  100*B4/(B3+B4)

I need a general solution that works with several groups of different sizes.

I tried using a for loop. I can compute the new columns, but I cannot assign them to the data frame, and I don't understand what exactly prevents it.

For example, this is the result I'd be looking for (note that I changed the Group column to reflect "different group sizes"):

Group Col A Col B Per A Per B
  1     1     2   100.0 100.0
  2     1     2   16.67 25.00
  2     2     2   33.33 25.00
  2     3     4   50.00 50.00

CodePudding user response:

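import pandas as pd

df = pd.DataFrame({
    'Group': [1, 1, 2, 2, 3, 3, 3],
    'ColA': [1, 2, 3, 4, 5, 6, 7],
    'ColB': [10, 22, 30, 40, 50, 60, 70],
})

# Per-group column sums, merged back onto every row of that group.
# Overlapping names get the default suffixes: _x (row value), _y (group sum).
sums = df.groupby('Group')[['ColA', 'ColB']].sum().reset_index()
df = df.merge(sums, on='Group')
df['PerA'] = df['ColA_x'] * 100 / df['ColA_y']
df['PerB'] = df['ColB_x'] * 100 / df['ColB_y']

# Restore the original column names and drop the helper sum columns.
df = df.rename(
    columns={'ColA_x': 'ColA', 'ColB_x': 'ColB'}).drop(
        columns=['ColA_y', 'ColB_y'])

print(df)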

output:

   Group  ColA  ColB       PerA       PerB
0      1     1    10  33.333333  31.250000
1      1     2    22  66.666667  68.750000
2      2     3    30  42.857143  42.857143
3      2     4    40  57.142857  57.142857
4      3     5    50  27.777778  27.777778
5      3     6    60  33.333333  33.333333
6      3     7    70  38.888889  38.888889

CodePudding user response:

Group by Group and then sum. That gives you the column sums per group.

Then set Group as the index and divide by those sums. Division aligns on the index, so each row is divided only by the sums of its own group. Code below:

df.set_index('Group').div(df.groupby('Group').sum())*100
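
The index-aligned division described above can be sketched as follows, assuming the sample frame from the first answer. The `map` step at the end is an addition that is not part of this answer: it is one way to attach the percentages as new columns while keeping the original row order, and the names `group_sums` and `pct` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Group': [1, 1, 2, 2, 3, 3, 3],
    'ColA': [1, 2, 3, 4, 5, 6, 7],
    'ColB': [10, 22, 30, 40, 50, 60, 70],
})

# One row per group, holding the column sums for that group.
group_sums = df.groupby('Group').sum()

# Division aligns on the Group index, so every row is divided
# by the sums of its own group.
pct = df.set_index('Group').div(group_sums) * 100

# Assumed variant (not from the answer): look up each row's group sum
# by key and store the result in new columns.
df['PerA'] = df['ColA'] / df['Group'].map(group_sums['ColA']) * 100
df['PerB'] = df['ColB'] / df['Group'].map(group_sums['ColB']) * 100
```

Here `pct` is indexed by Group rather than by the original row labels, which is why the sketch uses `map` when the goal is to add columns to `df` itself.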

CodePudding user response:

Try groupby with transform, then update. Note that this overwrites ColA and ColB with the percentages instead of adding new columns:

df.update(df.div(df.groupby('Group').transform('sum'))*100)
df
Out[478]: 
   Group       ColA       ColB
0      1  33.333333  31.250000
1      1  66.666667  68.750000
2      2  42.857143  42.857143
3      2  57.142857  57.142857
4      3  27.777778  27.777778
5      3  33.333333  33.333333
6      3  38.888889  38.888889
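
Since the question asks for new columns rather than overwritten ones, here is a minimal sketch of the same `transform('sum')` idea that keeps the original values. It assumes the example data from the question (groups of size 1 and 3); the names `value_cols`, `group_totals`, and `pct` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Group': [1, 2, 2, 2],
    'Col A': [1, 1, 2, 3],
    'Col B': [2, 2, 2, 4],
})

value_cols = ['Col A', 'Col B']

# transform('sum') returns a frame with the same shape and index as df,
# where each row holds the sums of the group that row belongs to.
group_totals = df.groupby('Group')[value_cols].transform('sum')

# Row-by-row share of the group total, as a percentage.
pct = df[value_cols] / group_totals * 100

# Store the results under new names so the inputs are kept.
df['Per A'] = pct['Col A']
df['Per B'] = pct['Col B']

print(df.round(2))
```

Because `transform` preserves the original index, no merge or index change is needed, and the approach does not depend on group size or row order.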