adding a relative value per group in python-CodePudding

I have a dataframe

df =pd. DataFrame({'A' : ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'B' : ['one', 'one', 'two', 'two', 'three', 'three', 'four', 'five'],
                  'C' : [2,3,4,9,12,12,17,13]})

I would like to add a new column New, equal to C value/ C value where B= 'one' per group(group by A), the output would like:

A   B   C   New
foo one 2   1
bar one 3   1
foo two 4   2
bar two 9   3
foo three   12  6
bar three   12  4
foo four    17  8.5
foo five    13  6.5

My code is

grouped = df.groupby(['A']).head(7)
grouped['new']= grouped['C']/df[grouped['B']=='one']['C']

output is not I expected:

A   B   C   new
foo one 2   1
bar one 3   1
foo two 4   NaN
bar two 9   NaN
foo three   12  NaN
bar three   12  NaN
foo four    17  NaN
foo five    13  NaN

CodePudding user response：

First, set up a new dataframe which just consists of rows where B == 'one':

df2 = df.loc[df['B'] == 'one',:]

Then merge these two dataframes on the A column:

df.merge(df2, left_on = 'A', right_on = 'A')

     A    B_x  C_x  B_y  C_y
0  foo    one    2  one    2
1  foo    two    4  one    2
2  foo  three   12  one    2
3  foo   four   17  one    2
4  foo   five   13  one    2
5  bar    one    3  one    3
6  bar    two    9  one    3
7  bar  three   12  one    3

Your new column is just column "C_x" divided by "C_y". You can clean up the column names as required. Also, if the order of the rows in df is important, you probably need to reorder the output as merge sorts the dataframe.

CodePudding user response：

You can use a mapping:

one = df.query('B == "one"').set_index('A')['C']

df['new'] = df['C'].div(df['A'].map(one))

Or groupby.transform:

df['new'] = df['C'].div(df['C'].where(df['B'].eq('one'))
                         .groupby(df['A'])
                         .transform('first'))

Output:

     A      B   C  new
0  foo    one   2  1.0
1  bar    one   3  1.0
2  foo    two   4  2.0
3  bar    two   9  3.0
4  foo  three  12  6.0
5  bar  three  12  4.0
6  foo   four  17  8.5
7  foo   five  13  6.5

CodePudding user response：

If B == one are always on top of all groups, use follwing code:

df.groupby('A')['C'].transform(lambda x: x/x.iloc[0])

output:

0   1.00
1   1.00
2   2.00
3   3.00
4   6.00
5   4.00
6   8.50
7   6.50
Name: C, dtype: float64

If B == one are not always on top , use follwing code:

(df.set_index('B')
 .groupby('A')['C'].transform(lambda x: x/x.loc['one'])
 .reset_index(drop=True)
)

make result to new column