I have a dataframe
df =pd. DataFrame({'A' : ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
'B' : ['one', 'one', 'two', 'two', 'three', 'three', 'four', 'five'],
'C' : [2,3,4,9,12,12,17,13]})
I would like to add a new column New, equal to C value/ C value where B= 'one' per group(group by A), the output would like:
A B C New
foo one 2 1
bar one 3 1
foo two 4 2
bar two 9 3
foo three 12 6
bar three 12 4
foo four 17 8.5
foo five 13 6.5
My code is
grouped = df.groupby(['A']).head(7)
grouped['new']= grouped['C']/df[grouped['B']=='one']['C']
output is not I expected:
A B C new
foo one 2 1
bar one 3 1
foo two 4 NaN
bar two 9 NaN
foo three 12 NaN
bar three 12 NaN
foo four 17 NaN
foo five 13 NaN
CodePudding user response:
First, set up a new dataframe which just consists of rows where B == 'one':
df2 = df.loc[df['B'] == 'one',:]
Then merge these two dataframes on the A column:
df.merge(df2, left_on = 'A', right_on = 'A')
A B_x C_x B_y C_y
0 foo one 2 one 2
1 foo two 4 one 2
2 foo three 12 one 2
3 foo four 17 one 2
4 foo five 13 one 2
5 bar one 3 one 3
6 bar two 9 one 3
7 bar three 12 one 3
Your new column is just column "C_x" divided by "C_y". You can clean up the column names as required. Also, if the order of the rows in df is important, you probably need to reorder the output as merge
sorts the dataframe.
CodePudding user response:
You can use a mapping:
one = df.query('B == "one"').set_index('A')['C']
df['new'] = df['C'].div(df['A'].map(one))
Or groupby.transform
:
df['new'] = df['C'].div(df['C'].where(df['B'].eq('one'))
.groupby(df['A'])
.transform('first'))
Output:
A B C new
0 foo one 2 1.0
1 bar one 3 1.0
2 foo two 4 2.0
3 bar two 9 3.0
4 foo three 12 6.0
5 bar three 12 4.0
6 foo four 17 8.5
7 foo five 13 6.5
CodePudding user response:
If B == one
are always on top of all groups, use follwing code:
df.groupby('A')['C'].transform(lambda x: x/x.iloc[0])
output:
0 1.00
1 1.00
2 2.00
3 3.00
4 6.00
5 4.00
6 8.50
7 6.50
Name: C, dtype: float64
If B == one
are not always on top , use follwing code:
(df.set_index('B')
.groupby('A')['C'].transform(lambda x: x/x.loc['one'])
.reset_index(drop=True)
)
make result to new
column