Question updated, see below
I have a large dataframe similar in structure to e.g.
df=pd.DataFrame({'A': [0, 0, 0, 11, 22,33], 'B': [10, 20,30, 110, 220, 330], 'C':['x', 'y', 'z', 'x', 'y', 'z']})
df
A B C
0 0 10 x
1 0 20 y
2 0 30 z
3 11 110 x
4 22 220 y
5 33 330 z
I want to create a new column by selecting the column value of B from a different row based on the value of C being equal to the current row and the value of A being 0, so the expected result is
A B C new_B_based_on_A_and_C
0 0 10 x 10
1 0 20 y 20
2 0 30 z 30
3 11 110 x 10
4 22 220 y 20
5 33 330 z 30
I want to have a simple solution without needing to have a for loop over the rows. Something like
df.apply(lambda row: df[df[(df['C']==row.C) & (df['A']==0)]]['B'].iloc[0], axis=1)
The dataframe is guaranteed to have those values and the values are unique
Update for a more general case
I am looking for a general solution that would also work for multiple columns to match on e.g.
df=pd.DataFrame({'A': [0, 0, 0,0, 11, 22,33, 44], 'B': [10, 20,30, 40, 110, 220, 330, 440], 'C':['x', 'y', 'x', 'y', 'x', 'y', 'x', 'y'], 'D': [1, 1, 5, 5, 1,1 ,5, 5]})
A B C D
0 0 10 x 1
1 0 20 y 1
2 0 30 x 5
3 0 40 y 5
4 11 110 x 1
5 22 220 y 1
6 33 330 x 5
7 44 440 y 5
and the result would be then
A B C D new_B_based_on_A_C_D
0 0 10 x 1 10
1 0 20 y 1 20
2 0 30 x 5 30
3 0 40 y 5 40
4 11 110 x 1 10
5 22 220 y 1 20
6 33 330 x 5 30
7 44 440 y 5 40
CodePudding user response:
You can do a map:
# you **must** make sure that for each unique `C` value,
# there is only one row with `A==0`.
df['new'] = df['C'].map(df.loc[df['A']==0].set_index('C')['B'])
Output:
A B C new
0 0 10 x 10
1 0 20 y 20
2 0 30 z 30
3 11 110 x 10
4 22 220 y 20
5 33 330 z 30
Explanation: Imagine you have a series s
indicating the mapping:
idx
idx1 value1
idx2 value2
idx3 value3
then that's what map does: df['C'].map(s)
.
Now, for a dataframe d
:
C B
c1 b1
c2 b2
c3 b3
we do s=d.set_index('C')['B']
to get the above form.
Finally, as mentioned, you mapping happens where A==0
, so d = df[df['A']==0]
.
Composing the forward path:
mapping_data = df[df['A']==0]
mapping_series = mapping_data.set_index('C')['B']
new_values = df['C'].map(mapping_series)
and the first piece of code is just all these lines combined.
CodePudding user response:
If I understood the question, for the general case you could use a merge like this:
df.merge(df.loc[df['A'] == 0, ['B', 'C', 'D']], on=['C', 'D'], how='left', suffixes=('', '_new'))
Output:
A B C D B_new
0 10 x 1 10
0 20 y 1 20
0 30 x 5 30
0 40 y 5 40
11 110 x 1 10
22 220 y 1 20
33 330 x 5 30
44 440 y 5 40