I have a dataframe:
import pandas as pd
df = pd.DataFrame({'col1': ['a', 'b', 'c', 'd'], 'col2': ['b', 'c', 'd', 'e'], 'col3': [1.0, 2.0, 3.0, 4.0]})
  col1 col2  col3
0    a    b   1.0
1    b    c   2.0
2    c    d   3.0
3    d    e   4.0
My goal is to create an additional col4 that contains specific values from col3, subject to a condition: for each row x, look at the value in col1, and if there is another row y anywhere in the df where this value is present in col2, take the col3 value from row y and put it into col4 of the original row x. Otherwise, leave col4 empty for this row, e.g. NaN.
So the expected output for col4 is: NaN, 1, 2, 3. The first row gets no value because no row in the dataframe has 'a' in col2. Unlike in this example, the rows can be completely unsorted in the df!
Expected output:
  col1 col2  col3  col4
0    a    b   1.0   NaN
1    b    c   2.0   1.0
2    c    d   3.0   2.0
3    d    e   4.0   3.0
I have tried using .mask but no luck so far. Thanks for any help!
CodePudding user response:
You can left join the dataframe to itself using col1 on the left side and col2 on the right side, rename col3 from the right side of the join to col4, and drop the rest of the right-side columns.
Example:
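# Left self-join: match each row's col1 against col2 of any row in the same frame.
# suffixes=('', '_') keeps the left-side names and appends '_' to the right-side ones.
df = df.merge(df, left_on='col1', right_on='col2', how='left', suffixes=('', '_'))
# The right-side col3 (now col3_) holds the looked-up value; keep only it from the right side.
df = df.rename(columns={'col3_': 'col4'})
df = df[['col1', 'col2', 'col3', 'col4']]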
df looks like:
  col1 col2  col3  col4
0    a    b   1.0   NaN
1    b    c   2.0   1.0
2    c    d   3.0   2.0
3    d    e   4.0   3.0
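As a possible alternative to the self-join, the same lookup can be sketched with Series.map. This is only a sketch, and it assumes that the values in col2 are unique (the question does not say so); if col2 can contain duplicates, the lookup is ambiguous and the merge above is the safer route. The name `lookup` is just an illustrative variable, and the rows are shuffled here to show that row order does not matter.

```python
import pandas as pd

# Same data as in the question, but with the rows deliberately shuffled
df = pd.DataFrame({
    'col1': ['c', 'a', 'd', 'b'],
    'col2': ['d', 'b', 'e', 'c'],
    'col3': [3.0, 1.0, 4.0, 2.0],
})

# Lookup Series: index is col2, values are col3.
# Assumption: col2 has no duplicate values.
lookup = df.set_index('col2')['col3']

# For each row, look up its col1 value in the col2 index;
# values with no match become NaN.
df['col4'] = df['col1'].map(lookup)

print(df)
```

Unlike the merge, this keeps the original index and never adds rows, which is convenient when only a single column needs to be looked up.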