I'm trying to align the two columns in my DataFrame so that each value in the second column ends up on the same row as the matching value in the first column. Where a value in the first column has no match in the second column, I would like the second column to hold a default value of 0. Here is some sample data; my real data is much longer.
import pandas as pd
df = pd.DataFrame(data={'col1': [1.91, 2, 3.1, 4, 5, 6, 7.7, 8, 9, 10.8932], 'col2': [1.91, 3.1, 6, 7.7, 9, 'NaN', 'NaN', 'NaN', 'NaN','NaN']})
df
col1 col2
0 1.9100 1.91
1 2.0000 3.1
2 3.1000 6
3 4.0000 7.7
4 5.0000 9
5 6.0000 NaN
6 7.7000 NaN
7 8.0000 NaN
8 9.0000 NaN
9 10.8932 NaN
Here's what I would like as my output:
df
col1 col2
0 1.9100 1.91
1 2.0000 0
2 3.1000 3.1
3 4.0000 0
4 5.0000 0
5 6.0000 6
6 7.7000 7.7
7 8.0000 0
8 9.0000 9
9 10.8932 0
CodePudding user response:
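Build a boolean mask with isin and pass it to where: this keeps the col1 value on every row where that value also appears in col2, and fills 0 everywhere else: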
df["col2"] = df.col1.where(df.col1.isin(df.col2), 0)
Output:
In [5]: df["col2"] = df.col1.where(df.col1.isin(df.col2), 0)
In [6]: df
Out[6]:
col1 col2
0 1.9100 1.91
1 2.0000 0.00
2 3.1000 3.10
3 4.0000 0.00
4 5.0000 0.00
5 6.0000 6.00
6 7.7000 7.70
7 8.0000 0.00
8 9.0000 9.00
9 10.8932 0.00
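For reference, here is a self-contained sketch of the same where/isin approach. The helper name align_to_first and its parameters are hypothetical, chosen only for illustration, and the sample frame pads col2 with real np.nan values rather than the string 'NaN', which is an assumption about what the real data looks like.

```python
import numpy as np
import pandas as pd


def align_to_first(df, left="col1", right="col2", fill=0):
    """Return a copy where `right` holds the `left` value on rows whose
    `left` value occurs anywhere in `right`, and `fill` on all other rows."""
    out = df.copy()
    # True on rows whose `left` value appears somewhere in `right`;
    # missing values in `right` are dropped so they never count as a match.
    mask = out[left].isin(out[right].dropna())
    # Keep the `left` value where the mask is True, otherwise use `fill`.
    out[right] = out[left].where(mask, fill)
    return out


df = pd.DataFrame({
    "col1": [1.91, 2, 3.1, 4, 5, 6, 7.7, 8, 9, 10.8932],
    "col2": [1.91, 3.1, 6, 7.7, 9] + [np.nan] * 5,  # assumed: real NaN padding
})

aligned = align_to_first(df)
print(aligned)
```

Returning a copy leaves the original frame untouched, which can be handy if you still need the unaligned col2 afterwards.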
If you want col2 to be a binary flag instead, the boolean mask alone is sufficient:
In [7]: df["col2"] = df.col1.isin(df.col2).astype(int)
In [8]: df
Out[8]:
col1 col2
0 1.9100 1
1 2.0000 0
2 3.1000 1
3 4.0000 0
4 5.0000 0
5 6.0000 1
6 7.7000 1
7 8.0000 0
8 9.0000 1
9 10.8932 0
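One caveat: isin compares floats for exact equality, so values that were computed rather than typed in may fail to match even though they print the same. If that turns out to apply to your real data, a tolerance-based mask might help. The sketch below is only one possible way to do it; the helper isin_close and its atol default are hypothetical, and it builds a full pairwise comparison matrix, so it may use too much memory on very long columns.

```python
import numpy as np
import pandas as pd


def isin_close(left, right, atol=1e-9):
    """Boolean Series: True where a `left` value lies within `atol`
    of at least one non-missing value in `right`."""
    a = left.to_numpy(dtype=float)[:, None]            # shape (n, 1)
    b = right.dropna().to_numpy(dtype=float)[None, :]  # shape (1, m)
    # Pairwise absolute-tolerance comparison, then "any match per row".
    close = np.isclose(a, b, rtol=0.0, atol=atol).any(axis=1)
    return pd.Series(close, index=left.index)


col1 = pd.Series([0.3, 2.0, 3.1])
col2 = pd.Series([0.1 + 0.2, 3.1, np.nan])  # 0.1 + 0.2 is not exactly 0.3

exact = col1.isin(col2)         # exact equality misses the first row
close = isin_close(col1, col2)  # tolerance-based match catches it
result = col1.where(close, 0)
print(result)
```

The resulting mask can be used in where exactly like the isin mask above.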