How to align 2 columns in a pandas dataframe

Time:11-13

I'm trying to align the two columns in my dataframe so that each value in the 2nd column ends up on the same row as the matching value in the 1st column. Wherever a value in the 1st column has no match, I would like the 2nd column to hold a default value of 0. Here is some sample data; my real data is much longer.

import pandas as pd

df = pd.DataFrame(data={'col1': [1.91, 2, 3.1, 4, 5, 6, 7.7, 8, 9, 10.8932], 'col2': [1.91, 3.1, 6, 7.7, 9, 'NaN', 'NaN', 'NaN', 'NaN','NaN']})

df
  col1  col2
0   1.9100  1.91
1   2.0000  3.1
2   3.1000  6
3   4.0000  7.7
4   5.0000  9
5   6.0000  NaN
6   7.7000  NaN
7   8.0000  NaN
8   9.0000  NaN
9   10.8932 NaN

Here's what I would like as my output:

df
  col1  col2
0   1.9100  1.91
1   2.0000  0
2   3.1000  3.1
3   4.0000  0
4   5.0000  0
5   6.0000  6
6   7.7000  7.7
7   8.0000  0
8   9.0000  9
9   10.8932 0

Answer:

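Use Series.where with an isin mask: keep the col1 value wherever it also appears in col2, and put 0 everywhere else: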

df["col2"] = df.col1.where(df.col1.isin(df.col2), 0)

Output:

In [5]: df["col2"] = df.col1.where(df.col1.isin(df.col2), 0)

In [6]: df
Out[6]:
      col1  col2
0   1.9100  1.91
1   2.0000  0.00
2   3.1000  3.10
3   4.0000  0.00
4   5.0000  0.00
5   6.0000  6.00
6   7.7000  7.70
7   8.0000  0.00
8   9.0000  9.00
9  10.8932  0.00
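
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It assumes the missing entries are real NaN values (np.nan) rather than the 'NaN' strings from the question, and the intermediate names mask and aligned are only there for illustration:

```python
import numpy as np
import pandas as pd

# Sample data from the question, with real NaN values (an assumption)
# in place of the 'NaN' strings.
df = pd.DataFrame({
    "col1": [1.91, 2, 3.1, 4, 5, 6, 7.7, 8, 9, 10.8932],
    "col2": [1.91, 3.1, 6, 7.7, 9] + [np.nan] * 5,
})

# True on every row whose col1 value occurs somewhere in col2.
mask = df["col1"].isin(df["col2"].dropna())

# Keep the col1 value where the mask is True, otherwise use the default 0.
aligned = df["col1"].where(mask, 0)

df["col2"] = aligned
print(df)
```

Building the mask before assigning keeps the original col2 available until the last step, which helps if you want to inspect which rows matched.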

If you only need col2 as a 0/1 indicator of whether the col1 value was matched, the boolean mask alone is sufficient:

In [7]: df["col2"] = df.col1.isin(df.col2).astype(int)

In [8]: df
Out[8]:
      col1  col2
0   1.9100     1
1   2.0000     0
2   3.1000     1
3   4.0000     0
4   5.0000     0
5   6.0000     1
6   7.7000     1
7   8.0000     0
8   9.0000     1
9  10.8932     0
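
One caveat: isin compares floats for exact equality, so values that differ only by rounding error will not match. If your real data comes out of calculations, something like the sketch below may help. It is only an illustration on made-up values, the tolerance of 1e-9 is an assumption you would need to tune, and it compares every col1 value against every col2 value, so memory grows with len(col1) * len(col2):

```python
import numpy as np
import pandas as pd

# Made-up values: 0.1 + 0.2 is not exactly equal to 0.3 in floating point.
col1 = pd.Series([0.3, 2.0, 3.1])
col2 = pd.Series([0.1 + 0.2, 3.1, np.nan])

# Exact matching misses the first row.
exact = col1.isin(col2)

# Tolerant matching: broadcast col1 against the non-missing col2 values
# and flag a row if any candidate lies within the absolute tolerance.
candidates = col2.dropna().to_numpy()
close = np.isclose(
    col1.to_numpy()[:, None], candidates[None, :], rtol=0, atol=1e-9
).any(axis=1)

# Same where/0 pattern as above, driven by the tolerant mask.
aligned = col1.where(close, 0)
print(aligned)
```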