I have 2 dataframes that contain 3 account indictors per account number. The account numbers are like for like in the column labelled "account". I am trying to modify dataframe 2 so it matches dataframe 1 in terms of having the same NaN values for each column.
Dataframe 1:
import pandas as pd
import numpy as np
df = pd.DataFrame([[1234567890,1,np.nan,'G'],
[7854567890,np.nan,100,np.nan],
[7854567899,np.nan,np.nan,np.nan],
[7854567893,np.nan,100,np.nan],
[7854567893,np.nan,np.nan,np.nan],
[9632587415,np.nan,np.nan,'B']],
columns = ['account','ind_1','ind_2','ind_3'])
df
Output:
account ind_1 ind_2 ind_3
0 1234567890 1.0 NaN G
1 7854567890 NaN 100.0 NaN
2 7854567899 NaN NaN NaN
3 7854567893 NaN 100.0 NaN
4 7854567893 NaN NaN NaN
5 9632587415 NaN NaN B
Dataframe 2:
df2 = pd.DataFrame([[1234567890,5,np.nan,'GG'],
[7854567890,1,106,np.nan],
[7854567899,np.nan,100,'N'],
[7854567893,np.nan,100,np.nan],
[7854567893,np.nan,np.nan,np.nan],
[9632587415,3,np.nan,'B']],
columns = ['account','ind_1','ind_2','ind_3'])
df2
Output:
account ind_1 ind_2 ind_3
0 1234567890 5.0 NaN GG
1 7854567890 1.0 106.0 NaN
2 7854567899 NaN 100.0 N
3 7854567893 NaN 100.0 NaN
4 7854567893 NaN NaN NaN
5 9632587415 3.0 NaN B
Problem: I need to change dataframe 2 so that it is matching the same NaN values in dataframe 1.
For example: Column ind_1 has values in index 0, 1 and 5 in df2, whereas in df1 it only has values in index 0. I need to replace the values in index 1 and 5 in df2 with NaNs to match the same number of Nans in df1. Same logic to applied to the other 2 columns.
Expected outcome:
account ind_1 ind_2 ind_3
0 1234567890 5.0 NaN GG
1 7854567890 NaN 106.0 NaN
2 7854567899 NaN NaN NaN
3 7854567893 NaN 100.0 NaN
4 7854567893 NaN NaN NaN
5 9632587415 NaN NaN B
Is there any easy way to achieve this?
Thanks in advance Alan
CodePudding user response:
Try this
df2[~df.isna()]
df.isna()
check where df has nan value and create a mask
then slice df2 with the mask.
CodePudding user response:
You return a DataFrame with True
values where there are NaN
values in your first dataframe using df.isna()
. You can use this as a Boolean mask to set the correct cells to NaN
in your second DataFrame:
df2[df.isna()] = None
Be careful: I assume that you also need to take care that these NaNs will be associated with rows with the same value for account
. This solution does not ensure that this will happen, i.e. it will assume that the values in account
column are exactly int he same order in both dataframes.