I have two dataframes: df1 and df2. They have the same shape. Here's what they look like:
df1
1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|
20% | 10% | 5% | 1% | 0% |
20% | 10% | 5% | 1% | 0% |
20% | 10% | 5% | 1% | 0% |
df2
1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|
string | word | thing | NaN | NaN |
string | word | thing | NaN | NaN |
string | word | thing | NaN | NaN |
I want to use df2 to mask or filter df1, so that "new_df1" looks like the table below. Where df2 has NaN, I want df1 to have NaN. Where df2 is not NaN, I want to keep the original value from df1.
new_df1
1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|
20% | 10% | 5% | NaN | NaN |
20% | 10% | 5% | NaN | NaN |
20% | 10% | 5% | NaN | NaN |
I've found functions like df1.mask(df2=None), df1.mask(df2!=None), df1.where(df2!=None), and df1.where(df2=None). I either get a dataframe full of NaN or the opposite of what I want (NaN's in new_df1 where there is a value in df2). I suspect it's because the values in df1 are strings and not integers or floats? It looks like df.mask() and df.where() don't take string exceptions, and I can't find what the right function is, but it must exist. Can anyone help?
CodePudding user response:
You can try:
new_df1 = df1.mask(df2.isnull())
Or you can also do:
new_df1 = df1.where(~df2.isnull())
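As a minimal sketch of why the attempts in the question misbehave, the example below rebuilds the two dataframes (the construction is an assumption based on the tables shown) and compares a `!= None` condition with `isna()`. The string values in df1 do not appear to be the problem; the comparison against None simply never detects NaN.

```python
import numpy as np
import pandas as pd

# Data reconstructed from the question's tables (an assumption);
# the percentages are kept as strings.
cols = [1, 2, 3, 4, 5]
df1 = pd.DataFrame([["20%", "10%", "5%", "1%", "0%"]] * 3, columns=cols)
df2 = pd.DataFrame([["string", "word", "thing", np.nan, np.nan]] * 3, columns=cols)

# Comparing against None does not find missing values:
# the condition is True for every cell, including the NaN ones.
cond_ne_none = df2 != None  # noqa: E711

# isna() is the check that actually flags the NaN cells.
cond_missing = df2.isna()

# Set df1 to NaN wherever df2 is missing; keep df1's value elsewhere.
new_df1 = df1.mask(cond_missing)

# Equivalent formulation: keep df1 only where df2 is present.
new_df1_alt = df1.where(df2.notna())

print(new_df1)
```

Because `df2 != None` is True everywhere, `df1.mask(df2 != None)` blanks the whole frame and `df1.where(df2 != None)` leaves it untouched, which is consistent with the all-NaN result described in the question.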
CodePudding user response:
Since the mask is aligned with df1 by index (and column) labels, you may need to reset the index of both dataframes before masking if their labels differ.
Input
import pandas as pd
df1 = pd.DataFrame([[.1,.2], [.4,.5]], index=[1,2])
df2 = pd.DataFrame([['a',None], [None,'d']], index=[3,4])
Masking without aligning the index gives the wrong result: no index labels match, so every cell becomes NaN
df1.mask(df2.isnull())
     0    1
1  NaN  NaN
2  NaN  NaN
Resetting the index on both dataframes first gives the correct result
df1.reset_index(drop=True).mask(df2.reset_index(drop=True).isnull())
     0    1
0  0.1  NaN
1  NaN  0.5
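If you would rather keep df1's original index, one possible alternative (a sketch, not part of the answer above) is to pass the condition as a plain NumPy array. This assumes both dataframes have the same shape and that rows and columns correspond by position.

```python
import pandas as pd

df1 = pd.DataFrame([[.1, .2], [.4, .5]], index=[1, 2])
df2 = pd.DataFrame([["a", None], [None, "d"]], index=[3, 4])

# A NumPy boolean array carries no labels, so there is nothing to align:
# the mask is applied cell by cell in positional order.
positional = df1.mask(df2.isna().to_numpy())

print(positional)
```

The result has the same NaN pattern as the reset_index version, but the rows are still labelled 1 and 2.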