I have the following two DataFrames:
import pandas as pd
df = pd.DataFrame([[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]],
index = [0, 0.25, 0.50, 0.75, 1],
columns = [0, 0.25, 0.50, 0.75, 1])
df_cross = pd.DataFrame([[0.0, 0.25],
[0.0, 0.75],
[0.5, 1]],
columns = ['indexes_to_keep',
'cols_to_keep'])
df
:
0.00 0.25 0.50 0.75 1.00
0.00 0 0 0 0 0
0.25 0 0 0 0 0
0.50 0 0 0 0 0
0.75 0 0 0 0 0
1.00 0 0 0 0 0
df_cross
:
indexes_to_keep cols_to_keep
0 0.0 0.25
1 0.0 0.75
2 0.5 1.00
In the df
I have my storaged data, and the df_cross contains the indexes and columns that I want to keep the values. The values in df
which the index and columns do not match with any row of df_cross
I want to replace by a string (for example "NaN").
The expected output is:
0.00 0.25 0.50 0.75 1.00
0.00 NaN 0 NaN 0 NaN
0.25 NaN NaN NaN NaN NaN
0.50 NaN NaN NaN NaN 0
0.75 NaN NaN NaN NaN NaN
1.00 NaN NaN NaN NaN NaN
Thanks in advance.
CodePudding user response:
Pandas does not support setting elements with arrays of coordinates. You would need to use numpy:
# integer locs
rows = df.index.get_indexer(df_cross.indexes_to_keep)
cols = df.columns.get_indexer(df_cross.cols_to_keep)
# where we want to keep the data
mask = np.full(df.shape, False)
mask[rows, cols] = True
df[:] = df.where(mask)
Another way, with just Pandas, to create mask
is:
mask = (df_cross.assign(val=True)
.set_index(['indexes_to_keep', 'cols_to_keep'])
['val'].unstack(fill_value=False)
)
Output:
0.00 0.25 0.50 0.75 1.00
0.00 NaN 0.0 NaN 0.0 NaN
0.25 NaN NaN NaN NaN NaN
0.50 NaN NaN NaN NaN 0.0
0.75 NaN NaN NaN NaN NaN
1.00 NaN NaN NaN NaN NaN
CodePudding user response:
Let us try crosstab
on df_cross
, then use where
to mask the values
s = pd.crosstab(*df_cross.values.T)
df.where(s == 1)
0.00 0.25 0.50 0.75 1.00
0.00 NaN 0.0 NaN 0.0 NaN
0.25 NaN NaN NaN NaN NaN
0.50 NaN NaN NaN NaN 0.0
0.75 NaN NaN NaN NaN NaN
1.00 NaN NaN NaN NaN NaN
PS: pd.crosstab(*df_cross.values.T)
is just a syntactical shortcut and is effectively equivalent to using pd.crosstab(df.indexes_to_keep, df.cols_to_keep)