I have the following to tables of boolean values:
df1 = pd.DataFrame(data={'w': [True, False, False],
'x': [False, True, False],
'y': [True, True, True],
'z': [True, False, True]},
index=pd.Series([1, 2, 3], name='index'))
index | w | x | y | z |
---|---|---|---|---|
1 | True | False | True | True |
2 | False | True | True | False |
3 | False | False | True | True |
I have created a new table with the same width as df1
:
pd.DataFrame(columns=[f'column{num}' for num in range(1, len(df1.columns) 1)])
column1 | column2 | column3 | column4 |
---|
What I want to do is collapse the columns from df1
so that for each row, I am only showing the columns that have True
values:
index | column1 | column2 | column3 | column4 |
---|---|---|---|---|
1 | w | y | z | NA |
2 | x | y | NA | NA |
3 | y | z | NA | NA |
CodePudding user response:
One way with dot
and str.split
:
import numpy as np
df2 = df1.dot(df1.columns ",")
.str.split(",", expand=True)
.replace(r'^\s*$', np.nan, regex=True)
.rename_axis(None)
df2.columns = [f'column{num}' for num in range(1, len(df2.columns) 1)]
>>> df2
column1 column2 column3 column4
1 w y z NaN
2 x y NaN NaN
3 y z NaN NaN
CodePudding user response:
Try:
out = np.full(df1.shape, np.nan, dtype='object')
# mask valid columns for each row
mask = np.arange(df1.shape[1]) < df1.sum(1).values[:,None]
out[mask] = np.where(df1, df1.columns, np.nan)[df1]
out = pd.DataFrame(out)
CodePudding user response:
You could first map the column names to the cells using a trick multiplication (True -> 1, 1*'w' -> 'w' / False -> 0, 0*'w' -> '')
Then sort the rows independently using a custom key (isinstance(w, float)
will push the NaNs/floats at the end)
cols = [f'column{num}' for num in range(1, len(df1.columns) 1)]
(df1*df1.columns).where(df1).apply(lambda r: pd.Series(sorted(r, key=lambda x: isinstance(x, float)),
index=cols), axis=1)
output:
column1 column2 column3 column4
index
1 w y z NaN
2 x y NaN NaN
3 y z NaN NaN
CodePudding user response:
In your case do dot
then split
out = df.dot(df.columns ',').str[:-1].str.split(',',expand=True).reindex(columns= np.arange(df.shape[1]))
Out[34]:
0 1 2 3
index
1 w y z NaN
2 x y None NaN
3 y z None NaN