Home > Back-end >  Pandas collapse dataframe horizontally
Pandas collapse dataframe horizontally

Time:10-27

I have the following to tables of boolean values:

df1 = pd.DataFrame(data={'w': [True, False, False], 
                         'x': [False, True, False],
                         'y': [True, True, True],
                         'z': [True, False, True]},
                         index=pd.Series([1, 2, 3], name='index'))
index w x y z
1 True False True True
2 False True True False
3 False False True True

I have created a new table with the same width as df1:

pd.DataFrame(columns=[f'column{num}' for num in range(1, len(df1.columns)   1)])
column1 column2 column3 column4

What I want to do is collapse the columns from df1 so that for each row, I am only showing the columns that have True values:

index column1 column2 column3 column4
1 w y z NA
2 x y NA NA
3 y z NA NA

CodePudding user response:

One way with dot and str.split:

import numpy as np

df2 = df1.dot(df1.columns ",")
         .str.split(",", expand=True)
         .replace(r'^\s*$', np.nan, regex=True)
         .rename_axis(None)
df2.columns = [f'column{num}' for num in range(1, len(df2.columns) 1)]

>>> df2

    column1 column2 column3 column4
1         w       y       z     NaN
2         x       y     NaN     NaN
3         y       z     NaN     NaN

CodePudding user response:

Try:

out = np.full(df1.shape, np.nan, dtype='object')

# mask valid columns for each row
mask = np.arange(df1.shape[1]) < df1.sum(1).values[:,None]

out[mask] = np.where(df1, df1.columns, np.nan)[df1]

out = pd.DataFrame(out)

CodePudding user response:

You could first map the column names to the cells using a trick multiplication (True -> 1, 1*'w' -> 'w' / False -> 0, 0*'w' -> '')

Then sort the rows independently using a custom key (isinstance(w, float) will push the NaNs/floats at the end)

cols = [f'column{num}' for num in range(1, len(df1.columns)   1)]
(df1*df1.columns).where(df1).apply(lambda r: pd.Series(sorted(r, key=lambda x: isinstance(x, float)),
                                                       index=cols), axis=1)

output:

      column1 column2 column3  column4
index                                 
1           w       y       z      NaN
2           x       y     NaN      NaN
3           y       z     NaN      NaN

CodePudding user response:

In your case do dot then split

out = df.dot(df.columns ',').str[:-1].str.split(',',expand=True).reindex(columns= np.arange(df.shape[1]))
Out[34]: 
       0  1     2   3
index                
1      w  y     z NaN
2      x  y  None NaN
3      y  z  None NaN
  • Related