Flatten multi-index columns into one pandas-CodePudding

I'm trying to clean up a dataframe by merging the columns on a multi-index so all values in columns that belong to the same first-level index appear in one column.

From This:

To This:

I was doing it manually by defining each column and joining them like this:

df['Subjects'] = df['Which of the following subjects are you taking this semester?'].apply(lambda x: '|'.join(x.dropna()), axis = 1) 
df.drop('Which of the following subjects are you taking this semester?', axis = 1, level = 0, inplace = True)

The problem is I have a large dataframe with many more columns then this, so I was wondering if there is a way to do this dynamically for all columns instead of copying this code and defining each column individually?

data = {('Name', ''): {0: 'Jane',
1: 'John',
2: 'Lisa',
3: 'Michael'},
('Location', ''): {0: 'Houston', 1: 'LA', 2: 'LA', 3: 
 'Dallas'},
('Which of the following subjects are you taking this 
semester?', 'Math'): {0: 'Math',
1: 'Math',
2: np.nan,
3: 'Math'},
('Which of the following subjects are you taking this 
semester?', 'Science'): {0: 'Science',
1: np.nan,
2: np.nan,
3: 'Science'},
('Which of the following subjects are you taking this 
semester?', 'Art'): {0: np.nan,
1: 'Art',
2: 'Art',
3: np.nan},
('Which of the following electronic devices do you own?', 
'Laptop'): {0: 'Laptop',
 1: 'Laptop',
 2: 'Laptop',
 3: 'Laptop'},
('Which of the following electronic devices do you own?', 
'Phone'): {0: 'Phone',
 1: 'Phone',
 2: 'Phone',
 3: 'Phone'},
 ('Which of the following electronic devices do you own?', 
'TV'): {0: np.nan,
 1: 'TV',
 2: np.nan,
 3: np.nan},
 ('Which of the following electronic devices do you own?', 
 'Tablet'): {0: 'Tablet',
 1: np.nan,
 2: 'Tablet',
 3: np.nan},
('Age', ''): {0: 24, 1: 20, 2: 19, 3: 29},
 ('Which Social Media Platforms Do You Use?', 'Instagram'): 
{0: np.nan,
 1: 'Instagram',
 2: 'Instagram',
 3: 'Instagram'},
 ('Which Social Media Platforms Do You Use?', 'Facebook'): 
 {0: 'Facebook',
 1: 'Facebook',
 2: np.nan,
 3: np.nan},
 ('Which Social Media Platforms Do You Use?', 'Tik Tok'): 
 {0: np.nan,
  1: 'Tik Tok',
  2: 'Tik Tok',
  3: np.nan},
  ('Which Social Media Platforms Do You Use?', 'LinkedIn'): 
  {0: 'LinkedIn',
  1: 'LinkedIn',
  2: np.nan,
  3: np.nan}     
   }

CodePudding user response：

You can try this:

df.T.groupby(level=0).agg(list).T

CodePudding user response：

You can use melt as starting point to flatten your dataframe, filter out nan values then pivot_table to reshape your dataframe:

pat = r'(subjects|electronic devices|Social Media Platforms)'
cols = ['Name', 'Location', 'Age']

out = df.droplevel(1, axis=1).melt(cols, ignore_index=False).query('value.notna()')
out['variable'] = out['variable'].str.extract(pat, expand=False).str.title()
out = out.reset_index().pivot_table('value', ['index']   cols, 'variable', aggfunc='|'.join) \
         .reset_index(cols).rename_axis(index=None, columns=None)

Output:

>>> out
      Name Location  Age   Electronic Devices               Social Media Platforms      Subjects
0     Jane  Houston   24  Laptop|Phone|Tablet                    Facebook|LinkedIn  Math|Science
1     John       LA   20      Laptop|Phone|TV  Instagram|Facebook|Tik Tok|LinkedIn      Math|Art
2     Lisa       LA   19  Laptop|Phone|Tablet                    Instagram|Tik Tok           Art
3  Michael   Dallas   29         Laptop|Phone                            Instagram  Math|Science