How to interchange cell values with column names in a dataframe in Python-CodePudding

I have following dataframe of 3 persons posted at 3 places on 3 days of the week:

I want to convert above data to following table format:

Effectively, I am interchanging cell values with column names.

I want some pointer on how to do this programmatically using Python.

CodePudding user response：

Hear is something that could work

Get new column names ie places by finding unique values in current dataframe
loop through new column names(places), add names of old column(people) as new entry for each new column in the new dataframe using apply method

Code:

import io
import pandas as pd
import numpy as np


txt="""
tom,dick
Mon,cafe,lib
Tue,lib,gym
Wed,gym,gym
"""


df = pd.read_csv(io.StringIO(txt))
print(df)
new_cols = np.unique(df.values)
new_df = pd.DataFrame(columns=new_cols)

for col in new_cols:
    new_df[col] = df.apply(lambda x: df.columns[x == col].values, axis=1)

print(new_df)

Output:

old df
      tom dick
Mon  cafe  lib
Tue   lib  gym
Wed   gym  gym


new df
      cafe          gym     lib
Mon  [tom]           []  [dick]
Tue     []       [dick]   [tom]
Wed     []  [tom, dick]      []

or if multiple people are never at the sample place ie you don't want a list of values:

import io
import pandas as pd
import numpy as np


txt="""
tom,dick
Mon,cafe,lib
Tue,lib,gym
Wed,gym,cafe
"""

df = pd.read_csv(io.StringIO(txt))
print(df)
new_cols = np.unique(df.values)
new_df = pd.DataFrame(columns=new_cols)

def get_place(row, col):
    place = df.columns[row == col].values
    return place[0] if len(place) else ''


for col in new_cols:
    new_df[col] = df.apply(lambda x:get_place(x, col), axis=1)

print(new_df)

Output:

old df
      tom  dick
Mon  cafe   lib
Tue   lib   gym
Wed   gym  cafe

new df
     cafe   gym   lib
Mon   tom        dick
Tue        dick   tom
Wed  dick   tom

CodePudding user response：

Given this dataframe df:

         Tom     Dick    Harry
Mon     Cafe  Library      Gym
Tue  Library      Gym     Cafe
Wed      Gym     Cafe  Library

You can get the outcome you want by

(i) unstack the dataframe to get a MultiIndex Series

(ii) rearrange the order of index levels and values of this Series

(iii) unstack the Series back

s = df.unstack()
s = pd.Series(s.index.get_level_values(0).to_numpy(), index = pd.MultiIndex.from_tuples(zip(s.index.get_level_values(1), s)))
out = s.unstack()

Output:

      Cafe    Gym Library
Mon    Tom  Harry    Dick
Tue  Harry   Dick     Tom
Wed   Dick    Tom   Harry

This example is a special case where in step (iii) works out fine because there is no duplicate index, however, in general case, you'll more likely have duplicate index.

For example, you could have the following dataframe:

         Tom     Dick    Harry
Mon     Cafe  Library      Gym
Tue  Library      Gym     Cafe
Wed  Library     Cafe  Library

where on Wednesday both Tom and Harry go to the library. In that case the above solution wouldn't work because we'll have duplicate indices after step (ii). But you can solve that issue by adding two more lines after step (ii) by aggregating over index:

s = df.unstack()
s = pd.Series(s.index.get_level_values(0).to_numpy(), index = pd.MultiIndex.from_tuples(zip(s.index.get_level_values(1), s)))
s = s.groupby(s.index).agg(list)
s.index = pd.MultiIndex.from_tuples(s.index)
out = s.unstack()

Output:

        Cafe      Gym       Library
Mon    [Tom]  [Harry]        [Dick]
Tue  [Harry]   [Dick]         [Tom]
Wed   [Dick]      NaN  [Tom, Harry]