Home > Blockchain >  Python: Transform or Melt and Groupby a Dataframe
Python: Transform or Melt and Groupby a Dataframe

Time:09-23

I have something like this:

df =
       col1  col2  col3
    0     B     C     A
    1     E     D     G
    2   NaN     F     B

EDIT : I need to convert it into this:

result =
           Name    location  
        0     B   col1,col2
        1     C        col1     
        2     A        col1
        3     E        col2     
        4     D        col2        
        5     G        col2     
        6     F        col3

Essentially getting a "location" telling me which column an "Name" is in. Thank you in advance.

CodePudding user response:

Try melt and dropna:

>>> df.melt(var_name='location').dropna().groupby('value', sort=False, as_index=False).agg(', '.join)
  value    location
0     B  col1, col3
1     E        col1
2     C        col2
3     D        col2
4     F        col2
5     A        col3
6     G        col3
>>> 

Also groupby and agg.

CodePudding user response:

Try using melt to convert columns to rows. And give the rows a column name.

Then dropna to remove the NaN values in rows.

df = df.melt(var_name="location", value_name="Name").dropna()

CodePudding user response:

You can use pandas.melt and pandas.groupby.agg:

df = df.melt(var_name="location", value_name="Name").dropna()
new_df = df.groupby("Name", as_index=False).agg(",".join)
print(new_df)

Output:

  Name   location
0    A       col3
1    B  col1,col3
2    C       col2
3    D       col2
4    E       col1
5    F       col2
6    G       col3

CodePudding user response:

Or an alternative with stack():

new = df.stack().reset_index().drop('level_0',axis=1).dropna()
new.columns = ['name','location']

prints:

   name location
0  col1        B
1  col2        C
2  col3        A
3  col1        E
4  col2        D
5  col3        G
6  col2        F

EDIT:

To get your updated output you could use a groupby along with join():

new.groupby('location').agg({'name':lambda x: ', '.join(list(x))}).reset_index()

Which gives you:

 location        name
0        A        col3
1        B  col1, col3
2        C        col2
3        D        col2
4        E        col1
5        F        col2
6        G        col3
  • Related