Home > Mobile >  Pandas melt automatically sorts the results
Pandas melt automatically sorts the results

Time:08-15

I have a dataframe that looks like this:

col1    col2    col3    targ1    targ3    targ6
a        b      c        1        2       3
a        c      d        2        4       9

I am using melt to pivot the last three columns, and the result looks like this:

pivot_df = (data.melt(id_vars=data.columns.difference(['targ1','targ3','targ6']),var_name='m_val', value_name='value')
                .assign(month=lambda d: d['m_val'].astype('str').str.extractall('(\d )').unstack().fillna('').sum(axis=1).astype(int))
            )

col1 col2 col3 m_val value
a     b    c    1     1
a     c    d    1     2
a     b    c    3     2
a     c    d    3     4
a     b    c    6     3
a     c    d    6     9 

Why is the melt sorts the m_val, I want the results to be like this:

col1 col2 col3 m_val value
a     b    c    1     1
a     c    d    3     2
a     b    c    6     3
a     c    d    1     2
a     b    c    3     4
a     c    d    6     9 

If I try to use ignore_index=False in melt() then I get this error:

ValueError: Index contains duplicate entries, cannot reshape

CodePudding user response:

If use ignore_index=False then sort index values and set default index by DataFrame.sort_index:

df = (data.melt(id_vars=data.columns.difference(['targ1','targ3','targ6']),var_name='m_val', value_name='value',ignore_index=False)
                .sort_index(ignore_index=True)
                .assign(month=lambda d: d['m_val'].astype('str').str.extractall('(\d )').unstack().fillna('').sum(axis=1).astype(int))
            )
print (df)
  col1 col2 col3  m_val  value  month
0    a    b    c  targ1      1      1
1    a    b    c  targ3      2      3
2    a    b    c  targ6      3      6
3    a    c    d  targ1      2      1
4    a    c    d  targ3      4      3
5    a    c    d  targ6      9      6

Or use melt alternative with DataFrame.stack:

df = (data.set_index(data.columns.difference(['targ1','targ3','targ6']).tolist())
          .rename_axis('m_val', axis=1)
          .stack()
          .reset_index(name='value')
          .assign(month=lambda d: d['m_val'].astype('str').str.extractall('(\d )').unstack().fillna('').sum(axis=1).astype(int)))

print (df)
  col1 col2 col3  m_val  value  month
0    a    b    c  targ1      1      1
1    a    b    c  targ3      2      3
2    a    b    c  targ6      3      6
3    a    c    d  targ1      2      1
4    a    c    d  targ3      4      3
5    a    c    d  targ6      9      6
  • Related