I have a dataframe that looks like this:
col1 col2 col3 targ1 targ3 targ6
a b c 1 2 3
a c d 2 4 9
I am using melt to pivot the last three columns, and the result looks like this:
pivot_df = (data.melt(id_vars=data.columns.difference(['targ1','targ3','targ6']),var_name='m_val', value_name='value')
.assign(month=lambda d: d['m_val'].astype('str').str.extractall('(\d )').unstack().fillna('').sum(axis=1).astype(int))
)
col1 col2 col3 m_val value
a b c 1 1
a c d 1 2
a b c 3 2
a c d 3 4
a b c 6 3
a c d 6 9
Why is the melt sorts the m_val, I want the results to be like this:
col1 col2 col3 m_val value
a b c 1 1
a c d 3 2
a b c 6 3
a c d 1 2
a b c 3 4
a c d 6 9
If I try to use ignore_index=False
in melt()
then I get this error:
ValueError: Index contains duplicate entries, cannot reshape
CodePudding user response:
If use ignore_index=False
then sort index values and set default index by DataFrame.sort_index
:
df = (data.melt(id_vars=data.columns.difference(['targ1','targ3','targ6']),var_name='m_val', value_name='value',ignore_index=False)
.sort_index(ignore_index=True)
.assign(month=lambda d: d['m_val'].astype('str').str.extractall('(\d )').unstack().fillna('').sum(axis=1).astype(int))
)
print (df)
col1 col2 col3 m_val value month
0 a b c targ1 1 1
1 a b c targ3 2 3
2 a b c targ6 3 6
3 a c d targ1 2 1
4 a c d targ3 4 3
5 a c d targ6 9 6
Or use melt
alternative with DataFrame.stack
:
df = (data.set_index(data.columns.difference(['targ1','targ3','targ6']).tolist())
.rename_axis('m_val', axis=1)
.stack()
.reset_index(name='value')
.assign(month=lambda d: d['m_val'].astype('str').str.extractall('(\d )').unstack().fillna('').sum(axis=1).astype(int)))
print (df)
col1 col2 col3 m_val value month
0 a b c targ1 1 1
1 a b c targ3 2 3
2 a b c targ6 3 6
3 a c d targ1 2 1
4 a c d targ3 4 3
5 a c d targ6 9 6