I have a large dataframe which I need to pivot to long. The dataframe is in this format:
np.random.seed(0)
df = pd.DataFrame({'2010_A(weekly)': np.random.rand(3),
'2011_A(weekly)': np.random.rand(3),
'2010_B(weekly)': np.random.rand(3),
'2011_B(weekly)': np.random.rand(3),
'X' : np.random.randint(3, size=3)})
df['id'] = df.index
df
If the names were opposite, like this:
np.random.seed(0)
df = pd.DataFrame({'A(weekly)_2010': np.random.rand(3),
'A(weekly)_2011': np.random.rand(3),
'B(weekly)_2010': np.random.rand(3),
'B(weekly)_2011': np.random.rand(3),
'X' : np.random.randint(3, size=3)})
df['id'] = df.index
df
It would be easy to use wide_to_long
to pivot my table into the desired format like this:
pd.wide_to_long(df, ['A(weekly)', 'B(weekly)'], i='id',
j='year', sep='_')
However, I have not found a way to make wide_to_long consider the names backwards.
Is there anyway to use wide_to_long
in a way where it uses the end of the column to identify the stubname?
The desired output is a 5 column long dataframe with column names being "id", "year", "X", "A(weekly)", "B(weekly)"
CodePudding user response:
It's not possible with pd.wide_to_long
. You have to use other methods or rename columns to swap fields:
>>> pd.wide_to_long(df.rename(columns=lambda x: '_'.join(x.split('_')[::-1])),
['A(weekly)', 'B(weekly)'], i='id', j='year', sep='_')
X A(weekly) B(weekly)
id year
0 2010 0 0.548814 0.437587
1 2010 1 0.715189 0.891773
2 2010 1 0.602763 0.963663
0 2011 0 0.544883 0.383442
1 2011 1 0.423655 0.791725
2 2011 1 0.645894 0.528895
CodePudding user response:
One option is with pivot_longer from pyjnanitor - for this particular use case use a .value
placeholder to represent the stub (the column you wish to remain as a header), and use names_sep
to split the columns based on a separator:
# pip insall pyjanitor
import pandas as pd
import janitor
df.pivot_longer(index = ['X', 'id'], names_to = ('year', '.value'), names_sep = '_')
X id year A(weekly) B(weekly)
0 0 0 2010 0.548814 0.437587
1 1 1 2010 0.715189 0.891773
2 1 2 2010 0.602763 0.963663
3 0 0 2011 0.544883 0.383442
4 1 1 2011 0.423655 0.791725
5 1 2 2011 0.645894 0.528895