I am trying to do pd.wide_to_long and I want to separate based off of what is inside a parenthesis. I looked up the pandas manual here and the closes example is below. I changed the column names to match something to what I have.
np.random.seed(0)
df = pd.DataFrame({'A(weekly_2010)': np.random.rand(3),
'A(weekly_2011)': np.random.rand(3),
'B(weekly_2010)': np.random.rand(3),
'B(weekly_2011)': np.random.rand(3),
'X' : np.random.randint(3, size=3)})
df['id'] = df.index
I would do
df1=pd.wide_to_long(df, stubnames=['A', 'B'], i=['id'], j='year', sep='()', suffix='\w ')
This returns an empty dataframe. I looked at this example and switched the i and J but it returns a KeyError: "None of [Index(['year'], dtype='object')] are in the [columns]" I have tried the following for sep= and still no luck. Does someone have an idea of how to get the following dataframe.
df1=pd.wide_to_long(df, stubnames=['A', 'B'], i=['id'], j='year', sep='(*)', suffix='\w ')
df1=pd.wide_to_long(df, stubnames=['A', 'B'], i=['id'], j='year', sep='\\()', suffix='\w ')
df1=pd.wide_to_long(df, stubnames=['A', 'B'], i=['id'], j='year', sep=r'\\()', suffix='\w ')
df1=pd.wide_to_long(df11, stubnames=['A', 'B'], i=['id'], j='year', sep=r'\(*\)', suffix='\w ')
The desired dataframe would be
desiredDF= pd.DataFrame({'year':["(weekly_2010)","(weekly_2010)","(weekly_2010)","(weekly_2011)","(weekly_2011)","(weekly_2011)"],
'A': np.random.rand(6),
'B': np.random.rand(6),
'X' : np.random.randint(6, size=6),
'id':np.random.randint(6, size=6)})
obviously with the same numbers, I just wanted to show the column that matters which is the year column. I need to do pivot because my dataframe is more complicated and doing other formats, messes up the numbers. If anyone knows how to write in the sep, I would greatly appreciate it!
CodePudding user response:
Try this, remove the trailing close parenthesis and separate on open parenthesis:
df.columns = df.columns.str.replace(')','', regex=False)
pd.wide_to_long(df, stubnames=['A', 'B'], i=['id'], j='year', sep='(', suffix='\w ')
Output:
X A B
id year
0 weekly_2010 0 0.548814 0.437587
1 weekly_2010 1 0.715189 0.891773
2 weekly_2010 1 0.602763 0.963663
0 weekly_2011 0 0.544883 0.383442
1 weekly_2011 1 0.423655 0.791725
2 weekly_2011 1 0.645894 0.528895
I find that sometimes you need to do a little prep work to get pd.wide_to_long to work properly.
CodePudding user response:
@ScottBoston's solution works well. I'd suggest an alternative to pd.wide_to_long
that offers more flexibility, while still being performant - pivot_longer from pyjanitor:
# pip install pyjanitor
import pandas as pd
import janitor
df.pivot_longer(index=['X', 'id'],
names_to = ('.value', 'year'),
names_pattern=r"(.)(. )")
X id year A B
0 0 0 (weekly_2010) 0.548814 0.437587
1 1 1 (weekly_2010) 0.715189 0.891773
2 1 2 (weekly_2010) 0.602763 0.963663
3 0 0 (weekly_2011) 0.544883 0.383442
4 1 1 (weekly_2011) 0.423655 0.791725
5 1 2 (weekly_2011) 0.645894 0.528895
The .value
in this case tells the function to keep that part of the column as a header. This is determined by the argument passed to names_pattern
or names_sep
.
Sticking to pd.wide_to_long
, simply use the generic .
regex for the suffix
parameter:
pd.wide_to_long(df,
stubnames = ['A','B'],
i = 'id',
j = 'year',
sep='',
suffix='. ')
X A B
id year
0 (weekly_2010) 0 0.548814 0.437587
1 (weekly_2010) 1 0.715189 0.891773
2 (weekly_2010) 1 0.602763 0.963663
0 (weekly_2011) 0 0.544883 0.383442
1 (weekly_2011) 1 0.423655 0.791725
2 (weekly_2011) 1 0.645894 0.528895