I have a dataset where I would like to sort the quarter values in chronological order (by year, then by quarter) within each 'id' group.
Data
id date stat
aa q1 22 y
aa q1 23 y
aa q2 22 y
aa q2 23 y
aa q3 22 y
aa q3 23 y
aa q4 22 y
aa q4 23 ok
bb q1 22 n
bb q1 23 n
bb q2 22 n
bb q2 23 n
bb q3 22 n
bb q3 23 n
bb q4 22 n
bb q4 23 ok
Desired
id date stat
aa q1 22 y
aa q2 22 y
aa q3 22 y
aa q4 22 y
aa q1 23 y
aa q2 23 y
aa q3 23 ok
aa q4 23 n
bb q1 22 n
bb q2 22 n
bb q3 22 n
bb q4 22 n
bb q1 23 n
bb q2 23 n
bb q3 23 n
bb q4 23 ok
Doing
Since my data is in quarters, I am using this:
import pandas as pd
pd.to_datetime(date).sort_values().to_period('Q')
However, I also need to group these by the 'id' column, as the desired output shows. Any suggestions are appreciated.
CodePudding user response:
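Split date into its quarter and year parts, sort by id, then year, then quarter, and drop the helper columns:
df[['temp1', 'temp2']] = df['date'].str.split(r'\s+', expand=True)  # temp1 = quarter ('q1'), temp2 = year ('22')
df = df.sort_values(by=['id', 'temp2', 'temp1'], ignore_index=True).drop(columns=['temp1', 'temp2'])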
id date stat
0 aa q1 22 y
1 aa q2 22 y
2 aa q3 22 y
3 aa q4 22 y
4 aa q1 23 y
5 aa q2 23 y
6 aa q3 23 ok
7 aa q4 23 n
8 bb q1 22 n
9 bb q2 22 n
10 bb q3 22 n
11 bb q4 22 n
12 bb q1 23 n
13 bb q2 23 n
14 bb q3 23 n
15 bb q4 23 ok
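For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It rebuilds the question's data by hand and uses str.extract with integer conversion instead of str.split, so the year and quarter compare numerically rather than as strings. The helper column names year and quarter are my own choice, not something from the question.

```python
import pandas as pd

# Sample data copied from the question's "Data" block.
df = pd.DataFrame({
    "id": ["aa"] * 8 + ["bb"] * 8,
    "date": ["q1 22", "q1 23", "q2 22", "q2 23",
             "q3 22", "q3 23", "q4 22", "q4 23"] * 2,
    "stat": ["y"] * 7 + ["ok"] + ["n"] * 7 + ["ok"],
})

# Pull the quarter digit and the year out of strings like 'q1 22'
# and convert both to integers so they sort numerically.
parts = df["date"].str.extract(r"q(?P<quarter>\d)\s+(?P<year>\d+)").astype(int)

result = (
    df.assign(year=parts["year"], quarter=parts["quarter"])   # temporary sort keys
      .sort_values(["id", "year", "quarter"], ignore_index=True)
      .drop(columns=["year", "quarter"])                      # remove the helpers again
)
print(result)
```

Sorting on integers also keeps working if the year part ever has a different number of digits, which a plain string comparison would not handle.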
CodePudding user response:
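This should do the job:
import pandas as pd
# Rewrite 'q1 22' as '2022Q1' (assumes years in the 2000s) and build real quarterly periods
quarters = df['date'].str.replace(r'q(\d)\s+(\d{2})', r'20\2Q\1', regex=True)
df['period'] = pd.PeriodIndex(quarters, freq='Q')
df.sort_values(by=['id', 'period'], inplace=True)
df.index = df['period']
Explanation:
df.sort_values(by=['id', 'period'], inplace=True)
sorts your data on the id column first and then, within each id, on the period column. Because period holds quarterly periods rather than the original strings, q4 22 sorts before q1 23. Sorting on the raw date strings would instead put q1 23 ahead of q2 22.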
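As a rough, runnable sketch of sorting on real quarterly periods: the snippet below rebuilds the question's data, rewrites each 'q1 22' style string as '2022Q1', and parses it with pd.PeriodIndex. The 20xx century prefix and the period column name are assumptions on my part, since the question only gives two-digit years.

```python
import pandas as pd

# Sample data copied from the question's "Data" block.
df = pd.DataFrame({
    "id": ["aa"] * 8 + ["bb"] * 8,
    "date": ["q1 22", "q1 23", "q2 22", "q2 23",
             "q3 22", "q3 23", "q4 22", "q4 23"] * 2,
    "stat": ["y"] * 7 + ["ok"] + ["n"] * 7 + ["ok"],
})

# Rewrite 'q1 22' as '2022Q1'. ASSUMPTION: two-digit years belong to the 2000s.
as_text = df["date"].str.replace(r"q(\d)\s+(\d{2})", r"20\2Q\1", regex=True)

# Parse the rewritten strings as quarterly periods.
df["period"] = pd.PeriodIndex(as_text.tolist(), freq="Q")

# Sort by id first, then chronologically by quarter within each id.
result = df.sort_values(["id", "period"], ignore_index=True)
print(result)
```

Keeping the period column around can be handy afterwards, for example for filtering on a range of quarters; drop it with result.drop(columns="period") if only the original layout is needed.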