I have the following grouped dataframe:
Value1 Value2
Category
------------------------------------
0 0 62 44
1 55 46
2 73 75
1 0 61 49
1 55 46
2 34 35
2 0 62 48
1 55 46
2 44 25
I want to, for each group, reorder the "Value1" column as ascending, while keeping the order of the "Category" column. The goal is that the "Category" 0 will correspond to the lowest "Value1" value and "Category" 5 will correspond to the highest "Value1" value. "Value2" values will correspond to the original "Value1" value they corresponded to. This is the output dataframe I want to produce:
Value1 Value2
Category
------------------------------------
0 0 55 46
1 62 44
2 73 75
1 0 34 35
1 55 46
2 61 49
2 0 44 25
1 55 46
2 62 48
How can I accomplish this in python? I have tried using .reset_index()
and `.sort_values(), but I am just not getting the grouped dataframe I want. I tried:
df.sort_values(['Value1'],ascending=True).groupby('Category')
but this just produces: <pandas.core.groupby.generic.DataFrameGroupBy object at ...>
which is not useful.
CodePudding user response:
One way using sort_values
with index name:
tmp = df.index.names
df.index.names = ["tmp", "Category"]
new_df = df.sort_values(["tmp", "Value1"])
new_df.index = df.index.rename(tmp)
print(new_df)
OUtput:
Value1 Value2
Category
0 0 55 46
1 62 44
2 73 75
1 0 34 35
1 55 46
2 61 49
2 0 44 25
1 55 46
2 62 48
CodePudding user response:
You can apply it as follows:
import pandas as pd
df = pd.DataFrame({'col1': [0, 1, 2, 0, 1, 2], 'col2': [8, 9, 6, 40, 3, 20], 'col3': [5, 6, 0, 40, 3, 20]})
sorted_df = df.sort_values(['col2'], ascending=True)
df[['col2', 'col3']] = sorted_df[['col2', 'col3']].values
print(df)
Output:
col1 col2 col3
0 0 3 3
1 1 6 0
2 2 8 5
3 0 9 6
4 1 20 20
5 2 40 40
CodePudding user response:
You can sort the dataframe on the Values and the first level of index:
>>> df = (df.sort_values(by=['Value1', 'Value2'])
.sort_index(level=0, sort_remaining=False)
)
Value1 Value2
Category
0 1 55 46
0 62 44
2 73 75
1 2 34 35
1 55 46
0 61 49
2 2 44 25
1 55 46
0 62 48
Then you need to rewrite the level1 using a cumcount
per group:
df.sort_values(by=['Value1', 'Value2']).sort_index(level=0, sort_remaining=False)
idx = pd.MultiIndex.from_arrays([df.index.get_level_values(0),
pd.Series(range(len(df))).groupby(df.index.get_level_values(0)).cumcount()],
names=(None, 'Category')
)
df.index = idx
output:
Value1 Value2
Category
0 0 55 46
1 62 44
2 73 75
1 0 34 35
1 55 46
2 61 49
2 0 44 25
1 55 46
2 62 48
CodePudding user response:
One line solution should be DataFrame.rename_axis
with DataFrame.sort_values
and DataFrame.set_index
:
df = df.rename_axis(index={None:'tmp'}).sort_values(['tmp', "Value1"]).set_index(df.index)
print (df)
Value1 Value2
Category
0 0 55 46
1 62 44
2 73 75
1 0 34 35
1 55 46
2 61 49
2 0 44 25
1 55 46
2 62 48