Home > Software design >  How can I sort each row data using Pandas
How can I sort each row data using Pandas

Time:06-24

Date        Product  O1_Name  O1_Price  O1_Qty  O2_Name  O2_Price  O2_Qty
05-12-2021  MK       Widget1   0.99      23     Widget2    1.50      80
05-12-2021  MK       Widget2   1.50      85     Widget1    0.99      25
05-11-2021  MK       Widget1   0.99      28     Widget2    1.50      85
05-10-2021  MK       Widget1   0.99      31     Widget2    1.50      95
05-13-2021  PS       WidgetA   0.52      49     WidgetB    0.86      43
05-12-2021  PS       WidgetA   0.52      53     WidgetB    0.86      43
05-10-2021  PS       WidgetB   0.85      66     WidgetA    0.58      60
05-13-2021  AY       WidgetZ   0.15      87     
05-12-2021  AY       WidgetZ   0.15      88     

How can I sort these so that I will get O1 (option 1) Name, Price and Quantity sorted as my data seems to jump sometimes and they don't remain the same column. Please note that O3 O4 also exists and may or may not exist in the dataset (blank)

Forgot to add: Sorting is based on O1_Name O2_Name O3_Name with the Price and Qty following it

RESULT REQUIRED:

    Date    Product  O1_Name  O1_Price  O1_Qty  O2_Name  O2_Price  O2_Qty
05-12-2021  MK       Widget1   0.99      23     Widget2    1.50      80
05-12-2021  MK       Widget1   0.99      25     Widget2    1.50      85
05-11-2021  MK       Widget1   0.99      28     Widget2    1.50      85
05-10-2021  MK       Widget1   0.99      31     Widget2    1.50      95
05-13-2021  PS       WidgetA   0.52      49     WidgetB    0.86      43
05-12-2021  PS       WidgetA   0.52      53     WidgetB    0.86      43
05-10-2021  PS       WidgetA   0.58      60     WidgetB    0.85      66
05-13-2021  AY       WidgetZ   0.15      87     
05-12-2021  AY       WidgetZ   0.15      88   

Sort Horizontally

CodePudding user response:

If you want to sort individually, try:

df.sort_values(by=['O1_Name'])  # or 'O1_Price'

If you would like to sort by all three together, you can:

df.sort_values(by=['O1_Name', 'O1_Price', 'O1_Qty'])

note that O1_Name will be the first importance for the sorting, and O1_Qty least important.

CodePudding user response:

You can do some fancy reshaping sorting and reshaping again:

df.columns = [('_').join(x) for x in df.columns.str.split('_').str[::-1]]

dfm = pd.wide_to_long(df.reset_index(), 
                ['Name', 'Price', 'Qty'], 
                i=['index', 'Date', 'Product'], 
                j='No',
                sep='_',
                suffix='.*')

dfm = dfm.sort_values(['index', 'Date', 'Product', 'Price'])\
   .reset_index('No', drop=True)

dfm = dfm.set_index('O'   (dfm.groupby(['index', 'Date', 'Product']).cumcount() 1).astype(str), 
                    append=True)

dfm  = dfm.unstack().sort_index(level=1, axis=1)
dfm.columns = dfm.columns.map('_'.join)
df_out = dfm.reset_index()
print(df_out)

Output:

   index        Date Product  Name_O1  Price_O1  Qty_O1  Name_O2  Price_O2  Qty_O2
0      0  05-12-2021      MK  Widget1      0.99    23.0  Widget2      1.50    80.0
1      1  05-12-2021      MK  Widget1      0.99    25.0  Widget2      1.50    85.0
2      2  05-11-2021      MK  Widget1      0.99    28.0  Widget2      1.50    85.0
3      3  05-10-2021      MK  Widget1      0.99    31.0  Widget2      1.50    95.0
4      4  05-13-2021      PS  WidgetA      0.52    49.0  WidgetB      0.86    43.0
5      5  05-12-2021      PS  WidgetA      0.52    53.0  WidgetB      0.86    43.0
6      6  05-10-2021      PS  WidgetA      0.58    60.0  WidgetB      0.85    66.0
7      7  05-13-2021      AY  WidgetZ      0.15    87.0     None       NaN     NaN
8      8  05-12-2021      AY  WidgetZ      0.15    88.0     None       NaN     NaN
  • Related