Home > Blockchain >  Groupby 2 columns and find .min of multiple other columns (python pandas)
Groupby 2 columns and find .min of multiple other columns (python pandas)

Time:02-11

My data frame looks like this:

|Months   |  Places | Sales_X |   Sales_Y |  Sales_Z |
|----------------------------------------------------|
|**month1 |  Place1 | 10000   |   12000   |  13000   |
|month1   |  Place2 |  300    |     200   |   1000   |
|month1   |  Place3 |  350    |    1000   |  1200**  | 
|month2   |  Place2 |  1400   |    12300  |  14000   |
|month2   |  Place3 |  9000   |    8500   |    150   |
|month2   |  Place1 |  90     |    4000   |   3000   |
|month3   |  Place2 |  12350  |    8590   |   4000   |
|month3   |  Place1 | 4500    |    7020   |   8800   |
|month3   |  Place3 |  351    |    6500   |   4567   |

I need to find the highest number from the three sales columns by month and show the name of the place with the highest number.

I have been trying to solve it by using pandas.DataFrame.idxmax and groupby but it does not seem to work.

I created a new df with the highest number/row which may help

|Months  |  Places  |  Highest_sales|
|-----------------------------------|
|**month1|   Place1 |    10000      |
|month1  |   Place2 |      200      |
|month1  |   Place3 |      350**    |
|        |          |               |
|**month2|   Place2 |     1400      |
|month2  |   Place3 |      150      |
|month2  |   Place1 |      90**     |
|        |          |               |
|**month3|  Place2  |     4000      |
|month3  |  Place1  |     4500      |
|month3  |   Place3 |     351**     |
|-----------------------------------|

Now I just need the highest number/ month and the name of the place. When using groupby on two columns and getting the min of the Lowest_sales, the result

df.groupby(['Months', 'Places'])['Highest_sales'].max()

when I run this

Months  Places        Highest Sales

1       Place1          1549.0
        Place2          2214.0
        Place3          2074.0
                    ...  
12      Place1          1500.0
        Place2          8090.0
        Place3          2074.0

the format I am looking for would be

|**Months|Places                    |Highest Sales**|
|--------|--------------------------|---------------|
|Month1  |Place(*of highest sales*) |100000         |
|Month2  |Place(*of highest sales*) |900000         |
|Month3  |Place(*of highest sales*) |3232000        |
|Month4  |Place(*of highest sales*) |1300833        |
|....    |                          |               |
|Month12 |Place(*of highest sales*) |               |
-----------------------------------------------------

12 rows and 3 columns

CodePudding user response:

Use DataFrame.filter for Sales columns, create Highest column adn then aggregate DataFrameGroupBy.idxmax only for Months and select rows and columns by list in DataFrame.loc:

#columns with substring Sales
df1 = df.filter(like='Sales')
#or all columns from third position
#df1 = df.iloc[: 2:]

df['Highest'] = df1.min(axis=1)
df = df.loc[df.groupby('Months')['Highest'].idxmax(), ['Months','Places','Highest']]

print (df)
   Months  Places  Highest
0  month1  Place1    10000
3  month2  Place2     1400
7  month3  Place1     4500
  • Related