My data frame looks like this:
|Months | Places | Sales_X | Sales_Y | Sales_Z |
|----------------------------------------------------|
|**month1 | Place1 | 10000 | 12000 | 13000 |
|month1 | Place2 | 300 | 200 | 1000 |
|month1 | Place3 | 350 | 1000 | 1200** |
|month2 | Place2 | 1400 | 12300 | 14000 |
|month2 | Place3 | 9000 | 8500 | 150 |
|month2 | Place1 | 90 | 4000 | 3000 |
|month3 | Place2 | 12350 | 8590 | 4000 |
|month3 | Place1 | 4500 | 7020 | 8800 |
|month3 | Place3 | 351 | 6500 | 4567 |
I need to find the highest number from the three sales columns by month and show the name of the place with the highest number.
I have been trying to solve it by using pandas.DataFrame.idxmax and groupby but it does not seem to work.
I created a new df with the highest number/row which may help
|Months | Places | Highest_sales|
|-----------------------------------|
|**month1| Place1 | 10000 |
|month1 | Place2 | 200 |
|month1 | Place3 | 350** |
| | | |
|**month2| Place2 | 1400 |
|month2 | Place3 | 150 |
|month2 | Place1 | 90** |
| | | |
|**month3| Place2 | 4000 |
|month3 | Place1 | 4500 |
|month3 | Place3 | 351** |
|-----------------------------------|
Now I just need the highest number/ month and the name of the place. When using groupby on two columns and getting the min of the Lowest_sales, the result
df.groupby(['Months', 'Places'])['Highest_sales'].max()
when I run this
Months Places Highest Sales
1 Place1 1549.0
Place2 2214.0
Place3 2074.0
...
12 Place1 1500.0
Place2 8090.0
Place3 2074.0
the format I am looking for would be
|**Months|Places |Highest Sales**|
|--------|--------------------------|---------------|
|Month1 |Place(*of highest sales*) |100000 |
|Month2 |Place(*of highest sales*) |900000 |
|Month3 |Place(*of highest sales*) |3232000 |
|Month4 |Place(*of highest sales*) |1300833 |
|.... | | |
|Month12 |Place(*of highest sales*) | |
-----------------------------------------------------
12 rows and 3 columns
CodePudding user response:
Use DataFrame.filter
for Sales
columns, create Highest
column adn then aggregate DataFrameGroupBy.idxmax
only for Months
and select rows and columns by list in DataFrame.loc
:
#columns with substring Sales
df1 = df.filter(like='Sales')
#or all columns from third position
#df1 = df.iloc[: 2:]
df['Highest'] = df1.min(axis=1)
df = df.loc[df.groupby('Months')['Highest'].idxmax(), ['Months','Places','Highest']]
print (df)
Months Places Highest
0 month1 Place1 10000
3 month2 Place2 1400
7 month3 Place1 4500