- Suppose I have a dataframe (approx. 10 columns), and I am interested in three of its columns: 'A', 'B' and 'C'.
- Column 'A' holds a continuous value within some range, e.g. the price of an item, between 5 and 20 bucks.
- Column 'B' is categorical, e.g. it has two categories, 'Old' and 'New'.
- Column 'C' is a unique ID for each item.
- My goal is to find the top 10 items ranked by price, with the ranking done separately for each category in column 'B'.
- The result is needed as a plot (seaborn/matplotlib). The bar plot should show the top 10 IDs from column 'C', each bar showing its price from column 'A', sorted from highest to lowest price (the plot should show bars FOR EACH CATEGORY FROM COLUMN 'B').
Could someone please help me write the code for this in Python using the Seaborn/Matplotlib libraries? An example table is below:
       A    B     C
0    5.0  Old  A001
1    6.2  New  A002
2   10.0  Old  A003
3   19.6  Old  A004
4   12.0  Old  A005
5   11.0  New  A006
6    7.0  New  A007
7    8.0  Old  A008
8    7.0  New  A009
9    5.0  New  A010
10  17.0  Old  A011
11   8.0  Old  A012
12  12.0  Old  A013
13  13.0  New  A014
14  15.0  New  A015
15   9.0  Old  A016
16   9.0  New  A017
17  10.0  Old  A018
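For anyone who wants to try the code on this page, the example table can be rebuilt as a DataFrame roughly like this. It is only a sketch: the variable name `df` is an assumption chosen because the answers use it, and the values are copied from the table above.

```python
import pandas as pd

# Sample data copied from the example table (18 rows).
df = pd.DataFrame({
    "A": [5.0, 6.2, 10.0, 19.6, 12.0, 11.0, 7.0, 8.0, 7.0,
          5.0, 17.0, 8.0, 12.0, 13.0, 15.0, 9.0, 9.0, 10.0],
    "B": ["Old", "New", "Old", "Old", "Old", "New", "New", "Old", "New",
          "New", "Old", "Old", "Old", "New", "New", "Old", "New", "Old"],
})

# The IDs in the table run A001..A018 in row order.
df["C"] = [f"A{i:03d}" for i in range(1, 19)]

print(df.head())
```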
CodePudding user response:
New answer: top 5 per group
df2 = df.loc[df.groupby('B', group_keys=False)['A'].nlargest(5).index]
df2.set_index('C')['A'].plot.bar(color=df2['B'].map({'New': 'r', 'Old': 'b'}).values)
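If an explicit rank column is wanted (the question talks about rank within each category), one possible variant is `groupby(...).rank(...)`. This is a sketch and not the method used above: the column name `rank` and the small frame are illustrative assumptions, with values taken from a subset of the question's table.

```python
import pandas as pd

# Small illustrative frame with the same columns as in the question.
df = pd.DataFrame({
    "A": [5.0, 6.2, 10.0, 19.6, 11.0, 7.0, 17.0, 13.0],
    "B": ["Old", "New", "Old", "Old", "New", "New", "Old", "New"],
    "C": ["A001", "A002", "A003", "A004", "A006", "A007", "A011", "A014"],
})

# Rank prices within each category; rank 1 is the highest price.
# method="first" breaks ties by row order, so ranks are unique per group.
df["rank"] = df.groupby("B")["A"].rank(method="first", ascending=False)

# Keep the top 2 per category here (use 5 or 10 on the real data).
top = df[df["rank"] <= 2].sort_values(["B", "rank"])
print(top[["C", "B", "A", "rank"]])
```

The resulting `top` frame can be plotted the same way as `df2`, and the rank column is available if it needs to be shown on the chart.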
Old answer:
IIUC, you want the top ten items in A and sort the output by B and A:
df2 = df.loc[df['A'].nlargest(10).index].sort_values(by=['B', 'A'])
output:
       A    B     C
5   11.0  New  A006
13  13.0  New  A014
14  15.0  New  A015
15   9.0  Old  A016
2   10.0  Old  A003
17  10.0  Old  A018
4   12.0  Old  A005
12  12.0  Old  A013
10  17.0  Old  A011
3   19.6  Old  A004
Then plot using:
df2.set_index('C')['A'].plot.bar(color=df2['B'].map({'New': 'r', 'Old': 'b'}).values)
CodePudding user response:
Below is the basic code that met my requirement. I later switched to Seaborn's barplot instead.
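import matplotlib.pyplot as plt

# Top 5 items by price within each category
old = df[df['B'] == 'Old'].sort_values(by='A', ascending=False).head(5)
new = df[df['B'] == 'New'].sort_values(by='A', ascending=False).head(5)

# One subplot per category; y='A' keeps any other numeric columns out of the plot
fig, a = plt.subplots(1, 2, figsize=(4, 4))
old.plot.bar(x='C', y='A', ax=a[0], title='Old')
new.plot.bar(x='C', y='A', ax=a[1], title='New')
plt.show()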
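A Seaborn version might look something like the sketch below. It is not the exact code I ended up with: the sample data is copied from the question, and the names `top`, `n`, the bar color and the output file name are illustrative assumptions.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Sample data copied from the question's table.
df = pd.DataFrame({
    "A": [5.0, 6.2, 10.0, 19.6, 12.0, 11.0, 7.0, 8.0, 7.0,
          5.0, 17.0, 8.0, 12.0, 13.0, 15.0, 9.0, 9.0, 10.0],
    "B": ["Old", "New", "Old", "Old", "Old", "New", "New", "Old", "New",
          "New", "Old", "Old", "Old", "New", "New", "Old", "New", "Old"],
})
df["C"] = [f"A{i:03d}" for i in range(1, 19)]

n = 5  # items to keep per category

# Sort by price (highest first), then take the first n rows of each category.
top = df.sort_values("A", ascending=False).groupby("B").head(n)

# One panel per category, sharing the price axis so bars are comparable.
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
for ax, (category, group) in zip(axes, top.groupby("B")):
    # order= pins the bars to the price-sorted ID order of this group.
    sns.barplot(data=group, x="C", y="A", order=group["C"].tolist(),
                color="steelblue", ax=ax)
    ax.set_title(category)
    ax.tick_params(axis="x", labelrotation=45)

fig.tight_layout()
fig.savefig("top_by_category.png")  # or plt.show() in an interactive session
```

Sharing the y-axis is a design choice so that prices can be compared across the 'New' and 'Old' panels. Drop `sharey=True` if each category should use its own scale.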