Seaborn bar graph after group by to create top N x label in pandas

Time:09-22

I want to create a seaborn bar graph. The problem is that before plotting I need to build a dataframe sorted by a column's values (descending). To explain further, this is what the data looks like:

City          Complaint Type             Value
ARVERNE       Blocked Driveway           50
ARVERNE       Derelict Vehicle           32
ARVERNE       Disorderly Youth           2
ARVERNE       Drinking                   1
ASTORIA       Animal Abuse               170
ASTORIA       Bike/Roller/Skate Chronic  16
ASTORIA       Blocked Driveway           3436
ASTORIA       Derelict Vehicle           426
BAYSIDE       Animal Abuse               53
BAYSIDE       Blocked Driveway           514
BAYSIDE       Derelict Vehicle           231
BAYSIDE       Disorderly Youth           2
BELLEROSE     Animal Abuse               15
BELLEROSE     Bike/Roller/Skate Chronic  1
BELLEROSE     Blocked Driveway           138
BELLEROSE     Derelict Vehicle           120
BREEZY POINT  Animal Abuse               2
BREEZY POINT  Blocked Driveway           3
BREEZY POINT  Derelict Vehicle           3
BREEZY POINT  Illegal Parking            16

Now I want to plot the top 3 cities by complaints together with their major complaint types. If I keep the top 2 complaint types per city, my data should look like this:

City          Complaint Type             Value
ASTORIA       Blocked Driveway           3436
ASTORIA       Derelict Vehicle           426
BAYSIDE       Blocked Driveway           514
BAYSIDE       Derelict Vehicle           231
BELLEROSE     Blocked Driveway           138
BELLEROSE     Derelict Vehicle           120
ARVERNE       Blocked Driveway           50
ARVERNE       Derelict Vehicle           32
BREEZY POINT  Illegal Parking            16
BREEZY POINT  Derelict Vehicle           3

Here you can see that the rows are grouped by City, the cities are ordered by their values in descending order, and only the 2 major complaints are kept for each city. Can you please help me build this data and plot the graph with pandas?

I have tried some code that selects the top complaints per city, but I am unable to sort the cities based on the values. When I sort the data by value, I lose the grouping. Below is the code I am currently using:

df1 = df.groupby(['City','Complaint Type']).size().reset_index(name = 'size')   
df2 = df1.sort_values(by = ['City', 'size'], ascending = [True, False]).groupby('City').head(3)

Answer:

To get the dataframe the way you want, first get the top 3 cities: sort by Value in descending order and call unique() on the City column (this ranks each city by its single largest value). Then filter the dataframe to these 3 cities (using isin), sort by Value (using sort_values), and take the top 2 complaints per city (using groupby and head()). This gives you 6 rows. If you want them grouped by city, convert the City column with pd.Categorical() and sort on it. Finally, plot the graph using seaborn's catplot. The code is below.

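import pandas as pd
import seaborn as sns

# Assumes df holds the City / Complaint Type / Value table from the question

# Get the top 3 cities, ranked by their single largest Value
top3 = df.sort_values(by='Value', ascending=False)['City'].unique()[:3]
print(top3)

# Keep only these cities and pick the top 2 entries per city
# (.copy() avoids a SettingWithCopyWarning on the assignment below)
df1 = df[df['City'].isin(top3)].sort_values('Value', ascending=False).groupby('City').head(2).copy()

# To group the rows by city, make City categorical in top3 order,
# then sort by city and by descending Value within each city
df1['City'] = pd.Categorical(df1['City'], categories=top3)
df1 = df1.sort_values(['City', 'Value'], ascending=[True, False])
print(df1)

# Finally, plot the graph
sns.catplot(data=df1, kind='bar', x='Complaint Type', y='Value', hue='City')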

Note that you can exchange x and hue if you want to plot each city on the x-axis.
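
As a rough sketch of that swap, the snippet below puts City on the x-axis and colors the bars by complaint type. The small df1 here is only a stand-in built from the six rows listed under Outputs, and the Agg backend line is an assumption made so the snippet runs without a display; drop it in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; not needed in a notebook

import pandas as pd
import seaborn as sns

# Stand-in for the filtered frame df1 (the six rows from the answer's output)
df1 = pd.DataFrame({
    "City": ["ASTORIA", "ASTORIA", "BAYSIDE", "BAYSIDE", "BELLEROSE", "BELLEROSE"],
    "Complaint Type": ["Blocked Driveway", "Derelict Vehicle"] * 3,
    "Value": [3436, 426, 514, 231, 138, 120],
})

# Same call as before, with x and hue exchanged:
# cities on the x-axis, one colored bar per complaint type
g = sns.catplot(data=df1, kind="bar", x="City", y="Value", hue="Complaint Type")
```

Because ASTORIA's values are much larger than the others, a log scale on the y-axis (for example g.set(yscale="log")) may make the smaller bars easier to read.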

Outputs

top3

['ASTORIA' 'BAYSIDE' 'BELLEROSE']

df1

         City    Complaint Type  Value
6     ASTORIA  Blocked Driveway   3436
7     ASTORIA  Derelict Vehicle    426
9     BAYSIDE  Blocked Driveway    514
10    BAYSIDE  Derelict Vehicle    231
14  BELLEROSE  Blocked Driveway    138
15  BELLEROSE  Derelict Vehicle    120

Plot

[Image not available: bar chart produced by the catplot call above]
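
The target table in the question keeps all five cities, not only the top 3. If that is what you need, one possible variant is sketched below. The helper name top_complaints and its parameters n_complaints and n_cities are made up for this illustration, and it assumes that cities should be ranked by their single largest Value, as in the code above.

```python
import pandas as pd

rows = [
    ("ARVERNE", "Blocked Driveway", 50), ("ARVERNE", "Derelict Vehicle", 32),
    ("ARVERNE", "Disorderly Youth", 2), ("ARVERNE", "Drinking", 1),
    ("ASTORIA", "Animal Abuse", 170), ("ASTORIA", "Bike/Roller/Skate Chronic", 16),
    ("ASTORIA", "Blocked Driveway", 3436), ("ASTORIA", "Derelict Vehicle", 426),
    ("BAYSIDE", "Animal Abuse", 53), ("BAYSIDE", "Blocked Driveway", 514),
    ("BAYSIDE", "Derelict Vehicle", 231), ("BAYSIDE", "Disorderly Youth", 2),
    ("BELLEROSE", "Animal Abuse", 15), ("BELLEROSE", "Bike/Roller/Skate Chronic", 1),
    ("BELLEROSE", "Blocked Driveway", 138), ("BELLEROSE", "Derelict Vehicle", 120),
    ("BREEZY POINT", "Animal Abuse", 2), ("BREEZY POINT", "Blocked Driveway", 3),
    ("BREEZY POINT", "Derelict Vehicle", 3), ("BREEZY POINT", "Illegal Parking", 16),
]
df = pd.DataFrame(rows, columns=["City", "Complaint Type", "Value"])


def top_complaints(df, n_complaints=2, n_cities=None):
    """Hypothetical helper: top n_complaints rows per city, with cities
    ordered by their largest Value (optionally only the first n_cities)."""
    # Rank cities by their single largest Value, biggest first
    city_max = df.groupby("City")["Value"].max().sort_values(ascending=False)
    if n_cities is not None:
        city_max = city_max.head(n_cities)
    order = list(city_max.index)

    out = df[df["City"].isin(order)].copy()
    # A categorical City column sorts in the ranking order, not alphabetically
    out["City"] = pd.Categorical(out["City"], categories=order, ordered=True)
    out = out.sort_values(["City", "Value"], ascending=[True, False])
    # head() keeps the first n_complaints rows of each city in sorted order
    out = out.groupby("City", observed=True).head(n_complaints)
    return out.reset_index(drop=True)


result = top_complaints(df)
print(result)
```

Calling top_complaints(df, n_cities=3) should give the same six rows as df1 above. BREEZY POINT has two complaint types tied at 3, so which of them appears in the second slot depends on the sort's tie-breaking and may differ from the table in the question.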
