I want to create a seaborn bar graph. The problem is that before creating the graph, I need to build a dataframe sorted by a column value (descending). For further explanation, below is the data:
City | Complaint Type | Value |
---|---|---|
ARVERNE | Blocked Driveway | 50 |
ARVERNE | Derelict Vehicle | 32 |
ARVERNE | Disorderly Youth | 2 |
ARVERNE | Drinking | 1 |
ASTORIA | Animal Abuse | 170 |
ASTORIA | Bike/Roller/Skate Chronic | 16 |
ASTORIA | Blocked Driveway | 3436 |
ASTORIA | Derelict Vehicle | 426 |
BAYSIDE | Animal Abuse | 53 |
BAYSIDE | Blocked Driveway | 514 |
BAYSIDE | Derelict Vehicle | 231 |
BAYSIDE | Disorderly Youth | 2 |
BELLEROSE | Animal Abuse | 15 |
BELLEROSE | Bike/Roller/Skate Chronic | 1 |
BELLEROSE | Blocked Driveway | 138 |
BELLEROSE | Derelict Vehicle | 120 |
BREEZY POINT | Animal Abuse | 2 |
BREEZY POINT | Blocked Driveway | 3 |
BREEZY POINT | Derelict Vehicle | 3 |
BREEZY POINT | Illegal Parking | 16 |
Now I want to create a graph with the top 3 cities by complaints and their major complaint types. If I keep only the top 2 complaint types per city, my data should look like this:
City | Complaint Type | Value |
---|---|---|
ASTORIA | Blocked Driveway | 3436 |
ASTORIA | Derelict Vehicle | 426 |
BAYSIDE | Blocked Driveway | 514 |
BAYSIDE | Derelict Vehicle | 231 |
BELLEROSE | Blocked Driveway | 138 |
BELLEROSE | Derelict Vehicle | 120 |
ARVERNE | Blocked Driveway | 50 |
ARVERNE | Derelict Vehicle | 32 |
BREEZY POINT | Illegal Parking | 16 |
BREEZY POINT | Derelict Vehicle | 3 |
Here you can see that the rows are grouped by City, but the cities (and the values within each city) are in descending order of value, and only the 2 major complaint types per city are kept. Can you please help me build this data, or plot the graph, in pandas?
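A minimal sketch of one way to build the table above (not from the original post). It assumes cities are ranked by their single largest complaint value, which reproduces the city order shown. groupby().transform('max') gives every row its city's ranking key, so sorting by that key keeps each city's rows together while still ordering cities by value. The helper name top_complaints and its n_complaints/n_cities parameters are hypothetical. Ties, such as the two BREEZY POINT complaints with value 3, are not given any special handling.

```python
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    'City': ['ARVERNE'] * 4 + ['ASTORIA'] * 4 + ['BAYSIDE'] * 4
            + ['BELLEROSE'] * 4 + ['BREEZY POINT'] * 4,
    'Complaint Type': [
        'Blocked Driveway', 'Derelict Vehicle', 'Disorderly Youth', 'Drinking',
        'Animal Abuse', 'Bike/Roller/Skate Chronic', 'Blocked Driveway', 'Derelict Vehicle',
        'Animal Abuse', 'Blocked Driveway', 'Derelict Vehicle', 'Disorderly Youth',
        'Animal Abuse', 'Bike/Roller/Skate Chronic', 'Blocked Driveway', 'Derelict Vehicle',
        'Animal Abuse', 'Blocked Driveway', 'Derelict Vehicle', 'Illegal Parking',
    ],
    'Value': [50, 32, 2, 1, 170, 16, 3436, 426, 53, 514, 231, 2,
              15, 1, 138, 120, 2, 3, 3, 16],
})

def top_complaints(df, n_complaints=2, n_cities=None):
    """Hypothetical helper: top complaint types per city, cities ranked by max value."""
    # Each row gets the largest Value of its city (used as the city's rank key)
    city_max = df.groupby('City')['Value'].transform('max')
    out = (df.assign(city_max=city_max)
             # city rank first, City to keep tied cities apart, then value within city
             .sort_values(['city_max', 'City', 'Value'], ascending=[False, True, False])
             .groupby('City', sort=False)
             .head(n_complaints))
    if n_cities is not None:
        # Cities now appear in ranked order, so the first n unique ones are the top n
        keep = out['City'].drop_duplicates().head(n_cities)
        out = out[out['City'].isin(keep)]
    return out.drop(columns='city_max').reset_index(drop=True)

print(top_complaints(df))               # all cities, as in the expected table
print(top_complaints(df, n_cities=3))   # only the top 3 cities
```

Passing n_cities=3 restricts the result to the top 3 cities the question asks for; leaving it out reproduces the full expected table.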
I have tried some code that selects the top 2 complaints per city, but I am unable to sort the cities by value. When I sort the data by value, I lose the grouping by city. Below is the code I am currently using:
```python
df1 = df.groupby(['City','Complaint Type']).size().reset_index(name = 'size')
df2 = df1.sort_values(by = ['City', 'size'], ascending = [True, False]).groupby('City').head(3)
```
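Sorting by ['City', 'size'] orders the cities alphabetically, which is why the value ranking is lost. A minimal sketch of one fix (not from the original thread): add a per-city total with groupby().transform('sum') and sort by that first, so rows stay grouped by city while cities are ranked by complaint count. It assumes the raw data has one row per complaint, as the .size() call implies; here that raw data is rebuilt, for illustration only, by repeating a subset of the question's table rows Value times. The column name city_total is hypothetical.

```python
import pandas as pd

# Subset of the question's table, expanded into hypothetical raw rows
# (one row per complaint), which is what .size() counts
summary = pd.DataFrame({
    'City': ['ASTORIA', 'ASTORIA', 'ASTORIA', 'BAYSIDE', 'BAYSIDE',
             'BELLEROSE', 'BELLEROSE', 'ARVERNE', 'ARVERNE'],
    'Complaint Type': ['Blocked Driveway', 'Derelict Vehicle', 'Animal Abuse',
                       'Blocked Driveway', 'Derelict Vehicle',
                       'Blocked Driveway', 'Derelict Vehicle',
                       'Blocked Driveway', 'Derelict Vehicle'],
    'Value': [3436, 426, 170, 514, 231, 138, 120, 50, 32],
})
raw = summary.loc[summary.index.repeat(summary['Value']), ['City', 'Complaint Type']]

# Count complaints per (city, type), as in the question's code
df1 = raw.groupby(['City', 'Complaint Type']).size().reset_index(name='size')

# Per-city total broadcast to every row: sorting on it keeps cities together
df1['city_total'] = df1.groupby('City')['size'].transform('sum')
df2 = (df1.sort_values(['city_total', 'City', 'size'], ascending=[False, True, False])
          .groupby('City').head(2))

# Cities now appear in ranked order, so keep the first 3 unique ones
top_cities = df2['City'].drop_duplicates().head(3)
df2 = df2[df2['City'].isin(top_cities)].drop(columns='city_total')
print(df2)
```

Ranking by the per-city total is one reading of "top 3 cities with complaints"; swapping 'sum' for 'max' ranks cities by their largest single complaint type instead.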
Answer:
To get the dataframe the way you want, first get the top 3 cities: sort by Value and call unique() on the City column, which keeps the cities in order of first appearance. Then filter the dataframe to these 3 cities (using isin()), sort by value (using sort_values()) and keep the top 2 complaints per city (using groupby() and head()). This gives you 6 rows. If you want them grouped by city, convert the City column with pd.Categorical() and sort on it. Finally, plot the graph using seaborn's catplot. The code is below.
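```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Get the top 3 cities, ranked by their single largest complaint value
top3 = df.sort_values(by='Value', ascending=False)['City'].unique()[0:3]
print(top3)
# Filter by these cities and keep the top 2 complaint types per city
df1 = (df[df['City'].isin(top3)].sort_values('Value', ascending=False)
       .groupby('City').head(2).copy())
# To group rows by city, use an ordered categorical (order = top3), then
# sort by city first and by value (descending) within each city
df1['City'] = pd.Categorical(df1['City'], categories=top3, ordered=True)
df1 = df1.sort_values(['City', 'Value'], ascending=[True, False])
print(df1)
# Finally, plot the graph
sns.catplot(data=df1, kind='bar', x='Complaint Type', y='Value', hue='City')
plt.show()
```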
Note that you can swap x and hue if you want each city on the x-axis.
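A minimal sketch of the swapped version, assuming the six rows printed below as df1 (rebuilt here from the question's values so the snippet runs on its own). The order parameter of catplot fixes the city order on the x-axis, so the ranking is kept even without the categorical conversion.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# The six rows kept by the answer's code (values from the question's table)
df1 = pd.DataFrame({
    'City': ['ASTORIA', 'ASTORIA', 'BAYSIDE', 'BAYSIDE', 'BELLEROSE', 'BELLEROSE'],
    'Complaint Type': ['Blocked Driveway', 'Derelict Vehicle'] * 3,
    'Value': [3436, 426, 514, 231, 138, 120],
})
top3 = ['ASTORIA', 'BAYSIDE', 'BELLEROSE']

# Cities on the x-axis in ranked order, one bar colour per complaint type
g = sns.catplot(data=df1, kind='bar', x='City', y='Value',
                hue='Complaint Type', order=top3)
plt.show()
```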
Outputs:

```
top3
['ASTORIA' 'BAYSIDE' 'BELLEROSE']

df1
         City    Complaint Type  Value
6     ASTORIA  Blocked Driveway   3436
7     ASTORIA  Derelict Vehicle    426
9     BAYSIDE  Blocked Driveway    514
10    BAYSIDE  Derelict Vehicle    231
14  BELLEROSE  Blocked Driveway    138
15  BELLEROSE  Derelict Vehicle    120
```
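With this sample the top 3 cities happen to be in alphabetical order as well, so the categorical step does not visibly change the output above. A small sketch (not from the original answer) showing the difference once the ranking is not alphabetical: plain strings sort A to Z, while an ordered pd.Categorical sorts by the order of its categories. The ranked list below is the city order from the question's expected table.

```python
import pandas as pd

# City ranking taken from the question's expected table
ranked = ['ASTORIA', 'BAYSIDE', 'BELLEROSE', 'ARVERNE', 'BREEZY POINT']
s = pd.Series(['ARVERNE', 'BAYSIDE', 'ASTORIA', 'ARVERNE'])

# Plain strings sort alphabetically
alpha = s.sort_values().tolist()
# An ordered categorical sorts by category order instead
by_rank = pd.Series(pd.Categorical(s, categories=ranked, ordered=True)).sort_values().tolist()

print(alpha)
print(by_rank)
```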