How to create a bar graph for 2 different variables in a column in pandas

Time:08-03

I would like to plot a bar graph using matplotlib, seaborn, or pandas.
For every year, it should show two bars side by side: the total number of American Airline flights and the total number of American Eagle Airline flights.
It should not be a stacked bar plot.

My current df looks like this:

df = pd.DataFrame({'Date':['2005-07-01','2005-07-01','2005-07-01','2005-08-01',
                           '2007-08-01', '2007-22-04', '2008-07-06'],
                   'Flight Name':['American Airline','American Airline','American Airline','American Eagle Airline',
                                  'American Eagle Airline','American Airline','American Eagle Airline'],
                   'GEO Summary':['Domestic','Domestic','Domestic','International',
                                  'International','Domestic','International'],
                   'Flight Name Variable':[1,1,1,0,0,1,0]})
Date        Flight Name             GEO Summary    Flight Name Variable
2005-07-01  American Airline        Domestic       1
2005-07-01  American Airline        Domestic       1
2005-07-01  American Airline        Domestic       1
2006-08-01  American Eagle Airline  International  0
2007-08-01  American Eagle Airline  International  0
2007-22-04  American Airline        Domestic       1
2008-07-06  American Eagle Airline  International  0

This is what I have tried so far, but it is not working:

ax = df['Flight Name Variable'].value_counts().plot.bar(color=["SkyBlue","IndianRed"], rot=0, title="test")

plt.tight_layout()
plt.show()

I can't seem to find a way to get the years to display on the x-axis. Any suggestions?

CodePudding user response:

Are you looking for something like this? It uses matplotlib.pyplot and seaborn and shows the number of flights per airline on each date:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.DataFrame({'Date':['2005-07-01','2005-07-01','2005-07-01','2005-08-01','2005-07-01', '2005-08-01', '2005-08-01'],
                   'Flight Name':['American Airline','American Airline','American Airline','American Eagle Airline','American Eagle Airline','American Airline','American Eagle Airline'],
                   'GEO Summary':['Domestic','Domestic','Domestic','International','International','Domestic','International'],
                   'Flight Name Variable':[1,1,1,0,0,1,0]})
df.Date = pd.to_datetime(df.Date).dt.date # Parse the string dates, then keep only the calendar date (no time part)
grouped = df.groupby(by=["Flight Name", "Date"], as_index=False).count()
sns.barplot(data=grouped, x="Date", y="Flight Name Variable", hue="Flight Name")
plt.title("Flights")
plt.show()

Output:

[image: bar chart produced by the code above]
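
The code above counts flights per date, while the question asks for totals per year. A minimal pandas-only sketch of the yearly version follows, using the data from the question. It assumes the year is always the first four characters of the Date string; that avoids parsing '2007-22-04', which is not a valid year-month-day date. The Year column and the counts name are introduced here for illustration only.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Data from the question (only the columns needed for the count).
df = pd.DataFrame({
    "Date": ["2005-07-01", "2005-07-01", "2005-07-01", "2005-08-01",
             "2007-08-01", "2007-22-04", "2008-07-06"],
    "Flight Name": ["American Airline", "American Airline", "American Airline",
                    "American Eagle Airline", "American Eagle Airline",
                    "American Airline", "American Eagle Airline"],
})

# Assumption: the year is the first four characters of the Date string.
df["Year"] = df["Date"].str[:4].astype(int)

# One row per year, one column per airline; missing combinations become 0.
counts = (
    df.groupby(["Year", "Flight Name"])
      .size()
      .unstack(fill_value=0)
)

# A DataFrame bar plot draws one bar per column side by side (not stacked).
ax = counts.plot.bar(color=["skyblue", "indianred"], rot=0, title="Flights per year")
ax.set_xlabel("Year")
ax.set_ylabel("Number of flights")
plt.tight_layout()
```

Calling plt.show() afterwards displays the figure. Years with no flights for an airline (here 2008 for American Airline) appear as a zero-height bar because of fill_value=0.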

Or with more grouping/variables for the hue setting:

df = pd.DataFrame({'Date':['2005-07-01','2005-07-01','2005-07-01','2005-08-01','2005-07-01', '2005-08-01', '2005-08-01'],
                   'Flight Name':['American Airline','American Airline','American Airline','American Eagle Airline','American Eagle Airline','American Airline','American Eagle Airline'],
                   'GEO Summary':['International','Domestic','Domestic','International','International','Domestic','International'],
                   'Flight Name Variable':[1,1,1,0,0,1,0]})
df.Date = pd.to_datetime(df.Date).dt.date # Parse the string dates, then keep only the calendar date (no time part)
grouped = df.groupby(by=["Flight Name", "Date", "GEO Summary"], as_index=False).count()

# Build the hue labels from both Flight Name and GEO Summary, formatted as "Flight Name, GEO Summary"
hue = grouped[['Flight Name', 'GEO Summary']].apply(lambda row: f"{row['Flight Name']}, {row['GEO Summary']}", axis=1)
sns.barplot(data=grouped, x="Date", y="Flight Name Variable", hue=hue)
plt.title("Flights")
plt.legend(loc="best")
plt.show()

Output:

[image: bar chart produced by the code above]
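
As a side note, the combined hue label does not need a row-wise apply. The sketch below, assuming both columns contain plain strings, builds the same labels with vectorized string concatenation and checks them against the apply version. The names hue_vectorized and hue_apply are illustrative only.

```python
import pandas as pd

# Same data as the second example (dates kept as strings for brevity).
df = pd.DataFrame({
    "Date": ["2005-07-01", "2005-07-01", "2005-07-01", "2005-08-01",
             "2005-07-01", "2005-08-01", "2005-08-01"],
    "Flight Name": ["American Airline", "American Airline", "American Airline",
                    "American Eagle Airline", "American Eagle Airline",
                    "American Airline", "American Eagle Airline"],
    "GEO Summary": ["International", "Domestic", "Domestic", "International",
                    "International", "Domestic", "International"],
    "Flight Name Variable": [1, 1, 1, 0, 0, 1, 0],
})

grouped = df.groupby(by=["Flight Name", "Date", "GEO Summary"], as_index=False).count()

# Vectorized: concatenate the two string columns element-wise.
hue_vectorized = grouped["Flight Name"] + ", " + grouped["GEO Summary"]

# Row-wise version from the answer, kept here only for comparison.
hue_apply = grouped[["Flight Name", "GEO Summary"]].apply(
    lambda row: f"{row['Flight Name']}, {row['GEO Summary']}", axis=1
)

print(list(hue_vectorized) == list(hue_apply))
```

Either Series can be passed as the hue argument to sns.barplot; the vectorized form is usually shorter and avoids a Python-level loop over rows.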
