I have a dataframe, df, containing information about companies: the company name, the country it is located in, and the year it was founded. I need to plot how the number of companies founded per country develops over the years in the dataset (1995-2015) as a line chart. However, all I have managed to create is a pie chart of the total number of companies founded per country, which ignores the Year_founded information.
The data looks like this:
Company | Country | Year_founded |
---|---|---|
A | USA | 1996 |
B | NLD | 2004 |
C | CAN | 2014 |
D | USA | 2000 |
E | NLD | 1999 |
F | CAN | 2000 |
... | ... | ... |
Ideally, I would like to plot the number of companies per country over the years as a line chart, with a separate line for each country.
Any ideas on how to approach this problem?
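To make the answers below reproducible, here is a minimal sketch that builds a DataFrame from the six sample rows shown above. It assumes the column names are exactly Company, Country and Year_founded, and it leaves out the rows elided as "...".

```python
import pandas as pd

# Only the six sample rows from the question; the real dataset has more rows.
df = pd.DataFrame(
    {
        "Company": ["A", "B", "C", "D", "E", "F"],
        "Country": ["USA", "NLD", "CAN", "USA", "NLD", "CAN"],
        "Year_founded": [1996, 2004, 2014, 2000, 1999, 2000],
    }
)
print(df)
```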
Answer:
If I understand correctly, you can use pd.crosstab and DataFrame.plot.line:
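import pandas as pd
from matplotlib.ticker import MaxNLocator

# Count companies per founding year (rows) and country (columns); one line per country
ax = pd.crosstab(df['Year_founded'], df['Country']).plot.line()
ax.set_ylabel('Number of founded companies')
# Years are integers, so only put ticks on whole years
ax.xaxis.set_major_locator(MaxNLocator(integer=True))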
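Output:
[Figure: line chart of the number of companies founded per year, one line per country]
The intermediate crosstab: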
Country CAN NLD USA
Year_founded
1996 0 0 1
1999 0 1 0
2000 1 0 1
2004 0 1 0
2014 1 0 0
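For reference, a self-contained sketch of the crosstab approach, using only the six sample rows from the question (assumed column names as above). The plotting is wrapped in a helper, plot_counts, which is a hypothetical name introduced here; it imports matplotlib only when called.

```python
import pandas as pd

# Sample data from the question (only the six rows shown there).
df = pd.DataFrame(
    {
        "Company": ["A", "B", "C", "D", "E", "F"],
        "Country": ["USA", "NLD", "CAN", "USA", "NLD", "CAN"],
        "Year_founded": [1996, 2004, 2014, 2000, 1999, 2000],
    }
)

# Rows = founding year, columns = country, cells = number of companies founded.
counts = pd.crosstab(df["Year_founded"], df["Country"])
print(counts)


def plot_counts(counts: pd.DataFrame):
    """Hypothetical helper: one line per country, x = year, y = companies founded."""
    from matplotlib.ticker import MaxNLocator

    ax = counts.plot.line()
    ax.set_ylabel("Number of founded companies")
    # Avoid fractional year ticks such as 1997.5.
    ax.xaxis.set_major_locator(MaxNLocator(integer=True))
    return ax
```

Calling plot_counts(counts) draws the chart. Note that crosstab only contains years that actually occur in the data, so years without any foundings are skipped on the x axis rather than shown as 0.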
Answer:
You could use groupby and reindex so that all years from 1995 to 2015 appear in your graph, including years in which no company was founded:
data = df.groupby(["Country", "Year_founded"])["Company"].count().unstack(0).reindex(range(1995,2016)).fillna(0)
>>> data.plot()
>>> data
Country CAN NLD USA
Year_founded
1995 0.0 0.0 0.0
1996 0.0 0.0 1.0
1997 0.0 0.0 0.0
1998 0.0 0.0 0.0
1999 0.0 1.0 0.0
2000 1.0 0.0 1.0
2001 0.0 0.0 0.0
2002 0.0 0.0 0.0
2003 0.0 0.0 0.0
2004 0.0 1.0 0.0
2005 0.0 0.0 0.0
2006 0.0 0.0 0.0
2007 0.0 0.0 0.0
2008 0.0 0.0 0.0
2009 0.0 0.0 0.0
2010 0.0 0.0 0.0
2011 0.0 0.0 0.0
2012 0.0 0.0 0.0
2013 0.0 0.0 0.0
2014 1.0 0.0 0.0
2015 0.0 0.0 0.0
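The two answers can also be combined. The sketch below (sample rows only, same assumed column names) reindexes the crosstab result over 1995-2015 with fill_value=0, which fills missing years with 0 and keeps the counts as integers instead of floats. If "total number of companies" in the question means a running total of companies founded so far, cumsum gives that; this interpretation is an assumption.

```python
import pandas as pd

# Sample data from the question (only the six rows shown there).
df = pd.DataFrame(
    {
        "Company": ["A", "B", "C", "D", "E", "F"],
        "Country": ["USA", "NLD", "CAN", "USA", "NLD", "CAN"],
        "Year_founded": [1996, 2004, 2014, 2000, 1999, 2000],
    }
)

# Companies founded per (year, country); every year 1995-2015 is present,
# and years without foundings get 0 (still integer dtype).
counts = pd.crosstab(df["Year_founded"], df["Country"]).reindex(
    range(1995, 2016), fill_value=0
)

# Assumed alternative reading of "total": running total per country.
cumulative = counts.cumsum()
print(cumulative.tail())
```

counts.plot() or cumulative.plot() then draws one line per country over the full 1995-2015 range.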