I have a dataset from a tea export company that includes total exports, tea types, and weight categories.
It looks like this:
Date Type Weight Quantity Price
2016-01-01 black bags 1734136.51 1131.30
2016-01-01 black bulk 10722389.66 510.86
2016-01-01 black 4g_1kg 6817078.01 588.72
2016-01-01 black 1kg_3kg 86444.50 565.91
2016-01-01 black 3kg_5kg 1003986.73 552.39
I then grouped the data with this:
import pandas as pd

df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date']).dt.date
# YearMonth as an integer, e.g. 2016-01-01 -> 201601
df['YearMonth'] = df['Date'].map(lambda date: 100*date.year + date.month)
df = df.groupby(['YearMonth', 'Type', 'Weight']).agg({'Quantity': 'sum'})
The DataFrame now looks like this:
YearMonth Type    Weight      Quantity
201601    black   1kg_3kg     86444.50
                  3kg_5kg   1003986.73
                  4g_1kg    6817078.01
                  5kg_10kg  2816810.33
                  bags      1734136.51
                  bulk     10722389.66
          green   3kg_5kg        12.00
                  4g_1kg      53014.95
                  5kg_10kg     1132.00
                  bags        41658.19
                  bulk       112400.00
          instant 4g_1kg         28.80
                  lt3kg       89486.40
201602    black   1kg_3kg    215539.60
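A minimal, self-contained sketch of the grouping step may make the result's shape clearer. It runs on the five sample rows from the first table (the dict literal is just those rows retyped) and, as an optional alternative to the lambda, computes YearMonth with vectorised `dt.year`/`dt.month` arithmetic (100 * year + month, so 2016-01-01 becomes 201601). The point to notice is that after the groupby, YearMonth, Type and Weight are index levels and Quantity is the only column.

```python
import pandas as pd

# The five sample rows from the first table (Price is not used below)
data = {
    "Date": ["2016-01-01"] * 5,
    "Type": ["black"] * 5,
    "Weight": ["bags", "bulk", "4g_1kg", "1kg_3kg", "3kg_5kg"],
    "Quantity": [1734136.51, 10722389.66, 6817078.01, 86444.50, 1003986.73],
    "Price": [1131.30, 510.86, 588.72, 565.91, 552.39],
}
df = pd.DataFrame(data)

# Vectorised YearMonth: 100 * 2016 + 1 = 201601
dates = pd.to_datetime(df["Date"])
df["YearMonth"] = 100 * dates.dt.year + dates.dt.month

grouped = df.groupby(["YearMonth", "Type", "Weight"]).agg({"Quantity": "sum"})
print(list(grouped.index.names))  # ['YearMonth', 'Type', 'Weight']
print(list(grouped.columns))      # ['Quantity']
```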
Now I want to plot this, with the date on the x-axis and the type and weight on the y-axis. I tried the following code, but it throws an error:
fig, ax = plt.subplots(figsize=(10,4))
for key, df in df.groupby(['Quantity']):
    ax.plot(df['Type'], df['Weight'], label=key)
ax.legend()
plt.show()
The error I'm getting is
KeyError Traceback (most recent call last)
Cell In[798], line 9
7 fig, ax = plt.subplots(figsize=(10,4))
8 for key, df in df.groupby(['Quantity']):
----> 9 ax.plot(df['Type'], df['Weight'], label=key)
11 ax.legend()
...
3807 # InvalidIndexError. Otherwise we fall through and re-raise
3808 # the TypeError.
3809 self._check_indexing_error(key)
KeyError: 'Type'
Can anyone explain why this happens, and what the best way is to plot the grouped DataFrame above?
Answer:
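After the groupby, YearMonth, Type and Weight form a MultiIndex (they are index levels, not columns), so there is no column labelled 'Type'. Inside your loop, each group only has the 'Quantity' column, which is why df['Type'] raises the KeyError. You could select the group with YearMonth 201601 and Type 'black' as follows and then plot it as below; the same could be done for the other types using different markers and marker colors.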
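import matplotlib.pyplot as plt

# Partial MultiIndex key: rows for YearMonth 201601 and Type 'black', indexed by Weight
df2 = df.loc[201601, 'black']
plt.scatter(df2.index, df2['Quantity'])
plt.show()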
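To get the date on the x-axis with every Type/Weight combination at once, one option is to call reset_index(), which turns the index levels back into ordinary columns so that a loop over groups works as originally intended. This is a minimal sketch assuming pandas and matplotlib; the small `grouped` frame is a hypothetical stand-in built from a few rows of the question's output, and `flat` and `Month` are names introduced here for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the grouped frame: a few rows from the question's output,
# with the same MultiIndex levels (YearMonth, Type, Weight) and a Quantity column
index = pd.MultiIndex.from_tuples(
    [
        (201601, "black", "1kg_3kg"),
        (201601, "black", "3kg_5kg"),
        (201601, "green", "3kg_5kg"),
        (201602, "black", "1kg_3kg"),
    ],
    names=["YearMonth", "Type", "Weight"],
)
grouped = pd.DataFrame({"Quantity": [86444.50, 1003986.73, 12.00, 215539.60]}, index=index)

# reset_index() turns the index levels back into ordinary columns
flat = grouped.reset_index()
# Helper column with real dates for the x-axis: 201601 -> 2016-01-01
flat["Month"] = pd.to_datetime(flat["YearMonth"].astype(str), format="%Y%m")

fig, ax = plt.subplots(figsize=(10, 4))
# One line per (Type, Weight) pair, Quantity on the y-axis
for (tea_type, weight), sub in flat.groupby(["Type", "Weight"]):
    ax.plot(sub["Month"], sub["Quantity"], marker="o", label=f"{tea_type} / {weight}")
ax.set_xlabel("Month")
ax.set_ylabel("Quantity")
ax.legend()
plt.show()
```

Here Quantity goes on the y-axis and each Type/Weight pair gets its own line, which seems the closest practical reading of "type and weight on the y-axis". With the full data this produces many lines, so splitting by Type into separate panels may read better.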
Answer:
I'm not sure what kind of visualisation you're after, but here is something that might be of use. I used the seaborn.objects interface (imported as so).
However, that puts a lot of information into a single graph; it may look better split into multiple graphs instead:
import seaborn.objects as so

# Assumption: df_sample is the grouped frame with its index levels turned back into columns
df_sample = df.reset_index()

(
    # Using `color` instead of `alpha` would make the different `Weight`s easier to distinguish.
    so.Plot(df_sample, x="YearMonth", y="Quantity", alpha="Weight")
    .facet(row="Type")
    .add(so.Area(), so.Stack())
    .scale(
        x=so.Continuous().tick(every=1).label(like="{x:.0f}"),
        y=so.Continuous().label(like="{x:.0f}"),
    )
    .layout(size=(10, 8))
)
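If the seaborn.objects interface isn't available (it needs seaborn 0.12 or newer), a rough plain pandas/matplotlib sketch of the same idea, one stacked-area panel per Type, might look like this. It is an alternative of my own rather than a translation of the code above: `flat`, `months` and `types` are hypothetical names, the sample values are a few rows from the question's output, and Weight/month combinations with no data are assumed to be 0.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical long-form frame, i.e. what grouped.reset_index() would give;
# the values are a few rows from the question's grouped output
flat = pd.DataFrame({
    "YearMonth": [201601, 201601, 201601, 201601, 201602],
    "Type": ["black", "black", "green", "green", "black"],
    "Weight": ["1kg_3kg", "3kg_5kg", "3kg_5kg", "4g_1kg", "1kg_3kg"],
    "Quantity": [86444.50, 1003986.73, 12.00, 53014.95, 215539.60],
})

# Same months on every panel so the x-axes line up
months = sorted(flat["YearMonth"].unique())
types = sorted(flat["Type"].unique())

fig, axes = plt.subplots(len(types), 1, figsize=(10, 8), sharex=True, squeeze=False)
for ax, tea_type in zip(axes[:, 0], types):
    # One column per Weight; missing Weight/month combinations become 0
    wide = (
        flat[flat["Type"] == tea_type]
        .pivot_table(index="YearMonth", columns="Weight", values="Quantity",
                     aggfunc="sum", fill_value=0)
        .reindex(months, fill_value=0)
    )
    # Real dates instead of integers: 201601 -> 2016-01-01
    wide.index = pd.to_datetime(wide.index.astype(str), format="%Y%m")
    wide.plot.area(ax=ax, title=tea_type)  # pandas stacks area plots by default
    ax.set_ylabel("Quantity")
plt.show()
```

Converting YearMonth to real dates avoids the jump from 201612 to 201701 that a plain integer axis would show.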
Hope it helps!