Pandas dataframe plot for grouped data

Time:02-05

I have a dataset from a tea export company that includes the total export quantity, the tea types and the weight categories.

It looks like this:

Date        Type    Weight    Quantity      Price
2016-01-01  black   bags      1734136.51    1131.30
2016-01-01  black   bulk      10722389.66   510.86
2016-01-01  black   4g_1kg    6817078.01    588.72
2016-01-01  black   1kg_3kg   86444.50      565.91
2016-01-01  black   3kg_5kg   1003986.73    552.39

I then grouped the data with this:

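import pandas as pd

df = pd.DataFrame(data)
# Keep only the calendar date, dropping any time component
df['Date'] = pd.to_datetime(df['Date']).dt.date
# Encode year and month as one integer, e.g. 2016-01-01 -> 201601
df['YearMonth'] = df['Date'].map(lambda date: 100*date.year + date.month)

df = df.groupby(['YearMonth', 'Type', 'Weight']).agg({'Quantity': 'sum'})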

The grouped DataFrame now looks like this:

YearMonth   Type    Weight     Quantity
201601      black   1kg_3kg    86444.50
                    3kg_5kg    1003986.73
                    4g_1kg     6817078.01
                    5kg_10kg   2816810.33
                    bags       1734136.51
                    bulk       10722389.66
            green   3kg_5kg    12.00
                    4g_1kg     53014.95
                    5kg_10kg   1132.00
                    bags       41658.19
                    bulk       112400.00
            instant 4g_1kg     28.80
                    lt3kg      89486.40
201602      black   1kg_3kg    215539.60
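Checking the grouped frame's structure shows why df['Type'] fails later. This minimal sketch uses only three raw 2016-01-01 rows copied from the first table, not the full dataset. It repeats the grouping and confirms that YearMonth, Type and Weight are now index levels, with Quantity as the only remaining column.

```python
import pandas as pd

# Three raw rows copied from the first table (a sample, not the full dataset)
data = {
    "Date": ["2016-01-01", "2016-01-01", "2016-01-01"],
    "Type": ["black", "black", "black"],
    "Weight": ["bags", "bulk", "4g_1kg"],
    "Quantity": [1734136.51, 10722389.66, 6817078.01],
    "Price": [1131.30, 510.86, 588.72],
}

df = pd.DataFrame(data)
df["Date"] = pd.to_datetime(df["Date"]).dt.date
# 2016-01-01 -> 100 * 2016 + 1 = 201601
df["YearMonth"] = df["Date"].map(lambda date: 100 * date.year + date.month)

grouped = df.groupby(["YearMonth", "Type", "Weight"]).agg({"Quantity": "sum"})

# The grouping keys have moved into the row index...
print(list(grouped.index.names))
# ...so 'Quantity' is the only column left, and grouped['Type'] raises KeyError
print(list(grouped.columns))
```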

Now I want to plot this, with the date on the x-axis and the type and weight on the y-axis. I tried the following code, but it throws an error.

fig, ax = plt.subplots(figsize=(10,4))
for key, df in df.groupby(['Quantity']):
    ax.plot(df['Type'], df['Weight'], label=key)

ax.legend()
plt.show()

The error I'm getting is:

KeyError                                  Traceback (most recent call last)
Cell In[798], line 9
      7 fig, ax = plt.subplots(figsize=(10,4))
      8 for key, df in df.groupby(['Quantity']):
----> 9     ax.plot(df['Type'], df['Weight'], label=key)
     11 ax.legend()
...
   3807     #  InvalidIndexError. Otherwise we fall through and re-raise
   3808     #  the TypeError.
   3809     self._check_indexing_error(key)

KeyError: 'Type'

Can anyone explain why this happens, and what the best way is to plot the grouped DataFrame above?

Answer:

After the groupby you have a MultiIndex, so there is no column labeled 'Type'. YearMonth, Type and Weight are now index levels, and Quantity is the only column. You could select the group with YearMonth 201601 and type 'black' as follows, then plot it as below. The same could be done for the other types using different marker symbols and colors.

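import matplotlib.pyplot as plt

# Rows for YearMonth 201601 and Type 'black'; the remaining index level is Weight
df2 = df.loc[201601, 'black']
plt.scatter(df2.index, df2['Quantity'])

plt.show()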
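Here is a minimal sketch of the "different symbols and colors per type" idea for a single month. It assumes a handful of 201601 rows copied from the grouped table, not the full data, and the markers mapping is a hypothetical choice. Each type gets its own scatter call, so matplotlib's default color cycle gives each type a distinct color and the legend shows one entry per type.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# A few 201601 rows copied from the grouped table, rebuilt with the same 3-level index
df = pd.DataFrame(
    {
        "YearMonth": [201601] * 6,
        "Type": ["black", "black", "green", "green", "instant", "instant"],
        "Weight": ["bags", "bulk", "bags", "bulk", "4g_1kg", "lt3kg"],
        "Quantity": [1734136.51, 10722389.66, 41658.19, 112400.00, 28.80, 89486.40],
    }
).set_index(["YearMonth", "Type", "Weight"])

# Hypothetical marker choice: one marker shape per tea type
markers = {"black": "o", "green": "s", "instant": "^"}

# Selecting one YearMonth drops that level, leaving a (Type, Weight) index
month = df.loc[201601]

fig, ax = plt.subplots(figsize=(10, 4))
for tea_type, group in month.groupby(level="Type"):
    weights = group.index.get_level_values("Weight").tolist()
    # Each scatter call takes the next color from the default color cycle
    ax.scatter(weights, group["Quantity"].to_numpy(), marker=markers[tea_type], label=tea_type)

ax.set_xlabel("Weight")
ax.set_ylabel("Quantity")
ax.legend()
```

The Agg line is only there so the sketch runs headless. In an interactive session, drop it and call plt.show() to display the figure.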

Answer:

Not too sure about the kind of visualisation you're after, but here is something I think might be of use. I used seaborn.objects (imported as so).

However, one graph holds a lot of information, so it may look better split across multiple graphs, one row of panels per Type:

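import seaborn.objects as so

# Assumption: df_sample is the grouped frame with its index levels moved back into columns
df_sample = df.reset_index()

(
    # Mapping `color` instead of `alpha` would make the different `Weight`s easier to tell apart.
    so.Plot(df_sample, x="YearMonth", y="Quantity", alpha="Weight")
    .facet(row="Type")  # one row of panels per tea type
    .add(so.Area(), so.Stack())  # stacked areas of Quantity per Weight
    .scale(
        # A tick every 1 unit of YearMonth, labels formatted without decimals
        x=so.Continuous().tick(every=1).label(like="{x:.0f}"),
        y=so.Continuous().label(like="{x:.0f}")
    )
    .layout(size=(10, 8))
)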
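The seaborn.objects code needs long-form data, meaning YearMonth, Type, Weight and Quantity as ordinary columns. reset_index() restores that layout after the groupby. If seaborn.objects isn't available (it was added in seaborn 0.12), here is a rough matplotlib-only alternative: one subplot per type, with one line per weight category.

This sketch makes a few assumptions:

* It swaps the stacked areas for plain lines.
* It uses only a few rows copied from the question's grouped table.
* It converts YearMonth back to a real date so the months are spaced correctly on the x-axis.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# A few rows copied from the grouped table in the question (not the full dataset)
grouped = (
    pd.DataFrame(
        {
            "YearMonth": [201601, 201601, 201601, 201601, 201602],
            "Type": ["black", "black", "green", "green", "black"],
            "Weight": ["1kg_3kg", "bags", "4g_1kg", "bags", "1kg_3kg"],
            "Quantity": [86444.50, 1734136.51, 53014.95, 41658.19, 215539.60],
        }
    )
    .set_index(["YearMonth", "Type", "Weight"])
    .sort_index()
)

# Long form: the index levels become ordinary columns again
df_sample = grouped.reset_index()

types = sorted(df_sample["Type"].unique())
fig, axes = plt.subplots(len(types), 1, figsize=(10, 8), squeeze=False)
for ax, tea_type in zip(axes[:, 0], types):
    subset = df_sample[df_sample["Type"] == tea_type]
    # One line per Weight category
    for weight, rows in subset.groupby("Weight"):
        # 201601 -> 2016-01-01, so the x-axis is a real date axis
        months = pd.to_datetime(rows["YearMonth"].astype(str), format="%Y%m")
        ax.plot(months.to_numpy(), rows["Quantity"].to_numpy(), marker="o", label=weight)
    ax.set_title(tea_type)
    ax.set_ylabel("Quantity")
    ax.legend(title="Weight")

axes[-1, 0].set_xlabel("YearMonth")
```

Markers keep single-month series visible, since a line with one point would otherwise draw nothing.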


Hope it helps!
