Plots, by Label, frequency of words

Time:11-02

I need to create separate plots based on a label. My dataset is

     Label  Word    Frequency
439  10.0   glass   600
471  10.0   tv      34
463  10.0   screen  31
437  10.0   laptop  15
454  10.0   info    15
65   -1.0   dog     1
68   -1.0   cat     1
69   -1.0   win     1
70   -2.0   man     1
71   -2.0   woman   1

In this case I would expect three plots, one for 10, one for -1 and one for -2, each with the Word column on the x-axis and the Frequency on the y-axis (the data is already sorted in descending order by Label).

I have tried as follows:

df['Word'].hist(by=df['Label'])

But this seems to be wrong, as the output is far from what I expected.

Any help would be great.

CodePudding user response:

You don't want to be using a histogram here. A histogram is for the case where the columns of your dataframe contain raw data: the hist function buckets the raw values, counts how many fall into each bucket, and then plots those counts.
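To make the distinction concrete, here is a minimal sketch of the counting step that turns raw observations into a frequency table. It uses value_counts, the categorical counterpart of the bucketing that hist does for numeric values. The raw series is made-up illustration data, not the asker's dataset.

```python
import pandas as pd

# Hypothetical raw data: one entry per observed word (illustration only).
raw = pd.Series(["glass", "tv", "glass", "screen", "glass", "tv"], name="Word")

# Counting the raw values yields a pre-aggregated table like the one in the
# question: one row per word, with its frequency in a separate column.
counts = raw.value_counts().rename_axis("Word").reset_index(name="Frequency")
print(counts)
```

Once the data is in this counted form, there is nothing left to bucket, which is why a bar chart of Frequency against Word is the natural plot.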

Your dataframe is already bucketed, with a column in which the frequencies have already been calculated, so what you need is the df.plot.bar() method. Unfortunately, it does not yet allow a by parameter, so you have to deal with the subplots manually.

Full walkthrough code for the cut-down example you have provided follows. Obviously you can make it more generic by not hardcoding the number of subplots required in the line marked [1].

# Set up:
import matplotlib.pyplot as plt
import pandas as pd
import io
txt = """Label,Word,Frequency
10.0,glass,600
10.0,tv,34
10.0,screen,31
10.0,laptop,15
10.0,info,15
-1.0,dog,1
-1.0,cat,1
-1.0,win,1
-2.0,man,1
-2.0,woman,1"""
dfA = pd.read_csv(io.StringIO(txt))

labels = dfA["Label"].unique()

# Set up subplots on which to plot.
# Make more generic by not hardcoding nrows and ncols in [1],
# but calculating them depending on how many labels you have.
fig, axes = plt.subplots(nrows=2, ncols=2) # [1]
ax_list = axes.flatten() # axes is a 2-D NumPy array of Axes objects;
                         # ax_list is a flat 1-D array, which is easier to index.

# Loop through labels and plot the bar chart to the corresponding axis object.
for ax, label in zip(ax_list, labels):
    dfA[dfA["Label"] == label].plot.bar(x="Word", y="Frequency", ax=ax)
plt.tight_layout()  # keep the rotated word labels from overlapping
plt.show()
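
The suggestion about not hardcoding the grid in [1] could be sketched roughly as follows. This is one possible approach, not the only one: the helper name plot_bars_by_label, its ncols parameter, the per-plot titles and the Agg backend line are all assumptions added for illustration. It computes the number of rows from the number of labels and hides any axes left unused.

```python
import io
import math

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove for on-screen use
import matplotlib.pyplot as plt
import pandas as pd

txt = """Label,Word,Frequency
10.0,glass,600
10.0,tv,34
10.0,screen,31
10.0,laptop,15
10.0,info,15
-1.0,dog,1
-1.0,cat,1
-1.0,win,1
-2.0,man,1
-2.0,woman,1"""
df = pd.read_csv(io.StringIO(txt))


def plot_bars_by_label(df, ncols=2):
    """Hypothetical helper: one Word/Frequency bar chart per Label value."""
    # sort=False keeps the labels in their order of appearance.
    groups = list(df.groupby("Label", sort=False))
    nrows = math.ceil(len(groups) / ncols)
    # squeeze=False guarantees a 2-D array of Axes, even for a single row.
    fig, axes = plt.subplots(nrows=nrows, ncols=ncols, squeeze=False)
    ax_list = axes.flatten()
    for ax, (label, group) in zip(ax_list, groups):
        group.plot.bar(x="Word", y="Frequency", ax=ax,
                       title=f"Label {label}", legend=False)
    # Hide grid cells that have no label to show.
    for ax in ax_list[len(groups):]:
        ax.set_visible(False)
    fig.tight_layout()
    return fig, ax_list[:len(groups)]


fig, used_axes = plot_bars_by_label(df)
```

With the ten rows above this gives a 2x2 grid with three bar charts and one hidden cell. Call fig.savefig(...) to write the figure to a file, or drop the Agg line and call plt.show() to see it on screen.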