Create multiple barplots based off groupby conditions

Time:10-10

I am trying to create multiple horizontal bar plots for a dataset of finish times from a running race.

The dataframe has the following columns: Name, Age Group, Finish Time, Finish Place, Hometown, Times Ran The Race. Sample data below.

Name    Age Group  Finish Time  Finish Place  Hometown       Times Ran The Race
John    30-39      15.5         1             New York City  2
Mike    30-39      17.2         2             Denver         1
Travis  40-49      20.4         1             Louisville     3
James   40-49      22.1         2             New York City  1
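
To make the sample reproducible, the rows above can be loaded into a dataframe as follows. This is a minimal sketch that assumes pandas is installed; the variable name `df` is chosen to match the one the answers use.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    'Name': ['John', 'Mike', 'Travis', 'James'],
    'Age Group': ['30-39', '30-39', '40-49', '40-49'],
    'Finish Time': [15.5, 17.2, 20.4, 22.1],
    'Finish Place': [1, 2, 1, 2],
    'Hometown': ['New York City', 'Denver', 'Louisville', 'New York City'],
    'Times Ran The Race': [2, 1, 3, 1],
})

print(df)
```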

I would like to create a bar plot similar to the one shown below: one bar chart per age group, with the fastest runner at the bottom of the chart, and each runner's name labeled with their city and the number of times they have run the race beneath it.

Do I need a for loop, or would a simple groupby work? The number and size of the age groups vary from race to race, so they are not constant and depend on the dataframe used for each race.

[image: bar chart]

CodePudding user response:

I used a loop. For each age group I extract a temporary dataframe and collect each runner's name, hometown, and number of times run. Those fields are then joined into multi-line strings and stored in a new list. Finally, I draw a horizontal bar graph and replace the tick labels on the y-axis with those strings.

for ag in df['Age Group'].unique():
    label_all = []
    # Temporary dataframe holding only this age group
    tmp = df[df['Age Group'] == ag]
    # Collect [name, hometown, times ran] for each runner
    labels = [[x,y,z] for x,y,z in zip(tmp.Name.values, tmp.Hometown.values, tmp['Times Ran The Race'].values)]
    for k in range(len(labels)):
        label_all.append(labels[k])
    # Join each runner's fields into one multi-line tick label
    l_all = []
    for l in label_all:
        lbl = l[0] + '\n' + l[1] + '\n' + str(l[2]) + ' Time'
        l_all.append(lbl)

    ax = tmp[['Name', 'Finish Time']].plot(kind='barh', legend=False)
    ax.set_title(ag + ' Age Group')
    ax.set_yticklabels(l_all)
    ax.grid(axis='x')
    for i in ['top','bottom','left','right']:
        ax.spines[i].set_visible(False)
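
On the question of whether groupby would work: iterating over `df.groupby('Age Group')` yields each group's name together with its sub-frame, so the boolean filter is not needed. The following is only a sketch under the assumption that `df` has the columns from the question; the helper name `plot_age_groups` and the inline sample frame are there for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_age_groups(df):
    """Draw one horizontal bar chart per age group and return the Axes."""
    axes = []
    # sort=False keeps the groups in their order of first appearance
    for ag, tmp in df.groupby('Age Group', sort=False):
        # Fastest runner first, so that bar ends up at the bottom of the chart
        tmp = tmp.sort_values('Finish Time')
        # One multi-line label per runner: name, hometown, times ran
        labels = (tmp['Name'] + '\n' + tmp['Hometown'] + '\n'
                  + tmp['Times Ran The Race'].astype(str) + ' Time')
        fig, ax = plt.subplots()
        ax.barh(labels, tmp['Finish Time'])
        ax.set_title(f'{ag} Age Group')
        ax.grid(axis='x')
        axes.append(ax)
    return axes


# Illustrative data taken from the question's sample rows
df = pd.DataFrame({
    'Name': ['John', 'Mike', 'Travis', 'James'],
    'Age Group': ['30-39', '30-39', '40-49', '40-49'],
    'Finish Time': [15.5, 17.2, 20.4, 22.1],
    'Hometown': ['New York City', 'Denver', 'Louisville', 'New York City'],
    'Times Ran The Race': [2, 1, 3, 1],
})
axes = plot_age_groups(df)
```

Calling `plt.show()` afterwards displays the figures. Because the number of charts comes from the groups found in the dataframe, nothing needs to change when a race has more or fewer age groups.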


CodePudding user response:

Here's a fairly compact solution. The only tricky part is the ordinal number, if you really want to have that. I copied the lambda solution from "Ordinal numbers replacement".

Give this a try, and please upvote the answer if you like it.

import matplotlib.pyplot as plt

# Integer division (//) keeps 11, 12 and 13 as "th" under Python 3
ordinal = lambda n: "{}{}".format(n,"tsnrhtdd"[(n//10%10!=1)*(n%10<4)*n%10::4])

for i, a in enumerate(df['Age Group'].unique()):

    plt.figure(i)
    dfa = df.loc[df['Age Group'] == a].copy()
    # Multi-line label: name, hometown, ordinal count (e.g. "2nd Time")
    dfa['Info'] = dfa.Name + '\n' + dfa.Hometown + '\n' + \
              [ordinal(row) for row in dfa['Times Ran The Race']] + ' Time'
    plt.barh(dfa.Info, dfa['Finish Time'])
    plt.title(f'{a} Age Group')
    plt.xlabel("Time (Minutes)")

plt.show()
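
The one-line ordinal lambda is compact but hard to read. A more explicit version could look like the sketch below; the function name `ordinal_suffix` is made up here for illustration and is not part of the copied solution.

```python
def ordinal_suffix(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # Numbers ending in 11, 12 or 13 are irregular and always take 'th'
    if n % 100 in (11, 12, 13):
        return f'{n}th'
    # Otherwise the last digit decides; anything but 1, 2, 3 takes 'th'
    suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'


for n in (1, 2, 3, 4, 11, 12, 13, 21, 102, 111):
    print(n, '->', ordinal_suffix(n))
```

It takes a single integer and returns a string, so it can be used anywhere the lambda is called.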