Home > Back-end >  How to build a pie chart, with plotly, using value_counts in pandas dataframe?
How to build a pie chart, with plotly, using value_counts in pandas dataframe?

Time:04-20

I have the following pandas dataframe:

import numpy as np
import pandas as pd
import plotly.express as px

df1 = pd.DataFrame({'Name_City': ['Paris', 'Rio', 'Madri', 
                                  'Lisboa', 'Santiago', 'Toronto',
                                  'Porto', 'Rio', 'Paris',
                                  'Paris', 'Paris', 'Rio',
                                  'Lisboa', 'Lisboa', 'Lisboa',
                                  'Lisboa', 'Rio', 'Madri'],
                   'First': ['P', 'R', 'M', 'L', 'S', 'T', 'P', 'R', 'P',
                             'P', 'P', 'R', 'L', 'L', 'L', 'L', 'R', 'M']})

print(df1)

          Name_City First
                 Paris        P
                 Rio          R
                 Madri        M
                 Lisboa       L
                 Santiago     S
                 Toronto      T
                 Porto        P
                 Rio          R
                 Paris        P
           Paris          P
                 Paris        P
                 Rio          R
                Lisboa        L
                Lisboa        L
                Lisboa        L
                Lisboa        L
                 Rio          R
                Madri         M

First I would like to count how many times each city appeared in the dataframe, so I did:

series = df1['Name_City'].value_counts()

df_result = pd.DataFrame(series)

df_result = df_result.reset_index()  

df_result.columns = ['City', 'Total']

print(df_result)

                City      Total
               Lisboa      5
                Rio        4
               Paris       4
               Madri       2
              Santiago     1
              Toronto      1
                Porto      1

Note: This first part is working as expected.

In the second part, I would like to build a graph (of the pie type) to represent the five cities that appeared the most. The other cities that appeared less I would like to appear on the chart as "other".

I tried to do the following:

df_result_part = df_result.head(5)
print(df_result_part)

                 City   Total
                  Lisboa      5
                   Rio        4
                  Paris       4
                  Madri       2
                 Santiago     1


    fig = px.pie(df_result_part,
                 values='Total',
                 names='City')

    fig.show()

I would need to group the ones that appeared the least and name them "others". I would like to ask how can I do this automatically?

The desired output is:

print(df_desired)

            City    Total
             Lisboa   5
            Rio       4
          Paris       4
          Madri       2
           Santiago   1
         Others       2


fig = px.pie(df_desired,
             values='Total',
             names='City')

fig.show()

CodePudding user response:

Starting from df_result, you can select the rows after the fifth one and replace all city names with "other". This will always work as by default value_counts returns its output is sorted order (decreasing counts):

df_result.loc[5:, 'City'] = 'other'

import plotly.express as px
fig = px.pie(df_result,
             values='Total',
             names='City')

output:

enter image description here

  • Related