Create function to perform value counts and plot data

I have data that looks like this:

import pandas as pd

data = {'Q7a':  ['Nonkopir', 'Nonkopir','Mara', 'Mara','Miami', 'Miami'],
        'Q8a': ['Littering', 'Littering','Littering', 'NAN','Littering','Littering',],
        'Q8b': ['Affect health', 'Affect health','NaN', 'Affect health','Affect health', 'Affect health',],
        'Q8c': ['NAN', 'Affect environ','Affect environ', 'Affect environ','Affect environ', 'Affect environ'],
        'Q8d': ['Others', 'NAN','Others', 'NAN','Others', 'Rodents',]
        }

waste_priority = pd.DataFrame(data, columns=['Q7a', 'Q8a', 'Q8b', 'Q8c', 'Q8d'])
print(waste_priority)

With this data I have run value_counts() on the given columns, but the problem is that I have to repeat the same code for every column to get the result I want. This is what I did:

waste_priority1= pd.DataFrame(waste_priority.groupby("Q7a")["Q8a"].value_counts()).rename(columns={'Q8a': 'counts'}).reset_index()
waste_priority2= pd.DataFrame(waste_priority.groupby("Q7a")["Q8b"].value_counts()).rename(columns={'Q8b': 'counts'}).reset_index()
waste_priority3= pd.DataFrame(waste_priority.groupby("Q7a")["Q8c"].value_counts()).rename(columns={'Q8c': 'counts'}).reset_index()
waste_priority4= pd.DataFrame(waste_priority.groupby("Q7a")["Q8d"].value_counts()).rename(columns={'Q8d': 'counts'}).reset_index()
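The four lines above differ only in the column name, so they can be generated from one helper. A minimal sketch, assuming the frame is named `waste_priority`; `count_by_estate` and `waste_priority_counts` are hypothetical names. It renames the Series before `reset_index()` instead of renaming a DataFrame column afterwards, which also keeps working in pandas 2.0 and later, where `value_counts()` names its result `count` rather than after the source column.

```python
import pandas as pd

data = {
    "Q7a": ["Nonkopir", "Nonkopir", "Mara", "Mara", "Miami", "Miami"],
    "Q8a": ["Littering", "Littering", "Littering", "NAN", "Littering", "Littering"],
    "Q8b": ["Affect health", "Affect health", "NaN", "Affect health", "Affect health", "Affect health"],
    "Q8c": ["NAN", "Affect environ", "Affect environ", "Affect environ", "Affect environ", "Affect environ"],
    "Q8d": ["Others", "NAN", "Others", "NAN", "Others", "Rodents"],
}
waste_priority = pd.DataFrame(data)


def count_by_estate(df, col, group="Q7a"):
    """Count each answer in `col` per value of `group`.

    Returns a flat frame with columns [group, col, "counts"].
    """
    # Naming the Series "counts" avoids a clash with the index level that
    # is also called `col` when reset_index() turns levels into columns.
    return df.groupby(group)[col].value_counts().rename("counts").reset_index()


# One frame per Q8 column, keyed by column name, instead of four copied lines
waste_priority_counts = {
    col: count_by_estate(waste_priority, col)
    for col in ["Q8a", "Q8b", "Q8c", "Q8d"]
}
print(waste_priority_counts["Q8a"])
```

`waste_priority_counts["Q8a"]` then plays the role of `waste_priority1`, and so on.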

After that, I plot a bar chart with the same kind of repetition, one line per bar, to get the different bars in the same chart. Here is how the plotting code looks:

import plotly.graph_objects as go

fig = go.Figure(data=[
    go.Bar(name='Littering smells and looks bad', x=waste_priority1.Q7a, y=waste_priority1.counts, text=waste_priority1.counts, textposition='auto'),
    go.Bar(name='Effect on human health', x=waste_priority2.Q7a, y=waste_priority2.counts, text=waste_priority2.counts, textposition='auto'),
    go.Bar(name='Effect on environment', x=waste_priority3.Q7a, y=waste_priority3.counts, text=waste_priority3.counts, textposition='auto'),
    go.Bar(name='Others', x=waste_priority4.Q7a, y=waste_priority4.counts, text=waste_priority4.counts, textposition='auto'),
])

# Group the bars side by side and put a horizontal legend above the plot
fig.update_layout(
    title_text="Waste priority per Estate",
    barmode='group',
    legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.02,
        xanchor="right",
        x=1,
    ),
)

fig.show()
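The four go.Bar lines likewise differ only in the source frame and the legend text, so the traces can come from a loop over a column-to-label dict. A minimal sketch: `bar_traces` and `LABELS` are hypothetical names, the labels are the ones used in the go.Bar calls above, and the counts are recomputed with the series-rename approach so the block runs on its own. It returns plain trace dictionaries (with `type="bar"`), which `go.Figure(data=...)` also accepts, so the loop itself does not need Plotly.

```python
import pandas as pd

data = {
    "Q7a": ["Nonkopir", "Nonkopir", "Mara", "Mara", "Miami", "Miami"],
    "Q8a": ["Littering", "Littering", "Littering", "NAN", "Littering", "Littering"],
    "Q8b": ["Affect health", "Affect health", "NaN", "Affect health", "Affect health", "Affect health"],
    "Q8c": ["NAN", "Affect environ", "Affect environ", "Affect environ", "Affect environ", "Affect environ"],
    "Q8d": ["Others", "NAN", "Others", "NAN", "Others", "Rodents"],
}
waste_priority = pd.DataFrame(data)

# Legend text per source column (hypothetical name; text from the go.Bar calls)
LABELS = {
    "Q8a": "Littering smells and looks bad",
    "Q8b": "Effect on human health",
    "Q8c": "Effect on environment",
    "Q8d": "Others",
}


def bar_traces(df, labels, group="Q7a"):
    """Build one bar-trace dict per column in `labels`, in dict order."""
    traces = []
    for col, name in labels.items():
        # Same counting step as the question, one column at a time
        counts = df.groupby(group)[col].value_counts().rename("counts").reset_index()
        traces.append(
            dict(
                type="bar",
                name=name,
                x=counts[group].tolist(),
                y=counts["counts"].tolist(),
                text=counts["counts"].tolist(),
                textposition="auto",
            )
        )
    return traces


traces = bar_traces(waste_priority, LABELS)
```

With Plotly installed, `fig = go.Figure(data=bar_traces(waste_priority, LABELS))` followed by the same `fig.update_layout(barmode='group', ...)` call should give the grouped chart; adding a fifth column then means adding one entry to the dict.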

I don't think this code is clean, and I feel these lines could be shortened instead of repeated, especially if I could use a function. How can I make this shorter and cleaner? How can I write a function that does all of this quickly?

CodePudding user response:

Switch to Plotly Express. For that you need to construct a dataframe that is well structured for Plotly Express (long format).
The only loop/repetition is in building dfp:
1. Your code does groupby() and value_counts() for every column except Q7a, so loop over those columns.
2. I have restructured your pandas code to be a bit simpler: the rename can be done by renaming the series.
3. The new column q records the source column.
The legend text is a dict mapping column name to text; put this in the dataframe as well.
Then it's simply a case of building the traces with px.bar().
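The steps above build the long-format dfp with a list comprehension over the Q8 columns. As an alternative that is not the answerer's approach, `DataFrame.melt` can reach a long format without any Python loop: it stacks all non-id columns into `q`/`answer` pairs, and one groupby then counts them. A sketch; the column names `q`, `answer` and `counts` are choices made here.

```python
import pandas as pd

data = {
    "Q7a": ["Nonkopir", "Nonkopir", "Mara", "Mara", "Miami", "Miami"],
    "Q8a": ["Littering", "Littering", "Littering", "NAN", "Littering", "Littering"],
    "Q8b": ["Affect health", "Affect health", "NaN", "Affect health", "Affect health", "Affect health"],
    "Q8c": ["NAN", "Affect environ", "Affect environ", "Affect environ", "Affect environ", "Affect environ"],
    "Q8d": ["Others", "NAN", "Others", "NAN", "Others", "Rodents"],
}
waste_priority = pd.DataFrame(data)

# Stack every column except Q7a: one row per (estate, source column, answer)
long_df = waste_priority.melt(id_vars="Q7a", var_name="q", value_name="answer")

# Count each answer per estate and source column in a single groupby
dfp = (
    long_df.groupby(["Q7a", "q", "answer"])
    .size()
    .rename("counts")
    .reset_index()
)
print(dfp.head())
```

The resulting frame has the same `Q7a`, `q` and `counts` columns that the px.bar() call uses, with the answers in one shared `answer` column.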

import pandas as pd
import plotly.express as px

data = {'Q7a':  ['Nonkopir', 'Nonkopir','Mara', 'Mara','Miami', 'Miami'],
        'Q8a': ['Littering', 'Littering','Littering', 'NAN','Littering','Littering',],
        'Q8b': ['Affect health', 'Affect health','NaN', 'Affect health','Affect health', 'Affect health',],
        'Q8c': ['NAN', 'Affect environ','Affect environ', 'Affect environ','Affect environ', 'Affect environ'],
        'Q8d': ['Others', 'NAN','Others', 'NAN','Others', 'Rodents',]
        }

waste_priority = pd.DataFrame(data, columns=["Q7a", "Q8a", "Q8b", "Q8c", "Q8d"])

# One long-format frame: per estate (Q7a), count each answer in every Q8 column
# and tag each block of rows with its source column in "q"
dfp = pd.concat(
    [
        waste_priority.groupby("Q7a")[c]
        .value_counts()
        .rename("counts")
        .reset_index()
        .assign(q=c)
        for c in waste_priority.columns
        if c.startswith("Q8")
    ],
    ignore_index=True,
)
# Legend text for each source column
dfp["q_name"] = dfp["q"].map(
    {
        "Q8a": "Littering smells and looks bad",
        "Q8b": "Effect on human health",
        "Q8c": "Effect on environment",
        "Q8d": "Others",
    }
)
# barmode="group" is already set by px.bar, so update_layout only adjusts titles and legend
fig = px.bar(
    dfp, x="Q7a", y="counts", text="counts", color="q_name", barmode="group"
).update_layout(
    title_text="Waste priority per Estate",
    xaxis_title="",
    yaxis_title="",
    legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1, title=""),
)
fig.show()
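Since the question asks for a function, the counting part of this pipeline can be wrapped in one. A sketch, not the answerer's code: `priority_counts` is a hypothetical name, and it renames each source column to a common `answer` column so that `pd.concat` stacks the pieces into the same columns instead of leaving one mostly-empty column per question. The label dict doubles as the list of columns to count, which replaces the `Q8` prefix check.

```python
import pandas as pd


def priority_counts(df, group_col, labels):
    """Count answers per `group_col` for each column named in `labels`.

    Returns a long frame with columns [group_col, "answer", "counts", "q", "q_name"].
    """
    frames = [
        df.groupby(group_col)[col]
        .value_counts()
        .rename("counts")
        .reset_index()
        .rename(columns={col: "answer"})  # common name so the pieces stack cleanly
        .assign(q=col, q_name=name)
        for col, name in labels.items()
    ]
    return pd.concat(frames, ignore_index=True)


data = {
    "Q7a": ["Nonkopir", "Nonkopir", "Mara", "Mara", "Miami", "Miami"],
    "Q8a": ["Littering", "Littering", "Littering", "NAN", "Littering", "Littering"],
    "Q8b": ["Affect health", "Affect health", "NaN", "Affect health", "Affect health", "Affect health"],
    "Q8c": ["NAN", "Affect environ", "Affect environ", "Affect environ", "Affect environ", "Affect environ"],
    "Q8d": ["Others", "NAN", "Others", "NAN", "Others", "Rodents"],
}
waste_priority = pd.DataFrame(data)
labels = {
    "Q8a": "Littering smells and looks bad",
    "Q8b": "Effect on human health",
    "Q8c": "Effect on environment",
    "Q8d": "Others",
}
dfp = priority_counts(waste_priority, "Q7a", labels)
print(dfp)
```

With Plotly Express installed, the chart is then the same call as in the answer: `px.bar(dfp, x="Q7a", y="counts", text="counts", color="q_name", barmode="group")`.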