I have data that looks like this:
import pandas as pd
data = {'Q7a': ['Nonkopir', 'Nonkopir','Mara', 'Mara','Miami', 'Miami'],
'Q8a': ['Littering', 'Littering','Littering', 'NAN','Littering','Littering',],
'Q8b': ['Affect health', 'Affect health','NaN', 'Affect health','Affect health', 'Affect health',],
'Q8c': ['NAN', 'Affect environ','Affect environ', 'Affect environ','Affect environ', 'Affect environ'],
'Q8d': ['Others', 'NAN','Others', 'NAN','Others', 'Rodents',]
}
waste_priority = pd.DataFrame(data, columns=['Q7a', 'Q8a', 'Q8b', 'Q8c', 'Q8d'])
print(waste_priority)
With this data I ran value_counts() on each Q8 column, grouped by Q7a. The problem is that to get the result I want for every column I have to repeat the same line once per column. This is what I did:
waste_priority1= pd.DataFrame(waste_priority.groupby("Q7a")["Q8a"].value_counts()).rename(columns={'Q8a': 'counts'}).reset_index()
waste_priority2= pd.DataFrame(waste_priority.groupby("Q7a")["Q8b"].value_counts()).rename(columns={'Q8b': 'counts'}).reset_index()
waste_priority3= pd.DataFrame(waste_priority.groupby("Q7a")["Q8c"].value_counts()).rename(columns={'Q8c': 'counts'}).reset_index()
waste_priority4= pd.DataFrame(waste_priority.groupby("Q7a")["Q8d"].value_counts()).rename(columns={'Q8d': 'counts'}).reset_index()
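The four counting lines differ only in the column name, so they can be folded into one helper called in a loop. This is a minimal sketch assuming the sample data above; `counts_per_estate` and `waste_priority_counts` are hypothetical names. Naming the count Series explicitly with `.rename("counts")` avoids relying on the default name pandas gives the `value_counts()` result, which is not the same in every pandas version.

```python
import pandas as pd

# Same sample data as in the question
data = {
    "Q7a": ["Nonkopir", "Nonkopir", "Mara", "Mara", "Miami", "Miami"],
    "Q8a": ["Littering", "Littering", "Littering", "NAN", "Littering", "Littering"],
    "Q8b": ["Affect health", "Affect health", "NaN", "Affect health", "Affect health", "Affect health"],
    "Q8c": ["NAN", "Affect environ", "Affect environ", "Affect environ", "Affect environ", "Affect environ"],
    "Q8d": ["Others", "NAN", "Others", "NAN", "Others", "Rodents"],
}
waste_priority = pd.DataFrame(data)


def counts_per_estate(df: pd.DataFrame, col: str, by: str = "Q7a") -> pd.DataFrame:
    """Hypothetical helper: count each answer in `col` per group of `by`.

    Returns a flat DataFrame with columns [by, col, "counts"].
    """
    return (
        df.groupby(by)[col]
        .value_counts()
        .rename("counts")  # set the Series name explicitly
        .reset_index()     # MultiIndex (by, col) -> ordinary columns
    )


# One call per Q8 column instead of four copy-pasted lines
waste_priority_counts = {
    col: counts_per_estate(waste_priority, col)
    for col in ["Q8a", "Q8b", "Q8c", "Q8d"]
}
print(waste_priority_counts["Q8a"])
```

Each entry of the dict plays the role of one of `waste_priority1` to `waste_priority4`.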
Thereafter I plot a bar graph, again repeating one line per column to get the different bars in the same chart. Here is how the plotting code looks:
import plotly.graph_objects as go

fig = go.Figure(data=[
go.Bar(name='Littering smells and looks bad', x=waste_priority1.Q7a, y=waste_priority1.counts,text=waste_priority1.counts,textposition='auto'),
go.Bar(name='Effect on human health', x=waste_priority2.Q7a, y=waste_priority2.counts,text=waste_priority2.counts,textposition='auto'),
go.Bar(name='Effect on environment', x=waste_priority3.Q7a, y=waste_priority3.counts,text=waste_priority3.counts,textposition='auto'),
go.Bar(name='Others', x=waste_priority4.Q7a, y=waste_priority4.counts,text=waste_priority4.counts,textposition='auto'),
])
# Change the bar mode
fig.update_layout(title_text="Waste priority per Estate",
barmode='group',
legend=dict(
orientation="h",
yanchor="bottom",
y=1.02,
xanchor="right",
x=1
))
fig.show()
I don't think this code is clean, and I feel these lines could be shortened instead of repeated, especially if I could use a function. How can I make this shorter and cleaner? How can I create a function to do all of this quickly?
Answer:
- Switch to Plotly Express. This needs a dataframe that is well structured for Plotly Express.
- The only loop / repetition is in structuring `dfp`:
  - your code does `groupby()` and `value_counts()` for every column except Q7a; I have restructured your pandas code to be a bit simpler. The rename can be achieved by renaming the Series.
  - the new column `q` records the source column
- The legend text is a dict mapping column name to text. Put this in the dataframe as well (column `q_name`).
- Then it's a simple case of building the traces with `px.bar()`.
import pandas as pd
import plotly.express as px
data = {'Q7a': ['Nonkopir', 'Nonkopir','Mara', 'Mara','Miami', 'Miami'],
'Q8a': ['Littering', 'Littering','Littering', 'NAN','Littering','Littering',],
'Q8b': ['Affect health', 'Affect health','NaN', 'Affect health','Affect health', 'Affect health',],
'Q8c': ['NAN', 'Affect environ','Affect environ', 'Affect environ','Affect environ', 'Affect environ'],
'Q8d': ['Others', 'NAN','Others', 'NAN','Others', 'Rodents',]
}
waste_priority = pd.DataFrame (data, columns = ['Q7a','Q8a','Q8b','Q8c','Q8d'])
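# One long dataframe: counts per estate for every Q8 column, tagged with its source column q
dfp = pd.concat(
    [
        waste_priority.groupby("Q7a")[c]
        .value_counts()
        .rename("counts")  # name the count Series explicitly
        .reset_index()
        .assign(q=c)  # remember which question column these counts came from
        for c in waste_priority.columns
        if c.startswith("Q8")
    ],
    ignore_index=True,
)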
dfp["q_name"] = dfp["q"].map(
{
"Q8a": "Littering smells and looks bad",
"Q8b": "Effect on human health",
"Q8c": "Effect on environment",
"Q8d": "Others",
}
)
fig = px.bar(
    dfp, x="Q7a", y="counts", text="counts", color="q_name", barmode="group"
).update_layout(
    title_text="Waste priority per Estate",
    xaxis_title="",
    yaxis_title="",
    legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1, title=""),
)
fig.show()