I have a CSV file with the results of a survey in which each respondent gave their age (0 to 100) and their mood (0 = happy, 1 = mid, 2 = sad). I want to make a bar chart in Python, using matplotlib or any other plotting library, with age on the x-axis, the number of people on the y-axis, and three bars per age showing how many respondents were happy, mid and sad. The problem is that the CSV file has no column that directly contains these counts: neither the total number of people of each age nor the number of happy, mid or sad people of each age. Any tips on how to tackle this would be very helpful. The table below shows a few lines of the CSV file. Thanks
Age | Mood level |
---|---|
12 | 0 |
83 | 1 |
55 | 2 |
Answer:
Suppose we have the following dataframe:
import pandas as pd
from matplotlib import pyplot as plt
df = pd.DataFrame(
{
"Age": [20, 16, 16, 20, 20, 16, 18, 18, 18, 20, 16, 16, 18, 18, 18, 20, 20],
"Mood Level": [0, 2, 1, 2, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 2, 2, 1],
}
)
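Since the survey data is in a CSV file, the dataframe would normally come from `pd.read_csv` rather than being typed in. A minimal sketch, assuming the file is comma-separated with the header `Age,Mood level` (the filename `survey.csv` below is hypothetical); `io.StringIO` holds the three sample rows from the question so the example runs without the real file:

```python
import io

import pandas as pd

# Stand-in for the real file: the three sample rows from the question.
# With the actual file you would pass its path instead, e.g.
# pd.read_csv("survey.csv")  (hypothetical filename).
csv_text = """Age,Mood level
12,0
83,1
55,2
"""

df = pd.read_csv(io.StringIO(csv_text))

# Rename the column so it matches the name used in the rest of this answer.
df = df.rename(columns={"Mood level": "Mood Level"})
print(df)
```

After this, every step below applies unchanged to the loaded dataframe.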
Then we need to one-hot encode the Mood Level column, i.e. create the columns Mood_Level_0, Mood_Level_1 and Mood_Level_2, each holding 1 (True) if the row has that mood level and 0 (False) otherwise. This can be done via:
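# dtype=int gives 0/1 columns; since pandas 2.0 get_dummies returns True/False by default
df = pd.concat([df, pd.get_dummies(df["Mood Level"], prefix="Mood_Level", dtype=int)], axis=1)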
and will result in:
Age Mood Level Mood_Level_0 Mood_Level_1 Mood_Level_2
0 20 0 1 0 0
1 16 2 0 0 1
2 16 1 0 1 0
3 20 2 0 0 1
4 20 0 1 0 0
5 16 1 0 1 0
6 18 0 1 0 0
7 18 1 0 1 0
8 18 1 0 1 0
9 20 0 1 0 0
10 16 0 1 0 0
11 16 1 0 1 0
12 18 0 1 0 0
13 18 1 0 1 0
14 18 2 0 0 1
15 20 2 0 0 1
16 20 1 0 1 0
Next, we group by Age and sum the 1s in each of the columns created above:
grouped_per_age = df.groupby("Age").agg(
mood_level_0=("Mood_Level_0", "sum"),
mood_level_1=("Mood_Level_1", "sum"),
mood_level_2=("Mood_Level_2", "sum"),
)
This will result in:
mood_level_0 mood_level_1 mood_level_2
Age
16 1 3 1
18 2 3 1
20 3 1 2
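As an alternative to building dummy columns (a different technique from the one above, not required by it), `pd.crosstab` counts every Age / Mood Level combination in one call and yields the same numbers. The sketch rebuilds the example dataframe so it runs on its own. Because the question mentions ages 0 to 100, it also shows an optional `reindex` that adds ages nobody reported as rows of zeros; whether you want those empty groups on the chart is an assumption about your use case.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Age": [20, 16, 16, 20, 20, 16, 18, 18, 18, 20, 16, 16, 18, 18, 18, 20, 20],
        "Mood Level": [0, 2, 1, 2, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 2, 2, 1],
    }
)

# One row per age, one column per mood level, cells = number of respondents.
counts = pd.crosstab(df["Age"], df["Mood Level"])

# Optional: include every age from 0 to 100, filling missing ages with 0.
counts_all_ages = counts.reindex(range(0, 101), fill_value=0)

print(counts)
```

`counts` can be plotted exactly like `grouped_per_age`, e.g. `counts.plot.bar(rot=0)`.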
Plotting the above dataframe:
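# Each column of grouped_per_age becomes one bar within every age group
ax = grouped_per_age.plot.bar(rot=0)
ax.set_xlabel("Age")
ax.set_ylabel("Count")
# Replace the default legend entries (the column names) with the mood labels
ax.legend(["happy", "mid", "sad"], title="Mood")
plt.show()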
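If you would rather use matplotlib directly instead of the pandas plotting wrapper, a grouped ("triple") bar chart is drawn by shifting each mood's bars sideways by one bar width. A minimal sketch, assuming the per-age counts are already computed; the numbers are the `grouped_per_age` output above, with columns renamed to happy/mid/sad following the question's 0/1/2 coding:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Per-age counts taken from grouped_per_age above (0 = happy, 1 = mid, 2 = sad).
counts = pd.DataFrame(
    {"happy": [1, 2, 3], "mid": [3, 3, 1], "sad": [1, 1, 2]},
    index=pd.Index([16, 18, 20], name="Age"),
)

x = np.arange(len(counts.index))  # one slot on the x-axis per age
width = 0.25                      # width of a single bar inside a group

fig, ax = plt.subplots()
for i, mood in enumerate(counts.columns):
    # Offsets -width, 0, +width place the three bars side by side around x.
    ax.bar(x + (i - 1) * width, counts[mood], width, label=mood)

# Label the group centres with the actual ages.
ax.set_xticks(x)
ax.set_xticklabels(counts.index)
ax.set_xlabel("Age")
ax.set_ylabel("Number of people")
ax.legend(title="Mood")
plt.show()
```

The offset `(i - 1) * width` only centres the group for exactly three moods; with a different number of categories the offset formula would need adjusting.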