I have this table in excel which I am trying to analyze. I am not able to plot the number of S and D (in column 3) according to months (in Col4). I am plotting the number of S and D in col3 using the following. But how to plot the number of S & D according to the months. How to do that?
I would like to get two line plots showing the number of S and D respectively with the corresponding months on the X-axis.
#to plot the number of S and D in col3
df = pd.read_csv (r'C:\Users\data.csv', usecols = ['Col1','Col2','Col3','Col4'])
df['Col4'] = pd.to_datetime(df['Col4'], format="%m/%d/%Y").dt.date
df.head()
df1 = df[['Col3']].copy()
my_dict = df1['Col3'].value_counts().to_dict()
myList = my_dict.items()
x, y = zip(*myList)
plt.bar(x, y, color = "tomato")
plt.ylabel('Count')
plt.title('Outcome')
plt.show()
Col1 Col2 Col3 Col4
0 Y MA S 2/2/2022
1 N YJ D 4/25/2022
2 N YJ D 3/11/2022
3 N YJ D 4/28/2022
4 Y YJ D 4/21/2022
5 N YJ D 4/21/2022
6 Y WE D 5/25/2022
7 Y WE S 5/7/2022
8 N WE D 3/30/2022
9 N PR D 3/22/2022
10 Y PR S 3/22/2022
CodePudding user response:
The following should do what the OP wants given the data frame as posted:
df = pd.DataFrame({'Col1': ['Y', 'N', 'N', 'N', 'Y', 'N', 'Y', 'Y', 'N', 'N', 'Y'],
'Col2': ['MA', 'YJ', 'YJ', 'YJ', 'YJ', 'YJ', 'WE', 'WE', 'WE', 'PR', 'PR'],
'Col3': ['S', 'D', 'D', 'D', 'D', 'D', 'D', 'S', 'D', 'D', 'S'],
'Col4': ['2/2/2022', '4/25/2022', '3/11/2022', '4/28/2022', '4/21/2022',
'4/21/2022', '5/25/2022', '5/7/2022', '3/30/2022', '3/22/2022', '3/22/2022']})
- First extract the month from the dates in
Col4
(and sort ascending by month):
df.loc[:, 'Month'] = pd.to_datetime(df.Col4).dt.month
df = df.sort_values('Month', ascending=True)
Col1 Col2 Col3 Col4 Month
0 Y MA S 2/2/2022 2
2 N YJ D 3/11/2022 3
8 N WE D 3/30/2022 3
9 N PR D 3/22/2022 3
10 Y PR S 3/22/2022 3
1 N YJ D 4/25/2022 4
3 N YJ D 4/28/2022 4
4 Y YJ D 4/21/2022 4
5 N YJ D 4/21/2022 4
6 Y WE D 5/25/2022 5
7 Y WE S 5/7/2022 5
- Create a pivot table with
Month
as the index, the values ofCol3
(i.e.,S
andD
) as columns, and counts as the cell values:
df1 = df.groupby(['Month', 'Col3'])\
.size()\
.unstack(fill_value=0)\
.reset_index()
Col3 Month D S
0 2 0 1
1 3 3 1
2 4 4 0
3 5 1 1
- Plot the results
plt.plot(df1.Month, df1.D, label='D')
plt.plot(df1.Month, df1.S, label='S')
plt.xlabel('Month')
plt.ylabel('Count')
plt.legend()
plt.show()