Home > Enterprise >  Plot the number of instances in first column with respect to second column in python?
Plot the number of instances in first column with respect to second column in python?

Time:06-17

I have this table in excel which I am trying to analyze. I am not able to plot the number of S and D (in column 3) according to months (in Col4). I am plotting the number of S and D in col3 using the following. But how to plot the number of S & D according to the months. How to do that?

I would like to get two line plots showing the number of S and D respectively with the corresponding months on the X-axis.

#to plot the number of S and D in col3

df = pd.read_csv (r'C:\Users\data.csv', usecols = ['Col1','Col2','Col3','Col4'])
df['Col4'] = pd.to_datetime(df['Col4'], format="%m/%d/%Y").dt.date
df.head()

df1 = df[['Col3']].copy()
my_dict = df1['Col3'].value_counts().to_dict()  
myList = my_dict.items()
x, y = zip(*myList) 
plt.bar(x, y, color = "tomato")
plt.ylabel('Count')
plt.title('Outcome')
plt.show() 


    Col1    Col2    Col3    Col4
0   Y   MA  S   2/2/2022
1   N   YJ  D   4/25/2022
2   N   YJ  D   3/11/2022
3   N   YJ  D   4/28/2022
4   Y   YJ  D   4/21/2022
5   N   YJ  D   4/21/2022
6   Y   WE  D   5/25/2022
7   Y   WE  S   5/7/2022
8   N   WE  D   3/30/2022
9   N   PR  D   3/22/2022
10  Y   PR  S   3/22/2022

enter image description here

CodePudding user response:

The following should do what the OP wants given the data frame as posted:

df = pd.DataFrame({'Col1': ['Y', 'N', 'N', 'N', 'Y', 'N', 'Y', 'Y', 'N', 'N', 'Y'],
                   'Col2': ['MA', 'YJ', 'YJ', 'YJ', 'YJ', 'YJ', 'WE', 'WE', 'WE', 'PR', 'PR'],
                   'Col3': ['S', 'D', 'D', 'D', 'D', 'D', 'D', 'S', 'D', 'D', 'S'],
                   'Col4': ['2/2/2022', '4/25/2022', '3/11/2022', '4/28/2022', '4/21/2022',
                            '4/21/2022', '5/25/2022', '5/7/2022',  '3/30/2022', '3/22/2022', '3/22/2022']})
  1. First extract the month from the dates in Col4 (and sort ascending by month):
df.loc[:, 'Month'] = pd.to_datetime(df.Col4).dt.month
df = df.sort_values('Month', ascending=True)
    Col1 Col2 Col3       Col4  Month
 0     Y   MA    S   2/2/2022      2
 2     N   YJ    D  3/11/2022      3
 8     N   WE    D  3/30/2022      3
 9     N   PR    D  3/22/2022      3
 10    Y   PR    S  3/22/2022      3
 1     N   YJ    D  4/25/2022      4
 3     N   YJ    D  4/28/2022      4
 4     Y   YJ    D  4/21/2022      4
 5     N   YJ    D  4/21/2022      4
 6     Y   WE    D  5/25/2022      5
 7     Y   WE    S   5/7/2022      5

  1. Create a pivot table with Month as the index, the values of Col3 (i.e., S and D) as columns, and counts as the cell values:
df1 = df.groupby(['Month', 'Col3'])\
        .size()\
        .unstack(fill_value=0)\
        .reset_index()
 Col3  Month  D  S
 0         2  0  1
 1         3  3  1
 2         4  4  0
 3         5  1  1
  1. Plot the results
plt.plot(df1.Month, df1.D, label='D')
plt.plot(df1.Month, df1.S, label='S')
plt.xlabel('Month')
plt.ylabel('Count')
plt.legend()
plt.show()

enter image description here

  • Related