Home > database >  Calculate column value count as a bar plot in Python dataframe
Calculate column value count as a bar plot in Python dataframe

Time:06-09

I have time series data and want to see total number of Septic (1) and Non-septic (0) patients in the SepsisLabel column. The Non-septic patients don't have entries of '1'. While the Septic patients have first 'Zeros (0)' then it changes to '1' means it now becomes septic. The data looks like this:

HR SBP DBP SepsisLabel Gender P_ID
92 120 80 0 0 0
98 115 85 0 0 0
93 125 75 0 1 1
95 130 90 0 1 1
93 125 75 1 1 1
95 130 90 1 1 1
93 125 75 1 1 1
95 130 90 1 1 1
102 120 80 0 0 2
109 115 75 0 0 2
94 135 100 0 0 2
97 100 70 0 0 3
85 120 80 0 0 3
88 115 75 1 0 3
93 125 85 1 0 3
78 130 90 1 1 4
115 140 110 1 1 4

Here, there are 3 Septic patients (P_ID = 1, 3, 4) and 2 Non-septic patients (P_ID = 0, 2). I want to plot this number as a bar plot. So, I manually did this using the following code:

pic

import matplotlib.pyplot as plt
fig = plt.figure(figsize=(7, 6))
ax = fig.add_axes([0,0,1,1])
sepsis = ['Non-Septic patients', 'Septic patients']
count = [2, 3]
ax.bar(sepsis, count)

ax.set_title("Septic and Non-septic patient count in the dataset", y = 1, fontsize = 15)
ax.set_xlabel('Patients', fontsize = 12)
ax.set_ylabel('Count', fontsize = 12)

for bars in ax.containers:
    ax.bar_label(bars)
ax.margins(y=0.1)  
plt.show()

  1. However, I don't want to manually calculate the septic and non-septic patient count as the data I have is very large. This is just the dummy data. I know I must use P_ID column but not sure how.
  2. Second thing I want to plot is Out of these septic and non-septic patients, how many are Male (1) and Female (0) based on the Gender column. I want something like this graph:

pic2

****Update****

Using drop_duplicates keeps only first row by default. So, the septic patient which has initially 0s then it changes to 1, there arise problem for them. Using the code only take first row even the patient is septic. Thus total number of septic patients drops, while number of non-septic patients increases, which shouldn't. Is it possible to keep only those rows in septic patients where 0 changes to 1? So, all septic patients will have 1 in SepsisLabel in their first row instead of 0. This will give the correct number of septic patients.

CodePudding user response:

For 1) use np.where. For 2), you can use seaborn for the second purpose:

dedup = df.groupby('P_ID')[['SepsisLabel', 'Gender']].max().reset_index()

dedup['SepticType'] = np.where(dedup.SepsisLabel, 'Septic', 'NonSeptic')
sns.countplot(data=dedup, x='SepticType', hue='Gender')

Output:

enter image description here

  • Related