Data: Diabetes dataset found here:
See figure above. This is where I need help. You can't really make heads or tails of this. There is a class imbalance in this age group: in fact, 312 samples are non-diabetic while only 84 are diabetic. How can I adjust the plot to better depict this class imbalance?
CodePudding user response:
- The difference in 'Outcome' for each 'Age' can most easily be seen with a bar plot showing the count, which can be done directly with seaborn.countplot, or by calculating the counts in pandas and plotting with pandas.DataFrame.plot.
- Tested in python 3.8.12, pandas 1.3.3, matplotlib 3.4.3, seaborn 0.11.2
Data and Imports
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# data
df = pd.read_csv('https://raw.githubusercontent.com/LahiruTjay/Machine-Learning-With-Python/master/datasets/diabetes.csv')
# filter for less than 30
u30 = df[df.Age.lt(30)]
Use seaborn.countplot
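A minimal sketch of the seaborn.countplot approach named above. To keep it runnable without the GitHub link, it assumes a twelve-row excerpt taken from the top of the Age,Outcome listing in this answer; the figure size, title, and axis label are illustrative choices, not part of the original answer.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# assumption: the first twelve rows of the Age,Outcome data listed in this answer
csv_text = """Age,Outcome
21,0
26,1
29,0
27,0
29,1
22,0
28,1
22,0
28,0
27,1
26,0
25,1
"""
u30 = pd.read_csv(io.StringIO(csv_text))

# one group of bars per age, split into one bar per Outcome value
fig, ax = plt.subplots(figsize=(8, 5))
sns.countplot(data=u30, x="Age", hue="Outcome", ax=ax)
ax.set(title="Outcome count by age (under 30)", ylabel="Count")
```

With the full dataset, u30 from the Data and Imports step can be passed to sns.countplot in the same way; call plt.show() to display the figure in a script.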
Use pandas.DataFrame.plot
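A minimal sketch of the pandas route: calculate the counts first, then plot them with pandas.DataFrame.plot. Using pd.crosstab to build the count table is one possible choice and an assumption here (a groupby would work as well); the twelve-row excerpt is again taken from the top of the Age,Outcome listing in this answer, and the title and figure size are illustrative.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# assumption: the first twelve rows of the Age,Outcome data listed in this answer
csv_text = """Age,Outcome
21,0
26,1
29,0
27,0
29,1
22,0
28,1
22,0
28,0
27,1
26,0
25,1
"""
u30 = pd.read_csv(io.StringIO(csv_text))

# rows are ages, columns are the Outcome values (0 and 1), cells are counts
counts = pd.crosstab(u30["Age"], u30["Outcome"])

# grouped bar plot: one pair of bars per age
ax = counts.plot(kind="bar", rot=0, figsize=(8, 5))
ax.set(title="Outcome count by age (under 30)", ylabel="Count")
```

Having the counts in a DataFrame also makes it easy to inspect the imbalance numerically, for example with counts.sum() for the per-class totals.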
Data
- In case the data at the GitHub link is no longer available
Age,Outcome
21,0
26,1
29,0
27,0
29,1
22,0
28,1
22,0
28,0
27,1
26,0
25,1
29,0
22,0
24,0
22,0
26,0
21,0
22,0
21,0
24,0
25,0
27,0
28,1
26,0
23,0
22,0
22,0
27,0
26,1
24,0
22,0
22,0
22,0
27,0
26,0
24,0
21,0
21,0
24,0
22,0
23,0
22,0
21,0
24,0
27,0
21,0
27,0
25,0
24,1
24,1
23,0
25,0
25,0
22,0
21,0
25,1
24,0
23,0
23,1
26,1
23,0
26,0
21,0
22,0
29,0
28,0
22,0
23,0
21,0
22,0
24,0
23,0
21,0
23,0
22,0
27,0
21,0
22,0
29,0
29,0
29,1
25,0
23,0
26,1
23,0
21,0
27,0
25,1
21,0
29,1
21,0
23,1
26,1
29,1
21,0
28,0
27,0
27,0
21,0
25,0
24,0
24,1
25,1
21,1
26,0
22,0
26,0
24,1
24,0
22,1
22,0
29,0
23,0
26,1
23,1
27,0
21,0
22,0
22,1
29,0
23,0
23,0
27,0
24,0
25,0
21,1
25,0
24,0
27,1
24,0
25,1
24,0
21,0
28,1
21,0
21,0
25,0
29,1
23,0
22,0
28,1
29,1
26,0
21,0
25,1
24,1
28,0
29,1
24,0
25,1
28,1
29,0
21,0
25,1
22,0
27,1
25,0
26,0
29,1
28,0
25,1
21,0
24,0
23,1
25,0
22,0
26,0
22,0
22,0
22,0
23,0
26,0
29,0
24,0
21,0
28,1
29,1
29,1
29,1
21,0
22,0
25,1
21,0
21,0
25,0
28,0
22,0
22,0
24,0
22,0
21,0
25,0
25,0
24,0
28,0
27,1
21,0
25,0
22,1
25,0
25,1
26,0
25,0
28,1
28,0
25,0
22,0
21,0
21,1
22,1
22,0
27,0
28,1
26,0
21,0
21,0
21,0
25,0
26,0
23,0
22,0
29,0
29,1
28,0
21,0
22,0
24,0
25,1
28,0
26,0
22,1
26,0
23,0
23,1
25,0
24,0
24,0
26,0
21,0
22,0
25,0
27,0
28,0
22,0
22,0
24,0
29,1
29,0
28,0
23,0
24,1
21,0
28,0
24,0
22,0
25,0
21,0
28,0
21,0
21,0
21,0
22,0
24,0
28,1
25,0
26,0
26,0
24,0
21,0
21,0
24,0
22,0
22,0
24,0
29,0
24,0
23,1
23,0
27,1
25,0
29,0
28,0
21,0
25,0
23,0
28,0
28,1
24,0
27,0
22,0
21,0
21,0
22,0
22,0
23,0
25,0
21,1
21,1
27,0
22,0
29,0
25,0
24,0
25,0
22,1
21,0
26,0
24,0
28,0
21,0
22,1
25,0
27,0
23,0
24,0
26,0
27,0
23,0
24,1
28,0
28,0
21,0
21,0
29,0
21,0
21,0
21,0
24,0
23,0
22,0
23,0
28,0
27,0
24,0
27,0
22,1
23,0
23,0
27,0
28,0
27,0
22,0
25,1
22,0
27,1
22,1
24,0
21,0
22,0
25,0
25,1
23,0
22,0
26,1
22,0
27,1
25,0
22,0
29,0
23,0
23,0
25,0
22,0
28,0
26,0
26,0
27,0
28,0
22,0
23,1
24,0
21,0
24,0
21,0
25,0
22,0
22,0
22,0
22,1
24,1
22,0
28,0
21,0
21,0
26,0
22,0
27,1
22,1
28,0
25,0
26,1
26,0
22,0
27,0
23,0