How to visualize categorical frequency difference

Time:10-24

Data: a diabetes dataset (the answer below reads a CSV copy of it from GitHub).

My current plot is where I need help: you can't really make heads or tails of it. There is a class imbalance in this age group (under 30): 312 samples are non-diabetic while only 84 are diabetic. How can I adjust the plot to better depict this class imbalance?

Answer:

  • The difference in 'Outcome' for each 'Age' is easiest to see with a bar plot of the counts. This can be done directly with seaborn.countplot, or by calculating the counts in pandas and plotting with pandas.DataFrame.plot.
  • Tested in Python 3.8.12, pandas 1.3.3, matplotlib 3.4.3, seaborn 0.11.2

Data and Imports

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# data
df = pd.read_csv('https://raw.githubusercontent.com/LahiruTjay/Machine-Learning-With-Python/master/datasets/diabetes.csv')

# filter for less than 30
u30 = df[df.Age.lt(30)]
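To put numbers on the class imbalance before plotting, `value_counts` on the filtered frame gives the count of each `Outcome`, and `normalize=True` gives each class's share. This is a minimal sketch: it reads a short inline sample (the first eight rows of the Data listing in this answer, held in a hypothetical `csv_text` string) so it runs offline. With the full `u30` from above, the counts should match the 312 non-diabetic and 84 diabetic samples mentioned in the question.

```python
import io

import pandas as pd

# Assumption: a small inline sample (first rows of the Age,Outcome listing)
# stands in for the CSV downloaded from GitHub.
csv_text = """Age,Outcome
21,0
26,1
29,0
27,0
29,1
22,0
28,1
22,0
"""
df = pd.read_csv(io.StringIO(csv_text))

# same filter as in the answer: keep only rows where Age < 30
u30 = df[df.Age.lt(30)]

# number of rows per class (0 = not diabetic, 1 = diabetic)
counts = u30["Outcome"].value_counts().sort_index()
# fraction of rows per class, easier to compare when the classes are imbalanced
shares = u30["Outcome"].value_counts(normalize=True).sort_index()

print(counts)
print(shares.round(3))
```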

Use seaborn.countplot
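A minimal sketch of the `seaborn.countplot` approach: with `hue="Outcome"`, seaborn counts the rows itself and draws side-by-side bars for each `Age`, so the gap between non-diabetic and diabetic counts is visible at every age. The inline sample in the hypothetical `csv_text` string is an assumption standing in for `u30`; with the real data, pass the `u30` frame from the Data and Imports step instead.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Assumption: inline sample standing in for u30 (first rows of the data listing)
csv_text = """Age,Outcome
21,0
26,1
29,0
27,0
29,1
22,0
28,1
22,0
"""
u30 = pd.read_csv(io.StringIO(csv_text))

fig, ax = plt.subplots(figsize=(10, 5))
# one bar per (Age, Outcome) combination; bar height = number of rows
sns.countplot(data=u30, x="Age", hue="Outcome", ax=ax)
ax.set(title="Outcome counts per Age (under 30)", ylabel="Count")
plt.show()
```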

Calculate the counts with pandas and plot with pandas.DataFrame.plot
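A minimal sketch of the pandas route. Here `pd.crosstab` tabulates `Age` against `Outcome` (one of several ways to get these counts; `groupby` followed by `unstack` also works), and `DataFrame.plot` with `kind="bar"` draws one bar per `Outcome` for each `Age`. The inline sample and the `csv_text` name are assumptions standing in for `u30`.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Assumption: inline sample standing in for u30 (first rows of the data listing)
csv_text = """Age,Outcome
21,0
26,1
29,0
27,0
29,1
22,0
28,1
22,0
"""
u30 = pd.read_csv(io.StringIO(csv_text))

# rows: Age, columns: Outcome (0 / 1), values: number of samples
counts = pd.crosstab(u30["Age"], u30["Outcome"])

# grouped bar chart: one bar per Outcome column for each Age
ax = counts.plot(kind="bar", rot=0, figsize=(10, 5),
                 title="Outcome counts per Age (under 30)")
ax.set_ylabel("Count")
plt.show()
```

Passing `stacked=True` to `plot` stacks the two outcomes instead, which shows each age's total alongside the split.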

Data

  • In case the data at the GitHub link is no longer available, here are the Age and Outcome columns for the under-30 rows:
Age,Outcome
21,0
26,1
29,0
27,0
29,1
22,0
28,1
22,0
28,0
27,1
26,0
25,1
29,0
22,0
24,0
22,0
26,0
21,0
22,0
21,0
24,0
25,0
27,0
28,1
26,0
23,0
22,0
22,0
27,0
26,1
24,0
22,0
22,0
22,0
27,0
26,0
24,0
21,0
21,0
24,0
22,0
23,0
22,0
21,0
24,0
27,0
21,0
27,0
25,0
24,1
24,1
23,0
25,0
25,0
22,0
21,0
25,1
24,0
23,0
23,1
26,1
23,0
26,0
21,0
22,0
29,0
28,0
22,0
23,0
21,0
22,0
24,0
23,0
21,0
23,0
22,0
27,0
21,0
22,0
29,0
29,0
29,1
25,0
23,0
26,1
23,0
21,0
27,0
25,1
21,0
29,1
21,0
23,1
26,1
29,1
21,0
28,0
27,0
27,0
21,0
25,0
24,0
24,1
25,1
21,1
26,0
22,0
26,0
24,1
24,0
22,1
22,0
29,0
23,0
26,1
23,1
27,0
21,0
22,0
22,1
29,0
23,0
23,0
27,0
24,0
25,0
21,1
25,0
24,0
27,1
24,0
25,1
24,0
21,0
28,1
21,0
21,0
25,0
29,1
23,0
22,0
28,1
29,1
26,0
21,0
25,1
24,1
28,0
29,1
24,0
25,1
28,1
29,0
21,0
25,1
22,0
27,1
25,0
26,0
29,1
28,0
25,1
21,0
24,0
23,1
25,0
22,0
26,0
22,0
22,0
22,0
23,0
26,0
29,0
24,0
21,0
28,1
29,1
29,1
29,1
21,0
22,0
25,1
21,0
21,0
25,0
28,0
22,0
22,0
24,0
22,0
21,0
25,0
25,0
24,0
28,0
27,1
21,0
25,0
22,1
25,0
25,1
26,0
25,0
28,1
28,0
25,0
22,0
21,0
21,1
22,1
22,0
27,0
28,1
26,0
21,0
21,0
21,0
25,0
26,0
23,0
22,0
29,0
29,1
28,0
21,0
22,0
24,0
25,1
28,0
26,0
22,1
26,0
23,0
23,1
25,0
24,0
24,0
26,0
21,0
22,0
25,0
27,0
28,0
22,0
22,0
24,0
29,1
29,0
28,0
23,0
24,1
21,0
28,0
24,0
22,0
25,0
21,0
28,0
21,0
21,0
21,0
22,0
24,0
28,1
25,0
26,0
26,0
24,0
21,0
21,0
24,0
22,0
22,0
24,0
29,0
24,0
23,1
23,0
27,1
25,0
29,0
28,0
21,0
25,0
23,0
28,0
28,1
24,0
27,0
22,0
21,0
21,0
22,0
22,0
23,0
25,0
21,1
21,1
27,0
22,0
29,0
25,0
24,0
25,0
22,1
21,0
26,0
24,0
28,0
21,0
22,1
25,0
27,0
23,0
24,0
26,0
27,0
23,0
24,1
28,0
28,0
21,0
21,0
29,0
21,0
21,0
21,0
24,0
23,0
22,0
23,0
28,0
27,0
24,0
27,0
22,1
23,0
23,0
27,0
28,0
27,0
22,0
25,1
22,0
27,1
22,1
24,0
21,0
22,0
25,0
25,1
23,0
22,0
26,1
22,0
27,1
25,0
22,0
29,0
23,0
23,0
25,0
22,0
28,0
26,0
26,0
27,0
28,0
22,0
23,1
24,0
21,0
24,0
21,0
25,0
22,0
22,0
22,0
22,1
24,1
22,0
28,0
21,0
21,0
26,0
22,0
27,1
22,1
28,0
25,0
26,1
26,0
22,0
27,0
23,0
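If the GitHub file disappears, the listing above can be read straight from a string with `io.StringIO`. This is a minimal sketch: only the first rows are reproduced here, and `csv_text` is a hypothetical name for the string you paste the full listing into. With all rows included, `value_counts` should report the 312 non-diabetic and 84 diabetic samples from the question.

```python
import io

import pandas as pd

# Paste the complete Age,Outcome listing between the triple quotes;
# only the first rows are reproduced here to keep the sketch short.
csv_text = """Age,Outcome
21,0
26,1
29,0
27,0
29,1
"""

u30 = pd.read_csv(io.StringIO(csv_text))
print(u30["Outcome"].value_counts())
```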