How to plot a scatter plot from two different column values in Python?

I have a data frame like this:

import pandas as pd

sample_df = pd.DataFrame({'ID': [25,25,25,18,18,18],
                          'AGE': [11,11,12,11,12,13],
                          'RECORD':[1,2,2,1,1,2]})

ID	AGE	RECORD
25	11	1
25	11	2
25	12	2
18	11	1
18	12	1
18	13	2

I would like to plot the number of profiles versus age for this dataframe. I expect the plot to show a count for each age: for example, age 11 should have 3 profiles, and age 12 should have 2. I tried using df.query, but I ended up confused. Could you help me?

The expected output should look like the plot below. A legend entry for each ID is not necessary.

CodePudding user response：

Try:

sample_df.groupby('AGE')['ID'].count().plot(rot=0, title='Number Of Profiles', marker='o', linestyle='none')

Output:

CodePudding user response：

You can specify a column whose values are used to color the marker points according to a colormap:

sample_df.groupby(['AGE', 'ID']).count().reset_index()\
    .plot.scatter(x='AGE', y='RECORD', c='ID', colormap='viridis')
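To see what actually gets plotted here, it can help to look at the grouped counts before drawing anything. This is only a minimal sketch, assuming the sample_df from the question. It swaps count() for size() so the count column gets an explicit name; N_RECORDS is a hypothetical name chosen for illustration, not something from the answer above.

```python
import pandas as pd

sample_df = pd.DataFrame({'ID': [25, 25, 25, 18, 18, 18],
                          'AGE': [11, 11, 12, 11, 12, 13],
                          'RECORD': [1, 2, 2, 1, 1, 2]})

# Count rows per (AGE, ID) pair. size() counts rows directly, so the
# result does not depend on which other columns the frame has.
# N_RECORDS is a hypothetical column name used only in this sketch.
counts = (sample_df.groupby(['AGE', 'ID'])
                   .size()
                   .reset_index(name='N_RECORDS'))

# Summing over IDs gives the per-age totals described in the question.
per_age = counts.groupby('AGE')['N_RECORDS'].sum()

print(counts)
print(per_age)
```

The counts frame can then be passed to the same call as above, for example counts.plot.scatter(x='AGE', y='N_RECORDS', c='ID', colormap='viridis').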

CodePudding user response：

You can use seaborn, but transform your dataframe first:

import seaborn as sns
import matplotlib.pyplot as plt

df1 = (sample_df.value_counts(['ID', 'AGE']).to_frame('PROFILE')
                .reset_index().astype(str))

sns.scatterplot(data=df1, x='AGE', y='PROFILE', hue='ID')
# OR
sns.catplot(data=df1.sort_values('PROFILE', ascending=True), x='AGE', y='PROFILE', hue='ID')

plt.show()
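Note that .astype(str) above also turns the PROFILE counts into strings, so seaborn treats the y axis as categorical. If you would rather keep the counts numeric, one possible variation (a sketch under that assumption, not the method used in the answer above) is to cast only ID and AGE:

```python
import pandas as pd

sample_df = pd.DataFrame({'ID': [25, 25, 25, 18, 18, 18],
                          'AGE': [11, 11, 12, 11, 12, 13],
                          'RECORD': [1, 2, 2, 1, 1, 2]})

# Same transformation as above, but only ID and AGE are cast to str,
# so PROFILE stays an integer column and plots on a numeric axis.
df1 = (sample_df.value_counts(['ID', 'AGE'])
                .to_frame('PROFILE')
                .reset_index()
                .astype({'ID': str, 'AGE': str}))

print(df1)
print(df1.dtypes)
```

df1 can then go into sns.scatterplot(data=df1, x='AGE', y='PROFILE', hue='ID') exactly as before. Because AGE is a string here, the order of the x axis follows the row order of df1, so sorting by AGE first may be worthwhile.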