How to plot scatter plot of given two different column values in python?

Time:02-01

I have a data frame like this:

import pandas as pd

sample_df = pd.DataFrame({'ID': [25,25,25,18,18,18],
                          'AGE': [11,11,12,11,12,13],
                          'RECORD':[1,2,2,1,1,2]})
ID AGE RECORD
25 11 1
25 11 2
25 12 2
18 11 1
18 12 1
18 13 2

I would like to plot the number of profiles against age for this dataframe. I expect a point for each age: for example, age 11 should have 3 profiles and age 12 should have 2 profiles. I tried using df.query, but I ended up confused. Could you help me?

The expected output should look like the image below. A legend for each ID is not necessary.

(image: expected output)

CodePudding user response:

Try:

sample_df.groupby('AGE')['ID'].count().plot(rot=0, title='Number Of Profiles', marker='o', linestyle='none')

Output: a plot titled "Number Of Profiles" with one marker per age.
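To check the numbers behind that plot, the grouped counts can be inspected before plotting. This is a minimal sketch that assumes the sample_df from the question; the name counts is introduced here only for illustration.

```python
import pandas as pd

sample_df = pd.DataFrame({'ID': [25, 25, 25, 18, 18, 18],
                          'AGE': [11, 11, 12, 11, 12, 13],
                          'RECORD': [1, 2, 2, 1, 1, 2]})

# Count rows per age; each row is treated as one profile
counts = sample_df.groupby('AGE')['ID'].count()

print(counts.index.tolist())  # [11, 12, 13]
print(counts.tolist())        # [3, 2, 1]
```

These counts match the question's expectation of 3 profiles at age 11 and 2 at age 12. If a profile were instead meant to be a distinct ID, .nunique() in place of .count() would give 2 for age 11.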

CodePudding user response:

You can specify a column whose values are used to color the marker points according to a colormap:

sample_df.groupby(['AGE', 'ID']).count().reset_index()\
    .plot.scatter(x='AGE', y='RECORD', c='ID', colormap='viridis')
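It may help to look at the intermediate frame that the scatter call receives. In this sketch, which assumes the sample_df from the question and uses the illustrative name grouped, the RECORD column no longer holds the original record values after count(); it holds the number of rows for each (AGE, ID) pair.

```python
import pandas as pd

sample_df = pd.DataFrame({'ID': [25, 25, 25, 18, 18, 18],
                          'AGE': [11, 11, 12, 11, 12, 13],
                          'RECORD': [1, 2, 2, 1, 1, 2]})

# One row per (AGE, ID) pair; RECORD becomes the row count for that pair
grouped = sample_df.groupby(['AGE', 'ID']).count().reset_index()

# Each inner list is [AGE, ID, count]
print(grouped[['AGE', 'ID', 'RECORD']].values.tolist())
# [[11, 18, 1], [11, 25, 2], [12, 18, 1], [12, 25, 1], [13, 18, 1]]
```

So the y axis in this answer shows the count per ID at each age, not the total per age.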


CodePudding user response:

Use seaborn, but transform your dataframe first:

import seaborn as sns
import matplotlib.pyplot as plt

df1 = (sample_df.value_counts(['ID', 'AGE']).to_frame('PROFILE')
                .reset_index().astype(str))

sns.scatterplot(data=df1, x='AGE', y='PROFILE', hue='ID')
# OR
sns.catplot(data=df1.sort_values('PROFILE', ascending=True), x='AGE', y='PROFILE', hue='ID')

plt.show()
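The transformation step can be checked on its own, without seaborn. This sketch assumes the sample_df from the question and a pandas version that provides DataFrame.value_counts; the name rows is illustrative. Note that astype(str) turns every value into a string, so the plotting functions treat AGE, PROFILE and ID as categories rather than numbers.

```python
import pandas as pd

sample_df = pd.DataFrame({'ID': [25, 25, 25, 18, 18, 18],
                          'AGE': [11, 11, 12, 11, 12, 13],
                          'RECORD': [1, 2, 2, 1, 1, 2]})

# Count rows per (ID, AGE) pair, name the count column PROFILE,
# then convert every column to strings
df1 = (sample_df.value_counts(['ID', 'AGE']).to_frame('PROFILE')
                .reset_index().astype(str))

# Sort the rows so the printed order does not depend on how ties are ordered
rows = sorted(df1[['ID', 'AGE', 'PROFILE']].values.tolist())
print(rows)
```

Because PROFILE is a string column, sort_values('PROFILE') orders the values as text; that is harmless for single-digit counts like these but would misorder counts of 10 or more.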

