Plotting complex graph in pandas

Time:01-26

I have the following dataset:

ids             count
1               2000210
2              -23123
3               100
4               500
5               102300120
...
1 million       123213

I want a graph with groups of ids (all ids are unique) on the x axis and count on the y axis, drawn as a distribution chart that looks like the following:

[image: drawing of the desired chart]

How can I achieve this with a pandas DataFrame in Python?

I tried different approaches, but I only get a basic plot, nothing as complex as the drawing.

What I tried:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(1000000, 2), columns=["count", "ids"]).cumsum()

df["range"] = pd.Series(list(range(len(df))))

df.plot(x="range", y="count");

But the plots don't make any sense. I am also new to plotting in pandas. I searched online for a long time for charts like this and could really use some help with such graphs.

CodePudding user response:

For a distribution plot of ids, you can use:

import numpy as np
import pandas as pd

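np.random.seed(seed=123)  # fixed seed so the example is reproducible
df = pd.DataFrame(np.random.randn(1000000), columns=["ids"])
# kind='kde' draws a kernel density estimate; pandas needs SciPy installed for this
df['ids'].plot(kind='kde')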

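If the KDE turns out to be slow on a million rows, a histogram gives a similar picture of the distribution at a much lower cost. Here is a minimal sketch, assuming what you want is the distribution of the count column. The data is synthetic: the column names come from the question, while the values and the output file name are made up. The Agg backend line is only there so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(123)

# Synthetic stand-in for the real data: unique ids, one integer count per id
df = pd.DataFrame({
    "ids": np.arange(1, 100_001),
    "count": rng.normal(loc=0, scale=1000, size=100_000).round().astype(int),
})

fig, ax = plt.subplots()
# density=True rescales the bars so their total area is 1, like a KDE curve
df["count"].plot(kind="hist", bins=100, density=True, ax=ax)
ax.set_xlabel("count")
ax.set_ylabel("density")
fig.savefig("count_hist.png")  # hypothetical output file name
```

The number of bins is a judgment call: more bins show finer detail but give a noisier outline.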

CodePudding user response:

From what I understood from your question and comments, here is what you can do:

1) Import the libraries and set the default theme:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme()

2) Create your dataframe:

df = pd.DataFrame(np.random.randn(1000000, 2), columns=["count", "ids"]).cumsum()
df["range"] = pd.Series(list(range(len(df))))

3) Plot your data:

3.1) Simple take using only the seaborn library:

sns.kdeplot(data=df, x="count", weights="range")

Output:

seaborn_only_solution

3.2) More complex take using seaborn and matplotlib libraries:

sns.histplot(x=df["count"], weights=df["range"], discrete=True,
             color='darkblue', edgecolor='black',
             kde=True, kde_kws={'cut': 2}, line_kws={'linewidth': 4})
plt.ylabel("range")
plt.show()

Output:

seaborn_matplotlib_solution
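If instead you literally want groups of ids on the x axis and count on the y axis, one option is to bin the ids and aggregate count per bin. This is only a sketch under assumptions: the data is synthetic, and the 20 equal-width groups, the mean as the aggregate, and the output file name are my choices, not something stated in the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data: one million unique ids with a count each
df = pd.DataFrame({
    "ids": np.arange(1, 1_000_001),
    "count": rng.integers(-1000, 1000, size=1_000_000),
})

# Split the id range into 20 equal-width intervals (assumed group count)
df["id_group"] = pd.cut(df["ids"], bins=20)

# One value per group; swap mean() for sum() or median() as needed
grouped = df.groupby("id_group", observed=True)["count"].mean()

fig, ax = plt.subplots(figsize=(10, 4))
grouped.plot(kind="bar", ax=ax)
ax.set_xlabel("id group")
ax.set_ylabel("mean count")
fig.tight_layout()
fig.savefig("count_by_id_group.png")  # hypothetical output file name
```

With a million ids, plotting one bar per id is not readable, so some grouping like this is needed before the x axis can show ids at all.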

Personal note: please make sure to check all the solutions. If they are not enough, leave a comment and we will work together to find a solution.
