How does the Pandas Histogram Data Get to the Graph without Passing it In?

Time:12-12

Pretty straightforward question here.

I'm loading data from a CSV file. The age column is then converted into a histogram. Finally, I show the graph, and the data appears in it.

For the life of me, though, I don't understand how matplotlib's plt gets the data from the pandas command dftrain.age.hist() without my explicitly passing it in.

Is hist an extension method? That's the only thing that makes sense to me currently.

import pandas as pd
import matplotlib.pyplot as plt

# Load the CSV data

## Training data
dftrain = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/train.csv')

# Generate a histogram of ages
dftrain.age.hist()

# Show the graph
plt.show()


Answer:

So, according to this article, I think the pandas hist function internally calls the matplotlib hist function:

https://www.educba.com/pandas-hist/

The pandas hist() function is used to draw histograms in Python with the pandas library. A histogram is a representation of the distribution of data. This function calls matplotlib.pyplot.hist() on each series in the DataFrame, resulting in one histogram per column.
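To make that concrete: for a single column, the bars pandas draws should match what you get by dropping the missing values yourself and handing them to plt.hist() with the same default of 10 bins. This is a minimal sketch that approximates what pandas does, not pandas' actual source; the `ages` Series is made-up sample data standing in for dftrain.age, and the Agg backend is chosen only so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data standing in for dftrain.age (one missing value).
ages = pd.Series([22.0, 38.0, None, 35.0, 54.0, 2.0, 27.0, 35.0], name="age")

# 1) The pandas way: the data comes from the Series itself.
pandas_ax = ages.hist()
pandas_heights = [bar.get_height() for bar in pandas_ax.patches]

# 2) Roughly the same thing done by hand on a fresh figure:
#    drop missing values and pass them to plt.hist with 10 bins.
plt.figure()
counts, edges, bars = plt.hist(ages.dropna(), bins=10)
manual_heights = [bar.get_height() for bar in bars]

print(pandas_heights == manual_heights)  # True
```

Both paths end up in matplotlib's hist on the same values, which is why the bar heights agree.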

Answer:

Every pandas DataFrame in Python is an instance of a class, so you can access its attributes just as you would with any other Python object.

So when you write dftrain.age, you are accessing that column (a pandas Series), and dftrain.age.hist() generates a histogram of the values in that column.

For example:

import pandas as pd
  
# Creating the DataFrame
df = pd.DataFrame({'Weight':[45, 88, 56, 15, 71],
                   'Name':['Sam', 'Andrea', 'Alex', 'Robin', 'Kia'],
                   'Age':[14, 25, 55, 8, 21]})

print("Type of a dataframe: ", type(df))
print("Type of a dataframe column: ", type(df.Age))
print("Printing that column\n", df.Age)

Output will be this:

Type of a dataframe:  <class 'pandas.core.frame.DataFrame'>
Type of a dataframe column:  <class 'pandas.core.series.Series'>
Printing that column
 0    14
1    25
2    55
3     8
4    21
Name: Age, dtype: int64
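Putting the two answers together: df.Age is a Series, and hist() is an ordinary method defined on the Series class (Python has no extension methods). As far as I can tell from pandas' matplotlib plotting backend, when no `ax` is passed it draws onto pyplot's current figure (creating one if none exists) and that figure's current axes, then returns that Axes object. pyplot tracks that figure globally, so a later plt.show() displays it. A minimal sketch under those assumptions, using the non-interactive Agg backend so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window needed to run this sketch
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.axes import Axes

df = pd.DataFrame({'Weight': [45, 88, 56, 15, 71],
                   'Name': ['Sam', 'Andrea', 'Alex', 'Robin', 'Kia'],
                   'Age': [14, 25, 55, 8, 21]})

# No data and no axes are handed to matplotlib explicitly here.
ax = df.Age.hist()

# The return value is a matplotlib Axes...
print(isinstance(ax, Axes))      # True
# ...and it is the Axes/figure pyplot currently tracks,
# which is what plt.show() (or plt.savefig()) renders.
print(ax is plt.gca())           # True
print(ax.figure is plt.gcf())    # True
print(len(ax.patches))           # 10 (one bar per bin; default bins=10)
```

If you prefer to be explicit, create the axes yourself and pass them in with the documented `ax` parameter, e.g. `fig, ax = plt.subplots()` followed by `df.Age.hist(ax=ax)`.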