I need to check data distributions of all my numeric columns in a dataset. I chose Fitter
library to do so. I loop over all columns but have only one plot summary table as an outcome instead. What is wrong with my code?
from fitter import Fitter
import numpy as np
df_numeric = df.select_dtypes(include=np.number).sample(n=5000)
num_cols = df_numeric.columns.tolist()
distr = ['cauchy',
'chi2',
'expon',
'exponpow',
'gamma',
'beta',
'lognorm',
'logistic',
'norm',
'powerlaw',
'rayleigh',
'uniform']
for col in num_cols:
modif_col = df_numeric[col].fillna(0).values
dist_fitter = Fitter(modif_col, distributions=distr)
dist_fitter.fit()
dist_fitter.summary()
Maybe there is another approach to check distributions in a loop?
CodePudding user response:
It looks like your code is correctly looping over all the numeric columns in the dataframe, fitting different distributions to each column using the Fitter library, and then printing a summary of the fitting results. However, you're only seeing one plot and summary table as the outcome because you're overwriting the plot and summary table for each iteration of the loop.
To see a separate plot and summary table for each column, you should move the calls to dist_fitter.summary() and dist_fitter.plot() inside the loop and make sure to give each plot and summary table a unique name or title, so you can distinguish them when viewing them.
Here is the code example you can use it
import matplotlib.pyplot as plt
for col in num_cols:
modif_col = df_numeric[col].fillna(0).values
dist_fitter = Fitter(modif_col, distributions=distr)
dist_fitter.fit()
plt.figure()
dist_fitter.plot()
plt.title(col)
plt.show()
print(col)
dist_fitter.summary()