Exploratory Data Analysis on Datasets with too many variables

Time:09-19

My question is a little bit theoretical.

I have a dataset with 100 columns. Every EDA method that I use results in a cluttered, unreadable plot. How can I get more interpretable plots and tables from such data?

CodePudding user response:

What aspects of EDA do you wish to visualize? I'd say that you can still do multiple things if you have 100 columns. However, some aspects will require laborious manual inspection.

  • Visualize the correlation matrix as a heatmap, e.g. with Seaborn
  • Matrix plot for missing data (a bar plot or any other type that works is also fine)
  • Boxplots to show the spread/skewness of your variables
  • Simple data summaries such as df.info() and df.describe()
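As a rough illustration of the list above, here is a minimal sketch using pandas and NumPy. The DataFrame is synthetic (random data with one deliberately correlated pair and some missing values), and the names `summary`, `upper` and `top_pairs` are made up for this example. The idea is to turn the wide outputs into tables with one row per variable, which tend to stay readable at 100 columns.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a 100-column dataset (hypothetical data, not the asker's).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(500, 100)),
                  columns=[f"x{i}" for i in range(100)])
df["x1"] = 0.9 * df["x0"] + rng.normal(scale=0.1, size=500)      # one correlated pair
df.loc[rng.choice(500, size=50, replace=False), "x2"] = np.nan   # some missing values

# One row per variable: describe() transposed, plus missing fraction and skewness.
summary = df.describe().T
summary["missing_frac"] = df.isna().mean()
summary["skew"] = df.skew()

# Correlation matrix, reduced to a ranked table of the strongest pairs.
corr = df.corr()
# Keep only the upper triangle so each pair appears once and the diagonal is dropped.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
top_pairs = upper.stack().dropna().sort_values(key=np.abs, ascending=False).head(10)

print(summary.sort_values("missing_frac", ascending=False).head())
print(top_pairs)
```

The full `corr` matrix can still be passed to `seaborn.heatmap` if Seaborn is installed; at 100 columns it usually helps to turn off the cell annotations and read the ranked `top_pairs` table alongside it.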

CodePudding user response:

@Zine

Try using only the variables you need in the visualizations.


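You can also use Principal Component Analysis (PCA) to reduce the number of variables. It replaces the original columns with a smaller set of uncorrelated components that retain most of the variance in the data, although each component is a linear combination of the original variables and so can be harder to interpret. Some links for learning PCA: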

  1. https://www.sartorius.com/en/knowledge/science-snippets/what-is-principal-component-analysis-pca-and-how-it-is-used-507186

  2. https://www.geeksforgeeks.org/ml-principal-component-analysispca/

  3. https://www.machinelearningplus.com/machine-learning/principal-components-analysis-pca-better-explained/
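To make the PCA suggestion concrete, here is a minimal sketch that computes it with NumPy's SVD on synthetic data. The data (100 columns driven by 3 hidden factors), the 95% variance threshold, and the variable names are all assumptions for illustration. In practice `sklearn.decomposition.PCA` from scikit-learn does the same job with less code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 100 observed columns generated from 3 hidden factors plus noise.
latent = rng.normal(size=(500, 3))
X = latent @ rng.normal(size=(3, 100)) + 0.1 * rng.normal(size=(500, 100))

# Standardize each column so that no variable dominates because of its scale.
# (Assumes no constant columns; those would have zero standard deviation.)
Xs = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via SVD: rows of Vt are the principal directions, S**2 is proportional
# to the variance captured by each component.
U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
explained_ratio = S**2 / np.sum(S**2)
cumulative = np.cumsum(explained_ratio)

# Smallest number of components whose cumulative variance reaches 95%
# (the threshold is an arbitrary choice for this sketch).
k = int(np.searchsorted(cumulative, 0.95) + 1)

# Project the data onto the first k components; these k columns can be
# plotted (e.g. as a scatter plot of the first two) instead of all 100.
scores = Xs @ Vt[:k].T

print("components kept:", k)
print("cumulative variance of first 5:", np.round(cumulative[:5], 3))
```

A plot of `cumulative` against the component index (a scree-style plot) is a common way to decide how many components to keep before looking at the projected data.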
