Pandas: How to look at the shape of a dataframe?

I want to process the raw data file data_mrna_agilent_microarray_zscores_ref_all_samples.txt and look at the shape of the resulting dataframe.

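import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

dir = "/content/gdrive/MyDrive/Cancer_Pathways/gbm_tcga/"

class DataProcessing:

    def __init__(self, data, header=0):
        # Load the tab-separated file into a DataFrame stored on the instance
        self.df = pd.read_csv(data, sep="\t", header=header)

    def split_data(self):
        # All columns except the last are features; the last column is the target
        X = self.df.iloc[:, :-1]
        y = self.df.iloc[:, -1]
        return X, y

    def kegg_genes(self):
        # do something (placeholder: returns the instance unchanged)
        return self

    def pathways(self):
        # do something (placeholder: returns the instance unchanged)
        return self

    def pca(self):
        # Same slice as before: skip the first row and the first three columns
        values = self.df.iloc[1:, 3:].to_numpy(dtype=float)
        # PCA cannot fit NaN or infinite values, so skip fitting if any are present
        if not np.all(np.isfinite(values)):
            return None
        pca = PCA()
        pca.fit(values)
        self.pca_components = pca.components_
        return self.pca_components

rna = DataProcessing(dir + "data_mrna_agilent_microarray_zscores_ref_all_samples.txt")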
rna.shape

Traceback:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-11-ee837472fd17> in <module>()
     33 
     34 rna = DataProcessing(dir + "data_mrna_agilent_microarray_zscores_ref_all_samples.txt")
---> 35 rna.shape

AttributeError: 'DataProcessing' object has no attribute 'shape'

Answer:

This is more of a Python question than a pandas question. DataFrames do have a shape attribute, but it belongs to the DataFrame, not to your own custom class. Your DataFrame is stored in rna.df, so to get its shape you need to access that attribute instead: rna.df.shape
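If you would rather have the wrapper itself answer rna.shape, one option is a read-only property that forwards to the stored DataFrame. This is a minimal sketch, not the asker's code: only the constructor is kept, and the in-memory TSV (gene names, sample columns, values) is made up so the example runs without the Google Drive file.

```python
import io

import pandas as pd


class DataProcessing:
    def __init__(self, data, header=0):
        # Load a tab-separated table into the wrapped DataFrame
        self.df = pd.read_csv(data, sep="\t", header=header)

    @property
    def shape(self):
        # Forward to the wrapped DataFrame so rna.shape works like rna.df.shape
        return self.df.shape


# Hypothetical stand-in for the real microarray file: 2 rows x 3 columns
sample = io.StringIO(
    "Hugo_Symbol\tSAMPLE_A\tSAMPLE_B\n"
    "TP53\t0.5\t-1.2\n"
    "EGFR\t2.1\t0.3\n"
)
rna = DataProcessing(sample)
print(rna.df.shape)  # (2, 3)
print(rna.shape)     # (2, 3)
```

A property keeps the class interface explicit. Defining __getattr__ to forward every missing attribute to self.df would also work, but it hides which DataFrame features the class is meant to expose.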

Answer:

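You should use rna.df.shape instead of rna.shape. Note that shape is an attribute, not a method, so it takes no parentheses: rna.df.shape() raises TypeError: 'tuple' object is not callable.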
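A quick check of how shape behaves, using a small made-up DataFrame: it is an attribute holding a (rows, columns) tuple, so it is read without parentheses, and calling it fails.

```python
import pandas as pd

# Small illustrative DataFrame (made-up values)
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# shape is an attribute holding a (rows, columns) tuple
rows, cols = df.shape
print(rows, cols)  # 3 2

# Calling it like a method fails because a tuple is not callable
error_message = None
try:
    df.shape()
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # 'tuple' object is not callable
```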

Answer:

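If you want NumPy's function instead, pass it the DataFrame, not the wrapper object:

np.shape(rna.df)

This returns the same tuple as rna.df.shape (it requires import numpy as np). Calling np.shape(rna) on the DataProcessing instance does not raise an error, but it returns (), because NumPy treats an arbitrary object as a 0-dimensional array.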
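A short sketch comparing np.shape on the DataFrame with np.shape on the wrapper object. Wrapper below is a hypothetical minimal stand-in for DataProcessing that only holds a DataFrame, and the data are made up.

```python
import numpy as np
import pandas as pd


class Wrapper:
    # Hypothetical minimal stand-in for DataProcessing: just holds a DataFrame
    def __init__(self, df):
        self.df = df


rna = Wrapper(pd.DataFrame({"x": [1, 2], "y": [3, 4], "z": [5, 6]}))

# np.shape reads the DataFrame's own shape attribute
print(np.shape(rna.df))  # (2, 3)

# The wrapper has no shape attribute, so NumPy falls back to np.asarray(rna),
# which wraps the object in a 0-d object array
print(np.shape(rna))     # ()
```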