How to display different columns and remove them using pandas

Time:08-13

[Image: data table]

I have a CSV file like this with 10,000 different parameters (columns). Some of the parameters are empty, and some contain only combinations of 0 and 1. I want to display the parameters that contain only 0 and 1, remove the empty parameters from the table, and then display the table without NA, NaN, or empty values.

Any help would be appreciated.

Answer:

You can first drop the columns (parameters) that are empty and then select the columns that contain only 0 or 1 values.

To get the names of the columns in which all values are null:

df.columns[df.isna().all()]
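A minimal sketch of that check on a small hypothetical frame (the column names p1, p2 and p3 are made up for illustration): isna() marks each missing cell, all() reduces each column to a single boolean, and indexing df.columns with the result gives the names of the entirely empty columns.

```python
import numpy as np
import pandas as pd

# Hypothetical data: p1 is entirely empty, p2 holds only 0/1, p3 holds other values
df = pd.DataFrame({
    "p1": [np.nan, np.nan, np.nan],
    "p2": [0, 1, 1],
    "p3": [5.2, np.nan, 7.1],
})

# Boolean per column: True where every value in that column is missing
empty_cols = df.columns[df.isna().all()]
print(list(empty_cols))  # ['p1']
```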

Next, drop the all-null columns and keep only the columns whose values are all 0 or 1:

df = df.dropna(how='all', axis=1)  # drop columns in which every value is missing
df = df.loc[:, ((df == 0) | (df == 1)).all()]  # keep columns whose values are all 0 or 1
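The two steps of this answer can be sketched end to end on a small hypothetical frame (column names and values are invented for illustration); the result of each step is assigned to a variable so nothing is modified in place.

```python
import numpy as np
import pandas as pd

# Hypothetical data mimicking the question: an empty column, two 0/1 columns,
# a 0/1 column with one gap, and a column with other values
df = pd.DataFrame({
    "empty": [np.nan, np.nan, np.nan, np.nan],
    "flag_a": [0, 1, 1, 0],
    "flag_b": [1, 1, 0, 0],
    "flag_gap": [0, np.nan, 1, 1],
    "reading": [3.5, 0.0, 1.0, 7.2],
})

# Step 1: drop the columns in which every value is missing
df = df.dropna(how="all", axis=1)

# Step 2: keep only the columns whose every value equals 0 or 1
# (NaN compares unequal to both, so a column with any gap is dropped too)
mask = ((df == 0) | (df == 1)).all()
result = df.loc[:, mask]
print(result)
```

Because NaN compares unequal to both 0 and 1, flag_gap (otherwise 0/1, but with one missing value) is dropped as well, which matches the goal of showing a table with no NA or NaN values.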

Answer:

Putting Together the Dataframe

To get started, let’s put together a sample dataframe with the code below:

import pandas as pd

df = pd.DataFrame({'Name': ['Nik', 'Jim', 'Alice', 'Jane', 'Matt', 'Kate'],
                   'Score': [100, 120, 96, 75, 68, 123],
                   'Height': [178, 180, 160, 165, 185, 187],
                   'Weight': [180, 175, 143, 155, 167, 189]})

print(df.head())

By using the df.head() method, you can see what the dataframe’s first five rows look like:


    Name  Score  Height  Weight
0    Nik    100     178     180
1    Jim    120     180     175
2  Alice     96     160     143
3   Jane     75     165     155
4   Matt     68     185     167

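To remove columns (or rows) by label, use DataFrame.drop(). It returns a new DataFrame unless inplace=True, and its signature is:

DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')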
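Applied to the question, DataFrame.drop() can remove the empty parameters by name. A minimal sketch, assuming a cut-down version of the sample frame above plus a hypothetical, completely empty notes column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Nik", "Jim", "Alice"],
    "Score": [100, 120, 96],
    "notes": [np.nan, np.nan, np.nan],  # hypothetical all-empty column
})

# Names of the columns that contain no values at all
empty_cols = df.columns[df.isna().all()]

# columns=... is shorthand for labels=..., axis=1; df itself is left unchanged
cleaned = df.drop(columns=empty_cols)
print(cleaned.columns.tolist())  # ['Name', 'Score']
```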

Answer:

  1. Load your CSV:

For this, we will use the pandas library.

import pandas as pd    
df = pd.read_csv(path_to_your_file)
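  2. Filter with pandas.DataFrame.isin():

DataFrame.isin() returns a boolean DataFrame showing whether each element matches one of the passed values exactly. Indexing df with that mask turns every non-matching element into NaN, and dropna(axis=1) then drops every column that contains a NaN, which leaves only the columns made up entirely of 0s and 1s. You can do it in one line: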

df[df.isin([0, 1])].dropna(axis=1)
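Both steps of this answer can be sketched together without a file on disk by reading an in-memory string through io.StringIO in place of path_to_your_file (the CSV content and column names are hypothetical). Empty CSV fields are read as NaN by default, so the empty column is removed along with the column that holds values other than 0 and 1.

```python
import io

import pandas as pd

# Hypothetical CSV standing in for path_to_your_file: "empty" has no values,
# flag_a and flag_b hold only 0/1, reading holds other numbers
csv_text = """empty,flag_a,flag_b,reading
,0,1,0.0
,1,0,2.5
,1,0,1.0
"""

# Step 1: load the CSV; empty fields become NaN by default
df = pd.read_csv(io.StringIO(csv_text))

# Step 2: isin() marks cells equal to 0 or 1; indexing with that mask turns every
# other cell (including the existing NaNs) into NaN, and dropna(axis=1) then
# drops every column that contains at least one NaN
result = df[df.isin([0, 1])].dropna(axis=1)
print(result)
```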

I expect this to be faster than the first answer's approach with .all(), but a timing comparison on your own dataset would show which one is faster in practice.
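A rough way to run that comparison is sketched below with timeit on a randomly generated frame. The shape, the column mix, and the helper names via_all and via_isin are assumptions for illustration, not taken from the answers. Both helpers should select the same columns; actual timings depend on your data and pandas version, so none are shown here.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_rows, n_cols = 1_000, 300

# Hypothetical frame: a third of the columns 0/1, a third empty, a third continuous
data = {}
for i in range(n_cols):
    if i % 3 == 0:
        data[f"c{i}"] = rng.integers(0, 2, n_rows)
    elif i % 3 == 1:
        data[f"c{i}"] = np.full(n_rows, np.nan)
    else:
        data[f"c{i}"] = rng.normal(size=n_rows)
df = pd.DataFrame(data)


def via_all(frame):
    # First answer: drop all-empty columns, then keep columns that are all 0 or 1
    frame = frame.dropna(how="all", axis=1)
    return frame.loc[:, ((frame == 0) | (frame == 1)).all()]


def via_isin(frame):
    # isin() answer: mask non-0/1 cells as NaN, then drop any column containing NaN
    return frame[frame.isin([0, 1])].dropna(axis=1)


for func in (via_all, via_isin):
    seconds = timeit.timeit(lambda: func(df), number=20)
    print(f"{func.__name__}: {seconds:.3f} s for 20 runs")
```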
