How to display different columns and remove them using pandas

[Image: data table]

I have a CSV file like this with 10,000 different parameters (columns). Some of the parameters are empty, and some contain only combinations of 0 and 1. I want to display the parameters that contain only 0 and 1, remove the empty parameters from the table, and then display the table without any NA, NaN, or empty values.

Any help would be appreciated.

Answer 1:

You can first drop the columns (parameters) that are entirely empty, and then select the columns that contain only 0 and 1 values.

To get the names of the columns in which every value is null:

df.columns[df.isna().all()]
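A minimal sketch of this step on a small made-up DataFrame; the column names p1 to p4 are hypothetical stand-ins for the question's parameters:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: p2 is entirely empty, p1 and p3 hold only 0/1.
df = pd.DataFrame({
    "p1": [0, 1, 1, 0],
    "p2": [np.nan, np.nan, np.nan, np.nan],
    "p3": [1, 1, 0, 1],
    "p4": [3.5, np.nan, 2.0, 7.1],
})

# isna() gives a boolean frame; .all() reduces each column to True
# only when every value in that column is missing.
empty_cols = df.columns[df.isna().all()]
print(list(empty_cols))  # ['p2']
```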

Next, drop the all-null columns and keep only the columns whose values are all 0 or 1:

df.dropna(how='all', axis=1, inplace=True)  # drop columns where every value is NaN
df.loc[:, ((df == 0) | (df == 1)).all()]    # keep columns whose values are all 0 or 1
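Putting both steps together on a made-up DataFrame (all column names are hypothetical). This sketch returns new frames instead of using inplace=True, so the original df stays unchanged:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the question's CSV.
df = pd.DataFrame({
    "p1": [0, 1, 1, 0],
    "p2": [np.nan] * 4,              # entirely empty column
    "p3": [1, 1, 0, 1],
    "p4": [3.5, np.nan, 2.0, 7.1],   # neither empty nor 0/1
    "p5": [0, 1, np.nan, 1],         # 0/1 but with a missing value
})

# Step 1: drop columns where every value is NaN (returns a new frame).
non_empty = df.dropna(how="all", axis=1)

# Step 2: keep only columns whose every value equals 0 or 1.
# NaN compares unequal to both, so partially missing columns like p5 are excluded.
binary = non_empty.loc[:, ((non_empty == 0) | (non_empty == 1)).all()]
print(binary)
```

If you would rather keep 0/1 columns that have a few gaps (like p5), one option not covered by the answer is to test (non_empty.isin([0, 1]) | non_empty.isna()).all() instead; the resulting table would then still contain NaN values.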

Answer 2:

Putting Together the DataFrame

To get started, let's put together a sample DataFrame to use in the rest of this answer:

import pandas as pd

df = pd.DataFrame({'Name': ['Nik', 'Jim', 'Alice', 'Jane', 'Matt', 'Kate'],
                   'Score': [100, 120, 96, 75, 68, 123],
                   'Height': [178, 180, 160, 165, 185, 187],
                   'Weight': [180, 175, 143, 155, 167, 189]})

print(df.head())

Calling df.head() shows the first five rows of the DataFrame:


    Name  Score  Height  Weight
0    Nik    100     178     180
1    Jim    120     180     175
2  Alice     96     160     143
3   Jane     75     165     155
4   Matt     68     185     167

Columns (or rows) can then be removed with DataFrame.drop, whose signature is:

DataFrame.drop(labels=None, *, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
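As a sketch of how drop could serve the question, assuming the small made-up DataFrame below (hypothetical column names), you can pass the names of the all-null columns to the columns parameter:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: p2 is entirely empty.
df = pd.DataFrame({
    "p1": [0, 1, 1],
    "p2": [np.nan, np.nan, np.nan],
    "p3": [1, 0, 1],
})

# Names of columns in which every value is missing.
empty_cols = df.columns[df.isna().all()]

# drop(columns=...) removes those columns and returns a new DataFrame.
cleaned = df.drop(columns=empty_cols)
print(cleaned.columns.tolist())  # ['p1', 'p3']
```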

Answer 3:

Load your CSV:

For this, we will use the pandas library.

import pandas as pd
df = pd.read_csv(path_to_your_file)  # path_to_your_file: the path to your CSV file
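Since path_to_your_file is a placeholder, here is a self-contained sketch that reads a small in-memory CSV (made-up data) through io.StringIO. By default, read_csv parses blank fields as NaN, so an empty parameter becomes an all-NaN column:

```python
import io

import pandas as pd

# Made-up CSV text: column b has no values at all, c has one blank cell.
csv_text = """a,b,c
0,,1
1,,
1,,0
"""

df = pd.read_csv(io.StringIO(csv_text))

# Blank fields are read as NaN, so b is all-null and c has one NaN.
print(df.columns[df.isna().all()].tolist())  # ['b']
```

If some "empty" cells in your file actually contain spaces, one common option is df.replace(r'^\s*$', np.nan, regex=True) (with numpy imported as np) before checking for nulls.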

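With pandas.DataFrame.isin():

It returns a boolean DataFrame showing whether each element exactly matches one of the passed values. Indexing df with that mask replaces every non-matching element with NaN, so dropping the columns that contain any NaN (dropna(axis=1)) keeps only the columns made up entirely of 0 and 1. Empty columns are removed at the same time, because they are all NaN. In one line: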

df[df.isin([0, 1])].dropna(axis=1)
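A small sketch on made-up data (hypothetical column names) showing the intermediate mask and the final result:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "p1": [0, 1, 1, 0],
    "p2": [np.nan] * 4,             # entirely empty
    "p3": [1, 1, 0, 1],
    "p4": [3.5, np.nan, 2.0, 7.1],  # other values
})

# Boolean DataFrame: True where the element is exactly 0 or 1.
mask = df.isin([0, 1])

# Indexing with the mask keeps matching values and puts NaN everywhere else;
# dropna(axis=1) then removes every column containing at least one NaN.
result = df[mask].dropna(axis=1)
print(result.columns.tolist())  # ['p1', 'p3']
```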

I think this may be faster than the first answer's approach with the .all() condition, but it is worth timing both approaches on your own dataset.
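A rough sketch of such a timing comparison using the standard-library timeit module. The synthetic frame below (1,000 rows by 200 columns: a mix of random 0/1, random 0 to 9, and entirely empty columns) is an assumption, not the asker's data, and the timings will vary by machine and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_rows = 1_000

# Synthetic data: a mix of binary, non-binary, and entirely empty columns.
cols = {}
for i in range(200):
    kind = i % 3
    if kind == 0:
        cols[f"bin_{i}"] = rng.integers(0, 2, n_rows)     # only 0 and 1
    elif kind == 1:
        cols[f"num_{i}"] = rng.integers(0, 10, n_rows)    # values 0 to 9
    else:
        cols[f"empty_{i}"] = np.full(n_rows, np.nan)      # all missing
df = pd.DataFrame(cols)


def with_all(frame):
    # First answer: drop all-null columns, then keep columns that are all 0/1.
    non_empty = frame.dropna(how="all", axis=1)
    return non_empty.loc[:, ((non_empty == 0) | (non_empty == 1)).all()]


def with_isin(frame):
    # Third answer: mask non-0/1 values to NaN, then drop columns with any NaN.
    return frame[frame.isin([0, 1])].dropna(axis=1)


# Both approaches should select the same columns on this data.
assert with_all(df).columns.equals(with_isin(df).columns)

for func in (with_all, with_isin):
    seconds = timeit.timeit(lambda: func(df), number=20)
    print(f"{func.__name__}: {seconds / 20 * 1000:.2f} ms per call")
```

Note that the isin version can return the kept columns as floats (for example 1.0 instead of 1) when masking introduces NaN into a shared integer block, so compare column selections rather than dtypes.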