I have a CSV file with 10,000 different parameters (columns). Some of the parameters are empty, and some contain only 0 and 1 values. I want to display the parameters that contain only 0 and 1, remove the empty parameters from the table, and then display the table without NA, NaN, or empty values.
Any help will be appreciated.
CodePudding user response:
You can first drop the columns (parameters) that are empty and then select the columns that contain only 1 or 0 values.
To get the names of the columns whose values are all null:
df.columns[df.isna().all()]
Next, drop the all-null columns and keep only the columns in which every value is 0 or 1:
df.dropna(how='all', axis=1, inplace=True)
df.loc[:, ((df == 0) | (df == 1)).all()]
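The steps above can be put together as a minimal sketch. The column names and values here are hypothetical stand-ins for the real CSV, which is not shown in the question:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real CSV
df = pd.DataFrame({
    "p1": [0, 1, 1, 0],      # only 0 and 1
    "p2": [np.nan] * 4,      # completely empty
    "p3": [5, 0, 1, 2],      # contains other values
    "p4": [1, 1, 0, 1],      # only 0 and 1
})

# Names of the columns in which every value is null
empty_cols = df.columns[df.isna().all()]

# Drop the all-null columns
df = df.dropna(how="all", axis=1)

# Keep only the columns in which every value is 0 or 1
binary = df.loc[:, ((df == 0) | (df == 1)).all()]

print(list(empty_cols))
print(binary)
```

Note that a column mixing 0/1 with NaN is not selected either, because NaN compares unequal to both 0 and 1, so the result contains no missing values.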
CodePudding user response:
Putting Together the Dataframe
To get started, let’s put together a sample dataframe to experiment with:
import pandas as pd

df = pd.DataFrame({'Name': ['Nik', 'Jim', 'Alice', 'Jane', 'Matt', 'Kate'],
                   'Score': [100, 120, 96, 75, 68, 123],
                   'Height': [178, 180, 160, 165, 185, 187],
                   'Weight': [180, 175, 143, 155, 167, 189]})
print(df.head())
Calling df.head() shows the dataframe’s first five rows:
    Name  Score  Height  Weight
0    Nik    100     178     180
1    Jim    120     180     175
2  Alice     96     160     143
3   Jane     75     165     155
4   Matt     68     185     167
Columns or rows can then be removed with DataFrame.drop, which has the following signature:
DataFrame.drop(self, labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
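As a rough illustration of how those parameters are used, here is a small sketch on a shortened version of the sample dataframe. The 'Missing' column name is a hypothetical label chosen only to show the errors parameter:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Nik', 'Jim', 'Alice'],
                   'Score': [100, 120, 96],
                   'Height': [178, 180, 160]})

# Drop one column by name; df itself is left unchanged (inplace=False)
without_height = df.drop(columns=['Height'])

# Drop a row by its index label
without_first = df.drop(index=[0])

# errors='ignore' skips labels that do not exist instead of raising KeyError
same = df.drop(columns=['Missing'], errors='ignore')

print(without_height)
print(without_first)
```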
CodePudding user response:
- Load your CSV:
For this we will use the pandas library.
import pandas as pd
df = pd.read_csv(path_to_your_file)
- With pandas.DataFrame.isin():
It returns a boolean DataFrame showing whether each element exactly matches one of the passed values. Indexing df with that mask replaces every other value with NaN, and dropna(axis=1) then drops every column that contains a NaN. You can do it in one line:
df[df.isin([0, 1])].dropna(axis=1)
I think it will be faster than the first answer and its condition with .all(). A time comparison on your dataset may be helpful.
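One way to run such a comparison is with the standard-library timeit module. This is only a sketch on synthetic data: the shape (1000 rows, 200 columns), the column names, and the helper names via_all and via_isin are assumptions made for illustration, and the timings on a real file may differ.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical data: 100 columns of 0/1 followed by 100 columns of values 0..4
binary_part = rng.integers(0, 2, size=(1000, 100))
other_part = rng.integers(0, 5, size=(1000, 100))
df = pd.DataFrame(np.hstack([binary_part, other_part]),
                  columns=[f"p{i}" for i in range(200)])


def via_all(frame):
    # First answer: keep columns in which every value is 0 or 1
    return frame.loc[:, ((frame == 0) | (frame == 1)).all()]


def via_isin(frame):
    # This answer: mask non-0/1 values to NaN, then drop columns containing NaN
    return frame[frame.isin([0, 1])].dropna(axis=1)


# Both approaches should select the same columns
same_columns = list(via_all(df).columns) == list(via_isin(df).columns)

t_all = timeit.timeit(lambda: via_all(df), number=5)
t_isin = timeit.timeit(lambda: via_isin(df), number=5)
print(f"same columns: {same_columns}")
print(f".all():  {t_all:.4f} s for 5 runs")
print(f".isin(): {t_isin:.4f} s for 5 runs")
```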