How to drop a NAN value based on multiple conditions

Time:09-02

I am importing data from a CSV file into pandas.

This gives me the following data:

Schooltyp  Name     ParentName  ParentAddress
Public     Tom      John        Nanostreet
Private    Bill     Sally       NaN
Public     Ron      Tony        Burystreet
Public     Danny    Nate        NaN
Private    Stewart  Ben         PringleStreet

I need to remove the rows where Schooltyp is Public and ParentAddress is null.

I have tried several solutions. The latest one, shown here, raises an error because of the condition I chose (data['ParentAddress'].isnull):

data = pd.read_csv("dataload.csv", sep = ';',  error_bad_lines=False )

indexnames = data[(data['Schooltype']=='Public') & (data['ParentAddress'].isnull)].indexes
data.drop(indexnames, inplace = True)

data.to_csv('finaldata.csv', sep=';', index=False)

Am I using the right approach, or is there a better way of doing this?

CodePudding user response:

To remove all rows where Schooltyp is "Public" and ParentAddress is null:

should_be_removed = (df['Schooltyp'] == 'Public') & df['ParentAddress'].isna()

df.loc[~ should_be_removed]

Result:

  Schooltyp     Name ParentName  ParentAddress
0    Public      Tom       John     Nanostreet
1   Private     Bill      Sally            NaN
2    Public      Ron       Tony     Burystreet
4   Private  Stewart        Ben  PringleStreet
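
For reference, here is a self-contained sketch that rebuilds the sample data from the question and applies the mask shown above. The DataFrame is typed in by hand instead of being read from dataload.csv, and the names filtered, index_names, and dropped are only illustrative. It also shows the drop-based approach from the question, assuming the intent was to call .isnull() as a method, use .index rather than .indexes, and spell the column Schooltyp as in the data.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; NaN marks a missing address.
df = pd.DataFrame({
    "Schooltyp": ["Public", "Private", "Public", "Public", "Private"],
    "Name": ["Tom", "Bill", "Ron", "Danny", "Stewart"],
    "ParentName": ["John", "Sally", "Tony", "Nate", "Ben"],
    "ParentAddress": ["Nanostreet", np.nan, "Burystreet", np.nan, "PringleStreet"],
})

# Boolean mask: True for rows that are Public AND have no address.
should_be_removed = (df["Schooltyp"] == "Public") & df["ParentAddress"].isna()

# Option 1: keep every row the mask does not select.
filtered = df.loc[~should_be_removed]

# Option 2: collect the labels of the unwanted rows and drop them.
# Note the parentheses on .isnull() and the attribute name .index.
index_names = df[(df["Schooltyp"] == "Public") & df["ParentAddress"].isnull()].index
dropped = df.drop(index_names)

print(filtered)
```

Both options return a new DataFrame and leave df unchanged, so the result has to be assigned (or written out with to_csv) to be kept.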

Notes:

.ne() is the method equivalent of !=; it can save a pair of parentheses when chaining conditions.

There is also a method .eq(), which is the equivalent of ==.

To invert a condition, you can put ~ before it.
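
A small sketch of these notes, using a made-up Series named school (not part of the question's data) to show that the method forms and the operators give the same masks:

```python
import pandas as pd

# Hypothetical three-row column, only for illustration.
school = pd.Series(["Public", "Private", "Public"])

is_public = school.eq("Public")    # same result as school == "Public"
not_public = school.ne("Public")   # same result as school != "Public"
inverted = ~is_public              # ~ flips every True/False in the mask

print(is_public.tolist())
print(not_public.tolist())
print(inverted.tolist())
```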
