First, I am importing data from a CSV file into pandas.
Now I have the following data:
Schooltyp | Name | ParentName | ParentAddress |
---|---|---|---|
Public | Tom | John | Nanostreet |
Private | Bill | Sally | NaN |
Public | Ron | Tony | Burystreet |
Public | Danny | Nate | NaN |
Private | Stewart | Ben | PringleStreet |
I need to remove the rows where `Schooltyp` is Public and `ParentAddress` is null.
I have tried several solutions. This is the latest one; it raises an error because of the condition I chose (`data['ParentAddress'].isnull`):
data = pd.read_csv("dataload.csv", sep = ';', error_bad_lines=False )
indexnames = data[(data['Schooltype']=='Public') & (data['ParentAddress'].isnull)].indexes
data.drop(indexnames, inplace = True)
data.to_csv('finaldata.csv', sep=';', index=False)
Am I using the right approach, or is there a better way to do this?
Answer:
To remove all rows where `Schooltyp` is "Public" and `ParentAddress` is null:
should_be_removed = (df['Schooltyp'] == 'Public') & df['ParentAddress'].isna()
df = df.loc[~should_be_removed]
Result:
Index | Schooltyp | Name | ParentName | ParentAddress |
---|---|---|---|---|
0 | Public | Tom | John | Nanostreet |
1 | Private | Bill | Sally | NaN |
2 | Public | Ron | Tony | Burystreet |
4 | Private | Stewart | Ben | PringleStreet |
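For reference, here is a self-contained sketch of the same mask applied end to end, from reading to writing, assuming the semicolon-separated layout from the question. The CSV text is inlined with `io.StringIO` instead of reading `dataload.csv`, and a second `StringIO` buffer stands in for `finaldata.csv`. It uses `on_bad_lines='skip'` because `error_bad_lines` was deprecated in pandas 1.3 and removed in 2.0.

```python
import io

import pandas as pd

# Inline stand-in for dataload.csv (assumed to be ';'-separated, as in the question).
# Empty ParentAddress fields are read as NaN by default.
csv_text = """Schooltyp;Name;ParentName;ParentAddress
Public;Tom;John;Nanostreet
Private;Bill;Sally;
Public;Ron;Tony;Burystreet
Public;Danny;Nate;
Private;Stewart;Ben;PringleStreet
"""

# on_bad_lines='skip' replaces the removed error_bad_lines=False (pandas >= 1.3).
df = pd.read_csv(io.StringIO(csv_text), sep=";", on_bad_lines="skip")

# True only for rows that are Public AND have no ParentAddress.
should_be_removed = (df["Schooltyp"] == "Public") & df["ParentAddress"].isna()

# Keep every row that is NOT flagged; ~ inverts the boolean mask.
result = df.loc[~should_be_removed]

# Write the filtered data out without the index column.
# A StringIO buffer stands in for 'finaldata.csv' in this sketch.
out = io.StringIO()
result.to_csv(out, sep=";", index=False)
print(out.getvalue())
```

Note that `to_csv` writes missing values as empty fields by default, so Bill's row comes back out as `Private;Bill;Sally;`.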
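Notes:
- `.ne()` is equivalent to `!=`, and `.eq()` is the same as `==`. Because they are method calls, they need no extra parentheses when combined with `&`, which binds more tightly than comparison operators in Python.
- To invert a condition, put `~` before it.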
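A small sketch of the notes, assuming the column names from the sample table. It shows that the method form builds the same mask as the operator form. It also shows that the question's `drop`-based approach works once `isnull` is actually called (`isnull()`) and `.indexes` becomes `.index`. Without the parentheses, `data['ParentAddress'].isnull` is the method object itself rather than a boolean Series, so combining it with `&` fails. The sketch spells the column `Schooltyp` as in the sample table; the question's code uses `Schooltype`, so use whichever matches the real CSV header.

```python
import pandas as pd

# Sample data from the question; None is treated as a missing value.
data = pd.DataFrame(
    {
        "Schooltyp": ["Public", "Private", "Public", "Public", "Private"],
        "Name": ["Tom", "Bill", "Ron", "Danny", "Stewart"],
        "ParentName": ["John", "Sally", "Tony", "Nate", "Ben"],
        "ParentAddress": ["Nanostreet", None, "Burystreet", None, "PringleStreet"],
    }
)

# Operator form and method form build the same boolean mask.
# The == comparison needs parentheses because & binds tighter than ==.
mask_ops = (data["Schooltyp"] == "Public") & data["ParentAddress"].isnull()
mask_methods = data["Schooltyp"].eq("Public") & data["ParentAddress"].isnull()
assert mask_ops.equals(mask_methods)

# Filtering approach: keep the rows where the mask is False.
kept = data[~mask_methods]

# The question's drop-based approach, corrected:
# isnull() is called, and the row labels come from .index (not .indexes).
index_names = data[mask_methods].index
dropped = data.drop(index_names)

# Both approaches leave the same rows.
assert kept.equals(dropped)
print(kept)
```

The sketch assigns the result of `drop` instead of using `inplace=True`; both work, but assigning keeps the original `data` available for comparison.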