I have been given a CSV data file. Using JupyterHub, Python, and pandas, I have read it into a DataFrame and deleted any rows with NaN values. I would like to do the same for any rows that contain negative values. I have searched this site for a similar problem, but can't find a solution that fits well. Below is how I deleted the rows with NaNs. Please help!
import pandas as pd
df = pd.read_csv("cereal.csv")
# Drop every row that contains at least one NaN, then renumber the index
df1 = df.dropna(how='any', axis=0).reset_index(drop=True)
df1.shape
df1.head()
CodePudding user response:
You can drop the rows in which a specific column holds a negative value using pandas.DataFrame.drop, as follows:
import pandas as pd
df = pd.DataFrame({
'colA': [-1, 2, 3, 4, None],
'colB': [True, True, False, False, True],
})
df = df.drop(df.index[df['colA'] < 0])
Output:
>>> df
   colA   colB
1   2.0   True
2   3.0  False
3   4.0  False
4   NaN   True
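The question asks about negative values anywhere in the data, not just in one column. The same drop-based idea can be extended to every numeric column, as in the minimal sketch below. The column names and values are hypothetical stand-ins for the contents of cereal.csv, which are not shown in the question.

```python
import pandas as pd

# Hypothetical sample data standing in for cereal.csv
df = pd.DataFrame({
    'name': ['a', 'b', 'c', 'd'],
    'calories': [70, -1, 120, 110],
    'sugars': [6, 8, -1, 3],
})

# Restrict the check to numeric columns so strings are never compared with 0
numeric = df.select_dtypes(include='number')

# Boolean Series: True for rows with a negative value in any numeric column
has_negative = (numeric < 0).any(axis=1)

# Drop those rows by index label, then renumber the index
df_clean = df.drop(df.index[has_negative]).reset_index(drop=True)
```

Here df_clean keeps only the rows named 'a' and 'd'. Selecting the numeric columns first is a design choice for this sketch; if only some columns should be checked, pass an explicit list of column names instead.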
CodePudding user response:
Another option, with similar syntax, that doesn't use .drop: retain the rows where the condition is not met (~ works as a negation):
>>> df.loc[~(df['colA'] < 0)]
   colA   colB
1   2.0   True
2   3.0  False
3   4.0  False
4   NaN   True
And another one, which also removes the rows where "colA" is NaN; choose depending on whether you want to keep the NaN values in "colA" or not:
>>> df.loc[df['colA'] >= 0]
   colA   colB
1   2.0   True
2   3.0  False
3   4.0  False
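The two filters differ only in how they treat NaN, because any comparison involving NaN evaluates to False. The short sketch below illustrates this on a small made-up Series (the values are for illustration only):

```python
import numpy as np
import pandas as pd

s = pd.Series([-1, 2, np.nan], name='colA')

is_negative = s < 0        # NaN < 0 evaluates to False
is_non_negative = s >= 0   # NaN >= 0 also evaluates to False

# Negating "is negative" turns the NaN row's False into True, so NaN is kept
kept_with_nan = s[~is_negative]

# Filtering on "is non-negative" leaves the NaN row as False, so NaN is dropped
kept_without_nan = s[is_non_negative]
```

If the NaN rows have already been removed with dropna, as in the question, both filters should give the same result.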