I'm trying to drop rows where there are NaN values in columns 1, 2, and 3 for the years 2019 and 2020.
Here is the dataframe:
col1  col2  col3  col4  year
NaN   NaN   NaN   NaN   2018
NaN   NaN   NaN   NaN   2019
NaN   NaN   NaN   100   2019
100   200   100   NaN   2020
NaN   NaN   NaN   200   2020
100   100   150   150   2021
Expected result:
col1  col2  col3  col4  year
NaN   NaN   NaN   NaN   2018
100   200   100   NaN   2020
100   100   150   150   2021
I tried this, but I couldn't get it to work:
df.drop(df[(df['year'] == [2019, 2020])
& (df[['col1', 'col2', 'col3']].isnull())].index, inplace=True)
CodePudding user response:
You need to use isin to check for several values, and all to aggregate your boolean DataFrame into a single Series:
# is the year 2019 or 2020?
m1 = df['year'].isin([2019, 2020])
# are all the values in col1/col2/col3 null for a given row?
m2 = df[['col1', 'col2', 'col3']].isnull().all(1)
# invert the above two conditions to select rows to keep
df = df[~(m1&m2)]
# or
# df.drop(df[m1&m2].index, inplace=True)
output:
    col1   col2   col3   col4  year
0    NaN    NaN    NaN    NaN  2018
3  100.0  200.0  100.0    NaN  2020
5  100.0  100.0  150.0  150.0  2021
CodePudding user response:
df.loc[~(df[["col1", "col2", "col3"]].isna().all(axis="columns")
& df["year"].isin([2019, 2020]))]
The conditions are:
NaN values in columns 1, 2, and 3:
df[["col1", "col2", "col3"]].isna().all(axis="columns")
year is 2019 or 2020:
df["year"].isin([2019, 2020])
Then combine these with &, invert the result, and index with it; we get the desired rows at the end.
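Put together as a runnable sketch (the DataFrame is rebuilt here from the question's example data):

```python
import numpy as np
import pandas as pd

# Rebuild the example DataFrame from the question
df = pd.DataFrame({
    "col1": [np.nan, np.nan, np.nan, 100, np.nan, 100],
    "col2": [np.nan, np.nan, np.nan, 200, np.nan, 100],
    "col3": [np.nan, np.nan, np.nan, 100, np.nan, 150],
    "col4": [np.nan, np.nan, 100, np.nan, 200, 150],
    "year": [2018, 2019, 2019, 2020, 2020, 2021],
})

# Keep a row unless (col1-col3 are all NaN) AND (year is 2019 or 2020)
out = df.loc[~(df[["col1", "col2", "col3"]].isna().all(axis="columns")
               & df["year"].isin([2019, 2020]))]
print(out)
```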
CodePudding user response:
As you are filtering based on both year and NaN values, you could split your dataframe and rejoin:
msk = df['year'].isin([2019, 2020]) # drop where True, keep the complement as-is
df_oth = df[~msk]
df_req = df[msk].dropna(how='all', subset=['col1', 'col2', 'col3'])
df_out = pd.concat([df_oth, df_req])
Output:
    col1   col2   col3   col4  year
0    NaN    NaN    NaN    NaN  2018
5  100.0  100.0  150.0  150.0  2021
3  100.0  200.0  100.0    NaN  2020
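One caveat with this approach, sketched below with the question's data: pd.concat preserves the order of the two pieces rather than the original row order, so a sort_index call restores it if the ordering matters.

```python
import numpy as np
import pandas as pd

# The question's example DataFrame
df = pd.DataFrame({
    "col1": [np.nan, np.nan, np.nan, 100, np.nan, 100],
    "col2": [np.nan, np.nan, np.nan, 200, np.nan, 100],
    "col3": [np.nan, np.nan, np.nan, 100, np.nan, 150],
    "col4": [np.nan, np.nan, 100, np.nan, 200, 150],
    "year": [2018, 2019, 2019, 2020, 2020, 2021],
})

msk = df['year'].isin([2019, 2020])          # rows subject to dropping
df_oth = df[~msk]                            # keep the complement as-is
df_req = df[msk].dropna(how='all', subset=['col1', 'col2', 'col3'])

# concat stacks the pieces in order (0, 5, then 3);
# sort_index restores the original row order
df_out = pd.concat([df_oth, df_req]).sort_index()
print(df_out)
```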