Home > other >  Filtering for different conditions in different columns pandas
Filtering for different conditions in different columns pandas

Time:11-24

Python 3.9 and Pandas 1.3.4

So here's the df:

1   First Name  Last Name  fullname
2   Freddie     Mercury    Freddie Mercury
3   John        Lennon     John Lennon
4   David       Bowie      David Bowie
5                          John Doe
6   Joseph                 Joseph
7               Jovi       Jovi

My piece of code currently just finds the fullname column is just First Name Last Name.

I'm currently trying to filter for blank entries in the First Name column, Last Name column, and any "John Does" in the fullname column.

Current code:

import pandas as pd

df = pd.read_csv('file.csv', dtype=str, header=0)

df2 = pd.DataFrame(df, columns=['First Name', 'Last Name', 'fullname'])

df['fullname'] = (df[['First Name', 'Last Name']].fillna('').agg(' '.join, axis=1).str.strip().replace('', 'John Doe'))

df_sort = df2.loc[df2['First Name'] == " "] | df2.loc[df2['Last Name'] == " "] | df2.loc[df2['fullname'] == "John Doe"]


df.to_csv('file.csv', index=False)
df_sort.to_csv('missing names.csv', index=False)

Currently I am having the missing names write to a new file and outputs only this:

First Name   Last Name   fullname

Everything is empty under.

I would like for the output to be:

First Name   Last Name   fullname
                         John Doe
Joseph
             Jovi

CodePudding user response:

Replace possible missing values to empty string, compare and test if at least one value match in DataFrame.any:

df_sort = df2[df2[['First Name', 'Last Name']].fillna('').eq('').any(axis=1)]

Or if there are missing values use:

df_sort = df2[df2[['First Name', 'Last Name']].isna().any(axis=1)]

CodePudding user response:

You don't need to use .loc at each condition but you need to add some ( ):

>>> df2[(df2['First Name'] == " ")
        | (df2['Last Name'] == " ")
        | (df2['fullname'] == "John Doe")]

  First Name Last Name  fullname
3        NaN       NaN  John Doe
4     Joseph       NaN    Joseph
5        NaN      Jovi      Jovi

Note: adapt your missing values to your code. Here I use == " " (blank space) like you but you can use == "" (empty string) or use .isna() (missing values).

  • Related