Python 3.9 and Pandas 1.3.4
So here's the df:
1 First Name Last Name fullname
2 Freddie Mercury Freddie Mercury
3 John Lennon John Lennon
4 David Bowie David Bowie
5 John Doe
6 Joseph Joseph
7 Jovi Jovi
My piece of code currently just finds the fullname
column is just First Name
Last Name
.
I'm currently trying to filter for blank entries in the First Name
column, Last Name
column, and any "John Does" in the fullname
column.
Current code:
import pandas as pd
df = pd.read_csv('file.csv', dtype=str, header=0)
df2 = pd.DataFrame(df, columns=['First Name', 'Last Name', 'fullname'])
df['fullname'] = (df[['First Name', 'Last Name']].fillna('').agg(' '.join, axis=1).str.strip().replace('', 'John Doe'))
df_sort = df2.loc[df2['First Name'] == " "] | df2.loc[df2['Last Name'] == " "] | df2.loc[df2['fullname'] == "John Doe"]
df.to_csv('file.csv', index=False)
df_sort.to_csv('missing names.csv', index=False)
Currently I am having the missing names write to a new file and outputs only this:
First Name Last Name fullname
Everything is empty under.
I would like for the output to be:
First Name Last Name fullname
John Doe
Joseph
Jovi
CodePudding user response:
Replace possible missing values to empty string, compare and test if at least one value match in DataFrame.any
:
df_sort = df2[df2[['First Name', 'Last Name']].fillna('').eq('').any(axis=1)]
Or if there are missing values use:
df_sort = df2[df2[['First Name', 'Last Name']].isna().any(axis=1)]
CodePudding user response:
You don't need to use .loc
at each condition but you need to add some ( )
:
>>> df2[(df2['First Name'] == " ")
| (df2['Last Name'] == " ")
| (df2['fullname'] == "John Doe")]
First Name Last Name fullname
3 NaN NaN John Doe
4 Joseph NaN Joseph
5 NaN Jovi Jovi
Note: adapt your missing values to your code. Here I use == " "
(blank space) like you but you can use == ""
(empty string) or use .isna()
(missing values).