I have the following function, which I do not completely understand:
def func1(df):
    df.loc[df['First Name'].isin(['n.a.', 'null']),
           df.columns.drop(["Last Name", "Middle Name", "First Name"])] = "n.a."
    return df
I wrote the following DataFrame to test what it returns:
import pandas as pd

df1 = pd.DataFrame({"First Name": ['Alex', 'n.a.', 'null'],
                    "Last Name": ['Peterson', 'Doe', 8],
                    "Middle Name": ['John', 'Jack', 3],
                    "Pet": [2, 9, 3]})
If I understood correctly, it checks whether the value in the column 'First Name' is 'n.a.' or 'null', and then it drops all the other columns except "Last Name", "Middle Name", and "First Name"? But what does the assignment of "n.a." at the end do? When I run the function on the DataFrame above, it basically returns the same DataFrame without changing anything. For this reason, I tried splitting the function to check each part separately:
def func2(df):
    df.loc[df['First Name'].isin(['n.a.', 'null'])] = "n.a."
    return df
I tried it with the same DataFrame and noticed that, for the rows that have 'n.a.' or 'null' in 'First Name', it turns the other elements into 'n.a.' as well. Why does my DataFrame not change the same way with func1?
CodePudding user response:
First, the provided DataFrame is invalid: all lists must have the same length.
Let us use:
df1 = pd.DataFrame({"First Name": ['Alex', 'n.a.', 'null'],
                    "Last Name": ['Peterson', 'Doe', 8],
                    "Middle Name": ['John', 'Jack', 3],
                    "Pet": [2, 9, 3]})  # removed 2 values
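Now, the function selects all rows in which "First Name" is one of ['n.a.', 'null'] and all columns that are not in ["Last Name", "Middle Name", "First Name"]. Note that df.columns.drop(...) does not remove anything from the DataFrame; it only returns the remaining column labels, which .loc then uses as its column selector. This leaves us with rows 1 and 2 and the column "Pet". The assignment sets those cells to "n.a.", and indeed the output is: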
  First Name Last Name Middle Name   Pet
0       Alex  Peterson        John     2
1       n.a.       Doe        Jack  n.a.  # this value was updated
2       null         8           3  n.a.  # this value was updated
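To see why func1 and func2 behave differently, it may help to pull the two selectors apart. The following is only a minimal sketch: the names row_mask, other_cols, out1 and out2 are made up for illustration, and the cast of "Pet" to object is my own assumption, added because newer pandas versions warn about (and may reject) writing a string into an integer column.

```python
import pandas as pd

df1 = pd.DataFrame({"First Name": ["Alex", "n.a.", "null"],
                    "Last Name": ["Peterson", "Doe", 8],
                    "Middle Name": ["John", "Jack", 3],
                    "Pet": [2, 9, 3]})
# Assumption: store "Pet" as object so the string "n.a." can be written
# into it without dtype warnings or errors on newer pandas versions.
df1["Pet"] = df1["Pet"].astype(object)

# Row selector: True where "First Name" is 'n.a.' or 'null'.
row_mask = df1["First Name"].isin(["n.a.", "null"])
print(row_mask.tolist())   # [False, True, True]

# Column selector: Index.drop returns the labels that remain.
# Nothing is removed from the DataFrame itself.
other_cols = df1.columns.drop(["Last Name", "Middle Name", "First Name"])
print(list(other_cols))    # ['Pet']

# Like func1: assign only where the selected rows and columns intersect.
out1 = df1.copy()
out1.loc[row_mask, other_cols] = "n.a."

# Like func2: no column selector, so every column of those rows is overwritten.
out2 = df1.copy()
out2.loc[row_mask] = "n.a."

print(out1)
print(out2)
```

In out1 only the "Pet" cells of rows 1 and 2 change, while in out2 the whole of rows 1 and 2 becomes "n.a.". Both functions in the question modify the DataFrame they receive in place, which is why the sketch works on copies.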