Hi I have an excel sheet with fields like this
FName, LastName, DOB, BirthPlace, Address, Desig
John, Cash, 21-09-1986, Darwin, , ,
John, Cash, , , , 22 Howard Springs Darwin,
John, Cash, , , 20 Howard Springs Darwin , Supervisor
So I want to groupby Fname, LastName, DOB, Birth Place but then I want pick only "last" non null fields off of other columns like Address and Desig. Also please note that I won't get all the col values in every row though there can be many updates for the same person. Like in 2nd row above I don't have DOB, BirthPlace, Desig. I can have many such rows with only one col updated each item but names and dob fields present most of the time.
I tried to do it this way
df2 = input_df.groupby(['Fname','LastName','BirthPlace','DOB']).agg({ 'Address':'last',
'Desig':'last'
}).reset_index()
But I get none or nulls in the output.
Expected output for me should be (address updated and designation set as well to last non null value in those cols)
John, Cash,21-09-1986, Darwin, 20 Howard Springs Darwin , Supervisor
CodePudding user response:
If there is possible defined groups by only Fname,LastName
solution should be simplify by
Pardon the use of tail(1) instead of last(). Don't know why it didn't work. But I think you get the picture. Extend your agg with a lambda function to get rid of the blanks first before doing last.