Home > Mobile >  Pandas Group by multiple columns and select non null last values from non-grouped cols
Pandas Group by multiple columns and select non null last values from non-grouped cols

Time:10-14

Hi I have an excel sheet with fields like this

FName, LastName, DOB, BirthPlace, Address, Desig

John,  Cash,     21-09-1986, Darwin, , , 
John,  Cash,             , ,   , 22 Howard Springs Darwin, 
John,  Cash,        ,          , 20 Howard Springs Darwin     , Supervisor

So I want to groupby Fname, LastName, DOB, Birth Place but then I want pick only "last" non null fields off of other columns like Address and Desig. Also please note that I won't get all the col values in every row though there can be many updates for the same person. Like in 2nd row above I don't have DOB, BirthPlace, Desig. I can have many such rows with only one col updated each item but names and dob fields present most of the time.

I tried to do it this way

 df2 = input_df.groupby(['Fname','LastName','BirthPlace','DOB']).agg({ 'Address':'last',
                             'Desig':'last' 
                              }).reset_index()

But I get none or nulls in the output.

Expected output for me should be (address updated and designation set as well to last non null value in those cols)

John, Cash,21-09-1986, Darwin, 20 Howard Springs Darwin , Supervisor

CodePudding user response:

If there is possible defined groups by only Fname,LastName solution should be simplify by enter image description here

Pardon the use of tail(1) instead of last(). Don't know why it didn't work. But I think you get the picture. Extend your agg with a lambda function to get rid of the blanks first before doing last.

  • Related