Home > Software design >  Getting next value of groupby after explode with pandas
Getting next value of groupby after explode with pandas

Time:03-17

I am trying to get the first non-null value of the list inside each row of Emails column to write to the Email_final1 then get the next value of the list inside each row of Emails, if there is one, to Emails_final2 otherwise to write Emails2 value to Emails2_final if not blank and doesn't equal 'Emails' otherwise leave Emails_final2 blank. Lastly if a value from Emails 2 was written to Emails_final1 then make Emails_final2 None I have tried many different ways to achieve this to no avail here is what I have so far including pseudo-code:

My Current Code:

df = pd.DataFrame({'Emails': [['[email protected]', '[email protected]', '[email protected]'],[None, '[email protected]'],[None,None,None]],
                   'Emails 2': ['[email protected]', '[email protected]','[email protected]'],
                   'num_specimen_seen': [10, 2,3]},
                  index=['falcon', 'dog','cat'])
df['Emails_final1'] = df['Emails'].explode().groupby(level=0).first()
#pseudo code
df['Emails_final2'] = df['Emails'].explode().groupby(level=0).next()  #I know next doesn't exist but I want it to try to get the next value of 'Emails' before trying to get 'Emails 2 values. 

Desired Output:

         Emails_final1      Emails_final2  
falcon   [email protected]      [email protected]                
falcon   [email protected]      [email protected]                
falcon   [email protected]      None                 

Any direction of how to approach a problem like this would be appreciated.

CodePudding user response:

It looks a bit messy but it works. Basically, we keep a boolean mask from the first step in filling "Emails_final1" and use it in the second step to fill "Emails_final1".

To fill the second column, the idea is to use groupby nth to get the second elements and if they don't match the previously selected emails; keep it (for example for the first row) but if it doesn't select from "Emails 2" column, unless it was already selected before (for example in the 3rd row):

exp_g = df['Emails'].explode().groupby(level=0)
df['Emails_final1'] = exp_g.first()
msk = df['Emails_final1'].notna()
df['Emails_final1'] = df['Emails_final1'].fillna(df['Emails 2'])
df['Emails_final2'] = exp_g.nth(1)
df['Emails_final2'] = df['Emails_final2'].mask(lambda x: ((x == df['Emails_final1']) | x.isna()) & msk, df['Emails 2'])

The relevant columns are:

        Emails_final1  Emails_final2
falcon  [email protected]   [email protected]
dog     [email protected]  [email protected]
cat     [email protected]           None
  • Related