I am trying to get the first non-null value of the list inside each row of Emails
column to write to the Email_final1
then get the next value of the list inside each row of Emails
, if there is one, to Emails_final2
otherwise to write Emails2
value to Emails2_final
if not blank and doesn't equal 'Emails' otherwise leave Emails_final2
blank. Lastly if a value from Emails 2
was written to Emails_final1
then make Emails_final2
None
I have tried many different ways to achieve this to no avail here is what I have so far including pseudo-code:
My Current Code:
df = pd.DataFrame({'Emails': [['[email protected]', '[email protected]', '[email protected]'],[None, '[email protected]'],[None,None,None]],
'Emails 2': ['[email protected]', '[email protected]','[email protected]'],
'num_specimen_seen': [10, 2,3]},
index=['falcon', 'dog','cat'])
df['Emails_final1'] = df['Emails'].explode().groupby(level=0).first()
#pseudo code
df['Emails_final2'] = df['Emails'].explode().groupby(level=0).next() #I know next doesn't exist but I want it to try to get the next value of 'Emails' before trying to get 'Emails 2 values.
Desired Output:
Emails_final1 Emails_final2
falcon [email protected] [email protected]
falcon [email protected] [email protected]
falcon [email protected] None
Any direction of how to approach a problem like this would be appreciated.
CodePudding user response:
It looks a bit messy but it works. Basically, we keep a boolean mask from the first step in filling "Emails_final1" and use it in the second step to fill "Emails_final1".
To fill the second column, the idea is to use groupby
nth
to get the second elements and if they don't match the previously selected emails; keep it (for example for the first row) but if it doesn't select from "Emails 2" column, unless it was already selected before (for example in the 3rd row):
exp_g = df['Emails'].explode().groupby(level=0)
df['Emails_final1'] = exp_g.first()
msk = df['Emails_final1'].notna()
df['Emails_final1'] = df['Emails_final1'].fillna(df['Emails 2'])
df['Emails_final2'] = exp_g.nth(1)
df['Emails_final2'] = df['Emails_final2'].mask(lambda x: ((x == df['Emails_final1']) | x.isna()) & msk, df['Emails 2'])
The relevant columns are:
Emails_final1 Emails_final2
falcon [email protected] [email protected]
dog [email protected] [email protected]
cat [email protected] None