I am trying to get the first value of the list in each row of df['Emails']. In real life (this is a sample df) I don't know how long each list will be, so I am assuming the longest has length 5, then trying to whittle it down until I find the right length and select that index position. However, I am getting IndexError: index 5 is out of bounds for axis 0 with size 2
and I can't figure out what to do about it. Any help appreciated. Thanks.
My current code:
import numpy as np
import pandas as pd

df = pd.DataFrame({'Emails': [['[email protected]', '[email protected]', '[email protected]'], [None, '[email protected]']],
                   'num_wings': [2, 0],
                   'num_specimen_seen': [10, 2]},
                  index=['falcon', 'dog'])
# My attempt:
df['Emails'] = np.select([df['Emails'][0], df['Emails'][1], df['Emails'][2]], [df['Emails'][0], df['Emails'][1], df['Emails'][2]])
print(df['Emails'])
Expected output:
Assuming a list in the original dataframe has None in the first index position, I want it to take the next index position that isn't None.
Desired Output
                   Emails  num_wings  num_specimen_seen
falcon  [email protected]          2                 10
dog     [email protected]          0                  2
CodePudding user response:
Whenever you have a column containing lists, explode will often be your friend, and this is the case here.
Use explode, then groupby(level=0) (to group on the 0th (first) level of the index), and then first (which selects the first non-null value in each group, skipping None, NaN, etc.):
df['Emails'] = df['Emails'].explode().groupby(level=0).first()
Output:
>>> df
                   Emails  num_wings  num_specimen_seen
falcon  [email protected]          2                 10
dog     [email protected]          0                  2
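To see what each step of the one-liner does, here is a minimal self-contained sketch that splits it into intermediate results. The addresses are hypothetical placeholders (the ones in the post are obscured), and the names exploded, first_valid and first_valid_apply are only illustrative. It assumes the index labels are unique, since groupby(level=0) merges rows that share a label.

```python
import pandas as pd

# Placeholder addresses (hypothetical); the real ones are obscured in the post.
df = pd.DataFrame(
    {
        "Emails": [
            ["a@example.com", "b@example.com", "c@example.com"],
            [None, "d@example.com"],
        ],
        "num_wings": [2, 0],
        "num_specimen_seen": [10, 2],
    },
    index=["falcon", "dog"],
)

# Step 1: explode produces one row per list element and repeats the
# original index label for every element of that row's list.
exploded = df["Emails"].explode()

# Step 2: group the exploded rows by index label; first() returns the
# first non-null entry of each group, so a leading None is skipped.
first_valid = exploded.groupby(level=0).first()

# Alternative sketch without explode: scan each list in plain Python and
# fall back to None when a list has no non-None element.
first_valid_apply = df["Emails"].apply(
    lambda emails: next((e for e in emails if e is not None), None)
)

# Assignment aligns on the index, so the group order does not matter.
df["Emails"] = first_valid
print(df)
```

Both approaches should give the same values here; the apply version keeps the original row order and may be easier to read, while the explode version stays within pandas operations.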