I have a dataframe like the one below:
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'region': ['ASEAN','ANZ','INDIA','KOREA'],
    'first_contact': ['user1','user2','user3','user4'],
    'first_contact_email': ['[email protected]','[email protected]','[email protected]','[email protected]'],
    'second_contact': ['user11','user21','user31','user41'],
    'second_contact_email': ['[email protected]','[email protected]','[email protected]','[email protected]'],
    'third_contact': ['user111','user211','user311','user411'],
    'third_contact_email': ['[email protected]','[email protected]','[email protected]','[email protected]'],
    'fourth_contact': ['user1111','user2111','user3111','user4111'],
    'fourth_contact_email': ['[email protected]','[email protected]','[email protected]','[email protected]'],
    'fifth_contact': ['user11111','user21111','user31111','user41111'],
    'fifth_contact_email': ['[email protected]','[email protected]',np.nan,np.nan],
    'sixth_contact': ['user111111','user211111','user311111','user411111'],
    'sixth_contact_email': ['[email protected]','[email protected]',np.nan,np.nan]})
I would like to do the below:
a) iterate through each row
b) get all the email IDs for that row
c) append them to a list
So, I tried the below:
region_list = ['ASEAN','INDIA','KOREA','ANZ']
for region in region_list:
    email_list = []
    temp_email_df = df1[df1['Region']==region]
    email_list.append(temp_email_df.iloc[0,2:,2].to_string(header=False, index=False))
    print(email_list)
    # --send email code follows here
For each region, I would like to get their list of email IDs and use them to send an email. But now the issue is in extracting all the emails. How can I extract the email IDs from all the email columns, using iloc, positions, etc.? I expect my output to print email_list for each region.
CodePudding user response:
You can use the isin method to filter the regions based on the list you have, then filter the columns that have email in the name, then stack the result, which will essentially remove the NaN values, and finally call to_list() to create a list out of it:
>>> df1[df1['region'].isin(region_list)].filter(like='email').stack().to_list()
['[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]']
But if you need the email list by region, you can group the dataframe by region after filtering the regions of interest, then follow the rest of the logic:
(df1[df1['region'].isin(region_list)]
.set_index('region')
.filter(like='email')
.groupby(level=0)
.apply(lambda x: x.stack().tolist())
.reset_index())
region 0
0 ANZ [[email protected], [email protected], user211@gm...
1 ASEAN [[email protected], [email protected], user111@gm...
2 INDIA [[email protected], [email protected], user311@gm...
3 KOREA [[email protected], [email protected], user411@gm...
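If you then want a plain region-to-emails mapping for the sending step, the grouped Series can be converted with to_dict() instead of calling reset_index(). A minimal sketch, using a trimmed two-email-column stand-in for df1:

```python
import pandas as pd

# trimmed stand-in for df1 (the real frame has more contact columns)
df1 = pd.DataFrame({
    'region': ['ASEAN', 'ANZ'],
    'first_contact_email': ['[email protected]', '[email protected]'],
    'second_contact_email': ['[email protected]', '[email protected]'],
})
region_list = ['ASEAN', 'ANZ']

# same filter/stack logic as above, kept as a Series instead of reset_index()
grouped = (df1[df1['region'].isin(region_list)]
           .set_index('region')
           .filter(like='email')
           .groupby(level=0)
           .apply(lambda x: x.stack().tolist()))

emails_by_region = grouped.to_dict()  # {'ANZ': [...], 'ASEAN': [...]}
print(emails_by_region['ASEAN'])
```

A dict lookup like emails_by_region['ASEAN'] is then straightforward to feed into whatever send-email routine you already have.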
CodePudding user response:
Try as follows.
- Use df.melt to unpivot df1 from wide to long format for all columns that contain the substring "email".
- Use df.dropna with value (the column that now contains all the emails) as the subset, to get rid of NaN values.
- Finally, use df.groupby to group the emails per region, and then apply(list) to collect the individual values in a list.
- The result will be a pd.Series with the regions as its index. So, to access a specific list, you use, e.g., emails['ANZ'], and for its first email: emails['ANZ'][0] (i.e. "[email protected]").
emails = (df1.melt(id_vars='region',
                   value_vars=[col for col in df1.columns if 'email' in col])
             .dropna(subset='value')
             .groupby('region')['value']
             .apply(list))
print(emails)
region
ANZ [[email protected], [email protected], user211@gm...
ASEAN [[email protected], [email protected], user111@gm...
INDIA [[email protected], [email protected], user311@gm...
KOREA [[email protected], [email protected], user411@gm...
Name: value, dtype: object
# e.g. for `ANZ`:
print(emails['ANZ'])
['[email protected]', '[email protected]', '[email protected]',
'[email protected]', '[email protected]', '[email protected]']