I have a dataframe like as below
data = pd.DataFrame({'email_id': ['[email protected];[email protected]','[email protected];[email protected]','[email protected]','[email protected]','[email protected]','[email protected]','[email protected]'],
'Dept_id': [21,23,25,26,28,29,31],
'dept_name':['Science','Chemistry','Maths','Social','Physics','Botany','Zoology'],
'location':['KOR','ASN','ANZ','IND','AUS','NZ','NZ']})
I would like to do the below
a) Split the dataframe based the unique email_id
values in the email_id
column
b) Store the split data in a excel file. each email_id row will have one file.
c) Name the excel file based on corresponding unique values present in dept_id
, dept_name
and location
columns
I tried the below
for k, v in data.groupby(['email_id']):
dept_unique_ids = v['Dept_id'].unique()
dept_unique_names = v['dept_name'].unique()
location_unique = v['location'].unique()
writer = pd.ExcelWriter(f"{k}.xlsx", engine='xlsxwriter')
v.to_excel(writer,columns=col_list,sheet_name=f'{k}',index=False, startrow = 1)
While the above code splits the file successfully but it names it based on the email_id
which is used as key. Instead I want to name the file based on dept_id
, dept_name
and location
for that specific key.
For ex: If you take email_id = [email protected]
, they have two unique dept_ids which is 29 and 31, their unique dept_name is Botany and Zoology
and unique location is NZ
.
So, I want my file name to be 29_31_Botany_Zoology_NZ.xlsx
.
Therefore, I expect my output files (for each unique email id row to have filenames like below)
update - error message
CodePudding user response:
IIUC, you can use:
for k, v in data.groupby(['email_id']):
dept_unique_ids = '_'.join(v['Dept_id'].astype(str).unique())
dept_unique_names = '_'.join(v['dept_name'].unique())
location_unique = '_'.join(v['location'].unique())
filename = '_'.join([dept_unique_ids, dept_unique_names, location_unique])
print(filename)
# Use a context manager
with pd.ExcelWriter(f"{filename}.xlsx", engine='xlsxwriter') as writer:
v.to_excel(writer,columns=col_list,sheet_name=f'{k}',index=False, startrow=1)
Output:
23_Chemistry_ASN
21_Science_KOR
25_Maths_ANZ
26_Social_IND
28_Physics_AUS
29_31_Botany_Zoology_NZ