Could you please help me to get output for considering two worksheets (Emp Salary- Nov-22.CSV / Emp Salary- Dec-22.CSV). I want to get each column's unique name value in new output file.
CodePudding user response:
Welcome to stackoverflow, you are required to post whatever work you have done so far to tackle the problem.
To answer your question, this can be done is excel with pivot tables. But if you are looking for a pandas method... I have created 2 dataframes like you have
import pandas as pd
import numpy as np
df1 = pd.DataFrame(
{ "Name": ['Govind', 'Chetan', 'Rahul'],
"City": ['Mumbai', 'Banglore', 'Pune'],
"Salary": [1, 1, 1] })
df2 = pd.DataFrame(
{ "Name": ['Govind', 'Chetan', 'Kalpesh'],
"City": ['Mumbai', 'Banglore', 'Pune'],
"Salary": [1, 1, 1] })
You can then use concat to concatenate them
df = pd.concat([df1, df2], axis=0)
df
and you can use groupby() and reset_index() to get what you want
df.groupby(['Name','City'])['Salary'].sum().reset_index()
CodePudding user response:
You can use pandas.read_excel
with sheet_name=None
to read all the sheets at once and then pass the dictionnary of dataframes made to pandas.concat
and finally use Groupby.sum
for aggregation :
import pandas as pd
out = (
pd.concat(pd.read_excel("/input_spreadsheet.xlsx", sheet_name=None), ignore_index=True)
.groupby(["Name", "City"], as_index=False)["Salary"].sum()
)
After that, if needed, you can make an new spreadsheet with pandas.DataFrame.to_excel
and/or a (.csv
) file with pandas.DataFrame.to_csv
:
out.to_excel("/output_spreadsheet.xlsx", sheet_name="Emp Salary (Total).xlsx", index=False)
out.to_csv("/output_csvfile.csv", sheet_name="Emp Salary (Total).csv", sep=",", index=False) #sep="," by default
# Output :
print(out)
Name City Salary
0 Chetan Bangalore 60000
1 Dipesh Pune 50000
2 Govind Mumbai 200000
3 Kalpesh Kolkata 40000
4 Rahul Kolkata 40000
5 Santosh Pune 50000
6 Siddharth Hyderabad 50000