Home > OS >  Transforming multiindex df into sorted xlsx in python
Transforming multiindex df into sorted xlsx in python

Time:05-31

long-time reader, first-time poster. I've been doing some time tracking for two projects, grouped the data by project and date using pandas, and would like to fill it into an existing Excel template for a client that is sorted by date (y-axis) and project (x-axis). But I'm stumped. I've been struggling to convert the multi-index dataframe into a sorted xlsx file.

Example data I want to sort

|Date       |  Project                  | Hours   |  
|-----------|---------------------------|---------| 
|2022-05-09 |Project 1                  |     5.50| 
|2022-05-09 |Project 1                  |     3.75| 
|2022-05-11 |Project 2                  |     1.50| 
|2022-05-11 |Project 2                  |     4.75| 
etc.

Desired template

|Date       |Project 1|Project 2| 
|-----------|---------|---------| 
|2022-05-09 |      5.5|     3.75| 
|2022-05-11 |     4.75|      1.5| 
etc...

So far I've tried a very basic iteration using openpyxl that has inserted the dates, but I can't figure out how to

a) rearrange the data in pandas so I can simply insert it or

b) how to write conditionally in openpyxl for a given date and project

# code grouping dates and projects
df = df.groupby(["Date", "Project"]).sum("Hours")

r = 10 # below the template headers and where I would start inserting time tracked
for date in df.index:
        sheet.cell(row=r, column=1).value = date
        r =1

I've trawled StackOverflow for answers but am coming up empty. Thanks for any help you can provide.

CodePudding user response:

I think your data sample is not correct. the 2nd row, instead of 2022-05-09 |Project 1|3.75, it should be 2022-05-09 |Project 2|3.75. The same with 4th row.

As I understand, your data is in long-format and your output is wide-format. In this case, pd.pivot_table can help:

pd.pivot_table(data=df, columns='name', index='year', values='hours').reset_index()

CodePudding user response:

df.pivot_table(index='Date', columns='Project', values='Hours')
Date    Project1    Project2
2022-05-09  5.5 3.75
2022-05-11  4.75    1.5
  • Related