Here is the dataset I have:
Employee | Jul | Aug | Sep | Oct | Nov | Dec | Jan | Feb | Mar | Apr | May | Jun | Year | Index | Program | Status | Bid Category | Account |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
E_123456789 | #missing | #missing | #missing | #missing | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | FY22 | I_1234 | NA Entity | Final | Janitor 1 | Overhead |
E_123456799 | #missing | 0 | 0 | 0 | 0 | 0 | 0 | .5 | .9 | .1 | 0 | 0 | FY22 | I_1255 | P12345123 | Extended | Cook 3 | Utilization |
This is what I'd like to have:
Employee | Month | Year | Index | Program | Status | Bid Category | Account | Percentage |
---|---|---|---|---|---|---|---|---|
E_123456789 | Jul | FY22 | I_1234 | NA Entity | Final | Janitor 1 | Overhead | #missing |
E_123456789 | Aug | FY22 | I_1234 | NA Entity | Final | Janitor 1 | Overhead | #missing |
E_123456789 | Sep | FY22 | I_1234 | NA Entity | Final | Janitor 1 | Overhead | #missing |
E_123456789 | Oct | FY22 | I_1234 | NA Entity | Final | Janitor 1 | Overhead | #missing |
E_123456789 | Nov | FY22 | I_1234 | NA Entity | Final | Janitor 1 | Overhead | 1 |
E_123456789 | Dec | FY22 | I_1234 | NA Entity | Final | Janitor 1 | Overhead | 1 |
E_123456789 | Jan | FY22 | I_1234 | NA Entity | Final | Janitor 1 | Overhead | 1 |
E_123456789 | Feb | FY22 | I_1234 | NA Entity | Final | Janitor 1 | Overhead | 1 |
E_123456789 | Mar | FY22 | I_1234 | NA Entity | Final | Janitor 1 | Overhead | 1 |
E_123456789 | Apr | FY22 | I_1234 | NA Entity | Final | Janitor 1 | Overhead | 1 |
E_123456789 | May | FY22 | I_1234 | NA Entity | Final | Janitor 1 | Overhead | 1 |
E_123456789 | Jun | FY22 | I_1234 | NA Entity | Final | Janitor 1 | Overhead | 1 |
E_123456799 | Jul | FY22 | I_1255 | P12345123 | Extended | Cook 3 | Utilization | #missing |
E_123456799 | Aug | FY22 | I_1255 | P12345123 | Extended | Cook 3 | Utilization | 0 |
E_123456799 | Sep | FY22 | I_1255 | P12345123 | Extended | Cook 3 | Utilization | 0 |
E_123456799 | Oct | FY22 | I_1255 | P12345123 | Extended | Cook 3 | Utilization | 0 |
E_123456799 | Nov | FY22 | I_1255 | P12345123 | Extended | Cook 3 | Utilization | 0 |
E_123456799 | Dec | FY22 | I_1255 | P12345123 | Extended | Cook 3 | Utilization | 0 |
E_123456799 | Jan | FY22 | I_1255 | P12345123 | Extended | Cook 3 | Utilization | 0 |
E_123456799 | Feb | FY22 | I_1255 | P12345123 | Extended | Cook 3 | Utilization | .5 |
E_123456799 | Mar | FY22 | I_1255 | P12345123 | Extended | Cook 3 | Utilization | .9 |
E_123456799 | Apr | FY22 | I_1255 | P12345123 | Extended | Cook 3 | Utilization | .1 |
E_123456799 | May | FY22 | I_1255 | P12345123 | Extended | Cook 3 | Utilization | 0 |
E_123456799 | Jun | FY22 | I_1255 | P12345123 | Extended | Cook 3 | Utilization | 0 |
I found steps that seemed to do the reverse of this (convert the "Month" column into Jul/Aug/Sep/etc) but I can't find anything related to this. I am very new to Python (read: a couple weeks of use) and am sure there is a relatively easy way to do this that I just don't know.
Here is my file:
import pandas as pd
data = pd.read_csv('SMART_Export_Test.csv')
Using pivot does the opposite of what I want (it spreads the values in the Jul column into new column names):
data.pivot(columns = 'Jul', values='Jul')
Any help/direction would be greatly appreciated!
CodePudding user response:
Try with melt
:
>>> df.melt(["Employee", "Year", "Index", "Program", "Status", "Bid Category", "Account"],
["Jan", "Feb", "Mar", "Apr", "May", "Jun",
"Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
"Percentage")
Employee Year Index ... Account Percentage value
0 E_123456789 FY22 I_1234 ... Overhead Jan 1
1 E_123456799 FY22 I_1255 ... Utilization Jan 0
2 E_123456789 FY22 I_1234 ... Overhead Feb 1.0
3 E_123456799 FY22 I_1255 ... Utilization Feb 0.5
4 E_123456789 FY22 I_1234 ... Overhead Mar 1.0
5 E_123456799 FY22 I_1255 ... Utilization Mar 0.9
6 E_123456789 FY22 I_1234 ... Overhead Apr 1.0
7 E_123456799 FY22 I_1255 ... Utilization Apr 0.1
8 E_123456789 FY22 I_1234 ... Overhead May 1
9 E_123456799 FY22 I_1255 ... Utilization May 0
10 E_123456789 FY22 I_1234 ... Overhead Jun 1
11 E_123456799 FY22 I_1255 ... Utilization Jun 0
12 E_123456789 FY22 I_1234 ... Overhead Jul #missing
13 E_123456799 FY22 I_1255 ... Utilization Jul #missing
14 E_123456789 FY22 I_1234 ... Overhead Aug #missing
15 E_123456799 FY22 I_1255 ... Utilization Aug 0
16 E_123456789 FY22 I_1234 ... Overhead Sep #missing
17 E_123456799 FY22 I_1255 ... Utilization Sep 0
18 E_123456789 FY22 I_1234 ... Overhead Oct #missing
19 E_123456799 FY22 I_1255 ... Utilization Oct 0
20 E_123456789 FY22 I_1234 ... Overhead Nov 1
21 E_123456799 FY22 I_1255 ... Utilization Nov 0
22 E_123456789 FY22 I_1234 ... Overhead Dec 1
23 E_123456799 FY22 I_1255 ... Utilization Dec 0
[24 rows x 9 columns]