I have a dataset that gives me a unique identifier and different columns with values for each year. It looks as follows:
ID | value90 | value91 | value92 | value93 | .. | value16 |
---|---|---|---|---|---|---|
ABCD | 1 | 2 | 3 | 4 | .. | 5 |
EFGH | 6 | 7 | 8 | 9 | .. | 10 |
IJKL | 11 | 12 | 13 | 14 | .. | 15 |
df = pd.Dataframe({'ID':[ABCD, EFGH, IJKL], 'value90':[1, 6, 11], 'value91':[2,7,12], 'value92':[3,8,13], 'value93':[4,9,14], 'value16':[5,10,15]}
I am looking for the following output:
Year | ID | value |
---|---|---|
1990 | ABC | 1 |
1991 | ABC | 2 |
1992 | ABC | 3 |
... | ..... | ... |
2016 | ABC | 5 |
... | ..... | ... |
2016 | IJKL | 15 |
I'm using pandas and am not sure how to produce the final output I'm seeking. Thanks!
CodePudding user response:
You can use pd.melt to get most of the way there, and then sort:
import pandas as pd
df = pd.DataFrame({'ID':['ABCD', 'EFGH', 'IJKL'], 'value90':[1, 6, 11], 'value91':[2,7,12], 'value92':[3,8,13], 'value93':[4,9,14], 'value16':[5,10,15]})
print(df)
df2 = pd.melt(df, id_vars=["ID"],var_name="Year", value_name="Value")
df2 = df2.sort_values(by=['ID'])
df2 = df2.filter(["Year","ID","Value"])
print(df2)
[Output]
ID value90 value91 value92 value93 value16
0 ABCD 1 2 3 4 5
1 EFGH 6 7 8 9 10
2 IJKL 11 12 13 14 15
Year ID Value
0 value90 ABCD 1
3 value91 ABCD 2
6 value92 ABCD 3
9 value93 ABCD 4
12 value16 ABCD 5
1 value90 EFGH 6
4 value91 EFGH 7
7 value92 EFGH 8
10 value93 EFGH 9
13 value16 EFGH 10
2 value90 IJKL 11
5 value91 IJKL 12
8 value92 IJKL 13
11 value93 IJKL 14
14 value16 IJKL 15