Split dataset into yearly time series-CodePudding

I have a dataset that gives me a unique identifier and different columns with values for each year. It looks as follows:

ID	value90	value91	value92	value93	..	value16
ABCD	1	2	3	4	..	5
EFGH	6	7	8	9	..	10
IJKL	11	12	13	14	..	15

df = pd.Dataframe({'ID':[ABCD, EFGH, IJKL], 'value90':[1, 6, 11], 'value91':[2,7,12], 'value92':[3,8,13], 'value93':[4,9,14], 'value16':[5,10,15]}

I am looking for the following output:

Year	ID	value
1990	ABC	1
1991	ABC	2
1992	ABC	3
...	.....	...
2016	ABC	5
...	.....	...
2016	IJKL	15

I'm using pandas and am not sure how to produce the final output I'm seeking. Thanks!

CodePudding user response：

You can use pd.melt to get most of the way there, and then sort:

import pandas as pd

df = pd.DataFrame({'ID':['ABCD', 'EFGH', 'IJKL'], 'value90':[1, 6, 11], 'value91':[2,7,12], 'value92':[3,8,13], 'value93':[4,9,14], 'value16':[5,10,15]})

print(df)

df2 = pd.melt(df, id_vars=["ID"],var_name="Year", value_name="Value")
df2 = df2.sort_values(by=['ID'])
df2 = df2.filter(["Year","ID","Value"])

print(df2)

[Output]

     ID  value90  value91  value92  value93  value16
0  ABCD        1        2        3        4        5
1  EFGH        6        7        8        9       10
2  IJKL       11       12       13       14       15



       Year    ID  Value
0   value90  ABCD      1
3   value91  ABCD      2
6   value92  ABCD      3
9   value93  ABCD      4
12  value16  ABCD      5
1   value90  EFGH      6
4   value91  EFGH      7
7   value92  EFGH      8
10  value93  EFGH      9
13  value16  EFGH     10
2   value90  IJKL     11
5   value91  IJKL     12
8   value92  IJKL     13
11  value93  IJKL     14
14  value16  IJKL     15