Home > Net >  Split dataset into yearly time series
Split dataset into yearly time series

Time:10-21

I have a dataset that gives me a unique identifier and different columns with values for each year. It looks as follows:

ID value90 value91 value92 value93 .. value16
ABCD 1 2 3 4 .. 5
EFGH 6 7 8 9 .. 10
IJKL 11 12 13 14 .. 15
df = pd.Dataframe({'ID':[ABCD, EFGH, IJKL], 'value90':[1, 6, 11], 'value91':[2,7,12], 'value92':[3,8,13], 'value93':[4,9,14], 'value16':[5,10,15]}

I am looking for the following output:

Year ID value
1990 ABC 1
1991 ABC 2
1992 ABC 3
... ..... ...
2016 ABC 5
... ..... ...
2016 IJKL 15

I'm using pandas and am not sure how to produce the final output I'm seeking. Thanks!

CodePudding user response:

You can use pd.melt to get most of the way there, and then sort:

import pandas as pd

df = pd.DataFrame({'ID':['ABCD', 'EFGH', 'IJKL'], 'value90':[1, 6, 11], 'value91':[2,7,12], 'value92':[3,8,13], 'value93':[4,9,14], 'value16':[5,10,15]})

print(df)

df2 = pd.melt(df, id_vars=["ID"],var_name="Year", value_name="Value")
df2 = df2.sort_values(by=['ID'])
df2 = df2.filter(["Year","ID","Value"])

print(df2)

[Output]

     ID  value90  value91  value92  value93  value16
0  ABCD        1        2        3        4        5
1  EFGH        6        7        8        9       10
2  IJKL       11       12       13       14       15



       Year    ID  Value
0   value90  ABCD      1
3   value91  ABCD      2
6   value92  ABCD      3
9   value93  ABCD      4
12  value16  ABCD      5
1   value90  EFGH      6
4   value91  EFGH      7
7   value92  EFGH      8
10  value93  EFGH      9
13  value16  EFGH     10
2   value90  IJKL     11
5   value91  IJKL     12
8   value92  IJKL     13
11  value93  IJKL     14
14  value16  IJKL     15
  • Related