I have the following dataframe:
id  start_date  end_date  value
1   2010        2014      A
2   2010        2013      C
3   2012        2013      D
and I want to create one column per year, filled with the value for every year between start_date and end_date (inclusive):
id  2010  2011  2012  2013  2014
1   A     A     A     A     A
2   C     C     C     C     Na
3   Na    Na    D     D     Na
CodePudding user response:
Create a range of years for each row with a list comprehension, then use DataFrame.explode and DataFrame.pivot:
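# one range of years per row; e + 1 makes the end year inclusive
df['y'] = [range(s, e + 1) for s, e in zip(df['start_date'], df['end_date'])]
# one row per (id, year), then years become columns (pivot arguments are keyword-only in pandas 2.0+)
df = df.explode('y').pivot(index='id', columns='y', values='value')
print(df)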
y  2010 2011 2012 2013 2014
id
1     A    A    A    A    A
2     C    C    C    C  NaN
3   NaN  NaN    D    D  NaN
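For anyone who wants to run this end to end, here is a minimal self-contained sketch of the explode approach. The sample frame is copied from the question; the helper name expand_years and the cast of the year column to int are my own additions for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "start_date": [2010, 2010, 2012],
    "end_date": [2014, 2013, 2013],
    "value": ["A", "C", "D"],
})

def expand_years(frame):
    """Hypothetical helper: one column per year covered by each id."""
    out = frame.copy()
    # One list of years per row; e + 1 makes the end year inclusive.
    out["y"] = [list(range(s, e + 1))
                for s, e in zip(out["start_date"], out["end_date"])]
    # One row per (id, year); explode leaves an object column, so cast back to int.
    out = out.explode("y")
    out["y"] = out["y"].astype(int)
    # Years become columns; combinations that do not occur are filled with NaN.
    return out.pivot(index="id", columns="y", values="value")

wide = expand_years(df)
print(wide)
```

Working on a copy keeps the original df intact, which is handy if you want to compare this with another approach afterwards.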
If performance is important, use Index.repeat with GroupBy.cumcount:
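# repeat each row once per year it covers (end - start + 1)
df = df.loc[df.index.repeat(df['end_date'].sub(df['start_date']).add(1))]
# a 0, 1, 2, ... counter within each original row, added to the start year
df['y'] = df['start_date'].add(df.groupby(level=0).cumcount())
df = df.pivot(index='id', columns='y', values='value')
print(df)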
y  2010 2011 2012 2013 2014
id
1     A    A    A    A    A
2     C    C    C    C  NaN
3   NaN  NaN    D    D  NaN
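The repeat/cumcount variant can be sketched as a standalone script as well. The sample frame is again copied from the question; the intermediate names n_years and long_df are assumptions I introduced for readability. Note that it relies on the original index being unique, since groupby(level=0) uses the index to tell the repeated rows apart.

```python
import pandas as pd

# Sample data copied from the question (default RangeIndex, so the index is unique).
df = pd.DataFrame({
    "id": [1, 2, 3],
    "start_date": [2010, 2010, 2012],
    "end_date": [2014, 2013, 2013],
    "value": ["A", "C", "D"],
})

# Number of years each row covers, both endpoints included.
n_years = df["end_date"] - df["start_date"] + 1

# Repeat each row n_years times; repeated rows keep their original index label.
long_df = df.loc[df.index.repeat(n_years)].copy()

# cumcount gives 0, 1, 2, ... within each original row, so start + counter is the year.
long_df["y"] = long_df["start_date"] + long_df.groupby(level=0).cumcount()

# Years become columns; combinations that do not occur are filled with NaN.
wide = long_df.pivot(index="id", columns="y", values="value")
print(wide)
```

This version stays in vectorized integer arithmetic and never builds Python range objects per row, which is the reason it tends to scale better on large frames.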