python: creating dummy variables per date

Time:11-14

I have the following dataframe:

id start_date end_date value 
1  2010       2014     A
2  2010       2013     C
3  2012       2013     D

and I want to create one new column per year, holding the value for every year between start_date and end_date:

id 2010 2011 2012 2013 2014
1  A    A    A    A    A
2  C    C    C    C    Na
3  Na   Na   D    D    Na  

CodePudding user response:

Build a range of years for each row with a list comprehension, then use DataFrame.explode and DataFrame.pivot:

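# one range of years per row; end_date is inclusive, hence e + 1
df['y'] = [range(s, e + 1) for s, e in zip(df['start_date'], df['end_date'])]
# keyword arguments: recent pandas versions no longer accept them positionally
df = df.explode('y').pivot(index='id', columns='y', values='value')
print(df)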
y  2010 2011 2012 2013 2014
id                         
1     A    A    A    A    A
2     C    C    C    C  NaN
3   NaN  NaN    D    D  NaN
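
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the sample data from the question is typed in by hand and that pandas is recent enough to have DataFrame.explode. The names long_df and wide are only illustrative, and the ranges are wrapped in list() purely as a precaution.

```python
import pandas as pd

# Sample data from the question, typed in by hand
df = pd.DataFrame({
    "id": [1, 2, 3],
    "start_date": [2010, 2010, 2012],
    "end_date": [2014, 2013, 2013],
    "value": ["A", "C", "D"],
})

# One list of years per row; end_date is inclusive
df["y"] = [list(range(s, e + 1)) for s, e in zip(df["start_date"], df["end_date"])]

# explode produces one row per (id, year) pair
long_df = df.explode("y")

# pivot spreads the years into columns; missing (id, year) pairs become NaN
wide = long_df.pivot(index="id", columns="y", values="value")
print(wide)
```

The intermediate long_df has 5 + 4 + 2 = 11 rows, one for each year covered by each id, which is what lets pivot place every value in its year column.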

If performance is important, use Index.repeat with GroupBy.cumcount instead:

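# repeat each row once per year it covers (end_date - start_date + 1)
df = df.loc[df.index.repeat(df['end_date'].sub(df['start_date']).add(1))]
# number the copies of each original row 0, 1, 2, ... and add to start_date
df['y'] = df['start_date'].add(df.groupby(level=0).cumcount())

df = df.pivot(index='id', columns='y', values='value')
print(df)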
y  2010 2011 2012 2013 2014
id                         
1     A    A    A    A    A
2     C    C    C    C  NaN
3   NaN  NaN    D    D  NaN
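
The same idea as a self-contained sketch, again assuming the question's sample data is entered by hand. It also assumes the frame has a unique index (here the default RangeIndex), because groupby(level=0) relies on the repeated index labels to tell the original rows apart. The names n_years, rep and wide are illustrative only.

```python
import pandas as pd

# Sample data from the question, typed in by hand
df = pd.DataFrame({
    "id": [1, 2, 3],
    "start_date": [2010, 2010, 2012],
    "end_date": [2014, 2013, 2013],
    "value": ["A", "C", "D"],
})

# Number of years each row spans; end_date is inclusive
n_years = df["end_date"].sub(df["start_date"]).add(1)

# Repeat each row once per year; the index labels are repeated as well
rep = df.loc[df.index.repeat(n_years)].copy()

# cumcount numbers the copies of each original row 0, 1, 2, ...
# so start_date + counter gives the individual years
rep["y"] = rep["start_date"].add(rep.groupby(level=0).cumcount())

wide = rep.pivot(index="id", columns="y", values="value")
print(wide)
```

This version avoids building Python range objects row by row, which is likely where the speed advantage over the list comprehension comes from on large frames.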