Generate a Pandas dataframe for joining longitudinal data

Time:11-04

I have disparate longitudinal data and want to create a "scaffolding" dataframe to join them to. I have N individuals, and I know that each individual's time series should be Y periods long, in uniform segments. I'm trying to find a clean way to build this scaffolding dataframe, with one column for individual ID and another for time, without using loops. Let's say that Y = 10. Here's a demo of what I have in mind, for two individuals:

import numpy as np
import pandas as pd

timeseries = pd.DataFrame(np.arange(10), columns=['DATE'])

block1 = timeseries.copy()
block1['ID'] = 1

block2 = timeseries.copy()
block2['ID'] = 2

example = pd.concat([block1, block2])
example[['ID', 'DATE']]

Building this out with a loop N times isn't the end of the world, but there's got to be a better way to do it.

Answer:

Use assign in a list comprehension and concat:

N = 10  # number of individuals
example = pd.concat([timeseries.assign(ID=n + 1) for n in range(N)])[['ID', 'DATE']]

Alternative:

N = 10  # number of individuals
example = (pd.concat([timeseries] * N)
             .assign(ID=lambda d: np.arange(len(d)) // len(timeseries) + 1)
             [['ID', 'DATE']]
          )

output:

    ID  DATE
0    1     0
1    1     1
2    1     2
3    1     3
4    1     4
..  ..   ...
5   10     5
6   10     6
7   10     7
8   10     8
9   10     9

[100 rows x 2 columns]
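
Another option, offered only as a sketch: build the scaffold as the Cartesian product of IDs and periods with pd.MultiIndex.from_product, then left-merge the longitudinal data onto it. N = 3 and the observed frame (including its value column) are made up here for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

N = 3   # number of individuals (assumed value for illustration)
Y = 10  # periods per individual, as in the question

# Cartesian product of IDs 1..N and periods 0..Y-1, with no explicit loop.
index = pd.MultiIndex.from_product(
    [np.arange(1, N + 1), np.arange(Y)], names=["ID", "DATE"]
)
scaffold = index.to_frame(index=False)

# Hypothetical observed data with gaps, only to demonstrate the join.
observed = pd.DataFrame(
    {"ID": [1, 1, 3], "DATE": [0, 4, 9], "value": [0.5, 1.2, 3.3]}
)

# A left merge keeps every (ID, DATE) row of the scaffold;
# rows with no matching observation get NaN in "value".
joined = scaffold.merge(observed, on=["ID", "DATE"], how="left")
```

Unlike the concat versions, the resulting frame has a fresh 0..N*Y-1 index rather than a repeated 0..Y-1 index, which may or may not matter for your joins.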