I have disparate longitudinal data, and I want to create a "scaffolding" dataframe to join those data to. I have N individuals, and I know that each individual's time series should be Y periods long (uniform longitudinal segments). I'm trying to find a clean way to build this scaffolding dataframe, with one column for individual ID and another for time, without using loops. Let's say that Y = 10. Here's a demo of what I have in mind, for two individuals:
import numpy as np
import pandas as pd
timeseries = pd.DataFrame(np.arange(10), columns=['DATE'])
block1 = timeseries.copy()
block1['ID'] = 1
block2 = timeseries.copy()
block2['ID'] = 2
example = pd.concat([block1,block2])
example[['ID','DATE']]
Building this out with a loop N times isn't the end of the world, but there's got to be a better way to do it.
Answer:
Use assign in a list comprehension, then concat the pieces:
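import numpy as np
import pandas as pd
N = 10  # number of individuals; timeseries (from the question) holds the Y dates
example = pd.concat([timeseries.assign(ID=n + 1) for n in range(N)])[['ID', 'DATE']]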
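pd.concat keeps each block's original index, so the labels 0 to Y-1 repeat once per individual. If you would rather have a unique 0-based index before joining, a minimal sketch is to pass ignore_index=True. This assumes the same timeseries as in the question; N = 3 is just an illustrative size:

```python
import numpy as np
import pandas as pd

Y = 10  # periods per individual
N = 3   # number of individuals (illustrative value)
timeseries = pd.DataFrame(np.arange(Y), columns=['DATE'])

# ignore_index=True renumbers rows 0..N*Y-1 instead of repeating 0..Y-1
example = pd.concat(
    [timeseries.assign(ID=n + 1) for n in range(N)],
    ignore_index=True,
)[['ID', 'DATE']]
print(example.head(12))
```

The rows and columns are identical to the version above; only the index labels change.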
Alternative:
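N = 10  # number of individuals
# stack N copies, then derive ID from row position: rows 0..Y-1 -> 1, Y..2Y-1 -> 2, ...
example = (pd.concat([timeseries] * N)
    .assign(ID=lambda d: np.arange(len(d)) // len(timeseries) + 1)
    [['ID', 'DATE']]
)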
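A further option, not part of the answer above, is to build the full ID x DATE grid directly with pd.MultiIndex.from_product and convert it to columns, which avoids both the loop and the stacked copies of timeseries. A sketch, assuming IDs run 1..N and dates 0..Y-1 as in the question; unlike the output shown below, its index is a plain 0..N*Y-1 range rather than repeated labels:

```python
import numpy as np
import pandas as pd

N = 10  # number of individuals
Y = 10  # periods per individual

# Cartesian product of IDs and dates; ID varies slowest, DATE fastest
grid = pd.MultiIndex.from_product(
    [np.arange(1, N + 1), np.arange(Y)], names=['ID', 'DATE']
)
# Turn the two index levels into ordinary ID and DATE columns
example = grid.to_frame(index=False)
print(example)
```

If your time axis is real dates, you could pass something like pd.date_range(start, periods=Y, freq=...) in place of np.arange(Y), with start and freq chosen to match your data.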
output:
ID DATE
0 1 0
1 1 1
2 1 2
3 1 3
4 1 4
.. .. ...
5 10 5
6 10 6
7 10 7
8 10 8
9 10 9
[100 rows x 2 columns]
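Since the goal is to join disparate data onto this scaffold, a left merge on both key columns keeps every (ID, DATE) slot and leaves NaN where an individual has no observation. A minimal sketch with a small, made-up observations table (the value column and its contents are hypothetical):

```python
import numpy as np
import pandas as pd

N, Y = 2, 4  # small sizes for illustration
timeseries = pd.DataFrame(np.arange(Y), columns=['DATE'])
scaffold = pd.concat(
    [timeseries.assign(ID=n + 1) for n in range(N)], ignore_index=True
)[['ID', 'DATE']]

# Hypothetical sparse observations: not every ID has every DATE
observations = pd.DataFrame({
    'ID': [1, 1, 2],
    'DATE': [0, 2, 3],
    'value': [10.0, 12.5, 7.0],
})

# how='left' keeps all N*Y scaffold rows; missing observations become NaN.
# validate='one_to_one' raises MergeError if either side has duplicate keys.
joined = scaffold.merge(
    observations, on=['ID', 'DATE'], how='left', validate='one_to_one'
)
print(joined)
```

Checking validate is optional, but it catches duplicated (ID, DATE) rows in the source data before they silently multiply the scaffold.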