I'm trying to create a dataframe in Pandas that has two variables ("date" and "time_of_day" where "date" is 120 observations long with 30 days (each day has four observations: 1,1,1,1; 2,2,2,2; etc.) and then the second variable "time_of_day) repeats 30 times with values of 1,2,3,4.
The closest I found to this question was here: How to create a series of numbers using Pandas in Python, which got me the below code, but I'm receiving an error that it must be a 1-dimensional array.
df = pd.DataFrame({'date': np.tile([pd.Series(range(1,31))],4), 'time_of_day': pd.Series(np.tile([1, 2, 3, 4],30 ))})
So the final dataframe would look something like
date | time_of_day |
---|---|
1 | 1 |
1 | 2 |
1 | 3 |
1 | 4 |
2 | 1 |
2 | 2 |
2 | 3 |
2 | 4 |
Thanks much!
CodePudding user response:
you need once np.repeat
and once np.tile
df = pd.DataFrame({'date': np.repeat(range(1,31),4),
'time_of_day': np.tile([1, 2, 3, 4],30)})
print(df.head(10))
date time_of_day
0 1 1
1 1 2
2 1 3
3 1 4
4 2 1
5 2 2
6 2 3
7 2 4
8 3 1
9 3 2
or you could use pd.MultiIndex.from_product
, same result.
df = (
pd.MultiIndex.from_product([range(1,31), range(1,5)],
names=['date','time_of_day'])
.to_frame(index=False)
)
or product
from itertools
from itertools import product
df = pd.DataFrame(product(range(1,31), range(1,5)), columns=['date','time_of_day'])
CodePudding user response:
New feature in merge cross
out = pd.DataFrame(range(1,31)).merge(pd.DataFrame([1, 2, 3, 4]),how='cross')