I am trying to create a data frame that includes all the dates between '1/1/2019' and '28/02/2022', for each of 3 countries and each of 9 SKUs. I am following this approach for days that is working fine:
days = pd.DataFrame(pd.date_range(start='1/1/2019', end='28/02/2022',freq='D'))
for i in range(26):
days=days.append(pd.DataFrame(pd.date_range(start='1/1/2019', end='28/02/2022',freq='D')))
days.rename(columns={0:'Date'},inplace=True)
days.reset_index(inplace=True,drop=True)
This data frame gives a column with 31185 rows that correspond to the 1155 days between that range multiplied by 27 (3x9). Now I would like to assign the country name and sku name. To populate the countries I am following this manual approach, but it takes a while to load. Is there a smarter way to do it? I would like to do the same for the 9 SKUs.
for i in range(0,10395):
days.loc[i,'country']= 'Austria'
for i in range(10395,20790):
days.loc[i,'country']= 'Germany'
for i in range(20790,31185):
days.loc[i,'country']= 'Switzerland'
Thanks in advance!
CodePudding user response:
Try days.loc[range(0,10395),'country']= 'Austria'
CodePudding user response:
A more efficient apporach is cross-multiply the 3 arrays representing Date
, Country
and SKU
together:
df = pd.MultiIndex.from_product([
pd.date_range(start='1/1/2019', end='28/02/2022', freq='D'),
['Austria', 'Germany', 'Switzerland'],
[f'SKU {i}' for i in range(9)]
], names=['Date', 'Country', 'SKU']).to_frame(index=False)