Home > OS >  How to expand row into multiple rows given a start and end date
How to expand row into multiple rows given a start and end date

Time:02-04

I'm trying to make a dataframe made of job records that looks something like this

id start date End Date
01 2023-01-30 2023-02-19
02 2023-06-12 2023-06-25

My goal is to expand each record so that there are x many rows for each id where x is the number of weeks between start and end dates.

It should look something like this:

id start date End Date Week Week Count Year Week # Period
01 2023-01-30 2023-02-19 2023-01-30 1 2023 5 2023-5
01 2023-01-30 2023-02-19 2023-02-06 2 2023 6 2023-6
01 2023-01-30 2023-02-19 2023-02-13 3 2023 7 2023-7
02 2023-06-12 2023-06-25 2023-06-12 1 2023 24 2023-24
02 2023-06-12 2023-06-25 2023-06-19 2 2023 25 2023-25

I tried manually calculating the number of weeks before dates and then expanding the dataframe using

df = df.loc[df.index.repeat(df['# of weeks')].reset_index(drop=True)

I was wondering if there was a better way to automatically extract the number of weeks between dates and each week for every row per job record?

Trying the method here's the code I used and the resulting error:

df['Id'] = df['Id'].astype(str)
df['weeks'] = np.ceil((df['End_Date__c'] - df['Start_Date__c']) / np.timedelta64(1, "W"))

exp_df = pd.concat([pd.DataFrame({
    'Id': np.repeat(idx 1, weeks),
    'Start_Date__c': start,
    'End_Date__c ': end,
    'Week': pd.date_range(start, periods=weeks, freq='W-MON'),
    'Week Count': np.arange(1, weeks 1),
    'Year': pd.date_range(start, periods=weeks, freq='W-MON').year,
    'Week #': pd.date_range(start, periods=weeks, freq='W-MON').strftime('%U'),
    'Period': pd.date_range(start, periods=weeks, freq='W-MON').strftime('%Y-%U')
}) for idx, (start, end, weeks) in df.loc[:, ['Start_Date__c', 'End_Date__c', 'End_Date__c']].iterrows()])

exp_df

Error:

---------------------------------------------------------------------------
TypeError                       Traceback (most recent call last)
Cell In[11], line 4
  1 df['Id'] = df['Id'].astype(str)
  2 df['weeks'] = np.ceil((df['End_Date__c'] - df['Start_Date__c']) / np.timedelta64(1, "W"))
----> 4 exp_df = pd.concat([pd.DataFrame({
      5     'Id': np.repeat(idx 1, weeks),
      6     'Start_Date__c': start,
      7     'End_Date__c ': end,
      8     'Week': pd.date_range(start, periods=weeks, freq='W-MON'),
      9     'Week Count': np.arange(1, weeks 1),
     10     'Year': pd.date_range(start, periods=weeks, freq='W-MON').year,
     11     'Week #': pd.date_range(start, periods=weeks, freq='W-MON').strftime('%U'),
     12     'Period': pd.date_range(start, periods=weeks, freq='W-MON').strftime('%Y-%U')
     13 }) for idx, (start, end, weeks) in df.loc[:, ['Start_Date__c', 'End_Date__c', 'End_Date__c']].iterrows()])
     15 exp_df

Cell In[11], line 5, in <listcomp>(.0)
      1 df['Id'] = df['Id'].astype(str)
      2 df['weeks'] = np.ceil((df['End_Date__c'] - df['Start_Date__c']) / np.timedelta64(1, "W"))
      4 exp_df = pd.concat([pd.DataFrame({
----> 5     'Id': np.repeat(idx 1, weeks),
      6     'Start_Date__c': start,
      7     'End_Date__c ': end,
      8     'Week': pd.date_range(start, periods=weeks, freq='W-MON'),
      9     'Week Count': np.arange(1, weeks 1),
     10     'Year': pd.date_range(start, periods=weeks, freq='W-MON').year,
     11     'Week #': pd.date_range(start, periods=weeks, freq='W-MON').strftime('%U'),
     12     'Period': pd.date_range(start, periods=weeks, freq='W-MON').strftime('%Y-%U')
     13 }) for idx, (start, end, weeks) in df.loc[:, ['Start_Date__c', 'End_Date__c', 'End_Date__c']].iterrows()])
     15 exp_df

File <__array_function__ internals>:5, in repeat(*args, **kwargs)

File /opt/anaconda3/lib/python3.9/site-packages/numpy/core/fromnumeric.py:479, in repeat(a, repeats, axis)
    436 @array_function_dispatch(_repeat_dispatcher)
    437 def repeat(a, repeats, axis=None):
    438     """
    439     Repeat elements of an array.
    440 
   (...)
    477 
    478     """
--> 479     return _wrapfunc(a, 'repeat', repeats, axis=axis)

File /opt/anaconda3/lib/python3.9/site- 
packages/numpy/core/fromnumeric.py:54, in _wrapfunc(obj, method, *args, **kwds)
     52 bound = getattr(obj, method, None)
     53 if bound is None:
---> 54     return _wrapit(obj, method, *args, **kwds)
     56 try:
     57     return bound(*args, **kwds)

File /opt/anaconda3/lib/python3.9/site-packages/numpy/core/fromnumeric.py:43, in _wrapit(obj, method, *args, **kwds)
     41 except AttributeError:
     42     wrap = None
---> 43 result = getattr(asarray(obj), method)(*args, **kwds)
     44 if wrap:
     45     if not isinstance(result, mu.ndarray):

TypeError: int() argument must be a string, a bytes-like object or a number, not 'Timestamp'

CodePudding user response:

Here is an approach using numpy and pd.concat(). import numpy as np

df['start date'] = pd.to_datetime(df['start date'])
df['End Date'] = pd.to_datetime(df['End Date'])

df['weeks'] = np.ceil((df['End Date'] - df['start date']) / np.timedelta64(1, "W"))

exp_df = pd.concat([pd.DataFrame({
    'id': np.repeat(idx, weeks),
    'start date': start,
    'End Date': end,
    'Week': pd.date_range(start, periods=weeks, freq='W-MON'),
    'Week Count': np.arange(1, weeks 1),
    'Year': pd.date_range(start, periods=weeks, freq='W-MON').year,
    'Week #': pd.date_range(start, periods=weeks, freq='W-MON').strftime('%U'),
    'Period': pd.date_range(start, periods=weeks, freq='W-MON').strftime('%Y-%U')
}) for idx, (start, end, weeks) in df.loc[:, ['start date', 'End Date', 'weeks']].iterrows()])

exp_df = exp_df.reset_index(drop=True)
print(exp_df)

Other methods used:
np.ceil()
df.loc()


   id start date   End Date       Week  Week Count  Year Week #   Period
0   1 2023-01-30 2023-02-19 2023-01-30         1.0  2023     05  2023-05
1   1 2023-01-30 2023-02-19 2023-02-06         2.0  2023     06  2023-06
2   1 2023-01-30 2023-02-19 2023-02-13         3.0  2023     07  2023-07
3   2 2023-06-12 2023-06-25 2023-06-12         1.0  2023     24  2023-24
4   2 2023-06-12 2023-06-25 2023-06-19         2.0  2023     25  2023-25
  • Related