Home > Back-end >  Generate weeks from column with dates
Generate weeks from column with dates

Time:10-25

I have a large dataset which contains a date column that covers from the year 2019. Now I do want to generate number of weeks on a separate column that are contained in those dates.

Here is how the date column looks like:

import pandas as pd

data = {'date': ['2019-09-10', 'NaN', '2019-10-07', '2019-11-04', '2019-11-28',
       '2019-12-02', '2020-01-24', '2020-01-29', '2020-02-05',
       '2020-02-12', '2020-02-14', '2020-02-24', '2020-03-11',
       '2020-03-16', '2020-03-17', '2020-03-18', '2021-09-14',
       '2021-09-30', '2021-10-07', '2021-10-08', '2021-10-12',
       '2021-10-14', '2021-10-15', '2021-10-19', '2021-10-21',
       '2021-10-26', '2021-10-28', '2021-10-29', '2021-11-02',
       '2021-11-15', '2021-11-16', '2021-12-01', '2021-12-07',
       '2021-12-09', '2021-12-10', '2021-12-14', '2021-12-15',
       '2022-01-13', '2022-01-14', '2022-01-21', '2022-01-24',
       '2022-01-25', '2022-01-27', '2022-01-31', '2022-02-01',
       '2022-02-10', '2022-02-11', '2022-02-16', '2022-02-24']}

df = pd.DataFrame(data)

Now as from the first day this data was collected, I want to count 7 days using the date column and create a week out it. an example if the first week contains the 7 dates, I create a column and call it week one. I want to do the same process until the last week the data was collected.

Maybe it will be a good idea to organize the dates in order as from the first date to current one.

I have tried this but its not generating weeks in order, it actually has repetitive weeks.

pd.to_datetime(df['date'], errors='coerce').dt.week

My intention is, as from the first date the date was collected, count 7 days and store that as week one then continue incrementally until the last week say week number 66.

Here is the expected column of weeks created from the date column

import pandas as pd

week_df = {'weeks': ['1', '2', "3", "5", '6']}

df_weeks = pd.DataFrame(week_df)

CodePudding user response:

IIUC use:

df['date'] = pd.to_datetime(df['date'])

df['week'] = df['date'].sub(df['date'].iat[0]).dt.days // 7   1
print (df.head(10))
        date  week
0 2019-09-10   1.0
1        NaT   NaN
2 2019-10-07   4.0
3 2019-11-04   8.0
4 2019-11-28  12.0
5 2019-12-02  12.0
6 2020-01-24  20.0
7 2020-01-29  21.0
8 2020-02-05  22.0
9 2020-02-12  23.0

CodePudding user response:

You have more than 66 weeks here, so either you want the real week count since the beginning or you want a dummy week rank. See below for both solutions:

# convert to week period
s = pd.to_datetime(df['date']).dt.to_period('W')

# get real week number
df['week'] = s.sub(s.iloc[0]).dropna().apply(lambda x: x.n).add(1)

# get dummy week rank
df['week2'] = s.rank(method='dense')

output:

          date   week  week2
0   2019-09-10    1.0    1.0
1          NaN    NaN    NaN
2   2019-10-07    5.0    2.0
3   2019-11-04    9.0    3.0
4   2019-11-28   12.0    4.0
5   2019-12-02   13.0    5.0
6   2020-01-24   20.0    6.0
7   2020-01-29   21.0    7.0
8   2020-02-05   22.0    8.0
9   2020-02-12   23.0    9.0
10  2020-02-14   23.0    9.0
11  2020-02-24   25.0   10.0
12  2020-03-11   27.0   11.0
13  2020-03-16   28.0   12.0
14  2020-03-17   28.0   12.0
15  2020-03-18   28.0   12.0
16  2021-09-14  106.0   13.0
17  2021-09-30  108.0   14.0
18  2021-10-07  109.0   15.0
19  2021-10-08  109.0   15.0
...
42  2022-01-27  125.0   26.0
43  2022-01-31  126.0   27.0
44  2022-02-01  126.0   27.0
45  2022-02-10  127.0   28.0
46  2022-02-11  127.0   28.0
47  2022-02-16  128.0   29.0
48  2022-02-24  129.0   30.0
  • Related