Efficient way of converting year_week to datetime in pandas

I have a pandas DataFrame with two columns, year and week_number.

df = pd.DataFrame({'year': [2019, 2020, 2021, 2022], 'week_number':[3,12,38,42]})
df

    year    week_number
0   2019    3
1   2020    12
2   2021    38
3   2022    42

I know I can apply something like the following to each row to convert it to a datetime value. However, is there a more efficient way to do this for big DataFrames and store the results in a third column?

import datetime
single_day = "2013-26"
converted_date = datetime.datetime.strptime(single_day + '-1', "%Y-%W-%w")
print(converted_date)
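
For reference, here is a minimal sketch of what the "%Y-%W-%w" format does in the snippet above. The helper name monday_of_week is hypothetical and only wraps the same strptime call. %W counts weeks that start on Monday, and days before the first Monday of the year fall in week 0.

```python
from datetime import datetime

def monday_of_week(year, week):
    """Hypothetical helper: Monday of %W-style week `week` in `year`."""
    # The trailing '-1' selects weekday 1 (Monday) for the %w directive.
    return datetime.strptime(f"{year}-{week}-1", "%Y-%W-%w")

print(monday_of_week(2013, 26))             # 2013-07-01 00:00:00
print(datetime(2013, 1, 1).strftime("%W"))  # 00, because 1 Jan 2013 precedes the first Monday
```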

CodePudding user response：

I wouldn't say your way is inefficient, but if you want a fully vectorized approach that needs no additional library and adds the result to your DataFrame as a new column, this might be what you're looking for:

import pandas as pd
df = pd.DataFrame({'year': [2019, 2020, 2021, 2022], 'week_number':[3,12,38,42]})
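# Build strings such as '2019030' (year, zero-padded week, weekday 0 = Sunday) and parse them in one call
df['date'] = pd.to_datetime((df['year']*100 + df['week_number']).astype(str) + '0', format='%Y%W%w')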
df
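
The same idea can be wrapped so the weekday is a parameter. This is only a sketch: the helper name year_week_to_datetime and the column names week_start and week_end are made up for illustration. Multiplying the year by 100 keeps the week zero-padded to two digits, and the appended digit is the %w weekday (1 = Monday, 0 = Sunday, which %W treats as the last day of the week).

```python
import pandas as pd

df = pd.DataFrame({'year': [2019, 2020, 2021, 2022],
                   'week_number': [3, 12, 38, 42]})

def year_week_to_datetime(frame, weekday=1):
    """Hypothetical helper: vectorized %Y%W%w parsing for a whole frame."""
    # year*100 + week gives e.g. 201903, so the week is always two digits wide.
    key = (frame['year'] * 100 + frame['week_number']).astype(str) + str(weekday)
    return pd.to_datetime(key, format='%Y%W%w')

df['week_start'] = year_week_to_datetime(df, weekday=1)  # Mondays
df['week_end'] = year_week_to_datetime(df, weekday=0)    # Sundays
print(df.iloc[0])  # week_start 2019-01-21, week_end 2019-01-27
```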

CodePudding user response：

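If you are on Python >= 3.8, use datetime.date.fromisocalendar; datetime.datetime has the same method. Note that it uses ISO 8601 week numbers, which can differ from the %W numbering used above.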

# 11 May 2022 is a Wednesday in the 19th ISO week
>>> from datetime import date
>>> date.fromisocalendar(2022, 19, 3)
datetime.date(2022, 5, 11)

As a new column:

df['date'] = df.apply(lambda row: date.fromisocalendar(int(row['year']), int(row['week_number']), 1), axis=1)
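
If ISO weeks are what you need, a vectorized variant may be possible as well. The sketch below assumes a pandas version whose to_datetime accepts the ISO directives %G, %V and %u (to my knowledge, 0.25 or later). The column names iso_monday and w_monday are made up for illustration. It also shows that ISO weeks and %W weeks do not always agree.

```python
from datetime import date
import pandas as pd

df = pd.DataFrame({'year': [2019, 2020, 2021, 2022],
                   'week_number': [3, 12, 38, 42]})

# Build keys such as '2019-03-1'; the final 1 means Monday in both %u and %w.
key = (df['year'].astype(str) + '-'
       + df['week_number'].astype(str).str.zfill(2) + '-1')

df['iso_monday'] = pd.to_datetime(key, format='%G-%V-%u')  # ISO 8601 weeks
df['w_monday'] = pd.to_datetime(key, format='%Y-%W-%w')    # %W weeks

# Cross-check the ISO column against the row-by-row stdlib call.
expected = [date.fromisocalendar(y, w, 1)
            for y, w in zip(df['year'].tolist(), df['week_number'].tolist())]
print(df['iso_monday'].dt.date.tolist() == expected)  # True
print(df[['iso_monday', 'w_monday']])
```

For this sample data the two numbering schemes give the same Monday for the 2021 and 2022 rows but differ by one week for 2019 and 2020, so it is worth checking which convention your week numbers follow.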

CodePudding user response：

Use apply to loop over the rows (axis=1) with a lambda function that joins the two columns into a string and then parses it exactly as you did above. This may not be the answer you were looking for, since you asked for the most efficient solution, but it does the job!

from datetime import datetime
df['convert_date'] = df.apply(lambda x: datetime.strptime(f"{x.year}-{x.week_number}-1", "%Y-%W-%w"), axis=1)