Using np.where to increase Year & Week in prediction column-CodePudding

I am working to create a target column that includes the following week's number and year. I have the beginning of this complete, but I am struggling to account for week 52 and the need to reset the week to 1 and add 1 to the year.

Here is the coding for the YEAR_WEEK-targ column that is running successfully.

df_calls_subweek_target['YEAR_WEEK_targ'] = df_calls_subweek_target['YEAR'].astype(str)   "_"   (df_calls_subweek_target['WEEK'] 1).astype(str)

YEAR_WEEK	YEAR_WEEK_targ
2008_38	2008_39
2008_39	2008_40

I am trying to figure out how to code for the first week of the year. Using np.where makes sense to me, but as you can see, I am not an expert.

df_calls_subweek_target['YEAR_WEEK_targ'] = np.where(df_calls_subweek_target['YEAR_WEEK_targ'] == 52,   (df_calls_subweek_target['YEAR'] 1).astype(str)   "_"   (df_calls_subweek_target['WEEK'] ==1).astype(str))

So, need to figure out how to say when week = 52, increase year 1 and week = 1.

Looking for any recommendations on how to approach this case!

CodePudding user response：

def function1(ss:pd.Series):
    ss=ss.astype(int)
    if ss.col1==52:
        ss.col1=1
        ss.col0=ss.col0 1
    return "{}_{}".format(ss.col0,ss.col1)

df1['YEAR_WEEK_targ'] = df1.YEAR_WEEK.str.split("_",expand=True).add_prefix("col").apply(function1,axis=1)
df1

out

 YEAR_WEEK YEAR_WEEK_targ
0   2008_38        2008_38
1   2008_39        2008_39
2   2008_52         2009_1

CodePudding user response：

A tricky solution would be to mask the values where week=52 and replace them by 0.

s = df["WEEK"].mask(df["WEEK"].eq(52)).fillna(0, downcast="infer").add(1)

df["YEAR_WEEK_targ"] = df['YEAR'].astype(str)   "_"   s.astype(str).str.zfill(2)

An alternative, you can also use numpy.where:

df["YEAR_WEEK_targ"] = np.where(df["WEEK"].eq(52),
                                df['YEAR'].astype(str)   "_01",
                                df['YEAR'].astype(str)   "_"   df["WEEK"].astype(str))

NB : For the sake of clarity, I used (df) instead of (df_calls_subweek_target)

Output :

print(df)

   YEAR  WEEK YEAR_WEEK_targ
0  2008    38        2008_39
1  2008    39        2008_40
2  2008    52        2008_01