Let's assume the following dataframe and shift operation:
import pandas as pd

d = {'col1': ['2022-01-01', '2022-02-01', '2022-03-01', '2022-05-01'], 'col2': [1, 2, 3, 4]}
df = pd.DataFrame(d)
# shift(1) moves values down by one row, regardless of the dates in col1
df['shifted'] = df['col2'].shift(1, fill_value=0)
I want to create a column containing the value from the previous month, filled with 0 when the previous month does not exist in the data. The desired result would look like:
col1 | col2 | shifted |
---|---|---|
2022-01-01 | 1 | 0 |
2022-02-01 | 2 | 1 |
2022-03-01 | 3 | 2 |
2022-05-01 | 4 | 0 |
So in the last line the value is 0 because there is no data for April.
But at the moment it looks like this:
col1 | col2 | shifted |
---|---|---|
2022-01-01 | 1 | 0 |
2022-02-01 | 2 | 1 |
2022-03-01 | 3 | 2 |
2022-05-01 | 4 | 3 |
Does anyone know how to achieve this?
CodePudding user response:
One idea is to create a monthly PeriodIndex, which makes it possible to shift by months, and then replace the missing values with 0:
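# use the month of col1 as a PeriodIndex ('M' is the monthly period alias)
df = df.set_index(pd.to_datetime(df['col1']).dt.to_period('M'))
# shift the index by one month, then align back; months with no predecessor get 0
df['shifted'] = df['col2'].shift(1, freq='M').reindex(df.index, fill_value=0)
print(df)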
               col1  col2  shifted
col1
2022-01  2022-01-01     1        0
2022-02  2022-02-01     2        1
2022-03  2022-03-01     3        2
2022-05  2022-05-01     4        0
Finally, the PeriodIndex can be removed:
df = df.reset_index(drop=True)
print(df)
         col1  col2  shifted
0  2022-01-01     1        0
1  2022-02-01     2        1
2  2022-03-01     3        2
3  2022-05-01     4        0
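If you would rather not touch the index at all, here is a possible variant of the same idea. It is only a sketch, assuming each month appears at most once in col1; the names month and lookup are just illustrative. It looks up the value of the previous month and falls back to 0 when that month is missing:

```python
import pandas as pd

d = {'col1': ['2022-01-01', '2022-02-01', '2022-03-01', '2022-05-01'], 'col2': [1, 2, 3, 4]}
df = pd.DataFrame(d)

# month of each row as a monthly Period (illustrative helper name)
month = pd.to_datetime(df['col1']).dt.to_period('M')

# lookup table: month -> col2 value (assumes months are unique)
lookup = pd.Series(df['col2'].to_numpy(), index=month)

# look up the value of the previous month; missing months give NaN, replaced by 0
df['shifted'] = (month - 1).map(lookup).fillna(0).astype(int)
print(df)
```

The original integer index stays in place here, so no reset_index is needed afterwards.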