I have a dataframe as follows:
Month Year Value1 Value2 ...
Jul 1.1 2.2
Aug 1.1 2.3
Sep 1.1 2.6
Oct 1.1 2.2
Nov 1.1 2.2
Dec 1.1 2.2
Jan 1.1 2.2
Feb 1.1 2.2
Mar 1.1 2.2
Apr 1.1 2.2
May 1.1 2.2
Jun 1.1 2.2
Jul 1.1 2.2
Aug 1.1 2.2
Sep 1.1 2.2
Oct 1.1 2.2
Nov 1.1 2.2
Dec 1.1 2.2
Jan 1.1 2.2
Feb 1.1 2.2
Mar 1.1 2.2
Apr 1.1 2.2
May 1.1 2.2
Jun 1.1 2.2
Jul 2022 1.1 2.2
The aim is to fill up the year column. Now the last row will always be data for the current year, so I fill up the last row with 2022 and then find the current month, and depending upon that, I am able to fill up the past month till Jan with 2022.
month_val1 = df['Month'].values[-1] # Find Last month value updated
year_val1 = df['Year'].values[-1] # Find the last year value
month_count = month_dict[month_val1] # Find the number of rows to be updated
df['Year'].iloc[-month_count:] = year_val1 # Update number of rows with required year value
But how do I fill up the value of the Year column up to the first row in a backward fashion? Please note there can be any number of rows in the dataframe, hence I can't just keep filling 12 rows with decremented values.
Below is the expected result:
Month Year Value1 Value2 ...
Jul 2020 1.1 2.2
Aug 2020 1.1 2.3
Sep 2020 1.1 2.6
Oct 2020 1.1 2.2
Nov 2020 1.1 2.2
Dec 2020 1.1 2.2
Jan 2021 1.1 2.2
Feb 2021 1.1 2.2
Mar 2021 1.1 2.2
Apr 2021 1.1 2.2
May 2021 1.1 2.2
Jun 2021 1.1 2.2
Jul 2021 1.1 2.2
Aug 2021 1.1 2.2
Sep 2021 1.1 2.2
Oct 2021 1.1 2.2
Nov 2021 1.1 2.2
Dec 2021 1.1 2.2
Jan 2022 1.1 2.2
Feb 2022 1.1 2.2
Mar 2022 1.1 2.2
Apr 2022 1.1 2.2
May 2022 1.1 2.2
Jun 2022 1.1 2.2
Jul 2022 1.1 2.2
Any help is appreciated.
CodePudding user response:
Assuming the year changes when a month shifts from a given value to a previous month:
from calendar import month_abbr
# map month abbreviations to number
d = {m: i for i,m in enumerate(month_abbr)}
# reverse series, map month number
# if difference is > 0 we changed year
# use cumsum to propagate to previous rows
s = df.loc[::-1, 'Month'].map(d).diff().gt(0).cumsum()
# subtract correction from max year
df['Year'] = s.rsub(df['Year'].max()).astype(int)
NB. This should work even with missing months.
Output:
Month Year Value1 Value2
0 Jul 2020 1.1 2.2
1 Aug 2020 1.1 2.3
2 Sep 2020 1.1 2.6
3 Oct 2020 1.1 2.2
4 Nov 2020 1.1 2.2
5 Dec 2020 1.1 2.2
6 Jan 2021 1.1 2.2
7 Feb 2021 1.1 2.2
8 Mar 2021 1.1 2.2
9 Apr 2021 1.1 2.2
10 May 2021 1.1 2.2
11 Jun 2021 1.1 2.2
12 Jul 2021 1.1 2.2
13 Aug 2021 1.1 2.2
14 Sep 2021 1.1 2.2
15 Oct 2021 1.1 2.2
16 Nov 2021 1.1 2.2
17 Dec 2021 1.1 2.2
18 Jan 2022 1.1 2.2
19 Feb 2022 1.1 2.2
20 Mar 2022 1.1 2.2
21 Apr 2022 1.1 2.2
22 May 2022 1.1 2.2
23 Jun 2022 1.1 2.2
24 Jul 2022 1.1 2.2
CodePudding user response:
Find the year groups, then find the relative year numbers (in reverse) and then add the last year to the relative year numbers to get the actual year number:
df['Year'] = (df['Year'].max() - df['Month'].eq('Dec')[::-1].cumsum()).astype(int)
print(df)
Result
Month Year Value1 Value2
0 Jul 2020 1.1 2.2
1 Aug 2020 1.1 2.3
2 Sep 2020 1.1 2.6
3 Oct 2020 1.1 2.2
4 Nov 2020 1.1 2.2
5 Dec 2020 1.1 2.2
6 Jan 2021 1.1 2.2
7 Feb 2021 1.1 2.2
8 Mar 2021 1.1 2.2
9 Apr 2021 1.1 2.2
10 May 2021 1.1 2.2
11 Jun 2021 1.1 2.2
12 Jul 2021 1.1 2.2
13 Aug 2021 1.1 2.2
14 Sep 2021 1.1 2.2
15 Oct 2021 1.1 2.2
16 Nov 2021 1.1 2.2
17 Dec 2021 1.1 2.2
18 Jan 2022 1.1 2.2
19 Feb 2022 1.1 2.2
20 Mar 2022 1.1 2.2
21 Apr 2022 1.1 2.2
22 May 2022 1.1 2.2
23 Jun 2022 1.1 2.2
24 Jul 2022 1.1 2.2