Home > Software design >  Filling up missing values based on the year value in recursive manner
Filling up missing values based on the year value in recursive manner

Time:09-27

I have a dataframe as follows:

Month Year Value1 Value2 ...
Jul         1.1     2.2
Aug         1.1     2.3
Sep         1.1     2.6
Oct         1.1     2.2
Nov         1.1     2.2
Dec         1.1     2.2
Jan         1.1     2.2
Feb         1.1     2.2
Mar         1.1     2.2
Apr         1.1     2.2
May         1.1     2.2
Jun         1.1     2.2
Jul         1.1     2.2
Aug         1.1     2.2
Sep         1.1     2.2
Oct         1.1     2.2
Nov         1.1     2.2
Dec         1.1     2.2
Jan         1.1     2.2
Feb         1.1     2.2
Mar         1.1     2.2
Apr         1.1     2.2
May         1.1     2.2
Jun         1.1     2.2
Jul 2022    1.1     2.2

The aim is to fill up the year column. Now the last row will always be data for the current year, so I fill up the last row with 2022 and then find the current month, and depending upon that, I am able to fill up the past month till Jan with 2022.

month_val1 = df['Month'].values[-1] # Find Last month value updated
year_val1 = df['Year'].values[-1] # Find the last year value 
month_count = month_dict[month_val1] # Find the number of rows to be updated
df['Year'].iloc[-month_count:] = year_val1 # Update number of rows with required year value

But how do I fill up the value of the Year column up to the first row in a backward fashion? Please note there can be any number of rows in the dataframe, hence I can't just keep filling 12 rows with decremented values.

Below is the expected result:

Month Year Value1 Value2 ...
Jul   2020  1.1     2.2
Aug   2020  1.1     2.3
Sep   2020  1.1     2.6
Oct   2020  1.1     2.2
Nov   2020  1.1     2.2
Dec   2020  1.1     2.2
Jan   2021  1.1     2.2
Feb   2021  1.1     2.2
Mar   2021  1.1     2.2
Apr   2021  1.1     2.2
May   2021  1.1     2.2
Jun   2021  1.1     2.2
Jul   2021  1.1     2.2
Aug   2021  1.1     2.2
Sep   2021  1.1     2.2
Oct   2021  1.1     2.2
Nov   2021  1.1     2.2
Dec   2021  1.1     2.2
Jan   2022  1.1     2.2
Feb   2022  1.1     2.2
Mar   2022  1.1     2.2
Apr   2022  1.1     2.2
May   2022  1.1     2.2
Jun   2022  1.1     2.2
Jul   2022  1.1     2.2

Any help is appreciated.

CodePudding user response:

Assuming the year changes when a month shifts from a given value to a previous month:

from calendar import month_abbr

# map month abbreviations to number
d = {m: i for i,m in enumerate(month_abbr)}

# reverse series, map month number
# if difference is > 0 we changed year
# use cumsum to propagate to previous rows 
s = df.loc[::-1, 'Month'].map(d).diff().gt(0).cumsum()

# subtract correction from max year
df['Year'] = s.rsub(df['Year'].max()).astype(int)

NB. This should work even with missing months.

Output:

   Month  Year  Value1  Value2
0    Jul  2020     1.1     2.2
1    Aug  2020     1.1     2.3
2    Sep  2020     1.1     2.6
3    Oct  2020     1.1     2.2
4    Nov  2020     1.1     2.2
5    Dec  2020     1.1     2.2
6    Jan  2021     1.1     2.2
7    Feb  2021     1.1     2.2
8    Mar  2021     1.1     2.2
9    Apr  2021     1.1     2.2
10   May  2021     1.1     2.2
11   Jun  2021     1.1     2.2
12   Jul  2021     1.1     2.2
13   Aug  2021     1.1     2.2
14   Sep  2021     1.1     2.2
15   Oct  2021     1.1     2.2
16   Nov  2021     1.1     2.2
17   Dec  2021     1.1     2.2
18   Jan  2022     1.1     2.2
19   Feb  2022     1.1     2.2
20   Mar  2022     1.1     2.2
21   Apr  2022     1.1     2.2
22   May  2022     1.1     2.2
23   Jun  2022     1.1     2.2
24   Jul  2022     1.1     2.2

CodePudding user response:

Find the year groups, then find the relative year numbers (in reverse) and then add the last year to the relative year numbers to get the actual year number:

df['Year'] = (df['Year'].max() - df['Month'].eq('Dec')[::-1].cumsum()).astype(int)
print(df)

Result

   Month  Year  Value1  Value2
0    Jul  2020     1.1     2.2
1    Aug  2020     1.1     2.3
2    Sep  2020     1.1     2.6
3    Oct  2020     1.1     2.2
4    Nov  2020     1.1     2.2
5    Dec  2020     1.1     2.2
6    Jan  2021     1.1     2.2
7    Feb  2021     1.1     2.2
8    Mar  2021     1.1     2.2
9    Apr  2021     1.1     2.2
10   May  2021     1.1     2.2
11   Jun  2021     1.1     2.2
12   Jul  2021     1.1     2.2
13   Aug  2021     1.1     2.2
14   Sep  2021     1.1     2.2
15   Oct  2021     1.1     2.2
16   Nov  2021     1.1     2.2
17   Dec  2021     1.1     2.2
18   Jan  2022     1.1     2.2
19   Feb  2022     1.1     2.2
20   Mar  2022     1.1     2.2
21   Apr  2022     1.1     2.2
22   May  2022     1.1     2.2
23   Jun  2022     1.1     2.2
24   Jul  2022     1.1     2.2
  • Related