Home > Software engineering >  Python add missing values to index
Python add missing values to index

Time:09-09

Having the following DF :

Index   Date
1D      9/13/2022
1W      9/19/2022
2W      9/26/2022
3W      10/3/2022
1M      10/12/2022
2M      11/14/2022
3M      12/12/2022
4M      1/12/2023
5M      2/13/2023
6M      3/13/2023
7M      4/12/2023
8M      5/12/2023
9M      6/12/2023
10M     7/12/2023
11M     8/14/2023
12M     9/12/2023
18M     3/12/2024
2Y      9/12/2024
3Y      9/12/2025
4Y      9/14/2026
5Y      9/13/2027
6Y      9/12/2028
7Y      9/12/2029
8Y      9/12/2030
9Y      9/12/2031
10Y     9/13/2032
12Y     9/12/2034
15Y     9/14/2037
20Y     9/12/2042

The idea would be to do a loop, and to do :

if DF.index[i][-1] == 'Y':
   if int(self.dfcurve.index[i 1][:-1])-int(self.dfcurve.index[i][:-1])!= 1:
      ###Add Missing Index:
      index_val = int(self.dfcurve.index[i][:-1])  1
      index_val = str(index_val) 'Y'
      ### Example of missing index : 
      ## 11Y
      ## 13Y
      ## 14Y
      ## 16Y
      ## 17Y
      ## 18Y
      ## 19Y

But I don't know how to add the index in list at the right place. The final DF would be :

Index   Date
1D      9/13/2022
1W      9/19/2022
2W      9/26/2022
3W      10/3/2022
1M      10/12/2022
2M      11/14/2022
3M      12/12/2022
4M      1/12/2023
5M      2/13/2023
6M      3/13/2023
7M      4/12/2023
8M      5/12/2023
9M      6/12/2023
10M     7/12/2023
11M     8/14/2023
12M     9/12/2023
18M     3/12/2024
2Y      9/12/2024
3Y      9/12/2025
4Y      9/14/2026
5Y      9/13/2027
6Y      9/12/2028
7Y      9/12/2029
8Y      9/12/2030
9Y      9/12/2031
10Y     9/13/2032
11Y     NA
12Y     9/12/2034
13Y     NA
14Y     NA
15Y     9/14/2037
16Y     NA
17Y     NA
18Y     NA
19Y     NA
20Y     9/12/2042

CodePudding user response:

Use:

#filter Y index values
m = df.index.str.endswith('Y')
#processing only years
df1 = df[m].copy()

#extract numbers to index
df1.index = df1.index.str.extract(r'(\d )', expand=False).astype(int)

#reindex by range for append missing rows
df1 = df1.reindex(range(df1.index.min(), df1.index.max() 1)).rename(index=str)
#added Y substring
df1.index  = 'Y'

print (df1)
            Date
Index           
2Y     9/12/2024
3Y     9/12/2025
4Y     9/14/2026
5Y     9/13/2027
6Y     9/12/2028
7Y     9/12/2029
8Y     9/12/2030
9Y     9/12/2031
10Y    9/13/2032
11Y          NaN
12Y    9/12/2034
13Y          NaN
14Y          NaN
15Y    9/14/2037
16Y          NaN
17Y          NaN
18Y          NaN
19Y          NaN
20Y    9/12/2042

#remove Y original rows from Dataframe and append new Y rows
df = pd.concat([df[~m], df1])
print (df)

First solution add to all categories missing values:

df.index = pd.MultiIndex.from_frame(df.index.str.extract(r'(\d )(\D )', expand=True))

f = lambda x: x.reindex(range(x.index.min(), x.index.max() 1))
df = df.reset_index(1).rename(index=int).groupby(1).apply(f).drop(1, axis=1)
df.index = df.index.map(lambda x: f'{x[0]}{x[1]}')

print (df)
           Date
D1    9/13/2022
M1   10/12/2022
M2   11/14/2022
M3   12/12/2022
  • Related