I have the existing code that works great to an extent. All I am trying to add in the process is every time that there's a NaN value for a given month, copy the value of the most recent previous month but only for the same year.
If for a given year there's only 1 value then all 12 months should have that same value and if all 12 are NaN, they should remain NaN.
Existing code:
data_list = []
for f in excel_files:
df = pd.read_excel(f, header = None)
new_col_coords = [[6,1], [7,1], [3,1], [1,0], [0,0]]
new_columns = [x for x in [df.iloc[i, j] for i, j in new_col_coords]]
header_row = 11
df = df.T.set_index(header_row).T
df = df.reset_index(drop = True).iloc[header_row:].reset_index(drop = True)
df.columns.names = [None]
df[['Region-INF', 'Series Name', 'Series ID']] = new_columns
df = df[['Region-INF', 'Series Name', 'Series ID', 'View Description', 'Index Type', 'Year',
'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']]
Input sample data
Region-INF Series Name Series ID Jan Feb Mar Apr May Jun
Pacific All Items CUUR0490SDD 2.9 2.8 NaN NaN 2.52 2.04
Pacific All Items CUE07890SDF NaN 2.64 NaN 2.44 2.59 3
Pacific All Items CUE073310SAF 2.1 2.4 NaN 2.21 3.45 NaN
Expected output:
Region-INF Series Name Series ID Jan Feb Mar Apr May Jun
Pacific All Items CUUR0490SDD 2.9 2.8 2.8 2.8 2.52 2.04
Pacific All Items CUE07890SDF 3 2.64 2.64 2.44 2.59 3
Pacific All Items CUE073310SAF 2.1 2.4 2.4 2.21 3.45 3.45
CodePudding user response:
I hope I've understood your' question right:
df.loc[df['Jan'].isna(), 'Jan'] = df[df['Jan'].isna()].apply(lambda x: x[x.notna()][-1], axis=1)
df.loc[:, 'Jan':] = df.loc[:, 'Jan':].ffill(axis=1)
print(df)
Prints:
Region-INF Series Name Series ID Jan Feb Mar Apr May Jun
0 Pacific All Items CUUR0490SDD 2.9 2.80 2.80 2.80 2.52 2.04
1 Pacific All Items CUE07890SDF 3.0 2.64 2.64 2.44 2.59 3.00
2 Pacific All Items CUE073310SAF 2.1 2.40 2.40 2.21 3.45 3.45