I have this pandas DataFrame:
Months   2022-10  2022-11  2022-12  2023-01  …
2023-01       10      N/A       12       13  …
2022-12        2       14       14      N/A  …
2022-11      N/A       11      N/A      N/A  …
2022-10       12      N/A      N/A      N/A  …
…              …        …        …        …
I would like to replace the N/A values inside the "date range" with 0 and those outside the "date range" with blanks, as in this example:
Months   2022-10  2022-11  2022-12  2023-01  …
2023-01       10        0       12       13  …
2022-12        2       14       14           …
2022-11        0       11                    …
2022-10       12                             …
…              …        …        …        …
How can I do this with Python Pandas?
CodePudding user response:
Use DataFrame.mask with a boolean mask:
# if the missing values are stored as the string 'N/A', convert them to NaN first
# df = df.replace('N/A', np.nan)
df2 = df.mask(df.bfill(axis=1).notna() & df.isna(), 0)
print (df2)
2022-10 2022-11 2022-12 2023-01
Months
2023-01 10.0 0.0 12.0 13.0
2022-12 2.0 14.0 14.0 NaN
2022-11 0.0 11.0 NaN NaN
2022-10 12.0 NaN NaN NaN
Explanation:
First, back fill the missing values along each row:
print (df.bfill(axis=1))
2022-10 2022-11 2022-12 2023-01
Months
2023-01 10.0 12.0 12.0 13.0
2022-12 2.0 14.0 14.0 NaN
2022-11 11.0 11.0 NaN NaN
2022-10 12.0 NaN NaN NaN
Then test for non-missing values:
print (df.bfill(axis=1).notna())
2022-10 2022-11 2022-12 2023-01
Months
2023-01 True True True True
2022-12 True True True False
2022-11 True True False False
2022-10 True False False False
Chain this with a test for missing values to get the cells to replace with 0:
print (df.bfill(axis=1).notna() & df.isna())
2022-10 2022-11 2022-12 2023-01
Months
2023-01 False True False False
2022-12 False False False False
2022-11 True False False False
2022-10 False False False False
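The steps above can be put together in a small runnable sketch. The DataFrame is rebuilt by hand from the visible part of the question's table (the elided `…` columns are left out), and the helper name `fill_gaps_before_last_value` is only an illustrative assumption, not part of pandas:

```python
import numpy as np
import pandas as pd

# Sample data copied from the visible part of the question's table.
df = pd.DataFrame(
    {
        "2022-10": [10, 2, np.nan, 12],
        "2022-11": [np.nan, 14, 11, np.nan],
        "2022-12": [12, 14, np.nan, np.nan],
        "2023-01": [13, np.nan, np.nan, np.nan],
    },
    index=pd.Index(["2023-01", "2022-12", "2022-11", "2022-10"], name="Months"),
)


def fill_gaps_before_last_value(frame):
    """Hypothetical helper: put 0 in missing cells that still have a
    non-missing value somewhere to their right in the same row."""
    # After a row-wise back fill, a cell is non-missing only if it holds
    # a value itself or some later column in that row does.
    has_later_value = frame.bfill(axis=1).notna()
    # Replace only the cells that are missing in the original frame.
    return frame.mask(has_later_value & frame.isna(), 0)


result = fill_gaps_before_last_value(df)
print(result)
```

The original `df` is left untouched, since `mask` returns a new DataFrame.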
Another idea is to use numpy broadcasting: compare the columns and the index, both converted to DatetimeIndex, and chain the result with a test for missing values:
c = pd.to_datetime(df.columns).to_numpy()
r = pd.to_datetime(df.index).to_numpy()
m = (c <= r[:, None]) & df.isna()
print (m)
2022-10 2022-11 2022-12 2023-01
Months
2023-01 False True False False
2022-12 False False False False
2022-11 True False False False
2022-10 False False False False
df1 = df.mask(m, 0)
print (df1)
2022-10 2022-11 2022-12 2023-01
Months
2023-01 10.0 0.0 12.0 13.0
2022-12 2.0 14.0 14.0 NaN
2022-11 0.0 11.0 NaN NaN
2022-10 12.0 NaN NaN NaN
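As a self-contained sketch of the broadcasting approach, the snippet below rebuilds the sample data from the question and assumes that both the column labels and the index are strings in `YYYY-MM` form. Passing `format="%Y-%m"` is an addition made here for explicitness, and the variable names are illustrative:

```python
import numpy as np
import pandas as pd

# Sample data copied from the visible part of the question's table.
df = pd.DataFrame(
    {
        "2022-10": [10, 2, np.nan, 12],
        "2022-11": [np.nan, 14, 11, np.nan],
        "2022-12": [12, 14, np.nan, np.nan],
        "2023-01": [13, np.nan, np.nan, np.nan],
    },
    index=pd.Index(["2023-01", "2022-12", "2022-11", "2022-10"], name="Months"),
)

# Parse the labels so they compare as dates rather than as strings.
cols = pd.to_datetime(df.columns, format="%Y-%m").to_numpy()  # shape (4,)
rows = pd.to_datetime(df.index, format="%Y-%m").to_numpy()    # shape (4,)

# rows[:, None] has shape (4, 1), so the comparison broadcasts to (4, 4):
# in_range[i, j] is True when column j is not later than row i's month.
in_range = cols <= rows[:, None]

# Fill only the cells that are both inside the range and missing.
result = df.mask(df.isna() & in_range, 0)
print(result)
```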
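The two solutions give different results if the last values in a row's range are missing: the broadcasting mask fills every missing cell up to the row's month, while the bfill mask fills only missing cells that are followed by a non-missing value in the same row: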
print (df)
2022-10 2022-11 2022-12 2023-01
Months
2023-01 10.0 NaN NaN NaN
2022-12 2.0 14.0 14.0 NaN
2022-11 NaN 11.0 NaN NaN
2022-10 NaN NaN NaN NaN
c = pd.to_datetime(df.columns).to_numpy()
r = pd.to_datetime(df.index).to_numpy()
m = (c <= r[:, None]) & df.isna()
df1 = df.mask(m, 0)
print (df1)
2022-10 2022-11 2022-12 2023-01
Months
2023-01 10.0 0.0 0.0 0.0
2022-12 2.0 14.0 14.0 NaN
2022-11 0.0 11.0 NaN NaN
2022-10 0.0 NaN NaN NaN
df2 = df.mask(df.bfill(axis=1).notna() & df.isna(), 0)
print (df2)
2022-10 2022-11 2022-12 2023-01
Months
2023-01 10.0 NaN NaN NaN
2022-12 2.0 14.0 14.0 NaN
2022-11 0.0 11.0 NaN NaN
2022-10 NaN NaN NaN NaN
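To see exactly where the two approaches disagree, the following sketch runs both on the frame with trailing missing values and counts the differing cells. The names `by_calendar`, `by_bfill` and `differs` are illustrative assumptions:

```python
import numpy as np
import pandas as pd

# Frame from the example above, where ranges end in missing values.
df = pd.DataFrame(
    {
        "2022-10": [10, 2, np.nan, np.nan],
        "2022-11": [np.nan, 14, 11, np.nan],
        "2022-12": [np.nan, 14, np.nan, np.nan],
        "2023-01": [np.nan, np.nan, np.nan, np.nan],
    },
    index=pd.Index(["2023-01", "2022-12", "2022-11", "2022-10"], name="Months"),
)

# Approach 1: range defined by the calendar (column month <= row month).
cols = pd.to_datetime(df.columns, format="%Y-%m").to_numpy()
rows = pd.to_datetime(df.index, format="%Y-%m").to_numpy()
by_calendar = df.mask(df.isna() & (cols <= rows[:, None]), 0)

# Approach 2: range defined by the data (up to the last non-missing value).
by_bfill = df.mask(df.bfill(axis=1).notna() & df.isna(), 0)

# NaN != NaN, so cells that are missing in both results are excluded.
differs = by_calendar.ne(by_bfill) & ~(by_calendar.isna() & by_bfill.isna())
print(differs)
print(int(differs.to_numpy().sum()), "cells differ")
```

Which variant is appropriate depends on whether the "date range" is meant to follow the calendar or the data that is actually present.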
CodePudding user response:
One option for in-place substitution using boolean indexing:
df[df.bfill(axis=1).notna()] = df.fillna(0)
Output:
2022-10 2022-11 2022-12 2023-01
Months
2023-01 10.0 0.0 12.0 13.0
2022-12 2.0 14.0 14.0 NaN
2022-11 0.0 11.0 NaN NaN
2022-10 12.0 NaN NaN NaN
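Because the assignment above modifies `df` itself, it may help to compare it with a variant that returns a new object. The sketch below applies the in-place form to a copy and, as an alternative that is not part of the answer, builds the same result with `fillna` followed by `where`; the variable names are illustrative:

```python
import numpy as np
import pandas as pd

# Sample data copied from the visible part of the question's table.
df = pd.DataFrame(
    {
        "2022-10": [10, 2, np.nan, 12],
        "2022-11": [np.nan, 14, 11, np.nan],
        "2022-12": [12, 14, np.nan, np.nan],
        "2023-01": [13, np.nan, np.nan, np.nan],
    },
    index=pd.Index(["2023-01", "2022-12", "2022-11", "2022-10"], name="Months"),
)

# True for every cell up to the last non-missing value in its row.
in_range = df.bfill(axis=1).notna()

# In-place form from the answer, applied to a copy so df stays intact.
in_place = df.copy()
in_place[in_range] = in_place.fillna(0)

# Non-mutating alternative: fill everything with 0, then keep only the
# in-range cells; where() sets all other cells back to NaN.
not_in_place = df.fillna(0).where(in_range)

print(in_place.equals(not_in_place))
```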