Pandas: replace DataFrame values that fall within a specific range

Time:01-12

I have this pandas DataFrame:

Months  2022-10 2022-11 2022-12 2023-01   …
2023-01   10      N/A     12       13     …
2022-12   2       14      14       N/A    …
2022-11   N/A     11      N/A      N/A    …
2022-10   12      N/A     N/A      N/A    …
…         …       …        …       …

I would like to replace the missing values inside the "date range" with 0 and those outside the "date range" with blanks, as in this example:

Months  2022-10 2022-11 2022-12 2023-01   …
2023-01   10      0       12       13     …
2022-12   2       14      14              …
2022-11   0       11                      …
2022-10   12                              …
…         …       …        …       …

How can I do this with Python Pandas?

CodePudding user response:

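Use DataFrame.mask with a boolean mask:

import numpy as np
import pandas as pd

# if 'N/A' is stored as a string, convert it to real missing values first
# df = df.replace('N/A', np.nan)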

df2 = df.mask(df.bfill(axis=1).notna() & df.isna(), 0)
print (df2)
         2022-10  2022-11  2022-12  2023-01
Months                                     
2023-01     10.0      0.0     12.0     13.0
2022-12      2.0     14.0     14.0      NaN
2022-11      0.0     11.0      NaN      NaN
2022-10     12.0      NaN      NaN      NaN
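
For a reproducible run, the snippet below is a minimal sketch that rebuilds the question's table and applies the same mask. It assumes the N/A cells arrive as literal strings, and it swaps in pd.to_numeric with errors="coerce" for the conversion step; the name raw is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample values copied from the question; "N/A" is assumed to be a plain string.
raw = {
    "2022-10": [10, 2, "N/A", 12],
    "2022-11": ["N/A", 14, 11, "N/A"],
    "2022-12": [12, 14, "N/A", "N/A"],
    "2023-01": [13, "N/A", "N/A", "N/A"],
}
months = pd.Index(["2023-01", "2022-12", "2022-11", "2022-10"], name="Months")
df = pd.DataFrame(raw, index=months)

# Coerce every column to numbers; anything unparseable ("N/A") becomes NaN.
df = df.apply(pd.to_numeric, errors="coerce")

# A NaN is replaced by 0 only if some valid value exists to its right in the row.
df2 = df.mask(df.bfill(axis=1).notna() & df.isna(), 0)
print(df2)
```

If the frame already holds real NaN values, the coercion step can be dropped.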

Explanation:

First, back fill the missing values along each row:

print (df.bfill(axis=1))
         2022-10  2022-11  2022-12  2023-01
Months                                     
2023-01     10.0     12.0     12.0     13.0
2022-12      2.0     14.0     14.0      NaN
2022-11     11.0     11.0      NaN      NaN
2022-10     12.0      NaN      NaN      NaN

Then test for non-missing values:

print (df.bfill(axis=1).notna())
         2022-10  2022-11  2022-12  2023-01
Months                                     
2023-01     True     True     True     True
2022-12     True     True     True    False
2022-11     True     True    False    False
2022-10     True    False    False    False

Combine this with a test for missing values to get the cells to replace with 0:

print (df.bfill(axis=1).notna() & df.isna())
         2022-10  2022-11  2022-12  2023-01
Months                                     
2023-01    False     True    False    False
2022-12    False    False    False    False
2022-11     True    False    False    False
2022-10    False    False    False    False
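
Another way to read the back fill test: a cell passes if it, or any cell to its right in the same row, holds a value. The sketch below checks that reading with a plain NumPy running "or" from right to left; it is an illustration of the idea, not part of the answer, and the names valid and value_at_or_right are made up for the example.

```python
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame(
    [[10, nan, 12, 13], [2, 14, 14, nan], [nan, 11, nan, nan], [12, nan, nan, nan]],
    index=pd.Index(["2023-01", "2022-12", "2022-11", "2022-10"], name="Months"),
    columns=["2022-10", "2022-11", "2022-12", "2023-01"],
)

valid = df.notna().to_numpy()

# Reverse the columns, accumulate a running logical "or", then reverse back:
# True where the cell itself or any cell to its right is not missing.
value_at_or_right = np.logical_or.accumulate(valid[:, ::-1], axis=1)[:, ::-1]

# Same result as the back fill test used in the answer.
same = np.array_equal(value_at_or_right, df.bfill(axis=1).notna().to_numpy())
print(same)

# Cells to replace with 0: missing, but with a value somewhere to the right.
to_zero = value_at_or_right & ~valid
```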

Another idea uses NumPy broadcasting: convert the columns and the index to datetimes, compare them, and chain the result with a test for missing values:

c = pd.to_datetime(df.columns).to_numpy()
r = pd.to_datetime(df.index).to_numpy()

m = (c <= r[:, None]) & df.isna()
print (m)
         2022-10  2022-11  2022-12  2023-01
Months                                     
2023-01    False     True    False    False
2022-12    False    False    False    False
2022-11     True    False    False    False
2022-10    False    False    False    False

df1 = df.mask(m, 0)
print (df1)
         2022-10  2022-11  2022-12  2023-01
Months                                     
2023-01     10.0      0.0     12.0     13.0
2022-12      2.0     14.0     14.0      NaN
2022-11      0.0     11.0      NaN      NaN
2022-10     12.0      NaN      NaN      NaN
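
The broadcasting step is easier to follow with the shapes spelled out. This is a minimal self-contained sketch on the question's data, assuming the labels parse as year-month strings; cols, rows and in_range are illustrative names.

```python
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame(
    [[10, nan, 12, 13], [2, 14, 14, nan], [nan, 11, nan, nan], [12, nan, nan, nan]],
    index=pd.Index(["2023-01", "2022-12", "2022-11", "2022-10"], name="Months"),
    columns=["2022-10", "2022-11", "2022-12", "2023-01"],
)

cols = pd.to_datetime(df.columns).to_numpy()  # shape (4,)
rows = pd.to_datetime(df.index).to_numpy()    # shape (4,)

# rows[:, None] has shape (4, 1), so the comparison broadcasts to (4, 4):
# True where the column month is not later than the row month.
in_range = cols <= rows[:, None]

# Keep only the in-range cells that are missing, then set them to 0.
m = df.isna() & in_range
df1 = df.mask(m, 0)
print(df1)
```

Unlike the back fill version, this mask depends only on the labels, so it does not care whether a value follows the gap.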

The two solutions give different results if the last values in a row's range are missing:

print (df)
         2022-10  2022-11  2022-12  2023-01
Months                                     
2023-01     10.0      NaN      NaN      NaN
2022-12      2.0     14.0     14.0      NaN
2022-11      NaN     11.0      NaN      NaN
2022-10      NaN      NaN      NaN      NaN


c = pd.to_datetime(df.columns).to_numpy()
r = pd.to_datetime(df.index).to_numpy()

m = (c <= r[:, None]) & df.isna()
df1 = df.mask(m, 0)
print (df1)
         2022-10  2022-11  2022-12  2023-01
Months                                     
2023-01     10.0      0.0      0.0      0.0
2022-12      2.0     14.0     14.0      NaN
2022-11      0.0     11.0      NaN      NaN
2022-10      0.0      NaN      NaN      NaN

df2 = df.mask(df.bfill(axis=1).notna() & df.isna(), 0)
print (df2)
         2022-10  2022-11  2022-12  2023-01
Months                                     
2023-01     10.0      NaN      NaN      NaN
2022-12      2.0     14.0     14.0      NaN
2022-11      0.0     11.0      NaN      NaN
2022-10      NaN      NaN      NaN      NaN
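
To see exactly where the two approaches disagree, the following sketch builds both masks on the frame above and keeps the cells that only the date comparison flags. The names by_dates, by_bfill and only_dates are illustrative.

```python
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame(
    [[10, nan, nan, nan], [2, 14, 14, nan], [nan, 11, nan, nan], [nan, nan, nan, nan]],
    index=pd.Index(["2023-01", "2022-12", "2022-11", "2022-10"], name="Months"),
    columns=["2022-10", "2022-11", "2022-12", "2023-01"],
)

c = pd.to_datetime(df.columns).to_numpy()
r = pd.to_datetime(df.index).to_numpy()

# Mask from the labels alone: missing and inside the row's date range.
by_dates = df.isna() & (c <= r[:, None])

# Mask from the data: missing and followed by a value later in the row.
by_bfill = df.bfill(axis=1).notna() & df.isna()

# Cells zeroed by the date comparison but left as NaN by the back fill test.
only_dates = by_dates & ~by_bfill
n_diff = int(only_dates.to_numpy().sum())
print(only_dates)
print(n_diff)
```

The differing cells are the trailing gaps in a row's range, which the back fill test cannot detect because nothing follows them.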

CodePudding user response:

One option for in-place substitution using boolean indexing:

df[df.bfill(axis=1).notna()] = df.fillna(0)

Output:

         2022-10  2022-11  2022-12  2023-01
Months                                     
2023-01     10.0      0.0     12.0     13.0
2022-12      2.0     14.0     14.0      NaN
2022-11      0.0     11.0      NaN      NaN
2022-10     12.0      NaN      NaN      NaN
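
Because this assignment modifies df itself, a copy can be taken first when the original is still needed. A minimal sketch, assuming the numeric frame from the question; out and expected are illustrative names.

```python
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame(
    [[10, nan, 12, 13], [2, 14, 14, nan], [nan, 11, nan, nan], [12, nan, nan, nan]],
    index=pd.Index(["2023-01", "2022-12", "2022-11", "2022-10"], name="Months"),
    columns=["2022-10", "2022-11", "2022-12", "2023-01"],
)

# Work on a copy so that df keeps its missing values.
out = df.copy()

# Where a value exists at or to the right of a cell, take the cell from the
# zero-filled frame; all other cells are left untouched.
out[out.bfill(axis=1).notna()] = out.fillna(0)

# The result matches the DataFrame.mask version from the first answer.
expected = df.mask(df.bfill(axis=1).notna() & df.isna(), 0)
print(out.equals(expected))
```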