Tricky conversion based on specific conditions within Pandas

Time:12-15

I have a dataframe in which, for rows where the date matches the date in a specific column, I wish to convert all 0's to blanks per area category for specific columns.

Data

date        start       end         area    stat1   stat2   stat3   final
10/1/2021   11/1/2021   12/1/2021   NY      5       0       0       11/1/2021
11/1/2021   12/1/2021   1/1/2022    NY      19      0       0       11/1/2021
12/1/2021   1/1/2022    2/1/2022    NY      10      0       0       11/1/2021
1/1/2022    2/1/2022    3/1/2022    NY      1       0       0       11/1/2021
10/1/2021   11/1/2021   12/1/2021   CA      1       0       0       11/1/2021
11/1/2021   12/1/2021   1/1/2022    CA      3       0       0       11/1/2021
12/1/2021   1/1/2022    2/1/2022    CA      3       0       0       11/1/2021
1/1/2022    2/1/2022    3/1/2022    CA      2       0       0       11/1/2021
                        
                        

Desired

date        start       end         area    stat1   stat2   stat3   final
10/1/2021   11/1/2021   12/1/2021   NY      5       0       0       11/1/2021
11/1/2021   12/1/2021   1/1/2022    NY      19                      11/1/2021
12/1/2021   1/1/2022    2/1/2022    NY      10                      11/1/2021
1/1/2022    2/1/2022    3/1/2022    NY      1                       11/1/2021
10/1/2021   11/1/2021   12/1/2021   CA      1       0       0       11/1/2021
11/1/2021   12/1/2021   1/1/2022    CA      3                       11/1/2021
12/1/2021   1/1/2022    2/1/2022    CA      3                       11/1/2021
1/1/2022    2/1/2022    3/1/2022    CA      2                       11/1/2021

Logic

Above, we want to convert all zeros to blanks, only in columns [stat2] and [stat3], on rows where the date in the [date] column is '11/01/2021' or later.
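
As a rough sketch of that rule (an illustration, not the asker's code), the cutoff can be written as a boolean mask over the rows. The four-row frame copies the NY rows from the data above. The names `parsed` and `on_or_after` are made up for this example, and the explicit `format` assumes the dates are month/day/year, as they appear in the sample.

```python
import pandas as pd

# Small sample mirroring the NY rows of the data above.
df = pd.DataFrame({
    "date": ["10/1/2021", "11/1/2021", "12/1/2021", "1/1/2022"],
    "area": ["NY"] * 4,
    "stat1": [5, 19, 10, 1],
    "stat2": [0, 0, 0, 0],
    "stat3": [0, 0, 0, 0],
})

# Parse the month/day/year strings; an explicit format avoids day-first ambiguity.
parsed = pd.to_datetime(df["date"], format="%m/%d/%Y")

# True for rows dated 11/1/2021 or later, False otherwise.
on_or_after = parsed >= pd.Timestamp("2021-11-01")
```

Only the first row (10/1/2021) falls before the cutoff, which matches the Desired table, where that row keeps its zeros.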

Doing

I am thinking that I must groupby and create a subset and then perform the conversion:

df1 = df.groupby(['date', 'area'], as_index=False)
df1[df1.eq(0)] = np.nan

Any suggestion is appreciated.

Answer:

Apply the condition with np.where inside a list comprehension:

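# Assumes stat2/stat3 hold only zeros (as in the sample): earlier rows are rewritten as '0'.
df.update(pd.DataFrame(
    np.transpose([
        np.where(pd.to_datetime(df['date'], dayfirst=False).ge('2021-11-01'), '', '0')
        for x in df.filter(regex='stat2|stat3').columns
    ]),
    columns=['stat2', 'stat3']))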

or

df[['stat2', 'stat3']] = pd.DataFrame([
    np.where(pd.to_datetime(df['date'], dayfirst=False).ge('2021-11-01'), '', '0')
    for x in df.filter(regex='stat2|stat3').columns
]).T

Or

If it is easier, go the conventional way:

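import numpy as np
import pandas as pd

# Boolean mask: True for rows dated 2021-11-01 or later.
s = pd.to_datetime(df['date'], dayfirst=False).ge('2021-11-01')

# Blank on or after the cutoff, '0' before it (assumes the columns hold only zeros).
df = df.assign(
    stat2=np.where(s, '', '0'),
    stat3=np.where(s, '', '0'))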
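
The variants above write '' or '0' into every row, so they rely on stat2 and stat3 containing nothing but zeros. If those columns could also hold non-zero values, a minimal sketch along the following lines blanks only the zeros and leaves everything else alone. This swaps in `Series.mask` in place of `np.where`. The frame is a cut-down copy of the CA rows, the 7 in stat2 is a made-up value added purely to show that non-zeros survive, and the names `cols`, `hit` and `out` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["10/1/2021", "11/1/2021", "12/1/2021", "1/1/2022"],
    "area": ["CA"] * 4,
    "stat1": [1, 3, 3, 2],
    "stat2": [0, 0, 7, 0],  # the 7 is hypothetical, not from the question's data
    "stat3": [0, 0, 0, 0],
})

cols = ["stat2", "stat3"]

# Row-level condition: date is 11/1/2021 or later (dates assumed month/day/year).
on_or_after = pd.to_datetime(df["date"], format="%m/%d/%Y") >= pd.Timestamp("2021-11-01")

out = df.copy()
for col in cols:
    # Cell-level condition: the row qualifies AND the value is exactly zero.
    hit = on_or_after & df[col].eq(0)
    # Cast to object first so the column can hold both integers and '' strings.
    out[col] = df[col].astype(object).mask(hit, "")
```

Mixing integers and empty strings makes these columns object dtype, which is fine for display or export. If a real missing value is acceptable instead of a blank string, `df[col].mask(hit)` without a replacement gives NaN and keeps the column numeric.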