Update and replace values in columns based on conditions in Python-CodePudding

I wish to update and replace values based on the dates within my dataframe, while removing data in other specific columns.

Data

id  date        location    status  value1  value2
CC  1/1/2022    ny          new     12      1
CC  4/1/2022    ny          new     1       1
CC  7/1/2022    ny          new     1       1
CC  10/1/2022   ny          new     1       2
CC  1/1/2023    ny          ok      1       2
CC  4/1/2023    ny          ok      1       2
CC  7/1/2023    ny          ok      1       3
CC  10/1/2023   ny          ok      1       3
BB  1/1/2022    ca          new     1       3
BB  4/1/2022    ca          new     1       3
BB  7/1/2022    ca          new     1       3
BB  10/1/2022   ca          new     12      3
BB  1/1/2023    ca          new     2       3
BB  4/1/2023    ca          new     2       3
BB  7/1/2023    ca          new     2       3
BB  10/1/2023   ca          new     2       3

Desired

id  date        location    status  value1  value2
CC  1/1/2022    ny          open        
CC  4/1/2022    ny          open        
CC  7/1/2022    ny          open        
CC  10/1/2022   ny          new     1       2
CC  1/1/2023    ny          ok      1       2
CC  4/1/2023    ny          ok      1       2
CC  7/1/2023    ny          ok      1       3
CC  10/1/2023   ny          ok      1       3
BB  1/1/2022    ca          new     1       3
BB  4/1/2022    ca          new     1       3
BB  7/1/2022    ca          new     1       3
BB  10/1/2022   ca          new     12      3
BB  1/1/2023    ca          new     2       3
BB  4/1/2023    ca          new     2       3
BB  7/1/2023    ca          new     2       3
BB  10/1/2023   ca          new     2       3

Doing

df.loc[(df.id == 'cc') & (df.date <= '07/01/2022'), 'status']= 'open'

This labels all of the dates as open and does not remove the values in the other columns. Any suggestion is appreciated. Thank you for any suggestions.

CodePudding user response：

Unfortunately, snapping a cell out of existence does not seem to work with Pandas. Similarly, Pandas expects a value for each cell of every column when setting up a dataframe.

Therefore, nan (not a number) seems to be the exact placeholder appropriate for your case. In turn, consider, importing numpy as np and adding the line to set the respective entry to np.nan

df.loc[(df.id == 'cc') & (df.date <= '07/01/2022'), 'value1']= np.nan

Fortunately,

df.fillna("")

prints the Pandas frame without showing those annoying NAN entries but leaves the cells 'empty' as you seem to desire.

In addition, NumPy enables the use of aggregate functions to ignore nan values such as np.nanmean() as can be found here to not break computation on such tables.