I wish to update and replace values based on the dates within my dataframe, while removing data in other specific columns.
Data
id  date       location  status  value1  value2
CC  1/1/2022   ny        new     12      1
CC  4/1/2022   ny        new     1       1
CC  7/1/2022   ny        new     1       1
CC  10/1/2022  ny        new     1       2
CC  1/1/2023   ny        ok      1       2
CC  4/1/2023   ny        ok      1       2
CC  7/1/2023   ny        ok      1       3
CC  10/1/2023  ny        ok      1       3
BB  1/1/2022   ca        new     1       3
BB  4/1/2022   ca        new     1       3
BB  7/1/2022   ca        new     1       3
BB  10/1/2022  ca        new     12      3
BB  1/1/2023   ca        new     2       3
BB  4/1/2023   ca        new     2       3
BB  7/1/2023   ca        new     2       3
BB  10/1/2023  ca        new     2       3
Desired
id  date       location  status  value1  value2
CC  1/1/2022   ny        open
CC  4/1/2022   ny        open
CC  7/1/2022   ny        open
CC  10/1/2022  ny        new     1       2
CC  1/1/2023   ny        ok      1       2
CC  4/1/2023   ny        ok      1       2
CC  7/1/2023   ny        ok      1       3
CC  10/1/2023  ny        ok      1       3
BB  1/1/2022   ca        new     1       3
BB  4/1/2022   ca        new     1       3
BB  7/1/2022   ca        new     1       3
BB  10/1/2022  ca        new     12      3
BB  1/1/2023   ca        new     2       3
BB  4/1/2023   ca        new     2       3
BB  7/1/2023   ca        new     2       3
BB  10/1/2023  ca        new     2       3
Doing
df.loc[(df.id == 'cc') & (df.date <= '07/01/2022'), 'status']= 'open'
This labels all of the dates as open and does not remove the values in the other columns. Any suggestions are appreciated.
CodePudding user response:
Unfortunately, you cannot simply snap a cell out of existence in Pandas: every column of a dataframe expects a value for each row. The appropriate placeholder for a 'missing' cell is therefore NaN (not a number). So import numpy as np and add a line that sets the respective entries to np.nan:
df.loc[(df.id == 'CC') & (pd.to_datetime(df.date) <= '07/01/2022'), 'value1'] = np.nan
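Putting the pieces together, here is a minimal sketch of what the Desired table seems to ask for, assuming df holds the data above with date stored as plain strings (hence the pd.to_datetime conversion) and that "on or before 7/1/2022" is the intended cutoff:

import numpy as np
import pandas as pd

# parse the string dates so the comparison is chronological, not lexical
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y')

# rows of id 'CC' dated on or before 7/1/2022
mask = (df['id'] == 'CC') & (df['date'] <= '2022-07-01')

df.loc[mask, 'status'] = 'open'              # relabel the status
df.loc[mask, ['value1', 'value2']] = np.nan  # clear the value columns

After this, status and both value columns match the Desired table, except that the cleared cells hold NaN rather than being truly empty.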
Fortunately, df.fillna("") displays the Pandas frame without showing those NaN entries, leaving the cells 'empty' as you seem to desire.
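For instance, to show or export the frame with blanks in place of NaN (the frame itself keeps the NaN values; the file name below is just a placeholder):

print(df.fillna(""))                    # blanks instead of NaN when printing
df.to_csv("desired.csv", index=False)   # to_csv writes NaN as empty fields by default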
In addition, NumPy provides aggregate functions that ignore NaN values, such as np.nanmean(), so computations on such tables do not break (see the NumPy documentation).
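For example, a quick check that aggregation still works on the column that now contains NaN (assuming value1 stays numeric):

import numpy as np

print(np.nanmean(df['value1']))   # NaN entries are ignored
print(df['value1'].mean())        # pandas aggregations also skip NaN by default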