I would like to create an if statement where, if the date equals the start of any month (2021-01-01, 2021-02-01, 2021-03-01 etc.) then I would like to replace the value in column X to 0. Is this possible or is there a better way to go about this.
This is the dataset that I am working with:
df = [('2021-04-25', 4, 4),
('2021-04-26', 0, 0),
('2021-04-27', 0, 0),
('2021-04-28', 56, 53),
('2021-04-29', 3, 0),
('2021-04-30', 4, 0),
('2021-05-01', 5, 0),
('2021-05-02', 5, 0),
('2021-05-03', 10, 5),
('2021-05-04', 5, 0)]
df = pd.DataFrame(stack_out, columns=['Date', 'X', 'Y'])
desired output:
desired = [('2021-04-25', 4, 4),
('2021-04-26', 0, 0),
('2021-04-27', 0, 0),
('2021-04-28', 56, 53),
('2021-04-29', 3, 0),
('2021-04-30', 4, 0),
('2021-05-01', 0, 0),
('2021-05-02', 5, 0),
('2021-05-03', 10, 5),
('2021-05-04', 5, 0)]
desired = pd.DataFrame(stack_out, columns=['Date', 'X', 'Y'])
CodePudding user response:
Here's a simple one-liner which will do the job:
df.loc[pd.to_datetime(df['Date']).dt.day == 1, 'X'] = 0
Essentially, a filter is applied to the X
column to subset to values where the day is equal to 1. And change these values to zero.
Output:
Date X Y
0 2021-04-25 4 4
1 2021-04-26 0 0
2 2021-04-27 0 0
3 2021-04-28 56 53
4 2021-04-29 3 0
5 2021-04-30 4 0
6 2021-05-01 0 0 <-- Change applied here.
7 2021-05-02 5 0
8 2021-05-03 10 5
9 2021-05-04 5 0
CodePudding user response:
I recommend you to use the iterrows method, which allows you to iterate through the rows of a given dataframe. Let's call your dataframe by df. Assuming the type of your "Date" column is Timestamp. The following works
new_Column_X = list()
for index,row in df.iterrows():
ts = row['Date']
if ts.day==1:
new_Column_X.append(0)
else:
new_Column_X.append(row['X'])
# If wanted you can just replace the old column with the new one
df['New_Column_X'] = new_Column_X
Second Solution: As @S3DEV mentioned you can solve this problem using the apply method (which is probably faster than the first solution).
from pandas import Timestamp
def column_X_parser(ts, x):
if ts.day==1:
return 0
else:
return x
df['New_Column_X'] = df.apply(lambda row: column_X_parser(Timestamp(row['Date']), row['X']),axis=1)