I am trying to create a new column "Starting_date" by subtracting 60 days from "Harvest_date", but I get the same date in every row. Can someone point out what I did wrong, please?
Harvest_date |
---|
20.12.21 |
12.01.21 |
10.03.21 |
import pandas as pd
from datetime import timedelta
df1 = pd.read_csv (r'C:\Flower_weight.csv')
def subtract_days_from_date(date, days):
    subtracted_date = pd.to_datetime(date) - timedelta(days=days)
    subtracted_date = subtracted_date.strftime("%Y-%m-%d")
    return subtracted_date
df1['Harvest_date'] = pd.to_datetime(df1.Harvest_date)
df1.style.format({"Harvest_date": lambda t: t.strftime("%Y-%m-%d")})
for harvest_date in df1['Harvest_date']:
    df1["Starting_date"]=subtract_days_from_date(harvest_date,60)
print(df1["Starting_date"])
Starting_date |
---|
2021-10-05 |
2021-10-05 |
2021-10-05 |
CodePudding user response:
I am not sure if the use of the loop was necessary here. Perhaps try the following:
df1['Starting_date'] = df1['Harvest_date'].apply(lambda x: pd.to_datetime(x) - timedelta(days=60))
df1['Starting_date'] = df1['Starting_date'].dt.strftime("%Y-%m-%d")
df1['Starting_date']
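The same idea also works without apply, since pandas can subtract a Timedelta from a whole datetime column at once. A minimal sketch, assuming the dates are day-first strings like those in the question (the format string "%d.%m.%y" is a guess about the CSV, and the inline DataFrame only stands in for the file):

```python
import pandas as pd

# Stand-in for the CSV from the question; values copied from the sample table.
df1 = pd.DataFrame({"Harvest_date": ["20.12.21", "12.01.21", "10.03.21"]})

# Assumption: the dates are day.month.year with a two-digit year.
df1["Harvest_date"] = pd.to_datetime(df1["Harvest_date"], format="%d.%m.%y")

# Subtracting a Timedelta from a datetime column operates on every row at once.
df1["Starting_date"] = df1["Harvest_date"] - pd.Timedelta(days=60)

# strftime returns a new Series of strings, so the result has to be assigned.
df1["Starting_date"] = df1["Starting_date"].dt.strftime("%Y-%m-%d")

print(df1["Starting_date"].tolist())
```

If the dates in the file are actually month-first, the format string would need to change accordingly.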
CodePudding user response:
You're overwriting the whole column on each iteration of the last loop, so every row ends up with the value from the final iteration:
for harvest_date in df1['Harvest_date']:
    df1["Starting_date"]=subtract_days_from_date(harvest_date,60)
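To see the effect in isolation, here is a small sketch with made-up values (the DataFrame and its column names are purely illustrative, not the asker's data). Assigning a scalar to a column broadcasts it to all rows, so each pass through the loop replaces the entire column:

```python
import pandas as pd

# Hypothetical toy data, only to show the broadcasting behaviour.
df = pd.DataFrame({"value": [1, 2, 3]})

for v in df["value"]:
    # A scalar assigned to a column is broadcast to every row,
    # so this replaces the entire column on each pass.
    df["doubled"] = v * 2

# Only the result of the final iteration (3 * 2) is left in every row.
print(df["doubled"].tolist())  # [6, 6, 6]
```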
You can do away with the loop by vectorizing the subtract_days_from_date function.
You could also keep the loop and reference each row's index with enumerate.
np.vectorize
import numpy as np
subtract_days_from_date = np.vectorize(subtract_days_from_date)
df1["Starting_date"]=subtract_days_from_date(df1["Harvest_date"], 60)
enumerate
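for idx, harvest_date in enumerate(df1['Harvest_date']):
    # Use a single .loc call; df1.iloc[idx]["Starting_date"] = ... would assign to a temporary copy.
    df1.loc[df1.index[idx], "Starting_date"] = subtract_days_from_date(harvest_date, 60)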