I have the Join Date and Left Date for multiple staff in a company in my csv file. I would like to calculate the difference for these 2 dates for each staff, and then find the total, average days, min and max. Not sure how to even get started...
Some Left Dates may be empty, and I would like to fill those empty dates with 1 May 2022, and then find the difference.
df[['Staff', 'JoinDate', 'LeftDate']] df['LeftDate'].fillna(value = '1/05/2022')
I have tried to replace the NaN values with a set date but that doesn't work as well.
CodePudding user response:
A rough draft of what I think you want. I use dateutil to make this easier.
cat dt_csv
id,dt_start,dt_end
1,2022-01-01,2022-06-30
2,2022-02-15,
3,2022-03-15,2022-07-31
4,2021-11-14,2022-04-15
import csv
from dateutil.parser import parse
from statistics import mean
with open('dt.csv') as csv_file:
diff_list = []
d_reader = csv.DictReader(csv_file)
for row in d_reader:
# Use strip() to reduce any values that are ' ' or more down to '' so the or works.
dt_start, dt_end = parse(row["dt_start"]), parse(row["dt_end"].strip() or "2022-05-01")
dt_diff = dt_end - dt_start
diff_list.append(dt_diff.days)
# Sort the list to make it easier to get min and max.
diff_list.sort()
diff_list
[75, 138, 152, 180]
avg_diff = mean(diff_list)
min_diff = diff_list[0]
max_diff = diff_list[-1]
total_diff = sum(diff_list)
avg_diff, min_diff, max_diff, total_diff
(136.25, 75, 180, 545)