How do I get the difference between 2 dates from a csv file?-CodePudding

I have the Join Date and Left Date for multiple staff in a company in my csv file. I would like to calculate the difference for these 2 dates for each staff, and then find the total, average days, min and max. Not sure how to even get started...

Some Left Dates may be empty, and I would like to fill those empty dates with 1 May 2022, and then find the difference.

df[['Staff', 'JoinDate', 'LeftDate']] df['LeftDate'].fillna(value = '1/05/2022')

I have tried to replace the NaN values with a set date but that doesn't work as well.

CodePudding user response：

A rough draft of what I think you want. I use dateutil to make this easier.

cat dt_csv
id,dt_start,dt_end
1,2022-01-01,2022-06-30
2,2022-02-15,
3,2022-03-15,2022-07-31
4,2021-11-14,2022-04-15

import csv
from dateutil.parser import parse 
from statistics import mean  

with open('dt.csv') as csv_file:
    diff_list = []
    d_reader = csv.DictReader(csv_file)
    for row in d_reader:
        # Use strip() to reduce any values that are ' ' or more down  to '' so the or works.
        dt_start, dt_end = parse(row["dt_start"]), parse(row["dt_end"].strip() or "2022-05-01")
        dt_diff = dt_end - dt_start
        diff_list.append(dt_diff.days)
    # Sort the list to make it easier to get min and max.
    diff_list.sort()

diff_list                                                                                                                                                                 
[75, 138, 152, 180]

avg_diff = mean(diff_list)
min_diff = diff_list[0]
max_diff = diff_list[-1]
total_diff = sum(diff_list) 

avg_diff, min_diff, max_diff, total_diff                                                                                                                                  
(136.25, 75, 180, 545)