Date validation in CSV file

Time:12-15

I need to validate the DOB field in a CSV file and remove invalid data from it. The only accepted DOB format is YYYY-MM-DD. Please see the source file and the expected output below. I'm looking for an AWK command to solve this.

name,dob
pater,2022-12-10
john,1900-10-23
cader,apr 10 12056
tina,2020-maple road
mike,2019-01-35
carl,2010-03-18 new york
anne,hi how are you?

I need to clean the 2nd column, which is the DOB field. Note: in some rows the DOB field contains other text alongside the date; in those cases I need to keep only the valid date and remove the rest (e.g. row 6, carl).

Expected output

name,dob
pater,2022-12-10
john,1900-10-23
cader,
tina,
mike,
carl,2010-03-18
anne,

CodePudding user response:

I was able to achieve this with the command below:

awk 'BEGIN{FS=OFS=","} NR>1{$2=(match($2,/[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])/)?substr($2,RSTART,RLENGTH):"")} {print}' input.csv > output.csv
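A regular expression alone checks the shape of a date, not the calendar: a pattern that limits days to 01-31 would still accept 2021-02-30. If that matters, a stricter variant could look roughly like the sketch below. The function name `clean_dob` is hypothetical, and the script assumes a header row, a comma-separated file with no quoted commas, and that only the first YYYY-MM-DD-shaped substring in the field needs checking. It spells out `[0-9]` instead of using `{4}` so it does not depend on interval-expression support in older awk builds.

```shell
# Hypothetical helper: reads CSV on stdin, writes the cleaned CSV on stdout.
clean_dob() {
  awk '
    BEGIN { FS = OFS = "," }
    NR == 1 { print; next }   # assumption: line 1 is the header, keep it as-is
    {
      dob = ""
      # locate the first YYYY-MM-DD-shaped substring in the DOB field
      if (match($2, /[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/)) {
        cand = substr($2, RSTART, RLENGTH)
        y = substr(cand, 1, 4) + 0
        m = substr(cand, 6, 2) + 0
        d = substr(cand, 9, 2) + 0
        # days per month, with February adjusted for leap years
        split("31 28 31 30 31 30 31 31 30 31 30 31", dim, " ")
        if ((y % 4 == 0 && y % 100 != 0) || y % 400 == 0) dim[2] = 29
        if (m >= 1 && m <= 12 && d >= 1 && d <= dim[m]) dob = cand
      }
      $2 = dob
      print
    }
  '
}
```

It would be used as `clean_dob < input.csv > output.csv`. Note that it does not check what surrounds the match, so a value with extra digits around a date-shaped substring would still yield that substring.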

CodePudding user response:

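Something like this might work, assuming the date, when present, is at the start of the field:

awk -F "," 'NR == 1 { print; next } { if ($2 ~ /^[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])/) print $1 "," substr($2, 1, 10); else print $1 "," }' input.csv > output.csv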