Delete rows that have duplicate column value in CSV file

Time:11-21

I have a semicolon-separated CSV file in which some rows repeat the same value across columns, for example:

Field1;Field2;Field3;Field4;Field5
alpha;15;16;delta;delta
alpha;15;15;delta;kappa
alpha;15;15;delta;delta
alpha;15;16;delta;kappa

I want to delete every row where Field2 equals Field3, or Field4 equals Field5, or both.

Expected output:

Field1;Field2;Field3;Field4;Field5
alpha;15;16;delta;kappa

Answer:

I suggest this awk one-liner:

awk -F';' '$2==$3||$4==$5{next}1' input.csv

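This prints the contents of input.csv without the unwanted rows. -F';' sets the field separator to a semicolon, the pattern $2==$3||$4==$5 matches rows where Field2 equals Field3 or Field4 equals Field5, {next} skips those rows, and the trailing 1 is an always-true pattern that prints every other row. The header line is kept because its field names all differ.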
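To check the one-liner against the sample from the question, here is a small sketch that writes those rows to a scratch file and runs the same awk command on it. The use of mktemp for the scratch file is an assumption about your environment (it is available on common Linux and BSD systems) and is not part of the original answer.

```shell
#!/bin/sh
# Scratch file for the sample data (assumes the mktemp utility is available).
tmp=$(mktemp) || exit 1

# Header and data rows from the question.
cat > "$tmp" <<'EOF'
Field1;Field2;Field3;Field4;Field5
alpha;15;16;delta;delta
alpha;15;15;delta;kappa
alpha;15;15;delta;delta
alpha;15;16;delta;kappa
EOF

# -F';'           : split fields on semicolons
# $2==$3||$4==$5  : Field2 equals Field3, or Field4 equals Field5
# {next}          : skip such a row
# 1               : always-true pattern, so every other row is printed
awk -F';' '$2==$3||$4==$5{next}1' "$tmp"
# Prints:
# Field1;Field2;Field3;Field4;Field5
# alpha;15;16;delta;kappa

rm -f "$tmp"
```

The output matches the expected output in the question: only the header and the row alpha;15;16;delta;kappa survive.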

awk -i inplace -F';' '$2==$3||$4==$5{next}1' input.csv

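This updates input.csv in place, removing those rows. Note that -i inplace is a GNU awk (gawk 4.1.0 or later) extension; other implementations such as mawk or BSD awk do not support it.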
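If your awk is not GNU awk, a common portable substitute for -i inplace is to filter into a temporary file and then copy the result back over the original. The sketch below wraps that in a function; the name filter_in_place is hypothetical, and it assumes the mktemp utility is available.

```shell
#!/bin/sh
# filter_in_place: hypothetical helper that mimics gawk's "-i inplace"
# with any awk, by filtering into a temporary file first.
filter_in_place() {
    f=$1
    tmp=$(mktemp) || return 1
    # Drop rows where Field2 == Field3 or Field4 == Field5; keep all others.
    if awk -F';' '$2==$3||$4==$5{next}1' "$f" > "$tmp"; then
        # Writing back through ">" keeps the original file's permissions
        # and owner, unlike mv, which would install the temporary file itself.
        cat "$tmp" > "$f"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

# Usage:
# filter_in_place input.csv
```

The original file is only overwritten if awk succeeds, so a failed run leaves input.csv untouched.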
