Add location to duplicate names in a CSV file using Bash

Time:11-04

Using Bash, create user logins. If a name is duplicated, add the location to the login. The location should be added for the first occurrence of the name as well as for the later duplicates.

id,location,name,login
1,KP,Lacie,
2,US,Pamella,
3,CY,Korrie,
4,NI,Korrie,
5,BT,Queenie,
6,AW,Donnie,
7,GP,Pamella,
8,KP,Pamella,
9,LC,Pamella,
10,GM,Ericka,

The result should look like this:

id,location,name,login
1,KP,Lacie,lacie@mail.com
2,US,Pamella,uspamella@mail.com
3,CY,Korrie,cykorrie@mail.com
4,NI,Korrie,nikorrie@mail.com
5,BT,Queenie,queenie@mail.com
6,AW,Donnie,donnie@mail.com
7,GP,Pamella,gppamella@mail.com
8,KP,Pamella,kppamella@mail.com
9,LC,Pamella,lcpamella@mail.com
10,GM,Ericka,ericka@mail.com

I used AWK to process the CSV file.

    cat data.csv | awk 'BEGIN {FS=OFS=","};
    NR > 1 {
    split($3, name)
    $4 = tolower($3)
    split($4, login)
    for (k in login) {
    !a[login[k]]++ ? sub(login[k], login[k]"@mail.com", $4) : sub(login[k], tolower($2)login[k]"@mail.com", $4)
    }
    }; 1' > data_new.csv 

The script adds the location only to the second and later occurrences of a name; the first occurrence is left without it.

id,location,name,login
1,KP,Lacie,lacie@mail.com
2,US,Pamella,pamella@mail.com
3,CY,Korrie,korrie@mail.com
4,NI,Korrie,nikorrie@mail.com
5,BT,Queenie,queenie@mail.com
6,AW,Donnie,donnie@mail.com
7,GP,Pamella,gppamella@mail.com
8,KP,Pamella,kppamella@mail.com
9,LC,Pamella,lcpamella@mail.com
10,GM,Ericka,ericka@mail.com

How do I add the location to the first occurrence as well?

CodePudding user response:

A common solution is to have Awk process the same file twice if you need to know whether there are duplicates down the line.

Notice also that this requires you to avoid the useless use of cat: Awk cannot read a pipe twice, so the file name is passed as an argument two times instead.

    awk 'BEGIN {FS=OFS=","};
      NR == FNR { seen[$3]++; next }
      FNR > 1 { $4 = (seen[$3] > 1 ? tolower($2) : "") tolower($3) "@mail.com" }
      1' data.csv data.csv >data_new.csv

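NR==FNR is true only while Awk reads the file the first time, because NR counts records across all input files while FNR restarts at 1 for each file. During that first pass we simply count the number of occurrences of each $3 in seen and skip to the next record.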
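To see how the two counters behave, here is a minimal sketch that feeds one small file to Awk twice and labels each record with the pass it belongs to. The file sample.txt and the variable names are made up for this illustration only.

```shell
# Hypothetical two-line sample file, created only for this demonstration.
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/sample.txt"

# Pass the same file twice and print both counters for every record.
# NR keeps counting across files; FNR restarts at 1 for each file,
# so NR == FNR holds only during the first pass.
passes=$(awk '{
    pass = (NR == FNR) ? "first" : "second"
    print pass, NR, FNR, $0
}' "$tmpdir/sample.txt" "$tmpdir/sample.txt")

printf '%s\n' "$passes"
```

The first two lines printed should be labelled "first" and the last two "second", with NR running from 1 to 4 while FNR goes 1, 2, 1, 2.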

Then in the second pass, we can just look at the current entry in seen to figure out whether or not we need to add the prefix.
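For anyone who wants to try the two-pass idea end to end, the following sketch recreates the sample data from the question in a temporary directory and runs the command against it. The workdir variable and the temporary location are assumptions made for this example; the counting step is written as seen[$3]++ so that seen holds the number of occurrences of each name.

```shell
# Recreate the sample input from the question in a temporary directory.
workdir=$(mktemp -d)
cat > "$workdir/data.csv" <<'EOF'
id,location,name,login
1,KP,Lacie,
2,US,Pamella,
3,CY,Korrie,
4,NI,Korrie,
5,BT,Queenie,
6,AW,Donnie,
7,GP,Pamella,
8,KP,Pamella,
9,LC,Pamella,
10,GM,Ericka,
EOF

# Pass 1 (NR == FNR): count how often each name occurs.
# Pass 2 (FNR > 1 skips the header): build the login, prefixing the
# lowercased location only when the name occurs more than once.
awk 'BEGIN { FS = OFS = "," }
  NR == FNR { seen[$3]++; next }
  FNR > 1 { $4 = (seen[$3] > 1 ? tolower($2) : "") tolower($3) "@mail.com" }
  1' "$workdir/data.csv" "$workdir/data.csv" > "$workdir/data_new.csv"

cat "$workdir/data_new.csv"
```

With this input, every Pamella and Korrie row should get a location prefix, including the first occurrence of each, while the unique names stay unprefixed.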
