Linux Command to get fields from CSV files

I have CSV files on a Linux server with thousands of rows in the following format:

0,20221208195546466,9,200,Above as:2|RAN34f2fb:HAER:0|RAND8365b2bca763:FON:0|RANDa7a5f964900b:ION:0|

I need output from all the files in the following format: the 2nd field (i.e. 20221208195546466) and, from the 5th field, the value after "Above as:" and before the first "|" (i.e. 2 in the example above).

Output:

20221208195546466 , 2

Can anyone help me with a Linux command?

Edit:

My attempt:

I tried the following, but it only gives the value from the 5th field. How can I add the 2nd field as well?

cat *.csv | cut -d, -f5 | cut -d'|' -f1 | cut -d':' -f2
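For reference, here is one way the cut attempt above could be extended to keep field 2. It is sketched as a hypothetical shell function, fields_with_cut. The function selects fields 2 and 5 together, cuts at the first "|", turns ":" into "," with tr, and keeps the 1st and 3rd comma-separated pieces. It assumes that field 2 (the timestamp) contains no ":" or "|", and that the 5th field has a ":" before its first "|".

```shell
# Sketch (hypothetical helper): keep field 2 alongside the value taken from field 5.
# Assumes field 2 contains no ":" or "|" and that the wanted value sits between
# the first ":" and the first "|" of field 5. Uses only cut and tr (POSIX sh).
# Usage: fields_with_cut file1.csv [file2.csv ...]   (reads stdin if no files)
fields_with_cut() {
  cut -d, -f2,5 "$@" |  # "20221208195546466,Above as:2|RAN34f2fb:HAER:0|..."
    cut -d'|' -f1 |     # "20221208195546466,Above as:2"
    tr ':' ',' |        # "20221208195546466,Above as,2"
    cut -d, -f1,3       # "20221208195546466,2"
}
```

For example, fields_with_cut *.csv prints 20221208195546466,2 for the sample row.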

Answer:

You can use awk for this:

awk -F',' -v OFS=', ' '{gsub(/Above as:/,""); gsub(/\|.*/, ""); print $2, $5}' *.csv

You will probably need to adapt the regexp a bit, e.g. if the text before the ":" is not always "Above as".
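As one possible adaptation, the sketch below edits only the 5th field with sub() instead of running gsub() on the whole record. It strips everything up to the first ":" and everything from the first "|" onward. The function name fields_with_sub is hypothetical. The sketch assumes the 5th field always contains a ":" before its first "|".

```shell
# Sketch (hypothetical helper): edit only the 5th field, so the prefix before
# the first ":" does not have to be the literal text "Above as".
# Usage: fields_with_sub file1.csv [file2.csv ...]   (reads stdin if no files)
fields_with_sub() {
  awk -F',' -v OFS=', ' '{
    sub(/^[^:]*:/, "", $5)   # drop everything up to and including the first ":"
    sub(/[|].*/, "", $5)     # drop everything from the first "|" onward
    print $2, $5
  }' "$@"
}
```

For example, fields_with_sub *.csv prints 20221208195546466, 2 for the sample row. Writing the pipe as [|] rather than \| avoids relying on how a particular awk interprets a backslash-escaped "|" in a regex.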

Answer:

awk (GNU awk as well as other POSIX awks) accepts a regular expression as the field separator, so you can split each record at ",", "|" and ":" at the same time. The following therefore extracts the required fields from file.csv:

awk -F'[,|:]' '{print $2", "$6}' file.csv

Tested on the single example record:

echo "0,20221208195546466,9,200,Above as:2|RAN34f2fb:HAER:0|RAND8365b2bca763:FON:0|RANDa7a5f964900b:ION:0|" | awk -F'[,|:]' '{print $2", "$6}'

Output:

20221208195546466, 2
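The next sketch runs the same one-liner over all the files at once by passing the file names through. The function name fields_with_regex_fs is hypothetical. The NF >= 6 guard is also an addition that is not part of the answer above: it skips blank or truncated lines, which would otherwise print a bare ", ". Like the one-liner, the sketch assumes that the first four comma-separated fields never contain ":" or "|", because either character would shift the field numbers.

```shell
# Sketch (hypothetical helper): split on ",", "|" and ":" at once, over many files.
# NF >= 6 is an added guard that skips blank or truncated lines.
# Assumes fields 1-4 never contain ":" or "|"; otherwise $6 is no longer the value.
# Usage: fields_with_regex_fs file1.csv [file2.csv ...]   (reads stdin if no files)
fields_with_regex_fs() {
  awk -F'[,|:]' 'NF >= 6 { print $2 ", " $6 }' "$@"
}
```

For example, fields_with_regex_fs *.csv prints one "timestamp, value" line per data row across all matching files.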

Answer:

Assumptions:

the starting string of the 5th comma-delimited field can vary from line to line (i.e., it is not known beforehand)
the item of interest in the 5th comma-delimited field occurs between the first : and the first |

Sample data:

$ cat test.csv
0,20221208195546466,9,200,Above as:2|RAN34f2fb:HAER:0|RAND8365b2bca763:FON:0|RANDa7a5f964900b:ION:0|
1,20230124123456789,10,1730,Total ts:7|stuff:HAER:0|morestuff:FON:0|yetmorestuff:ION:0|

One awk approach:

awk '
BEGIN { FS=OFS="," }                    # define input/output field delimiter as ","
      { split($5,a,"[:|]")              # split 5th field on dual delimiters ":" and "|", store results in array a[]
        print $2,a[2]                   # print desired items to stdout
      }
' test.csv

This generates:

20221208195546466,2
20230124123456789,7
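The question asks for output from all the files, so the sketch below runs the same split() approach over several files. It prepends awk's built-in FILENAME variable so that rows from different files can be told apart. It also uses ", " as the output separator to match the requested format. The function name fields_with_split is hypothetical, and the assumptions listed above still apply.

```shell
# Sketch (hypothetical helper): the split() approach over several files,
# prefixing each output line with the name of the file it came from.
# Usage: fields_with_split file1.csv [file2.csv ...]
fields_with_split() {
  awk '
  BEGIN { FS = ","; OFS = ", " }       # split input on ","; join output with ", "
        { split($5, a, "[:|]")         # a[2] = text between the first and second ":" or "|"
          print FILENAME, $2, a[2]     # file name, 2nd field, extracted value
        }
  ' "$@"
}
```

With the test.csv above, fields_with_split test.csv prints test.csv, 20221208195546466, 2 and test.csv, 20230124123456789, 7.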