Extract Certain Substrings from Certain Columns of a Delimited CSV File and save the New Edited

Time:07-03

I am trying to solve a use case using a shell script.

I have a sample CSV file like the one below:

Col1|Col2|Col3|Col4|Col5
120 Sam|145678 Sam|Pp|Iss|samrat
134 Jhu|456788 Uip|Tt|Acc|jhurt
678 Pop|120987 Por|Uu|Try|pord

I am trying to take substrings from Col1, Col2, Col3, and Col4 and create a new file from them, as shown below.

Col1 - First 3 Characters

Col2 - First 6 Characters

Col3 and Col4 - First Character

Col1|Col2|Col3|Col4|Col5
120|145678|P|I|samrat
134|456788|T|A|jhurt
678|120987|U|T|pord

I am able to do them separately, as shown below, but I am not able to put them all together so that all the edits happen in one shot and produce a new file.

cut -d"|" -f1 | cut -c 1-3
cut -d"|" -f2 | cut -c 1-6

Please help with the implementation. Thanks in advance.

CodePudding user response:

An awk solution with substr:

awk '
BEGIN{FS=OFS="|"}
NR==1 { print; next }
{print substr($1,1,3), substr($2,1,6), substr($3,1,1), substr($4,1,1), $5}
' file

Col1|Col2|Col3|Col4|Col5
120|145678|P|I|samrat
134|456788|T|A|jhurt
678|120987|U|T|pord
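
Since the question asks for a new file, here is a minimal end-to-end sketch of the same substr approach with the output redirected. The file names `input.csv` and `output.csv` and the `widths` variable are assumptions made for illustration; they are not part of the original answer.

```shell
#!/bin/sh
# Hypothetical file names, chosen for illustration only.
in=input.csv
out=output.csv

# Recreate the sample data from the question.
cat > "$in" <<'EOF'
Col1|Col2|Col3|Col4|Col5
120 Sam|145678 Sam|Pp|Iss|samrat
134 Jhu|456788 Uip|Tt|Acc|jhurt
678 Pop|120987 Por|Uu|Try|pord
EOF

# widths (hypothetical helper variable): leading characters to keep
# per column, in order; 0 means the column is left as it is.
awk -v widths="3 6 1 1 0" '
BEGIN { FS = OFS = "|"; n = split(widths, w, " ") }
NR == 1 { print; next }                 # pass the header through unchanged
{
    for (i = 1; i <= n && i <= NF; i++)
        if (w[i] + 0 > 0) $i = substr($i, 1, w[i])
    print                               # fields are rejoined with OFS
}
' "$in" > "$out"

cat "$out"
```

With the sample data this should write the same four lines as shown above to `output.csv`. Changing `widths` is the only edit needed for a different column layout.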

CodePudding user response:

With the samples shown, please try the following awk code. It was written and tested in GNU awk, and it uses the three-argument form of match (a GNU awk extension), which stores the captured groups in an array named arr.

awk '
BEGIN { FS=OFS="|" }
FNR==1{ print;next }
match($0,/^(.{3})([^|]*\|)(.{6})[^|]*\|(.)[^|]*\|(.)[^|]*\|([^|]*)/,arr){
  print arr[1],arr[3],arr[4],arr[5],arr[6]
}
' Input_file

CodePudding user response:

Using sed

$ sed -E '1!s/^(.{3})[^|]*(.{7})[^|]*(.{2})[^|]*(.{2})[^|]*/\1\2\3\4/' < input_file > output_file
$ cat output_file
Col1|Col2|Col3|Col4|Col5
120|145678|P|I|samrat
134|456788|T|A|jhurt
678|120987|U|T|pord
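
One caveat when counting characters with `.`: a group such as `.{3}` can run across a `|` if a field is shorter than the requested width. The following is a minimal sketch of a variant that keeps each group inside its own field. The file names and the extra short row are hypothetical, added only to illustrate that edge case.

```shell
#!/bin/sh
# Hypothetical file names and a made-up short last row, for illustration only.
cat > short_input.csv <<'EOF'
Col1|Col2|Col3|Col4|Col5
120 Sam|145678 Sam|Pp|Iss|samrat
12|1456|P|I|short
EOF

# [^|]{0,3} keeps up to 3 characters but never crosses a delimiter;
# the [^|]* that follows discards whatever is left of that field.
# Line 1 (the header) is skipped by the 1! address.
sed -E '1!s/^([^|]{0,3})[^|]*\|([^|]{0,6})[^|]*\|([^|]?)[^|]*\|([^|]?)[^|]*/\1|\2|\3|\4/' \
    short_input.csv > short_output.csv

cat short_output.csv
```

Here the short row should come through unchanged as `12|1456|P|I|short`, while the full-width row is trimmed to `120|145678|P|I|samrat`.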