Bash shell for data cleansing

Time:10-25

I'm new to bash scripting and am learning it for data cleansing. I have a large file from which I have managed to cut out the necessary columns and save them to a new file. I need help achieving the outcome I am looking for. The new file currently looks like this:

   2 Media Server Community - WebRTC, MP4, HLS, RTMP"
  29 Media Server Enterprise
   7 Media Server lite
  10 Media server lite 1.0
 468 Media server lite 2.0
   8 Media server lite 2.3
   1 Media server lite 2.4
  40 Media server lite 3.0
   3 Media server lite 3.3

How could I edit this file so that the CSV file looks like this:

   2 | Media Server Community - WebRTC, MP4, HLS, RTMP"
  29 | Media Server Enterprise
   7 | Media Server lite
  10 | Media server lite 1.0
 468 | Media server lite 2.0
   8 | Media server lite 2.3
   1 | Media server lite 2.4
  40 | Media server lite 3.0
   3 | Media server lite 3.3

CodePudding user response:

I'd rather see you post (parts of) the original data file and show you how it's done all the way with awk, but here's what you asked for using GNU awk (gensub):

$ gawk '{print gensub(/([0-9]+ )/,"\\1| ",1,$0)}' file

Output:

   2 | Media Server Community - WebRTC, MP4, HLS, RTMP"
  29 | Media Server Enterprise
   7 | Media Server lite
...

Edit: Hmm, too much gensub lately I guess, just use awk:

$ awk '{sub(/([0-9]+ )/,"&| ")}1' file
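
The patterns above are not anchored to the start of the line, so on a line with no leading count they could match a number later in the text (for example the "2.0" in a product name followed by another word). A minimal sketch of an anchored variant follows, assuming every count sits at the start of the line after optional blanks. The function name add_pipe is hypothetical and only there to make the example easy to try:

```shell
# add_pipe is a hypothetical helper name, not part of the answer above.
# It reads lines on stdin and inserts "| " after the leading count.
add_pipe() {
    # ^[ \t]*   optional leading blanks
    # [0-9]+    the count, followed by a single space
    # "&"       in sub(), stands for the text that was matched
    awk '{ sub(/^[ \t]*[0-9]+ /, "&| ") } 1'
}

printf '%s\n' '   2 Media Server Community' ' 468 Media server lite 2.0' | add_pipe
```

Lines that do not start with a count pass through unchanged.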

CodePudding user response:

Another approach that works with any awk is to use match() to find where the leading whitespace and first number end, then use substr() to print up to that point, add a "|", and use substr() again to print from that point to the end of the line, e.g.

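awk '{
    # match the leading run of blanks and digits, including the space after the count
    match($0, /^[ \t0-9]+/)
    # print the count, then "|", then the rest of the line
    print substr($0, 1, RLENGTH-1), "|", substr($0, RLENGTH+1)
}'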

Example Use/Output

With your sample input in a file named media, you would do:

$ awk '{ match($0,/^[ \t0-9]+/); print substr($0,1,RLENGTH-1), "|", substr($0, RLENGTH+1) }' media
   2 | Media Server Community - WebRTC, MP4, HLS, RTMP"
  29 | Media Server Enterprise
   7 | Media Server lite
  10 | Media server lite 1.0
 468 | Media server lite 2.0
   8 | Media server lite 2.3
   1 | Media server lite 2.4
  40 | Media server lite 3.0
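
Since the question asks about editing the file, note that the awk commands in this thread only print to standard output and leave the input file untouched. A minimal sketch of writing the result back through a temporary file follows. The function name pipe_file is hypothetical, and the sub() pattern is an anchored variant chosen for this example rather than taken from either answer:

```shell
# pipe_file is a hypothetical helper: it rewrites the file named in $1
# with "| " inserted after the leading count on each line.
pipe_file() {
    tmp=$(mktemp) || return 1
    # Transform into the temp file first; overwrite the original only
    # if awk succeeded, so a failure does not truncate the input.
    awk '{ sub(/^[ \t]*[0-9]+ /, "&| ") } 1' "$1" > "$tmp" && cat "$tmp" > "$1"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

You would call it as `pipe_file media`. Running it a second time on the same file adds a second "|", so keep a copy of the original if you are still experimenting.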