Implement tr and sed functions in awk-CodePudding

I need to process a text file - a big CSV - to correct format in it. This CSV has a field which contains XML data, formatted to be human readable: break up into multiple lines and indentation with spaces. I need to have every record in one line, so I am using awk to join lines, and after that I am using sed, to get rid of extra spaces between XML tags, and after that tr to eliminate unwanted "\r" characters. (the first record is always 8 numbers and the fiels separator is the pipe character: "|"

The awk scrips is (join4.awk)

BEGIN {
  # initialise "line" variable. Maybe unnecessary
  line=""
}

{
  # check if this line is a beginning of a new record
  if ( $0 ~ "^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]|" ) {
    # if it is a new record, then print stuff already collected
    # then update line variable with $0
    print line
    line = $0
  } else {
    # if it is not, then just attach $0 to the line
    line = line $0
  }
}

END {
  # print out the last record kept in line variable
  if (line) print line
}

and the commandline is

cat inputdata.csv | awk -f join4.awk | tr -d "\r" | sed 's/>  *</></g'   > corrected_data.csv

My question is if there is an efficient way to implement tr and sed functionality inside the awk script? - this is not Linux, so I gave no gawk, just simple old awk and nawk.

thanks,

--Trifo

CodePudding user response：

tr -d "\r"

Is just gsub(/\r/, "").

 sed 's/>  *</></g'

That's just gsub(/> *</, "><")

CodePudding user response：

mawk NF=NF RS='\r?\n' FS='> *<' OFS='><'