Remove \r character from a string pattern-matched in AWK

Time:03-26

I'm quite new to AWK, so apologies for the basic question. I've found many references for removing Windows line-ending characters from files, but none that first match a regular expression and then remove the Windows line-ending characters.

I have a file named infile.txt that contains a line like so:

...
DATAFILE   data5v.dat
...

Within a shell script, I want to capture the file name argument data5v.dat from infile.txt and remove the carriage return character, \r, if it is present. The carriage return is not always there, so I need to match the line first and then remove the \r.

I have tried the following, but it does not work as I expect:

FILENAME=$(awk '/DATAFILE/ { print gsub("\r", "", $2) }' $INFILE)
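The attempt prints a number rather than a name because gsub() returns the count of substitutions it made (here 0 or 1); the edited string is left in $2 itself. A minimal sketch demonstrating this, with printf standing in for infile.txt (an assumption made only so the example is self-contained):

```shell
#!/bin/sh
# Simulated DATAFILE line ending in CRLF, as it would in a Windows file.

# gsub() returns the number of replacements, so this captures "1", not a name.
count=$(printf 'DATAFILE   data5v.dat\r\n' | awk '/DATAFILE/ { print gsub("\r", "", $2) }')

# Calling gsub() first and printing the modified field afterwards gives the name.
name=$(printf 'DATAFILE   data5v.dat\r\n' | awk '/DATAFILE/ { gsub("\r", "", $2); print $2 }')

printf 'gsub returned: %s\n' "$count"
printf 'cleaned name:  %s\n' "$name"
```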

Can I store the string returned from matching my regex /DATAFILE/ in a variable within my AWK statement to subsequently apply gsub?
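One way to do what the question asks, sketched under the same assumption as the attempt (the file name is the second whitespace-separated field, so names containing blanks are not handled), is to copy $2 into an awk variable and pass that variable as the third argument of sub(). Anchoring the regex with $ removes a \r only when it is the last character. The awk variable f and the shell variable filename are illustrative names, and printf stands in for infile.txt:

```shell
#!/bin/sh
# Copy $2 into the awk variable f, strip a trailing \r from f only, then print f.
# Lines without a trailing \r are printed unchanged.
filename=$(printf 'HEADER x\nDATAFILE   data5v.dat\r\nFOOTER y\n' |
    awk '/DATAFILE/ { f = $2; sub(/\r$/, "", f); print f }')

printf '[%s]\n' "$filename"
```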

Answer:

File names can contain whitespace, including \rs, blanks, and tabs, so to do this robustly you can't remove every \r with gsub(), and you can't rely on any single field, e.g. $2, containing the whole file name.

If your input fields are tab-separated you need:

awk '/DATAFILE/ { sub(/[^\t]+\t/,""); sub(/\r$/,""); print }' file

or this otherwise:

awk '/DATAFILE/ { sub(/[^[:space:]]+[[:space:]]+/,""); sub(/\r$/,""); print }' file

The above assumes your file names don't start with spaces and don't contain newlines.
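A minimal sketch of wrapping the tab-separated variant in a shell function, so the result can be captured into a variable as the question asks. The function name get_datafile is hypothetical; it reads the files named in its arguments, or standard input if none are given, and keeps the same assumptions (no leading whitespace or newlines in the name):

```shell
#!/bin/sh
# Hypothetical helper: print the value that follows "DATAFILE<tab>",
# minus any trailing carriage return.
get_datafile() {
    awk '/DATAFILE/ {
        sub(/[^\t]+\t/, "")   # drop the keyword and the first tab after it
        sub(/\r$/, "")        # drop a trailing CR, if present
        print
    }' "$@"
}

# Usage: a name containing a blank, on a CRLF-terminated line.
name=$(printf 'DATAFILE\tmy data.dat\r\n' | get_datafile)
printf '[%s]\n' "$name"
```

In a script, this would be called as `name=$(get_datafile "$INFILE")`.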

To test any solution for robustness try:

printf 'DATAFILE\tfoo \r\tbar\r\n' | awk '...' | cat -TEv

and make sure that the output looks like it does below:

$ printf 'DATAFILE\tfoo \r\tbar\r\n' | awk '/DATAFILE/ { sub(/[^\t]+\t/,""); sub(/\r$/,""); print }' | cat -TEv
foo ^M^Ibar$

$ printf 'DATAFILE\tfoo \r\tbar\r\n' | awk '/DATAFILE/ { sub(/[^[:space:]]+[[:space:]]+/,""); sub(/\r$/,""); print }' | cat -TEv
foo ^M^Ibar$

Note that the blank, ^M (CR), and ^I (tab) in the middle of the file name are preserved, as they should be, but there is no ^M at the end of the line.

If your version of cat doesn't support -T or -E, use whatever you normally use to inspect non-printing characters, e.g. od -c, or open the output in vi.

Answer:

Would you please try the following:

FILENAME=$(awk -v RS='\r?\n' '/DATAFILE/ {print $2}' "$INFILE")
echo "$FILENAME"

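It sets the record separator RS to a regular expression matching an optional \r followed by \n, so the carriage return is consumed as part of the line ending. Note that a multi-character (regex) RS is an extension supported by GNU awk and mawk, among others; POSIX leaves its behavior unspecified.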
As a side note, it is not recommended to use all-uppercase names for your own variables, because they may conflict with environment variables and variables reserved by the shell.
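A quick self-contained check of the RS approach on mixed line endings, one CRLF line and one LF line, with printf standing in for "$INFILE". It assumes an awk that accepts a regex RS, such as GNU awk or mawk:

```shell
#!/bin/sh
# With RS set to the regex \r?\n, both a CRLF and a plain LF end a record,
# so neither printed field carries a trailing \r.
out=$(printf 'DATAFILE   data5v.dat\r\nDATAFILE   other.dat\n' |
      awk -v RS='\r?\n' '/DATAFILE/ { print $2 }')

printf '%s\n' "$out"
```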

Answer:

Awk applies each rule of the script, in order, to each input line, and a change that an earlier rule makes to the line is visible to the rules after it. So you can remove the carriage return first and then apply other logic to the cleaned line. For example:

FILENAME=$(awk '/\r/ { sub(/\r/, "") }
     /DATAFILE/ { print $2 }' "$INFILE")
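A self-contained sketch of the same two-rule idea, with printf standing in for "$INFILE". It is a small variation on the answer: the regex is anchored as /\r$/ so only a trailing carriage return is removed, and the /\r/ guard is dropped because sub() simply does nothing when there is no match. Since sub() modifies $0 by default, awk re-splits the fields, and the second rule sees a clean $2:

```shell
#!/bin/sh
# Rule 1 strips a trailing CR from every record; rule 2 then prints
# the second field of the DATAFILE line.
filename=$(printf 'OTHER x\r\nDATAFILE   data5v.dat\r\n' |
    awk '{ sub(/\r$/, "") }
         /DATAFILE/ { print $2 }')

printf '[%s]\n' "$filename"
```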

Notice also the double quotes around "$INFILE"; see "When to wrap quotes around a shell variable?".
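A small sketch of why the quotes matter, using a hypothetical path that contains a space and the default IFS. Unquoted, the shell word-splits the value into separate arguments, so awk would try to open two files instead of one:

```shell
#!/bin/sh
# Hypothetical input path containing a space.
infile='my input.txt'

set -- $infile     # unquoted: split into "my" and "input.txt"
unquoted=$#

set -- "$infile"   # quoted: passed as a single argument
quoted=$#

printf 'unquoted: %s args, quoted: %s arg\n' "$unquoted" "$quoted"
```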
