How to extract and replace columns with a multi-character delimiter?

I have a file that uses ^$ as the delimiter; the text looks like this:

```
tony^$36^$developer^$20210310^$CA
```

I want to replace the date (the fourth field). I tried

```
awk -F '\^\$' '{print $4}' file.txt | sed -i '/20210310/20221210/'
```

but it returns nothing. Then I ran just the awk part, and it also returns nothing. I guess awk still treats the whole line as a single field, so the delimiter doesn't work. Why is that, and how can I solve it?

Answer:

When I run your awk command, I get these warnings:

```
awk: warning: escape sequence `\^' treated as plain `^'
awk: warning: escape sequence `\$' treated as plain `$'
```

That explains why your output is blank: the field separator is interpreted as the regular expression '^$', which matches only a completely blank line. As a result, no non-blank input line contains a field separator, so each such line has only a single field, and $4 can be non-empty only if there are at least four fields.

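awk processes backslash escape sequences in the -F value before using it as a regex, so you can fix that by escaping the backslashes. The regex awk then sees is \^\$, which matches a literal ^$: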

```
awk -F '\\^\\$' '{print $4}' file.txt
```
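To check the fix against the sample line from the question without needing file.txt, one could feed that line in with printf. This is a minimal sketch; the variable names line and date4 are purely illustrative. It also pipes field 4 to sed for the substitution, using the s command, which the sed expression in the question omits.

```shell
#!/bin/sh
# Sample line from the question; single quotes keep the shell from touching ^$.
line='tony^$36^$developer^$20210310^$CA'

# awk's escape processing turns '\\^\\$' into the regex \^\$ (a literal "^$").
printf '%s\n' "$line" | awk -F '\\^\\$' '{ print NF }'   # prints 5

# Field 4 by itself, then sed's s command swaps in the new date.
date4=$(printf '%s\n' "$line" | awk -F '\\^\\$' '{ print $4 }' | sed 's/20210310/20221210/')
printf '%s\n' "$date4"   # prints 20221210
```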

If all you want to do is print the modified date codes by themselves, then that should get you going. However, the question ...

How to extract and replace columns with a multi-character delimiter?

... sounds like you may actually want to replace the date code within each line, keeping the rest intact. In that case, having the awk command discard the other parts of the line is a non-starter. You have several options here, but two of the more likely ones are:

- Instead of sending field 4 out to sed for substitution, do the substitution in the awk script, and then reconstitute the input line by printing all fields with the expected delimiters. (This is left as an exercise.) Or,
- do the whole thing in sed:
```
sed -E 's/^((([^^]|\^[^$])*\^\$){3})20210310(\^\$.*)/\120221210\4/' file.txt
```
If you want to modify file.txt in place, you can add the -i flag (which, by contrast, is not useful in your original command, where sed's input comes from a pipe rather than a file).
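A quick way to try the sed command on the sample line from the question, without touching file.txt, is to pipe the line in. This is only a sketch; the variable names line and sed_out are illustrative, and the sed expression is the same one used above.

```shell
#!/bin/sh
# Sample line from the question; single quotes keep the shell from touching ^$.
line='tony^$36^$developer^$20210310^$CA'

# Same sed -E expression as above, reading from a pipe instead of file.txt,
# so nothing on disk is modified.
sed_out=$(printf '%s\n' "$line" |
  sed -E 's/^((([^^]|\^[^$])*\^\$){3})20210310(\^\$.*)/\120221210\4/')
printf '%s\n' "$sed_out"   # prints tony^$36^$developer^$20221210^$CA
```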

The -E option engages the POSIX extended regex dialect, which allows the given regex to be more readable (the alternative would require a bunch more \ characters).

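Overall, presuming that there are five or more fields delimited by literal '^$' strings and that the fourth contains exactly "20210310", the regex matches the first three fields, including their trailing delimiters, and captures them all as group 1; matches the leading delimiter of the fifth field and all the remainder of the line and captures it as group 4; and replaces the whole line with group 1 followed by the new date code followed by group 4. (Groups 2 and 3 are inner groups: group 2 matches one field plus its delimiter, and group 3 matches one piece of field text that cannot begin a ^$ delimiter. In the replacement, \120221210 is \1 followed by the literal digits 20221210, because sed backreferences are single digits.)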
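For the first option, the awk-only route that was left as an exercise, here is one possible sketch (not taken from the answer itself). It assumes the same five-field layout: FS is set to the escaped regex, OFS is set to the literal string ^$ so that awk rebuilds the line with the original delimiter after a field is modified, and $4 is changed only when it equals the old date. The variable names line and awk_out are illustrative.

```shell
#!/bin/sh
line='tony^$36^$developer^$20210310^$CA'

# FS is a regex: the awk string "\\^\\$" becomes \^\$, i.e. a literal "^$".
# OFS is a plain string that awk uses when it rebuilds $0 after a field changes.
awk_out=$(printf '%s\n' "$line" | awk '
  BEGIN { FS = "\\^\\$"; OFS = "^$" }
  $4 == "20210310" { $4 = "20221210" }  # assigning a field rebuilds $0 with OFS
  { print }                             # print every line, changed or not
')
printf '%s\n' "$awk_out"   # prints tony^$36^$developer^$20221210^$CA
```

To update file.txt itself with this approach, one would typically run the awk program on file.txt, write the output to a temporary file, and then move that file over the original (for example, `awk '...' file.txt > file.tmp && mv file.tmp file.txt`).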

Answer:

A simple solution would be:

```
sed 's/\^\$/\n/g; s/20210310/20221210/g' -i file.txt
```

This modifies the file so that each field ends up on its own line. (Both the \n in the replacement and the -i placed after the script assume GNU sed.)

If you need a different delimiter, change the \n in the command to, say, a space or a comma; it's up to you.

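It also replaces the date in the file. Note that s/20210310/20221210/g replaces every occurrence of 20210310 wherever it appears on a line, not only in the fourth field.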

If you want to see the changes without actually modifying the file, remove the -i from the command.
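To preview this answer's command without modifying anything, one can pipe the sample line through it with no -i, as sketched below. This assumes GNU sed, since \n in the replacement is a GNU extension; the comma version is the alternative delimiter mentioned above. The variable names lines and csv are illustrative.

```shell
#!/bin/sh
line='tony^$36^$developer^$20210310^$CA'

# Preview only: no -i, so nothing is written back to any file.
# Assumes GNU sed, which turns \n in the replacement into a newline.
lines=$(printf '%s\n' "$line" | sed 's/\^\$/\n/g; s/20210310/20221210/g')
printf '%s\n' "$lines"   # each field on its own line

# Same idea with a comma as the new delimiter.
csv=$(printf '%s\n' "$line" | sed 's/\^\$/,/g; s/20210310/20221210/g')
printf '%s\n' "$csv"     # prints tony,36,developer,20221210,CA
```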