field ordering with sed

Time:06-06

I have a bash script with a sed command that I want to run on a csv file to change the order of some fields. This is what I tried:

sed -r '{
   s/(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*)/\1,\2,\3,\4,\5,\6,\8,\9,\10,\11,\7/
}' "$1"

The problem is that I have 11 fields, so two of the group numbers have two digits. When I specify the desired order, i.e.

"\1,\2,\3,\4,\5,\6,\8,\9,\10,\11,\7/"

10 and 11 are taken as literals and this ruins my attempt. I have tried the obvious alternatives like:

"\1,\2,\3,\4,\5,\6,\8,\9,(10),(11),\7/"

or

"\1,\2,\3,\4,\5,\6,\8,\9,{10},{11},\7/"

or

"\1,\2,\3,\4,\5,\6,\8,\9,\(10),\(11),\7/"

or

"\1,\2,\3,\4,\5,\6,\8,\9,\{10},\{11},\7/"

But none of these work; they are also treated as literals. I am running out of ideas and do not know what else to try. Any suggestions?

I know there are other ways of going about this problem (like awk etc.), but I would appreciate it if your answers focused on sed, since the rest of my code is done using sed.

Wish to thank you all in advance!

CodePudding user response:

I see three blocks: fields 1-6, field 7 and the rest. So you can use

sed -r '{s/^(([^,]*,){6})([^,]*),(.*)/\1\4,\3/}'
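A quick check of this three-block approach on a made-up 11-field line. The sample values f1..f11 and the function name move_field7_last are illustrative only and not from the original post; -E is used here as an equivalent spelling of -r, and the sketch assumes plain CSV with no quoted fields containing commas:

```shell
#!/usr/bin/env bash
# Hypothetical helper: moves field 7 to the end of each CSV line.
# Group 1 = fields 1-6 with their trailing commas, group 3 = field 7,
# group 4 = everything after field 7. Group 2 is only the repeated inner group.
move_field7_last() {
  sed -E 's/^(([^,]*,){6})([^,]*),(.*)/\1\4,\3/' "$@"
}

printf '%s\n' 'f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f11' | move_field7_last
# prints: f1,f2,f3,f4,f5,f6,f8,f9,f10,f11,f7
```

Because only four groups are needed, the replacement never has to reference anything beyond \4.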

CodePudding user response:

From info sed

The REPLACEMENT can contain '\N' (N being a number from 1 to 9, inclusive) references, which refer to the portion of the match which is contained between the Nth '(' and its matching ')'.

So \10 is not a reference to a tenth group: sed reads it as \1 followed by a literal 0.
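To see what sed does with \10 in practice, here is a small check. The input abc and the function name demo_backref are made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical demo: the pattern has only three groups, yet \10 is accepted,
# because sed parses it as \1 followed by the character 0.
demo_backref() {
  printf 'abc\n' | sed -E 's/(a)(b)(c)/\10/'
}

demo_backref
# prints: a0
```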

Perl, however, does not have this limitation:

$ perl -pe 's/([^ ]* )([^ ]* )([^ ]* )([^ ]* )([^ ]* )([^ ]* )([^ ]* )([^ ]* )([^ ]* )([^ ]* )([^ ]* )([^ ]* )([^ ]* )(.*)/$14 $13$12$11$10$9$8$7$6$5$4$3$2$1/g;' <<< "one two three four five six seven eight nine ten eleven twelve thirteen fourteen"
fourteen thirteen twelve eleven ten nine eight seven six five four three two one

CodePudding user response:

Instead of (ab)using sed, I'd do this with csvcut from the csvkit package of utilities for working with CSV files:

csvcut -c 1,2,3,4,5,6,8,9,10,11,7 old.csv > new.csv

Or using perl instead:

perl -F, -lane 'print join(",", @F[0..5, 7..10, 6])' old.csv > new.csv

(Splits each line into an array on commas and prints out a reordered version; no ugly regular expressions needed.)

CodePudding user response:

There simply isn't a way in sed to reference a 10th or later capture; only \1 through \9 are recognized.

This sort of thing is more in the wheelhouse of awk, so even though you asked for sed solutions, I'll offer a couple awk solutions as examples.

Since awk happily recognizes multiple-digit field numbers, and we don't really have any choice but to treat every occurrence of the field separator as marking a new field anyway, we're counting all the fields separately, unlike @leu's sed answer. But that means these solutions lend themselves more easily to modification for more thorough shuffling of the fields.

Here's a straightforward sequence of field assignments with a temp var to remember the first one until we need it:

$ awk -v{,O}FS=, '{t=$7;$7=$8;$8=$9;$9=$10;$10=$11;$11=t}1'

Although since you have a whole block of fields being shifted left by one position, perhaps a more systematic approach with a loop would be suitable:

$ awk -v{,O}FS=, '{t=$7; for (i=7;i<11;++i) { $i=$(i+1) }; $11=t}1'

If you were shifting a larger number of items by the same amount, the loop would make more sense; if the rearrangement was more random, the series of direct assignments would make more sense. In this case, they're about the same amount of code, so it's a toss-up.
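As a rough sketch of the loop variant run against sample data (the function name shift_field7_last and the f1..f11 values are invented for illustration, and FS and OFS are set with two separate -v options rather than brace expansion so it does not depend on bash):

```shell
#!/usr/bin/env bash
# Hypothetical helper: shifts fields 8-11 left by one and puts field 7 last.
# Assumes plain CSV with no quoted fields containing commas.
shift_field7_last() {
  awk -v FS=, -v OFS=, '{
    t = $7                      # remember field 7
    for (i = 7; i < 11; ++i)    # copy each of fields 8..11 one slot to the left
      $i = $(i + 1)
    $11 = t                     # the saved field goes to the end
  } 1' "$@"
}

printf '%s\n' 'f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f11' | shift_field7_last
# prints: f1,f2,f3,f4,f5,f6,f8,f9,f10,f11,f7
```

Assigning to a field makes awk rebuild the record with OFS, which is why both separators are set to a comma.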
