sed Extract unique characters within each line

Time:02-13

I want to get unique chars within each line using regular expressions in a Shell Script (sh). In other words, I want to remove any further occurrence of a char within each line.

I'm trying to answer this question: "Which characters appear in each line?"

For example, I'm trying to do something like this:

echo '1.Hi
2.This is
3.a huge file
4.with repeated chars
5.per
6.line' | sed 's/MYSTERIOUS_REGEX/MYSTERIOUS_REPLACE/g'

And the expected output is:

1.Hi
2.This 
3.a hugefil
4.with repadcs
5.per
6.line

This is the explanation:

  • Line 1: no repeated chars
  • Line 2: 'i', 's' repeated
  • Line 3: ' ', 'e' repeated
  • Line 4: 'e', 't', ' ', 'h', 'a', 'r' repeated
  • Line 5: no repeated chars
  • Line 6: no repeated chars

Notes:

  • If you achieve this using sh and sed you obtain 5⭐s
  • If you achieve this using other tools (bash, awk etc), you obtain 3⭐s

̶D̶i̶s̶t̶r̶a̶c̶t̶o̶r̶ ̶ HINT:

The following regex matches lines which don't have repeated chars: ^(?:([A-Za-z])(?!.*\1))*$

echo "bleh" | grep -P '^(?:([A-Za-z])(?!.*\1))*$'

bleh

echo "fooo" | grep -P '^(?:([A-Za-z])(?!.*\1))*$'

(empty)
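
The same filter can be sketched without PCRE, assuming only POSIX grep with basic-regex back-references. Instead of describing a line with no repeats, it describes a line that has one (`\(.\).*\1`) and inverts the match with `-v`. Unlike the hint, this version considers every character, not only ASCII letters; `no_repeats` is a hypothetical helper name.

```shell
# Print only the lines of stdin that contain no repeated character.
# '\(.\).*\1' matches a line in which some character occurs twice;
# -v keeps the lines that do NOT match.
# no_repeats is a hypothetical name used for illustration.
no_repeats() {
  grep -v '\(.\).*\1'
}

printf '%s\n' 'bleh' 'fooo' | no_repeats
```

With this input, only bleh is printed.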



CodePudding user response:

You may use this GNU sed solution, which loops until no character has a later duplicate left in the line:

sed -E ':a;s/(.)(.*)\1/\1\2/g;ta' file

1.Hi
2.This
3.a hugefil
4.with repadcs
5.per
6.line
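
If GNU sed is not available, the same loop can be sketched with basic regular expressions and one `-e` per command, which should be closer to what a plain POSIX sed accepts (an assumption; it was not part of the original answer). Each pass deletes one later copy of a character that already appeared, and `ta` repeats while a substitution was made. `uniq_chars` is a hypothetical helper name.

```shell
# Keep only the first occurrence of every character in each line of stdin.
# \(.\)   captures a character,
# \(.*\)  captures everything up to a later copy of it,
# \1      is that later copy, which the substitution drops.
# 'ta' jumps back to label 'a' as long as a substitution happened.
# uniq_chars is a hypothetical name used for illustration.
uniq_chars() {
  sed -e ':a' -e 's/\(.\)\(.*\)\1/\1\2/' -e 'ta'
}

printf '%s\n' '3.a huge file' '4.with repeated chars' | uniq_chars
```

For these two lines it prints 3.a hugefil and 4.with repadcs, matching the expected output in the question.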

Alternative awk non-regex solution (should work in any awk version):

awk '{
   delete seen                            # reset the per-line set of seen chars
   for (i=1; i<=length(); i++) {
      ch = substr($0,i,1)
      if (!seen[ch]++) printf "%s", ch    # print a char only the first time
   }
   print ""
}' file

1.Hi
2.This
3.a hugefil
4.with repadcs
5.per
6.line
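
For comparison, the same first-occurrence logic can be sketched in plain sh with no external tools, assuming a POSIX shell and single-byte characters (multibyte handling of `?` differs between shells). This is an illustration, not part of the original answer, and `uniq_chars_sh` is a hypothetical name.

```shell
# Keep only the first occurrence of every character in each line of stdin,
# using only POSIX parameter expansion and case pattern matching.
# uniq_chars_sh is a hypothetical name used for illustration.
uniq_chars_sh() {
  while IFS= read -r line || [ -n "$line" ]; do
    out=
    while [ -n "$line" ]; do
      rest=${line#?}          # the line without its first character
      ch=${line%"$rest"}      # the first character itself
      case $out in
        *"$ch"*) ;;           # already emitted: skip it
        *) out=$out$ch ;;     # first occurrence: keep it
      esac
      line=$rest
    done
    printf '%s\n' "$out"
  done
}

printf '%s\n' '4.with repeated chars' | uniq_chars_sh
```

This prints 4.with repadcs. It is much slower than sed or awk on large files, since the shell processes one character at a time.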

CodePudding user response:

With your shown samples, please try the following awk code, written and tested in GNU awk.

awk -v FS="" '
{
  delete seen
  for(i=1;i<=NF;i++){
    if(!seen[$i]++){
      val=(i>1?val:"") $i
    }
  }
  print val
}
'  Input_file

Explanation: the same code with a detailed comment on each step.

awk -v FS="" '              ##Start the awk program with an empty field separator, so each character becomes its own field.
{
  delete seen               ##Clear the seen array for the current line.
  for(i=1;i<=NF;i++){       ##Loop over all fields (characters) of the current line.
    if(!seen[$i]++){        ##True only the first time the current character is seen on this line.
      val=(i>1?val:"") $i   ##Append the character to val, resetting val at the first field of each line.
    }
  }
  print val                 ##Print val for the current line.
}
'  Input_file               ##Input_file is the name of the file to read.