I want to get the unique chars within each line using regular expressions in a shell script (sh).
In other words, I want to remove any further occurrence of a char within each line.
I'm trying to answer this question: "Which characters appear in each line?"
For example, I'm trying to do something like this:
echo '1.Hi
2.This is
3.a huge file
4.with repeated chars
5.per
6.line' | sed 's/MYSTERIOUS_REGEX/MYSTERIOUS_REPLACE/g'
And the expected output is:
1.Hi
2.This
3.a hugefil
4.with repadcs
5.per
6.line
This is the explanation:
- Line 1: there aren't any repeated chars
- Line 2: 'i', 's' repeated
- Line 3: 'e' repeated
- Line 4: 'e', 'a', 't', 'e', 'd', 'c', 'h', 'a', 'r' repeated
- Line 5: there aren't any repeated chars
- Line 6: there aren't any repeated chars
OBS:
- If you achieve this using sh and sed, you obtain 5⭐s
- If you achieve this using other tools (bash, awk, etc.), you obtain 3⭐s
̶D̶i̶s̶t̶r̶a̶c̶t̶o̶r̶ ̶ HINT:
The following regex matches lines which don't have repeated chars: ^(?:([A-Za-z])(?!.*\1))*$
echo "bleh" | grep -P '^(?:([A-Za-z])(?!.*\1))*$'
bleh
echo "fooo" | grep -P '^(?:([A-Za-z])(?!.*\1))*$'
(empty)
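The hint depends on `grep -P` (PCRE lookahead), which not every grep provides. As a rough sketch of the same check with a plain basic regular expression: a line contains a repeated char when `\(.\).*\1` matches, so inverting that match with `-v` keeps only the lines without repeats. The function name `no_repeats` is made up for illustration, and unlike the hint this version looks at every char, not only `[A-Za-z]`.

```shell
# Print only the lines in which no character occurs twice.
# \(.\) captures any character, .*\1 requires the same character again later;
# -v inverts the match, so lines containing such a pair are dropped.
no_repeats() {
  grep -v '\(.\).*\1'
}

printf '%s\n' bleh fooo | no_repeats
# prints: bleh
```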
CodePudding user response:
You may use this GNU sed solution:
sed -E ':a;s/(.)(.*)\1/\1\2/g;ta' file
1.Hi
2.This
3.a hugefil
4.with repadcs
5.per
6.line
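The one-liner above works by looping: each pass finds a char that occurs again later in the line and drops the later copy, and `ta` jumps back to the label `:a` for as long as a substitution succeeded. Some sed implementations read everything after `:` up to the end of the line as the label name, so `:a;...` may not parse there. A possible portable sketch, assuming only basic regular expressions and one `-e` per command (the wrapper name `uniq_chars` is just for illustration):

```shell
# Remove every later occurrence of a character, keeping the first one.
# \(.\)   captures one character
# \(.*\)  captures everything up to a later copy of it
# \1      is that later copy, which the replacement leaves out
uniq_chars() {
  sed -e ':a' -e 's/\(.\)\(.*\)\1/\1\2/' -e 'ta'
}

printf '%s\n' '3.a huge file' '4.with repeated chars' | uniq_chars
# prints:
# 3.a hugefil
# 4.with repadcs
```

The `g` flag is left out here because the `ta` loop already repeats the substitution until nothing matches.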
An alternative, non-regex awk solution (should work in any awk version):
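awk '{
   delete seen                      # forget the chars seen on the previous line
   for (i=1; i<=length(); i++) {
      ch = substr($0,i,1)           # i-th char of the current line
      if (!seen[ch]++) printf "%s", ch   # print a char only the first time it appears
   }
   print ""                         # end the output line
}' file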
1.Hi
2.This
3.a hugefil
4.with repadcs
5.per
6.line
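The same "seen" idea can also be sketched in plain sh without sed or awk, using only parameter expansion and `case`. This is a minimal sketch, not a tuned solution: the function name `uniq_line` is hypothetical, its variables are global because POSIX sh has no `local`, and shells that work byte by byte may split multi-byte chars.

```shell
# Keep only the first occurrence of each character of one line.
uniq_line() {
  rest=$1
  seen=
  while [ -n "$rest" ]; do
    tail=${rest#?}          # everything after the first character
    ch=${rest%"$tail"}      # the first character itself
    case $seen in
      *"$ch"*) ;;           # already kept once: skip it
      *) seen=$seen$ch ;;   # first occurrence: append it to the result
    esac
    rest=$tail
  done
  printf '%s\n' "$seen"
}

while IFS= read -r line; do
  uniq_line "$line"
done <<'EOF'
3.a huge file
4.with repeated chars
EOF
# prints:
# 3.a hugefil
# 4.with repadcs
```

Because `seen` collects the chars in the order they first appear, it doubles as the output string.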
CodePudding user response:
With your shown samples, please try the following awk code. Written and tested in GNU awk.
awk -v FS="" '
{
  delete seen
  for(i=1;i<=NF;i++){
    if(!seen[$i]++){
      val=(i>1?val:"") $i
    }
  }
  print val
}
' Input_file
Explanation: a detailed explanation of the code above.
awk -v FS="" '                 ##Starting the awk program and setting the field separator to the empty string, so every char becomes its own field.
{
  delete seen                  ##Emptying the seen array for each new line.
  for(i=1;i<=NF;i++){          ##Traversing all fields (chars) of the current line.
    if(!seen[$i]++){           ##Checking that the current char is NOT yet in seen, then marking it as seen.
      val=(i>1?val:"") $i      ##Appending the char to val; val starts over at the first field of each line.
    }
  }
  print val                    ##Printing val here.
}
' Input_file                   ##Mentioning the Input_file name here.