Print lines where any word begins and ends with the same letter in Linux

Time:10-24

I have input like this:

sie%Qu7s Kuux"oh9 ohc9ahG% hoe8Toh: Eix*ohd1 doh:bo2U Cu0doo|t zo`L9xaW
fie5Du[h Phe8aid# Opu&fai5 ieZ<aek6 hu4ga&Di Oose}p1p aiD@oos2 nu-a1Fub
ahqu5To/ ahtie[H3 ioK&u5Ai nei1Za#d poo_Th9r gu|aGh7h uZ%io2ah IeNah&v7
eif\e8AE Ieb,ing4 reph1oW* eeSh'ee8 Ah ei4ai Oi0Ca,vu Esh1xe?e Wei&k4ic
ue5OhQu. aaf-i8uP eedae%T5 sei?M9Pu ieH[oh2l ieh~ah8A aev"oo9A Ohf"i8de
Foh:x2zi aLoo'qu2 Ia6aig-e La{vie1E IeFoh{cI Au_h7Hee Se)f4ebi Cah$yu7m

Each whitespace-separated word on a line is a password. I am trying to print the lines in which any word begins and ends with the same letter, without distinguishing between uppercase and lowercase letters.

I know I can do this with grep:

grep -e ' \([A-Z]\)......\1 ' -e ' \([a-z]\)......\1 ' passwords.txt

But this only matches words whose first and last letters are the same in case as well (both uppercase or both lowercase), for example:

Foh:x2zi aLoo'qu2 Ia6aig-e La{vie1E IeFoh{cI Au_h7Hee Se)f4ebi Cah$yu7m

Expected output:

    eif\e8AE Ieb,ing4 reph1oW* eeSh'ee8 Ah ei4ai Oi0Ca,vu Esh1xe?e Wei&k4ic
    sie%Qu7s Kuux"oh9 ohc9ahG% hoe8Toh: Eix*ohd1 doh:bo2U Cu0doo|t zo`L9xaW
    ue5OhQu. aaf-i8uP eedae%T5 sei?M9Pu ieH[oh2l ieh~ah8A aev"oo9A Ohf"i8de
    Foh:x2zi aLoo'qu2 Ia6aig-e La{vie1E IeFoh{cI Au_h7Hee Se)f4ebi Cah$yu7m
    ahqu5To/ ahtie[H3 ioK&u5Ai nei1Za#d poo_Th9r gu|aGh7h uZ%io2ah IeNah&v7

CodePudding user response:

With GNU grep:

grep -iE '(.)[^ ]{6}\1' passwords.txt

Output:

sie%Qu7s Kuux"oh9 ohc9ahG% hoe8Toh: Eix*ohd1 doh:bo2U Cu0doo|t zo`L9xaW
ahqu5To/ ahtie[H3 ioK&u5Ai nei1Za#d poo_Th9r gu|aGh7h uZ%io2ah IeNah&v7
eif\e8AE Ieb,ing4 reph1oW* eeSh'ee8 Ah ei4ai Oi0Ca,vu Esh1xe?e Wei&k4ic
ue5OhQu. aaf-i8uP eedae%T5 sei?M9Pu ieH[oh2l ieh~ah8A aev"oo9A Ohf"i8de
Foh:x2zi aLoo'qu2 Ia6aig-e La{vie1E IeFoh{cI Au_h7Hee Se)f4ebi Cah$yu7m

-i: Ignore case distinctions in patterns and input data, so that characters that differ only in case match each other.

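-E: Interpret (.)[^ ]{6}\1 as an extended regular expression. (.) captures one character, [^ ]{6} matches the next six non-space characters, and \1 requires the character after them to equal the captured one, so the pattern assumes every word is exactly eight characters long.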
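For input whose words vary in length, a length-independent variant could look like the sketch below. It assumes GNU grep (back-references in -E patterns are a GNU extension) and whitespace-separated words. The sample lines and the temporary file are made up for illustration and are not part of the original question.

```shell
# Made-up sample file; lines 1 and 3 each contain a qualifying word.
sample=$(mktemp)
printf '%s\n' 'sie%Qu7s Kuux"oh9' 'fie5Du[h Phe8aid#' 'eif\e8AE Ah ei4ai' > "$sample"

# (^|[[:space:]])      start of line or whitespace before the word
# ([^[:space:]])       first character of the word, captured as \2
# ([^[:space:]]*\2)?   optional rest of the word, ending in the same character
# ([[:space:]]|$)      whitespace or end of line after the word
matches=$(grep -iE '(^|[[:space:]])([^[:space:]])([^[:space:]]*\2)?([[:space:]]|$)' "$sample")
printf '%s\n' "$matches"
rm -f "$sample"
```

Because the third group is optional, a one-character word also counts as beginning and ending with the same character. Drop the trailing ? if that is not wanted.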

CodePudding user response:

Use GNU grep:

grep -i -P '(?<!\S)(\S)(?:\S*\1)?(?!\S)' passwords.txt

The -i option turns on case insensitivity, -P turns on PCRE flavor (supports lookbehinds/lookaheads).

See regex proof.

EXPLANATION

--------------------------------------------------------------------------------
  (?<!                     look behind to see if there is not:
--------------------------------------------------------------------------------
    \S                       non-whitespace (all but \n, \r, \t, \f,
                             and " ")
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \S                       non-whitespace (all but \n, \r, \t, \f,
                             and " ")
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    \S*                      non-whitespace (all but \n, \r, \t, \f,
                             and " ") (0 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
    \1                       what was matched by capture \1
--------------------------------------------------------------------------------
  )?                       end of grouping
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    \S                       non-whitespace (all but \n, \r, \t, \f,
                             and " ")
--------------------------------------------------------------------------------
  )                        end of look-ahead
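Where grep -P is not available, the same check can be sketched in awk instead. This is a different technique from the regex above, shown only as an alternative. It assumes whitespace-separated words, and the sample lines and temporary file are made up for illustration.

```shell
# Made-up sample file; lines 1 and 3 each contain a qualifying word.
sample=$(mktemp)
printf '%s\n' 'sie%Qu7s Kuux"oh9' 'fie5Du[h Phe8aid#' 'aev"oo9A Ohf"i8de' > "$sample"

# For each word (awk field), compare the lowercased first and last
# characters; print the line once and move on at the first match.
result=$(awk '{
    for (i = 1; i <= NF; i++) {
        first = tolower(substr($i, 1, 1))
        last  = tolower(substr($i, length($i), 1))
        if (first == last) { print; next }
    }
}' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

Like the regex versions, this compares any first and last character, not only letters, and treats a one-character word as a match.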