How to grab the text after a newline in a text file that is not clean (spaces, tabs, Unicode)

Assume the following.

This is the only part of the text I'm showing; the rest of the file has more data (not shown), which is part of the problem. The text is only semi-clean: it is full of whitespace, tabs and Unicode characters, and it has to stay that way for my needs. Copying and pasting this exact text will not reproduce the issue, because the markup reformats it:

I have SOME text like this:

*** *
more text with spaces and  tabs                                                             
*****
1
Something here and else, 2000 edf, 60 pop
    Usd324.32           2 Usd534.22
2
21st New tetx that will like to select with pattern, 334 pop
    Usd162.14

*** *
more text with spaces and tabs, unicode
*****

I'm trying to grab this explicit text:

1 Something here and else, 2000 edf, 60 pop Usd324.32

Because of the newline and the whitespace, the following command only grabs the 1:

grep -E '1\s.+'

I have also tried adding an alternation:

grep -E '1\s|[A-Z].+'

But that doesn't work: grep starts matching similar patterns in other parts of the text. I have already applied these cleanup commands:

awk '{$1=$1}1'   #done already
tr -s "\t\r\n\v" #done already
tr -d "\t\b\r"   #done already

How can I:

grab the 1 that sits alone on its line
grab the whole line that follows the 1
grab the number from Usd324.32 and remove the Usd prefix

CodePudding user response：

Pure Bash:

#! /bin/bash

exec <<EOF
*** *
more text with spaces and  tabs                                                             
*****
1
Something here and else, 2000 edf, 60 pop
    Usd324.32           2 Usd534.22
2
21st New tetx that will like to select with pattern, 334 pop
    Usd162.14

*** *
more text with spaces and tabs, unicode
*****
EOF

while read -r line1; do
  if [[ $line1 =~ ^1$ ]]; then
    read -r line2
    read -r line3col1 dontcare
    printf '%s %s %s\n' "$line1" "$line2" "${line3col1#Usd}"
  fi
done
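
If the data lives in a file rather than a here-document, the same loop can be wrapped in a function. This is only a sketch under a few assumptions: grab_record is a hypothetical name, the record number sits alone on its line, and the price is the first field of the third line. It also strips a trailing carriage return, in case the file has Windows line endings.

```shell
#!/bin/bash
# grab_record is a hypothetical helper, not part of the answer above.
# Usage: grab_record KEY FILE
grab_record() {
  local key file cr line1 line2 price _rest
  key=$1
  file=$2
  cr=$(printf '\r')

  # read with the default IFS trims leading/trailing spaces and tabs.
  while read -r line1; do
    line1=${line1%"$cr"}              # drop a trailing CR, if any
    if [ "$line1" = "$key" ]; then
      read -r line2                   # the description line
      read -r price _rest             # first field of the price line
      line2=${line2%"$cr"}
      price=${price%"$cr"}
      printf '%s %s %s\n' "$line1" "$line2" "${price#Usd}"
      return 0
    fi
  done < "$file"

  return 1                            # key not found
}
```

With the sample saved as file, grab_record 1 file prints: 1 Something here and else, 2000 edf, 60 pop 324.32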

CodePudding user response：

You can use this sed:

sed -En '/^1/ {N;N;s/[[:blank:]]*Usd([^[:blank:]]+)[^\n]*$/\1/; s/\n/ /gp;}' file

1 Something here and else, 2000 edf, 60 pop 324.32
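
For readers who find the one-liner dense, the same idea can be laid out as a commented script. This is a sketch that assumes GNU sed (for \n inside the bracket expression). first_record is a hypothetical wrapper name, and the address is tightened to a line holding only 1, so that lines such as 10 or 1st are not picked up.

```shell
#!/bin/bash
# first_record is a hypothetical wrapper around the sed command above.
# Usage: first_record FILE
first_record() {
  sed -En '
    # Act only on a line that contains 1 and optional trailing whitespace.
    /^1[[:space:]]*$/ {
      # Append the next two input lines to the pattern space.
      N
      N
      # Keep only the first price, without its leading blanks and Usd prefix.
      s/[[:blank:]]*Usd([^[:blank:]]+)[^\n]*$/\1/
      # Join the three lines with single spaces and print the result.
      s/\n/ /gp
    }
  ' "$1"
}
```

Any whitespace trailing the 1 or the description line is kept as-is in the output, so this version fits best when those two lines are already trimmed.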

Or this awk would also work:

awk '$0 == 1 {
   printf "%s", $0
   getline
   printf " %s ", $0
   getline
   sub(/Usd/, "")
   print $1
}' file

1 Something here and else, 2000 edf, 60 pop 324.32
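
Since the real file holds more records than the excerpt, a variation that prints every numbered block may be handy. A minimal sketch, assuming each record is a line containing only digits, followed by a description line and then a price line whose first field is the price; extract_all is a hypothetical name.

```shell
#!/bin/bash
# extract_all is a hypothetical helper, not part of the answer above.
# Usage: extract_all FILE
extract_all() {
  awk '
    # A record starts on a line holding only digits (blanks and CR tolerated).
    /^[ \t\r]*[0-9]+[ \t\r]*$/ {
      id = $0
      gsub(/[ \t\r]/, "", id)

      # Next line: the description, trimmed on both ends.
      if ((getline desc) <= 0) exit
      gsub(/^[ \t\r]+|[ \t\r]+$/, "", desc)

      # Line after that: the first field is the price.
      if ((getline) <= 0) exit
      price = $1
      sub(/^Usd/, "", price)
      sub(/\r$/, "", price)

      print id, desc, price
    }
  ' "$1"
}
```

Because both getline calls consume the description and price lines, a stray number inside those lines (such as the 2 after the first price) is never mistaken for a new record.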