How to split a file in bash at every number found

Time:06-10

I have a text like:

1Lorem ipsum dolor sit amet, consectetur adipiscing elit. 2Vivamus dictum, justo mattis sollicitudin pretium, ante magna gravida ligula, 3a condimentum libero tortor sit amet lectus. Nulla congue mauris quis lobortis interdum. 4Integer eget ante mattis ante egestas suscipit. Suspendisse imperdiet pellentesque risus, a luctus sem pellentesque nec. Curabitur vel luctus eros. Morbi id magna sit amet 5risus hendrerit porta. Praesent vitae sapien in nunc aliquet pharetra vitae sed lectus. Donec id magna magna. Phasellus eget rhoncus purus, vitae vestibulum nisl. 6Phasellus massa mi, ultricies id mi sit amet, tristique auctor mi. 

I want to split the text at every number found, like this:

1Lorem ipsum dolor sit amet, consectetur adipiscing elit. 
2Vivamus dictum, justo mattis sollicitudin pretium, ante magna gravida ligula,
3a condimentum libero tortor sit amet lectus. Nulla congue mauris quis lobortis interdum.
...

In awk, I tried:

cat text | awk -F'/^[-+]?[0-9]+$/' '{for (i=1; i<=NF; i++) print $i}'

Here -F is /^[-+]?[0-9]+$/, a pattern I use to test whether a string is a number. But it doesn't split the text.

If I change the pattern to any literal separator, it works without problems. What pattern should I use here?

CodePudding user response:

I would use GNU AWK for this task in the following way. Let the content of file.txt be

1Lorem ipsum dolor sit amet, consectetur adipiscing elit. 2Vivamus dictum, justo mattis sollicitudin pretium, ante magna gravida ligula, 3a condimentum libero tortor sit amet lectus. Nulla congue mauris quis lobortis interdum. 4Integer eget ante mattis ante egestas suscipit. Suspendisse imperdiet pellentesque risus, a luctus sem pellentesque nec. Curabitur vel luctus eros. Morbi id magna sit amet 5risus hendrerit porta. Praesent vitae sapien in nunc aliquet pharetra vitae sed lectus. Donec id magna magna. Phasellus eget rhoncus purus, vitae vestibulum nisl. 6Phasellus massa mi, ultricies id mi sit amet, tristique auctor mi.

then

awk 'BEGIN{RS="[-+]?[0-9]+"}{printf "%s%s%s", $0, NR==1?"":"\n", RT}' file.txt

gives output

1Lorem ipsum dolor sit amet, consectetur adipiscing elit. 
2Vivamus dictum, justo mattis sollicitudin pretium, ante magna gravida ligula, 
3a condimentum libero tortor sit amet lectus. Nulla congue mauris quis lobortis interdum. 
4Integer eget ante mattis ante egestas suscipit. Suspendisse imperdiet pellentesque risus, a luctus sem pellentesque nec. Curabitur vel luctus eros. Morbi id magna sit amet 
5risus hendrerit porta. Praesent vitae sapien in nunc aliquet pharetra vitae sed lectus. Donec id magna magna. Phasellus eget rhoncus purus, vitae vestibulum nisl. 
6Phasellus massa mi, ultricies id mi sit amet, tristique auctor mi.

Explanation: I tell GNU AWK that the record separator (RS) is - or + repeated 0 or 1 times, followed by a digit repeated 1 or more times. Then for every record I printf its content, followed by a newline (for every record except the first), followed by the record terminator that was actually matched (RT).

(tested in gawk 4.2.1)
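
RT is a gawk extension, so the command above will not behave the same in other awk implementations. As a rough sketch of a portable variant (not the approach above, and only lightly reasoned about), one could insert a newline before each run of digits with gsub instead. The function name split_on_numbers is hypothetical, the pattern is simplified to unsigned digits, and only literal spaces (not tabs) are trimmed before the inserted newline.

```shell
# Hypothetical helper: break each input line before every run of digits.
# Assumes plain POSIX awk; does not rely on RT or a regex RS.
split_on_numbers() {
  awk '{
    gsub(/[0-9]+/, "\n&")   # put a newline in front of every number
    gsub(/ +\n/, "\n")      # drop spaces left dangling before the newline
    sub(/^\n/, "")          # no leading newline before the first number
    print
  }'
}

# Usage on a shortened sample:
printf '1foo bar. 2baz, 3a qux.\n' | split_on_numbers
```

Unlike the RS/RT version, this one also trims the trailing space at the end of each piece.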

CodePudding user response:

This inserts a new line before every number, except the first, and also strips any whitespace before the new line.

sed -E 's/[[:blank:]]*([0-9]+)/\
\1/g; s/\n//'

You still have the problem of numbers within the text that are regular content: these will also have a newline prepended.
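
One possible way to reduce such false splits, assuming the markers count up 1, 2, 3, ... in order and the text is on a single line, is to break only at the number that is expected next. This is just a sketch under those assumptions; split_on_sequence is a hypothetical name, and a content number that happens to equal the next expected marker would still cause a break.

```shell
# Hypothetical helper: break only before the next expected marker (1, 2, 3, ...).
# Other numbers are treated as regular content and left in place.
split_on_sequence() {
  awk '{
    rest = $0; out = ""; want = 1
    while (match(rest, /[0-9]+/)) {
      num = substr(rest, RSTART, RLENGTH)       # the number just found
      before = substr(rest, 1, RSTART - 1)      # text in front of it
      if (num + 0 == want) {
        sub(/ +$/, "", before)                  # trim spaces before the break
        out = out before (want > 1 ? "\n" : "") num
        want++
      } else {
        out = out before num                    # ordinary number, keep as is
      }
      rest = substr(rest, RSTART + RLENGTH)     # continue after the number
    }
    print out rest
  }'
}

# Usage: 42 and 7 are content, 1 and 2 are markers.
printf '1foo 42 bar. 2baz 7 qux.\n' | split_on_sequence
```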
