Extract text lines between two marker lines using regex

Time:09-04

I have a text file like this:

## COL
{ "Id": 1, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
{ "Id": 1, "key1": "value1", "key2": "valueB", ... "keyN": "valueN"}
{ "Id": 1, "key1": "value1", "key2": "valueC", ... "keyN": "valueN"}
{ "Id": 2, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
{ "Id": 3, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
{ "Id": 3, "key1": "value1", "key2": "valueB", ... "keyN": "valueN"}
                               .
                               .
                               .
{ "Id": n, "key1": "value1", "key2": "valueZ", ... "keyN": "valueN"}

## USA
{ "Id": 1, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
{ "Id": 1, "key1": "value1", "key2": "valueB", ... "keyN": "valueN"}
{ "Id": 1, "key1": "value1", "key2": "valueC", ... "keyN": "valueN"}
{ "Id": 2, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
{ "Id": 3, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
{ "Id": 3, "key1": "value1", "key2": "valueB", ... "keyN": "valueN"}
                               .
                               .
                               .
{ "Id": n, "key1": "value1", "key2": "valueZ", ... "keyN": "valueN"}

## ESP
{ "Id": 1, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
{ "Id": 1, "key1": "value1", "key2": "valueB", ... "keyN": "valueN"}
{ "Id": 1, "key1": "value1", "key2": "valueC", ... "keyN": "valueN"}
{ "Id": 2, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
                               .
                               .
                               .
{ "Id": n, "key1": "value1", "key2": "valueZ", ... "keyN": "valueN"}

I need to extract just the lines for a specific country using regex in Python, for example:

## COL
{ "Id": 1, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
{ "Id": 1, "key1": "value1", "key2": "valueB", ... "keyN": "valueN"}
{ "Id": 1, "key1": "value1", "key2": "valueC", ... "keyN": "valueN"}
{ "Id": 2, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
{ "Id": 3, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
{ "Id": 3, "key1": "value1", "key2": "valueB", ... "keyN": "valueN"}
                               .
                               .
                               .
{ "Id": n, "key1": "value1", "key2": "valueZ", ... "keyN": "valueN"}

Note: No key or value identifies the country. Only the marker lines shown in the example above (such as ## COL) do.

I tried this regex without success:

(?<=## COL).*[\w\s]*(?=##})

Thanks in advance!

CodePudding user response:

With a regex:

import re

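# Tempered dot: consume one character at a time (re.S lets . match newlines) until the next "##".
# re.M lets ^ match at the start of any line, so sections other than the first can be found too.
m = re.search(r'^## COL\n(?:(?!##).)+', text, flags=re.S | re.M)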

if m:
    print(m.group())

More efficient alternative:

# Consumes a whole line per step; .group() raises AttributeError if nothing matches.
m = re.search(r'^## COL\n(?:(?:(?!##).*)\n)+', text, flags=re.M).group()

Output:

## COL
{ "Id": 1, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
{ "Id": 1, "key1": "value1", "key2": "valueB", ... "keyN": "valueN"}
{ "Id": 1, "key1": "value1", "key2": "valueC", ... "keyN": "valueN"}
{ "Id": 2, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
{ "Id": 3, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
{ "Id": 3, "key1": "value1", "key2": "valueB", ... "keyN": "valueN"}
                               .
                               .
                               .
{ "Id": n, "key1": "value1", "key2": "valueZ", ... "keyN": "valueN"}
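
The patterns above hard-code COL. As a minimal sketch, assuming the file has already been read into a string, the same line-by-line idea can be wrapped in a helper that takes the country code as a parameter. The names extract_section and SAMPLE are hypothetical, and the sample records are shortened stand-ins for the data in the question.

```python
import re

# Hypothetical sample mirroring the layout in the question (records shortened).
SAMPLE = """\
## COL
{ "Id": 1, "key2": "valueA"}
{ "Id": 2, "key2": "valueB"}

## USA
{ "Id": 1, "key2": "valueA"}

## ESP
{ "Id": 1, "key2": "valueC"}
"""


def extract_section(text, country):
    """Return the '## <country>' header plus its lines, or None if absent."""
    # re.escape guards against regex metacharacters in the country code.
    # Each step consumes one line that does not start with "##";
    # the optional \n lets the last line of the file lack a trailing newline.
    pattern = rf'^## {re.escape(country)}\n(?:(?!##).*\n?)*'
    m = re.search(pattern, text, flags=re.M)
    # Strip the blank separator line(s) that follow the section.
    return m.group().rstrip() if m else None


print(extract_section(SAMPLE, "USA"))
```

Returning None for a missing country avoids the AttributeError that calling .group() directly on a failed search would raise.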


CodePudding user response:

What about ## COL[^#]*? It should be sufficient to match the requested block, with no lookahead or lookbehind necessary, provided the records themselves contain no # character.

See https://regex101.com/r/pc0iaV/1 for demonstration that it works.
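
To illustrate how a negated character class behaves on data that contains a # inside a value, here is a small sketch with made-up records (the "tag" key and its values are hypothetical, not from the question). It compares [^#]* with a pattern that only treats ## at the start of a line as a boundary.

```python
import re

# Made-up data: the first COL record contains a '#' inside a value.
text = (
    '## COL\n'
    '{ "Id": 1, "tag": "a#b"}\n'
    '{ "Id": 2, "tag": "c"}\n'
    '\n'
    '## USA\n'
    '{ "Id": 1, "tag": "d"}\n'
)

# [^#]* stops at the first '#' anywhere, including inside "a#b".
naive = re.search(r'## COL[^#]*', text).group()

# Line-based variant: stops only at a line that starts with "##".
line_based = re.search(r'^## COL\n(?:(?!##).*\n?)*', text, flags=re.M).group()

print(repr(naive))
print(repr(line_based))
```

On data like the sample in the question, where no record contains #, both patterns select the same section.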
