I have a text file like this:
## COL
{ "Id": 1, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
{ "Id": 1, "key1": "value1", "key2": "valueB", ... "keyN": "valueN"}
{ "Id": 1, "key1": "value1", "key2": "valueC", ... "keyN": "valueN"}
{ "Id": 2, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
{ "Id": 3, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
{ "Id": 3, "key1": "value1", "key2": "valueB", ... "keyN": "valueN"}
.
.
.
{ "Id": n, "key1": "value1", "key2": "valueZ", ... "keyN": "valueN"}
## USA
{ "Id": 1, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
{ "Id": 1, "key1": "value1", "key2": "valueB", ... "keyN": "valueN"}
{ "Id": 1, "key1": "value1", "key2": "valueC", ... "keyN": "valueN"}
{ "Id": 2, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
{ "Id": 3, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
{ "Id": 3, "key1": "value1", "key2": "valueB", ... "keyN": "valueN"}
.
.
.
{ "Id": n, "key1": "value1", "key2": "valueZ", ... "keyN": "valueN"}
## ESP
{ "Id": 1, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
{ "Id": 1, "key1": "value1", "key2": "valueB", ... "keyN": "valueN"}
{ "Id": 1, "key1": "value1", "key2": "valueC", ... "keyN": "valueN"}
{ "Id": 2, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
.
.
.
{ "Id": n, "key1": "value1", "key2": "valueZ", ... "keyN": "valueN"}
I need to extract just the lines for a specific country using a regex in Python, for example:
## COL
{ "Id": 1, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
{ "Id": 1, "key1": "value1", "key2": "valueB", ... "keyN": "valueN"}
{ "Id": 1, "key1": "value1", "key2": "valueC", ... "keyN": "valueN"}
{ "Id": 2, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
{ "Id": 3, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
{ "Id": 3, "key1": "value1", "key2": "valueB", ... "keyN": "valueN"}
.
.
.
{ "Id": n, "key1": "value1", "key2": "valueZ", ... "keyN": "valueN"}
Note: there is no key or value that identifies the country, only the ## marker lines shown in the example above.
I tried this regex without success:
(?<=## COL).*[\w\s]*(?=##})
Thanks in advance!
CodePudding user response:
With a regex:
import re
# re.S lets . match newlines; re.M lets ^ match at the start of any line
m = re.search(r'^## COL\n(?:(?!##).)+', text, flags=re.S | re.M)
if m:
    print(m.group())
More efficient alternative (the lookahead is tested once per line instead of once per character):
m = re.search(r'^## COL\n(?:(?:(?!##).*)\n)+', text, flags=re.M).group()
Output:
## COL
{ "Id": 1, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
{ "Id": 1, "key1": "value1", "key2": "valueB", ... "keyN": "valueN"}
{ "Id": 1, "key1": "value1", "key2": "valueC", ... "keyN": "valueN"}
{ "Id": 2, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
{ "Id": 3, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
{ "Id": 3, "key1": "value1", "key2": "valueB", ... "keyN": "valueN"}
.
.
.
{ "Id": n, "key1": "value1", "key2": "valueZ", ... "keyN": "valueN"}
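To reuse the same idea for any country, the header can be built from a variable. The following is only a sketch: the function name extract_section and the sample text are made up for illustration, and it assumes each header is exactly "## " followed by the country code with no trailing spaces.

```python
import re

def extract_section(text, country):
    """Return the '## <country>' header and the lines up to the next '##' header."""
    # re.escape guards against regex metacharacters in the country code.
    # Each repetition consumes one newline plus one line that does not start with '##'.
    pattern = rf'^## {re.escape(country)}$(?:\n(?!##).*)*'
    m = re.search(pattern, text, flags=re.M)
    return m.group() if m else None

# Hypothetical sample data in the same shape as the question's file
sample = (
    '## COL\n'
    '{ "Id": 1, "key2": "valueA"}\n'
    '{ "Id": 1, "key2": "valueB"}\n'
    '## USA\n'
    '{ "Id": 1, "key2": "valueA"}\n'
    '## ESP\n'
    '{ "Id": 2, "key2": "valueC"}'
)

col_block = extract_section(sample, 'COL')
esp_block = extract_section(sample, 'ESP')
missing = extract_section(sample, 'FRA')
```

Because every repetition starts with a newline, this variant also works when the last line of the file has no trailing newline, and it returns None instead of raising when the country is not present.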
CodePudding user response:
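What about ## COL[^#]* ? It should be sufficient to match the requested pattern, with no lookahead or lookbehind necessary. Note that it stops at the first # character, so it relies on the data lines never containing a #.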
See https://regex101.com/r/pc0iaV/1 for demonstration that it works.
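One thing to keep in mind with ## COL[^#]* is that the character class stops at the first # anywhere, not only at the next header. A small sketch with made-up sample strings (the values below are hypothetical, not from the original file) shows both cases:

```python
import re

# Hypothetical samples: one without '#' in the data, one with '#' inside a value
clean = '## COL\n{ "Id": 1, "key1": "value1"}\n## USA\n{ "Id": 1, "key1": "value1"}\n'
with_hash = '## COL\n{ "Id": 1, "key1": "C# dev"}\n{ "Id": 2, "key1": "value1"}\n## USA\n'

pattern = re.compile(r'## COL[^#]*')

# Works as intended: the match ends right before '## USA'
clean_match = pattern.search(clean).group()

# Cut short: the match ends at the '#' inside "C# dev"
hash_match = pattern.search(with_hash).group()

print(repr(clean_match))
print(repr(hash_match))
```

If the values can never contain a #, this pattern is the simplest option; otherwise a line-based pattern anchored on ## at the start of a line is safer.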