I am reading a text file whose contents look like the string below.
string = """unwanted strings
/start INPUT_DATA
input_data_Name_One input_data_Name_two
/end INPUT_DATA
unwanted string
/start OUTPUT_DATA
ouput_data_Name_One ouput_data_Name_Two
/end OUTPUT_DATA
unwanted string
/start LOC_DATA
loc_Data_Name_One loc_data_Name_Two loc_data_Name_Three
/end LOC_DATA
unwanted string
/start LOC_THING
no_Need_One no_Need_Two no_Need_Three
/end LOC_THING
unwanted strings"""
matches = re.findall('start(.*?)end', string, re.DOTALL)
I tried many expressions, but the newlines and special characters cause errors. How do I escape them?
Expected output:
['input_data_Name_One', 'input_data_Name_two', 'ouput_data_Name_One', 'ouput_data_Name_Two', 'loc_Data_Name_One', 'loc_data_Name_Two', 'loc_data_Name_Three']
CodePudding user response:
You can use this findall + split approach:
import re
string = """unwanted strings
/start INPUT_DATA
input_data_Name_One input_data_Name_two
/end INPUT_DATA
unwanted string
/start OUTPUT_DATA
ouput_data_Name_One ouput_data_Name_Two
/end OUTPUT_DATA
unwanted string
/start LOC_THING
no_Need_One no_Need_Two no_Need_Three
/end LOC_THING
/start LOC_DATA
loc_Data_Name_One loc_data_Name_Two loc_data_Name_Three
/end LOC_DATA
unwanted strings"""
# Capture the lines between each "/start ..._DATA" line and the following "/end " line
print([s.split() for s in re.findall(r'/start .*_DATA\s+((?:.+\n)+?)/end ', string)])
Output:
[['input_data_Name_One', 'input_data_Name_two'], ['ouput_data_Name_One', 'ouput_data_Name_Two'], ['loc_Data_Name_One', 'loc_data_Name_Two', 'loc_data_Name_Three']]
Here:
- We use findall to match one or more lines between a /start line that ends with _DATA and a line that begins with the /end keyword.
- Then we use split to split each match on whitespace.
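The output above is a list of lists, while the question asks for one flat list. A minimal sketch of one way to flatten it, assuming the same pattern and using itertools.chain.from_iterable from the standard library (the names text, blocks and names, and the shortened sample, are only for illustration):

```python
import re
from itertools import chain

# Shortened sample in the same format as the question's data.
text = """unwanted strings
/start INPUT_DATA
input_data_Name_One input_data_Name_two
/end INPUT_DATA
unwanted string
/start LOC_THING
no_Need_One no_Need_Two no_Need_Three
/end LOC_THING
/start LOC_DATA
loc_Data_Name_One loc_data_Name_Two loc_data_Name_Three
/end LOC_DATA
unwanted strings"""

# Capture the lines between each "/start ..._DATA" line and the next "/end " line.
blocks = re.findall(r'/start .*_DATA\s+((?:.+\n)+?)/end ', text)

# Split every captured block on whitespace and chain the pieces into one list.
names = list(chain.from_iterable(block.split() for block in blocks))
print(names)
```

The LOC_THING block is skipped because its /start line does not end with _DATA.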
CodePudding user response:
list(filter(None, re.findall(r'(\w+_[dD]ata_[_\w]+|)', string)))
['input_data_Name_One', 'input_data_Name_two', 'ouput_data_Name_One', 'ouput_data_Name_Two', 'loc_Data_Name_One', 'loc_data_Name_Two', 'loc_data_Name_Three']