Python find text in string

Time:09-28

I have the following string from which I want to extract data:

text_example = '\nExample text \nTECHNICAL PARTICULARS\nLength oa: ...............189.9m\nLength bp: ........176m\nBreadth moulded:  .......26.4m\nDepth moulded to main deck:  ....9.2m\n'
  • Every variable I want to extract starts on a new line (\n)
  • The value I want to get comes after a colon ':' followed by more than one dot
  • When a line does not have a colon followed by dots, I don't want to extract that value.

For example, my preferred output looks like:

LOA = 189.9
LBP = 176.0
BM = 26.4
DM = 9.2

CodePudding user response:

import re

text_example = '\nExample text \nTECHNICAL PARTICULARS\nLength oa: ...............189.9m\nLength bp: ........176m\nBreadth moulded:  .......26.4m\nDepth moulded to main deck:  ....9.2m\n'

# capture all the characters BEFORE the ':' character

variables = re.findall(r'(.*?):', text_example)

# matches all floats and integers (does not account for minus signs)

values = re.findall(r'(\d+(?:\.\d+)?)', text_example)

# zip into a dictionary (this assumes both regular expressions return the same number of results)

result = dict(zip(variables, values))

print(result)

--> {'Length oa': '189.9', 'Length bp': '176', 'Breadth moulded': '26.4', 'Depth moulded to main deck': '9.2'}
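One caveat with two separate findall calls: if any line has a colon but no dots, or a stray number, the two lists go out of step and zip pairs the wrong items. A possible alternative is a single pattern that captures the name and the value together. The sketch below is only one way to do it, not the answerer's method. It assumes the value is an unsigned integer or decimal, and the "Note: see page 3" line is a made-up addition to show a line that gets skipped.

```python
import re

# Same data as the question, plus one hypothetical line ("Note: see page 3")
# that has a colon but no dots, so it should not be extracted.
text_example = (
    '\nExample text \nTECHNICAL PARTICULARS\n'
    'Length oa: ...............189.9m\n'
    'Length bp: ........176m\n'
    'Breadth moulded:  .......26.4m\n'
    'Depth moulded to main deck:  ....9.2m\n'
    'Note: see page 3\n'
)

pattern = re.compile(
    r'^(?P<name>[^:\n]+):'        # label at the start of a line, up to the colon
    r'[ \t]*\.{2,}'               # optional spaces, then two or more dots
    r'(?P<value>\d+(?:\.\d+)?)',  # unsigned integer or decimal number
    re.MULTILINE,                 # let ^ match at the start of every line
)

# Each match carries its own name and value, so the pairs cannot drift apart.
result = {m['name'].strip(): float(m['value']) for m in pattern.finditer(text_example)}

print(result)
```

Because the values are converted with float, 176 comes out as 176.0, which matches the format asked for in the question.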

CodePudding user response:

You can build a single regex that captures each part of the line and work from there:

re.findall(r'(\\n|\n)([A-Za-z\s]*)(?:(\:\s*\.+))(\d*\.*\d*)',text_example)[2]

('\n', 'Breadth moulded', ':  .......', '26.4')    
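To get from these tuples to the output format in the question, something like the following might work. It is a sketch under a few assumptions: the pattern is the one above written with `\.+` so that a run of dots is required, and the ABBREVIATIONS mapping is hypothetical, since the question does not say how the short names are derived from the labels.

```python
import re

text_example = (
    '\nExample text \nTECHNICAL PARTICULARS\n'
    'Length oa: ...............189.9m\n'
    'Length bp: ........176m\n'
    'Breadth moulded:  .......26.4m\n'
    'Depth moulded to main deck:  ....9.2m\n'
)

# Pattern from the answer above; "\.+" requires at least one dot after the colon.
pattern = r'(\\n|\n)([A-Za-z\s]*)(?:(\:\s*\.+))(\d*\.*\d*)'

# Hypothetical mapping from the labels in the text to the short names in the question.
ABBREVIATIONS = {
    'Length oa': 'LOA',
    'Length bp': 'LBP',
    'Breadth moulded': 'BM',
    'Depth moulded to main deck': 'DM',
}

lines = []
for _, name, _, number in re.findall(pattern, text_example):
    # [A-Za-z\s]* can run across newlines, so keep only the last line of the label.
    label = (name.strip().splitlines() or [''])[-1].strip()
    # Skip labels we do not know and matches that captured no digits.
    if label in ABBREVIATIONS and number:
        lines.append(f'{ABBREVIATIONS[label]} = {float(number)}')

print('\n'.join(lines))
```

Note that the first match's label group also swallows "Example text" and "TECHNICAL PARTICULARS", because `\s` matches newlines. Taking the last line of the label is one way around that.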

                                                