Finding a special sequence of numbers (coordinates) in a string

Time:09-17

We have two kinds of coordinates in our project. The normal one has x, y and z.

Example:

101, 520, 62
960.93 764.22 59.20

And the extended version has 6 numbers (xyz twice, for position and rotation). Example:

101 520 62 3 0 0
960.93 764.22 59.20 -0.34 0.00 -89.81

The numbers can be negative, and they can be floats or whole numbers. They can be separated by commas or by spaces only.

Using Python, I am trying to find any coordinates in a string.

Example:

textbefore 101, 520, 62
GOTO 960.93 796.22 59.20 -0.34 0.00 -89.81
$5GOTO 1960.93 1796.22 159.20 -0.34 0.00 -89.81
501, 513, 162
1040, 1040, 520 text after 
error
222, 222
1500, 1500, 60  (1)
1337 1337 65
124.5, 133.6, 35.4
15:13:26  Condition: index_ != StringList::npos [line 178](125, 157, 215) 
Allocating shadow map cache 6324 x 6324: 76.28 MB

In the perfect world the output should be:

101 520 62
960.93 796.22 59.20 -0.34 0.00 -89.81
1960.93 1796.22 159.20 -0.34 0.00 -89.81
501 513 162
1040 1040 520
1500 1500 60
1337 1337 65
124.5 133.6 35.4
125 157 215 

The last line, with "Allocating shadow map cache", is a bit tricky; if it fails and gets listed as a coordinate, that's fine.

I used the code below, which filters the numbers very well, and then checked for 6 or 3 numbers, but I have problems with lines that contain more numbers. So I need some logic that checks whether the numbers are "close" to each other or separated by words.

re.findall("[-+]?[.]?[\d]+(?:,\d\d\d)*[\.]?\d*(?:[eE][-+]?\d+)?", line)

If possible the code should work on Python 2.7 (Sadly we are far behind).

Thanks

CodePudding user response:

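You can use the regex below for this:

(?:(?:[+-]?\d+\.?\d*[ ,]+){5}[+-]?\d+\.?\d*)|(?:(?:[+-]?\d+\.?\d*[ ,]+){2}[+-]?\d+\.?\d*)

This matches 5 (or 2) consecutive numbers, each followed by one or more commas or spaces, and then a 6th (or 3rd) number with no trailing delimiter. The six-number alternative comes first, so it is tried before the three-number one.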

Here is a demo.

Output

['101, 520, 62',
 '960.93 796.22 59.20 -0.34 0.00 -89.81',
 '1960.93 1796.22 159.20 -0.34 0.00 -89.81',
 '501, 513, 162',
 '1040, 1040, 520',
 '1500, 1500, 60',
 '1337 1337 65',
 '124.5, 133.6, 35.4',
 '125, 157, 215']
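To get from these matches to the space-separated output the question asks for, the pattern can be wrapped in a small helper. This is only a sketch: the names `NUM`, `SEP`, `PATTERN` and `find_coordinates` are made up for illustration, and it assumes the quantifiers in the pattern are `+` as reconstructed here. As far as I can tell it uses nothing beyond what Python 2.7 offers.

```python
import re

# One number: optional sign, digits, optional fractional part
NUM = r"[+-]?\d+\.?\d*"
# Separator: one or more spaces or commas (never a newline)
SEP = r"[ ,]+"

# Six numbers are tried first, then three
PATTERN = re.compile(
    r"(?:(?:{n}{s}){{5}}{n})|(?:(?:{n}{s}){{2}}{n})".format(n=NUM, s=SEP)
)


def find_coordinates(text):
    """Return every match with its separators normalized to single spaces."""
    return [" ".join(re.split(SEP, m)) for m in PATTERN.findall(text)]


print(find_coordinates("textbefore 101, 520, 62"))
print(find_coordinates("222, 222"))
```

Because the pattern has no capturing groups, `findall` returns the whole matched text, which is then split on the same separator and rejoined.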

CodePudding user response:

Before getting too far into the code, you need to figure out an algorithm or method in pseudocode that will do what you ask. In this example you need to create Python code to identify a number:

def is_number(token):
    # Tokens produced by split() are strings, so try converting
    # instead of checking the type.
    try:
        float(token)
        return True
    except ValueError:
        return False

Then I would split on spaces or commas and parse through the array you create, looking for 3 or 6 Trues in a row.
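The split-and-scan idea could look roughly like the sketch below. The function name `coordinate_runs` and the `None` sentinel are my own choices, not part of the answer. Note two limitations: `float()` also accepts tokens such as "nan", "inf" or "1e5", and numbers glued to punctuation, like "(125," in the bracketed log line, are not recognized.

```python
import re


def is_number(token):
    # float() raises ValueError for anything it cannot parse
    try:
        float(token)
        return True
    except ValueError:
        return False


def coordinate_runs(line):
    """Return runs of exactly 3 or 6 consecutive numeric tokens in a line."""
    tokens = [t for t in re.split(r"[ ,]+", line.strip()) if t]
    results, run = [], []
    # The trailing None is a sentinel that flushes the last run
    for token in tokens + [None]:
        if token is not None and is_number(token):
            run.append(token)
        else:
            if len(run) in (3, 6):
                results.append(" ".join(run))
            run = []
    return results


print(coordinate_runs("1040, 1040, 520 text after "))
print(coordinate_runs("222, 222"))
```

Requiring the run length to be exactly 3 or 6 is what rejects the "222, 222" line.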

CodePudding user response:

s = '''textbefore 101, 520, 62
GOTO 960.93 796.22 59.20 -0.34 0.00 -89.81
$5GOTO 1960.93 1796.22 159.20 -0.34 0.00 -89.81
501, 513, 162
1040, 1040, 520 text after 
error
222, 222
1500, 1500, 60  (1)
1337 1337 65
124.5, 133.6, 35.4
15:13:26  Condition: index_ != StringList::npos [line 178](125, 157, 215) 
Allocating shadow map cache 6324 x 6324: 76.28 MB'''
s = s.split('\n')
s_row = []
for i in range(len(s)):
    s_row.append(s[i].replace(',', '').split(' '))

coord = []
for i in range(len(s_row)):
    coord_row = []
    for j in range(len(s_row[i])):
        try:
            s_row[i][j] = float(s_row[i][j])
            coord_row.append(s_row[i][j])
        except ValueError:
            pass  # not a number, skip this token
    if coord_row != []:
        coord.append(coord_row)

This leaves the following in coord:

[[101.0, 520.0, 62.0],
 [960.93, 796.22, 59.2, -0.34, 0.0, -89.81],
 [1960.93, 1796.22, 159.2, -0.34, 0.0, -89.81],
 [501.0, 513.0, 162.0],
 [1040.0, 1040.0, 520.0],
 [222.0, 222.0],
 [1500.0, 1500.0, 60.0],
 [1337.0, 1337.0, 65.0],
 [124.5, 133.6, 35.4],
 [157.0],
 [6324.0, 76.28]]
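This result still contains rows with one or two numbers, which the question does not want. A possible follow-up step, sketched here with a hypothetical helper `keep_coordinates`, is to keep only rows of length 3 or 6. It does not recover the (125, 157, 215) row, which was already lost because those tokens are glued to brackets.

```python
def keep_coordinates(rows):
    """Keep only rows with exactly 3 (position) or 6 (position + rotation) numbers."""
    return [row for row in rows if len(row) in (3, 6)]


# A few rows taken from the result above
coord = [
    [101.0, 520.0, 62.0],
    [960.93, 796.22, 59.2, -0.34, 0.0, -89.81],
    [222.0, 222.0],
    [157.0],
    [6324.0, 76.28],
]

filtered = keep_coordinates(coord)
print(filtered)
```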

CodePudding user response:

You can do this with a regular expression. Since the re module that ships with Python keeps only the last match of a repeated capture group, you are best off assembling a larger regex from parts.

r"([-+]?\d+[\.\d]*)"

is a regular expression that matches an integer or float with an optional sign. [-+]? matches the sign, \d+ matches at least one digit before the dot, [\.\d]* matches the optional fractional part of the float, and the outer () captures the matched string.

r"[ ,]+"

is the separator between the numbers: one or more spaces or commas. You could just write out 3 and 6 of these things together, but the code below does that with a bit of Python.

import re
import io

test = io.StringIO(u"""textbefore 101, 520, 62
GOTO 960.93 796.22 59.20 -0.34 0.00 -89.81
$5GOTO 1960.93 1796.22 159.20 -0.34 0.00 -89.81
501, 513, 162
1040, 1040, 520 text after 
error
222, 222
1500, 1500, 60  (1)
1337 1337 65
124.5, 133.6, 35.4
15:13:26  Condition: index_ != StringList::npos [line 178](125, 157, 215) 
Allocating shadow map cache 6324 x 6324: 76.28 MB""")


# assemble regex from matching 1 decimal or float coord into 3 or 6
_one = r"([-+]?\d+[\.\d]*)"
_sep = r"[ ,]+"
coord_3 = re.compile(_sep.join([_one]*3))
coord_6 = re.compile(_sep.join([_one]*6))

coords = []
for line in test:
    match = coord_6.search(line) or coord_3.search(line)
    if match is not None:
        print(match.groups())
        coords.append(" ".join(match.groups()))

for c in coords:
    print(c)