How can I split a string in python that has multiple spaces between?

Time:05-27

It's difficult to explain this in a title, so let me explain it here.

I'm working on a search interface for a utility I'm developing, one with google(ish) filters.

It works fine when there's only one filter in the query, but when there are two or more, problems appear.

So, let's say I have a query like intitle:foo bar inbody: boo far

The first filter reaches the second part of the loop and is correctly interpreted as {intitle:foo bar}. The next piece, however, is printed in the first part of the loop as foo bar inbody, followed by its value boo far.

What should happen is that each filter is recognized and isolated into its own pair (e.g. {intitle:foo bar} {inbody: boo far}).

Below is the code responsible for this problem.

def ParseFilters(query):
    filterVals = []

    if ":" in query:
        query = query.split(":")

        for part in query:
            # This is the first part of the loop
            print(part)
            if part in filters:
                # This is the second part of the loop
                listIndex = query.index(part)
                filtering = query[listIndex + 1]

                for f in filters:
                    filtering = filtering.strip(f).lstrip()

                pair = {
                    part: filtering
                }
                
                print(pair)

                filterVals.append(pair)
    return filterVals

The "filters" table is

filters = [
    "intitle",
    "inbody"
]

CodePudding user response:

If I understand your requirements correctly, I would write something like this:

from collections import defaultdict

filters = [
    "intitle",
    "inbody"
]

query = 'intitle:foo bar inbody: boo far '

result = defaultdict(list)
current_filter = None
for elem in query.split():
    left, _, right = elem.partition(':')
    if left in filters:
        current_filter = left
        if right:
            result[current_filter].append(right)
    else:
        # Not a filter token: keep the whole element, including any ':' it contains
        result[current_filter].append(elem)

print(result)

Output:

defaultdict(<class 'list'>, {'intitle': ['foo', 'bar'], 'inbody': ['boo', 'far']})

In my opinion, this is slightly more declarative and easier to make more robust later. You can experiment with it until it meets your requirements. I suggest you check out str.partition, which is very useful for this kind of parsing. defaultdict behaves like a regular dictionary, except that a missing key is created with a default value (here an empty list) the first time it is accessed.
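To make those two building blocks concrete, here is a minimal sketch of the same loop wrapped in a function. The name parse_filters and the decision to join the collected words back into a single string per filter are my assumptions, chosen to match the {intitle: foo bar} shape asked for in the question; they are not part of the answer above.

```python
from collections import defaultdict

FILTERS = ["intitle", "inbody"]


def parse_filters(query, filters=FILTERS):
    """Group the words of `query` under the filter that precedes them."""
    words = defaultdict(list)  # a missing key starts out as an empty list
    current = None             # words seen before any filter are stored under None
    for token in query.split():  # split() with no argument collapses runs of spaces
        left, sep, right = token.partition(":")
        if sep and left in filters:
            current = left
            if right:
                words[current].append(right)
        else:
            words[current].append(token)
    # Join each list of words back into one value per filter.
    return {name: " ".join(parts) for name, parts in words.items()}


print("intitle:foo".partition(":"))  # ('intitle', ':', 'foo')
print("bar".partition(":"))          # ('bar', '', '')
print(parse_filters("intitle:foo bar   inbody: boo far"))
```

The last line prints {'intitle': 'foo bar', 'inbody': 'boo far'}. Because partition always returns three items, the loop never needs a special case for tokens without a colon.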

CodePudding user response:

That's because when you call query.split(":"), your program has no way of knowing that inbody is a filter and not part of the intitle value. The best way would be to use regular expressions to find all filters and all values, store them in separate lists (e.g. query_filters and query_values), and then build a dict:

import re


filter_table = ["intitle", "inbody"]
query = "intitle:foo bar inbody: boo far"

# Create a regular expression to match filters
filters_re = re.compile(r"\s*[a-zA-Z]+\:\s*")

# Find all filters
query_filters = filters_re.findall(query)
# Find all values by splitting query at the values matched by filters_re
query_values = filters_re.split(query)

# Cleaning up the strings
query_filters = map(lambda x: x.strip().replace(":", ""), query_filters)
query_values = map(lambda x: x.strip(), filter(None, query_values))

# Make pairs
filter_pairs = zip(query_filters, query_values)

# Remove filters that are not in filter_table
filter_pairs = filter(lambda x: x[0] in filter_table, filter_pairs)

filter_dict = dict(filter_pairs)

print(filter_dict)
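To see the difference between the two ways of splitting, it can help to print the intermediate lists. This is only an illustrative sketch using the example query from the question; the variable name naive_parts is mine.

```python
import re

query = "intitle:foo bar inbody: boo far"

# Splitting on ":" alone leaves the next filter's name glued to the previous value.
naive_parts = query.split(":")
print(naive_parts)  # ['intitle', 'foo bar inbody', ' boo far']

# Matching "name:" together with the surrounding whitespace isolates the values.
filters_re = re.compile(r"\s*[a-zA-Z]+:\s*")
print(filters_re.findall(query))  # ['intitle:', ' inbody: ']
print(filters_re.split(query))    # ['', 'foo bar', 'boo far']
```

The empty string at the start of the split result is why the values are passed through filter(None, ...) before being paired with the filter names.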

Or, if you like one-liners:

import re

filter_table = ["intitle", "inbody"]
query = "intitle:foo bar inbody: boo far"

filter_dict = dict(filter(lambda x: x[0] in filter_table, zip(re.findall(r"[a-zA-Z]+(?=\:)", query), map(lambda x: x.strip(), filter(None, re.split(r"\s*[a-zA-Z]+\:\s*", query))))))

print(filter_dict)
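If the one-liner is hard to read, one possible variation is to build the pattern from filter_table itself and let a lookahead decide where each value ends. This is a different technique from the code above, not a rewrite of it, and the helper name parse_with_lookahead is hypothetical.

```python
import re

filter_table = ["intitle", "inbody"]


def parse_with_lookahead(query, names=filter_table):
    """Hypothetical helper: map each known filter to the text that follows it."""
    alternatives = "|".join(re.escape(name) for name in names)
    # A value runs lazily until the next known "name:" or the end of the query.
    pattern = re.compile(
        rf"\b({alternatives}):\s*(.*?)\s*(?=\b(?:{alternatives}):|$)"
    )
    return dict(pattern.findall(query))


print(parse_with_lookahead("intitle:foo bar inbody: boo far"))
```

This prints {'intitle': 'foo bar', 'inbody': 'boo far'}. One behavioral difference to be aware of: a word followed by a colon that is not in filter_table (for example note:) stays inside the surrounding value here, whereas the code above drops it together with its value.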