Python regex removing string from a list of dictionaries

Time:12-22

I have the following list of dictionaries

d = [
    {
        "Business": "Company A",
        "Category": "Supply Chain",
        "Date": "Posted Date\nDecember 21 2021",
    },
    {
        "Business": "Company B",
        "Category": "Manufacturing",
        "Date": "Posted Date\nDecember 21 2021",
    }
]

I'm trying to use re to remove the "Posted Date\n" string from the dictionaries, but I'm getting the following error:

TypeError: expected string or bytes-like object

My code is the following:

import re

regex = re.compile('Posted Date\n')
filtered = [i for i in d if not regex.match(i)]
print(filtered)

If I do the same thing on a plain list of strings (no dictionaries), it works. Do I have to convert my dictionaries into strings first?

Thanks!

Answer:

Assuming d is the list of dictionaries, your comprehension loops over the dictionaries themselves, so on the first iteration:

i = {
    "Business": "Company A",
    "Category": "Supply Chain",
    "Date": "Posted Date\nDecember 21 2021",
}
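A quick way to confirm this is to pass that dictionary, and then one of its string values, to the compiled pattern. This is a minimal sketch reusing the question's sample data; `i` and `m` are just illustrative names.

```python
import re

regex = re.compile('Posted Date\n')

i = {
    "Business": "Company A",
    "Category": "Supply Chain",
    "Date": "Posted Date\nDecember 21 2021",
}

# Passing the dict itself fails: re only accepts str or bytes-like objects.
error_message = None
try:
    regex.match(i)
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", exc)

# Passing one of the dict's string values works and returns a match object.
m = regex.match(i["Date"])
print(m.group(0) == "Posted Date\n")  # True
```

The exact wording of the TypeError varies slightly between Python versions (newer ones append the offending type), but it always starts with "expected string or bytes-like object".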

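And indeed, you cannot use a regex on a dictionary: re only accepts strings (or bytes-like objects), which is exactly what the TypeError is telling you. You need to go one level deeper and loop over the keys and values of each dictionary. Be careful, though: adding or removing keys while iterating over a dictionary raises a RuntimeError, so iterate over a copy of its items if you delete anything.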

import re

d = [{
    "Business": "Company A",
    "Category": "Supply Chain",
    "Date": "Posted Date\nDecember 21 2021",
}, {
    "Business": "Company B",
    "Category": "Manufacturing",
    "Date": "Posted Date\nDecember 21 2021",
}]

regex = re.compile('Posted Date\n')

for dikt in d:
    for key, value in list(dikt.items()):  # make a list to prevent RuntimeError
        if regex.match(value): 
            del dikt[key]

This removes the Date key entirely, leaving:

d = [{
    "Business": "Company A",
    "Category": "Supply Chain",
}, {
    "Business": "Company B",
    "Category": "Manufacturing",
}]
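The `list(...)` copy in that loop is what makes the deletion safe. Deleting a key while iterating over the live `dikt.items()` view changes the dictionary's size mid-iteration, which Python reports as "dictionary changed size during iteration". A small sketch comparing the two, where `drop_posted_keys` is a hypothetical helper written only for this illustration:

```python
import re

regex = re.compile('Posted Date\n')

def drop_posted_keys(dikt, copy_items):
    """Hypothetical helper: delete every key whose value starts with 'Posted Date'.

    copy_items=True iterates over a snapshot list of the items (safe);
    copy_items=False iterates over the live items() view (unsafe when deleting).
    """
    items = list(dikt.items()) if copy_items else dikt.items()
    for key, value in items:
        if regex.match(value):
            del dikt[key]

unsafe = {
    "Business": "Company A",
    "Date": "Posted Date\nDecember 21 2021",
    "Category": "Supply Chain",
}
runtime_message = None
try:
    drop_posted_keys(unsafe, copy_items=False)
except RuntimeError as exc:
    runtime_message = str(exc)
    print("RuntimeError:", exc)  # dictionary changed size during iteration

safe = {
    "Business": "Company A",
    "Date": "Posted Date\nDecember 21 2021",
    "Category": "Supply Chain",
}
drop_posted_keys(safe, copy_items=True)
print(safe)  # {'Business': 'Company A', 'Category': 'Supply Chain'}
```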

If you only want to strip the "Posted Date\n" prefix and keep the date itself, a plain str.replace is enough (no regex needed):

d = [{
    "Business": "Company A",
    "Category": "Supply Chain",
    "Date": "Posted Date\nDecember 21 2021",
}, {
    "Business": "Company B",
    "Category": "Manufacturing",
    "Date": "Posted Date\nDecember 21 2021",
}]


for dikt in d:
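    # Reassigning existing keys doesn't change the dict's size, so iterating
    # over dikt.items() directly is safe here.
    for key, value in dikt.items():
        dikt[key] = value.replace('Posted Date\n', '')  # applied to every value, not just "Date"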

Result:

d = [{
    "Business": "Company A",
    "Category": "Supply Chain",
    "Date": "December 21 2021",
}, {
    "Business": "Company B",
    "Category": "Manufacturing",
    "Date": "December 21 2021",
}]
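Since the question started from a list comprehension, here is a sketch of a non-mutating variant that builds new dictionaries instead of editing d in place. It uses re.sub with a pattern anchored to the start of the string (assuming the "Posted Date" line only ever appears at the beginning) and copies non-string values unchanged, so it won't raise the same TypeError if a value happens to be, say, a number.

```python
import re

d = [
    {
        "Business": "Company A",
        "Category": "Supply Chain",
        "Date": "Posted Date\nDecember 21 2021",
    },
    {
        "Business": "Company B",
        "Category": "Manufacturing",
        "Date": "Posted Date\nDecember 21 2021",
    },
]

# Anchored with ^ so only a leading "Posted Date" line is removed.
prefix = re.compile(r'^Posted Date\n')

# Build new dicts; non-string values are copied as-is because re.sub
# would raise TypeError on them. The original d is left untouched.
filtered = [
    {key: prefix.sub('', value) if isinstance(value, str) else value
     for key, value in dikt.items()}
    for dikt in d
]
print(filtered)
```

Building new dictionaries sidesteps the mutation-during-iteration question entirely, at the cost of a copy of the data.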