Home > Back-end >  Using regex in python for a dynamic string
Using regex in python for a dynamic string

Time:02-11

I have a pandas columns with strings which dont have the same pattern, something like this:

{'iso_2': 'FR', 'iso_3': 'FRA', 'name': 'France'}
{'iso': 'FR', 'iso_2': 'USA', 'name': 'United States of America'}
{'iso_3': 'FR', 'iso_4': 'FRA', 'name': 'France'}

How do I only keep the name of the country for every row? I would only like to keep "France", "United States of America", "France".

I tried building the regex pattern: something like this

r"^\W [a-z] _[0-9]\W "

But this turns out to be very specific, and if there is a slight change in the string the pattern wont work. How do we resolve this?

CodePudding user response:

As you have dictionaries in the column, you can get the values of the name keys:

import pandas as pd
df = pd.DataFrame({'col':[{'iso_2': 'FR', 'iso_3': 'FRA', 'name': 'France'},
    {'iso': 'FR', 'iso_2': 'USA', 'name': 'United States of America'},
    {'iso_3': 'FR', 'iso_4': 'FRA', 'name': 'France'}]})
df['col'] = df['col'].apply(lambda x: x["name"])

Output of df['col']:

0                      France
1    United States of America
2                      France
Name: col, dtype: object

If the column contains stringified dictionaries, you can use ast.literal_eval before accessing the name key value:

import pandas as pd
import ast

df = pd.DataFrame({'col':["{'iso_2': 'FR', 'iso_3': 'FRA', 'name': 'France'}",
    "{'iso': 'FR', 'iso_2': 'USA', 'name': 'United States of America'}",
    "{'iso_3': 'FR', 'iso_4': 'FRA', 'name': 'France'}"]})

df['col'] = df['col'].apply(lambda x: ast.literal_eval(x)["name"])

And in case your column is totally messed up, yes, you can resort to regex:

df['col'] = df['col'].str.extract(r"""['"]name['"]\s*:\s*['"]([^"'] )""")
# or to support escaped " and ':
df['col'] = df['col'].str.extract(r"""['"]name['"]\s*:\s*['"]([^"'\\] (?:\\.[^'"\\]*)*)""")>>> df['col']
                          0
0                    France
1  United States of America
2                    France

See the regex demo.

  • Related