Python Regex expression within a string

So basically I'm looking for a regex that gives you everything within a string. For example, if your data looks like

"name":"fffff fff", "ID":"1234", "name":"matt lam","ID":"1255"etc

How can you write a regex that gives you everything within the string? Currently I have r'ID"\d ', which works for the ID part, but I'm looking for an expression that is reusable, something like r'"name":"[everything within this string]"'

CodePudding user response：

Your content almost looks like a JSON fragment, except that the same key name appears multiple times. If so, you might want to try to use a JSON parser. If you must proceed with pure regex, then re.findall might be one option:

import re

inp = '"name":"fffff fff", "ID":"1234", "name":"matt lam","ID":"1255"'
names = re.findall(r'"name":"(.*?)"', inp)
print(names)  # ['fffff fff', 'matt lam']
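
To make the pattern above reusable for any key, and to show what the JSON parser suggestion could look like, here is a minimal sketch. The helper names values_for_key and pairs_via_json are made up for illustration. It assumes values never contain an escaped double quote, and that wrapping the fragment in braces yields valid JSON.

```python
import json
import re


def values_for_key(key, text):
    """Return every quoted value that follows "key": in text (hypothetical helper)."""
    # re.escape guards against keys that contain regex metacharacters.
    # [^"]* stops at the next double quote, so escaped quotes inside
    # a value are NOT handled by this sketch.
    pattern = r'"' + re.escape(key) + r'"\s*:\s*"([^"]*)"'
    return re.findall(pattern, text)


def pairs_via_json(text):
    """Parse the fragment as JSON, keeping repeated keys (hypothetical helper)."""
    # Assumption: adding the surrounding braces makes the fragment valid JSON.
    # object_pairs_hook=list returns the (key, value) pairs in order instead
    # of a dict, so the duplicate "name" and "ID" keys are not overwritten.
    return json.loads('{' + text + '}', object_pairs_hook=list)


inp = '"name":"fffff fff", "ID":"1234", "name":"matt lam","ID":"1255"'
print(values_for_key('name', inp))  # ['fffff fff', 'matt lam']
print(values_for_key('ID', inp))    # ['1234', '1255']
print(pairs_via_json(inp))
```

The regex helper is tolerant of surrounding junk, while the JSON route fails loudly on malformed input, which may or may not be what you want.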

CodePudding user response：

Is this what you want?

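import re  # the standard-library module is enough here

# One capture group per alternative; only the alternative that matched is non-empty.
pattern = 'name:(.*)|ID:(.*)'
string_1 = 'name:matt lam'
string_2 = 'ID:1255'
for string in [string_1, string_2]:
  for value in re.findall(pattern, string):
    # Index of the first non-empty group: 0 for name, 1 for ID.
    typeIx = list(map(bool, value)).index(True)
    print(value[typeIx])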

Output

matt lam #typeIx = 0 (name), value = "matt lam"
1255 #typeIx = 1 (ID), value = "1255"
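
The same "which alternative matched" idea can also be applied to the quoted data from the question by using named groups. This is only a sketch: the group names name and ID are my own choice, and it assumes values contain no double quotes.

```python
import re

# Each alternative has its own named group; exactly one of them
# takes part in any given match.
pattern = re.compile(r'"name":"(?P<name>[^"]*)"|"ID":"(?P<ID>[^"]*)"')

inp = '"name":"fffff fff", "ID":"1234", "name":"matt lam","ID":"1255"'

# m.lastgroup is the name of the group that matched, so it tells us
# whether the value is a name or an ID without any index bookkeeping.
found = [(m.lastgroup, m.group(m.lastgroup)) for m in pattern.finditer(inp)]

for kind, value in found:
    print(kind, value)
```

Compared with checking which tuple slot is non-empty, this also works when a matched value happens to be an empty string.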

CodePudding user response：

If you want to turn your string into a clean pandas DataFrame, you could do something like this (assuming your input is well-formatted, i.e. '"name":"name_value", "ID":"ID_value", ...', with no extra " or : characters):

import re
import pandas as pd


inp = '"name":"fffff fff", "ID":"1234", "name":"matt lam","ID":"1255"'

inp = inp.replace('"', '') # remove all double quotes

inp_list = re.split(', |:|,', inp) # split on comma + whitespace, colon, or bare comma and put elements in list

name_lst = inp_list[1::4] # get every 4th element, starting from second (all name values)
ID_lst = inp_list[3::4] # get every 4th element, starting from fourth (all ID values)

d = {'name':name_lst,'ID':ID_lst} # Combine lists into dict

df = pd.DataFrame(d) # Turn dict into dataframe

print(df) # a bare df only displays in a notebook or REPL
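
If values might themselves contain a comma or colon, the replace-and-split approach above breaks. One possible alternative is to capture each name/ID pair in a single match. This is a sketch that assumes every "name" entry is immediately followed by its "ID" entry and that values contain no double quotes. The variable names are illustrative.

```python
import re

inp = '"name":"fffff fff", "ID":"1234", "name":"matt lam","ID":"1255"'

# Two capture groups per match: the name value and the ID value that follows it.
record_pattern = re.compile(r'"name":"([^"]*)"\s*,\s*"ID":"([^"]*)"')

# findall returns one (name, ID) tuple per match; turn each into a dict.
records = [{'name': name, 'ID': id_value}
           for name, id_value in record_pattern.findall(inp)]

print(records)
```

A list of dicts like this can be passed straight to pd.DataFrame(records) to get the same two-column frame.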