So basically I'm looking for a regex that gives you everything within a quoted string. For example, if your data looks like
"name":"fffff fff", "ID":"1234", "name":"matt lam","ID":"1255", etc.
how can you write a regex that gives you everything within the quotes?
Currently I have:
r'ID"\d+'
which works for the ID part, but I'm looking for an expression that is reusable, something like r'"name":"[gives you everything within this string]"'
Answer:
Your content almost looks like a JSON fragment, except that the same key name appears multiple times. If it is JSON, you might want to try a JSON parser instead. If you must stick with pure regex, then re.findall might be one option:
import re
inp = '"name":"fffff fff", "ID":"1234", "name":"matt lam","ID":"1255"'
names = re.findall(r'"name":"(.*?)"', inp)  # non-greedy: stop at the first closing quote
print(names)  # ['fffff fff', 'matt lam']
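To make this reusable for any key, as the question asks, the key can be spliced into the pattern. Below is a minimal sketch; `values_for` is a hypothetical helper name, and it assumes values never contain escaped double quotes. It uses `[^"]*`, which stops at the next quote just like `.*?` does. The second part shows the JSON-parser route mentioned above, assuming the fragment becomes valid JSON once wrapped in braces; `object_pairs_hook` keeps the duplicate keys that a plain dict would silently overwrite.

```python
import json
import re

inp = '"name":"fffff fff", "ID":"1234", "name":"matt lam","ID":"1255"'

def values_for(key, text):
    # hypothetical helper: capture everything between the quotes that follow "key":
    # re.escape keeps keys containing regex metacharacters literal
    pattern = r'"' + re.escape(key) + r'"\s*:\s*"([^"]*)"'
    return re.findall(pattern, text)

print(values_for("name", inp))  # ['fffff fff', 'matt lam']
print(values_for("ID", inp))    # ['1234', '1255']

# JSON route (assumption: the fragment is valid JSON once wrapped in braces).
# object_pairs_hook receives the (key, value) pairs in order, so duplicates survive.
pairs = json.loads("{" + inp + "}", object_pairs_hook=lambda kv: kv)
names = [value for key, value in pairs if key == "name"]
print(names)  # ['fffff fff', 'matt lam']
```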
Answer:
Is this what you want? (This assumes each key:value pair is already a separate string without double quotes.)
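import re  # the standard-library re module is enough; the third-party regex package is not needed
pattern = r'name:(.*)|ID:(.*)'
string_1 = 'name:matt lam'
string_2 = 'ID:1255'
for string in [string_1, string_2]:
    for value in re.findall(pattern, string):
        # findall returns one tuple per match; only one of its groups captured text
        typeIx = list(map(bool, value)).index(True)
        print(value[typeIx])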
Output
matt lam  # typeIx = 0 (name), value = "matt lam"
1255      # typeIx = 1 (ID), value = "1255"
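The typeIx lookup can be avoided by naming each capture group after its key. A minimal sketch with the standard re module, assuming the same unquoted key:value strings as above; Match.lastgroup reports the name of the group that matched, which works here because only one alternative can match per string.

```python
import re

# Same alternation as above, but each capture group is named after its key.
pattern = re.compile(r'name:(?P<name>.*)|ID:(?P<ID>.*)')

results = []
for s in ['name:matt lam', 'ID:1255']:
    m = pattern.match(s)
    if m:
        # m.lastgroup is the name of the (only) group that captured text
        results.append((m.lastgroup, m.group(m.lastgroup)))

print(results)  # [('name', 'matt lam'), ('ID', '1255')]
```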
Answer:
If you want to turn your string into a clean pandas DataFrame, you could do something like this (assuming your input is well-formatted, i.e. '"name":"name_value", "ID":"ID_value", ...' with no extra " or : inside the values):
import re
import pandas as pd
inp = '"name":"fffff fff", "ID":"1234", "name":"matt lam","ID":"1255"'
inp = inp.replace('"', '') # remove all double quotes
inp_list = re.split(', |:|,', inp) # split on comma, comma whitespace, colon and put elements in list
name_lst = inp_list[1::4] # get every 4th element, starting from second (all name values)
ID_lst = inp_list[3::4] # get every 4th element, starting from fourth (all ID values)
d = {'name':name_lst,'ID':ID_lst} # Combine lists into dict
df = pd.DataFrame(d) # Turn dict into dataframe
print(df)
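If the values might contain commas, colons or irregular spacing, splitting on those characters breaks. Below is a sketch of an alternative that extracts the pairs with a single regex and then builds the same DataFrame. It assumes values never contain double quotes and that every name has a matching ID; otherwise the two columns end up with different lengths and pd.DataFrame raises an error.

```python
import re
import pandas as pd

inp = '"name":"fffff fff", "ID":"1234", "name":"matt lam","ID":"1255"'

# Pull every "key":"value" pair in order; [^"]* lets values contain
# commas, colons and spaces, but not double quotes.
pairs = re.findall(r'"([^"]+)"\s*:\s*"([^"]*)"', inp)

df = pd.DataFrame({
    'name': [value for key, value in pairs if key == 'name'],
    'ID': [value for key, value in pairs if key == 'ID'],
})
print(df)
```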