I have a web-scraped string containing key-value pairs, e.g. firstName:"Quaran", lastName:"McPherson":
st = '{"accountId":405266,"firstName":"Quaran","lastName":"McPherson","accountIdentifier":"StudentAthlete","profilePicUrl":"https://pbs.twimg.com/profile_images/1331475329014181888/4z19KrCf.jpg","networkProfileCode":"quaran-mcpherson","hasDeals":true,"activityMin":11,"sports":["Men\'s Basketball","Basketball"],"currentTeams":["Nebraska Cornhuskers"],"previousTeams":[],"facebookReach":null,"twitterReach":619,"instagramReach":0,"linkedInReach":null},{"accountId":375964,"firstName":"Micole","lastName":"Cayton","accountIdentifier":"StudentAthlete","profilePicUrl":"https://opendorsepr.blob.core.windows.net/media/375964/20220622223838_46dbe3fd-a683-436b-84d4-90c84a5af35f.jpg","networkProfileCode":"micole-cayton","hasDeals":true,"activityMin":16,"sports":["Basketball","Women\'s Basketball"],"currentTeams":["Minnesota Golden Gophers"],"previousTeams":["Cal Berkeley Golden Bears"],"facebookReach":0,"twitterReach":1273,"instagramReach":5700,"linkedInReach":null}'
I am trying to extract first_name, last_name and a few other parameters from this string as lists, so that, for example, I end up with a first_name list holding every first name in the string.
I tried using re.findall('"firstName":'"(.*)\S$",st)
to get the text "Quaran", but the result comes back in the following format:
'"Quaran","lastName":"McPherson","accountIdentifier":"StudentAthlete","profilePicUrl":"https://pbs.twimg.com/profile_images/1331475329014181888/4z19KrCf.jpg","networkProfileCode":"quaran-mcpherson","hasDeals":true,"activityMin":11,"sports":["Men\'s Basketball","Basketball"],"currentTeams":["Nebraska Cornhuskers"],"previousTeams":[],"facebookReach":null,"twitterReach":619,"instagramReach":0,"linkedInReach":null}
How do I specify in the regex that the match should end at the closing quote of the name?
TIA
CodePudding user response:
Try this regex: (?<=\"firstName\":\").*?(?=\")
The ? after .* makes the match lazy, so it stops as soon as it reaches the next " character.
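As a minimal sketch of how that pattern could be used from Python: the string below is a shortened stand-in for the scraped one (only a few keys are kept), and the variable names first_names and last_names are just illustrative.

```python
import re

# Shortened sample in the same shape as the scraped string (assumption:
# the real string simply carries more keys per record).
st = ('{"accountId":405266,"firstName":"Quaran","lastName":"McPherson"},'
      '{"accountId":375964,"firstName":"Micole","lastName":"Cayton"}')

# Lookbehind anchors the match right after "firstName":" and the lazy .*?
# stops at the first closing quote, so only the value itself is returned.
first_names = re.findall(r'(?<="firstName":").*?(?=")', st)
last_names = re.findall(r'(?<="lastName":").*?(?=")', st)

print(first_names)  # ['Quaran', 'Micole']
print(last_names)   # ['McPherson', 'Cayton']
```

Because the pattern contains no capturing group, re.findall returns the whole match for each occurrence, which is already the list format asked for.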
CodePudding user response:
Your string looks like a JSON array without the enclosing brackets. If it is valid JSON, you can parse it easily in any language: add '[' at the start and ']' at the end of the string, then parse the result with your language's JSON parser. For example:
JavaScript:
JSON.parse('[' + st + ']')
Python:
import json
data = json.loads('[' + st + ']')  # wrap in [ ] so the records form a valid JSON array; the result is a list of dicts
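To get the per-field lists the question asks for, something along these lines should work. This is only a sketch: the string is a shortened stand-in for the scraped one, and the names records, first_names, last_names and twitter_reach are made up for the example.

```python
import json

# Shortened sample in the same shape as the scraped string (assumption:
# the real string has more keys per record).
st = ('{"accountId":405266,"firstName":"Quaran","lastName":"McPherson",'
      '"sports":["Men\'s Basketball","Basketball"],"twitterReach":619},'
      '{"accountId":375964,"firstName":"Micole","lastName":"Cayton",'
      '"sports":["Basketball","Women\'s Basketball"],"twitterReach":1273}')

# The scraped text is a comma-separated run of objects, so wrapping it in
# brackets turns it into a JSON array that json.loads can parse.
records = json.loads("[" + st + "]")

# One list per field, in the same order as the records.
first_names = [rec["firstName"] for rec in records]
last_names = [rec["lastName"] for rec in records]
# .get() returns None instead of raising if a record lacks the key.
twitter_reach = [rec.get("twitterReach") for rec in records]

print(first_names)    # ['Quaran', 'Micole']
print(last_names)     # ['McPherson', 'Cayton']
print(twitter_reach)  # [619, 1273]
```

Parsing also keeps the original types (numbers, lists, null as None), which a regex on the raw text would not give you.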
Regular expression:
If you strictly want to parse it with a regular expression, use:
/(?:\"|\')(?<key>[\w\d]+)(?:\"|\')(?:\:\s*)(?:\"|\')?(?<value>[\w\s-]*)(?:\"|\')?/gm
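The pattern above is written as a JavaScript regex literal. A rough Python adaptation might look like the following; note that Python spells named groups (?P<name>...), and that the character classes here are a simplified equivalent, not the exact original. The sample string and the names KEY_VALUE and pairs are assumptions for the sketch.

```python
import re
from collections import defaultdict

# Shortened sample in the same shape as the scraped string (assumption).
st = ('{"accountId":405266,"firstName":"Quaran","lastName":"McPherson"},'
      '{"accountId":375964,"firstName":"Micole","lastName":"Cayton"}')

# quoted key, colon, optional opening quote, value made of word characters,
# whitespace or hyphens, optional closing quote
KEY_VALUE = re.compile(
    r"""["'](?P<key>\w+)["']:\s*["']?(?P<value>[\w\s-]*)["']?"""
)

# Collect every value under its key, preserving order of appearance.
pairs = defaultdict(list)
for match in KEY_VALUE.finditer(st):
    pairs[match.group("key")].append(match.group("value"))

print(pairs["firstName"])  # ['Quaran', 'Micole']
print(pairs["accountId"])  # ['405266', '375964']
```

Keep in mind that the value class only covers word characters, whitespace and hyphens, so URLs are cut at the first ':' and array values such as sports come back empty, and every value is returned as text.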
CodePudding user response:
Try this:
(?<="firstName":")[^"\r\n]+
(?<="firstName":") goes to the point where "firstName":" appears in the string, then [^"\r\n]+ matches one or more characters other than ", \r and \n, so the match does not cross the closing double quote of the firstName value or run past a newline.
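A small sketch of the same idea generalized to other keys in Python. The helper values_for is hypothetical (not from any library), and the string is a shortened stand-in for the scraped one.

```python
import re

# Shortened sample in the same shape as the scraped string (assumption).
st = ('{"accountId":405266,"firstName":"Quaran","lastName":"McPherson",'
      '"networkProfileCode":"quaran-mcpherson"},'
      '{"accountId":375964,"firstName":"Micole","lastName":"Cayton",'
      '"networkProfileCode":"micole-cayton"}')


def values_for(key, text):
    """Return every quoted string value stored under `key` (hypothetical helper)."""
    # Lookbehind positions the match just after "key":" and the negated
    # class consumes characters up to the closing quote or a line break.
    pattern = rf'(?<="{re.escape(key)}":")[^"\r\n]+'
    return re.findall(pattern, text)


print(values_for("firstName", st))  # ['Quaran', 'Micole']
print(values_for("lastName", st))   # ['McPherson', 'Cayton']
```

This only picks up values that are quoted strings; numeric fields such as accountId, and empty strings, would not match this pattern.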