I have a list of data giving the place and publisher of a journal.
Each entry is a single string in a list:
['Place: Amsterdam Publisher: Elsevier Science Bv WOS:000179813800003' ,
'Place: Hanoi Publisher: Vietnam Acad Science & Technology-Vast WOS:000530921100003' ,
'Publisher: SAGE Publications Ltd',
'Place: London']
As you can see, some strings give the publisher but no place, and in others it is the other way round.
I want the output as two lists:
Places = ['Amsterdam','Hanoi','London']
Publishers = ['Elsevier Science',
'Vietnam Acad Science & Technology- Vast',
'SAGE Publications Ltd']
I am using Python for this data analysis.
I was thinking of using the split() function to find where 'Place' is written and take the string next to it, but it does not seem to work.
My code so far:
places = []
for i in extrainfo:  # extrainfo is the name of the initial list
    if 'Place' in i:
        z = i
        i = i.split()
        counter = 0
        for q in i:
            if q == 'Place':
                break
            counter = counter + 1
        places = places + z[counter + 1]
print(places)
CodePudding user response:
- split on colons ':' using s.split(':');
- discard trailing whitespace using s.strip();
- if one of the split substrings ends with 'Publisher' or 'Place', add the next substring to the relevant list;
- some of the substrings added to the lists will end with 'Place' or 'Publisher': take care of that using s.removesuffix('Place').removesuffix('Publisher').
from itertools import pairwise  # python>=3.10
# from itertools import tee
# def pairwise(iterable):
#     "s -> (s0,s1), (s1,s2), (s2, s3), ..."
#     a, b = tee(iterable)
#     next(b, None)
#     return zip(a, b)

data = ['Place: Amsterdam Publisher: Elsevier Science Bv WOS:000179813800003',
        'Place: Hanoi Publisher: Vietnam Acad Science & Technology-Vast WOS:000530921100003',
        'Publisher: SAGE Publications Ltd',
        'Place: London']

things = {'Place': [], 'Publisher': [], 'WOS': []}

for sentence in data:
    # walk over consecutive (label part, value part) pairs of the colon-split string
    for k, v in pairwise(map(str.strip, sentence.split(':'))):
        for cat in things:
            if k.endswith(cat):
                # the value may still end with the next label; strip it off
                for suffix in things:
                    v = v.removesuffix(suffix).strip()  # str.removesuffix needs python>=3.9
                things[cat].append(v)
                break

print(things)
# {'Place': ['Amsterdam', 'Hanoi', 'London'],
#  'Publisher': ['Elsevier Science Bv', 'Vietnam Acad Science & Technology-Vast', 'SAGE Publications Ltd'],
#  'WOS': ['000179813800003', '000530921100003']}
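To see why the suffix stripping is needed, it may help to look at the intermediate pairs for a single string. This is only an illustrative sketch: it uses zip(parts, parts[1:]) in place of itertools.pairwise (so it also runs on older Python versions), and the names sentence, parts and pairs are made up for the example.

```python
# Illustrative only: inspect the (key, value) pairs produced for one input string.
sentence = 'Place: Amsterdam Publisher: Elsevier Science Bv WOS:000179813800003'

# Split on colons and trim whitespace around each piece.
parts = [p.strip() for p in sentence.split(':')]

# Pair each piece with the one that follows it (same pairs as itertools.pairwise).
pairs = list(zip(parts, parts[1:]))

for k, v in pairs:
    print(repr(k), '->', repr(v))
# 'Place' -> 'Amsterdam Publisher'
# 'Amsterdam Publisher' -> 'Elsevier Science Bv WOS'
# 'Elsevier Science Bv WOS' -> '000179813800003'
```

Each key ends with the label of the field it introduces, and each value still carries the label of the next field, which is what the endswith check and the removesuffix calls deal with.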
CodePudding user response:
Solution with the re module:
import re

lst = [
    "Place: Amsterdam Publisher: Elsevier Science Bv WOS:000179813800003",
    "Place: Hanoi Publisher: Vietnam Acad Science & Technology-Vast WOS:000530921100003",
    "Publisher: SAGE Publications Ltd",
    "Place: London",
]

# the walrus operator (:=) needs python>=3.8
places = [
    m.group(1)
    for i in lst
    if (m := re.search(r"Place: (.*?)\s*(?:Publisher|$)", i))
]

publishers = [
    m.group(1)
    for i in lst
    if (m := re.search(r"Publisher: (.*?)\s*(?:WOS|$)", i))
]

print(places)
print(publishers)
Prints:
['Amsterdam', 'Hanoi', 'London']
['Elsevier Science Bv', 'Vietnam Acad Science & Technology-Vast', 'SAGE Publications Ltd']
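If more labels need extracting, the two patterns above could probably be folded into one. The following is a rough sketch, not part of the answer above: it assumes the only labels are Place, Publisher and WOS, that each is followed by a colon, and that no value itself contains one of those labels followed by a colon. FIELDS, pattern and parse_fields are names invented for the example.

```python
import re

# Assumed set of labels; extend it if the data contains others.
FIELDS = ("Place", "Publisher", "WOS")
alt = "|".join(FIELDS)

# A label, a colon, then the shortest text that runs up to the next label or the end.
pattern = re.compile(rf"({alt}):\s*(.*?)\s*(?=(?:{alt}):|$)")


def parse_fields(strings):
    """Collect the values found for each label across all strings."""
    result = {name: [] for name in FIELDS}
    for s in strings:
        for key, value in pattern.findall(s):
            result[key].append(value)
    return result


lst = [
    "Place: Amsterdam Publisher: Elsevier Science Bv WOS:000179813800003",
    "Place: Hanoi Publisher: Vietnam Acad Science & Technology-Vast WOS:000530921100003",
    "Publisher: SAGE Publications Ltd",
    "Place: London",
]

parsed = parse_fields(lst)
print(parsed["Place"])
print(parsed["Publisher"])
```

The lookahead (?=...) checks for the next label without consuming it, so findall can pick that label up as the start of the next match.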