I want to group the values under a key into sublists, splitting wherever an empty string ('') appears, so that each landmark is assigned its own list of locations. For example:
{'Landmarks': ['Great norwich', 'Larger building', 'Leaning building'],
'location': ['North 28th Street',
'Stadium High School',
'',
'Charles Bridge',
"St Vitus' Cathedral",
'',
'All Saints Church',
'Royal Courts of Justice',
'Lonsdale Road']
}
Expected output:
as dictionary:
{'Landmarks': ['Great norwich', 'Larger building', 'Leaning building'],
'location': [['North 28th Street', 'Stadium High School'],
['Charles Bridge', "St Vitus' Cathedral"],
['All Saints Church', 'Royal Courts of Justice', 'Lonsdale Road']]
}
or as DataFrame:
Landmarks         Locations
Great norwich     ['North 28th Street', 'Stadium High School']
Larger building   ['Charles Bridge', "St Vitus' Cathedral"]
Leaning building  ['All Saints Church', 'Royal Courts of Justice', 'Lonsdale Road']
I have tried:
pd.DataFrame(data)
but I get the error:
arrays must all be same length
I can remove the empty strings with:
for v in data.values():
    if '' in v:
        v.remove('')
But how do I collect the values before each empty string into their own list before removing it, to get the expected output above?
CodePudding user response:
Here is one approach to split the list on ''. It uses a short list comprehension with itertools.groupby and an assignment expression (Python ≥ 3.8):
from itertools import groupby
d2 = d.copy()
d2['location'] = [G for _, g in groupby(d['location'], ''.__eq__)
                  if (G := list(g)) != ['']]
Version for Python < 3.8:
from itertools import groupby
d2 = d.copy()
d2['location'] = [G for _, g in groupby(d['location'], ''.__eq__)
                  for G in [list(g)] if G != ['']]
output:
{'Landmarks': ['Great norwich', 'Larger building', 'Leaning building'],
'location': [['North 28th Street', 'Stadium High School'],
['Charles Bridge', "St Vitus' Cathedral"],
['All Saints Church', 'Royal Courts of Justice', 'Lonsdale Road']]}
as DataFrame:
>>> pd.DataFrame(d2)
          Landmarks                                                     location
0     Great norwich                     [North 28th Street, Stadium High School]
1   Larger building                        [Charles Bridge, St Vitus' Cathedral]
2  Leaning building  [All Saints Church, Royal Courts of Justice, Lonsdale Road]
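One caveat: comparing each group with [''] only filters out a single separator. If the list could contain several empty strings in a row, the run ['', ''] would slip through as a group. A minimal sketch of a variant that filters on the groupby key instead, assuming every non-separator entry is a non-empty string (the helper name split_on_empty and the sample list are made up for illustration):

```python
from itertools import groupby

def split_on_empty(values):
    """Split a flat list into sublists, treating '' as the separator."""
    # bool('') is False, so the key is False for separators and True for
    # real values; keep only the runs whose key is True.
    return [list(group) for is_value, group in groupby(values, key=bool) if is_value]

# Sample with a doubled and a trailing separator
locations = ['North 28th Street', 'Stadium High School', '', '',
             'Charles Bridge', "St Vitus' Cathedral", '']
print(split_on_empty(locations))
# [['North 28th Street', 'Stadium High School'], ['Charles Bridge', "St Vitus' Cathedral"]]
```

Note that this collapses repeated separators, so it cannot represent a landmark that has no locations at all.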
used input:
d = {'Landmarks': ['Great norwich', 'Larger building', 'Leaning building'],
'location': ['North 28th Street',
'Stadium High School',
'',
'Charles Bridge',
"St Vitus' Cathedral",
'',
'All Saints Church',
'Royal Courts of Justice',
'Lonsdale Road']}
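If you would rather avoid itertools, the same grouping can be written as a plain loop. This is only a sketch of the idea, not a drop-in replacement: the function name group_by_separator and its sep parameter are hypothetical, and it assumes every landmark has at least one location, since empty groups are skipped.

```python
def group_by_separator(values, sep=''):
    """Collect consecutive values into sublists, starting a new one at each sep."""
    groups, current = [], []
    for value in values:
        if value == sep:
            # Close the current group, skipping empty ones
            if current:
                groups.append(current)
            current = []
        else:
            current.append(value)
    # Do not lose the last group when the list does not end with sep
    if current:
        groups.append(current)
    return groups

d = {'Landmarks': ['Great norwich', 'Larger building', 'Leaning building'],
     'location': ['North 28th Street', 'Stadium High School', '',
                  'Charles Bridge', "St Vitus' Cathedral", '',
                  'All Saints Church', 'Royal Courts of Justice', 'Lonsdale Road']}

d2 = d.copy()
d2['location'] = group_by_separator(d['location'])
print(len(d2['Landmarks']) == len(d2['location']))  # True
```

Once both lists have the same length, pd.DataFrame(d2) no longer raises the length error.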