Group values within a key through a list

I want to group the values under a key into a list of lists, splitting them at each empty string ('') that appears, so that each landmark is assigned its own list of locations. For example:

{'Landmarks': ['Great norwich', 'Larger building', 'Leaning building'],
 'location':  ['North 28th Street',
               'Stadium High School',
               '',
               'Charles Bridge',
               "St Vitus' Cathedral",
               '',
               'All Saints Church',
               'Royal Courts of Justice',
               'Lonsdale Road']
}

Expected output:

as dictionary:

{'Landmarks': ['Great norwich', 'Larger building', 'Leaning building'],
 'location':  [['North 28th Street', 'Stadium High School'],
               ['Charles Bridge', "St Vitus' Cathedral"],
               ['All Saints Church', 'Royal Courts of Justice', 'Lonsdale Road']]
}

or as DataFrame:

Landmarks               Locations
Great norwich           ['North 28th Street','Stadium High School']
Larger building         ['Charles Bridge',"St Vitus' Cathedral"]
Leaning building        ['All Saints Church','Royal Courts of Justice','Lonsdale Road']

I have tried:

pd.DataFrame(data)

which raises the error:

arrays must all be same length
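
For context, this error is consistent with the two lists having different lengths. A minimal check, assuming data is the dictionary shown above (the name lengths is introduced here only for illustration):

```python
data = {'Landmarks': ['Great norwich', 'Larger building', 'Leaning building'],
        'location': ['North 28th Street',
                     'Stadium High School',
                     '',
                     'Charles Bridge',
                     "St Vitus' Cathedral",
                     '',
                     'All Saints Church',
                     'Royal Courts of Justice',
                     'Lonsdale Road']}

# Count the items stored under each key (illustrative variable name)
lengths = {key: len(values) for key, values in data.items()}
print(lengths)
```

Since 'Landmarks' holds 3 items and 'location' holds 9, the columns cannot be aligned row by row until the locations are grouped.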

I can remove the empty strings with:

for v in data.values():
    # list.remove() drops only the first match, so loop until none are left
    while '' in v:
        v.remove('')

But how do I collect all the values before each empty string into a list before removing it, to get the expected output above?

CodePudding user response：

Here is one approach to split the list on ''. It uses a short list comprehension with itertools.groupby and an assignment expression (Python ≥ 3.8):

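from itertools import groupby

d2 = d.copy()  # shallow copy; only the 'location' key is replaced
# groupby splits the list into runs of '' and runs of non-'' items;
# keep every run except the lone-separator run ['']
d2['location'] = [G for _, g in groupby(d['location'], ''.__eq__)
                  if (G := list(g)) != ['']]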

version for Python < 3.8:

from itertools import groupby
d2 = d.copy()
d2['location'] = [G for _,g in groupby(d['location'], ''.__eq__)
                  for G in [list(g)] if G != ['']]

output:

{'Landmarks': ['Great norwich', 'Larger building', 'Leaning building'],
 'location': [['North 28th Street', 'Stadium High School'],
  ['Charles Bridge', "St Vitus' Cathedral"],
  ['All Saints Church', 'Royal Courts of Justice', 'Lonsdale Road']]}

as DataFrame:

>>> pd.DataFrame(d2)
          Landmarks                                                     location
0     Great norwich                     [North 28th Street, Stadium High School]
1   Larger building                        [Charles Bridge, St Vitus' Cathedral]
2  Leaning building  [All Saints Church, Royal Courts of Justice, Lonsdale Road]

used input:

d = {'Landmarks': ['Great norwich', 'Larger building', 'Leaning building'],
     'location': ['North 28th Street',
                  'Stadium High School',
                  '',
                  'Charles Bridge',
                  "St Vitus' Cathedral",
                  '',
                  'All Saints Church',
                  'Royal Courts of Justice',
                  'Lonsdale Road']}
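
If the comprehension feels too dense, the same splitting can be sketched with a plain loop. This is only an alternative under stated assumptions, not the method above: split_on_blank is a hypothetical helper name, and unlike the groupby version it skips empty groups when two separators appear in a row. For the input used here it produces the same result.

```python
def split_on_blank(items, sep=''):
    """Split a flat list into sublists at each `sep` entry (hypothetical helper)."""
    groups, current = [], []
    for item in items:
        if item == sep:
            # Separator reached: close the current group if it has any items
            if current:
                groups.append(current)
            current = []
        else:
            current.append(item)
    # Keep the trailing group, which has no separator after it
    if current:
        groups.append(current)
    return groups


d = {'Landmarks': ['Great norwich', 'Larger building', 'Leaning building'],
     'location': ['North 28th Street',
                  'Stadium High School',
                  '',
                  'Charles Bridge',
                  "St Vitus' Cathedral",
                  '',
                  'All Saints Church',
                  'Royal Courts of Justice',
                  'Lonsdale Road']}

# Build a new dict so the original input stays untouched
d2 = {**d, 'location': split_on_blank(d['location'])}
print(d2['location'])
```

Because d2 now has three entries under each key, pd.DataFrame(d2) should build without the length error.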