Home > Mobile >  Change a column containing list of dict to columns in a DataFrame
Change a column containing list of dict to columns in a DataFrame

Time:12-27

I have the following DataFrame which contains a column that is a list of dict items:

    d = pd.DataFrame([
        ['Green', [{'Desc:': 'STERLING GREEN SO'}, {'Sec:': '01'}, {'Lot:': 'L0038'}, {'Block:': 'B0008'}]],
        ['Apply', [{'Desc:': 'STERLING GREEN SO'}, {'Sec:': '01'}, {'Lot:': 'L0038'}, {'Block:': 'B0008'}]],
        ['Range', [{'Desc:': 'STERLING GREEN SO'}, {'Sec:': '01'}, {'Lot:': 'L0038'}, {'Block:': 'B0008'}]],
        ['Peop',  [{'Desc:': 'STERLING GREEN SO'}, {'Sec:': '01'}, {'Lot:': 'L0038'}, {'Block:': 'B0008'}]]
        ], columns=['Name', 'Legal Description'])

and I want to transform it to a simple DataFrame like so:

    d = pd.DataFrame([
        ['Green', 'STERLING GREEN SO', '01', 'L0038', 'B0008'],
        ['Apply', 'STERLING GREEN SO', '01', 'L0038', 'B0008'],
        ['Range', 'STERLING GREEN SO', '01', 'L0038', 'B0008'],
        ['Peop',  'STERLING GREEN SO', '01', 'L0038', 'B0008']
        ], columns=['Name', 'Legal Description', 'Desc', 'Sec', 'Lot', 'Block'])

CodePudding user response:

IMO, the ideal solution would be to act upstream and get a properly formatted dictionary or dataframe.

The issue with your list of single-keyed dictionaries is that you have to merge them. You can use a dictionary comprehension for that and convert to Series:

d2 = d['Legal Description'].apply(lambda c:
                                  pd.Series({next(iter(x.keys())).strip(':'):
                                             next(iter(x.values())) for x in c})
                                  )

Then join to the original dataframe:

d.drop(columns='Legal Description').join(d2)

output:

    Name               Desc Sec    Lot  Block
0  Green  STERLING GREEN SO  01  L0038  B0008
1  Apply  STERLING GREEN SO  01  L0038  B0008
2  Range  STERLING GREEN SO  01  L0038  B0008
3   Peop  STERLING GREEN SO  01  L0038  B0008

CodePudding user response:

If possible, you should wrangle your data before creating the DataFrame. It's faster than reshaping the DataFrame after being created. For instance, something like

data = [
    ['Green', [{'Desc:': 'STERLING GREEN SO'}, {'Sec:': '01'}, {'Lot:': 'L0038'}, {'Block:': 'B0008'}]],
    ['Apply', [{'Desc:': 'STERLING GREEN SO'}, {'Sec:': '01'}, {'Lot:': 'L0038'}, {'Block:': 'B0008'}]],
    ['Range', [{'Desc:': 'STERLING GREEN SO'}, {'Sec:': '01'}, {'Lot:': 'L0038'}, {'Block:': 'B0008'}]],
    ['Peop',  [{'Desc:': 'STERLING GREEN SO'}, {'Sec:': '01'}, {'Lot:': 'L0038'}, {'Block:': 'B0008'}]]
]

records = []

for name, legal_desc in data:
    rec = {'Name': name}
    rec.update(x for d in legal_desc for x in d.items())
    records.append(rec)
    
d = pd.DataFrame(records)

Output:

>>> d

    Name              Desc: Sec:   Lot: Block:
0  Green  STERLING GREEN SO   01  L0038  B0008
1  Apply  STERLING GREEN SO   01  L0038  B0008
2  Range  STERLING GREEN SO   01  L0038  B0008
3   Peop  STERLING GREEN SO   01  L0038  B0008

>>> records 

[{'Name': 'Green', 'Desc:': 'STERLING GREEN SO', 'Sec:': '01', 'Lot:': 'L0038', 'Block:': 'B0008'}, {'Name': 'Apply', 'Desc:': 'STERLING GREEN SO', 'Sec:': '01', 'Lot:': 'L0038', 'Block:': 'B0008'}, {'Name': 'Range', 'Desc:': 'STERLING GREEN SO', 'Sec:': '01', 'Lot:': 'L0038', 'Block:': 'B0008'}, {'Name': 'Peop', 'Desc:': 'STERLING GREEN SO', 'Sec:': '01', 'Lot:': 'L0038', 'Block:': 'B0008'}]

CodePudding user response:

You can also use:

df.set_index('Name', inplace=True)
df =  df['Legal Description'].explode().apply(pd.Series).groupby(level=0).sum().reset_index()

OUTPUT

    Name              Desc: Sec:   Lot: Block:
0  Apply  STERLING GREEN SO   01  L0038  B0008
1  Green  STERLING GREEN SO   01  L0038  B0008
2   Peop  STERLING GREEN SO   01  L0038  B0008
3  Range  STERLING GREEN SO   01  L0038  B0008
  • Related