I have the following DataFrame which contains a column that is a list of dict items:
d = pd.DataFrame([
['Green', [{'Desc:': 'STERLING GREEN SO'}, {'Sec:': '01'}, {'Lot:': 'L0038'}, {'Block:': 'B0008'}]],
['Apply', [{'Desc:': 'STERLING GREEN SO'}, {'Sec:': '01'}, {'Lot:': 'L0038'}, {'Block:': 'B0008'}]],
['Range', [{'Desc:': 'STERLING GREEN SO'}, {'Sec:': '01'}, {'Lot:': 'L0038'}, {'Block:': 'B0008'}]],
['Peop', [{'Desc:': 'STERLING GREEN SO'}, {'Sec:': '01'}, {'Lot:': 'L0038'}, {'Block:': 'B0008'}]]
], columns=['Name', 'Legal Description'])
and I want to transform it to a simple DataFrame like so:
d = pd.DataFrame([
['Green', 'STERLING GREEN SO', '01', 'L0038', 'B0008'],
['Apply', 'STERLING GREEN SO', '01', 'L0038', 'B0008'],
['Range', 'STERLING GREEN SO', '01', 'L0038', 'B0008'],
['Peop', 'STERLING GREEN SO', '01', 'L0038', 'B0008']
], columns=['Name', 'Legal Description', 'Desc', 'Sec', 'Lot', 'Block'])
CodePudding user response:
IMO, the ideal solution would be to act upstream and get a properly formatted dictionary or dataframe.
The issue with your list of single-keyed dictionaries is that you have to merge them. You can use a dictionary comprehension for that and convert to Series:
d2 = d['Legal Description'].apply(lambda c:
pd.Series({next(iter(x.keys())).strip(':'):
next(iter(x.values())) for x in c})
)
Then join to the original dataframe:
d.drop(columns='Legal Description').join(d2)
output:
Name Desc Sec Lot Block
0 Green STERLING GREEN SO 01 L0038 B0008
1 Apply STERLING GREEN SO 01 L0038 B0008
2 Range STERLING GREEN SO 01 L0038 B0008
3 Peop STERLING GREEN SO 01 L0038 B0008
CodePudding user response:
If possible, you should wrangle your data before creating the DataFrame. It's faster than reshaping the DataFrame after being created. For instance, something like
data = [
['Green', [{'Desc:': 'STERLING GREEN SO'}, {'Sec:': '01'}, {'Lot:': 'L0038'}, {'Block:': 'B0008'}]],
['Apply', [{'Desc:': 'STERLING GREEN SO'}, {'Sec:': '01'}, {'Lot:': 'L0038'}, {'Block:': 'B0008'}]],
['Range', [{'Desc:': 'STERLING GREEN SO'}, {'Sec:': '01'}, {'Lot:': 'L0038'}, {'Block:': 'B0008'}]],
['Peop', [{'Desc:': 'STERLING GREEN SO'}, {'Sec:': '01'}, {'Lot:': 'L0038'}, {'Block:': 'B0008'}]]
]
records = []
for name, legal_desc in data:
rec = {'Name': name}
rec.update(x for d in legal_desc for x in d.items())
records.append(rec)
d = pd.DataFrame(records)
Output:
>>> d
Name Desc: Sec: Lot: Block:
0 Green STERLING GREEN SO 01 L0038 B0008
1 Apply STERLING GREEN SO 01 L0038 B0008
2 Range STERLING GREEN SO 01 L0038 B0008
3 Peop STERLING GREEN SO 01 L0038 B0008
>>> records
[{'Name': 'Green', 'Desc:': 'STERLING GREEN SO', 'Sec:': '01', 'Lot:': 'L0038', 'Block:': 'B0008'}, {'Name': 'Apply', 'Desc:': 'STERLING GREEN SO', 'Sec:': '01', 'Lot:': 'L0038', 'Block:': 'B0008'}, {'Name': 'Range', 'Desc:': 'STERLING GREEN SO', 'Sec:': '01', 'Lot:': 'L0038', 'Block:': 'B0008'}, {'Name': 'Peop', 'Desc:': 'STERLING GREEN SO', 'Sec:': '01', 'Lot:': 'L0038', 'Block:': 'B0008'}]
CodePudding user response:
You can also use:
df.set_index('Name', inplace=True)
df = df['Legal Description'].explode().apply(pd.Series).groupby(level=0).sum().reset_index()
OUTPUT
Name Desc: Sec: Lot: Block:
0 Apply STERLING GREEN SO 01 L0038 B0008
1 Green STERLING GREEN SO 01 L0038 B0008
2 Peop STERLING GREEN SO 01 L0038 B0008
3 Range STERLING GREEN SO 01 L0038 B0008