Extract a dict cell and transform. Expand a dict cell into multiple rows-CodePudding

I have a table with a dataset something like the following. Its 10 cols and 30 rows. The order of the records in the 'names' dictionary cell is important. There can be more than 1 of each Title.

    data =[
            ['4/18/2005',   [{'grantor': 'Company1'}, {'grantee': 'Company2'}]],
            ['3/29/2005',   [{'grantor': 'Company3'}, {'grantor': 'Company1'}, {'grantor': 'Company4'}, {'grantee': 'Company5'}, {'grantee': 'Company2'}]],
            ['3/29/2005',   [{'grantor': 'Company2'}, {'grantor': 'Company9'}, {'grantor': 'Apple'}, {'grantee': 'CompnayX'}, {'grantee': 'CompanyY'}, {'grantee': 'CompanyR'}]]
        ]
    df = pd.DataFrame(data=data, columns=['fdate', 'names'])

I would like to do the 2 following tasks:

Process a single row. I would like to read the Names cell for a selected row, and convert it into something like this (row 1)

    data = [
            [{'Title': 'grantor', 'Company': 'Company1'}],
            [{'Title': 'grantee', 'Company': 'Company2'}]
           ]
    df = pd.DataFrame(data)

Expand the entire data set

    data = [
            [{'fdate': '4/18/2005', 'pos':'1', 'Title': 'grantor', 'Company': 'Company1'}],
            [{'fdate': '4/18/2005', 'pos':'2', 'Title': 'grantee', 'Company': 'Company2'}],
            [{'fdate': '3/29/2005', 'pos':'1', 'Title': 'grantor', 'Company': 'Company3'}], 
            [{'fdate': '3/29/2005', 'pos':'2', 'Title': 'grantor', 'Company': 'Company1'}], 
            [{'fdate': '3/29/2005', 'pos':'3', 'Title': 'grantor', 'Company': 'Company4'}], 
            [{'fdate': '3/29/2005', 'pos':'4', 'Title': 'grantor', 'Company': 'Company5'}], 
            [{'fdate': '3/29/2005', 'pos':'5', 'Title': 'grantee', 'Company': 'Company2'}]
        ]
    df = pd.DataFrame(data)

CodePudding user response：

Here is my attempt at a solution:

I have labelled the parts where Task 1 is complete

import pprint
import pandas as pd

#Task 1
def process_row(datum, i):
    ret = []
    new_dict = {}  
    information = datum[i][1]
    for info in information:
        for (key, value) in info.items():
            new_dict['Title'] = key
            new_dict['Company'] = value
        ret.append([new_dict.copy()])
        new_dict.clear()
    return ret

#Need to create two dicts for Task 2
def process_date(datum, i):
    return {'fdate': datum[i][0]}


pd.set_option('display.max_colwidth', None)
data =[
            ['4/18/2005',   [{'grantor': 'Company1'}, {'grantee': 'Company2'}]],
            ['3/29/2005',   [{'grantor': 'Company3'}, {'grantor': 'Company1'}, {'grantor': 'Company4'}, {'grantee': 'Company5'}, {'grantee': 'Company2'}]],
            ['3/29/2005',   [{'grantor': 'Company2'}, {'grantor': 'Company9'}, {'grantor': 'Apple'}, {'grantee': 'CompnayX'}, {'grantee': 'CompanyY'}, {'grantee': 'CompanyR'}]]
        ]


#This line prints the result for Task 1.
#You should iterate if you need the function to apply to other rows.
print(process_row(data, 0))
#print(process_date(data, 0))

new_data = []
for i in range(len(data)):
    temp = process_row(data, i)
    
    for index, element in enumerate(temp):
        new_dict ={}
        new_dict.update(process_date(data, i))
        new_dict.update({'pos': index 1})
        for kv in element:
            new_dict.update(kv)
        new_data.append([new_dict])


pp=pprint.PrettyPrinter(indent=2)
pp.pprint(new_data)

Here is a picture of the output

The first line is the output for Task 1. The rest of the output is for Task 2, note that the it includes the third element in the original data as well.

Output: https://imgur.com/a/M6TvO70