Creating a dictionary in Python from a pandas DataFrame

I have a pandas DataFrame with three columns: Lot Number, Price and Image Id. I need to create a JSON file in the following format:

{'1200-1300':{'LOT3551': [9082327, 9082329],
              'LOT3293':[982832, 898762, 887654]
             },
 '1300-1400': {'LOT2219': [776542, 119234]
              }
}

where the first-level keys of the dictionary ('1200-1300', '1300-1400', etc.) are the price ranges. The keys inside each price range are the Lot Numbers that fall in that range, and their values are the values from the Image Id column.

This is the code I have tried so far:

for idx, gid_list in enumerate(df['AV Gid']):
    data = df.iloc[idx]
    lot_no = data['Lot Number']
    price = data['Final Price']
    gids = gid_list.replace("[","").replace("]","").split(",")
    if price >= 1000 and price < 1100:
        pr = '10-11'
    elif price >= 1100 and price < 1200:
        pr = '11-12'
    else:
        continue
    print(pr)
        
    if lot_no in sample_dict[pr]:
        sample_dict[pr][lot_no].append(gid)
    else:
        #print(pr)
        sample_dict[pr][lot_no] = []

where sample_dict has the price ranges as keys. The problem with the above code is that it also fills the keys of the other price ranges.

CodePudding user response：

If df is your dataframe you could try:

data = {
    f"{p}-{p + 100}": ser.to_dict()
    for p, ser in df.assign(Price=df["Price"].floordiv(100).mul(100))
                    .set_index("Lot Number")
                    .groupby("Price")["Image Id"]
}

Replace column Price with its .floordiv(100).mul(100) equivalent
Set column Lot Number as index
.groupby() the resulting dataframe by column Price, grab the column Image Id as series, and place the results in a dictionary with:
- strings f"{p}-{p + 100}" as keys (p is the floored price of the group), and
- the group series converted to a dictionary as values
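
The steps above can be traced one at a time. This is only a sketch: the sample frame is made up for illustration, and the intermediate names floored and indexed are not part of the original one-liner.

```python
import pandas as pd

# Hypothetical sample data, not taken from the question.
df = pd.DataFrame({
    "Lot Number": ["LOT1", "LOT2", "LOT3"],
    "Price": [1200, 1250, 1310],
    "Image Id": [[1], [2, 3], [4, 5, 6]],
})

# Step 1: floor each price to the lower bound of its 100-wide range.
floored = df["Price"].floordiv(100).mul(100)

# Step 2: replace Price with the floored value and index by lot number.
indexed = df.assign(Price=floored).set_index("Lot Number")

# Step 3: each group is a Series of image-id lists indexed by lot number,
# so .to_dict() maps lot number -> image ids for that price range.
data = {}
for p, ser in indexed.groupby("Price")["Image Id"]:
    data[f"{p}-{p + 100}"] = ser.to_dict()

print(data)
```

Each lower bound p is included in its range and p + 100 is excluded, so a price of exactly 1300 lands in '1300-1400'.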

Result for

data = {"Lot Number":["LOT1", "LOT2", "LOT3", "LOT4", "LOT5", "LOT6"],
        "Price": [1200, 1250, 10, 20, 30, 1300],
        "Image Id": [list(range(n)) for n in range(1, 7)]}
df = pd.DataFrame(data)

  Lot Number  Price            Image Id
0       LOT1   1200                 [0]
1       LOT2   1250              [0, 1]
2       LOT3     10           [0, 1, 2]
3       LOT4     20        [0, 1, 2, 3]
4       LOT5     30     [0, 1, 2, 3, 4]
5       LOT6   1300  [0, 1, 2, 3, 4, 5]

{'0-100': {'LOT3': [0, 1, 2], 'LOT4': [0, 1, 2, 3], 'LOT5': [0, 1, 2, 3, 4]},
 '1200-1300': {'LOT1': [0], 'LOT2': [0, 1]},
 '1300-1400': {'LOT6': [0, 1, 2, 3, 4, 5]}}

You could do the same in a single pandas chain:

data = (
    df.assign(
        Price=df["Price"].floordiv(100).mul(100).map(lambda p: f"{p}-{p + 100}")
    )
    .set_index("Lot Number")
    .groupby("Price")["Image Id"]
    .agg(dict)
    .to_dict()
)
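
Since the question asks for a JSON file, the resulting dictionary still has to be written out. A minimal sketch with the standard json module follows; the file name lots_by_price.json and the temporary directory are assumptions for illustration, and the literal data below just mimics the shape built above. If the image ids were NumPy integers rather than plain Python ints, json.dump would raise a TypeError and they would need converting first.

```python
import json
import os
import tempfile

# Hypothetical result in the same shape as the dictionary built above.
data = {
    "1200-1300": {"LOT3551": [9082327, 9082329]},
    "1300-1400": {"LOT2219": [776542, 119234]},
}

# Hypothetical output location; any writable path would do.
out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, "lots_by_price.json")

# Write the nested dictionary as JSON.
with open(path, "w") as fh:
    json.dump(data, fh, indent=2)

# Read it back to confirm the round trip preserves the structure.
with open(path) as fh:
    restored = json.load(fh)
```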

CodePudding user response：

I would do something like

price_ranges = {'10-11': [1000, 1099], '11-12': [1100, 1199], '0-10': [0, 999]}

# One separate inner dict per label (dict.fromkeys(labels, {}) would share a single dict)
sample_dict = {label: {} for label in price_ranges}

def look_for_range(price, price_ranges=price_ranges):
    # Return the label of the first range containing price, or None if no range matches
    for label, (low, high) in price_ranges.items():
        if low <= price <= high:
            return label
    return None

def compose_range_dict(row, sample_dict=sample_dict):
    # Store the row's image ids under its lot number, inside the matching price range
    range_label = look_for_range(row['PRICE'])
    if range_label is not None:
        sample_dict[range_label].update({row['LOTNUMBER']: row['IMAGE_ID']})

Then

import pandas as pd

# dictionary of lists
testdict = {'LOTNUMBER':['LOT3551', 'LOT3520', 'LOT3574', 'LOT3572'],
            'PRICE': [1250, 1150, 10, 900],
            'IMAGE_ID':[[9082327, 9082328, 9082329],
                        [9081865, 9081866, 9081867],
                        [9083230, 9083231, 9083232],
                        [9082985, 9082986, 9082988]]}

testdf = pd.DataFrame(testdict)

testdf.apply(compose_range_dict, axis=1)

# LOT3551 (price 1250) is skipped because no range in price_ranges covers it.
# >>> sample_dict
# {'10-11': {},
#  '11-12': {'LOT3520': [9081865, 9081866, 9081867]},
#  '0-10': {'LOT3574': [9083230, 9083231, 9083232], 'LOT3572': [9082985, 9082986, 9082988]}}
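
One thing to watch for when initialising sample_dict: dict.fromkeys(keys, {}) binds every key to the same inner dict, so a lot added under one price range shows up under all of them. That may also be what is happening in the question, assuming sample_dict was built that way there (the question does not show it). A small sketch of the difference, using made-up values:

```python
labels = ["10-11", "11-12", "0-10"]

# dict.fromkeys reuses the one {} object as the value for every key.
shared = dict.fromkeys(labels, {})
shared["11-12"]["LOT3520"] = [9081865]
print(shared["10-11"])    # the lot appears here too

# A dict comprehension evaluates {} once per key, giving independent dicts.
separate = {label: {} for label in labels}
separate["11-12"]["LOT3520"] = [9081865]
print(separate["10-11"])  # still empty
```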