I have a pandas dataframe with three columns: Lot Number, Price and Image Id. I have to create a JSON file with the following format:
{'1200-1300': {'LOT3551': [9082327, 9082329],
               'LOT3293': [982832, 898762, 887654]},
 '1300-1400': {'LOT2219': [776542, 119234]}}
where the first-level keys of the dictionary ('1200-1300', '1300-1400', etc.) are the price ranges. The keys inside each price range are the Lot Numbers that fall in that range, and their values are the values from the Image Id column.
I have tried the following code so far:
for idx, gid_list in enumerate(df['AV Gid']):
    data = df.iloc[idx]
    lot_no = data['Lot Number']
    price = data['Final Price']
    gids = gid_list.replace("[","").replace("]","").split(",")
    if price >= 1000 and price < 1100:
        pr = '10-11'
    elif price >= 1100 and price < 1200:
        pr = '11-12'
    else:
        continue
    print(pr)
    if lot_no in sample_dict[pr]:
        sample_dict[pr][lot_no].append(gid)
    else:
        #print(pr)
        sample_dict[pr][lot_no] = []
where sample_dict has the price ranges as keys. The problem with the above code is that it fills the other price ranges' keys as well.
CodePudding user response:
If df is your dataframe you could try:
data = {
    f"{p}-{p + 100}": ser.to_dict()
    for p, ser in df.assign(Price=df["Price"].floordiv(100).mul(100))
                    .set_index("Lot Number")
                    .groupby("Price")["Image Id"]
}
- Replace column Price with its .floordiv(100).mul(100) equivalent.
- Set column Lot Number as index.
- .groupby() the resulting dataframe by column Price, grab the column Image Id as series, and place the results in a dictionary with:
  - strings f"{p}-{p + 100}" as keys (p is the floored price of the group), and
  - the group series converted to a dictionary as values.
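The bucketing step on its own can be sketched as follows. This is only an illustration with made-up prices (the names prices, floored and labels are not from the answer); it shows how flooring to the nearest 100 turns each price into the lower bound of its range.

```python
import pandas as pd

# Hypothetical sample prices, chosen only to illustrate the bucketing.
prices = pd.Series([1200, 1250, 10, 1399])

# Integer-divide by 100, then multiply back: each price becomes
# the lower bound of its 100-wide range.
floored = prices.floordiv(100).mul(100)

# Build the range label from the lower bound.
labels = [f"{p}-{p + 100}" for p in floored]
print(labels)
```

Every price from 1200 up to (but not including) 1300 ends up with the same label, which is what lets groupby collect the lots per range.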
Result for
import pandas as pd

data = {"Lot Number": ["LOT1", "LOT2", "LOT3", "LOT4", "LOT5", "LOT6"],
        "Price": [1200, 1250, 10, 20, 30, 1300],
        "Image Id": [list(range(n)) for n in range(1, 7)]}
df = pd.DataFrame(data)
  Lot Number  Price            Image Id
0       LOT1   1200                 [0]
1       LOT2   1250              [0, 1]
2       LOT3     10           [0, 1, 2]
3       LOT4     20        [0, 1, 2, 3]
4       LOT5     30     [0, 1, 2, 3, 4]
5       LOT6   1300  [0, 1, 2, 3, 4, 5]
is
{'0-100': {'LOT3': [0, 1, 2], 'LOT4': [0, 1, 2, 3], 'LOT5': [0, 1, 2, 3, 4]},
 '1200-1300': {'LOT1': [0], 'LOT2': [0, 1]},
 '1300-1400': {'LOT6': [0, 1, 2, 3, 4, 5]}}
You could do the same in one Pandas-go:
data = (
    df.assign(
        Price=df["Price"].floordiv(100).mul(100).map(lambda p: f"{p}-{p + 100}")
    )
    .set_index("Lot Number")
    .groupby("Price")["Image Id"]
    .agg(dict)
    .to_dict()
)
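Since the question asks for a JSON file, here is a minimal sketch of the last step using the standard json module. It assumes the Image Id column holds plain Python lists of ints (as in the sample frame above); the small dataframe and the file name lots_by_price.json are placeholders for illustration only.

```python
import json
import pandas as pd

# Hypothetical sample data; Image Id is assumed to hold Python lists of ints.
df = pd.DataFrame({
    "Lot Number": ["LOT1", "LOT2", "LOT6"],
    "Price": [1200, 1250, 1300],
    "Image Id": [[0], [0, 1], [0, 1, 2]],
})

# Same grouping as the dict comprehension above.
data = {
    f"{p}-{p + 100}": ser.to_dict()
    for p, ser in df.assign(Price=df["Price"].floordiv(100).mul(100))
                    .set_index("Lot Number")
                    .groupby("Price")["Image Id"]
}

# Serialize to a JSON string.
json_text = json.dumps(data, indent=2)
print(json_text)

# To write a file instead (the path is a placeholder):
# with open("lots_by_price.json", "w") as fh:
#     json.dump(data, fh, indent=2)
```

If the Image Id values are stored as strings such as "[1, 2]" or as NumPy arrays, they would need converting to lists first, because json cannot serialize NumPy types directly.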
CodePudding user response:
I would do something like
price_ranges = {'10-11': [1000, 1099], '11-12': [1100, 1199], '0-10': [0, 999]}
# One separate dict per range. dict.fromkeys(price_ranges, {}) would share a
# single dict between all keys, so every range would receive every lot.
sample_dict = {label: {} for label in price_ranges}

def look_for_range(price, price_ranges=price_ranges):
    # Return the label of the first range containing price, or None if none does.
    for label, (low, high) in price_ranges.items():
        if low <= price <= high:
            return label

def compose_range_dict(row, sample_dict=sample_dict):
    # Store the row's image ids under its lot number in the matching range.
    range_label = look_for_range(row['PRICE'])
    if range_label is not None:
        sample_dict[range_label].update({row['LOTNUMBER']: row['IMAGE_ID']})
Then
import pandas as pd
# dictionary of lists
testdict = {'LOTNUMBER': ['LOT3551', 'LOT3520', 'LOT3574', 'LOT3572'],
            'PRICE': [1250, 1150, 10, 900],
            'IMAGE_ID': [[9082327, 9082328, 9082329],
                         [9081865, 9081866, 9081867],
                         [9083230, 9083231, 9083232],
                         [9082985, 9082986, 9082988]]}
testdf = pd.DataFrame(testdict)
testdf.apply(compose_range_dict, axis=1)
# LOT3551 (price 1250) matches none of the ranges and is skipped.
# >>> sample_dict
# {'10-11': {},
#  '11-12': {'LOT3520': [9081865, 9081866, 9081867]},
#  '0-10': {'LOT3574': [9083230, 9083231, 9083232], 'LOT3572': [9082985, 9082986, 9082988]}}
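A note on the symptom described in the question (lots showing up under every price range): that is what happens when the outer dictionary is built with dict.fromkeys and a mutable default, because every key then points at the same inner dict. The question does not show how sample_dict was created, so this is an assumption about the cause. A minimal sketch of the difference, with illustrative names:

```python
labels = ["10-11", "11-12"]

# dict.fromkeys reuses the SAME {} object as the value for every key.
shared = dict.fromkeys(labels, {})
shared["10-11"]["LOT1"] = [1, 2]
print(shared["11-12"])    # the lot shows up here as well

# A dict comprehension evaluates {} once per key, giving independent dicts.
separate = {label: {} for label in labels}
separate["10-11"]["LOT1"] = [1, 2]
print(separate["11-12"])  # still empty
```

collections.defaultdict(dict) is another way to get an independent inner dict per range without listing the labels up front.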