Dataframe for example:
data =[['a056cf7d3aeadfc7c4d5d6e37f424554', 11089528586286, [{'rate': 0.0625, 'price': '0.52', 'title': 'IL STATE TAX', 'price_set': {'shop_money': {'amount': '0.52', 'currency_code': 'USD'}, 'presentment_money': {'amount': '0.52', 'currency_code': 'USD'}}, 'channel_liable': False}, {'rate': 0.0, 'price': '0.00', 'title': 'IL COUNTY TAX', 'price_set': {'shop_money': {'amount': '0.00', 'currency_code': 'USD'}, 'presentment_money': {'amount': '0.00', 'currency_code': 'USD'}}, 'channel_liable': False}, {'rate': 0.0, 'price': '0.00', 'title': 'IL CITY TAX', 'price_set': {'shop_money': {'amount': '0.00', 'currency_code': 'USD'}, 'presentment_money': {'amount': '0.00', 'currency_code': 'USD'}}, 'channel_liable': False}, {'rate': 0.0, 'price': '0.00', 'title': 'IL SPECIAL TAX', 'price_set': {'shop_money': {'amount': '0.00', 'currency_code': 'USD'}, 'presentment_money': {'amount': '0.00', 'currency_code': 'USD'}}, 'channel_liable': False}]], ['a056cf7d3aeadfc7c4d5d6e37f424554', 11089528651822, [{'rate': 0.0625, 'price': '0.75', 'title': 'IL STATE TAX', 'price_set': {'shop_money': {'amount': '0.75', 'currency_code': 'USD'}, 'presentment_money': {'amount': '0.75', 'currency_code': 'USD'}}, 'channel_liable': False}, {'rate': 0.0, 'price': '0.00', 'title': 'IL COUNTY TAX', 'price_set': {'shop_money': {'amount': '0.00', 'currency_code': 'USD'}, 'presentment_money': {'amount': '0.00', 'currency_code': 'USD'}}, 'channel_liable': False}, {'rate': 0.0, 'price': '0.00', 'title': 'IL CITY TAX', 'price_set': {'shop_money': {'amount': '0.00', 'currency_code': 'USD'}, 'presentment_money': {'amount': '0.00', 'currency_code': 'USD'}}, 'channel_liable': False}, {'rate': 0.0, 'price': '0.00', 'title': 'IL SPECIAL TAX', 'price_set': {'shop_money': {'amount': '0.00', 'currency_code': 'USD'}, 'presentment_money': {'amount': '0.00', 'currency_code': 'USD'}}, 'channel_liable': False}]], ['a056cf7d3aeadfc7c4d5d6e37f424554', 11089528717358, [{'rate': 0.0625, 'price': '0.48', 'title': 'IL STATE TAX', 'price_set': {'shop_money': {'amount': '0.48', 'currency_code': 'USD'}, 'presentment_money': {'amount': '0.48', 'currency_code': 'USD'}}, 'channel_liable': False}, {'rate': 0.0, 'price': '0.00', 'title': 'IL COUNTY TAX', 'price_set': {'shop_money': {'amount': '0.00', 'currency_code': 'USD'}, 'presentment_money': {'amount': '0.00', 'currency_code': 'USD'}}, 'channel_liable': False}, {'rate': 0.0, 'price': '0.00', 'title': 'IL CITY TAX', 'price_set': {'shop_money': {'amount': '0.00', 'currency_code': 'USD'}, 'presentment_money': {'amount': '0.00', 'currency_code': 'USD'}}, 'channel_liable': False}, {'rate': 0.0, 'price': '0.00', 'title': 'IL SPECIAL TAX', 'price_set': {'shop_money': {'amount': '0.00', 'currency_code': 'USD'}, 'presentment_money': {'amount': '0.00', 'currency_code': 'USD'}}, 'channel_liable': False}]], ['a8911732ef329fccffd45fd4ff945b36', 11805558440126, [{'rate': 0.2, 'price': '35.42', 'title': 'AT VAT', 'price_set': {'shop_money': {'amount': '35.42', 'currency_code': 'EUR'}, 'presentment_money': {'amount': '35.42', 'currency_code': 'EUR'}}, 'channel_liable': False}]], ['5506eecfc2ffa7f2b1832e1ccab121fa', 12019803881631, [{'rate': 0.2, 'price': '0.00', 'title': 'GB VAT', 'price_set': {'shop_money': {'amount': '0.00', 'currency_code': 'GBP'}, 'presentment_money': {'amount': '0.00', 'currency_code': 'GBP'}}, 'channel_liable': False}]]]
headers = [['token','line','data']]
df = pd.DataFrame(data, columns =headers)
df
The "data" column stored as json in list. How it is possible to extract price for "data" column for each "token" and "line" ? When I try :
s = df.explode('data', ignore_index=True)
dfn=s.join(pd.DataFrame([*s.pop('data')], index=s.index))
Get error : TypeError: explode() missing 1 required positional argument: 'column'
Also can not to use json.loads because of this list and not string...
By this example :Pandas expand json field across records I get error: TypeError: the JSON object must be str, bytes or bytearray, not Series
Any ideas?
CodePudding user response:
Your data is not JSON if its type is a list and not str. The following code will produce a series of lists containing the prices.
df['data'].apply(lambda l: [item['price'] for item in l])
CodePudding user response:
hope this help iterate through each element of each element
for d in data:
i = 0
for r in d:
if i == 2:
for m in r:
print(m['price'])
i = i 1
if i use the i counter it's to avoid looping on the int elements token and line only continue to go further deep with the data list
CodePudding user response:
I am sure there is a better way, but since your object is regularly nested, I would simply parse each entry and build a list of non-nested dictionaries you can later use to build the DataFrame.
I am assuming you want to port all the data into a single DataFrame.
new_data = []
for entry in data:
d = {'label':[], 'line':[], 'rate':[], 'price':[], 'title':[],'shop_money_amount':[], 'shop_money_currency_code':[], 'presentment_money_amount':[], 'presentment_money_currency_code':[], 'channel_liable':[]}
for item in entry[2]: #list of dicts
d['label'].append(entry[0])
d['line'].append(entry[1])
d['rate'].append(item['rate'])
d['price'].append(item['price'])
d['title'].append(item['title'])
d['shop_money_amount'].append(item['price_set']['shop_money']['amount'])
d['shop_money_currency_code'].append(item['price_set']['shop_money']['currency_code'])
d['presentment_money_amount'].append(item['price_set']['presentment_money']['amount'])
d['presentment_money_currency_code'].append(item['price_set']['presentment_money']['currency_code'])
d['channel_liable'].append(item['channel_liable'])
new_data.append(d)
df = pd.concat([pd.DataFrame(d) for d in new_data])