I am using Python 2.7.13 and observing different behaviour between two versions of pandas.
import pandas as pd
df = pd.DataFrame({'cluster': ['5', '5', '5', '5', '5', '5'],
                   'mdse_item_i': ['23627102', '23627102', '23627102',
                                   '23627102', '23627102', '23627102'],
                   'predPriceQty': ['35.675543', '35.675543', '35.675543',
                                    '35.675543', '35.675543', '35.675543'],
                   'schedule_i': ['56', '56', '56', '56', '56', '56'],
                   'segment_id': ['4123', '4123', '4144', '4161', '4295', '4454'],
                   'wk': ['1', '1', '1', '1', '1', '1']})
df.set_index(['segment_id', 'cluster'], inplace=True)
df.apply(lambda row:
         {row['schedule_i']: {row['mdse_item_i']: {row['wk']: row['predPriceQty']}}},
         axis=1)
Below are the results with the two versions of pandas:
Pandas: 0.24.2
segment_id cluster
4123 5 {u'56': {u'23627102': {u'1': u'35.675543'}}}
5 {u'56': {u'23627102': {u'1': u'35.675543'}}}
4144 5 {u'56': {u'23627102': {u'1': u'35.675543'}}}
4161 5 {u'56': {u'23627102': {u'1': u'35.675543'}}}
4295 5 {u'56': {u'23627102': {u'1': u'35.675543'}}}
4454 5 {u'56': {u'23627102': {u'1': u'35.675543'}}}
dtype: object
Pandas: 0.20.1
mdse_item_i predPriceQty schedule_i wk
segment_id cluster
4123 5 NaN NaN NaN NaN
5 NaN NaN NaN NaN
4144 5 NaN NaN NaN NaN
4161 5 NaN NaN NaN NaN
4295 5 NaN NaN NaN NaN
4454 5 NaN NaN NaN NaN
I am not sure why the values are being set to NaN and would appreciate any help with this.
Ultimately, my aim is to build a dictionary with to_dict(), as below. I want to avoid iterating over rows because my dataset has more than 100k rows:
df.apply(lambda row:
         {row['schedule_i']: {row['mdse_item_i']: {row['wk']: row['predPriceQty']}}},
         axis=1).to_dict()
I also need this to work with pandas 0.20, as I cannot upgrade to pandas 0.24 due to some constraints.
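One way to sidestep the version difference is to not use DataFrame.apply at all and build the per-row dicts from the columns directly. This is a minimal sketch, assuming the same column names as above; the names nested, result and d are illustrative. It is still a Python-level loop, but it skips constructing a Series object for every row the way apply(axis=1) does, so I would expect it to be cheaper on 100k rows (not benchmarked).

```python
import pandas as pd

df = pd.DataFrame({'cluster': ['5'] * 6,
                   'mdse_item_i': ['23627102'] * 6,
                   'predPriceQty': ['35.675543'] * 6,
                   'schedule_i': ['56'] * 6,
                   'segment_id': ['4123', '4123', '4144', '4161', '4295', '4454'],
                   'wk': ['1'] * 6})
df.set_index(['segment_id', 'cluster'], inplace=True)

# Zip the four columns and build one nested dict per row.
# No DataFrame.apply is involved, so pandas never has to decide
# how to interpret a dict returned from the applied function.
nested = [{s: {m: {w: p}}}
          for s, m, w, p in zip(df['schedule_i'], df['mdse_item_i'],
                                df['wk'], df['predPriceQty'])]

# Object Series of dicts, indexed like the original frame.
result = pd.Series(nested, index=df.index)

# Series.to_dict keys the dict by (segment_id, cluster) tuples.
# For the duplicated ('4123', '5') key, the later row overwrites the earlier one.
d = result.to_dict()
print(d[('4144', '5')])
```

The intermediate Series matches the 0.24 output shown above, so the same to_dict() call can be chained onto it.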
Answer:
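You can call to_dict(orient='index') and then build the nested structure with a dictionary comprehension. Because orient='index' requires a unique index, the duplicated index rows are dropped first (keeping the first occurrence):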
di = df[~df.index.duplicated(keep='first')].to_dict(orient='index')
{k: {v['schedule_i']: {v['mdse_item_i']: {v['wk']: v['predPriceQty']}}}
 for k, v in di.items()}
# Output:
{('4123', '5'): {'56': {'23627102': {'1': '35.675543'}}},
('4144', '5'): {'56': {'23627102': {'1': '35.675543'}}},
('4161', '5'): {'56': {'23627102': {'1': '35.675543'}}},
('4295', '5'): {'56': {'23627102': {'1': '35.675543'}}},
('4454', '5'): {'56': {'23627102': {'1': '35.675543'}}}}
I am fairly sure this works with pandas 0.20 as well.
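Dropping duplicated index rows is harmless here because the two ('4123', '5') rows are identical, but if such rows differed in schedule_i, mdse_item_i or wk, one of them would be lost. A sketch that merges them instead, assuming merged nested dicts are the desired result (when two rows agree on every level, the later predPriceQty wins). Like the comprehension above, it is a plain Python loop over the rows; the name merged is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'cluster': ['5'] * 6,
                   'mdse_item_i': ['23627102'] * 6,
                   'predPriceQty': ['35.675543'] * 6,
                   'schedule_i': ['56'] * 6,
                   'segment_id': ['4123', '4123', '4144', '4161', '4295', '4454'],
                   'wk': ['1'] * 6})
df.set_index(['segment_id', 'cluster'], inplace=True)

merged = {}
# orient='records' does not require a unique index, so move the index
# back into columns and walk the rows as plain dicts.
for rec in df.reset_index().to_dict(orient='records'):
    key = (rec['segment_id'], rec['cluster'])
    # setdefault creates each nesting level on first use and reuses it
    # for repeated keys, so rows sharing a key are merged, not dropped.
    weeks = (merged.setdefault(key, {})
                   .setdefault(rec['schedule_i'], {})
                   .setdefault(rec['mdse_item_i'], {}))
    weeks[rec['wk']] = rec['predPriceQty']

print(merged[('4123', '5')])
```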