Different behaviour in pandas 0.20 and 0.24 when creating a dictionary with a lambda function

I am using Python 2.7.13 and observing different behaviour between two versions of pandas.

import pandas as pd
df = pd.DataFrame({'cluster': ['5', '5', '5', '5', '5', '5'],
         'mdse_item_i': ['23627102',
                         '23627102',
                         '23627102',
                         '23627102',
                         '23627102',
                         '23627102'],
         'predPriceQty': ['35.675543',
                         '35.675543',
                         '35.675543',
                         '35.675543',
                         '35.675543',
                         '35.675543'],
         'schedule_i': ['56', '56', '56', '56', '56', '56'],
         'segment_id': ['4123', '4123', '4144', '4161', '4295', '4454'],
         'wk': ['1', '1', '1', '1', '1', '1']} )
df.set_index(['segment_id', 'cluster'], inplace=True)

df.apply(lambda row:
                 {row['schedule_i']: {row['mdse_item_i']: {row['wk']: row['predPriceQty']}}},
                 axis=1)

Below are the results with each version of pandas:

Pandas: 0.24.2

segment_id  cluster
4123        5          {u'56': {u'23627102': {u'1': u'35.675543'}}}
            5          {u'56': {u'23627102': {u'1': u'35.675543'}}}
4144        5          {u'56': {u'23627102': {u'1': u'35.675543'}}}
4161        5          {u'56': {u'23627102': {u'1': u'35.675543'}}}
4295        5          {u'56': {u'23627102': {u'1': u'35.675543'}}}
4454        5          {u'56': {u'23627102': {u'1': u'35.675543'}}}
dtype: object

Pandas: 0.20.1

                    mdse_item_i  predPriceQty  schedule_i  wk
segment_id cluster
4123       5                NaN           NaN         NaN NaN
           5                NaN           NaN         NaN NaN
4144       5                NaN           NaN         NaN NaN
4161       5                NaN           NaN         NaN NaN
4295       5                NaN           NaN         NaN NaN
4454       5                NaN           NaN         NaN NaN

I am not sure why the values are being set to NaN, and I would appreciate any help with this.
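The 0.20 output keeps the original column names and fills every cell with NaN. This looks as if that version tried to expand each returned dict into a row aligned on the column labels, and since keys such as '56' match no column, nothing lined up. That is a plausible reading of the output, not a verified explanation. One way to avoid depending on how `DataFrame.apply` interprets a dict result is to skip `apply` entirely: build the nested dicts with a list comprehension over the columns and wrap them in a `pd.Series` with the original index. This is a minimal sketch that I have reasoned about but not run on pandas 0.20; it only uses `zip` over columns and the plain `Series` constructor.

```python
import pandas as pd

# Same data as in the question, built more compactly
df = pd.DataFrame({
    'cluster': ['5'] * 6,
    'mdse_item_i': ['23627102'] * 6,
    'predPriceQty': ['35.675543'] * 6,
    'schedule_i': ['56'] * 6,
    'segment_id': ['4123', '4123', '4144', '4161', '4295', '4454'],
    'wk': ['1'] * 6,
})
df = df.set_index(['segment_id', 'cluster'])

# One nested dict per row, built without DataFrame.apply, so pandas never
# has to decide whether a dict result should be expanded into columns.
# Assumption: iterating over columns with zip behaves the same on pandas 0.20.
nested = [
    {sched: {item: {wk: qty}}}
    for sched, item, wk, qty in zip(df['schedule_i'], df['mdse_item_i'],
                                    df['wk'], df['predPriceQty'])
]

# Re-attach the (segment_id, cluster) index to get a Series of dicts
result = pd.Series(nested, index=df.index)
print(result)
```

The printed Series should have the same shape as the 0.24 output above: six rows, one dict per row.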

Ultimately, my aim is to create a dictionary using to_dict(), as below. I want to avoid iterating over rows because my dataset has more than 100k rows.

df.apply(lambda row:
                 {row['schedule_i']: {row['mdse_item_i']: {row['wk']: row['predPriceQty']}}},
                 axis=1).to_dict()
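If the dictionary is the end goal, the intermediate Series is not strictly needed. The sketch below builds the dict directly by zipping the index with the relevant columns; iterating a MultiIndex yields tuples such as `('4123', '5')`, which become the keys. This still loops in Python, as `apply(axis=1)` does internally, but it avoids creating a Series object per row. Note that rows sharing an index label collapse into one key (later rows overwrite earlier ones), so the six rows here give five keys, the same as calling `.to_dict()` on a Series with a duplicated index. Treat this as an illustrative alternative, not a benchmarked one.

```python
import pandas as pd

# Same data as in the question, built more compactly
df = pd.DataFrame({
    'cluster': ['5'] * 6,
    'mdse_item_i': ['23627102'] * 6,
    'predPriceQty': ['35.675543'] * 6,
    'schedule_i': ['56'] * 6,
    'segment_id': ['4123', '4123', '4144', '4161', '4295', '4454'],
    'wk': ['1'] * 6,
})
df = df.set_index(['segment_id', 'cluster'])

# Each MultiIndex entry is a (segment_id, cluster) tuple; duplicate
# tuples overwrite each other, so only one entry per pair survives.
result = {
    key: {sched: {item: {wk: qty}}}
    for key, sched, item, wk, qty in zip(
        df.index, df['schedule_i'], df['mdse_item_i'],
        df['wk'], df['predPriceQty'])
}
print(result)
```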

Also, I need to achieve this with pandas 0.20, as I cannot upgrade to pandas 0.24 due to some constraints.

Answer:

You can just use to_dict and then a dictionary comprehension:

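# keep only the first row for each (segment_id, cluster) pair, since dict keys must be unique
di = df[~df.index.duplicated(keep='first')].to_dict(orient='index')
# rebuild each row's {column: value} dict into the nested structure
result = {k: {v['schedule_i']: {v['mdse_item_i']: {v['wk']: v['predPriceQty']}}}
          for k, v in di.items()}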
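To see why the comprehension can index `v['schedule_i']` and friends, it helps to look at the intermediate objects. The sketch below shows the boolean mask produced by `Index.duplicated(keep='first')` (only the second `('4123', '5')` row is flagged) and the shape of `di`: with `orient='index'`, each `(segment_id, cluster)` tuple maps to a dict of that row's remaining column values.

```python
import pandas as pd

# Same data as in the question, built more compactly
df = pd.DataFrame({
    'cluster': ['5'] * 6,
    'mdse_item_i': ['23627102'] * 6,
    'predPriceQty': ['35.675543'] * 6,
    'schedule_i': ['56'] * 6,
    'segment_id': ['4123', '4123', '4144', '4161', '4295', '4454'],
    'wk': ['1'] * 6,
})
df = df.set_index(['segment_id', 'cluster'])

# True for every repeat of an index label after its first occurrence
mask = df.index.duplicated(keep='first')
print(mask.tolist())

# orient='index': {(segment_id, cluster): {column: value, ...}, ...}
di = df[~mask].to_dict(orient='index')
print(di[('4123', '5')])
```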

# Output:
{('4123', '5'): {'56': {'23627102': {'1': '35.675543'}}},
 ('4144', '5'): {'56': {'23627102': {'1': '35.675543'}}},
 ('4161', '5'): {'56': {'23627102': {'1': '35.675543'}}},
 ('4295', '5'): {'56': {'23627102': {'1': '35.675543'}}},
 ('4454', '5'): {'56': {'23627102': {'1': '35.675543'}}}}

I'm fairly sure this works with pandas 0.20 as well.