How to append a dictionary from one column to another column in pandas

Time:11-29

I have a dataframe like below:

import pandas as pd

df = pd.DataFrame({'id' : [1,2,3],
                   'attributes' : [{'dd' : True, 'budget' : '35k'}, {'dd' : True, 'budget' : '25k'}, {'dd' : True, 'budget' : '40k'}],
                   'prod.attributes' : [{'img' : 'img1.url', 'name' : 'millennials'}, {'img' : 'img2.url', 'name' : 'single'}, {'img' : 'img3.url', 'name' : 'married'}]})

df
    id  attributes                      prod.attributes
0   1   {'dd': True, 'budget': '35k'}   {'img': 'img1.url', 'name': 'millennials'}
1   2   {'dd': True, 'budget': '25k'}   {'img': 'img2.url', 'name': 'single'}
2   3   {'dd': True, 'budget': '40k'}   {'img': 'img3.url', 'name': 'married'}

I have multiple such columns whose names end with the suffix attributes. I need to nest each of them inside the dictionaries of the actual attributes column, under a key taken from the column's prefix, as below:

op = pd.DataFrame({'id' : [1,2,3],
                   'attributes' : [{'dd' : True, 'budget' : '35k', 'prod' : {'img' : 'img1.url', 'name' : 'millennials'}},
                                   {'dd' : True, 'budget' : '25k', 'prod' : {'img' : 'img2.url', 'name' : 'single'}},
                                   {'dd' : True, 'budget' : '40k', 'prod' : {'img' : 'img3.url', 'name' : 'married'}}]})

op

    id  attributes
0   1   {'dd': True, 'budget': '35k', 'prod': {'img': 'img1.url', 'name': 'millennials'}}
1   2   {'dd': True, 'budget': '25k', 'prod': {'img': 'img2.url', 'name': 'single'}}
2   3   {'dd': True, 'budget': '40k', 'prod': {'img': 'img3.url', 'name': 'married'}}

I tried:

df['attributes'].apply(lambda x : x.update({'audience' : df['prod.attributes']}))

But this returns None for every row. Could someone please help me with this?

CodePudding user response:

A plain loop is more efficient than apply here. Iterate over both columns together and update the dictionaries in place:

for d1, d2 in zip(df['attributes'], df['prod.attributes']):
    d1['prod'] = d2

If you also want to remove the original column, use pop:

for d1, d2 in zip(df['attributes'], df.pop('prod.attributes')):
    d1['prod'] = d2

Updated dataframe:

   id                                                                         attributes
0   1  {'dd': True, 'budget': '35k', 'prod': {'img': 'img1.url', 'name': 'millennials'}}
1   2       {'dd': True, 'budget': '25k', 'prod': {'img': 'img2.url', 'name': 'single'}}
2   3      {'dd': True, 'budget': '40k', 'prod': {'img': 'img3.url', 'name': 'married'}}
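
As for why the attempt in the question shows only None: dict.update modifies the dictionary in place and returns None, so apply collects those None return values. The sketch below illustrates this on a small two-row frame; the frame and the 'tmp' key are made up for illustration and are not part of the original data.

```python
import pandas as pd

# Small illustrative frame (assumed data, not the asker's full example)
df = pd.DataFrame({
    'id': [1, 2],
    'attributes': [{'dd': True}, {'dd': True}],
    'prod.attributes': [{'name': 'millennials'}, {'name': 'single'}],
})

# dict.update() returns None, so apply() collects a None for every row ...
result = df['attributes'].apply(lambda d: d.update({'tmp': 1}))
print(result.tolist())

# ... even though each dictionary was still modified in place.
print(df['attributes'].iloc[0])

# Pairing the columns row by row and assigning the key directly
# avoids relying on a return value at all.
for d1, d2 in zip(df['attributes'], df['prod.attributes']):
    d1['prod'] = d2

print(df['attributes'].iloc[1])
```

Note that the attempt in the question also passes the whole df['prod.attributes'] Series as the value, so each dictionary would receive the entire column rather than the value from its own row; zip is what pairs the two columns row by row.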

Timings:

df = pd.concat([df]*10000, ignore_index=True)

%%timeit
for d1, d2 in zip(df['attributes'], df['prod.attributes']):
    d1['prod'] = d2
3.49 ms ± 137 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
df['attributes'] = [{**a, **{'prod' : b}} 
                      for a, b in zip(df['attributes'], df['prod.attributes'])]
11.3 ms ± 384 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
df.apply(lambda r: {**r['attributes'], **{'prod': r['prod.attributes']}}, axis=1)
173 ms ± 7.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

CodePudding user response:

Use ** to merge both dictionaries in a list comprehension; DataFrame.pop removes the column once it has been used:

df['attributes'] = [{**a, **{'prod' : b}} 
                      for a, b in zip(df['attributes'], df.pop('prod.attributes'))]
print (df)
   id                                         attributes
0   1  {'dd': True, 'budget': '35k', 'prod': {'img': ...
1   2  {'dd': True, 'budget': '25k', 'prod': {'img': ...
2   3  {'dd': True, 'budget': '40k', 'prod': {'img': ...
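
The question mentions several columns ending in attributes, while the answers above handle only prod.attributes. One possible way to generalize, sketched under the assumption that every such column is named '<prefix>.attributes' and that the prefix should become the nested key. The 'geo.attributes' column is hypothetical and added only to show a second column being folded in.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2],
    'attributes': [{'dd': True, 'budget': '35k'}, {'dd': True, 'budget': '25k'}],
    'prod.attributes': [{'name': 'millennials'}, {'name': 'single'}],
    # Hypothetical extra column, for illustration only
    'geo.attributes': [{'city': 'A'}, {'city': 'B'}],
})

suffix = '.attributes'
# Plain 'attributes' does not end with '.attributes', so it is not selected
extra_cols = [c for c in df.columns if c.endswith(suffix)]

for col in extra_cols:
    key = col[:-len(suffix)]   # 'prod.attributes' -> 'prod'
    values = df.pop(col)       # drop the column and keep its values
    # Build new dictionaries instead of mutating the existing ones
    df['attributes'] = [{**a, key: b} for a, b in zip(df['attributes'], values)]

print(df.columns.tolist())
print(df['attributes'].iloc[0])
```

Building new dictionaries with ** is slower than updating in place according to the timings in the first answer, but it leaves the original dictionary objects untouched, which can matter if they are referenced elsewhere.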