I have a dataframe like below:
import pandas as pd

df = pd.DataFrame({'id' : [1,2,3],
                   'attributes' : [{'dd' : True, 'budget' : '35k'}, {'dd' : True, 'budget' : '25k'}, {'dd' : True, 'budget' : '40k'}],
                   'prod.attributes' : [{'img' : 'img1.url', 'name' : 'millennials'}, {'img' : 'img2.url', 'name' : 'single'}, {'img' : 'img3.url', 'name' : 'married'}]})
df
id attributes prod.attributes
0 1 {'dd': True, 'budget': '35k'} {'img': 'img1.url', 'name': 'millennials'}
1 2 {'dd': True, 'budget': '25k'} {'img': 'img2.url', 'name': 'single'}
2 3 {'dd': True, 'budget': '40k'} {'img': 'img3.url', 'name': 'married'}
I have multiple such columns. I need to merge every column whose name has attributes as a suffix into the actual attributes column, as below:
op = pd.DataFrame({'id' : [1,2,3],
                   'attributes' : [{'dd' : True, 'budget' : '35k', 'prod' : {'img' : 'img1.url', 'name' : 'millennials'}},
                                   {'dd' : True, 'budget' : '25k', 'prod' : {'img' : 'img2.url', 'name' : 'single'}},
                                   {'dd' : True, 'budget' : '40k', 'prod' : {'img' : 'img3.url', 'name' : 'married'}}]})
op
id attributes
0 1 {'dd': True, 'budget': '35k', 'prod': {'img': 'img1.url', 'name': 'millennials'}}
1 2 {'dd': True, 'budget': '25k', 'prod': {'img': 'img2.url', 'name': 'single'}}
2 3 {'dd': True, 'budget': '40k', 'prod': {'img': 'img3.url', 'name': 'married'}}
I tried:
df['attributes'].apply(lambda x : x.update({'audience' : df['prod.attributes']}))
But I am getting None for every row. Could someone please help me with this?
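For context on why the attempt above yields None: dict.update changes the dictionary in place and returns None, so apply collects one None per row. The lambda also passes the whole df['prod.attributes'] column rather than the value from the matching row. A minimal sketch with a plain dictionary (the variable names d and result are only illustrative):

```python
# dict.update mutates the dictionary in place and returns None,
# which is why apply() ends up with a column of None values.
d = {'dd': True, 'budget': '35k'}
result = d.update({'prod': {'img': 'img1.url', 'name': 'millennials'}})

print(result)  # the return value of update()
print(d)       # the dictionary itself now carries the 'prod' key
```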
CodePudding user response:
A plain loop that updates the dictionaries in place is more efficient than apply:
for d1, d2 in zip(df['attributes'], df['prod.attributes']):
    d1['prod'] = d2
If you want to remove the original column, use pop:
for d1, d2 in zip(df['attributes'], df.pop('prod.attributes')):
    d1['prod'] = d2
Updated dataframe:
id attributes
0 1 {'dd': True, 'budget': '35k', 'prod': {'img': 'img1.url', 'name': 'millennials'}}
1 2 {'dd': True, 'budget': '25k', 'prod': {'img': 'img2.url', 'name': 'single'}}
2 3 {'dd': True, 'budget': '40k', 'prod': {'img': 'img3.url', 'name': 'married'}}
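Since the question mentions multiple such columns, the same loop could be repeated for every column whose name ends in .attributes. This is only a sketch under assumptions: the geo.attributes column is made up for illustration, and the key is assumed to be the column name with the suffix stripped.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2],
    'attributes': [{'dd': True}, {'dd': True}],
    'prod.attributes': [{'name': 'millennials'}, {'name': 'single'}],
    # 'geo.attributes' is a hypothetical second column, not from the question
    'geo.attributes': [{'city': 'A'}, {'city': 'B'}],
})

suffix = '.attributes'
# Collect the names first, because columns are popped inside the loop.
# The plain 'attributes' column does not end with '.attributes', so it is skipped.
extra_cols = [c for c in df.columns if c.endswith(suffix)]

for col in extra_cols:
    key = col[:-len(suffix)]  # 'prod.attributes' -> 'prod'
    for d1, d2 in zip(df['attributes'], df.pop(col)):
        d1[key] = d2          # in-place update, as in the loop above

print(df)
```

Each dictionary in attributes then holds one nested entry per merged column, and the merged columns are dropped from the frame.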
Timings on a larger frame:
df = pd.concat([df]*10000, ignore_index=True)
%%timeit
for d1, d2 in zip(df['attributes'], df['prod.attributes']):
    d1['prod'] = d2
3.49 ms ± 137 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
df['attributes'] = [{**a, **{'prod' : b}}
                    for a, b in zip(df['attributes'], df['prod.attributes'])]
11.3 ms ± 384 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
df.apply(lambda r: {**r['attributes'], **{'prod': r['prod.attributes']}}, axis=1)
173 ms ± 7.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
CodePudding user response:
Use ** to merge both dictionaries in a list comprehension; DataFrame.pop removes the column once it has been used:
df['attributes'] = [{**a, **{'prod' : b}}
                    for a, b in zip(df['attributes'], df.pop('prod.attributes'))]
print(df)
id attributes
0 1 {'dd': True, 'budget': '35k', 'prod': {'img': ...
1 2 {'dd': True, 'budget': '25k', 'prod': {'img': ...
2 3 {'dd': True, 'budget': '40k', 'prod': {'img': ...
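One practical difference from an in-place update is that ** builds a new dictionary for each row, so the original dictionaries are left untouched. The merge is shallow, though: the nested dictionary is shared, not copied. A small sketch with plain dictionaries (the names a, b and merged are illustrative):

```python
a = {'dd': True, 'budget': '35k'}
b = {'img': 'img1.url', 'name': 'millennials'}

# {**a, 'prod': b} is an equivalent, slightly shorter spelling of {**a, **{'prod': b}}
merged = {**a, 'prod': b}

print('prod' in a)          # a itself was not modified
print(merged['prod'] is b)  # the nested dict is the same object, not a copy
```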