In pandas, I have two columns: one holds dictionaries (or nulls) and the other holds numbers.
Where the dictionary column is not null, is there a time-efficient way to replace the value of a particular key with the value from the numeric column?
import pandas as pd

# 32 rows: A counts 1..32; B and C hold a dict in rows 0 and 16, None elsewhere
df = pd.DataFrame(
    zip(
        range(1, 33),
        [{'m': 34, 'n': 42}] + [None] * 15 + [{'m': 54, 'n': 24}] + [None] * 15,
        [{'m': 1, 'n': 42}] + [None] * 15 + [{'m': 17, 'n': 24}] + [None] * 15,
    ),
    columns=['A', 'B', 'C'],
)
In the example above, column C shows the desired output: wherever the dictionary column B is not null, the value of the key 'm' is replaced by the value in column A.
CodePudding user response:
Apply only on the relevant rows:
mask = df['C'].notnull()
df.loc[mask, 'C'] = df[mask].apply(lambda r: dict(r['C'], m=r['A']), axis=1)
You can make it faster with raw=True. Just make sure the column order stays the same, because each row is then passed as a plain array and you index by position instead of by label.
df.loc[mask, 'C'] = df[mask].apply(lambda r: dict(r[2], m=r[0]), axis=1, raw=True)
Result:
    A   B                   C
0   1   {'m': 34, 'n': 42}  {'m': 1, 'n': 42}
1   2   None                None
2   3   None                None
3   4   None                None
4   5   None                None
5   6   None                None
6   7   None                None
7   8   None                None
8   9   None                None
9   10  None                None
10  11  None                None
11  12  None                None
12  13  None                None
13  14  None                None
14  15  None                None
15  16  None                None
16  17  {'m': 54, 'n': 24}  {'m': 17, 'n': 24}
17  18  None                None
18  19  None                None
19  20  None                None
20  21  None                None
21  22  None                None
22  23  None                None
23  24  None                None
24  25  None                None
25  26  None                None
26  27  None                None
27  28  None                None
28  29  None                None
29  30  None                None
30  31  None                None
31  32  None                None
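If the hard-coded positions in the raw=True version worry you, one option is to look the positions up by column name first. This is only a sketch on a made-up three-row frame with the same A/B/C layout as the question; pos_a and pos_c are names introduced here for illustration.

```python
import pandas as pd

# Small illustrative frame (not the question's data), same column layout.
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [{'m': 34, 'n': 42}, None, {'m': 54, 'n': 24}],
    'C': [{'m': 34, 'n': 42}, None, {'m': 54, 'n': 24}],
})

mask = df['C'].notnull()

# Resolve positions by name so a reordered frame does not silently
# pick up the wrong column when rows arrive as plain arrays.
pos_a = df.columns.get_loc('A')
pos_c = df.columns.get_loc('C')

# With raw=True each row is a NumPy array, so index by position.
df.loc[mask, 'C'] = df[mask].apply(
    lambda r: dict(r[pos_c], m=r[pos_a]), axis=1, raw=True
)

print(df)
```

The lookup costs almost nothing, since it runs once before apply rather than once per row.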
CodePudding user response:
Drop the null values, then zip the columns A and B inside a list comprehension and update the dictionaries:
s = df.dropna(subset=['B'])  # a list for subset also works on older pandas versions
df.loc[s.index, 'C'] = [{**d, 'm': a} for a, d in zip(s['A'], s['B'])]
Result:
    A   B                   C
0   1   {'m': 34, 'n': 42}  {'m': 1, 'n': 42}
1   2   None                None
2   3   None                None
3   4   None                None
4   5   None                None
5   6   None                None
6   7   None                None
7   8   None                None
8   9   None                None
9   10  None                None
10  11  None                None
11  12  None                None
12  13  None                None
13  14  None                None
14  15  None                None
15  16  None                None
16  17  {'m': 54, 'n': 24}  {'m': 17, 'n': 24}
17  18  None                None
18  19  None                None
19  20  None                None
20  21  None                None
21  22  None                None
22  23  None                None
23  24  None                None
24  25  None                None
25  26  None                None
26  27  None                None
27  28  None                None
28  29  None                None
29  30  None                None
30  31  None                None
31  32  None                None
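A side note on the two expressions used in the answers: dict(d, m=a) and {**d, 'm': a} both build a new dictionary with 'm' overridden, so the dictionary you started from is left untouched. A minimal pure-Python sketch (the variable names are illustrative, not from the answers):

```python
original = {'m': 34, 'n': 42}

# Both forms return a copy with 'm' replaced.
via_dict = dict(original, m=1)
via_unpack = {**original, 'm': 1}

print(via_dict == via_unpack)  # True
print(original)                # {'m': 34, 'n': 42}

# The keyword form only works when the key is a string that is a valid
# identifier; the unpacking form accepts any hashable key.
with_int_key = {**original, 7: 'seven'}
print(with_int_key)            # {'m': 34, 'n': 42, 7: 'seven'}
```

So the choice between the two is mostly a matter of taste, unless the key you need to replace is not a plain identifier-like string.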