I have the following data frame:
df = pd.DataFrame({0: {0: 'EFG',
1: 'EFG',
2: 'EFG',
3: 'EFG',
4: 'EFG',
5: 'EFG',
6: 'EFG',
7: 'ABC',
8: 'EFG',
9: 'EFG',
10: 'EFG',
11: 'EFG',
12: 'EFG',
13: 'EFG',
14: 'EFG',
15: 'EFG',
16: 'EFG',
17: 'EFG',
18: 'EFG',
19: 'EFG',
20: 'ABC',
21: 'EFG',
22: 'EFG',
23: 'EFG',
24: 'EFG',
25: 'EFG',
26: 'EFG',
27: 'EFG',
28: 'EFG',
29: 'EFG'},
1: {0: 'DS',
1: 'DS',
2: 'DS',
3: 'Q',
4: 'DS',
5: 'DS',
6: 'DS',
7: 'DS',
8: 'DS',
9: 'DS',
10: 'DS',
11: 'DS',
12: 'DS',
13: 'DS',
14: 'DS',
15: 'DS',
16: 'DS',
17: 'DS',
18: 'DS',
19: 'DS',
20: 'DS',
21: 'DS',
22: 'DAS',
23: 'DAS',
24: 'DAS',
25: 'DS',
26: 'DS',
27: 'Q',
28: 'DS',
29: 'DS'},
2: {0: '321',
1: '900',
2: '900',
3: '900',
4: '1000',
5: '1000',
6: '1000',
7: '444',
8: '900',
9: '900',
10: '321',
11: '900',
12: '1000',
13: '900',
14: '321',
15: '321',
16: '1000',
17: '1000',
18: '1000',
19: '1000',
20: '444',
21: '900',
22: '12345',
23: '12345',
24: '321',
25: '321',
26: '12345',
27: '1000',
28: '900',
29: '321'}})
and the following dict:
{('ABC', 'AS', '1000'): 123,
('ABC', 'AS', '444'): 321,
('ABC', 'AS', '231341'): 421,
('ABC', 'AS', '888'): 412,
('ABC', 'AS', '087'): 4215,
('ABC', 'DAS', '1000'): 3415,
('ABC', 'DAS', '444'): 4215,
('ABC', 'DAS', '231341'): 3214,
('ABC', 'DAS', '888'): 321,
('ABC', 'DAS', '087'): 111,
('ABC', 'Q', '1000'): 222,
('ABC', 'Q', '444'): 3214,
('ABC', 'Q', '231341'): 421,
('ABC', 'Q', '888'): 321,
('ABC', 'Q', '087'): 41,
('ABC', 'DS', '1000'): 421,
('ABC', 'DS', '444'): 421,
('ABC', 'DS', '231341'): 321,
('ABC', 'DS', '888'): 41,
('ABC', 'DS', '087'): 41,
('EFG', 'AS', '1000'): 213,
('EFG', 'AS', '900'): 32,
('EFG', 'AS', '12345'): 1,
('EFG', 'AS', '321'): 3,
('EFG', 'DAS', '1000'): 421,
('EFG', 'DAS', '900'): 321,
('EFG', 'DAS', '12345'): 123,
('EFG', 'DAS', '321'): 31,
('EFG', 'Q', '1000'): 41,
('EFG', 'Q', '900'): 51,
('EFG', 'Q', '12345'): 321,
('EFG', 'Q', '321'): 321,
('EFG', 'DS', '1000'): 41,
('EFG', 'DS', '900'): 51,
('EFG', 'DS', '12345'): 321,
('EFG', 'DS', '321'): 1}
This is of course only a sample of my df, and I have multiple dicts like this one. I am looking for the fastest way to map this dict to this dataframe, based on the 3 columns. I need to run this multiple times during ,y analysis, so I am looking for the optimal solution in term of running time.
what i tried already:
def map_d(a,b,c):
return d1[(a,b,c)]
res = [map_d(*a) for a in tuple(zip(df[0], df[1], df[2]))]
23.1 µs ± 335 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
performance on real data:
170 ms ± 5.47 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
res = df.apply(lambda x: d1[(x[0],x[1],x[2])],axis=1)
742 µs ± 16.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
performance on real data:
7.27 s ± 201 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
I am looking for fastest solutions (I can build the dict differently if needed) Thanks
CodePudding user response:
For improve performance in larger DataFrames is possible use Series
with DataFrame.join
:
df = pd.DataFrame(df)
df1 = df.join(pd.Series(d, name='new'), on=[0,1,2])
print (df1.head(10))
0 1 2 new
0 EFG DS 321 1
1 EFG DS 900 51
2 EFG DS 900 51
3 EFG Q 900 51
4 EFG DS 1000 41
5 EFG DS 1000 41
6 EFG DS 1000 41
7 ABC DS 444 421
8 EFG DS 900 51
9 EFG DS 900 51
Another idea (similar solution like in question):
res = [d[(a,b,c)] for a,b,c in zip(df[0], df[1], df[2])]
Or:
res = [d[(a,b,c)] for a,b,c in df[[0,1,2]].to_numpy()]
CodePudding user response:
Convert your dataframe as MultiIndex
and extract values from your dict (Series
):
out = pd.Series(d).loc[pd.MultiIndex.from_frame(df)] \
.rename('values').reset_index()
Output:
>>> out
0 1 2 values
0 EFG DS 321 1
1 EFG DS 900 51
2 EFG DS 900 51
3 EFG Q 900 51
4 EFG DS 1000 41
5 EFG DS 1000 41
6 EFG DS 1000 41
7 ABC DS 444 421
8 EFG DS 900 51
9 EFG DS 900 51
10 EFG DS 321 1
11 EFG DS 900 51
12 EFG DS 1000 41
13 EFG DS 900 51
14 EFG DS 321 1
15 EFG DS 321 1
16 EFG DS 1000 41
17 EFG DS 1000 41
18 EFG DS 1000 41
19 EFG DS 1000 41
20 ABC DS 444 421
21 EFG DS 900 51
22 EFG DAS 12345 123
23 EFG DAS 12345 123
24 EFG DAS 321 31
25 EFG DS 321 1
26 EFG DS 12345 321
27 EFG Q 1000 41
28 EFG DS 900 51
29 EFG DS 321 1
CodePudding user response:
You can apply tuple
constructor for each row and then map
the dictionary. Note that your df
is not a DataFrame. It needs to be cast into a DataFrame constructor.
df = pd.DataFrame(df)
df['vals'] = df.apply(tuple, axis=1).map(dct)
Output:
0 1 2 vals
0 EFG DS 321 1
1 EFG DS 900 51
2 EFG DS 900 51
3 EFG Q 900 51
4 EFG DS 1000 41
...
25 EFG DS 321 1
26 EFG DS 12345 321
27 EFG Q 1000 41
28 EFG DS 900 51
29 EFG DS 321 1