I have a dataframe looking like this:
df
        a                    b
0  [1, 2]  ['first', 'second']
1      []                   []
2     [5]                  [1]
3      []                   []
4   ['a']                ['b']
5      []                   []
I would like to create a column (c) that holds, for each row, a dictionary built by zipping the values in columns (a) and (b).
If the values in columns (a) and (b) were not lists, I could use df.c = dict(zip(df.a, df.b)). However, since they are lists, it gives me an error. I can turn them into a list of tuples via list(zip(df.a, df.b)), but sadly a dictionary is needed.
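To make the failure concrete, here is a minimal sketch of what happens, assuming a small frame shaped like the one above (the three-row data is only illustrative). The column-level zip pairs each whole list in a with the whole list in b, so dict() tries to use a list as a key, and lists are not hashable:

```python
import pandas as pd

# Illustrative data only, shaped like the frame in the question
df = pd.DataFrame({"a": [[1, 2], [], [5]],
                   "b": [["first", "second"], [], [1]]})

error_message = None
try:
    # Each key would be an entire list from column a, which cannot be hashed
    dict(zip(df.a, df.b))
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)

# The column-level zip itself works, but it yields (list, list) pairs
pairs = list(zip(df.a, df.b))
print(pairs[0])
```

So the zip has to happen inside each row rather than across the two columns.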
Eventually, the output I am looking for is the following:
df
        a                    b                          c
0  [1, 2]  ['first', 'second']  {1: 'first', 2: 'second'}
1      []                   []                         {}
2     [5]                  [1]                     {5: 1}
3      []                   []                         {}
4   ['a']                ['b']                 {'a': 'b'}
5      []                   []                         {}
Any ideas on how to do this without looping over the rows of the dataframe one by one?
Both answers give the same output, thank you. After benchmarking on my own data, I accepted the faster one:
%timeit [dict(zip(ai, bi)) for ai, bi in zip(df['parameter_ids'], df['parameter_values'])]
7.76 ms ± 77 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit df[['parameter_ids', 'parameter_values']].apply(lambda row: dict(zip(*row)), axis=1)
140 ms ± 2.81 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
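For anyone who wants to repeat the comparison outside IPython, here is a rough sketch using the standard timeit module. The frame is synthetic (the row count and list contents are assumptions, not the data behind the numbers above), so the absolute timings will differ:

```python
import timeit

import pandas as pd

n_rows = 1000  # hypothetical size, not the original dataset
df = pd.DataFrame({
    "parameter_ids": [[i, i + 1] for i in range(n_rows)],
    "parameter_values": [[str(i), str(i + 1)] for i in range(n_rows)],
})


def with_comprehension():
    # Zip the two columns row by row, then zip each pair of lists into a dict
    return [dict(zip(ai, bi))
            for ai, bi in zip(df["parameter_ids"], df["parameter_values"])]


def with_apply():
    # Row-wise apply; each row unpacks into (ids, values)
    return df[["parameter_ids", "parameter_values"]].apply(
        lambda row: dict(zip(*row)), axis=1)


# Both approaches should produce the same dictionaries
assert with_comprehension() == with_apply().tolist()

for func in (with_comprehension, with_apply):
    seconds = timeit.timeit(func, number=10)
    print(f"{func.__name__}: {seconds / 10 * 1000:.2f} ms per call")
```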
CodePudding user response:
Use:
import pandas as pd
# setup
data = [[[1, 2], ['first', 'second']],
[[], []],
[[5], [1]],
[[], []],
[['a'], ['b']],
[[], []]]
df = pd.DataFrame(data=data, columns=["a", "b"])
df["c"] = [dict(zip(ai, bi)) for ai, bi in zip(df.a, df.b)]
print(df)
Output
        a                b                          c
0  [1, 2]  [first, second]  {1: 'first', 2: 'second'}
1      []               []                         {}
2     [5]              [1]                     {5: 1}
3      []               []                         {}
4     [a]              [b]                 {'a': 'b'}
5      []               []                         {}
CodePudding user response:
You could try df.apply:
>>> df['c'] = df.apply(lambda row: dict(zip(*row)), axis=1)
>>> df
        a                b                          c
0  [1, 2]  [first, second]  {1: 'first', 2: 'second'}
1      []               []                         {}
2     [5]              [1]                     {5: 1}
3      []               []                         {}
4     [a]              [b]                 {'a': 'b'}
5      []               []                         {}
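Note that dict(zip(*row)) unpacks every column in the row, so it relies on the frame holding exactly the two list columns. If the frame has other columns too, a variant that selects a and b explicitly might look like the sketch below (the extra "note" column is hypothetical, added only to show the case):

```python
import pandas as pd

df = pd.DataFrame({
    "a": [[1, 2], [], [5]],
    "b": [["first", "second"], [], [1]],
    "note": ["x", "y", "z"],  # hypothetical extra column
})

# Restrict to the two list columns and name them explicitly inside the lambda
df["c"] = df[["a", "b"]].apply(
    lambda row: dict(zip(row["a"], row["b"])), axis=1)

print(df["c"].tolist())
```

This is still a row-wise apply, so it should be expected to run slower than the list comprehension on larger frames.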