Create column containing the dict of two pandas df columns containing lists

Time:11-03

I have a dataframe looking like this:

df
        a                      b
0   [1, 2]    ['first', 'second']
1       []                     []
2      [5]                    [1]
3       []                     []
4    ['a']                  ['b']
5       []                     []

I would like to create a column (c) that holds, for each row, a dictionary built by zipping the values in columns (a) and (b).

If the values in columns (a) and (b) were not lists, I could use df.c = dict(zip(df.a, df.b)). However, since they are lists, it gives me an error. I can turn each row into a tuple via list(zip(df.a, df.b)), but I need a dictionary.
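For context, here is a minimal sketch of why the column-level dict(zip(...)) fails. Zipping the two columns pairs whole lists, so dict() tries to use a list as a key, and lists are not hashable. The three-row frame is a shortened, illustrative version of the one above.

```python
import pandas as pd

# Shortened, illustrative version of the frame in the question.
df = pd.DataFrame({"a": [[1, 2], [], [5]],
                   "b": [["first", "second"], [], [1]]})

# Zipping the columns yields one (list, list) tuple per row.
pairs = list(zip(df["a"], df["b"]))

# dict() would have to use each list from column a as a key,
# which raises TypeError because lists are unhashable.
try:
    dict(pairs)
    error = None
except TypeError as exc:
    error = exc

print(pairs[0])
print(type(error).__name__, error)
```

The zip therefore has to happen inside each row, not across the two columns.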

Eventually, the output I am looking for is the following:

df
        a                      b                           c
0   [1, 2]    ['first', 'second']    {1: 'first', 2:'second'}
1       []                     []                          {}
2      [5]                    [1]                       {5:1}
3       []                     []                          {}
4    ['a']                  ['b']                   {'a':'b'}
5       []                     []                          {}

Any ideas for doing this without looping over the rows of the dataframe one by one?

Both answers give the same output, so thank you for both. After benchmarking on my actual data, I accepted the faster one, the list comprehension, which was roughly 18 times faster than apply:

%timeit [dict(zip(ai, bi)) for ai, bi in zip(df['parameter_ids'], df['parameter_values'])]
7.76 ms ± 77 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df[['parameter_ids', 'parameter_values']].apply(lambda row: dict(zip(*row)), axis=1)
140 ms ± 2.81 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
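Outside IPython, a similar comparison can be sketched with the standard timeit module. The data below is made up (the example rows repeated 1000 times), and the function names are illustrative. Absolute timings will differ from the figures above, which came from the asker's own columns.

```python
import timeit

import pandas as pd

# Made-up data: the example rows repeated to get a measurable run time.
df = pd.DataFrame({"a": [[1, 2], [], [5]] * 1000,
                   "b": [["first", "second"], [], [1]] * 1000})


def with_comprehension():
    # Row-wise zip in plain Python, one dict per row.
    return [dict(zip(ai, bi)) for ai, bi in zip(df["a"], df["b"])]


def with_apply():
    # Same result via DataFrame.apply over the two list columns.
    return df[["a", "b"]].apply(lambda row: dict(zip(*row)), axis=1)


# Check that both approaches agree before timing them.
same = with_comprehension() == with_apply().tolist()

t_comp = timeit.timeit(with_comprehension, number=5)
t_apply = timeit.timeit(with_apply, number=5)
print(f"same result: {same}")
print(f"comprehension: {t_comp:.4f} s, apply: {t_apply:.4f} s (5 runs each)")
```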

CodePudding user response:

Use:

import pandas as pd

# setup
data = [[[1, 2], ['first', 'second']],
        [[], []],
        [[5], [1]],
        [[], []],
        [['a'], ['b']],
        [[], []]]
df = pd.DataFrame(data=data, columns=["a", "b"])

df["c"] = [dict(zip(ai, bi)) for ai, bi in zip(df.a, df.b)]
print(df)

Output

        a                b                          c
0  [1, 2]  [first, second]  {1: 'first', 2: 'second'}
1      []               []                         {}
2     [5]              [1]                     {5: 1}
3      []               []                         {}
4     [a]              [b]                 {'a': 'b'}
5      []               []                         {}
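One thing to keep in mind with this approach: zip stops at the shorter input, so a row whose two lists differ in length silently loses the extra items. If that matters, a small helper can check the lengths first. The sketch below assumes such a check is wanted, and zip_to_dict is a hypothetical name, not part of pandas.

```python
import pandas as pd


def zip_to_dict(keys, values):
    # Hypothetical helper: refuse rows whose lists differ in length,
    # since zip() would otherwise truncate to the shorter one.
    if len(keys) != len(values):
        raise ValueError(f"length mismatch: {len(keys)} keys, {len(values)} values")
    return dict(zip(keys, values))


df = pd.DataFrame({"a": [[1, 2], [], ["a"]],
                   "b": [["first", "second"], [], ["b"]]})

df["c"] = [zip_to_dict(ai, bi) for ai, bi in zip(df["a"], df["b"])]
print(df)
```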

CodePudding user response:

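You could try df.apply, restricted to the two list columns so that any other columns in the frame (including c, if it already exists) are not passed to zip:

>>> df['c'] = df[['a', 'b']].apply(lambda row: dict(zip(*row)), axis=1)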
>>> df

        a                b                          c
0  [1, 2]  [first, second]  {1: 'first', 2: 'second'}
1      []               []                         {}
2     [5]              [1]                     {5: 1}
3      []               []                         {}
4     [a]              [b]                 {'a': 'b'}
5      []               []                         {}