Assign one array column values to another array column in pandas dataframe

Time:10-26

I have a dataframe that contains an array column and a string column:

| string_col  | array_col            |
|-------------|----------------------|
| fruits      | ['apple', 'banaana'] |
| flowers     | ['rose', 'sunflower']|
| animals     | ['lion', 'tiger']    |

I want to repeat the string_col value once for each element in array_col, so that the output dataframe looks like this:

| string_col  | array_col            | new_col              |
|-------------|----------------------|----------------------|
| fruits      | ['apple', 'banaana'] |['fruits', 'fruits']  |
| flowers     | ['rose', 'sunflower']|['flowers', 'flowers']|
| animals     | ['lion', 'tiger']    |['animals', 'animals']|

CodePudding user response:

Use a list comprehension to repeat each string by the length of the list in the same row:

df['new_col'] = [[a] * len(b) for a, b in zip(df['string_col'], df['array_col'])]
print(df)
  string_col          array_col             new_col
0     fruits   [apple, banaana]    [fruits, fruits]
1    flowers  [rose, sunflower]  [flowers, flowers]
2    animals      [lion, tiger]  [animals, animals]
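
For reference, here is a self-contained version of the list comprehension approach. The way `df` is built below is an assumption reconstructed from the table in the question; only the assignment line comes from the answer itself.

```python
import pandas as pd

# Sample data rebuilt from the question's table (assumed construction).
df = pd.DataFrame({
    "string_col": ["fruits", "flowers", "animals"],
    "array_col": [["apple", "banaana"], ["rose", "sunflower"], ["lion", "tiger"]],
})

# Repeat each string once per element of the list in the same row.
df["new_col"] = [[a] * len(b) for a, b in zip(df["string_col"], df["array_col"])]

print(df)
```

Because the length is taken row by row, this should also work when the lists in array_col have different lengths.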

If the data is small and performance is not important, use DataFrame.apply:

df['new_col'] = df.apply(lambda x: [x['string_col']] * len(x['array_col']), axis=1)
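
A small sketch to check that the apply version agrees with the list comprehension, including on rows whose lists differ in length. The sample data and the names `via_apply` and `via_listcomp` are made up for illustration and are not part of the original answer.

```python
import pandas as pd

# Hypothetical data with lists of different lengths, including an empty one.
df = pd.DataFrame({
    "string_col": ["fruits", "flowers", "animals"],
    "array_col": [["apple"], ["rose", "sunflower", "tulip"], []],
})

# Row-wise apply: build one list per row.
via_apply = df.apply(lambda x: [x["string_col"]] * len(x["array_col"]), axis=1)

# The same result computed with the list comprehension.
via_listcomp = [[a] * len(b) for a, b in zip(df["string_col"], df["array_col"])]

print(via_apply.tolist() == via_listcomp)
```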

# Timings on 3k rows (the 3-row sample repeated 1000 times)
df = pd.concat([df] * 1000, ignore_index=True)


In [311]: %timeit df['new_col'] = [[a] * len(b) for a, b in zip(df['string_col'], df['array_col'])]
1.94 ms ± 97.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [312]: %timeit df['new_col'] = df.apply(lambda x: [x['string_col']] * len(x['array_col']) , axis=1)
40.4 ms ± 3.35 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [313]: %timeit df['new_col']=df[['string_col']].agg(list, axis=1)*df['array_col'].str.len()
132 ms ± 6.91 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
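
The `%timeit` lines above are IPython magics. Outside IPython, a rough equivalent can be sketched with the standard-library `timeit` module. The function names and the loop count of 10 are arbitrary choices for illustration, and absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Sample data rebuilt from the question (assumed construction), scaled to 3k rows.
base = pd.DataFrame({
    "string_col": ["fruits", "flowers", "animals"],
    "array_col": [["apple", "banaana"], ["rose", "sunflower"], ["lion", "tiger"]],
})
df = pd.concat([base] * 1000, ignore_index=True)


def with_list_comprehension():
    # Pure-Python loop over the two columns.
    return [[a] * len(b) for a, b in zip(df["string_col"], df["array_col"])]


def with_apply():
    # Row-wise apply; builds a Series object for every row.
    return df.apply(lambda x: [x["string_col"]] * len(x["array_col"]), axis=1)


loops = 10  # arbitrary; increase for more stable numbers
t_listcomp = timeit.timeit(with_list_comprehension, number=loops) / loops
t_apply = timeit.timeit(with_apply, number=loops) / loops

print(f"list comprehension: {t_listcomp * 1e3:.2f} ms per loop")
print(f"DataFrame.apply:    {t_apply * 1e3:.2f} ms per loop")
```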