Dataframe column is list of strings: how to apply transformation to each element?

Time:11-15

Assume a dataframe in which each cell of a column holds one list of 0 to n strings:

import pandas as pd

df = pd.DataFrame({'col_w_list':[['c/100/a/111','c/100/a/584','c/100/a/324'],
                                 ['c/100/a/327'],
                                 ['c/100/a/324','c/100/a/327'],
                                 ['c/100/a/111','c/100/a/584','c/100/a/999'],
                                 ['c/100/a/584','c/100/a/327','c/100/a/999']
                                 ]})

How would I go about transforming the column (either in place or as a new column) if all I wanted was the last set of digits of each string, meaning:

|  | target_still_list     |
|--|-----------------------|
|0 | ['111', '584', '324'] |
|1 | ['327']               |
|2 | ['324', '327']        |
|3 | ['111', '584', '999'] |
|4 | ['584', '327', '999'] |

I know how to handle this one list at a time:

from os import path
ls = ['c/100/a/111','c/100/a/584','c/100/a/324']
new_ls = [path.split(x)[1] for x in ls]
# or, alternatively
new_ls = [x.split('/')[3] for x in ls]

But I have failed at doing the same over a dataframe. For instance

df['target_still_list'] = df['col_w_list'].apply([lambda x: x.split('/')[3] for x in df['col_w_list']])

Throws an AttributeError at me.

CodePudding user response:

How to apply transformation to each element?

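For a data frame, you can use pandas.DataFrame.applymap, which applies a function to every cell (it is deprecated since pandas 2.1 in favor of pandas.DataFrame.map).

For a series, you can use pandas.Series.map or pandas.Series.apply; apply is what your posted attempt already uses.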


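Your error is caused by what you pass to apply. It expects a single function that receives one element x of the column, but your code passes a list of lambdas built by a comprehension. Each element x is itself a list, which has no split method, hence the AttributeError. Inside the lambda you can iterate directly over the items of x and split each string.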
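To make the cause concrete, here is a minimal sketch in plain Python, without pandas. The variable `row` is a hypothetical stand-in for one cell of the column, and `error_name` and `last_parts` are names introduced only for this illustration.

```python
# One cell of the column: a list of strings (hypothetical sample value).
row = ['c/100/a/111', 'c/100/a/584']

# Calling split on the list itself fails, because list has no split method.
try:
    row.split('/')
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__

# Splitting each string inside the list works.
last_parts = [item.split('/')[-1] for item in row]

print(error_name)
print(last_parts)
```

The same reasoning applies per row when the function is handed to apply: it receives the whole list, so the per-string work has to happen inside it.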

The corrected code is:

df['target_still_list'] = df['col_w_list'].apply(lambda x: [item.split('/')[-1] for item in x])
# or 
# df['target_still_list'] = df['col_w_list'].map(lambda x: [item.split('/')[-1] for item in x])
# or (NOTE: This assignment works only if df has only one column.)
# df['target_still_list'] = df.applymap(lambda x: [item.split('/')[-1] for item in x])
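For reuse, the lambda can be pulled out into a named function. The following is a sketch rather than the only way to do it: `last_segments` is a hypothetical helper name, the sample data is shortened, and an empty list is included on the assumption that "0 to n strings" means some rows may be empty. It uses `rsplit('/', 1)` so that only the final separator is split on.

```python
import pandas as pd


def last_segments(items):
    """Return the text after the final '/' for every string in items."""
    return [item.rsplit('/', 1)[-1] for item in items]


# Shortened sample data; the empty list stands for a row with 0 strings.
df = pd.DataFrame({'col_w_list': [['c/100/a/111', 'c/100/a/584', 'c/100/a/324'],
                                  ['c/100/a/327'],
                                  []]})

# Series.apply calls last_segments once per row, passing that row's list.
df['target_still_list'] = df['col_w_list'].apply(last_segments)

result = df['target_still_list'].tolist()
print(result)
```

A named function is also easier to test on a single list before applying it to the whole column.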