Pandas, splitting a StringArray into an Array of StringArray

I have a pandas DataFrame in which one column holds an array of strings in each row, as shown below.

|column1                                                  |
|:--------------------------------------------------------|
|['abc<t>def<t>ghi', 'jkl<t>mno<t>pqr']                   |
|['abc<t>def<t>ghi', 'jkl<t>mno<t>pqr', 'def<t>pqr<t>jkl']|
|['ghi<t>jkl<t>pqr']                                      |

I need to split each string so that the column becomes an array of arrays, and the output looks like the table below.

|column2                                                              |
|:--------------------------------------------------------------------|
|[['abc', 'def', 'ghi'], ['jkl', 'mno', 'pqr']]                       |
|[['abc', 'def', 'ghi'], ['jkl', 'mno', 'pqr'], ['def', 'pqr', 'jkl']]|
|[['ghi', 'jkl', 'pqr']]                                              |

I have tried using split as shown below, but this returns NaN for every row.

dataset["column1"].str.split("<t>")

CodePudding user response：

Solution: dataset['column1'].apply(lambda x: [i.split("<t>") for i in x])

Explanation: apply(...) applies the lambda function to each element in the series dataset['column1']. The lambda function performs splitting (.split("<t>")) for each element in the list.
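
As a minimal, self-contained sketch of this answer, the snippet below rebuilds the three sample rows from the question and applies the same lambda. The way the frame is constructed and the name `column2` for the result are assumptions taken from the question's tables, not something the answer specifies.

```python
import pandas as pd

# Sample data copied from the question; each cell holds a list of strings.
dataset = pd.DataFrame({
    "column1": [
        ["abc<t>def<t>ghi", "jkl<t>mno<t>pqr"],
        ["abc<t>def<t>ghi", "jkl<t>mno<t>pqr", "def<t>pqr<t>jkl"],
        ["ghi<t>jkl<t>pqr"],
    ]
})

# Split every string inside each cell's list on the "<t>" delimiter.
dataset["column2"] = dataset["column1"].apply(
    lambda cell: [s.split("<t>") for s in cell]
)

print(dataset["column2"].tolist())
```

The plain `.str.split("<t>")` call from the question most likely returns NaN because each cell is a list rather than a string, so there is nothing for the string method to split; `apply` sidesteps this by handling the list itself.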

CodePudding user response：

You can try this code. It splits each string on <t> and then converts each resulting list to its string representation, so that the printed output shows the required apostrophes. The function format performs both operations.

import pandas as pd

df = pd.DataFrame([[['abc<t>def<t>ghi', 'jkl<t>mno<t>pqr']], [['abc<t>def<t>ghi', 'jkl<t>mno<t>pqr', 'def<t>pqr<t>jkl']]], columns=['name'])

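# Note: the name format shadows Python's built-in format().
def format(value):
    # Split each string on '<t>', then convert each resulting list to its string form.
    return [str(item) for item in [i.split('<t>') for i in value]]

# Apply format to every cell of the 'name' column.
df['new_name'] = df['name'].apply(format)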
print(df)

Output:

                                                name                                           new_name
0                 [abc<t>def<t>ghi, jkl<t>mno<t>pqr]     [['abc', 'def', 'ghi'], ['jkl', 'mno', 'pqr']]
1  [abc<t>def<t>ghi, jkl<t>mno<t>pqr, def<t>pqr<t...  [['abc', 'def', 'ghi'], ['jkl', 'mno', 'pqr'],...
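
One caveat worth checking: because `format` wraps each split result in `str(...)`, the cells of `new_name` hold strings that only look like lists when printed. The sketch below reuses the first row of the data above to show the difference; the variable names `as_strings` and `as_lists` are illustrative and not part of the answer.

```python
value = ['abc<t>def<t>ghi', 'jkl<t>mno<t>pqr']

# With str(...), as in the answer above: each inner element is a string.
as_strings = [str(parts) for parts in [s.split('<t>') for s in value]]

# Without str(...): each inner element stays a real list.
as_lists = [s.split('<t>') for s in value]

print(type(as_strings[0]).__name__)  # str
print(type(as_lists[0]).__name__)    # list
```

If the goal is an array of arrays that can be indexed further (for example `cell[0][1]`), dropping the `str(...)` call is probably the safer choice.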