Convert pandas series of strings to a series of lists

Time:01-06

For instance, I have a DataFrame like the one below:

import pandas as pd
df = pd.DataFrame({"col":['AM RLC, F C', 'AM/F C', 'DM','D C']})

    |col
----|--------------|
0   |"AM RLC, F C" |
1   |"AM/F C"      |
2   |"DM"          |
3   |"D C"         |

My expected output is as follows:

    |col
----|-----------------------|
0   |["AM", "RLC", "F", "C"]|
1   |["AM", "F", "C"]       |
2   |["DM"]                 |
3   |["D", "C"]             |

",", "/" and whitespace should all be treated as delimiters.

The answers to this question do not solve my problem.

CodePudding user response:

I would use str.split or str.findall:

df['col'] = df['col'].str.split(r'[\s,/]+')

# or
df['col'] = df['col'].str.findall(r'\w+')

Output:

               col
0  [AM, RLC, F, C]
1       [AM, F, C]
2             [DM]
3           [D, C]

Regex:

[\s,/]+  # one or more whitespace, comma, or slash characters

\w+      # one or more word characters
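The two patterns can be tried outside pandas with the standard `re` module. This is only a sketch: the `edge` string is a made-up input, not one from the question, and it is there to show where splitting and matching differ.

```python
import re

DELIMS = re.compile(r"[\s,/]+")  # one or more whitespace, comma, or slash characters
WORDS = re.compile(r"\w+")       # one or more word characters

rows = ["AM RLC, F C", "AM/F C", "DM", "D C"]

# On the question's data both approaches give the same tokens.
split_result = [DELIMS.split(s) for s in rows]
findall_result = [WORDS.findall(s) for s in rows]
print(split_result)
print(findall_result)

# Hypothetical input with delimiters at both ends.
edge = " AM/F C,"
edge_split = DELIMS.split(edge)      # splitting leaves empty strings at the ends
edge_findall = WORDS.findall(edge)   # matching only returns the tokens
print(edge_split)
print(edge_findall)
```

If the data may start or end with a delimiter, findall is probably the more forgiving choice, since split would leave empty strings in the list.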

CodePudding user response:

Try this:

df["col"] = df["col"].apply(lambda x: x.replace(",", " ").replace("/", " ").split())
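One caveat when chaining replace and split: splitting on a literal " " keeps empty strings when spaces repeat, and deleting a comma outright joins two tokens that had no space between them. A minimal sketch in plain Python; the function names and the `sample` string are hypothetical and chosen only to trigger both effects.

```python
def tokenize_literal(text):
    # Delete commas, turn slashes into spaces, split on a single literal space.
    return text.replace(",", "").replace("/", " ").split(" ")


def tokenize_whitespace(text):
    # Turn both delimiters into spaces, then split on any run of whitespace.
    return text.replace(",", " ").replace("/", " ").split()


sample = "AM,RLC /  F"  # hypothetical input, not from the question
print(tokenize_literal(sample))
print(tokenize_whitespace(sample))
```

On the four strings in the question both functions return the same lists, so the difference only matters for messier input.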

CodePudding user response:

A one-liner that replaces every punctuation character in the string with a space. You can then split the string to get a clean list:

import re
import string

# re.escape keeps characters such as ], \ and ^ literal inside the character class
df['col'].str.replace(f'[{re.escape(string.punctuation)}]', ' ', regex=True).str.split().to_frame()

CodePudding user response:

Apply a function to each row of the col column to extract its tokens. Here the function is written as a lambda.

import pandas as pd
import re

df = pd.DataFrame({"col":['AM RLC, F C', 'AM/F C', 'DM','D C']})

# str() stores each result as text; drop it to keep real Python lists
df['col'] = df['col'].apply(lambda x: str(re.findall(r"[\w']+", x)))

print(df.head())

Output:

                       col
0  ['AM', 'RLC', 'F', 'C']
1         ['AM', 'F', 'C']
2                   ['DM']
3               ['D', 'C']
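The quotes in the output above come from str(), which turns each list into its text representation, so every cell is a string and not a list. A short sketch of the difference, assuming pandas is installed; the variable names `as_lists` and `as_text` are illustrative.

```python
import re

import pandas as pd

df = pd.DataFrame({"col": ["AM RLC, F C", "AM/F C", "DM", "D C"]})

# Without str(): each cell holds a real Python list.
as_lists = df["col"].apply(lambda x: re.findall(r"[\w']+", x))

# With str(): each cell holds the text representation of that list.
as_text = df["col"].apply(lambda x: str(re.findall(r"[\w']+", x)))

print(as_lists.map(type).unique())
print(as_text.map(type).unique())

# List-only operations such as explode need the list version.
flat = as_lists.explode().tolist()
print(flat)
```

If the goal is a series of lists, as in the question, the version without str() is likely the one to keep.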