For instance, I have a dataframe as below:
import pandas as pd
df = pd.DataFrame({"col":['AM RLC, F C', 'AM/F C', 'DM','D C']})
  |col            |
--|---------------|
0 |"AM RLC, F C"  |
1 |"AM/F C"       |
2 |"DM"           |
3 |"D C"          |
My expected output is as follows:
  |col                    |
--|-----------------------|
0 |["AM", "RLC", "F", "C"]|
1 |["AM", "F", "C"]       |
2 |["DM"]                 |
3 |["D", "C"]             |
The comma (","), the slash ("/"), and the space should each be treated as a delimiter.
The answers in this question do not answer my query.
CodePudding user response:
I would use str.split or str.findall:
df['col'] = df['col'].str.split(r'[\s,/]+')
# or
df['col'] = df['col'].str.findall(r'\w+')
Output:
col
0 [AM, RLC, F, C]
1 [AM, F, C]
2 [DM]
3 [D, C]
Regex:
[\s,/]+  # one or more whitespace/comma/slash characters in a row
\w+      # one or more word characters
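The two patterns agree on the sample data but can differ on edge cases. Here is a minimal sketch using Python's built-in re module instead of pandas, an assumption made only to keep the example dependency-free; the helper names `tokens_by_split` and `tokens_by_findall` are hypothetical. Splitting on delimiters can leave empty strings when a value starts or ends with a delimiter, while matching word characters does not.

```python
import re

DELIMS = re.compile(r"[\s,/]+")  # one or more whitespace/comma/slash characters
WORDS = re.compile(r"\w+")       # one or more word characters

def tokens_by_split(text):
    # Cut the string wherever a run of delimiters occurs.
    return DELIMS.split(text)

def tokens_by_findall(text):
    # Collect every run of word characters instead.
    return WORDS.findall(text)

# Both approaches give the same lists on the question's data.
for s in ['AM RLC, F C', 'AM/F C', 'DM', 'D C']:
    print(tokens_by_split(s), tokens_by_findall(s))

# A leading or trailing delimiter makes them diverge.
edge = ' AM/F C,'
print(tokens_by_split(edge))    # contains empty strings at both ends
print(tokens_by_findall(edge))  # only the tokens
```

If the data may contain leading or trailing delimiters, the findall form is probably the safer choice.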
CodePudding user response:
Try this:
df["col"].apply(lambda x: x.replace(",", "").replace("/", " ").split(" "))
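The chained replace works on the sample data because every comma there is followed by a space. A rough sketch of what happens otherwise, in plain Python with hypothetical helper names: removing commas can glue two tokens together, and split(" ") keeps an empty string for every extra space, whereas split() with no argument collapses runs of whitespace.

```python
def split_on_single_space(text):
    # Same steps as the chained replace: drop commas, turn slashes into spaces.
    return text.replace(",", "").replace("/", " ").split(" ")

def split_on_any_whitespace(text):
    # Variant: turn both delimiters into spaces, then split on any whitespace run.
    return text.replace(",", " ").replace("/", " ").split()

print(split_on_single_space("AM RLC, F C"))    # fine on the sample data
print(split_on_single_space("AM,RLC / F"))     # tokens merged, empty strings kept
print(split_on_any_whitespace("AM,RLC / F"))   # three clean tokens
```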
CodePudding user response:
A one-liner that finds any punctuation in the string and replaces it with a space. You can then split the string to get a clean list:
import re
import string
# re.escape keeps characters such as ], ^ and \ literal inside the character class
df['col'].str.replace(f'[{re.escape(string.punctuation)}]', ' ', regex=True).str.split().to_frame()
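One thing to keep in mind: string.punctuation includes the underscore, while \w treats the underscore as a word character, so the punctuation-based approach and a \w+ match can disagree. A small sketch with the standard library only (the function names are hypothetical, and escaping the punctuation with re.escape is a defensive choice on my part):

```python
import re
import string

# Character class containing every ASCII punctuation character, escaped literally.
PUNCT = re.compile(f"[{re.escape(string.punctuation)}]")

def tokens_via_punctuation(text):
    # Replace punctuation with spaces, then split on whitespace.
    return PUNCT.sub(" ", text).split()

def tokens_via_word_chars(text):
    # Collect runs of word characters (letters, digits, underscore).
    return re.findall(r"\w+", text)

print(tokens_via_punctuation("AM_RLC, F/C"))  # underscore acts as a delimiter
print(tokens_via_word_chars("AM_RLC, F/C"))   # underscore stays inside the token
```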
CodePudding user response:
Apply a function to each row of the col column to extract its tokens. In this case the function is written as a lambda.
import pandas as pd
import re
df = pd.DataFrame({"col":['AM RLC, F C', 'AM/F C', 'DM','D C']})
df['col'] = df['col'].apply(lambda x: str(re.findall(r"[\w']+", x)))
print(df.head())
Output:
col
0 ['AM', 'RLC', 'F', 'C']
1 ['AM', 'F', 'C']
2 ['DM']
3 ['D', 'C']
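Note that wrapping the result in str() stores the text representation of the list rather than a list, which is why the printed values show quotes around each token. A short sketch of the difference in plain Python (variable names are illustrative); if real lists are wanted, dropping the str() call should be enough.

```python
import ast
import re

tokens = re.findall(r"[\w']+", "AM RLC, F C")  # a real list of strings
as_text = str(tokens)                          # the list rendered as one string

print(type(tokens).__name__, tokens)
print(type(as_text).__name__, as_text)

# Indexing shows the difference: a token versus a single character.
print(tokens[0], as_text[0])

# The text form can be parsed back into a list if it was already stored that way.
recovered = ast.literal_eval(as_text)
print(recovered == tokens)
```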