I have a dataframe column whose values are wrapped in square brackets. I would like to keep only the string inside them.
df:
ID col1
1 [2023/01/06:12:00:00 AM]
2 [2023/01/06:12:00:00 AM]
3 [2023/01/06:12:00:00 AM]
Expected:
ID col1
1 2023/01/06:12:00:00 AM
2 2023/01/06:12:00:00 AM
3 2023/01/06:12:00:00 AM
I tried str.findall(r"(?<=[)([^]] )(?=])") and some other regexes, but none of them worked.
Can anyone please help me?
CodePudding user response:
You can use pandas.Series.astype with pandas.Series.str.strip:
df["col1"] = df["col1"].astype(str).str.strip("['']")
Output :
print(df)
ID col1
0 1 2023/01/06:12:00:00 AM
1 2 2023/01/06:12:00:00 AM
2 3 2023/01/06:12:00:00 AM
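One reason this approach is convenient: it does not matter whether a cell holds a string with brackets or a one-element list, because astype(str) first renders the list as text and strip then removes any leading or trailing '[', ']' or quote characters. A minimal sketch, assuming a made-up column that mixes both representations (the data below is hypothetical, not the asker's):

```python
import pandas as pd

# Hypothetical data: first cell is a bracketed string, second is a one-element list.
col = pd.Series(["[2023/01/06:12:00:00 AM]", ["2023/01/06:12:00:00 AM"]])

# astype(str) renders the list as "['2023/01/06:12:00:00 AM']".
# strip("['']") treats its argument as a set of characters and removes
# any of '[', ']' and "'" from both ends of each value.
cleaned = col.astype(str).str.strip("['']")
print(cleaned.tolist())
```

Note that strip only touches the ends of each value, so brackets or quotes in the middle of a string would be left alone.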
CodePudding user response:
If each cell is actually a single-element list containing a string/timestamp (rather than a string with brackets in it), you can extract the first element, as MatBailie said in the comments:
df['col1'] = df['col1'].str[0]
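It is worth checking which case you have before using .str[0], since the result differs. The sketch below uses two hypothetical Series (the names as_lists and as_strings are made up for illustration) to show the difference:

```python
import pandas as pd

# Hypothetical columns: the same value stored as a list and as a plain string.
as_lists = pd.Series([["2023/01/06:12:00:00 AM"]] * 3)
as_strings = pd.Series(["[2023/01/06:12:00:00 AM]"] * 3)

# On lists, .str[0] returns the first element of each list.
first_of_list = as_lists.str[0]

# On strings, .str[0] returns only the first character, here the bracket.
first_of_string = as_strings.str[0]

print(first_of_list.tolist())
print(first_of_string.tolist())
```

Running type(df['col1'].iloc[0]) on your own data should tell you whether the cells are list or str objects.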
CodePudding user response:
You can use the str.extract() method with a regular expression to extract the data inside the brackets. The regular expression to use is \[(.*?)\], which captures the characters between an escaped opening and closing square bracket. Here's an example:
import pandas as pd
df = pd.DataFrame({'ID': [1, 2, 3],
                   'col1': ['[2023/01/06:12:00:00 AM]', '[2023/01/06:12:00:00 AM]', '[2023/01/06:12:00:00 AM]']})
df['col1'] = df['col1'].str.extract(r'\[(.*?)\]')
print(df)
This will give you the expected output:
ID col1
0 1 2023/01/06:12:00:00 AM
1 2 2023/01/06:12:00:00 AM
2 3 2023/01/06:12:00:00 AM
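By default str.extract returns a DataFrame, even for a single capture group. If you would rather get a Series back, passing expand=False is one option. A small sketch of that variant, reusing the example data from this answer (the variable name extracted is only for illustration):

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 2, 3], "col1": ["[2023/01/06:12:00:00 AM]"] * 3})

# With a single capture group, expand=False yields a Series instead of
# a one-column DataFrame.
extracted = df["col1"].str.extract(r"\[(.*?)\]", expand=False)
df["col1"] = extracted
print(df)
```

Rows that do not match the pattern come back as missing values, which can be a useful way to spot cells in an unexpected format.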