How to extract data inside brackets in pandas

I have a DataFrame column whose values are wrapped in square brackets. I would like to keep only the string inside them.

df:
ID  col1
1   [2023/01/06:12:00:00 AM]
2   [2023/01/06:12:00:00 AM]
3   [2023/01/06:12:00:00 AM]

Expected:

ID  col1
1   2023/01/06:12:00:00 AM
2   2023/01/06:12:00:00 AM
3   2023/01/06:12:00:00 AM

I tried str.findall(r"(?<=\[)([^]]+)(?=\])") and some other regexes, but none of them work.

Can anyone please help me?

Answer:

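You can use pandas.Series.astype with pandas.Series.str.strip. Note that strip removes any of the given characters (here [, ] and ') from both ends of each value; it does not match "['']" as a literal string: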

df["col1"] = df["col1"].astype(str).str.strip("['']")

Output:

print(df)
   ID                    col1
0   1  2023/01/06:12:00:00 AM
1   2  2023/01/06:12:00:00 AM
2   3  2023/01/06:12:00:00 AM
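The astype(str) step is what lets this answer work whether the cells hold bracketed strings or single-element lists (the question does not say which). A minimal sketch, assuming hypothetical sample data for both layouts:

```python
import pandas as pd

# Two hypothetical layouts of the same data: bracketed strings and
# single-element lists.
as_strings = pd.Series(["[2023/01/06:12:00:00 AM]"] * 3)
as_lists = pd.Series([["2023/01/06:12:00:00 AM"]] * 3)

# astype(str) turns a list cell into "['2023/01/06:12:00:00 AM']";
# strip then removes every leading/trailing '[', ']' and "'" character.
cleaned_strings = as_strings.astype(str).str.strip("['']")
cleaned_lists = as_lists.astype(str).str.strip("['']")

print(cleaned_strings.tolist())
print(cleaned_lists.tolist())
```

One caveat of the character-set behaviour: if a real value began or ended with a quote or bracket, that character would be stripped too.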

Answer:

If the cells are actually single-element lists containing a string or timestamp (rather than bracketed strings), you can extract the first element, as MatBailie suggested in the comments:

df['col1'] = df['col1'].str[0]
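This only works when the cells really are lists: on a plain string, .str[0] returns the first character (here "["). A short sketch contrasting the two cases, using hypothetical sample data:

```python
import pandas as pd

# Cells holding single-element lists: .str[0] picks the list's first element.
list_col = pd.Series([["2023/01/06:12:00:00 AM"], ["2023/01/07:12:00:00 AM"]])
first_from_lists = list_col.str[0]
print(first_from_lists.tolist())

# Cells holding bracketed strings: .str[0] picks the first character instead.
string_col = pd.Series(["[2023/01/06:12:00:00 AM]", "[2023/01/07:12:00:00 AM]"])
first_from_strings = string_col.str[0]
print(first_from_strings.tolist())

# A quick way to tell which case you have: inspect the type of one cell.
print(type(list_col.iloc[0]).__name__, type(string_col.iloc[0]).__name__)
```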

Answer:

You can use the str.extract() method with a regular expression to pull out the text inside the brackets. The pattern \[(.*?)\] matches a literal opening bracket, lazily captures everything up to the next closing bracket, and str.extract returns only the captured group. Here's an example:

import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3], 'col1': ['[2023/01/06:12:00:00 AM]', '[2023/01/06:12:00:00 AM]', '[2023/01/06:12:00:00 AM]']})

# Keep only the text captured between '[' and the following ']'
df['col1'] = df['col1'].str.extract(r'\[(.*?)\]')

print(df)

This will give you the expected output:

   ID                    col1
0   1  2023/01/06:12:00:00 AM
1   2  2023/01/06:12:00:00 AM
2   3  2023/01/06:12:00:00 AM
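By default str.extract returns a DataFrame with one column per capture group; with a single group, passing expand=False returns a Series, which is a more natural fit for overwriting one column. Rows with no bracketed part come back as NaN. A sketch with a hypothetical extra row (the fillna step, which keeps the original text for such rows, is an optional addition):

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3],
    # The third value has no brackets, to show what extract does with it.
    "col1": ["[2023/01/06:12:00:00 AM]", "[2023/01/07:12:00:00 AM]", "no brackets"],
})

# expand=False + one capture group -> a Series instead of a one-column DataFrame.
inner = df["col1"].str.extract(r"\[(.*?)\]", expand=False)
print(type(inner).__name__)
print(inner.tolist())

# Optional: keep the original text where there was nothing to extract.
df["col1"] = inner.fillna(df["col1"])
print(df)
```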