Separate text between square brackets as a separate column in python-CodePudding

I have the following the following columns,

column_1 = ["Northern Rockies, British Columbia [9A87]", "Northwest Territories [2H89]", "Canada [00052A]", "Division No. 1, Newfoundland and Labrador [52A]"]
column_2 = ["aa", "bb", "cc", "dd"]
column_3 = [4, 4.5, 23, 1]

zipped = list(zip(column_1 , column_2, column_3))
df = pd.DataFrame(zipped, columns=['column_1' , 'column_2', 'column_3'])

I want to extract the text between the square brackets from the first column as a separate column. Below is the output I am looking for,

column_1 = ["Northern Rockies, British Columbia", "Northwest Territories", "Canada", "Division No. 1, Newfoundland and Labrador"]
column_2 = ["aa", "bb", "cc", "dd"]
column_3 = [4, 4.5, 23, 1]
column_4 = ["9A87", "2H89", "00052A", "52A"]

zipped = list(zip(column_1 , column_2, column_3, column_4))
df = pd.DataFrame(zipped, columns=['column_1' , 'column_2', 'column_3', 'column_4'])

I am using square bracket here but I think the solution should apply to any type of bracket.

CodePudding user response：

I would use str.extract and str.repalce here:

df["column_4"] = df["column_1"].str.extract(r'\[(.*?)\]')
df["column_1"] = df["column_1"].str.replace(r'\s*\[.*?\]$', '', regex=True)

CodePudding user response：

Hello, Salahuddin! From the column_1, I assumed that square brackets or any kind of brackets come always after the text. eg. Northern Rockies, British Columbia [9A87]

df[['column_1','column_4']] = df['column_1'].str.extract(r'(.*)[\[\{\(](.*)[\]\}\)]',flags=re.IGNORECASE)

This will work for any kind of brackets like [], (), {}