Python Pandas: Split strings into two columns using str.split()

How do you split the text in a column to create a new column in a dataframe using "(" and ")"? Current data frame:

	Item	Description
0	coat	Boys (Target)
1	boots	Womens (DSW)
2	socks	Girls (Kohls)
3	shirt	Mens (Walmart)
4	boots	Womens (DSW)
5	coat	Boys (Target)

What I want to create:

	Item	Description	Retailer
0	coat	Boys	Target
1	boots	Womens	DSW
2	socks	Girls	Kohls
3	shirt	Mens	Walmart
4	boots	Womens	DSW
5	coat	Boys	Target

I've tried the following:

df[['Description'], ['Retailer']] = df['Description'].str.split("(")

I get an error: "TypeError: unhashable type: 'list'"

CodePudding user response：

Try this:

import pandas as pd

# creating the df
item = ['coat','boots']
dec = ["Boys (Target)", "Womens (DSW)"]
df = pd.DataFrame(item, columns=['Item'])
df['Description'] = dec


def extract_brackets(row):
    return row.split('(', 1)[1].split(')')[0].strip()


def extract_first_value(row):
    # keep everything before the first "(" so multi-word descriptions survive
    return row.split('(', 1)[0].strip()


df['Retailer'] = df['Description'].apply(extract_brackets)
df['Description'] = df['Description'].apply(extract_first_value)

print(df)

CodePudding user response：

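I ran this small test and it seems to work. Note the leading space and the backslash in the split pattern: a pattern longer than one character is treated as a regular expression, so the "(" has to be escaped.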

import pandas as pd
s = pd.Series(['Boys (Target)','Womens (DSW)','Girls (Kohls)'])
print(s)
# raw string so the backslash reaches the regex engine unchanged
d1 = s.str.split(r' \(')
print(d1)
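
The test above only gets as far as a Series of lists, and each second element still ends in ")". A minimal sketch of how it could be carried through to two columns, assuming the same sample data (the names `parts`, `Description` and `Retailer` are chosen here to match the question, not taken from the answer):

```python
import pandas as pd

s = pd.Series(['Boys (Target)', 'Womens (DSW)', 'Girls (Kohls)'])

# expand=True returns a DataFrame with one column per piece instead of lists;
# n=1 limits the split to the first " (" in each string.
parts = s.str.split(r' \(', n=1, expand=True)
parts.columns = ['Description', 'Retailer']

# Remove the closing bracket that is still attached to the retailer name.
parts['Retailer'] = parts['Retailer'].str.rstrip(')')

print(parts)
```

Because the pattern includes the space, the Description column comes out without trailing whitespace.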

CodePudding user response：

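You need to pass expand=True to str.split so that it returns a DataFrame with one column per piece rather than a Series of lists, and assign the result to a single list of column names (df[['Description','Retailer']]) instead of two separate lists. Consider using the following code: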

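df[['Description','Retailer']] = df.Description.str.replace(')','',regex=False)\
    .str.split('(',expand=True)
# regex=False above: ")" on its own is not a valid regular expression
df['Description'] = df['Description'].str.strip()  # drop the space left before "("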

print(df)

    Item Description Retailer
0   coat       Boys    Target
1  boots     Womens       DSW
2  socks      Girls     Kohls
3  shirt       Mens   Walmart
4  boots     Womens       DSW
5   coat       Boys    Target

I am first removing the closing bracket from Description, and then expanding based on the opening bracket.
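
As an alternative to the replace-then-split approach, both parts can probably be captured in a single pass with str.extract and named groups. This is only a sketch and not part of the answer above; the regular expression assumes every row looks like "text (retailer)", and rows that do not match would come back as NaN.

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['coat', 'boots', 'socks'],
    'Description': ['Boys (Target)', 'Womens (DSW)', 'Girls (Kohls)'],
})

# Named groups become the column names of the DataFrame that str.extract returns.
# Description: everything before the bracket; Retailer: the text inside it.
pattern = r'^(?P<Description>.*?)\s*\((?P<Retailer>[^)]*)\)\s*$'
extracted = df['Description'].str.extract(pattern)

# Assign Retailer first, then overwrite Description with the cleaned text.
df['Retailer'] = extracted['Retailer']
df['Description'] = extracted['Description']

print(df)
```

The `\s*` before the opening bracket swallows the separating space, so no extra strip step is needed.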