How do you split the text in a column to create a new column in a dataframe using "(" and ")"?
Current data frame:
| | Item | Description |
|---|---|---|
| 0 | coat | Boys (Target) |
| 1 | boots | Womens (DSW) |
| 2 | socks | Girls (Kohls) |
| 3 | shirt | Mens (Walmart) |
| 4 | boots | Womens (DSW) |
| 5 | coat | Boys (Target) |
What I want to create:
| | Item | Description | Retailer |
|---|---|---|---|
| 0 | coat | Boys | Target |
| 1 | boots | Womens | DSW |
| 2 | socks | Girls | Kohls |
| 3 | shirt | Mens | Walmart |
| 4 | boots | Womens | DSW |
| 5 | coat | Boys | Target |
I've tried the following:
df[['Description'], ['Retailer']] = df['Description'].str.split("(")
I get an error: "TypeError: unhashable type: 'list'"
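For what it's worth, the error seems to come from the indexing key rather than from split: df[['Description'], ['Retailer']] hands pandas a tuple of two lists, and a tuple can only be hashed when everything inside it is hashable. A plain-Python illustration (the names bad_key and error_message are just for this sketch):

```python
# Illustration only: the key that df[['Description'], ['Retailer']] passes in.
bad_key = (['Description'], ['Retailer'])  # a tuple holding two lists

try:
    hash(bad_key)  # a tuple is hashable only if every item in it is hashable
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # unhashable type: 'list'
```

The intended target is a single list of column names, df[['Description', 'Retailer']], which is what the answers below use.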
Answer:
Try this:
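import pandas as pd
# build a small sample frame (the first two rows of the question's data)
item = ['coat', 'boots']
dec = ["Boys (Target)", "Womens (DSW)"]
df = pd.DataFrame(item, columns=['Item'])
df['Description'] = dec
def extract_brackets(row):
    # text between the first "(" and the next ")", e.g. "Target"
    return row.split('(', 1)[1].split(')')[0].strip()
def extract_first_value(row):
    # first whitespace-separated word, e.g. "Boys"
    return row.split()[0].strip()
# extract Retailer first, before Description is overwritten
df['Retailer'] = df['Description'].apply(extract_brackets)
df['Description'] = df['Description'].apply(extract_first_value)
print(df)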
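The helpers above run once per row through apply. If a vectorized version is preferred, str.extract with a regular expression can pull out both parts in one pass. This is an alternative technique, not part of the answer above, and the pattern assumes every description looks like "Words (Retailer)"; rows that don't match would come back as NaN:

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({
    'Item': ['coat', 'boots', 'socks', 'shirt', 'boots', 'coat'],
    'Description': ['Boys (Target)', 'Womens (DSW)', 'Girls (Kohls)',
                    'Mens (Walmart)', 'Womens (DSW)', 'Boys (Target)'],
})

# Group 0: everything before "(" (the \s* swallows the space before it).
# Group 1: everything between "(" and ")".
parts = df['Description'].str.extract(r'^(.*?)\s*\((.*?)\)\s*$')

df['Retailer'] = parts[1]
df['Description'] = parts[0]
print(df)
```

Because both columns come from the same extract result, the order of the two assignments does not matter here.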
Answer:
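Hi, I ran this small test and it seems to work. Note the space before the bracket and the backslash that escapes it in the split pattern: because the pattern is longer than one character, pandas treats it as a regular expression, and a raw string keeps Python from warning about the "\(" escape.
import pandas as pd
s = pd.Series(['Boys (Target)', 'Womens (DSW)', 'Girls (Kohls)'])
print(s)
# raw string so "\(" reaches the regex engine unchanged
d1 = s.str.split(r' \(')
print(d1)  # each value is a list such as ['Boys', 'Target)']; the ")" is still attached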
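As written, the split returns one list per row and keeps the closing bracket on the retailer. A minimal sketch of finishing this approach, assuming the same three sample values: add expand=True to get columns, then strip the ")" from the second piece (the names s and parts are illustrative):

```python
import pandas as pd

s = pd.Series(['Boys (Target)', 'Womens (DSW)', 'Girls (Kohls)'])

# Split once on " (" (a regex, hence the raw string and escaped bracket);
# expand=True returns a DataFrame with one column per piece.
parts = s.str.split(r' \(', n=1, expand=True)

# The closing bracket is still attached to the second piece, so strip it.
parts[1] = parts[1].str.rstrip(')')
parts.columns = ['Description', 'Retailer']
print(parts)
```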
Answer:
You have to pass the parameter expand=True to the split function, and change the way you assign the two columns back: use a single list of column names on the left-hand side. Consider the following code:
# regex=False: match ")" literally (a lone ")" is not a valid regular expression)
df[['Description', 'Retailer']] = df['Description'].str.replace(')', '', regex=False)\
    .str.split('(', expand=True)
print(df)
Item Description Retailer
0 coat Boys Target
1 boots Womens DSW
2 socks Girls Kohls
3 shirt Mens Walmart
4 boots Womens DSW
5 coat Boys Target
This first removes the closing bracket from Description and then splits on the opening bracket; expand=True turns the two pieces into two columns, which are assigned to Description and Retailer.
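One caveat: splitting on "(" leaves the space that preceded the bracket at the end of each description ('Boys ' rather than 'Boys'), which the printed frame does not reveal. A small sketch using the question's data that makes the leftover space visible and strips it (the intermediate name parts is just for illustration):

```python
import pandas as pd

# The data frame from the question.
df = pd.DataFrame({
    'Item': ['coat', 'boots', 'socks', 'shirt', 'boots', 'coat'],
    'Description': ['Boys (Target)', 'Womens (DSW)', 'Girls (Kohls)',
                    'Mens (Walmart)', 'Womens (DSW)', 'Boys (Target)'],
})

# Drop ")" literally, then split on "(" into two columns (0 and 1).
parts = df['Description'].str.replace(')', '', regex=False).str.split('(', expand=True)
print(parts[0].tolist())  # ['Boys ', 'Womens ', 'Girls ', 'Mens ', 'Womens ', 'Boys ']

# Strip the leftover whitespace from both pieces before assigning them back.
df['Description'] = parts[0].str.strip()
df['Retailer'] = parts[1].str.strip()
print(df)
```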