I have the following Pandas dataframe:
Fruit Color Size
Lemon(Fruit) Green 1
Apple(Fruit) Green 1.1
Banana(Fruit) Yellow 2.5
Banana Black 1
I would like to create new rows just for fruits that have brackets in their name while the new rows will have just the fruit name, I would like my output to look like that:
Fruit Color Size
Lemon(Sour) Green 1
Lemon Green 1
Apple(Round) Green 1.1
Apple Green 1.1
Banana(Tasty) Yellow 2.5
Banana Black 1
While maintaining the original "Banana" rows since "Banana" already has a record without brackets, I have tried doing so with a loop iteration through all of my dataframe rows but this is highly inefficient, how can this be done otherwise?
CodePudding user response:
You can use:
# extract fruit name and ensure parentheses after
fruit = df['Fruit'].str.extract(r'([^()] )\(.*\)', expand=False).dropna()
# drop fruits already present
keep = fruit[~fruit.isin(df['Fruit'])]
# concatenate to original
out = (pd.concat([df, df.loc[keep.index].assign(Fruit=keep)])
.sort_index(kind='stable', ignore_index=True)
)
alternative
You need to have unique indices here:
# extract fruit name and ensure parentheses after
fruit = df['Fruit'].str.extract(r'([^()] )\(.*\)', expand=False)
# mask fruits already present
fruit = fruit.mask(fruit.isin(df['Fruit']))
# duplicate rows with fruits and parentheses
out = df.loc[df.index.repeat(fruit.notna().add(1))]
# update the fruit names for the new rows
out.loc[out.index.duplicated(), 'Fruit'] = fruit.dropna()
print(out)
Output:
Fruit Color Size
0 Lemon(Fruit) Green 1.0
1 Lemon Green 1.0
2 Apple(Fruit) Green 1.1
3 Apple Green 1.1
4 Banana(Fruit) Yellow 2.5
5 Banana Black 1.0