Home > Blockchain >  Adding filtered records to Pandas dataframe
Adding filtered records to Pandas dataframe

Time:01-26

I have the following Pandas dataframe:

   Fruit          Color   Size
Lemon(Fruit)      Green     1
Apple(Fruit)      Green    1.1
Banana(Fruit)     Yellow   2.5
Banana            Black     1

I would like to create new rows just for fruits that have brackets in their name while the new rows will have just the fruit name, I would like my output to look like that:

   Fruit           Color   Size
Lemon(Sour)        Green     1
Lemon              Green     1
Apple(Round)       Green    1.1
Apple              Green    1.1
Banana(Tasty)      Yellow   2.5
Banana             Black     1

While maintaining the original "Banana" rows since "Banana" already has a record without brackets, I have tried doing so with a loop iteration through all of my dataframe rows but this is highly inefficient, how can this be done otherwise?

CodePudding user response:

You can use:

# extract fruit name and ensure parentheses after
fruit = df['Fruit'].str.extract(r'([^()] )\(.*\)', expand=False).dropna()
# drop fruits already present
keep = fruit[~fruit.isin(df['Fruit'])]

# concatenate to original
out = (pd.concat([df, df.loc[keep.index].assign(Fruit=keep)])
         .sort_index(kind='stable', ignore_index=True)
      )

alternative

You need to have unique indices here:

# extract fruit name and ensure parentheses after
fruit = df['Fruit'].str.extract(r'([^()] )\(.*\)', expand=False)
# mask fruits already present
fruit = fruit.mask(fruit.isin(df['Fruit']))

# duplicate rows with fruits and parentheses
out = df.loc[df.index.repeat(fruit.notna().add(1))]

# update the fruit names for the new rows
out.loc[out.index.duplicated(), 'Fruit'] = fruit.dropna()

print(out)

Output:

           Fruit   Color  Size
0   Lemon(Fruit)   Green   1.0
1          Lemon   Green   1.0
2   Apple(Fruit)   Green   1.1
3          Apple   Green   1.1
4  Banana(Fruit)  Yellow   2.5
5         Banana   Black   1.0
  • Related