How to optimally find if "dataframe cell value" contains "cell value from another dat

Time:11-04

I have a dataframe with two columns of unequal length:

One-word  Many-Words
Bird      Bird with no blood
Stone     Stone that killed the bird
Blood     Bird without brains
<none>    stone and blood

I am trying to fill a new third column with all of the Many-Words entries that contain the One-word (five or fewer per row). The result would look like this:

One-word  Many-Words                  Many-Words with One-word
Bird      Bird with no blood          Bird with no blood, Stone that killed the bird, Bird without brains
Stone     Stone that killed the bird  Stone that killed the bird, stone and blood
Blood     Bird without brains         Bird without brains, Bird with no blood, stone and blood
<none>    stone and blood

I actually found a way, but it is very slow.

  1. Loop over the "Many-Words" column.

    1.1 Within the loop, build a dictionary where each key is a cell from "Many-Words" and each value is the list of words produced by split.

  2. Loop over the "One-word" column.

    2.1 Within that loop, run another loop over the keys and values of the dictionary from 1.1.

    2.2 Within these two loops, check whether the list from 1.1 contains the word from "One-word".

    2.3 If it does, append the dictionary key to the corresponding cell in the third column, provided that cell holds fewer than 5 entries so far.
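
The steps above might look roughly like the following sketch. The sample dataframe is rebuilt from the tables in the question; the names words_by_phrase, result and MAX_MATCHES are made up for illustration, and lowercasing both sides is an assumption based on the expected output (where "Stone" matches "stone and blood").

```python
import pandas as pd

# Sample data rebuilt from the question; None stands in for the <none> cell.
df = pd.DataFrame({
    "One-word": ["Bird", "Stone", "Blood", None],
    "Many-Words": [
        "Bird with no blood",
        "Stone that killed the bird",
        "Bird without brains",
        "stone and blood",
    ],
})

MAX_MATCHES = 5  # the "5 or less" limit from the question

# Step 1: map each phrase to its list of lowercase words.
words_by_phrase = {
    phrase: phrase.lower().split() for phrase in df["Many-Words"]
}

# Step 2: for every one-word, scan every phrase (nested loops).
result = []
for word in df["One-word"]:
    matches = []
    if isinstance(word, str):  # skip the missing cell
        for phrase, words in words_by_phrase.items():
            if word.lower() in words and len(matches) < MAX_MATCHES:
                matches.append(phrase)
    result.append(", ".join(matches))

df["Many-Words with One-word"] = result
```

Every one-word is compared against every phrase, so the work grows with the product of the two column lengths, which is likely why this feels slow on larger data.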

I am looping through the cells of a dataframe column and building dicts and lists from them, which I have read is very bad practice.

I am a novice in Python, but I am pretty sure that my way is unholy.

There has got to be a better, faster, and cleaner way. Maybe something with vectorization?

Thank you!

CodePudding user response:

You can use iterrows to loop over your df rows and build a list of Many-Words containing One-word:

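import pandas as pd

# For each row, collect every Many-Words entry containing that row's One-word.
# regex=False treats the word as literal text rather than a regex pattern.
df["Many-Words with One-word"] = pd.Series([
    df[df["Many-Words"].str.lower().str.contains(row["One-word"].lower(), regex=False)]["Many-Words"].to_list()
    for _, row in df.iterrows()
])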

Note: lower() is applied to both sides to make the match case-insensitive.

Output:

  One-word                  Many-Words                           Many-Words with One-word
0     Bird          Bird with no blood  [Bird with no blood, Stone that killed the bir...
1    Stone  Stone that killed the bird      [Stone that killed the bird, stone and blood]
2    Blood         Bird without brains              [Bird with no blood, stone and blood]
3   <none>             stone and blood                                                 []
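
If the nested scan becomes too slow, one possible alternative (not part of the answer above) is to build a word-to-phrases index once and then look each one-word up in it. This is only a sketch under a few assumptions: matching is on whole whitespace-separated words rather than substrings, so results can differ from str.contains; the missing cell is None; and the names phrases_by_word, lookup and MAX_MATCHES are hypothetical.

```python
from collections import defaultdict

import pandas as pd

# Sample data rebuilt from the question; None stands in for the <none> cell.
df = pd.DataFrame({
    "One-word": ["Bird", "Stone", "Blood", None],
    "Many-Words": [
        "Bird with no blood",
        "Stone that killed the bird",
        "Bird without brains",
        "stone and blood",
    ],
})

MAX_MATCHES = 5  # the "5 or less" limit from the question

# Inverted index: lowercase word -> phrases containing it, in row order.
phrases_by_word = defaultdict(list)
for phrase in df["Many-Words"].dropna():
    for word in set(phrase.lower().split()):  # set() avoids listing a phrase twice
        phrases_by_word[word].append(phrase)


def lookup(word):
    """Return up to MAX_MATCHES phrases containing `word` as a whole word."""
    if not isinstance(word, str):  # missing cell
        return []
    return phrases_by_word.get(word.lower(), [])[:MAX_MATCHES]


df["Many-Words with One-word"] = df["One-word"].map(lookup)
```

Each phrase is split only once and each one-word costs a single dictionary lookup, so the total work is roughly proportional to the number of words in the data rather than to rows times rows.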