I have a dataframe with two columns of unequal length:
One-word | Many-Words |
---|---|
Bird | Bird with no blood |
Stone | Stone that killed the bird |
Blood | Bird without brains |
<none> | stone and blood |
I am trying to fill a new third column with all of the Many-Words entries that contain the One-word value (five or fewer per row). So it would look like this:
One-word | Many-Words | Many-Words with One-word |
---|---|---|
Bird | Bird with no blood | Bird with no blood, Stone that killed the bird, Bird without brains |
Stone | Stone that killed the bird | Stone that killed the bird, stone and blood |
Blood | Bird without brains | Bird without brains, Bird with no blood, stone and blood |
<none> | stone and blood | |
I actually found a way, but it is very slow:
1. Loop over the "Many-Words" column.
1.1 Within the loop, build a dictionary where each key is a cell from "Many-Words" and each value is the list of words produced by splitting that cell.
2. Loop over the "One-word" column.
2.1 Within that loop, run a second loop over the keys and values of the dictionary from 1.1.
2.2 Within these two loops, check whether the list from 1.1 contains the word from "One-word".
2.3 If it does, concatenate the dictionary key onto the corresponding cell in the third column, as long as the number of concatenated entries stays at five or fewer.
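For reference, the steps above can be sketched roughly as follows. This is only a reconstruction from the description, not the original code: the sample frame, the `MAX_MATCHES` name, the use of `None` for the missing One-word cell, and the lowercasing (implied by "stone and blood" matching "Stone" in the expected output) are all assumptions.

```python
import pandas as pd

# Sample data mirroring the tables above; None stands in for the missing cell.
df = pd.DataFrame({
    "One-word": ["Bird", "Stone", "Blood", None],
    "Many-Words": [
        "Bird with no blood",
        "Stone that killed the bird",
        "Bird without brains",
        "stone and blood",
    ],
})

MAX_MATCHES = 5  # hypothetical name for the "five or fewer" limit

# Step 1: map each phrase to its list of lowercase words.
phrase_words = {phrase: phrase.lower().split() for phrase in df["Many-Words"]}

# Step 2: for each word, scan every phrase and keep up to MAX_MATCHES hits.
results = []
for word in df["One-word"]:
    matches = []
    if isinstance(word, str):  # skip the missing cell
        for phrase, words in phrase_words.items():
            if word.lower() in words and len(matches) < MAX_MATCHES:
                matches.append(phrase)
    results.append(", ".join(matches))

df["Many-Words with One-word"] = results
```

The inner loop visits every phrase for every word, so the work grows with the product of the two column lengths, which is why it gets slow on a large frame.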
I am actually looping through dataframe column cells and creating dicts and lists from them, which I have read is very bad practice.
I am a novice in Python, but I am pretty sure that my way is unholy.
There has got to be a better, faster, and cleaner way. Maybe something with vectorization?
Thank you!
CodePudding user response:
You can use `iterrows` to loop over your df rows and build a list of the `Many-Words` entries containing each `One-word`:
import pandas as pd

df["Many-Words with One-word"] = pd.Series([
    df[df["Many-Words"].str.lower().str.contains(row["One-word"].lower())]["Many-Words"].to_list()
    for _, row in df.iterrows()
])
Note: `lower` is used to make the match case-insensitive.
Output:
One-word Many-Words Many-Words with One-word
0 Bird Bird with no blood [Bird with no blood, Stone that killed the bir...
1 Stone Stone that killed the bird [Stone that killed the bird, stone and blood]
2 Blood Bird without brains [Bird with no blood, stone and blood]
3 <none> stone and blood []
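The `iterrows` version rescans the whole `Many-Words` column once per row and does not apply the limit of five from the question. If that becomes a bottleneck, one possible alternative is to build a word-to-phrases index once and then look each word up. The sketch below is only one way to do it; the sample frame and the `MAX_MATCHES`, `index`, and `lookup` names are assumptions for illustration. Note that it matches whole words rather than substrings, so it can behave differently from `str.contains`.

```python
from collections import defaultdict

import pandas as pd

# Sample data mirroring the question; None stands in for the missing cell.
df = pd.DataFrame({
    "One-word": ["Bird", "Stone", "Blood", None],
    "Many-Words": [
        "Bird with no blood",
        "Stone that killed the bird",
        "Bird without brains",
        "stone and blood",
    ],
})

MAX_MATCHES = 5  # hypothetical name for the "five or fewer" limit

# Build the index once: lowercase word -> phrases containing it, in column order.
index = defaultdict(list)
for phrase in df["Many-Words"].dropna():
    for word in set(phrase.lower().split()):  # set() avoids duplicate entries
        index[word].append(phrase)


def lookup(word):
    """Return up to MAX_MATCHES phrases containing `word` (case-insensitive)."""
    if not isinstance(word, str):  # missing cell
        return []
    return index.get(word.lower(), [])[:MAX_MATCHES]


matches = df["One-word"].map(lookup)
df["Many-Words with One-word"] = matches
```

Each phrase is split only once, and each lookup is a dictionary access, so the cost no longer grows with the product of the two column lengths. To get a comma-separated string as in the question instead of a list, `", ".join(...)` can be applied to each result.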