Is there a vectorized way to check whether strings in a column are substrings of another string in pandas?

Time:12-21

I have a pandas Series, and I want to filter it by checking whether its strings are substrings of another string.

For example,

import pandas as pd

sentence = "hello world"
words = pd.Series(["hello", "wo", "d", "panda"])

Then I want to get the Series of words that are substrings of "hello world", as below:

filtered_words = pd.Series(["hello", "wo", "d"])

I could probably use something like apply, but that doesn't look vectorized.

How can I do this?

Answer:

There is no vectorized way to do this; you'll need to loop.
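A quick way to see why the usual vectorized string method does not help here (a small illustrative sketch using the example data from the question): pandas' `str.contains` tests whether each element *contains* one fixed pattern, while this question needs the reverse, whether each element is contained *in* one fixed string.

```python
import pandas as pd

words = pd.Series(["hello", "wo", "d", "panda"])

# str.contains is vectorized, but it asks "does each element contain the pattern 'o'?"
contains_o = words.str.contains("o", regex=False)
print(contains_o.tolist())  # [True, True, False, False]

# The question asks the opposite ("is each element inside 'hello world'?"),
# so the thing being searched for changes from row to row and
# str.contains cannot express it directly.
```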

Whether you use apply or map, both loop over the elements in Python. A slightly faster way is to use a pure Python list comprehension:

filtered_words = words[[x in sentence for x in words]]
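For reference, here is the same one-liner as a self-contained script (a minimal sketch reusing the question's data). Indexing a Series with a list of booleans acts as a mask, so the matching rows keep their original index labels.

```python
import pandas as pd

sentence = "hello world"
words = pd.Series(["hello", "wo", "d", "panda"])

# Build a boolean mask with a plain Python loop; `x in sentence` is a substring test
mask = [x in sentence for x in words]

# A list of booleans used as an indexer keeps only the True rows, with their labels
filtered_words = words[mask]
print(filtered_words)
```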

Below are timings on 400k rows (`w` is the 400k-row Series):

%%timeit
w.map(lambda x : x in sentence)
103 ms ± 7.97 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
w.apply(lambda x : x in sentence )
121 ms ± 20.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
[x in sentence for x in w]
85.8 ms ± 3.68 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

NB. apply is sometimes faster than map (or within the margin of error), but the pure Python list comprehension is always faster by ~15-25%.
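`%%timeit` only works in IPython/Jupyter. If you want to reproduce the comparison in a plain script, something like the sketch below should work. The benchmark data is an assumption (the four example words repeated to 400k rows, not the original data), and absolute numbers will vary by machine and pandas version.

```python
import timeit

import pandas as pd

sentence = "hello world"
# Assumed benchmark data: the question's four words repeated to 400k rows
w = pd.Series(["hello", "wo", "d", "panda"] * 100_000)

approaches = {
    "map": lambda: w.map(lambda x: x in sentence),
    "apply": lambda: w.apply(lambda x: x in sentence),
    "list comprehension": lambda: [x in sentence for x in w],
}

# Check that all three approaches produce the same mask before comparing speed
reference = list(approaches["list comprehension"]())
for name, fn in approaches.items():
    assert list(fn()) == reference, name

for name, fn in approaches.items():
    # Best of 3 repeats, each repeat timing 3 calls; reported per call
    best = min(timeit.repeat(fn, number=3, repeat=3)) / 3
    print(f"{name:>20}: {best * 1000:.1f} ms per call")
```

Taking the minimum of several repeats is the usual way to reduce noise from other processes.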

Answer:

Let us try

out = words[words.map(lambda x: x in sentence)]
0    hello
1       wo
2        d
dtype: object
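One caveat with any of the `x in sentence` lambdas: if the Series contains a missing value (NaN), `nan in "hello world"` raises a TypeError. A minimal sketch, assuming non-string entries should simply be filtered out, guards the test with `isinstance` (the input with a NaN is hypothetical):

```python
import numpy as np
import pandas as pd

sentence = "hello world"
# Hypothetical input containing a missing value
words = pd.Series(["hello", "wo", np.nan, "d", "panda"])

# Non-string entries (here NaN) count as "not a substring" instead of raising TypeError
mask = words.map(lambda x: isinstance(x, str) and x in sentence)
out = words[mask]
print(out)
```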

Answer:

How about:

out = words[words.apply(lambda x: x in sentence)]

But a list comprehension is still pretty fast:

out = [w for w in words if w in sentence]  # returns a plain list; wrap it in pd.Series(...) if you need a Series

Answer:

You could use a list comprehension:

filtered_words = pd.Series([word for word in words if word in sentence])

Or, as you say, you could use apply:

word_mask = words.apply(lambda word: word in sentence)
filtered_words = words[word_mask]  # boolean indexing keeps only the matching words
