Extracting from an array of strings, strings that contain a substring in them (Python)

Time:02-06

A question in Python (3.9.5) and Pandas:

Suppose I have an array of strings x and I want to extract all the elements that contain a certain substring, e.g. feb05. Is there a Pythonic way to do it in one line, possibly using Pandas functions?

Example for what I mean:

x = ["2023_jan05", "2023_jan_27", "2023_feb04", "2023_feb05", "2024_feb05"]
must_contain = "feb05"
desired_output = ["2023_feb05", "2024_feb05"]

I can run a loop,

import numpy as np
import pandas as pd

desired_output = []
indices_bool = np.zeros(len(x), dtype=bool)  # boolean mask of matching positions
for idx, test in enumerate(x):
    if must_contain in test:
        desired_output.append(test)
        indices_bool[idx] = True

but I am looking for a more Pythonic way to do it.

In my application x is a column in a Pandas dataframe, so answers using Pandas functions are also welcome. The goal is to keep only the rows that have must_contain in the field x (e.g. x = df["names"]).

CodePudding user response:

Since you are using pandas, you can use str.contains to get a boolean mask:

import pandas as pd
df = pd.DataFrame({'x': ["2023_jan05", "2023_jan_27", "2023_feb04", "2023_feb05", "2024_feb05"]})
must_contain = "feb05"

df.x.str.contains(must_contain)
#0    False
#1    False
#2    False
#3     True
#4     True
#Name: x, dtype: bool

Filter by the condition:

df[df.x.str.contains(must_contain)]
#            x
#3  2023_feb05
#4  2024_feb05
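
Two details of str.contains are worth keeping in mind: by default it interprets the pattern as a regular expression, and it returns a missing value (not False) for missing entries, which cannot be used directly as a boolean mask. A minimal sketch, assuming the column may contain missing values (the None entry and the names mask and filtered are added here for illustration and are not part of the question):

```python
import pandas as pd

# Hypothetical data: the question's strings plus one missing value.
df = pd.DataFrame({"x": ["2023_jan05", "2023_jan_27", "2023_feb04",
                         "2023_feb05", "2024_feb05", None]})
must_contain = "feb05"

# regex=False matches the substring literally;
# na=False treats missing values as non-matches.
mask = df["x"].str.contains(must_contain, regex=False, na=False)
filtered = df[mask]
print(filtered["x"].tolist())
```

For a plain substring such as feb05 the default regex behaviour gives the same result, but regex=False avoids surprises if the substring ever contains characters like . or +.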

CodePudding user response:

Without pandas:

list(filter(lambda y: must_contain in y, x))

["2023_feb05", "2024_feb05"]
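
The same filter can also be written as a list comprehension, which is often considered easier to read than filter with a lambda. A minimal sketch using the question's data; the second comprehension rebuilds the indices_bool mask from the question's loop as a plain list of booleans:

```python
x = ["2023_jan05", "2023_jan_27", "2023_feb04", "2023_feb05", "2024_feb05"]
must_contain = "feb05"

# Keep only the strings that contain the substring.
desired_output = [s for s in x if must_contain in s]

# Boolean mask with one entry per element of x.
indices_bool = [must_contain in s for s in x]

print(desired_output)
print(indices_bool)
```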

With pandas:

series = pd.Series(["2023_jan05", "2023_jan_27", "2023_feb04", "2023_feb05", "2024_feb05"])
must_contain = "feb05"
series[series.str.contains(must_contain)].to_list()

["2023_feb05", "2024_feb05"]