A question about Python (3.9.5) and Pandas:
Suppose I have an array of strings x and I want to extract all the elements that contain a certain substring, e.g. feb05. Is there a Pythonic way to do it in one line, possibly using a Pandas function?
Here is an example of what I mean:
x = ["2023_jan05", "2023_jan_27", "2023_feb04", "2023_feb05", "2024_feb05"]
must_contain = "feb05"
desired_output = ["2023_feb05", "2024_feb05"]
I can run a loop,
import numpy as np
import pandas as pd
desired_output = []
indices_bool = np.zeros(len(x))
for idx, test in enumerate(x):
    if must_contain in test:
        desired_output.append(test)
        indices_bool[idx] = 1
but I am looking for a more Pythonic way to do it.
In my application, x is a column in a Pandas dataframe (e.g. x = df["names"]), so answers that use Pandas functions are also welcome. The goal is to keep only the rows that have must_contain in the field x.
CodePudding user response:
Since you are using pandas, you can use str.contains to get a boolean mask:
import pandas as pd
df = pd.DataFrame({'x': ["2023_jan05", "2023_jan_27", "2023_feb04", "2023_feb05", "2024_feb05"]})
must_contain = "feb05"
df.x.str.contains(must_contain)
#0    False
#1    False
#2    False
#3     True
#4     True
#Name: x, dtype: bool
Filter by the condition:
df[df.x.str.contains(must_contain)]
#            x
#3  2023_feb05
#4  2024_feb05
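One thing to keep in mind: str.contains treats the pattern as a regular expression by default, and missing entries do not come back as a plain True/False. Here is a minimal sketch, assuming the substring should be matched literally and that missing names should simply count as "no match". The None entry, the dotted name, and the ".feb05" pattern are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical data: one missing entry and one name containing a literal dot.
df = pd.DataFrame({"x": ["2023_feb05", None, "2024.feb05", "2024_feb05"]})
must_contain = ".feb05"  # under regex=True the dot would match any character

# regex=False matches the substring literally;
# na=False makes missing values evaluate to False instead of a missing value.
mask = df["x"].str.contains(must_contain, regex=False, na=False)

filtered = df[mask]
print(filtered["x"].tolist())
```

With the default regex=True, the same pattern would also match "2023_feb05" and "2024_feb05", because "." stands for any single character.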
CodePudding user response:
Without pandas:
list(filter(lambda y: must_contain in y, x))
# ['2023_feb05', '2024_feb05']
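The same filter can also be written as a list comprehension, which many people find more readable than filter with a lambda. This is only a sketch reusing the names x, must_contain, desired_output and indices_bool from the question; the second comprehension rebuilds the boolean flags that the original loop collected.

```python
x = ["2023_jan05", "2023_jan_27", "2023_feb04", "2023_feb05", "2024_feb05"]
must_contain = "feb05"

# Keep only the strings that contain the substring.
desired_output = [s for s in x if must_contain in s]

# One True/False flag per element, in the same order as x.
indices_bool = [must_contain in s for s in x]

print(desired_output)
print(indices_bool)
```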
With pandas:
import pandas as pd
series = pd.Series(["2023_jan05", "2023_jan_27", "2023_feb04", "2023_feb05", "2024_feb05"])
must_contain = "feb05"
series[series.str.contains(must_contain)].to_list()
# ['2023_feb05', '2024_feb05']
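If the data lives in a NumPy array rather than a Series, and the boolean indices from the question's loop are needed as well, something along these lines might work. It is a sketch that assumes x can be converted to a NumPy string array; it relies on np.char.find, which returns -1 for elements where the substring is absent.

```python
import numpy as np

x = np.array(["2023_jan05", "2023_jan_27", "2023_feb04", "2023_feb05", "2024_feb05"])
must_contain = "feb05"

# np.char.find gives the position of the first match per element, or -1 if there is none.
indices_bool = np.char.find(x, must_contain) >= 0

# Boolean indexing keeps only the matching elements.
desired_output = x[indices_bool].tolist()

print(indices_bool)
print(desired_output)
```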