I need to find entries in a data frame in which one of its columns has a specific pattern like 01-02-11-55-00115 on other words Number-number-number-number-number
raw_data = {'name': ['Willard Morris', 'Al Jennings', 'Omar Mullins', 'Spencer McDaniel'],
'code': ['01-02-11-55-00115','01-02-11-55-00445','test', '01-0t-11-55-00115'],
'favorite_color': ['blue', 'blue', 'yellow', "green"],
'grade': [88, 92, 95, 70]}
df = pd.DataFrame(raw_data)df.head()
retrieve the rows that have that condition , the condition will be in the 'code' column so the code has to retrieve the first and second rows
CodePudding user response:
Here is one way to do it (with str.isnumeric
) :
out = df.loc[df["code"].replace("-", "", regex=True).str.isnumeric()]
Output :
print(out)
name code favorite_color grade
0 Willard Morris 01-02-11-55-00115 blue 88
1 Al Jennings 01-02-11-55-00445 blue 92
Or, if you need to flag those rows, use this :
df["flag"] = df["code"].replace("-", "", regex=True).str.isnumeric()
print(df)
name code favorite_color grade flag
0 Willard Morris 01-02-11-55-00115 blue 88 True
1 Al Jennings 01-02-11-55-00445 blue 92 True
2 Omar Mullins test yellow 95 False
3 Spencer McDaniel 01-0t-11-55-00115 green 70 False
CodePudding user response:
Filter values if decimal or -
with Series.str.contains
and regex with ^
for start of string, decimals, -
value and $
for end of string:
df1 = df[df["code"].str.contains(r'^[0-9\-] $')]
print (df1)
name code favorite_color grade
0 Willard Morris 01-02-11-55-00115 blue 88
1 Al Jennings 01-02-11-55-00445 blue 92
Or for pattern Number-number-number-number-number
use:
df2 = df[df["code"].str.contains(r'^\d \-\d \-\d \-\d \-\d $')]
print (df2)
name code favorite_color grade
0 Willard Morris 01-02-11-55-00115 blue 88
1 Al Jennings 01-02-11-55-00445 blue 92
CodePudding user response:
Use str.fullmatch
for boolean indexing with the (\d -){4}\d
regex (4 times \d -
where \d
is one or more digits, followed by digits):
out = df[df['code'].str.fullmatch(r'(\d -){4}\d ')]
If you need to match the specific "2 digits -" four times then "5 digits":
out = df[df['code'].str.fullmatch(r'(\d\d-){4}\d{5}')]
Output:
name code favorite_color grade
0 Willard Morris 01-02-11-55-00115 blue 88
1 Al Jennings 01-02-11-55-00445 blue 92