I have a Pandas dataframe with the column store
. It contains a list of stores that look like this:
H-E-B 721:1101 W STAN SCHLUETER LOOP,KILLEEN,TX
H-E-B PLUS 39:2509 N MAIN ST,BELTON,TX
I want the store number, which are 721
and 39
in the above examples.
Here is my process for getting it:
- Find the position of the colon.
- Slice backwards until reaching a space.
How do I do this in Python/Pandas? I'm guessing that I need to use regex, but I have no idea how to start.
CodePudding user response:
You can use str.extract
with the (\d ):
regex:
df['number'] = df['store'].str.extract('(\d ):', expand=False).astype(int)
Output:
store number
0 H-E-B 721:1101 W STAN SCHLUETER LOOP,KILLEEN,TX 721
1 H-E-B PLUS 39:2509 N MAIN ST,BELTON,TX 39
CodePudding user response:
You can apply a function to the store column
def retrieve_store_number(text):
return text.split(":")[0].split()[-1]
df["store_number"] = df.store.apply(retrieve_store_number)
CodePudding user response:
To use regex in Pandas, you can use the extract function:
df['store_number'] = df['store'].str.extract(r'(\d (?=:))')
CodePudding user response:
You can do something along those lines:
def extract_number(string: str) -> int:
return [int(i.split(":")[0]) for i in string.split(" ") if ":" in i][0]
df = pd.DataFrame([["H-E-B 721:1101 W STAN SCHLUETER LOOP,KILLEEN,TX"],
["H-E-B PLUS 39:2509 N MAIN ST,BELTON,TX"]], columns = ["store"])
df["number"] = df["store"].apply(extract_number)