I have a string that looks like this:
details = "| 4655748765321 | _jeffybion5 | John Dutch |"
The end product I want is this:
>>> details
'_jeffybion5 John Dutch'
My current code removes all digits, including those attached to words, and it also removes the whitespace between words.
>>> import re
>>>
>>> details = "| 47574802757 | _jeffybion5 | John Dutch |"
>>> details = re.sub("[0-9]", "", details)
>>> details = re.sub(" ", "", details)
>>> details = details.replace("|", " ")
>>> details
'  _jeffybion JohnDutch '
Any help achieving the desired result would be really appreciated.
Answer:
Non-Regex Solution
One approach:
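details = "| 4655748765321 | _jeffybion5 | John Dutch |"
# Split on whitespace, then drop the pipe separators and purely numeric tokens
chunks = details.split()
res = " ".join(chunk for chunk in chunks if not chunk.isnumeric() and chunk != "|")
print(res)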
Output
_jeffybion5 John Dutch
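If the values can themselves contain spaces, a variation of the same non-regex idea is to split on the pipe instead of on whitespace, so each column is kept as one piece. This is a minimal sketch; the helper name `clean_fields` is hypothetical, and it assumes a column should be dropped when it is empty or made only of digits.

```python
def clean_fields(line: str) -> list[str]:
    """Return the non-empty, non-numeric columns of a pipe-delimited line."""
    # Splitting on "|" keeps each column whole, including its inner spaces
    fields = [field.strip() for field in line.split("|")]
    # Drop empty columns (from the leading/trailing pipes) and purely numeric ones
    return [field for field in fields if field and not field.isnumeric()]


details = "| 4655748765321 | _jeffybion5 | John Dutch |"
print(" ".join(clean_fields(details)))
```

Unlike splitting on whitespace, a column such as "John  Dutch" (two spaces) would come back unchanged rather than collapsed.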
Regex Solution
An alternative using re.findall:
import re
details = "| 4655748765321 | _jeffybion5 | John Dutch |"
res = " ".join(re.findall(r"\w*[a-zA-Z]\w*", details))
print(res)
Output
_jeffybion5 John Dutch
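One detail worth checking in a character class like this: the uppercase range should be A-Z. In ASCII, A-z also spans the six punctuation characters [ \ ] ^ _ ` that sit between Z and a, so a pattern using it can match more than letters. A quick check on a made-up string:

```python
import re

sample = "x[y"

# A-z covers ASCII 65..122, which includes [ \ ] ^ _ ` between Z and a
print(re.findall(r"\w*[a-zA-z]\w*", sample))  # ['x[y']
# A-Z restricts the class to ASCII letters only
print(re.findall(r"\w*[a-zA-Z]\w*", sample))  # ['x', 'y']
```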
A third alternative, also using re.findall:
res = " ".join(re.findall(r"\w*[^\W\d]\w*", details))
The pattern [^\W\d] matches any word character that is not a digit, that is, a letter or an underscore.
The regex solutions are based on the idea that you want tokens made of letters, digits and underscores that contain at least one letter (with [^\W\d], an underscore also counts).
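The two findall patterns differ only in what counts as the required character: [a-zA-Z] needs an ASCII letter, while [^\W\d] also accepts an underscore (and, because str patterns are Unicode-aware in Python 3, non-ASCII letters too). A small comparison on made-up tokens:

```python
import re

tokens = "123 _5 a1 42"

# Requires at least one ASCII letter somewhere in the token
print(re.findall(r"\w*[a-zA-Z]\w*", tokens))  # ['a1']
# Requires at least one word character that is not a digit
print(re.findall(r"\w*[^\W\d]\w*", tokens))   # ['_5', 'a1']
```

For the sample in the question both give the same result, since every wanted token contains a letter.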
Answer:
With your exact sample, please try the following regex.
^[^|]*\|[^|]*\|\s+(\S+)\s+\|\s+([^|]*)
Python 3 code: using the re module's split function to get the required output.
import re

# Sample input, including the surrounding newlines
x = """
| 4655748765321 | _jeffybion5 | John Dutch |
"""

# re.split keeps the text of the two capturing groups in its result;
# drop empty items, strip spaces/pipes/newlines, then drop the trailing leftover
result = [s.strip(' |\n') for s in re.split(r'^[^|]*\|[^|]*\|\s+(\S+)\s+\|\s+([^|]*)', x) if s][0:-1]
print(result)
##Output:
['_jeffybion5', 'John Dutch']
Explanation: the regex ^[^|]*\|[^|]*\|\s+(\S+)\s+\|\s+([^|]*) creates 2 capturing groups, and re.split includes the text matched by capturing groups in its result list, which is how the two values are fetched. Empty strings are filtered out, then strip removes spaces, pipes and newlines from each item. Finally, the last item is dropped: it is the trailing "|" and newline left after the match, which is empty once stripped.
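Because the pattern is anchored with ^ and captures exactly the two wanted values, re.match with .groups() is a possible alternative to re.split that avoids the filtering and slicing steps. A sketch, using the pattern with + quantifiers and assuming the input always has this shape (re.match returns None otherwise, which the code checks):

```python
import re

x = """
| 4655748765321 | _jeffybion5 | John Dutch |
"""

pattern = re.compile(r'^[^|]*\|[^|]*\|\s+(\S+)\s+\|\s+([^|]*)')
match = pattern.match(x)
if match is not None:
    # The second group stops at the next pipe, so it keeps a trailing space
    result = [group.strip() for group in match.groups()]
    print(result)  # ['_jeffybion5', 'John Dutch']
```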
Answer:
For the example data, you might replace a pipe surrounded by optional whitespace characters, together with an optional run of digits followed by whitespace characters up to the next pipe, with a single space.
Then strip the surrounding spaces.
\s*\|\s*(?:\d+\s*\|)?
import re
details = "| 4655748765321 | _jeffybion5 | John Dutch |"
res = re.sub(r"\s*\|\s*(?:\d+\s*\|)?", " ", details).strip()
print(res)
Output
_jeffybion5 John Dutch
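To see what each part of that substitution pattern does, the same regex can be written with re.VERBOSE, where whitespace in the pattern is ignored and # starts a comment. This is only a restatement of the pattern with + quantifiers, not a different approach; the name SEPARATOR is just for illustration:

```python
import re

SEPARATOR = re.compile(r"""
    \s*\|\s*         # a pipe with optional whitespace on either side
    (?:\d+\s*\|)?    # optionally a run of digits, whitespace and the next pipe
""", re.VERBOSE)

details = "| 4655748765321 | _jeffybion5 | John Dutch |"
# Each separator becomes one space; strip() removes the leading/trailing ones
res = SEPARATOR.sub(" ", details).strip()
print(res)  # _jeffybion5 John Dutch
```

Without the final strip() the result would still carry the spaces that replaced the leading and trailing separators.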
If each remaining value should contain at least one character from A-Za-z, you could split on | with optional surrounding whitespace characters and keep only the parts that contain such a character:
import re
details = "| 4655748765321 | _jeffybion5 | John Dutch | | "
res = " ".join([m for m in re.split(r"\s*\|\s*", details) if re.search(r"[A-Za-z]", m)])
print(res)
Output
_jeffybion5 John Dutch