I'm working through a data cleansing exercise. I'm needing to remove any e-mail from a column, but retain all other text.
- It is assumed that any e-mail is any text surrounding the @ symbol contained within spaces
- The email isn't in a fixed position within the cell (and not always contained in the cell)
- Email can be the only text in a cell
Sample Data:
Raw Data | Formatted |
---|---|
24B Nowhere Street, Levin 5510 | 24B Nowhere Street, Levin 5510 |
30 Bartholomew Road, Ashtown 5510, [email protected], 021 123 4567 - James | 30 Bartholomew Road, Ashtown 5510, 021 132 4567 - James |
[email protected], 021 987 6543, M Sinclair | 021 987 6543, M Sinclair |
[email protected] |
TIA
CodePudding user response:
FILTERXML()
can give you desired output.
=IFERROR(TEXTJOIN(" ",TRUE,FILTERXML("<t><s>"&SUBSTITUTE(A2," ","</s><s>")&"</s></t>","//s[not(contains(., '@'))]")),"")
If your version of excel support LAMBDA()
function then you can use below function for one go.
=IFERROR(BYROW(A2:A5,LAMBDA(a,TEXTJOIN(" ",TRUE,FILTERXML("<t><s>"&SUBSTITUTE(a," ","</s><s>")&"</s></t>","//s[not(contains(., '@'))]")))),"")
CodePudding user response:
This Tutorial is also helpful, https://www.youtube.com/watch?v=6_Ysi1SB9cY
After going through the tutorial, please read this: when you write the if condition, you can find the cells containing '@' symbol and replace them with '@' or something and then continue.
CodePudding user response:
Alternatively, if TEXTSPLIT()
is available:
=LET(X,TEXTSPLIT(A1,", "),TEXTJOIN(", ",,FILTER(X,ISERROR(FIND("@",X)),"")))
Note: I splitted the text on the ", " (comma-space combo) instead to prevent possible trailing spaces. The samples show that should work fine.