I wrote this regex, and it works perfectly: https://regex101.com/r/ctMO3W/1
const regex = /([A-Z a-z 0-9 ,;&!@#$%^&*()-`~= ]*)(?=">)/gu;
// Alternative syntax using RegExp constructor
// const regex = new RegExp('([A-Z a-z 0-9 ,;&!@#$%^&*()-`~= ]*)(?=">)', 'gu')
const str = `<meta name="description" content="TEst text, here is the text extracted">`;
let m;
while ((m = regex.exec(str)) !== null) {
// This is necessary to avoid infinite loops with zero-width matches
if (m.index === regex.lastIndex) {
regex.lastIndex++;
}
// The result can be accessed through the `m`-variable.
m.forEach((match, groupIndex) => {
console.log(`Found match, group ${groupIndex}: ${match}`);
});
}
Basically, my code extracts the part of the text I need: the text between the double quotes in content=".....">
<meta name="description" content="here is the text extracted">
So far so good, but I can't get it to work in iMacros with the EVAL function.
I tried several options and don't know what to do anymore:
SET newcontent EVAL("var s='{{content}}'; regex=/([A-Za-z0-9,;&!@#$%^&*()-`~= ]*)(?=">)/gu; s.replace(regex, '$&,');")
thank you for your help in this
I am using (FCI): iMacros for CR v10.1.1 'PE', CR v105.0.5195.102 (_x64), Win10_x64. ('CR' = 'Chrome' / 'PE' = 'Personal Edition')
LATER EDIT: Meanwhile I found a solution that works very well for me, and maybe it will help others, but I still want to understand what I am doing wrong in the question above. Maybe someone can find a solution for that regex code and show how to implement it in iMacros.
SET newcontent EVAL("var u='{{content}}'; var x,y,z; x=u.split('content=\"'); y=x[1].split('\">'); z=y[0]; z;")
PROMPT {{newcontent}}
thanks again
CodePudding user response:
LATER EDIT: Meanwhile I found a solution [...]:
SET newcontent EVAL("var u='{{content}}'; var x,y,z; x=u.split('content=\"'); y=x[1].split('\">'); z=y[0]; z;")
PROMPT {{newcontent}}
Yep-yep, very good!, you managed to find one of my Posts (probably on the iMacros Forum 'Data Extraction' Sub-Forum where I've already posted this Solution dozens and dozens of times).
(And I know the Script/Sol is from me, I use a "pretty" specific Syntax, ah-ah...! :wink: )
Now answering this Qt (Question) like I had "planned" without/before seeing the "Later Edit" Section in the Qt... (=> with much more Info/Explanation... (=> not only for @OP, but for any other (future) User(s) with a similar Qt/Scenario)...)
=> OK, @OP, you have some RegEx Qt, you want some RegEx Solution, then hum, I "honestly" didn't even (really) read your Qt, just like you didn't really read the iMacros Tag Wiki (I wrote it myself...! (v1)) where I even "dedicated" a Section about RegEx nearly NEVER being the "Best" Sol/Implementation with iMacros, ah-ah...!
Then I would have 2 much-much-much easier Solutions for your Case/Scenario, without using RegEx...:
Sol 1:
Basically my code extract a part of the text I need. The text between the " " content=".....">
<meta name="description" content="here is the text extracted">
=> Your "here is the text extracted" is enclosed between Double Quotes, => can use a simple/single split() to get your Data...:
SET Item EVAL("var s='{{!EXTRACT}}'; var z=s.split('\"'); z[3];")
PROMPT Extract:<SP>_{{!EXTRACT}}_<BR>Item:<SP>_{{Item}}_
"description" is also enclosed between Double Quotes, hence the z[3], split()-Index starting at "0". (And z[1] would output this "description" String if you needed it...)
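To make the indexing easier to see, here is a minimal plain-JavaScript sketch of the same split(), runnable outside iMacros. The variable names extracted and parts are only illustrative, and extracted stands in for whatever {{!EXTRACT}} would hold:

```javascript
// Hypothetical stand-in for the {{!EXTRACT}} value.
const extracted = '<meta name="description" content="here is the text extracted">';

// Splitting on the double quote gives five pieces:
// [0] '<meta name='   [1] 'description'   [2] ' content='
// [3] 'here is the text extracted'        [4] '>'
const parts = extracted.split('"');

console.log(parts[1]); // the value of the name attribute
console.log(parts[3]); // the value of the content attribute
```

Inside EVAL() the double quote has to be written as \" because the whole EVAL argument is itself wrapped in double quotes, which is what the Sol 1 macro line does.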
Sol 2:
Sol 1 is the "easiest" and simplest, but hum..., maybe not always completely reliable...:
For some "strange" Reasons (10(+) years Experience doing Data-Extraction with iMacros), Attributes of an HTML Element sometimes swap Order, so your <meta name="description" content="here is the text extracted"> could upon a Refresh of the Page become <meta content="here is the text extracted" name="description">, and yep maybe not with only 2 Attr's, but that happens very often with more than 2 Attr's...
(I don't really have an Explanation why, maybe(?) the Order can be controlled/forced from the Server-Side, I don't think that's possible from the Client/Browser-Side..., or only using some 'UserScript' customized on ALL HTML Elements "important" for your iMacros Script..., and knowing "in Advance" all "unexpected" Situations that could happen, ... pretty cumbersome I would think..., and 'Sol 2' takes care of that, ah-ah...!)
This would/could also happen if you "played" with iMacros on that same Page without doing a "fresh" Refresh/Reload, as iMacros will inject some "pixel-line=blue" extra-Attr in the HTML Def of the Element, in any "random" Position, so there is no real "Assurance" that the content Attr will always be the 2nd Attr in the List, and therefore that z[3] will always be correct...!
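A small plain-JavaScript sketch of that failure mode, assuming the two Attribute Orders described above. The function name byFixedIndex and both sample strings are only illustrative:

```javascript
// Sol 1 logic: always take the 4th piece after splitting on the double quote.
function byFixedIndex(html) {
  return html.split('"')[3];
}

const originalOrder = '<meta name="description" content="here is the text extracted">';
const swappedOrder = '<meta content="here is the text extracted" name="description">';

console.log(byFixedIndex(originalOrder)); // the content value, as intended
console.log(byFixedIndex(swappedOrder));  // now the name value instead
```

With the swapped Order the fixed Index silently returns "description", with no Error raised, so the wrong Data can go unnoticed.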
=> More reliable would then be to be "more specific" on the 1st split() and to use the Name of the Attr as part of the split() like you already found out in your 'Later Edit' Section to your Qt, like in:
SET !ERRORIGNORE YES
'Extracted: "<meta name="description" content="here is the text extracted">"
SET Item EVAL("var s='{{!EXTRACT}}', x,y,z; x=s.split('content=\"'); y=x[1].split('\"'); z=y[0]; z;")
PROMPT Extract:<SP>_{{!EXTRACT}}_<BR>Item:<SP>_{{Item}}_
split() is Case-Sensitive, make sure your "content" is not "Content" (with a Capital)...
And the !ERRORIGNORE (=YES) is needed in case the Extract is "empty" for any Reason (=> = "#EANF#"), or the x[1] will yield some Runtime Error, even inside the EVAL()...
And notice that I deliberately "only" used y=x[1].split('\"'); for the 2nd split() (and not y=x[1].split('\">'); like you used @OP), => not taking the > Char as that presumes that the "content" Attr will always be the last Attr in the HTML Extract, which might not always be true for the Reasons I explained.
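The two split() Steps of Sol 2 can be sketched in plain JavaScript as follows. This is only an illustration, not iMacros Syntax: extractContent is a hypothetical Helper, and the explicit Length Check is an assumption standing in for what !ERRORIGNORE covers in the Macro:

```javascript
// Hypothetical helper; in iMacros the same logic sits inside EVAL("...").
function extractContent(html) {
  // 1st split: keep what follows the attribute name and its opening quote.
  const afterAttr = html.split('content="');
  // No 'content="' in the input (for example "#EANF#"): nothing to return.
  if (afterAttr.length < 2) return null;
  // 2nd split: stop at the closing quote, whatever comes after it.
  return afterAttr[1].split('"')[0];
}

const samples = [
  '<meta name="description" content="here is the text extracted">',
  '<meta content="here is the text extracted" name="description">',
  '#EANF#',
];
for (const sample of samples) {
  console.log(extractContent(sample));
}
```

Both Attribute Orders return the same Text here, because the 2nd split() stops at the closing Double Quote rather than at ">.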
Both Sols/Implementations not (specifically) tested for this specific Scenario/Thread/SO_Qt, no URL posted anyway...!, but tested many-many times "before" (in v8.8.2 for FF + v8.9.7 for FF + various Versions of PM ('Pale Moon') + FF ('Firefox')), ... and posted many times before on the iMacros Forum (where @OP probably found the Sol in the 'Later Edit' Section), this Answer and all Scripts written from scratch/memory, I could do some extra-Testing if "anything" is not correct, Typo or whatever...