I'm trying to get the "href" attribute, i.e. "/kkanagas/status/1491996075573587994", from inside the <article> element below. I know the markup is long, but I couldn't find a way to shorten it. Search for "href" and you'll see "/kkanagas/status/1491996075573587994"; that is the value I'm looking for.
<article aria-labelledby="id__fhryquvd10b id__xyp79hh56xk id__ufvrjjfijmj id__t3x5a40wpik id__bar6zadu3v9 id__bskm5hb18ak id__k838i141bg id__sxzis5aczdj id__ymircvocx1 id__ebx7cry08s9 id__qql66qtrpr id__l377gbo5gll id__rz84acqn28" role="article" tabindex="0" data-testid="tweet">
<div >
<div >
<div >
<div >
<div >
<div ></div>
</div>
</div>
<div >
<div >
<div >
<div >
<div id="id__bskm5hb18ak" style="height: 48px; width: 48px;">
<div style="padding-bottom: 100%;"></div>
<div >
<div >
<div style="padding-bottom: 100%;"></div>
<div >
<div style="height: calc(100% - -4px); width: calc(100% - -4px);">
<a href="/kkanagas" role="link" >
<div style="height: calc(100% - 4px); width: calc(100% - 4px);">
<div ></div>
</div>
<div style="height: calc(100% - 4px); width: calc(100% - 4px);">
<div ></div>
</div>
<div style="height: calc(100% - 4px); width: calc(100% - 4px);">
<div style="">
<div style="padding-bottom: 100%;"></div>
<div >
<div aria-label="" >
<div style="background-image: url("https://pbs.twimg.com/profile_images/3202473125/48583ddb2e2de5d9193020f2cf38694b_bigger.jpeg");"></div>
<img alt="" draggable="true" src="https://pbs.twimg.com/profile_images/3202473125/48583ddb2e2de5d9193020f2cf38694b_bigger.jpeg" >
</div>
</div>
</div>
</div>
<div style="height: calc(100% - 4px); width: calc(100% - 4px);">
<div ></div>
</div>
</a>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<div >
<div >
<div >
<div >
<div >
<div >
<a href="/kkanagas" role="link" >
<div id="id__bar6zadu3v9">
<div >
<div dir="auto" ><span ><span >Karthik K</span></span></div>
<div dir="auto" ></div>
</div>
<div >
<div dir="ltr" ><span >@kkanagas</span></div>
</div>
</div>
</a>
</div>
<div dir="auto" aria-hidden="true" ><span >·</span></div>
<a href="/kkanagas/status/1491996075573587994" dir="auto" aria-label="1 minute ago" role="link" id="id__sxzis5aczdj"><time datetime="2022-02-11T04:42:39.000Z">1m</time></a>
</div>
<div >
<div >
<div >
<div >
<div aria-expanded="false" aria-haspopup="menu" aria-label="More" role="button" tabindex="0" data-testid="caret">
<div dir="ltr" >
<div >
<div ></div>
<svg viewBox="0 0 24 24" aria-hidden="true" >
<g>
<circle cx="5" cy="12" r="2"></circle>
<circle cx="12" cy="12" r="2"></circle>
<circle cx="19" cy="12" r="2"></circle>
</g>
</svg>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<div >
<div >
<div lang="en" dir="auto" id="id__ebx7cry08s9"><span ><a dir="ltr" href="/hashtag/USA?src=hashtag_click" role="link" >#USA</a></span><span ></span><span > and </span><span ><a dir="ltr" href="/hashtag/CZE?src=hashtag_click" role="link" >#CZE</a></span><span > are tied 0-0 at the end of the first period of this Women's Hockey QF. </span><img alt="Eyes" draggable="false" src="https://abs-0.twimg.com/emoji/v2/svg/1f440.svg" title="Eyes" ><span >
</span><span ><a dir="ltr" href="/hashtag/Beijing2022?src=hashtag_click" role="link" >#Beijing2022<img alt="" draggable="false" src="https://abs.twimg.com/hashflags/Beijing_Winter_Olympics_Beijing2022_2022/Beijing_Winter_Olympics_Beijing2022_2022.png" ></a></span><span > </span><span ><a dir="ltr" href="/hashtag/IceHockey?src=hashtag_click" role="link" >#IceHockey<img alt="" draggable="false" src="https://abs.twimg.com/hashflags/Beijing_Winter_Olympics_IceHockey_2022/Beijing_Winter_Olympics_IceHockey_2022.png" ></a></span>
</div>
</div>
<div ></div>
<div >
<div aria-label="" role="group" id="id__eui52vqqmkn">
<div >
<div aria-label="0 Replies. Reply" role="button" tabindex="0" data-testid="reply">
<div dir="ltr" >
<div >
<div ></div>
<svg viewBox="0 0 24 24" aria-hidden="true" >
<g>
<path d="M14.046 2.242l-4.148-.01h-.002c-4.374 0-7.8 3.427-7.8 7.802 0 4.098 3.186 7.206 7.465 7.37v3.828c0 .108.044.286.12.403.142.225.384.347.632.347.138 0 .277-.038.402-.118.264-.168 6.473-4.14 8.088-5.506 1.902-1.61 3.04-3.97 3.043-6.312v-.017c-.006-4.367-3.43-7.787-7.8-7.788zm3.787 12.972c-1.134.96-4.862 3.405-6.772 4.643V16.67c0-.414-.335-.75-.75-.75h-.396c-3.66 0-6.318-2.476-6.318-5.886 0-3.534 2.768-6.302 6.3-6.302l4.147.01h.002c3.532 0 6.3 2.766 6.302 6.296-.003 1.91-.942 3.844-2.514 5.176z"></path>
</g>
</svg>
</div>
<div ><span data-testid="app-text-transition-container" style="transition-property: transform; transition-duration: 0.3s; transform: translate3d(0px, 0px, 0px);"><span ></span></span></div>
</div>
</div>
</div>
<div >
<div aria-expanded="false" aria-haspopup="menu" aria-label="0 Retweets. Retweet" role="button" tabindex="0" data-testid="retweet">
<div dir="ltr" >
<div >
<div ></div>
<svg viewBox="0 0 24 24" aria-hidden="true" >
<g>
<path d="M23.77 15.67c-.292-.293-.767-.293-1.06 0l-2.22 2.22V7.65c0-2.068-1.683-3.75-3.75-3.75h-5.85c-.414 0-.75.336-.75.75s.336.75.75.75h5.85c1.24 0 2.25 1.01 2.25 2.25v10.24l-2.22-2.22c-.293-.293-.768-.293-1.06 0s-.294.768 0 1.06l3.5 3.5c.145.147.337.22.53.22s.383-.072.53-.22l3.5-3.5c.294-.292.294-.767 0-1.06zm-10.66 3.28H7.26c-1.24 0-2.25-1.01-2.25-2.25V6.46l2.22 2.22c.148.147.34.22.532.22s.384-.073.53-.22c.293-.293.293-.768 0-1.06l-3.5-3.5c-.293-.294-.768-.294-1.06 0l-3.5 3.5c-.294.292-.294.767 0 1.06s.767.293 1.06 0l2.22-2.22V16.7c0 2.068 1.683 3.75 3.75 3.75h5.85c.414 0 .75-.336.75-.75s-.337-.75-.75-.75z"></path>
</g>
</svg>
</div>
<div ><span data-testid="app-text-transition-container" style="transition-property: transform; transition-duration: 0.3s; transform: translate3d(0px, 0px, 0px);"><span ></span></span></div>
</div>
</div>
</div>
<div >
<div aria-label="0 Likes. Like" role="button" tabindex="0" data-testid="like">
<div dir="ltr" >
<div >
<div ></div>
<svg viewBox="0 0 24 24" aria-hidden="true" >
<g>
<path d="M12 21.638h-.014C9.403 21.59 1.95 14.856 1.95 8.478c0-3.064 2.525-5.754 5.403-5.754 2.29 0 3.83 1.58 4.646 2.73.814-1.148 2.354-2.73 4.645-2.73 2.88 0 5.404 2.69 5.404 5.755 0 6.376-7.454 13.11-10.037 13.157H12zM7.354 4.225c-2.08 0-3.903 1.988-3.903 4.255 0 5.74 7.034 11.596 8.55 11.658 1.518-.062 8.55-5.917 8.55-11.658 0-2.267-1.823-4.255-3.903-4.255-2.528 0-3.94 2.936-3.952 2.965-.23.562-1.156.562-1.387 0-.014-.03-1.425-2.965-3.954-2.965z"></path>
</g>
</svg>
</div>
<div ><span data-testid="app-text-transition-container" style="transition-property: transform; transition-duration: 0.3s; transform: translate3d(0px, 0px, 0px);"><span ></span></span></div>
</div>
</div>
</div>
<div >
<div aria-expanded="false" aria-haspopup="menu" aria-label="Share Tweet" role="button" tabindex="0" >
<div dir="ltr" >
<div >
<div ></div>
<svg viewBox="0 0 24 24" aria-hidden="true" >
<g>
<path d="M17.53 7.47l-5-5c-.293-.293-.768-.293-1.06 0l-5 5c-.294.293-.294.768 0 1.06s.767.294 1.06 0l3.72-3.72V15c0 .414.336.75.75.75s.75-.336.75-.75V4.81l3.72 3.72c.146.147.338.22.53.22s.384-.072.53-.22c.293-.293.293-.767 0-1.06z"></path>
<path d="M19.708 21.944H4.292C3.028 21.944 2 20.916 2 19.652V14c0-.414.336-.75.75-.75s.75.336.75.75v5.652c0 .437.355.792.792.792h15.416c.437 0 .792-.355.792-.792V14c0-.414.336-.75.75-.75s.75.336.75.75v5.652c0 1.264-1.028 2.292-2.292 2.292z"></path>
</g>
</svg>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</article>
My code to try to find it:
try:
    tweets = wait.until(EC.presence_of_all_elements_located(
        (By.CSS_SELECTOR, "article[data-testid='tweet']a[href]")))
    links = [link.get_attribute('href')
             for link in tweets]
    print(links)
except TimeoutException:
    print("Couldn't detect any tweets")
Error:
selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: An invalid or illegal selector was specified
(Session info: chrome=98.0.4758.82)
Stacktrace:
Backtrace:
Ordinal0 [0x00977AC3 2587331]
Ordinal0 [0x0090ADD1 2141649]
Ordinal0 [0x00803BB8 1063864]
Ordinal0 [0x008063D7 1074135]
Ordinal0 [0x0080629E 1073822]
Ordinal0 [0x00806510 1074448]
Ordinal0 [0x0082FF42 1244994]
Ordinal0 [0x008303CB 1246155]
Ordinal0 [0x0085A64C 1418828]
Ordinal0 [0x008486D4 1345236]
Ordinal0 [0x00858A0A 1411594]
Ordinal0 [0x008484A6 1344678]
Ordinal0 [0x008253F6 1201142]
Ordinal0 [0x008262E6 1204966]
GetHandleVerifier [0x00B1DF22 1680738]
GetHandleVerifier [0x00BD0DBC 2413564]
GetHandleVerifier [0x00A0D151 563089]
GetHandleVerifier [0x00A0BF13 558419]
Ordinal0 [0x0091081E 2164766]
Ordinal0 [0x00915508 2184456]
Ordinal0 [0x00915650 2184784]
Ordinal0 [0x0091F5BC 2225596]
BaseThreadInitThunk [0x753FFA29 25]
RtlGetAppContainerNamedObjectPath [0x76FE7A9E 286]
RtlGetAppContainerNamedObjectPath [0x76FE7A6E 238]
What seems to be wrong?
CodePudding user response:
This error message...
selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: An invalid or illegal selector was specified
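...implies that the CSS selector you passed is not syntactically valid. In article[data-testid='tweet']a[href] the type selector a directly follows the attribute selector, and a type selector may only appear at the start of a compound selector, so the browser rejects the whole string.
You need to insert a space between article[data-testid='tweet'] and a[href]. The space acts as the descendant combinator, so the selector matches every <a> element with an href attribute anywhere inside the tweet's <article>. Your code block would then be: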
try:
    tweets = wait.until(EC.presence_of_all_elements_located(
        (By.CSS_SELECTOR, "article[data-testid='tweet'] a[href]")))
    links = [link.get_attribute('href')
             for link in tweets]
    print(links)
except TimeoutException:
    print("Couldn't detect any tweets")
However, to extract only the href value you are after, i.e. /kkanagas/status/1491996075573587994, you can narrow the selector to anchors whose href contains the substring "status":
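from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Assumes `driver` is an already-initialized WebDriver; the 20-second timeout is an arbitrary choice.
wait = WebDriverWait(driver, 20)
try:
    # Match only anchors inside a tweet whose href contains "status".
    tweets = wait.until(EC.presence_of_all_elements_located(
        (By.CSS_SELECTOR, "article[data-testid='tweet'] a[href*='status']")))
    links = [link.get_attribute('href') for link in tweets]
    print(links)
except TimeoutException:
    print("Couldn't detect any tweets")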
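One thing to keep in mind: get_attribute('href') usually returns the fully resolved URL rather than the relative path shown in the markup, and a[href*='status'] matches any link that merely contains "status". If you want exactly the /user/status/id form, a small post-processing step may help. The following is a minimal sketch under those assumptions; the helper name status_paths, the regular expression, and the sample URLs (including the twitter.com host) are illustrative and not part of the original code.

```python
import re
from urllib.parse import urlparse

# Hypothetical pattern for tweet permalinks such as
# /kkanagas/status/1491996075573587994
STATUS_PATH = re.compile(r"^/[^/]+/status/\d+$")


def status_paths(hrefs):
    """Return the unique tweet permalink paths found in hrefs, in order."""
    seen = []
    for href in hrefs:
        # urlparse drops the scheme, host and query string, leaving the path.
        path = urlparse(href).path
        if STATUS_PATH.match(path) and path not in seen:
            seen.append(path)
    return seen


# Sample values standing in for the list built from get_attribute('href').
links = [
    "https://twitter.com/kkanagas",
    "https://twitter.com/kkanagas/status/1491996075573587994",
    "https://twitter.com/hashtag/USA?src=hashtag_click",
]
print(status_paths(links))
```

In the Selenium snippet you would call status_paths(links) right after building the links list.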