I'm working on a crawler for an answer site. How can I get only the question text inside this td, without the text of the nested tags?
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Document</title>
</head>
<body>
<table
border="0"
width="100%"
onclick="GiveAns(event.srcElement||event.target)"
onmouseover="ChangeColor(event.srcElement||event.target)"
>
<tbody>
<tr>
<th >Question number</th>
<th >key<br />answer</th>
<th >Choose your <br />own answer</th>
<th >Selected Topics<span id="cdes"></span></th>
<th >Error<br />Notification</th>
</tr>
</tbody>
<tbody id="s1234">
<tr id="d1">
<th><a name="P1">1</a></th>
<th><b>(1)</b></th>
<th><tt> </tt></th>
<td>
question1
<i>
<a>(1)ans1</a>
</i>
<i>(2)ans2</i>
<i>(3)ans3</i>
<i>ans4</i>。<q>360 02-137</q>
</td>
<th onclick="E(this)"><img src="/e.gif" /></th>
</tr>
<tr id="d2">
<th><a name="P2">2</a></th>
<th><b>(4)</b></th>
<th><tt> </tt></th>
<td>
question2
<i>(1)ans1</i>
<i>(2)ans2</i>
<i>(3)ans3</i>
<i>
<a>(4)ans4</a>
</i>
。
<q>1149 </q>
</td>
<th onclick="E(this)"><img src="/e.gif" /></th>
</tr>
</tbody>
</table>
</body>
</html>
This is the table from the site.
I tried these methods:
document.querySelectorAll('#s1234 tr > td:not(i)').forEach((e)=>{console.log(e)})
document.querySelectorAll('#s1234 tr > td')
But both of these results include the <i> and <a> elements, so how do I get just the question text?
The result I need is like this: "question1"
CodePudding user response:
It isn't entirely clear what you are asking. Do you just need the innerText? E.g.
document.querySelectorAll('#s1234 tr > td').forEach((e) => {
console.log(e.innerText)
})
Gives
question1 (1)ans1 (2)ans2 (3)ans3 ans4。360 02-137
question2 (1)ans1 (2)ans2 (3)ans3 (4)ans4 。 1149
Edit:
If you just need the question part, then:
document.querySelectorAll('#s1234 tr > td').forEach((e) => {
console.log(e.firstChild.data.trim())
})
gives...
question1
question2
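One caveat: firstChild.data only works when the cell actually starts with a text node. If a td began with an element instead, data would be undefined and trim() would throw. A guarded version might look like the sketch below; leadingText is a hypothetical helper name, not something from the page, and 3 is the numeric value of Node.TEXT_NODE.

```javascript
// Hypothetical helper: returns the trimmed leading text of a cell,
// or '' when the cell is empty or does not start with a text node.
function leadingText(cell) {
  const first = cell.firstChild;
  // nodeType 3 is Node.TEXT_NODE
  if (!first || first.nodeType !== 3) {
    return '';
  }
  return first.data.trim();
}
```

In the browser it would be used the same way as above, e.g. document.querySelectorAll('#s1234 tr > td').forEach((e) => console.log(leadingText(e))).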
CodePudding user response:
I believe you only want to extract the question; your statement is a little confusing.
document.querySelectorAll('#s1234 tr > td').forEach((e)=>{console.log(e.firstChild.data)}) // this will give you only the question
CodePudding user response:
You can't do it with a CSS selector (see this question).
But since you're already in JS, you can get text content in a few other ways, for which there is also a dedicated question with many options (probably this is currently the best one).
Applied to the question's code:
const extractText = (node) => {
  // Assuming there's 1 text node you want.
  // Change to `filter` if you want to extract all text nodes in an element.
  const text = [...node.childNodes].find(child => child.nodeType === Node.TEXT_NODE);
  return text && text.textContent.trim();
};
const allTextNodes = [...document.querySelectorAll('#s1234 tr > td')].map(extractText);
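For the `filter` variant mentioned in the comment, a rough sketch could look like this. directTexts is a hypothetical name, and the literal 3 stands in for Node.TEXT_NODE so the function does not depend on the browser global. It also drops the whitespace-only text nodes that sit between the <i> elements.

```javascript
// Hypothetical helper: collects every direct text node of an element,
// trimmed, skipping nodes that contain only whitespace.
const directTexts = (node) =>
  Array.from(node.childNodes)
    .filter((child) => child.nodeType === 3) // 3 === Node.TEXT_NODE
    .map((child) => child.textContent.trim())
    .filter((text) => text.length > 0);
```

With the table from the question, [...document.querySelectorAll('#s1234 tr > td')].map(directTexts) should yield one array per row, where the first entry is the question and any later entries are stray text such as the trailing "。".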
CodePudding user response:
Thank you. I can only accept one answer, but I would love to accept all of them.
All three answers above worked very well for me, thank you very much!