How to parse #text inside td using css selector

Time:11-05

I'm working on a crawler for a question-and-answer site. How can I get only the question text inside this td, without the text of the nested tags?

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=edge" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Document</title>
  </head>
  <body>
    <table
      border="0"
      width="100%"
      onclick="GiveAns(event.srcElement||event.target)"
      onmouseover="ChangeColor(event.srcElement||event.target)"
    >
      <tbody>
        <tr>
          <th >Question number</th>
          <th >key<br />answer</th>
          <th >Choose your <br />own answer</th>
          <th >Selected Topics<span id="cdes"></span></th>
          <th >Error<br />Notification</th>
        </tr>
      </tbody>
      <tbody id="s1234">
        <tr id="d1">
          <th><a name="P1">1</a></th>
          <th><b>(1)</b></th>
          <th><tt> </tt></th>
          <td>
            question1
            <i>
              <a>(1)ans1</a>
            </i>
            <i>(2)ans2</i>
            <i>(3)ans3</i>
            <i>ans4</i>。<q>360 02-137</q>
          </td>
          <th  onclick="E(this)"><img src="/e.gif" /></th>
        </tr>
        <tr id="d2">
          <th><a name="P2">2</a></th>
          <th><b>(4)</b></th>
          <th><tt> </tt></th>
          <td>
            question2
            <i>(1)ans1</i>
            <i>(2)ans2</i>
            <i>(3)ans3</i>
            <i>
              <a>(4)ans4</a>
            </i>
            。
            <q>1149 </q>
          </td>
          <th  onclick="E(this)"><img src="/e.gif" /></th>
        </tr>
      </tbody>
    </table>
  </body>
</html>

This is the table from the site.

I tried these approaches:

document.querySelectorAll('#s1234 tr > td:not(i)').forEach((e)=>{console.log(e)})
document.querySelectorAll('#s1234 tr > td')

But both of these still include the <i> and <a> tags. How do I get just the question text?

The result I need is like this: "question1"

CodePudding user response:

It isn't entirely clear what you are asking. Do you just need the innerText? For example:

document.querySelectorAll('#s1234 tr > td').forEach((e) => {
  console.log(e.innerText)
})

Gives

question1 (1)ans1 (2)ans2 (3)ans3 ans4。360 02-137
question2 (1)ans1 (2)ans2 (3)ans3 (4)ans4 。 1149 

Edit:

If you just need the question part, then:

document.querySelectorAll('#s1234 tr > td').forEach((e) => {
  console.log(e.firstChild.data.trim())
})

gives...

question1
question2
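
Note that firstChild.data only works when the cell actually starts with a text node. If a cell is empty or begins directly with a tag, firstChild is null or an element and the call fails or returns undefined. A minimal sketch of a guarded version follows; the helper name leadingText is made up for illustration, and the constant 3 is the value of Node.TEXT_NODE, hard-coded so the helper can also be tried outside a browser.

```javascript
// Node.TEXT_NODE is 3 in the DOM; hard-coded here so the helper has no browser dependency.
const TEXT_NODE = 3;

// Hypothetical helper: returns the trimmed text at the start of a cell,
// or an empty string if the cell is empty or starts with an element.
function leadingText(cell) {
  const first = cell.firstChild;
  if (!first || first.nodeType !== TEXT_NODE) {
    return '';
  }
  return first.data.trim();
}

// Usage against the table from the question (only runs where a DOM exists).
if (typeof document !== 'undefined') {
  document.querySelectorAll('#s1234 tr > td').forEach((td) => {
    console.log(leadingText(td));
  });
}
```

Returning an empty string for the "no leading text" case is just one choice; returning null instead would make missing questions easier to tell apart from blank ones.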

CodePudding user response:

I believe you only want to extract the question; your statement is a little confusing.

document.querySelectorAll('#s1234 tr > td').forEach((e)=>{console.log(e.firstChild.data)}) // logs only the question text (with surrounding whitespace)

CodePudding user response:

You can't do it with a CSS selector (see this question).

But since you're already in JS, you can get the text content in a few other ways. There is also a dedicated question covering many options (this one is probably the best at the moment).

Applied to the question's code:

const extractText = (node) => {
  // Assuming there's 1 text node you want.
  // Change to `filter` if you want to extract all text nodes in an element.
  const text = [...node.childNodes].find(child => child.nodeType === Node.TEXT_NODE);
  return text && text.textContent.trim();
}

const allTextNodes = [...document.querySelectorAll('#s1234 tr > td')].map(extractText);
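
The `filter` variant mentioned in the comment above could look roughly like this. It is only a sketch: the name directTexts is invented here, the constant 3 stands in for Node.TEXT_NODE so the function does not depend on a browser global, and dropping whitespace-only nodes is an extra assumption about what you want.

```javascript
// Same value as Node.TEXT_NODE in the DOM.
const TEXT_NODE_TYPE = 3;

// Hypothetical helper: collects the trimmed text of every direct text-node child,
// skipping the whitespace-only nodes that sit between tags.
function directTexts(element) {
  return [...element.childNodes]
    .filter((child) => child.nodeType === TEXT_NODE_TYPE)
    .map((child) => child.textContent.trim())
    .filter((text) => text.length > 0);
}

// Usage against the table from the question (only runs where a DOM exists).
if (typeof document !== 'undefined') {
  const textsPerCell = [...document.querySelectorAll('#s1234 tr > td')].map(directTexts);
  console.log(textsPerCell);
}
```

With the markup in the question, each cell should yield the question text plus the stray "。", since that character is also a direct text node of the td. Take the first entry if you only want the question.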

CodePudding user response:

Thank you. I can only accept one answer, but I would love to accept all of them.

All three answers above worked very well for me. Thank you very much!
