Find text in a response using a regular expression

Time:07-30

I have this code which I want to use to check domain registration:

    private final static String WHO ="cnn.com";
    private final static String WHOIS_HOST = "whois.verisign-grs.com";
    private final static int WHOIS_PORT = 43;

    public static void main(final String[] args) throws IOException {
        SpringApplication.run(TestApplication.class, args);

        int c;
        Socket socket = null;

        String query = WHO + "\r\n";
        byte buf[] = query.getBytes();


        String regex = ".*Registry Expiry Date:*";

        try {
            socket = new Socket(WHOIS_HOST, WHOIS_PORT);
            InputStream in = socket.getInputStream();
            OutputStream out = socket.getOutputStream();

            out.write(buf);
            out.flush();

            StringBuilder text = new StringBuilder();
            while ((c = in.read()) != -1) {
                System.out.print((char) c);
                text.append(c);
            }

            boolean matches = Pattern.matches(regex, text.toString());


            System.out.print("\nDone\n" + matches);
        } catch (IOException ex) {
             System.out.print(ex.getMessage());
        } finally {
            if(socket != null){
                try {
                    socket.close();
                } catch (IOException ex) {
                 }
            }
        }
    }

I get this output:

   Domain Name: CNN.COM
   Registry Domain ID: 3269879_DOMAIN_COM-VRSN
   Registrar WHOIS Server: whois.corporatedomains.com
   Registrar URL: http://cscdbs.com
   Updated Date: 2018-04-10T16:43:38Z
   Creation Date: 1993-09-22T04:00:00Z
   Registry Expiry Date: 2026-09-21T04:00:00Z
   Registrar: CSC Corporate Domains, Inc.
   Registrar IANA ID: 299
   Registrar Abuse Contact Email: [email protected]
   Registrar Abuse Contact Phone: 8887802723
   Domain Status: clientTransferProhibited https://icann.org/epp#clientTransferProhibited
   Domain Status: serverDeleteProhibited https://icann.org/epp#serverDeleteProhibited
   Domain Status: serverTransferProhibited https://icann.org/epp#serverTransferProhibited
   Domain Status: serverUpdateProhibited https://icann.org/epp#serverUpdateProhibited
   Name Server: NS-1086.AWSDNS-07.ORG
   Name Server: NS-1630.AWSDNS-11.CO.UK
   Name Server: NS-47.AWSDNS-05.COM
   Name Server: NS-576.AWSDNS-08.NET
   DNSSEC: unsigned
   URL of the ICANN Whois Inaccuracy Complaint Form: https://www.icann.org/wicf/
>>> Last update of whois database: 2022-07-29T20:55:54Z <<<

For more information on Whois status codes, please visit https://icann.org/epp

NOTICE: The expiration date displayed in this record is the date the
registrar's sponsorship of the domain name registration in the registry is
currently set to expire. This date does not necessarily reflect the expiration
date of the domain name registrant's agreement with the sponsoring
registrar.  Users may consult the sponsoring registrar's Whois database to
view the registrar's reported date of expiration for this registration.

TERMS OF USE: You are not authorized to access or query our Whois
database through the use of electronic processes that are high-volume and
automated except as reasonably necessary to register domain names or
modify existing registrations; the Data in VeriSign Global Registry
Services' ("VeriSign") Whois database is provided by VeriSign for
information purposes only, and to assist persons in obtaining information
about or related to a domain name registration record. VeriSign does not
guarantee its accuracy. By submitting a Whois query, you agree to abide
by the following terms of use: You agree that you may use this Data only
for lawful purposes and that under no circumstances will you use this Data
to: (1) allow, enable, or otherwise support the transmission of mass
unsolicited, commercial advertising or solicitations via e-mail, telephone,
or facsimile; or (2) enable high volume, automated, electronic processes
that apply to VeriSign (or its computer systems). The compilation,
repackaging, dissemination or other use of this Data is expressly
prohibited without the prior written consent of VeriSign. You agree not to
use electronic processes that are automated and high-volume to access or
query the Whois database except as reasonably necessary to register
domain names or modify existing registrations. VeriSign reserves the right
to restrict your access to the Whois database in its sole discretion to ensure
operational stability.  VeriSign may restrict or terminate your access to the
Whois database for failure to abide by these terms of use. VeriSign
reserves the right to modify these terms at any time.

The Registry database contains ONLY .COM, .NET, .EDU domains and
Registrars.

Done
false

Do you know how I can get only the line content Registry Expiry Date: 2026-09-21T04:00:00Z?

CodePudding user response:

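The problem is in the while loop: c is an int, so text.append(c) appends each byte's numeric value instead of the character, and the regex never sees the text it is looking for. Cast c to char, and use Matcher.find() rather than Pattern.matches(), which only succeeds when the regex matches the entire input. With these changes you get the line you need: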

        StringBuilder text = new StringBuilder();
        while ((c = in.read()) != -1) {
           // System.out.print((char) c);
            text.append((char) c);
        }
        System.out.println(text);
        String regex = ".*Registry Expiry Date.*Z";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(text.toString());
        if (matcher.find())
        {
            System.out.println("\nDone\n" + matcher.group(0));
        }

In this case the result is:

Done
   Registry Expiry Date: 2026-09-21T04:00:00Z
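
To see why find() succeeds where Pattern.matches() returned false, here is a minimal sketch that runs without a network connection. The class name ExpiryRegexDemo, the helper findExpiry, and the shortened SAMPLE text are assumptions made for illustration; the capture group is one possible way to pull out just the timestamp, not the only one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExpiryRegexDemo {
    // Hypothetical sample: three lines shaped like the WHOIS response above.
    static final String SAMPLE =
            "   Creation Date: 1993-09-22T04:00:00Z\r\n"
          + "   Registry Expiry Date: 2026-09-21T04:00:00Z\r\n"
          + "   Registrar: CSC Corporate Domains, Inc.\r\n";

    // Returns the token that follows "Registry Expiry Date:", or null if the label is absent.
    static String findExpiry(String response) {
        Matcher m = Pattern.compile("Registry Expiry Date:\\s*(\\S+)").matcher(response);
        // find() scans for the pattern anywhere in the input.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // matches() must consume the whole input, and '.' does not cross
        // line terminators by default, so a multi-line response never matches.
        System.out.println(Pattern.matches(".*Registry Expiry Date:.*", SAMPLE)); // false
        System.out.println(findExpiry(SAMPLE)); // 2026-09-21T04:00:00Z
    }
}
```

If the whole line is wanted rather than only the date, group(0) of a pattern such as the one in the answer above returns it.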

CodePudding user response:

There are a few things to clean up:

  • c is defined as an int, so when you call text.append(c), you are appending an integer value – to fix that, you can cast "c" to be a character: text.append((char) c)
  • your code reads the entire response into a StringBuilder, then processes that text afterward, looking for any potential matches – this isn't a huge inefficiency, but it's not necessary either; you could instead inspect the data along the way to see if you've encountered the interesting part of the data and if so, skip processing the rest
  • Using Pattern and Matcher is ok, but for the case you've presented, it's extra complexity.
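
The first point is easy to reproduce in isolation. This is a small sketch with hypothetical names (AppendDemo, appendAsInt, appendAsChar) that shows which StringBuilder.append overload is chosen for an int and for a char:

```java
public class AppendDemo {
    // Mirrors the original loop: each value is appended as an int.
    static String appendAsInt(int[] values) {
        StringBuilder sb = new StringBuilder();
        for (int c : values) {
            sb.append(c); // resolves to append(int): writes the decimal digits
        }
        return sb.toString();
    }

    // Mirrors the fix: each value is cast to char before appending.
    static String appendAsChar(int[] values) {
        StringBuilder sb = new StringBuilder();
        for (int c : values) {
            sb.append((char) c); // resolves to append(char): writes the character
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] bytes = {'D', 'a', 't', 'e'}; // 68, 97, 116, 101
        System.out.println(appendAsInt(bytes));  // 6897116101
        System.out.println(appendAsChar(bytes)); // Date
    }
}
```

With the int overload the builder holds a long run of digits, so no pattern containing "Registry Expiry Date" can ever match it.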

Here's a solution that:

  • Opens the socket and "out" in a try-with-resources block – that way, both will be closed for you automatically (simpler code)
  • Closes the output stream "out" – this is probably minor in that your program will work fine without it, but it's always good practice to close anything that you open
  • Opens the input stream in a try-with-resources block – again, less code, automatic management of the opened resources
  • Wraps the input stream in a BufferedReader – this allows you to read the input line by line
  • In the while loop, instead of using Pattern and Matcher, it simply checks if each line of text contains "Registry Expiry Date"
  • If a match is found, it prints the match, then breaks from the loop – it isn't necessary to look at any more input data

    // Needs imports from java.io (BufferedReader, InputStreamReader, OutputStream) and java.net.Socket
    String WHO = "cnn.com";
    String WHOIS_HOST = "whois.verisign-grs.com";
    int WHOIS_PORT = 43;

    try (Socket socket = new Socket(WHOIS_HOST, WHOIS_PORT)) {
        try (OutputStream out = socket.getOutputStream()) {
            // A WHOIS query is the domain name followed by CRLF
            out.write((WHO + "\r\n").getBytes());
            out.flush();

            try (BufferedReader input = new BufferedReader(new InputStreamReader(socket.getInputStream()))) {
                String line;
                while ((line = input.readLine()) != null) {
                    if (line.contains("Registry Expiry Date")) {
                        System.out.println("---> " + line);
                        break; // don't need to read any more input
                    }
                }
            }
        }

    } catch (Exception e) {
        e.printStackTrace();
    }

Here's the output:

--->    Registry Expiry Date: 2026-09-21T04:00:00Z
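
If the date itself is needed rather than the whole line, the same line-by-line scan can be moved into a small helper that takes any Reader, which also makes it testable without opening a socket. The following is only a sketch under assumptions: the class ExpiryLineDemo, the method readExpiry, and the two-line sample are invented for illustration, and it assumes the value after the label is an ISO-8601 instant, as it is in the output shown above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.time.Instant;
import java.util.Optional;

public class ExpiryLineDemo {
    static final String LABEL = "Registry Expiry Date:";

    // Scans the source line by line and stops at the first line containing LABEL.
    // Returns an empty Optional if the label never appears.
    static Optional<Instant> readExpiry(Reader source) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        String line;
        while ((line = reader.readLine()) != null) {
            int at = line.indexOf(LABEL);
            if (at >= 0) {
                // Everything after the label, minus surrounding whitespace.
                String value = line.substring(at + LABEL.length()).trim();
                return Optional.of(Instant.parse(value));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical two-line sample in the shape of the real response.
        String sample = "   Creation Date: 1993-09-22T04:00:00Z\n"
                      + "   Registry Expiry Date: 2026-09-21T04:00:00Z\n";
        System.out.println(readExpiry(new StringReader(sample)));
    }
}
```

Against the real server, the Reader would be an InputStreamReader wrapped around socket.getInputStream(). Note that Instant.parse throws a DateTimeParseException if the value is not a valid instant, so callers that query other WHOIS servers may want to catch it.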