Boost Beast Read Content By Portions

Time:06-08

I am trying to understand how I can limit the amount of data that is read from the internet by calling the 'read_some' function in Boost.Beast.

My starting point is the incremental read example in the Beast documentation. From the docs I understood that the data actually read is stored in the flat_buffer. I ran the following experiment:

  1. Set the flat_buffer's maximum size to 1024 bytes
  2. Connect to a relatively large (several KB) HTML page
  3. Call read_some once
  4. Disconnect from the internet
  5. Try to read the rest of the page

Since the buffer's capacity is too small to hold the entire page, I expected the experiment to fail: I should not be able to read the whole page. Nevertheless, it finishes successfully. That means there must be some additional buffer where the read data is stored. What is that buffer for, and how can I limit its size?

Update: here is my source code:

#include <boost/beast/core.hpp>
#include <boost/beast/http.hpp>
#include <boost/beast/version.hpp>
#include <boost/asio/strand.hpp>
#include <cstdlib>
#include <cstring>      // std::strcmp
#include <functional>
#include <iostream>
#include <memory>
#include <string>

namespace beast = boost::beast;         // from <boost/beast.hpp>
namespace http = beast::http;           // from <boost/beast/http.hpp>
namespace net = boost::asio;            // from <boost/asio.hpp>

using namespace http;

template<
        bool isRequest,
        class SyncReadStream,
        class DynamicBuffer>
void
read_and_print_body(
        std::ostream& os,
        SyncReadStream& stream,
        DynamicBuffer& buffer,
        boost::beast::error_code& ec ) {
    parser<isRequest, buffer_body> p;
    read_header( stream, buffer, p, ec );
    if ( ec )
        return;
    while ( !p.is_done()) {
        char buf[512];
        p.get().body().data = buf;
        p.get().body().size = sizeof( buf );
        read_some( stream, buffer, p, ec );
        if ( ec == error::need_buffer )
            ec = {};
        if ( ec )
            return;
        os.write( buf, sizeof( buf ) - p.get().body().size );
    }
}

int main(int argc, char** argv)
{
    try
    {
        // Check command line arguments.
        if(argc != 4 && argc != 5)
        {
            std::cerr <<
            "Usage: http-client-sync <host> <port> <target> [<HTTP version: 1.0 or 1.1(default)>]\n" <<
            "Example:\n" <<
            "    http-client-sync www.example.com 80 /\n" <<
            "    http-client-sync www.example.com 80 / 1.0\n";
            return EXIT_FAILURE;
        }
        auto const host = argv[1];
        auto const port = argv[2];
        auto const target = argv[3];
        int version = argc == 5 && !std::strcmp("1.0", argv[4]) ? 10 : 11;

        // The io_context is required for all I/O
        net::io_context ioc;

        // These objects perform our I/O
        boost::asio::ip::tcp::resolver resolver(ioc);
        beast::tcp_stream stream(ioc);

        // Look up the domain name
        auto const results = resolver.resolve(host, port);

        // Make the connection on the IP address we get from a lookup
        stream.connect(results);

        // Set up an HTTP GET request message
        http::request<http::string_body> req{http::verb::get, target, version};
        req.set(http::field::host, host);
        req.set(http::field::user_agent, BOOST_BEAST_VERSION_STRING);

        // Send the HTTP request to the remote host
        http::write(stream, req);

        // This buffer is used for reading and must be persisted
        beast::flat_buffer buffer;

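        boost::beast::error_code ec;
        read_and_print_body<false>(std::cout, stream, buffer, ec);
        // Report read errors instead of silently ignoring them
        if(ec)
            std::cerr << "Error: " << ec.message() << std::endl;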
    }
    catch(std::exception const& e)
    {
        std::cerr << "Error: " << e.what() << std::endl;
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}

Answer:

The operating system's TCP/IP stack necessarily buffers received data (in the socket's receive buffer), so that is most likely where the rest of the page was stored.

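A reliable way to test the scenario you describe is to control both ends of the connection: a local server sends the header and only 1024 bytes of a 20 KiB body, then closes the connection.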


#include <boost/beast.hpp>
#include <iostream>
#include <thread>
namespace net = boost::asio;
namespace beast = boost::beast;
namespace http = beast::http;
using net::ip::tcp;

void server()
{
    net::io_context ioc;
    tcp::acceptor acc{ioc, {{}, 8989}};
    acc.listen();

    auto conn = acc.accept();

    http::request<http::string_body> msg(
        http::verb::get, "/", 11, std::string(20ull << 10, '*'));
    msg.prepare_payload();

    http::request_serializer<http::string_body> ser(msg);

    size_t hbytes = write_header(conn, ser);
    // size_t bbytes = write_some(conn, ser);
    size_t bbytes = write(conn, net::buffer(msg.body(), 1024));

    std::cout << "sent " << hbytes << " header and " << bbytes << "/"
              << msg.body().length() << " of body" << std::endl;
    // closes connection
}

namespace {
    template<bool isRequest, class SyncReadStream, class DynamicBuffer>
        auto
        read_and_print_body(
                std::ostream& /*os*/,
                SyncReadStream& stream,
                DynamicBuffer& buffer,
                boost::beast::error_code& ec)
        {
            struct { size_t hbytes = 0, bbytes = 0; } ret;

            http::parser<isRequest, http::buffer_body> p;
            //p.header_limit(8192);
            //p.body_limit(1024);

            ret.hbytes = read_header(stream, buffer, p, ec);
            if(ec)
                return ret;
            while(! p.is_done())
            {
                char buf[512];
                p.get().body().data = buf;
                p.get().body().size = sizeof(buf);
                ret.bbytes += http::read_some(stream, buffer, p, ec);  // accumulate body bytes
                if(ec == http::error::need_buffer)
                    ec = {};
                if(ec)
                    break;
                //os.write(buf, sizeof(buf) - p.get().body().size);
            }
            return ret;
        }
}

void client()
{
    net::io_context ioc;
    tcp::socket conn{ioc};
    conn.connect({{}, 8989});

    beast::error_code ec;
    beast::flat_buffer buf;
    auto [hbytes, bbytes] = read_and_print_body<true>(std::cout, conn, buf, ec);

    std::cout << "received hbytes:" << hbytes << " bbytes:" << bbytes
              << " (" << ec.message() << ")" << std::endl;
}

int main()
{
    std::jthread s(server);

    std::this_thread::sleep_for(std::chrono::seconds(1));
    std::jthread c(client);
}

Prints:

sent 41 header and 1024/20480 of body
received 1065 bytes of message (partial message)

Side Notes

You start your question with:

"I am trying to understand how can I limit the amount of data that is read from the internet"

That is built into Beast: the HTTP parser enforces size limits on both the header and the body.

"by calling 'read_some' function in boost beast."

To limit only the total amount of data read, you don't need to call read_some in a loop: http::read already does exactly that, calling read_some repeatedly until the message is complete.

For example, if you change the server's body from 20ull << 10 (20 KiB) to 20ull << 20 (20 MiB), you exceed the parser's default body limit, which is 1 MB for requests:

http::request<http::string_body> msg(http::verb::get, "/", 11,
                                     std::string(20ull << 20, '*'));

Prints:

sent 44 header and 1024/20971520 of body
received hbytes:44 bbytes:0 (body limit exceeded)

You can also set your own parser limits:

http::parser<isRequest, http::buffer_body> p;
p.header_limit(8192);
p.body_limit(1024);

Which prints:

sent 41 header and 1024/20480 of body
received hbytes:41 bbytes:0 (body limit exceeded)

As you can see, the parser rejects the request right after reading the header: it compares the Content-Length value against the body limit, so it does not have to wait for the body to arrive.
