Selector.select() is not blocking after a SelectionKey.isReadable is read


I have a Socket that connects to a ServerSocketChannel, which hands the accepted connection off to another Selector. The client socket sends a one-time message of 8 bytes. I read it successfully, but then the selector, which I call selectorIO, should block on the select() method; instead it returns immediately and re-reads the same message that was already sent.

public void readData()
    {
        int numberOfKeys = 0;
        buffer = ByteBuffer.allocate(8);
        buffer.clear();
        
        while(true)
        {
            try
            {
                //This is not blocking anymore?!
                numberOfKeys = selectorIO.select();
                
                Set<SelectionKey> keys = selectorIO.selectedKeys();
                Iterator<SelectionKey> itr = keys.iterator();
                
                while(itr.hasNext())
                {
                    SelectionKey key = itr.next();
                    if(key.isReadable())
                    {
                        SocketChannel channel = (SocketChannel)key.channel();
                        
                        int numread = channel.read(buffer);
                    
                        String s = new String(buffer.array());
                        System.out.println(s);
                        System.out.println(numread);
                        
                        
                        buffer.flip();
                        
                        //channel.write(buffer);
                        int numwrote = channel.write(buffer);
                        System.out.println(numwrote + " Bytes written");
                    
                    
                        buffer.flip();
                        
                        //buffer.reset();
                    }   
                    itr.remove();
                }
        
            }
            catch(Exception e)
            {
                System.out.println(e);
            }
        }
    }

CodePudding user response:

When you call buffer.array() to create the String, the ByteBuffer has no clue that the bytes have been consumed, so the state of the ByteBuffer remains unchanged: it still contains the read bytes and still expects them to be consumed. This causes the re-reading of the same message.

The array returned by ByteBuffer.array() has no clue how many useful bytes are available. If the array has a capacity of 10 and only 8 bytes have been set, you are trying to create a String with 2 bogus bytes. And if only 2 bytes have been read, you create a String based on 2 bytes instead of 8. The String-creation approach is incorrect.

After you do a channel.write, you should do a compact or a clear, depending on whether data is still available.

I normally use 2 separate ByteBuffers; one for reading and one for writing.
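To make this concrete, here is a minimal sketch of a read handler that flips before writing and then clears or compacts; the single shared buffer and the UTF-8 charset are assumptions carried over from the question, not requirements:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

// Called for each key where key.isReadable() is true.
static void handleReadable(SelectionKey key, ByteBuffer buffer) throws IOException {
    SocketChannel channel = (SocketChannel) key.channel();

    int numRead = channel.read(buffer);            // fill
    if (numRead == -1) {                           // peer closed the connection
        key.cancel();
        channel.close();
        return;
    }

    buffer.flip();                                 // switch from filling to draining

    // Decode only the bytes actually in the buffer, with an explicit charset.
    String s = new String(buffer.array(), 0, buffer.limit(), StandardCharsets.UTF_8);
    System.out.println(s);

    int numWrote = channel.write(buffer);          // echo the same bytes back
    System.out.println(numWrote + " bytes written");

    if (buffer.hasRemaining()) {
        buffer.compact();                          // keep the unwritten tail for the next pass
    } else {
        buffer.clear();                            // everything was written; start fresh
    }
}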

CodePudding user response:

Multiple issues at work.

Buffer Management

String creation broken

You create a ByteBuffer and, like all BBs, it has a set capacity. You then read into it (int numRead = channel.read(buffer)), and this does one of two things:

  1. The capacity of the BB is less than the amount of bytes that can be immediately stuffed into that buffer, copied straight over from your network card's buffers.

In this case, the entire BB is filled (numRead will be equal to the BB's capacity), and the 'READ READY' status of that channel remains up (because there are still more bytes ready to copy over).

Note that bb.array() returns the entire backing array, but in this scenario, given that the whole BB is filled to capacity, that 'works out' so to speak.

  2. The capacity of the BB is more than the amount of bytes that can be immediately stuffed into that buffer.

In this case, numRead will be less than the total capacity of that bytebuffer, and new String(bb.array()) is broken - that would attempt to turn into a string the bytes you read so far and a whole bunch of garbage at the end.

new String(bb.array(), 0, bb.position()) would do the job, but in general this isn't how you're meant to be doing things. For starters, you're now charset-confused (You really should be using new String(bb.array(), 0, bb.position(), StandardCharsets.UTF_8) - do not ever convert bytes to chars or vice versa unless you specify which encoding is being used, otherwise the system chooses for you and that's rarely correct, and always confusing).
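If you'd rather not reach into the backing array at all, the Charset API can decode straight from the buffer; a minimal sketch, assuming UTF-8 text and a buffer that has just been filled by channel.read() (import java.nio.charset.StandardCharsets):

buffer.flip();                                                 // limit = bytes read, position = 0
String s = StandardCharsets.UTF_8.decode(buffer).toString();
// decode() consumes the buffer (position == limit); rewind() it if you still need to write these bytes
buffer.clear();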

No proper resetting

The general way a buffer is meant to be used is like this:

  • fill it (either you fill it, or you call read() on something).
  • flip it.
  • process with it (either you give it to something that sends the data in it, or you go through the bytes).
  • clear it.
  • repeat.

In your code, you fill it (channel.read()), then 'process' it by direct array manipulation instead of flipping and reading (you pass the backing array to a String constructor), and then you call .flip(), which is the wrong call: you want .clear().

BBs work that way because, well, logic:

  • They have a set capacity and you don't necessarily use all of it; often you use a little less. So, to fill it, you want the BB to accept data from 0 all the way up to capacity: the 'position' is 0 (and it advances as the buffer is filled), and the 'limit' is set to the capacity.
  • Then to process it, we want position to be 0 again (we start processing from the beginning, of course), but we don't want limit to be the capacity, because the buffer may not have been filled all the way up to capacity by whatever process put data into it; we want limit to be where the position was (that's where the 'process that filled the buffer' left things). flip() does exactly this: it sets position back to 0 and limit to where the position was.

Once you've read data into your buffer and then processed that data, you want clear: you want position back to 0 and the limit back to the capacity, ready for the process that fills the buffer to start filling it again. clear() does that. Your code calls flip(), which is wrong.
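Put together, the canonical cycle looks roughly like this (a sketch; channel and handleByte are placeholders, not the poster's code):

ByteBuffer buf = ByteBuffer.allocate(8);

while (channel.read(buf) > 0) {       // fill: position advances, limit stays at capacity
    buf.flip();                       // process: limit = old position, position = 0
    while (buf.hasRemaining()) {
        handleByte(buf.get());        // handleByte is a hypothetical processing step
    }
    buf.clear();                      // reset: position = 0, limit = capacity, go again
}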

Confusion about selectors

A selector is set up with certain things you are 'interested in'. When you ask it to .select(), you're saying:

  1. Is any of the stuff I'm interested in possible right now? If yes, return immediately.
  2. If not, go to sleep until something I'm interested in might be possible.

The thing is, as you process a channel your opinion on what you're interested in changes over time, and you need to update the selector by turning interest ops on and off on the relevant SelectionKeys as needed.
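Updating those interests on an already-registered channel is done through its SelectionKey; a sketch:

key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);   // start caring about 'writable'
// ... later, once there is nothing left to send:
key.interestOps(key.interestOps() & ~SelectionKey.OP_WRITE);  // stop caring about 'writable'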

For example, let's say you are writing a simple chat program. Alice just pasted half of the collected works of Shakespeare and your chat program now needs to send all of this to Bob. You should now turn on SelectionKey.OP_WRITE on Bob's network channel. It should have been off before, as you did not have anything to send to Bob. But you have something to send now, so turn it on.

You then go to select(), which is highly likely to return immediately (the network card has free buffer space for Bob's connection). You start copying those collected works of Shakespeare over into the ByteBuffer but you won't 'make it' - that buffer's capacity is less than Shakespeare's total size. That's the point, that's fine. You then hand that buffer to the network and go back to selecting while still interested in OP_WRITE, because you haven't copied all of Shakespeare's collected works yet; you only did about a quarter so far.

Eventually the network clears that buffer out through the network cable, and only then will your selector go: Oh, hey, we're ready for some more writing!

You keep doing this process (add some more of the Shakespeare you need to send) until you stuff the last of it into the buffer you hand to the channel. You should then remove SelectionKey.OP_WRITE, because you no longer care that the network buffer has room.
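The writable branch of the loop then looks roughly like this; 'pending' is a hypothetical per-connection ByteBuffer holding whatever still has to go to Bob:

if (key.isWritable()) {
    SocketChannel channel = (SocketChannel) key.channel();
    channel.write(pending);                         // sends as much as the socket buffer will take right now
    if (!pending.hasRemaining()) {
        // Bob's backlog is empty: stop asking to be woken up for writes.
        key.interestOps(key.interestOps() & ~SelectionKey.OP_WRITE);
    }
}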

Whilst all this was going on, you have a problem: what if Alice keeps sending more and more books, and she sends them faster than Bob can receive them? That's possible, of course - maybe Alice is on glass fiber and Bob is on a satellite phone. You can of course choose to buffer all of this server-side, but all things are limited: there comes a point when Alice has queued up 50GB worth of book content that you still have to send to Bob. You can either decide that your server will just crash if Alice does this, or you're going to have to put in a limit: once the 'data that Alice sent that I have yet to shove into Bob's channel' reaches a certain amount, you have to go: okay, Alice, no more.

When that happens, you have to turn off the OP_READ interest - you know Alice has sent you some data that is ready to read, but you don't want to read it; your buffers are full. This is sensible: if Bob has the slower connection and Alice is sending heaps of data to Bob, you can't process Alice's bytes as fast as she can send them.
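A sketch of that backpressure check; aliceKey, pendingForBob and MAX_PENDING are hypothetical names:

if (pendingForBob >= MAX_PENDING) {
    aliceKey.interestOps(aliceKey.interestOps() & ~SelectionKey.OP_READ);  // pause reading from Alice
} else {
    aliceKey.interestOps(aliceKey.interestOps() | SelectionKey.OP_READ);   // Bob caught up: resume reading
}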

Remember also that .select() is free to return spuriously (for no reason). Your code cannot assume 'oh hey, select() returned, therefore there MUST be at least one thing ready to do here'. Maybe not. Why does it 'quickly return twice'? Because the JVM is allowed to.
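So the select() call in the question's loop should tolerate that; a minimal guard might look like this:

int numberOfKeys = selectorIO.select();
if (numberOfKeys == 0) {
    continue;    // spurious wake-up: nothing is actually ready, just select() again
}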

Low-level async NIO like this tends to cause spinning fans. This is the logic:

  1. Your code loop works very simply: You while (true) {} your way through: Do whatever I can do. Write whatever I can write, read whatever I can read, and then loop to do it over again forever and ever.
  2. The x.select() call is the only thing stopping a runaway while loop. That is the only place you ever 'sleep'. It's async: Thread sleeping is the death of your app (in an async model nothing is ever allowed to sleep except when selecting).
  3. If the selector is misconfigured, for example you have OP_WRITE registered and nothing to write, the selector will always instantly return.
  4. Thus, your code is runaway: It loops forever never sleeping, causing 100% CPU usage, fans turn on, laptops drain their battery in minutes, power is wasted, things get hot, IAAS costs go through the roof.

Async NIO is rocket science; it's really, really hard to do correctly. Usually you want to use frameworks that make things easier, like Grizzly or Netty.

Likely: Focusing on the wrong thing

Writing low-level async code is like writing an app in low-level machine code. People tend to do it because they think it will be faster, but the only thing they accomplish is turning a task that took an hour to program into one that takes a week. The end result is a hard-to-test, buggy mess, and it is actually slower, because you don't know what you are doing and you underestimate how well all the middle layers (the compiler, the runtime, the OS, and so on) optimize.

There are reasons to want to go that low level (when you're writing a kernel driver, for example), but usually you don't.

Same here. Why are you using NIO here, exactly? It's not faster except in quite exotic circumstances, and it's definitely a lot harder to code for. Java suffers (like most languages) from the 'red/blue function' problem quite severely.

OSes have absolutely no issue handling 5000 threads and can do it quite efficiently. Yes, 'oh no, the context switch', but note that context switching is intrinsic to handling many connections simultaneously: cache misses are going to be frequent, and async doesn't solve that problem. The blog posts about how async is cool because it avoids 'context switches' all seem to forget that a cache miss due to having to hop to the buffers of another connection is just as much a 'context switch' as a thread hop.

The one thing you need to take care of when writing this code in a threaded fashion - which is way, way simpler to write, maintain, and test - is managing the stack sizes of your threads: you want your threads to use a limited amount of stack anyway (if an exception occurs and the stack trace is a screenful, that's a problem), and with thousands of threads you want to create them with explicitly limited stack sizes. You can specify a stack size when creating threads (one of the Thread constructors takes it, and the various things that make threads for you, such as a thread pool's ThreadFactory, let you control how threads are built). Use that and you can just write code that swiftly processes 5000 simultaneous connections using 10,000 threads, and it's all much, much simpler to write than async. If you must go with async, use a framework to avoid the complications.
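A sketch of that thread-per-connection setup, assuming a hypothetical handleConnection method that runs the plain blocking read/write loop (the port and the 64 KB stack size are arbitrary, and the JVM treats the stack size only as a hint):

import java.net.ServerSocket;
import java.net.Socket;

public static void main(String[] args) throws Exception {
    try (ServerSocket server = new ServerSocket(8080)) {
        while (true) {
            Socket client = server.accept();
            // Thread(ThreadGroup, Runnable, name, stackSize): the last argument caps the stack.
            Thread t = new Thread(null, () -> handleConnection(client),
                    "conn-" + client.getPort(), 64 * 1024);
            t.start();
        }
    }
}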

To go back to that 'Alice sends faster than Bob can receive' model, note how much easier it is:

  1. You ask Alice's InputStream to fill some byte array.
  2. You then ask Bob's OutputStream to send the bytes in that array, from 0 to however many you read.
  3. Go back to 1.

That's it. With the async stuff, either Alice outpaces Bob (in which case you had better turn off OP_READ on Alice's connection to gate her input), or Bob outpaces Alice (in which case you need to keep turning Bob's OP_WRITE off and on), or, due to vagaries in network speeds, sometimes Alice outpaces Bob and sometimes Bob outpaces Alice.

With sync, as above, none of that matters - Alice's read() call or Bob's write() call, as needed, blocks, and that fixes everything. See how much simpler that is?
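The whole relay, spelled out as a sketch; aliceIn and bobOut are hypothetical streams obtained from the two sockets:

byte[] chunk = new byte[8192];
int n;
while ((n = aliceIn.read(chunk)) != -1) {   // blocks until Alice has sent something (or hung up)
    bobOut.write(chunk, 0, n);              // blocks until Bob's side can take the bytes
}
bobOut.flush();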
