Perl: read one char from stdin

Time:12-16

How can Perl read a single character from stdin, the way read -N1 does?

Answer:

You can do that with the core Perl distribution; there is no need to install extra modules:

use strict;
use warnings;
use IO::Handle;
use POSIX qw(:termios_h);

# Turn the terminal's canonical (line-editing) mode on or off for $fh
sub IO::Handle::icanon {
        my ($fh, $on) = @_;
        my $ts = POSIX::Termios->new;
        $ts->getattr(fileno $fh) or die "tcgetattr: $!";
        my $f = $ts->getlflag;
        $ts->setlflag($on ? $f | ICANON : $f & ~ICANON);
        $ts->setattr(fileno($fh), TCSANOW) or die "tcsetattr: $!";
}

# usage example
# a key like Left or á may generate several bytes, so read up to 256 at once
STDIN->icanon(0);
defined sysread(STDIN, my $c, 256) or die "sysread: $!";
STDIN->icanon(1);
# the bytes sent for the key are now in $c

Reading just one byte may not be a good idea: a key like Left or F1 sends a multi-byte escape sequence, and the bytes you didn't read are left behind as garbage for the next read. But if one byte is all you want, no matter what, you can replace the 256 with 1.
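A variation on the icanon approach above, sketched as a hypothetical read_key helper (the name and interface are illustrative, not from any module). Instead of forcing ICANON back on afterwards, it saves the complete termios settings, switches off ICANON on a copy, and restores exactly the saved settings, even if the read dies. It also sets VMIN to 1 and VTIME to 0 on the copy so the read blocks until at least one byte arrives; that is safe here because the saved settings are put back unchanged. It assumes a POSIX system and does not handle signals such as SIGINT arriving mid-read.

```perl
use strict;
use warnings;
use POSIX qw(:termios_h);

# Hypothetical helper: return the bytes sent for one key press on the
# terminal behind $fh (up to $max bytes, default 256), or undef if $fh
# is not a terminal. The original terminal settings are always restored.
sub read_key {
    my ($fh, $max) = @_;
    $max //= 256;
    return unless -t $fh;    # not a tty: nothing to switch off
    my $fd = fileno $fh;

    # Keep a full copy of the current settings to restore later.
    my $saved = POSIX::Termios->new;
    $saved->getattr($fd) or die "tcgetattr: $!";

    # Non-canonical copy: no line editor, return as soon as 1 byte arrives.
    my $raw = POSIX::Termios->new;
    $raw->getattr($fd) or die "tcgetattr: $!";
    $raw->setlflag($raw->getlflag & ~ICANON);
    $raw->setcc(VMIN, 1);     # block until at least one byte is available
    $raw->setcc(VTIME, 0);    # no inter-byte timer
    $raw->setattr($fd, TCSANOW) or die "tcsetattr: $!";

    # Read, but trap errors so the terminal is restored before rethrowing.
    my $buf;
    my $ok  = eval { defined sysread($fh, $buf, $max) or die "sysread: $!\n"; 1 };
    my $err = $@;
    $saved->setattr($fd, TCSANOW) or die "tcsetattr: $!";
    die $err unless $ok;
    return $buf;
}
```

Usage would be something like my $key = read_key(\*STDIN); which returns undef when stdin is a pipe or file rather than a terminal.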

Answer:

<STDIN> reads stdin one byte at a time if the record separator $/ is set to a reference to the number 1. A byte here means C's char type, which is not the same as a character: these days characters are typically made of several bytes, except for those in the US-ASCII charset.

$ echo perl | perl -le '$/ = \1; $a = <STDIN>; print "<$a>"'
<p>

Note that, underneath, perl may read (consume) more than one byte from the input. Above, the next <STDIN> within perl would return <e>, but possibly from a larger buffer that it had already read beforehand.

$ echo perl | (perl -le '$/ = \1; $a = <STDIN>; print "<$a>"'; wc -c)
<p>
0

Above, wc didn't receive any input because perl had already consumed all of it.
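The same experiment can be reproduced inside a single Perl script, as a sketch. It assumes the default :unix:perlio layers on a pipe (no PERLIO override in the environment): it writes "perl\n" into a pipe, reads one 1-byte record through the buffered layer with $/ = \1, then uses sysread to see what is still left in the pipe for another reader such as wc.

```perl
use strict;
use warnings;

# Put "perl\n" in a pipe, as `echo perl |` would, then close the write end.
pipe(my $r, my $w) or die "pipe: $!";
print {$w} "perl\n";
close $w or die "close: $!";

# Read one 1-byte record through perl's buffered I/O layer.
my $rec = do { local $/ = \1; <$r> };

# Ask the kernel directly what is still in the pipe.
my $left = sysread($r, my $rest, 100);
die "sysread: $!" unless defined $left;

print "<$rec>\n";             # <p>
print "$left bytes left\n";   # 0 bytes left: the buffer took everything
```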

$ echo perl | (PERLIO=raw perl -le '$/ = \1; $a = <STDIN>; print "<$a>"'; wc -c)
<p>
4

This time, wc got 4 bytes (e, r, l, \n) because we told perl to use raw I/O, so that <STDIN> translates to a read(0, buf, 1) system call.

Instead of <STDIN>, you can use perl's read with the same caveat:

$ echo perl | (perl -le 'read STDIN, $a, 1; print "<$a>"'; wc -c)
<p>
0
$ echo perl | (PERLIO=raw perl -le 'read STDIN, $a, 1; print "<$a>"'; wc -c)
<p>
4

Or use sysread, which is a direct wrapper around the read() system call:

$ echo perl | (perl -le 'sysread STDIN, $a, 1; print "<$a>"'; wc -c)
<p>
4
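For comparison, a minimal sketch of the sysread case in a script, using the same pipe setup as a stand-in for echo perl |: since sysread bypasses perl's buffering layer, the one-byte read leaves the remaining 4 bytes in the pipe.

```perl
use strict;
use warnings;

pipe(my $r, my $w) or die "pipe: $!";
print {$w} "perl\n";
close $w or die "close: $!";

# sysread bypasses perl's buffering: this is a single read(fd, buf, 1).
defined sysread($r, my $c, 1) or die "sysread: $!";

# Whatever is left is still in the pipe for the next reader.
defined(my $left = sysread($r, my $rest, 100)) or die "sysread: $!";

print "<$c>\n";              # <p>
print "$left bytes left\n";  # 4 bytes left ("erl\n")
```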

To read one character at a time, you need to read one byte at a time until you reach the end of the character.
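For UTF-8 input, that byte-by-byte loop can be written by hand. The sketch below is a hypothetical read_utf8_char helper (not part of Perl or any module): it sysreads the lead byte, works out from its high bits how many bytes the character should have (assuming modern UTF-8, at most 4 bytes), reads the continuation bytes one sysread at a time, and decodes the result with the core Encode module, which substitutes U+FFFD for malformed input rather than dying. Because every read is a one-byte sysread, nothing beyond the character is consumed.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: read one UTF-8 encoded character from $fh using
# unbuffered one-byte sysread calls. Returns undef at end of file.
sub read_utf8_char {
    my ($fh) = @_;
    my $n = sysread($fh, my $bytes, 1);
    die "sysread: $!" unless defined $n;
    return if $n == 0;

    # The lead byte announces the total length (modern UTF-8: 1 to 4).
    my $lead = ord $bytes;
    my $len  = $lead <  0x80 ? 1
             : $lead >= 0xF0 ? 4    # 0xF8..0xFF are invalid; decoded as U+FFFD
             : $lead >= 0xE0 ? 3
             : $lead >= 0xC0 ? 2
             :                 1;   # stray continuation byte: return it alone

    # Append the continuation bytes one sysread at a time.
    while (length($bytes) < $len) {
        $n = sysread($fh, $bytes, 1, length $bytes);
        die "sysread: $!" unless defined $n;
        last if $n == 0;            # input truncated mid-character
    }

    # Malformed sequences become one or more U+FFFD characters.
    return Encode::decode('UTF-8', $bytes);
}
```

Fed "été\n" through a pipe, a first call would return é and leave the 4 bytes t, \xC3, \xA9 and \n unread, the same count wc -c reports in the PERLIO=raw -C examples.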

For UTF-8 encoded input (in locales using that encoding), perl can do that with <STDIN> or read (but not sysread) when given the -C option, even with raw PERLIO:

$ echo été | (PERLIO=raw perl -C -le '$/ = \1; $a = <STDIN>; print "<$a>"'; wc -c)
<é>
4
$ echo été | (PERLIO=raw perl -C -le 'read STDIN, $a, 1; print "<$a>"'; wc -c)
<é>
4

With strace, you would see perl make two read(0, buf, 1) system calls underneath to read that 2-byte é character.
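In a script rather than a one-liner, a rough equivalent is to set the decoding layer yourself. This sketch assumes the input is UTF-8 whatever the locale, and uses the stricter :encoding(UTF-8) layer where -C would push perl's lax :utf8 layer. Because this handle keeps perl's default buffering, it may consume more input than the one character it returns, unlike the PERLIO=raw examples above.

```perl
use strict;
use warnings;

pipe(my $r, my $w) or die "pipe: $!";
binmode $w;
print {$w} "\xC3\xA9t\xC3\xA9\n";   # "été\n" encoded as UTF-8 bytes
close $w or die "close: $!";

# Decode the input as UTF-8, then ask read() for one character, not one byte.
binmode $r, ':encoding(UTF-8)';
defined read($r, my $c, 1) or die "read: $!";

binmode STDOUT, ':encoding(UTF-8)';
printf "<%s> U+%04X, length %d\n", $c, ord $c, length $c;   # <é> U+00E9, length 1
```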

Like with ksh93 / bash's read -N (or zsh's read -k), you can get surprises if the input is not properly encoded in UTF-8:

$ printf '\375 12345678' | (PERLIO=raw perl -C -le 'read STDIN, $a, 1; print "<$a>"'; wc -c)
<� 1234>
4

\375 (\xFD) would normally be the first byte of a 6-byte character in UTF-8¹, so perl reads all 6 bytes here, even though the second to sixth bytes can't possibly be part of that character, as they don't have the 8th bit set.
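The length a lead byte announces comes from its number of leading 1 bits. Here is a small sketch of that rule as a hypothetical utf8_seq_len helper, following the original UTF-8 scheme that allowed sequences of up to 6 bytes. 0xFE and 0xFF were never valid lead bytes in that scheme, and perl's own extended meanings for them are not modelled here.

```perl
use strict;
use warnings;

# Hypothetical helper: total sequence length announced by a UTF-8 lead
# byte under the original scheme (count of leading 1 bits).
# Returns 0 for a continuation byte (10xxxxxx), which is not a lead byte.
sub utf8_seq_len {
    my ($byte) = @_;
    return 1 if $byte < 0x80;          # 0xxxxxxx: ASCII, one byte
    my ($ones, $mask) = (0, 0x80);
    while ($byte & $mask) {            # count 1 bits from the top down
        $ones++;
        $mask >>= 1;
    }
    return $ones == 1 ? 0 : $ones;
}

printf "\\x%02X -> %d\n", $_, utf8_seq_len($_) for 0x41, 0xC3, 0xE2, 0xF0, 0xFD;
```

With the rule above, \xFD (binary 11111101) announces 6 bytes, which is why perl keeps reading past the space and the digits.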

Note that when stdin is a tty device, read() will not return until the terminal at the other end sends a LF (end of line), a CR (which is converted to LF by default), or the eof (usually ^D), eol or eol2 (usually not defined) character, as configured in the tty line discipline (for example with the stty command). This is because the tty driver implements its own internal line editor, which lets you edit what you type before pressing Enter.

If you want to read the byte(s) sent for each key the user presses there, you need to disable that line editor (which bash/ksh93's read -N and zsh's read -k do when stdin is a tty); see @guest's answer for details on how to do that.


¹ Unicode now restricts code points to at most 0x10FFFF, which means UTF-8 encodings are at most 4 bytes long, but UTF-8 was originally designed to encode code points up to 0x7FFFFFFF (encodings of up to 6 bytes), and perl extends it to code points up to 0x7FFFFFFFFFFFFFFF (13-byte encodings).
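The byte limits in this footnote follow from how many payload bits each sequence length leaves. In an n-byte sequence (n ≥ 2), the lead byte spends n + 1 bits on its 11...10 marker and each of the n − 1 continuation bytes spends 2 bits on its 10 marker:

```latex
b(n) = \bigl(8 - (n + 1)\bigr) + 6\,(n - 1) = 5n + 1,
\qquad b(3) = 16, \quad b(4) = 21, \quad b(6) = 31
```

0x10FFFF lies between 2^20 and 2^21, so it needs 21 bits and fits in 4 bytes, while 31 bits give the original ceiling of 2^31 − 1 = 0x7FFFFFFF.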
