Why is a Unicode character printed correctly even though I handle it one byte at a time?


I am doing a school project and I came across something that shouldn't work in theory.

I need to create two programs that communicate with each other through Unix signals; I'll call them client and server. I pass a message in the client's argv, break each char into bits, and send them to the server.

The idea is to use bitwise communication (something simple and rudimentary): if the bit is 0, I send SIGUSR1 to the server PID using the kill system call; if it is 1, I send SIGUSR2.

#client: send a char to the server
int send_sig(int pid, unsigned char b)
{
    int a;

    a = 0;
    while (a < 8)
    {
        if (b & 1)
            kill(pid, SIGUSR2);
        else
            kill(pid, SIGUSR1);
        b = b >> 1;
        a++;
        usleep(1000);
    }
    return (0);
}

The problem is when I use Unicode characters. argv is always a string (an array of char), so a Unicode character passed there takes anywhere from 1 to 4 bytes. Even so, the process carries on normally; the trouble shows up on my server side, where I receive these bits.
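To make this concrete, here is a small standalone sketch (not part of my project; the file name is made up) that dumps the bytes of its first argument in hex. Running it with a multi-byte character shows that argv really delivers 1 to 4 separate chars per Unicode code point:

#utf8_bytes.c (hypothetical helper to inspect argv bytes)
#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[])
{
    size_t  i;

    if (argc < 2)
        return (1);
    printf("%zu byte(s):", strlen(argv[1]));
    for (i = 0; i < strlen(argv[1]); i++)
        printf(" %02x", (unsigned char)argv[1][i]); /* one UTF-8 code unit per char */
    printf("\n");
    return (0);
}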

The way I structured my code, I print one byte at a time (which should be acceptable, since a char in C is one byte). But even when I pass 4-byte Unicode characters and print them one byte at a time, it keeps working (it's like Russian roulette: sometimes it breaks and sometimes it works normally).
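One way to probe this without signals at all: the demo below (separate from my project) writes the four bytes of an emoji one at a time, with a delay between the write() calls, and the terminal still renders a single glyph, because it decodes the UTF-8 byte stream as it arrives, no matter how many writes deliver it.

#byte_by_byte.c (hypothetical demo of the terminal reassembling UTF-8)
#include <string.h>
#include <unistd.h>
int main(void)
{
    const char  *emoji = "🚀"; /* 4 bytes in UTF-8: f0 9f 9a 80 */
    size_t      i;

    for (i = 0; i < strlen(emoji); i++)
    {
        write(1, &emoji[i], 1); /* one byte per system call */
        usleep(1000);           /* mimic the signal-paced delivery */
    }
    write(1, "\n", 1);
    return (0);
}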

#server: receiving the bits
unsigned char   reverse(unsigned char b)
{
    b = (b & 0xF0) >> 4 | (b & 0x0F) << 4; /* swap the two nibbles */
    b = (b & 0xCC) >> 2 | (b & 0x33) << 2; /* swap bit pairs */
    b = (b & 0xAA) >> 1 | (b & 0x55) << 1; /* swap adjacent bits */
    return (b);
}

void    signal_handler(int sig, siginfo_t *p_info, void *ucontext)
{
    static unsigned int     a = 0;
    static unsigned int     b = 0;

    a <<= 1;        /* make room for the incoming bit */
    if (sig == SIGUSR2)
        a++;        /* SIGUSR2 encodes a 1, SIGUSR1 a 0 */
    b++;            /* count the bits received so far */
    if (b == 8)
    {
        b = 0;
        ft_printf("%c\0", reverse(a));
    }
    p_info = p_info;        /* silence unused-parameter warnings */
    ucontext = ucontext;
}
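A handler with this three-argument signature has to be installed with sigaction and the SA_SIGINFO flag; a minimal sketch of such a setup (not my exact main, and ft_printf is our own printf clone linked separately) looks like this:

#possible server main (simplified sketch, not the exact code)
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
int main(void)
{
    struct sigaction    sa;

    sa.sa_sigaction = signal_handler; /* the three-argument handler above */
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);
    sigaction(SIGUSR2, &sa, NULL);
    printf("server PID: %d\n", (int)getpid());
    while (1)
        pause(); /* sleep until a signal arrives */
    return (0);
}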

Why does this behavior happen? Shouldn't it just break and print something wrong?

Speculations:

  • the way I print to stdout without a NUL byte lets the shell and terminal interpret the whole byte sequence without losing the UTF-8 mapping

  • the Unicode character fits in a char (but I guess that's impossible; see the quick check after this list)
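For the second guess, a quick separate check (not from my project) shows why it seems impossible: a char is a single byte, so a 4-byte code point just gets spread across four array elements.

#char_size.c (hypothetical check)
#include <limits.h>
#include <stdio.h>
int main(void)
{
    printf("sizeof(char) = %zu, CHAR_BIT = %d\n", sizeof(char), CHAR_BIT);
    printf("\"🚀\" occupies %zu chars (plus the NUL)\n", sizeof("🚀") - 1);
    return (0);
}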

You can reproduce this behavior with the code below:

#client.c file
#include <signal.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
void send_sig(int pid, char b)
{
    int a = 0;
    printf("%c", b);
    while (a < 8)
    {
        if (b & 1)
            kill(pid, SIGUSR2);
        else
            kill(pid, SIGUSR1);
        b >>= 1;
        a++;
        usleep(500);
    }
}
int main(int argc, char *argv[])
{
    char *s = "           