Why do we use additional semaphore in Producer-Consumer in C


#include <stdio.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdlib.h>   /* for rand() */

sem_t empty, full, mutex;
#define N 10

void* producerThread(void* arg) {
    (void)arg;  /* unused */
    int i = 0;

    while (1) {
        sem_wait(&empty);
        sem_wait(&mutex);

        buff[i] = rand();    /* produce into the current slot */
        i = (i + 1) % N;     /* advance to the next slot */

        sem_post(&mutex);
        sem_post(&full);
    }
}

void* consumerThread(void* arg) {
    (void)arg;  /* unused */
    int i = 0;

    while (1) {
        sem_wait(&full);
        sem_wait(&mutex);

        printf("%d\n", buff[i]);
        i =   i % N;

        sem_post(&mutex);
        sem_post(&empty);
    }
}

int main(void) {
    pthread_t producer, consumer;

    sem_init(&empty, 0, N);   /* N free slots initially */
    sem_init(&full, 0, 0);    /* no filled slots initially */
    sem_init(&mutex, 0, 1);   /* binary semaphore used as a lock */

    pthread_create(&producer, NULL, &producerThread, NULL);
    pthread_create(&consumer, NULL, &consumerThread, NULL);

    pthread_join(producer, NULL);
    pthread_join(consumer, NULL);

    sem_destroy(&empty);
    sem_destroy(&full);
    sem_destroy(&mutex);
    return 0;
}

I have the following question: this code is the well-known Producer-Consumer problem used when learning about multi-threading, but I do not understand why we need an additional semaphore (mutex) in this case. Can't we do everything with the full and empty semaphores alone, with no risk of the producer producing into a slot the consumer hasn't consumed yet, or vice versa? As far as I know, the mutex only adds extra baggage to the code and is not necessary. Can someone point out to me why we need 3 semaphores instead of 2?

I've tried running this code on my computer and everything works the same with and without the additional semaphore, so I do not understand why the author chose 3 semaphores in this instance.

CodePudding user response:

The empty and full semaphores take values in the range [0, N], allowing the producer to run ahead of the consumer by up to N elements.

The mutex semaphore only bounces between the values 0 and 1, and enforces a critical section ensuring that only one thread touches any part of the buffer memory at a time. However, the fact that each thread computes its index i independently, combined with the empty/full handshake, ensures there can be no data race on individual elements of buff, so that critical section is probably overkill.
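
To make that concrete, here is a minimal sketch of the same loops with the mutex removed, assuming a single producer, a single consumer, and an int buffer (the definition of buff below is my assumption, since the question doesn't show one). The empty/full handshake alone guarantees the two threads never work on the same slot at the same time:

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <stdlib.h>

#define N 10

static int buff[N];        /* assumed buffer definition -- not shown in the question */
static sem_t empty, full;

static void* producerThread(void* arg) {
    (void)arg;
    int i = 0;
    while (1) {
        sem_wait(&empty);           /* wait for a free slot */
        buff[i] = rand();           /* only the producer ever writes this slot */
        i = (i + 1) % N;
        sem_post(&full);            /* hand the slot over to the consumer */
    }
    return NULL;
}

static void* consumerThread(void* arg) {
    (void)arg;
    int i = 0;
    while (1) {
        sem_wait(&full);            /* wait for a filled slot */
        printf("%d\n", buff[i]);    /* only the consumer ever reads this slot */
        i = (i + 1) % N;
        sem_post(&empty);           /* hand the slot back to the producer */
    }
    return NULL;
}

int main(void) {
    pthread_t producer, consumer;

    sem_init(&empty, 0, N);         /* N free slots initially */
    sem_init(&full, 0, 0);          /* no filled slots initially */

    pthread_create(&producer, NULL, producerThread, NULL);
    pthread_create(&consumer, NULL, consumerThread, NULL);

    pthread_join(producer, NULL);
    pthread_join(consumer, NULL);

    sem_destroy(&empty);
    sem_destroy(&full);
    return 0;
}

With more than one producer or more than one consumer, however, the in/out indices would have to be shared between threads, and the mutex (or an equivalent lock) would be needed again to protect them.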

You don't show the definition of buff. For a sufficiently narrow datatype (like individual bytes), some architectures may exhibit word tearing on concurrent writes to adjacent elements. However, in your example only one thread performs writes, so even in the presence of word tearing, the concurrent reads of adjacent elements are unlikely to observe a problem.
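
For illustration, here are two hypothetical buffer definitions (the names buff_ints and buff_bytes are made up for this sketch; the question shows neither) marking where element width could start to matter:

#define N 10

int  buff_ints[N];   /* int elements occupy separate words, so concurrent access to
                        adjacent elements cannot tear on common architectures       */
char buff_bytes[N];  /* byte-wide elements: adjacent bytes may share a machine word,
                        which is the case where word tearing could in principle appear */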
