Writing a clock (loop) that is triggered in the MHz consistently and efficiently for an emulator in m


I'm currently developing an emulator for an old CPU (the Intel 8085). That CPU's clock runs at 3.2 MHz. I'm trying to be as accurate and as cross-platform as possible.

To me, that means I need a clock that gets called at a frequency of 3.2 MHz. It doesn't have to be very accurate; anything within 10% is good enough.

The easy way is:

auto _PrevCycleTime = std::chrono::high_resolution_clock::now();
double _TimeBetweenClockCycles = 1.0 / 3200000;
while (1)
{
    auto now = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> t = std::chrono::duration_cast<std::chrono::duration<double>>(now - _PrevCycleTime);
    
    if (t.count() >= _TimeBetweenClockCycles)
    {
        _PrevCycleTime = now;
        //Clock trigger
    }
}

That produces a clock that's called 2.4 million times a second. The bigger problem is that I realised even an empty while (1) {} loop uses 50-60% of my CPU. So that won't do.

My next approach was to use a sleep function.

long long _TimeBetweenClockCyclesNS = 1000000000 / 3200000;
double _TimeBetweenClockCycles = 1.0 / 3200000;
auto _PrevCycleTime = std::chrono::high_resolution_clock::now();

while (_Running)
{
    auto now = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> t = std::chrono::duration_cast<std::chrono::duration<double>>(now - _PrevCycleTime);

    if (t.count() >= _TimeBetweenClockCycles)
    {
        _PrevCycleTime = now;
        //Clock trigger
    }
    else
    {            
        std::this_thread::sleep_for(std::chrono::nanoseconds(_TimeBetweenClockCyclesNS));
    }
}

While that (somewhat) works, it's not consistent at all. I only want it to be "close enough", but the above code gets called:

  • 100,000 times/second in Visual Studio Debug mode (<3% CPU usage)
  • 1.7 million times/second in Visual Studio Debug mode with a 1 nanosecond sleep time instead, but again ~30% CPU usage
  • 100 times/second (yes, only 100, with no other change in the code) in Visual Studio Release mode (<1% CPU usage)
  • Even using std::this_thread::sleep_for(std::chrono::nanoseconds(1)), the clock is only triggered 1,200 times a second

I'm most likely missing something obvious and over-complicating this, but there must be a better way, since other emulators of much "heavier" systems seem to use less CPU while requiring far more accuracy.

In my use case, I mostly care about:

  • Being cross-platform, though I don't mind writing different code for different OSes
  • Not using an absurd amount of resources
  • It doesn't need to be very accurate, as long as the inaccuracy doesn't affect the timing of the emulated CPU too much. (For example, if a "wait" routine is supposed to wait for 1 second, I don't really mind if it's 0.8 or 1.2 seconds.)

What are my options?

(Note: Without any clock-limiting logic, my clock can run more than 30 million times a second, again using about 50-60% of my CPU. So it should be able to run at 3 million times a second with much lower CPU usage.)

(Note: The code runs in a separate std::thread if that matters)

CodePudding user response:

The important thing to recognize is that nobody can notice if some things happen at the wrong time. You have a CPU talking to some peripherals. Let's say the peripherals are GPIO pins. As long as the CPU turns the GPIO pins on and off at the right time, nobody can actually notice if the CPU is running too quickly in between those times. If it drives a display output, nobody can notice if the display pixels are calculated too quickly, as long as they are displayed at the right frame rate. And so on.


One technique used by emulators is to count up the number of clock cycles used by the CPU instructions. If you're using an interpreted design you can write clockCycles = 5; in the instruction handler. If you're using a JIT design you can do it at the end of each basic block. If you're not using a JIT design and you don't know what a basic block is, you can ignore the previous sentence.
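
For illustration, here is a rough sketch of what cycle counting might look like in an interpreted core. The struct, fetch() and the opcode cases are placeholders made up for this answer, not code from the question; the cycle counts shown (4 T-states for NOP, 7 for MVI A,d8) are the documented 8085 values.

#include <cstdint>

struct Cpu8085 {
    long long cyclesElapsed = 0;          // total T-states since the last sync point
    uint16_t  pc = 0;
    uint8_t   memory[65536] = {};

    uint8_t fetch() { return memory[pc++]; }

    void step() {
        switch (fetch()) {
            case 0x00: /* NOP      */ cyclesElapsed += 4; break;
            case 0x3E: /* MVI A,d8 */ /* ...load the immediate into A... */ cyclesElapsed += 7; break;
            // ... one case per opcode, each adding its documented cycle cost ...
            default:                  cyclesElapsed += 4; break;
        }
    }
};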

Then, when the CPU actually does something that matters, like changing the GPIO pins, you can sleep to catch up. If 3,200,000 clock cycles happened since the last sleep, but it's only been 0.1 seconds of real time, then you can sleep for 0.9 seconds before updating the screen.
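
As a concrete (and deliberately rough) sketch of that catch-up sleep, assuming a cyclesElapsed counter like the one above and a 3.2 MHz clock; the names here are mine, not a standard API:

#include <chrono>
#include <thread>

constexpr double kClockHz = 3'200'000.0;

// Call this at a sync point (e.g. right before a GPIO or display update).
// 'start' is the wall-clock time at which cyclesElapsed was zero.
void syncToRealTime(long long cyclesElapsed,
                    std::chrono::steady_clock::time_point start)
{
    using namespace std::chrono;
    // Where the emulated CPU *should* be on the wall clock by now.
    auto target = start + duration_cast<steady_clock::duration>(
                              duration<double>(cyclesElapsed / kClockHz));
    // If the host ran ahead of the emulated clock, sleep off the difference;
    // if it fell behind, sleep_until simply returns immediately.
    std::this_thread::sleep_until(target);
}

Because this is called rarely (once per frame or per I/O event), each sleep is milliseconds long, so the coarse granularity of OS sleeps doesn't matter. That granularity is also why the 1-nanosecond sleep in the question behaves so unpredictably: the OS can't actually sleep for less than its scheduling quantum, which is typically on the order of a millisecond or more.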

The less "sleep points" you have, the less accurate timing you have, and the less time you waste trying to maintain accurate timing. A video game emulator will typically render the whole frame as fast as possible. In fact since the N64/PS1 era, many emulators don't even bother emulating CPU timing at all. The timing on these systems is so complicated that every game already knows it has to wait for the next frame to start, so the emulator just has to start frames at the correct rate.


Another idea is to calculate and send timing information to the peripheral without actually timing it. Emulators for earlier systems (e.g. SNES) where games do rely on precise display timing can run the CPU at full speed and then tell the display code "at clock cycle 12345 the CPU wrote 0x6789 to register 12." The display code can then calculate which pixel the display was drawing on that clock cycle, and change how it's drawn. There's still no need to actually synchronize the CPU and display timing.
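
A sketch of what that hand-off could look like, with made-up names (the point is just that each write carries its cycle stamp with it):

#include <cstdint>
#include <vector>

struct TimedWrite {
    long long cycle;   // emulated clock cycle at which the write occurred
    uint16_t  reg;     // peripheral / display register
    uint8_t   value;   // value written
};

std::vector<TimedWrite> writeLog;

void onPeripheralWrite(long long cyclesElapsed, uint16_t reg, uint8_t value)
{
    writeLog.push_back({cyclesElapsed, reg, value});
}

// At the end of a frame the display code can walk writeLog, work out which pixel
// was being drawn at each recorded cycle, and render the effect of the write there.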


If you want precise timing without horribly slowing down the program, you might want to use an FPGA instead of a CPU.
