I have a list of firmware images, and I need to pick out just the Cortex-M ones. Is there any automated way to distinguish them from other ARM firmware?
I have some ideas, such as checking the interrupt vector table (IVT) or looking for other features specific to Cortex-M such as SysTick, but I am not sure whether they will work.
You can probably determine this heuristically, with some degree of reliability:
Raw and complete (i.e. fully linked, bootable) Cortex-M images have the vector table at the start. On Cortex-M the vector table is distinguished by starting with the initial stack-pointer value, which will necessarily be a RAM address somewhat above the start of RAM (the architectural SRAM region is 0x20000000 to 0x3FFFFFFF). It is followed by the initial program-counter address (the reset vector), which will be a ROM address in the code region below 0x20000000. On some parts that ROM is Flash; on others it may be an internal mask-ROM bootloader.
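Since the initial SP and the reset vector are simply the first two 32-bit words of the image, extracting them is straightforward. The sketch below reads them from a byte buffer so that the result does not depend on the host's byte order. It assumes a little-endian image, which is the usual Cortex-M configuration; `read_le32` and `read_vector_head` are hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit word, independent of host byte order. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: fetch vector-table entries 0 (initial SP) and
   1 (reset vector). Returns 0 on success, -1 if the image is too short. */
static int read_vector_head(const uint8_t *image, size_t length,
                            uint32_t *initial_sp, uint32_t *reset_vector)
{
    if (length < 8u)
        return -1;
    *initial_sp   = read_le32(image);
    *reset_vector = read_le32(image + 4);
    return 0;
}
```

For example, an image beginning with the bytes `00 50 00 20 01 04 00 08` yields an initial SP of 0x20005000 and a reset vector of 0x08000401.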
Further, the value at the address referenced by the reset vector will necessarily be a valid instruction. Note that bit 0 of the reset vector will be set, because Cortex-M executes only Thumb code, so clear that bit to get the actual instruction address. If all these images were built with the same toolchain and the same run-time start-up code, they will contain the same code here (with differing branch-address operands), which you may be able to use as a signature. Otherwise, testing for a valid instruction is complex without fully decoding or disassembling it.
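Where all the images share a toolchain and start-up code, the signature idea can be implemented as a masked comparison: take the bytes at the reset handler of a known-good image, clear the mask bits covering operands that change from build to build, and compare. This is only a sketch; `startup_signature` and `matches_signature` are hypothetical, and the actual pattern and mask would have to be extracted from your own reference build.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A byte pattern plus a parallel mask: only bits set in the mask are
   compared, so build-specific operands (branch targets, literals) are ignored. */
typedef struct {
    const uint8_t *bytes;
    const uint8_t *mask;
    size_t         length;
} startup_signature;

/* True if the bytes at 'offset' in 'image' match 'sig' under its mask. */
static bool matches_signature(const uint8_t *image, size_t image_length,
                              size_t offset, const startup_signature *sig)
{
    if (offset > image_length || sig->length > image_length - offset)
        return false;  /* signature would run past the end of the image */
    for (size_t i = 0; i < sig->length; i++) {
        if ((image[offset + i] & sig->mask[i]) != (sig->bytes[i] & sig->mask[i]))
            return false;
    }
    return true;
}
```

The `offset` passed in would be the reset vector converted to an image offset (masked, with bit 0 cleared).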
With that information you might have something like:
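```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MIN_STACK_SIZE        0x100u  // see discussion below
#define MIN_VECTOR_TABLE_SIZE 0x40u   // the 16 core vectors

// image: raw image, assumed little-endian like the host; image_length: size in bytes
bool isCortexM( const uint32_t* image, uint32_t image_length )
{
    bool is_cortex_m = false ;
    if( image_length < MIN_VECTOR_TABLE_SIZE ) return false ;

    uint32_t sp = image[0] ;                      // initial stack pointer
    if( sp > 0x20000000u + MIN_STACK_SIZE &&      // inside the SRAM region...
        sp < 0x40000000u )                        // ...and below the peripherals
    {
        // Reset vector as an offset into the image, Thumb bit (bit 0) cleared
        uint32_t pc = (image[1] & 0x00FFFFFFu) & ~1u ;
        if( pc > MIN_VECTOR_TABLE_SIZE &&
            pc < 0x20000000u &&
            pc <= image_length - sizeof(uint32_t) )
        {
            uint32_t instruction ;
            // pc is a byte offset, so copy from a byte pointer (may be unaligned)
            memcpy( &instruction, (const uint8_t*)image + pc, sizeof instruction ) ;
            // Reject common filler values (zero padding / erased Flash)
            is_cortex_m = instruction != 0 && instruction != 0xFFFFFFFFu ;
        }
    }
    return is_cortex_m ;
}
```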
The expression `image[1] & 0x00FFFFFFu` is there because on many Flash-based parts the Flash starts at a higher address (0x08000000 on STM32, for example). Although it is usually aliased at 0x00000000 for reset purposes, the linker generates addresses relative to the Flash start address, because the memory at 0x00000000 may be remappable to RAM or to an on-chip boot ROM. The mask makes the address an offset from the start of the image rather than from the start of the code space.
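As a worked example of the masking, the sketch below turns a reset-vector value into a byte offset into the image. It assumes the code space starts at a multiple of 16 MiB (true for both 0x00000000 and 0x08000000) and that the image is smaller than 16 MiB. It also clears bit 0, the Thumb bit, which is set in every valid Cortex-M vector entry. `reset_vector_to_offset` is a hypothetical name.

```c
#include <stdint.h>

/* Map a reset-vector value to a byte offset in the raw image:
   keep the low 24 bits (offset from the code-space base), then
   clear the Thumb bit to get the real instruction address. */
static uint32_t reset_vector_to_offset(uint32_t reset_vector)
{
    return (reset_vector & 0x00FFFFFFu) & ~1u;
}
```

So an STM32-style reset vector of 0x08000401 maps to offset 0x400 in the raw image, and 0x000001C1 (Flash at 0x00000000) maps to 0x1C0.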
The expression `instruction != 0 && instruction != 0xFFFFFFFFu` is a modest attempt to exclude common "filler" values (zero padding, or erased Flash, which typically reads as 0xFFFFFFFF) that are implausible as the first instruction of a reset handler. It does not determine whether the value is truly a valid instruction, but to get that far the image has already passed the first two tests.
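A slightly stronger variant of the same filler test, offered as a sketch rather than an established rule, is to look at a short window of bytes at the reset handler and reject it only if every byte is the same 0x00 or 0xFF fill value. `looks_like_fill` and its `window` parameter are hypothetical; a window that falls outside the image is treated as fill, so the caller rejects the image.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if 'window' bytes at 'offset' are all 0x00 or all 0xFF,
   or if the window does not fit inside the image. */
static bool looks_like_fill(const uint8_t *image, size_t image_length,
                            size_t offset, size_t window)
{
    if (window == 0 || offset > image_length || window > image_length - offset)
        return true;  /* out of range: treat as "not code" */
    uint8_t first = image[offset];
    if (first != 0x00u && first != 0xFFu)
        return false;
    for (size_t i = 1; i < window; i++) {
        if (image[offset + i] != first)
            return false;  /* bytes vary, so probably not a fill run */
    }
    return true;
}
```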
`MIN_STACK_SIZE` is somewhat arbitrary, but a minimum viable stack for code that does anything useful might be 256 (0x100) bytes. If you are uncertain, you could use zero: some run-time start-ups set the SP themselves even though the hardware does it for you, and in that case the table entry may not be set to anything useful. That would be unusual and ill-advised, though.
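The stack-pointer test with `MIN_STACK_SIZE` as a parameter might look like the sketch below. It assumes the RAM lies in the architectural SRAM region (0x20000000 to 0x3FFFFFFF), which does not hold for every part: some, such as several NXP LPC devices, place their main RAM at 0x10000000. It also adds a weak extra check that the value is word-aligned, since the core ignores bits [1:0] of the initial SP and a linker-generated top-of-stack is normally aligned. `initial_sp_plausible` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Initial-SP heuristic: inside the SRAM region, at least min_stack_size
   above its base, and word-aligned. */
static bool initial_sp_plausible(uint32_t sp, uint32_t min_stack_size)
{
    return sp > 0x20000000u + min_stack_size
        && sp < 0x40000000u
        && (sp & 0x3u) == 0u;
}
```

Passing 0 for `min_stack_size` corresponds to the "if you are uncertain" option.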
`MIN_VECTOR_TABLE_SIZE` should be at least 0x40 (the 16 core vectors of 4 bytes each). The actual size varies between vendor implementations, depending on the number of peripheral interrupts they support.
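Since the first 0x40 bytes are the core vectors, a further heuristic is to check more of them than just the reset vector. The sketch below also checks the NMI (entry 2) and HardFault (entry 3) vectors, which exist on every Cortex-M and which typical start-up files populate: each must have the Thumb bit set and, after the same masking, point past the core vectors and inside the image. Handlers placed in RAM, or entries left as zero, would make it reject an otherwise valid image. `core_vectors_plausible`, its helpers and `CORE_VECTORS_SIZE` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CORE_VECTORS_SIZE 0x40u  /* 16 core entries x 4 bytes */

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* A handler entry must have the Thumb bit set and, once masked to an image
   offset, point past the core vectors and inside the image. */
static bool plausible_handler(uint32_t vector, size_t image_length)
{
    uint32_t offset = (vector & 0x00FFFFFFu) & ~1u;
    return (vector & 1u) != 0u
        && offset >= CORE_VECTORS_SIZE
        && offset < image_length;
}

/* Check entries 1 (Reset), 2 (NMI) and 3 (HardFault). */
static bool core_vectors_plausible(const uint8_t *image, size_t image_length)
{
    if (image_length < CORE_VECTORS_SIZE)
        return false;
    for (size_t entry = 1; entry <= 3; entry++) {
        if (!plausible_handler(read_le32(image + 4 * entry), image_length))
            return false;
    }
    return true;
}
```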
It is not a foolproof method, but it implements what I would do manually when inspecting a hex dump of the image in order to "guess" whether it is plausibly a Cortex-M image.
If you do not need to automate this process, you would do well to either run the code through a disassembler or load it into an instruction-set simulator and see whether it runs.