Search for all occurrences of a substring within a string


This program must find all occurrences of string 2 in string 1.
It works fine with every string I have tried except
s1="Ciao Cia Cio Ociao ciao Ocio CiCiao CieCiaCiu CiAo eeCCia"
In this case the correct result would be 0 5 31 39 54,
but it prints 0 5 39 instead.
I don't understand why; the operation seems the same as with
s1="Sette scettici sceicchi sciocchi con la sciatica a Shanghai"
for which the program works correctly.
I can't find the error!
The code:

#include <stdio.h>

void main()
{
    #define MAX_LEN 100

    // Input
    char s1[] = "Ciao Cia Cio Ociao ciao Ocio CiCiao CieCiaCiu CiAo eeCCia";
    unsigned int lengthS1 = sizeof(s1) - 1;
    char s2[] = "Cia";
    unsigned int lengthS2 = sizeof(s2) - 1;
    // Output
    unsigned int positions[MAX_LEN];
    unsigned int positionsLen;

    // Assembly block
    __asm
    {
        MOV ECX, 0
        MOV EAX, 0
        DEC lengthS1
        DEC lengthS2
        MOV EBX, lengthS1
        CMP EBX, 0
        JZ fine
        MOV positionsLen, 0
        XOR EBX, EBX
        XOR EDX, EDX

    uno:
        CMP ECX, lengthS1
        JG fine
        CMP EAX, lengthS2
        JNG restart

    restart:
        MOV BH, s1[ECX]
        CMP BH, s2[EAX]
        JE due
        JNE tre

    due:
        XOR EBX, EBX
        CMP EAX, 0
        JNE duedue
        MOV positions[EDX * 4], ECX
        JMP uno

    duedue:
        CMP EAX, lengthS2
        JNE duetre
        INC positionsLen
        JMP uno

    duetre:
        INC EAX
        JMP uno

    tre:
        XOR EBX, EBX
        JMP uno

    fine:
    }

    // Print to screen
    {
        unsigned int i;
        for (i = 0; i < positionsLen; i++)
            printf("Substring at position=%d\n", positions[i]);
    }
}


CodePudding user response:

The trickier programming gets, the more systematic and thoughtful your approach should be. If you have programmed x86 assembly for a decade, you can skip a few of the steps I outline below. But especially as a beginner, you should not expect to hack away in assembly with confidence and without safety nets.

The code below is just a best guess (I did not compile, run, or debug the C code). It is there to give the idea.

  • Make a plan for your implementation.
    You will have two nested loops that compare the characters and then collect the matches.
  • Implement the "assembly" in low-level C that already resembles the end product.
    C is nearly an assembly language itself...
  • Write tests, then debug and analyze your "pseudo assembly" C version.
  • Translate the C lines step by step into assembly lines, "promoting" the C lines to comments.
This is my first shot at doing that: the initial C version, which might or might not work. But it is still faster and easier to write (with the assembly code in mind), and easier to debug and step through. Once this works, it is time to "translate".

#include <stdint.h>
#include <stddef.h>
#include <string.h>

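size_t substring_positions(const char *s, const char *sub_string, size_t *positions, size_t positions_capacity) {
  size_t positions_index = 0;
  size_t i = 0;
  size_t j = 0;
  size_t s_len = strlen(s);
  size_t sub_len = strlen(sub_string);
  size_t i_max;

  /* Nothing to find, or nowhere to store it. */
  if (sub_len == 0 || sub_len > s_len || positions_capacity == 0)
    goto end;
  i_max = s_len - sub_len;   /* last index at which a match can start */

loop0:
  if (i > i_max)
    goto end;
  j = 0;
loop1:
  if (j == sub_len)          /* every character of sub_string matched */
    goto match;
  if (s[i + j] == sub_string[j])
    goto go_on;
  i++;                       /* mismatch: try the next start position */
  goto loop0;
go_on:
  j++;
  goto loop1;
match:
  positions[positions_index] = i;
  positions_index++;
  i++;                       /* continue searching after this match start */
  if (positions_index < positions_capacity)
    goto loop0;
  goto end;
end:
  return positions_index;
}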

As you can see, I did not use "higher level language features" for this function (does C even have such things?! :)). And now, you can start to "assemble". If RAX is supposed to hold your i variable, you could replace size_t i = 0; with XOR RAX,RAX. And so on.
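As a sketch of what that translation could look like, here is only the inner comparison loop, pulled out into a hypothetical helper `matches_at`, with a candidate 32-bit instruction noted next to each C line. The register plan (ECX = i, EDX = j, EBX = i + j, AL = current character) is an assumption, and `s1`, `s2` and `lengthS2` are the variable names from the question. Other register choices would work just as well.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if sub (of length sub_len) occurs in s
   starting at index i, 0 otherwise. The caller must pass i <= strlen(s).
   Assumed register plan: ECX = i, EDX = j, EBX = i + j, AL = s[i + j]. */
int matches_at(const char *s, const char *sub, size_t i, size_t sub_len)
{
  size_t j;
  assert(s != NULL && sub != NULL);

  j = 0;                      /* XOR EDX, EDX                        */
loop1:
  if (j == sub_len)           /* CMP EDX, lengthS2                   */
    goto match;               /* JE  match                           */
  if (s[i + j] != sub[j])     /* MOV EBX, ECX  then  ADD EBX, EDX    */
                              /* MOV AL, s1[EBX]  then  CMP AL, s2[EDX] */
    goto no_match;            /* JNE no_match                        */
  j++;                        /* INC EDX                             */
  goto loop1;                 /* JMP loop1                           */
match:
  return 1;
no_match:
  return 0;
}
```

Once every C line has an instruction next to it, the C statements can be demoted to comments and the instructions moved into the assembly block.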

With that approach, other people have a chance to read the assembly code, and the comments (the former C code) state the intent of your instructions.
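For the testing step, one option is a small reference built on the standard `strstr`, whose output the goto version (and later the assembly) can be compared against. This is only a sketch: the helper name `find_all_reference` is made up for illustration, and treating an empty pattern as "no matches" is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reference helper: stores the start index of every
   (possibly overlapping) occurrence of sub in s, up to capacity entries,
   and returns how many were stored. */
size_t find_all_reference(const char *s, const char *sub,
                          size_t *positions, size_t capacity)
{
  size_t count = 0;
  const char *p = s;

  assert(s != NULL && sub != NULL && positions != NULL);
  if (*sub == '\0')
    return 0;                 /* assumption: empty pattern means no matches */

  while (count < capacity && (p = strstr(p, sub)) != NULL) {
    positions[count++] = (size_t)(p - s);
    p++;                      /* step one character so overlaps are found too */
  }
  return count;
}
```

Called with the `s1` from the question and "Cia", this should report the positions 0 5 31 39 54, which is the result the question expects.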

CodePudding user response:

Thanks, everyone, for the answers. Here's how I solved it:

#include <stdio.h>

void main()
{
#define MAX_LEN 100
    char s1[] = "Sette scettici sceicchi sciocchi con la sciatica a Shanghai";  // first string
    unsigned int lengthS1 = sizeof(s1) - 1;
    char s2[] = "icchi";   // second string
    unsigned int lengthS2 = sizeof(s2) - 1;
    unsigned int positions[MAX_LEN];
    unsigned int positionsLen;

    __asm
    {
        XOR EBX, EBX            // i + j
        XOR EDX, EDX            // j
        XOR ESI, ESI            // number of occurrences
        XOR EDI, EDI            // length
        XOR ECX, ECX            // i

        // if s1 is shorter than s2 there can be no occurrences
        MOV EDI, lengthS1       // length
        CMP EDI, lengthS2
        JB end
        SUB EDI, lengthS2       // length of s1 - length of s2
        MOV positionsLen, EDI   // reused here as the last valid start index
    loop0:
        CMP ECX, positionsLen   // if i > lengthS1 - lengthS2, jump to end
        JA end
        XOR EDX, EDX            // set j to 0

    loop1:
        CMP EDX, lengthS2       // if j == lengthS2, jump to check
        JE check
        XOR EBX, EBX            // set i + j to 0
        MOV EBX, ECX            // load i into EBX
        ADD EBX, EDX            // i + j
        MOV AL, s1[EBX]         // load s1[i + j] into AL (8 bits)
        CMP AL, s2[EDX]         // compare it with s2[j]
        JNE check               // if not equal, jump to check
        INC EDX                 // increase j
        JMP loop1               // restart from loop1
    check:
        CMP EDX, lengthS2       // if j == lengthS2, jump to equal
        JE equal
        XOR EDX, EDX            // set j to 0
        INC ECX                 // increase i
        JMP loop0               // restart from loop0
    equal:
        XOR EDX, EDX            // set j to 0
        MOV positions[ESI * 4], ECX   // store i in positions[ESI]
        INC ESI                 // increase the number of occurrences
        INC ECX                 // increase i
        JMP loop0               // restart from loop0
    end:
        MOV positionsLen, ESI   // store the number of occurrences in positionsLen
    }

    {
        unsigned int i;
        for (i = 0; i < positionsLen; i++)
            printf("substring in position-%d\n", positions[i]);
    }
}
