How to count assignment operators in a text file?

My task is to create a program in C++ that processes a text file in sequential mode. The data must be read from the file one line at a time; the entire contents of the file must not be loaded into RAM. The text file contains syntactically correct C++ code, and I have to count how many assignment operators it contains.

The only approach I could think of was a function that searches for a pattern and counts how many times it appears. I pass every assignment operator as a pattern and then sum all the counts. But this does not work: if I search for the pattern "=", operators such as "%=" or "+=" are counted as well. Even "!=" and "==" are counted, although they shouldn't be, because they are comparison operators.

My code gives the answer 7 but the real answer should be 5.

#include <iostream>
#include <fstream>

using namespace std;

int patternCounting(string pattern, string text){
    int x = pattern.size();
    int y = text.size();
    int rez = 0;

    for(int i=0; i<=y-x; i++){
        int j;
        for(j=0; j<x; j++)
            if(text[i+j] !=pattern[j]) break;

        if(j==x) rez++;
    }
    return rez;
}

int main()
{
    fstream file ("test.txt", ios::in);
    string rinda;   // "rinda" = "line" (Latvian)
    int skaits=0;   // "skaits" = "count"
    if(!file){cout<<"Nav faila!"<<endl; return 47;}   // "Nav faila!" = "No file!"

    while(file.good()){
        getline(file, rinda);
        skaits+=patternCounting("=",rinda);
        skaits+=patternCounting("+=",rinda);
        skaits+=patternCounting("*=",rinda);
        skaits+=patternCounting("-=",rinda);
        skaits+=patternCounting("/=",rinda);
        skaits+=patternCounting("%=",rinda);
    }

    cout<<skaits<<endl;

    return 0;
}

Contents of the text file:

#include <iostream>

using namespace std;

int main()
{
    int z=3;
    int x=4;

    for(int i=3; i<3; i++){
        int f+=x;
        float g%=3;

    }

}
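The 7 comes from double counting. The sketch below is a minimal reproduction of the question's pattern-summing approach. It uses std::string::find instead of the hand-written loop, and count_pattern and naive_total are hypothetical names. Every compound operator such as "+=" matches twice: once through the plain "=" pattern and once through its own pattern. For the file above, that gives 5 + 1 + 1 = 7.

```cpp
#include <string>
#include <vector>

// Counts non-overlapping occurrences of `pattern` in `text`.
// Assumes `pattern` is non-empty (an empty pattern would never advance).
int count_pattern(std::string const& pattern, std::string const& text) {
    int count = 0;
    for (auto pos = text.find(pattern); pos != std::string::npos;
         pos = text.find(pattern, pos + pattern.size())) {
        ++count;
    }
    return count;
}

// Sums the same patterns as the question's loop. A line like "int f+=x;"
// contributes 2: one hit for "=" and one hit for "+=".
int naive_total(std::vector<std::string> const& lines) {
    const std::vector<std::string> patterns = {"=", "+=", "*=", "-=", "/=", "%="};
    int total = 0;
    for (auto const& line : lines) {
        for (auto const& p : patterns) {
            total += count_pattern(p, line);
        }
    }
    return total;
}
```

The same sketch shows why comparisons leak in: count_pattern("=", "a != b == c") finds three "=" characters, even though that expression contains no assignment at all.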

Note that, as a torture test, the following code contains 0 assignments under older C++ standards and one under C++17 and later, because trigraphs were removed in C++17.

// = Torture test
int a = 0; int b = 1;

int main()
{
    // The next line is part of this comment until C++17 ??/
    a = b;
    struct S
    {
        virtual void foo() = 0;
        void foo(int, int x = 1);
        S& operator=(const S&) = delete;
        int m = '==';
        char c = '=';
    };
    const char* s = [=]{return "=";}();
    sizeof(a = b);
    decltype(a = b) c(a);
}

Answer:

There are multiple issues with the code.

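The first, rather mundane issue is your handling of file reading. A loop such as while (file.good()) … is virtually always an error. good() only reports whether the previous read succeeded, so after the last line has been read the loop body runs once more, with a getline that has already failed. Test the return value of getline instead: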

std::string line;
while (getline(file, line)) {
    // Process `line` here.
}

Next, your patternCounting function fundamentally won’t work, since it doesn’t account for comments and strings (nor for any of C++’s other peculiarities, but these seem to be out of scope for your assignment). It also doesn’t really make sense to count the different assignment operators separately.

The third issue is that your test case misses lots of edge cases (and is invalid C++). Here’s a better test case that (I think) exercises all the interesting edge cases from your assignment:

int main()
{
    int z=3; // 1
    int x=4; // 2

    // comment with = in it
    "string with = in it";
    float f = 3; // 3
    f = f /= 4; // 4, 5

    for (int i=3; i != 3; i++) { // 6
        int f=x += z; // 7, 8
        bool g=3 == 4; // 9
    }
}

I’ve annotated each line with a comment giving the running total of assignments that should have been counted up to and including that line.

Now that we have a test case, we can start implementing the actual counting logic. Note that, for readability, function names generally follow the pattern “verb subject”. So instead of patternCounting, a better function name would be countPattern. But we won’t count arbitrary patterns; we will count assignments. So I’ll use countAssignments (or, using my preferred C++ naming convention: count_assignments).

Now, what does this function need to do?

It needs to count assignments (incl. initialisations), duh.
It needs to discount occurrences of = that are not assignments:
1. inside strings
2. inside comments
3. inside comparison operators

Without a dedicated C++ parser, that’s a rather tall order! You will need to implement a rudimentary lexical analyser (lexer, for short) for C++.

First off, you will need to represent each of the situations we care about with its own state:

enum class state {
    start,
    comment,
    string,
    comparison
};

With this in hand, we can start writing the outline of the count_assignments function:

int count_assignments(std::string const& str) {
    auto count = 0;
    auto state = state::start;
    auto prev_char = '\0';

    for (auto c : str) {
        switch (state) {
            case state::start:
                break;
            case state::comment:
                break;
            case state::string:
                break;
            case state::comparison:
                break;
        }
        prev_char = c;
    }

    // Useful for debugging:
    // std::cerr << count << "\t" << str << "\n";

    return count;
}

As you can see, we iterate over the characters of the string (for (auto c : str)). Then we handle each state we could currently be in.

The prev_char is necessary because some of our lexical tokens are more than one character long (e.g. comments start with //, but /= is an assignment that we want to count!). This is a bit of a hack; for a real lexer I would split such cases into distinct states.

So much for the function skeleton. Now we need to implement the actual logic — i.e. we need to decide what to do depending on the current (and previous) character and the current state.

To get you started, here’s the case state::start:

switch (c) {
    case '=':
        ++count;
        state = state::comparison;
        break;
    case '<': case '>': case '!':
        state = state::comparison;
        break;
    case '"' :
        state = state::string;
        break;
    case '/' :
        if (prev_char == '/') {
            state = state::comment;
        }
        break;
}

Be very careful: the above will over-count the comparison ==, so we will need to adjust that count once we’re inside case state::comparison and see that the current and previous character are both =.

I’ll let you take a stab at the rest of the implementation.

Note that, unlike your initial attempt, this implementation doesn’t distinguish between the separate assignment operators (=, +=, etc.) because there’s no need to: they’re all counted automatically.

Answer:

The clang compiler has a feature to dump the abstract syntax tree (AST). If you have syntactically correct C++ code (which you don't have), you can count the number of assignment operators, for example, with the following command line (on a Unix-like OS):

clang++ -Xclang -ast-dump -c my_cpp_file.cpp | egrep "BinaryOperator.*'='" | wc -l

Note, however, that this will only match real assignments, not copy-initializations. Those can also use the = character, but they are something syntactically different (for example, an overloaded = operator is not called in that case).

If you want to count the compound assignments and/or the copy initializations as well, you can try to look for the corresponding lines in the output AST and add them to the egrep search pattern.
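Below is a small, hypothetical input file for the command above. Counter and demo are illustrative names. Each comment names the kind of AST node that clang's -ast-dump prints for that construct. Only the BinaryOperator lines match the suggested pattern.

```cpp
// Hypothetical input file; comments name clang's AST node for each construct.
struct Counter {
    int value;
    Counter& operator=(int v) {  // declares an overloaded assignment operator
        value = v;               // BinaryOperator with opcode '='
        return *this;
    }
};

int demo() {
    int x = 1;       // VarDecl marked "cinit": copy-initialization, no assignment
    x = 5;           // BinaryOperator with opcode '='
    x += 2;          // CompoundAssignOperator with opcode '+='
    Counter c;
    c = x;           // CXXOperatorCallExpr: a call to the overloaded operator=
    return c.value;  // 7
}
```

Based on these node names, a pattern such as "BinaryOperator.*'='|CompoundAssignOperator|cinit" would also pick up compound assignments and copy-initializations. It is worth checking it against your clang version's actual output. Overloaded assignments still need their own alternative, because they appear as CXXOperatorCallExpr. The dump also covers everything pulled in by #include (for example <iostream>), so assignments inside standard headers would be counted too.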

Answer:

In practice, your task is incredibly difficult.

Think, for example, of C++ raw string literals (you could have one spanning dozens of source lines, with arbitrary = characters inside). Or of asm statements performing some addition...
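Here is a small illustration of the raw-string problem. The names settings and equals_in_settings are hypothetical. The literal spans several source lines, and none of its = characters is an operator. Consider a line-by-line scanner that only recognises ordinary "..." literals: it would treat the middle lines as code containing two assignments.

```cpp
#include <algorithm>
#include <string>

// A raw string literal can span several source lines and contain '=' freely.
const std::string settings = R"(
width = 80
height = 25
)";

// Number of '=' characters stored inside the literal (none are operators).
int equals_in_settings() {
    return static_cast<int>(std::count(settings.begin(), settings.end(), '='));
}
```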

Think also of increment operators like ++x (for some declared int x;). For a simple variable, ++x is equivalent to x = x + 1;, so it is semantically an assignment, but not syntactically.
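As a minimal check of that equivalence for a plain int (pre_increment and add_and_assign are hypothetical names), both functions below produce the same result. Yet the first contains no = character that a lexical counter could find.

```cpp
// For a plain int, ++x has the same effect as x = x + 1,
// although the token "++" contains no '=' character.
int pre_increment(int x) {
    ++x;
    return x;
}

int add_and_assign(int x) {
    x = x + 1;
    return x;
}
```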

My suggestion: choose one open-source C++ compiler. I happen to know GCC internals.

With GCC, you could write your own GCC plugin that counts the number of GIMPLE assignments.

Think also of quine programs written in C++...

NB: budget months of work.