Find the occurrences of different keys in a file that has multiple columns in C++

Time:09-30

I have a file in the format shown below, and I have already parsed it. I have code that finds the occurrences of each key using switch-case or if(), but I want a better way to do it. Here is the message data I need to parse and then count keys in. Each row is tokenized by ";" and can be treated as 7 columns. The element in column 2 is the "Name". Each time a name appears in a row, its occurrence count should be incremented. I want the total number of occurrences of each key name: AMD, OTHER, ...

09:00:15.390001;AMD;2837197;I;BUY;111;35.20
09:00:15.680001;AMD;2837197;C;BUY;111;35.20
09:00:16.040001;AMD;2837198;I;BUY;20;35.00
09:00:16.500001;AMD;2837196;C;BUY;1;35.10
09:00:16.860001;DVAM2;2837181;C;SELL;36;9.30
09:00:16.870001;AMD;2837198;C;BUY;20;35.00
09:00:17.310001;AMD;2837199;I;SELL;8;36.10
09:00:18.920001;AMD;2837200;I;SELL;9;36.10
09:00:19.190001;DVAM2;2837201;I;SELL;9;9.00
09:00:19.650001;AMD;2837202;I;SELL;160;35.90
09:00:19.940001;OTHER;2837180;C;BUY;7;18.40
09:00:19.960001;AMD;2837202;C;SELL;160;35.90
09:00:20.210001;AMD;2837199;C;SELL;8;36.10
09:00:20.550001;AMD;2837200;C;SELL;9;36.10
09:00:20.640001;AMD;2837203;I;BUY;4;35.70
09:00:21.400001;OTHER;2837204;I;BUY;6;18.20
09:00:21.460001;AMD;2837205;I;BUY;5;35.50
09:00:22.110001;AMD;2837203;A;BUY;4;35.70
09:00:22.350001;DVAM2;2837201;C;SELL;9;9.00
09:00:22.430001;OTHER;2837206;I;BUY;8;18.10
09:00:22.650001;TEST1;2837207;I;SELL;1;32.70
09:00:23.410001;AMD;2837208;I;SELL;9;36.40
09:00:23.420001;AMD;2837208;C;SELL;9;36.40
09:00:24.140001;AMD;2837205;C;BUY;5;35.50
09:00:24.980001;TEST0;2837182;C;SELL;76;23.20
09:00:25.310001;DVAM2;2837185;C;SELL;3;9.00
09:00:25.470001;AMD;2837203;C;BUY;4;35.70
09:00:25.470001;AMD;2837209;I;BUY;4;35.20
09:00:25.470001;OTHER;2837206;C;BUY;8;18.10
09:00:25.630001;TEST0;2837210;I;BUY;3;22.90
09:00:26.020001;AMD;2837209;C;BUY;4;35.20
09:00:26.480001;AMD;2837211;I;SELL;8;36.00
09:00:26.960001;AMD;2837211;C;SELL;8;36.00
09:00:27.060001;AMD;2837212;I;SELL;5;36.20
09:00:27.350001;AMD;2837213;I;BUY;9;35.30
09:00:27.690001;OTHER;2837204;C;BUY;6;18.20
09:00:27.960001;TEST4;2837214;I;SELL;9;16.20

Here's my code

// IManager.h
class inventory
{
    public:
          std::unordered_map<std::string, int64_t> orderCountPerDVAM1Symbol, orderCountPerDVAM2Symbol, orderCountPerOTHERSymbol;
};

//BManager.cpp
    // Order count of each key, e.g. DVAM1 as you see in the attached photo, is incremented whether we sell or buy it.
    int64_t inventory::orderCounts(std::string symbols)
    {
        if (sellHead != NULL && orderCountPerDVAM1Symbol.find(symbols) != orderCountPerDVAM1Symbol.end() && symbols == "DVAM1")
            orderCountPerDVAM1Symbol[sellTail->symbol]++; // increment order count when selling

        if (buyHead != NULL && orderCountPerDVAM1Symbol.find(symbols) != orderCountPerDVAM1Symbol.end() && symbols == "DVAM1")
            orderCountPerDVAM1Symbol[buyTail->symbol]++; // increment order count when buying
                                                    
        return orderCountPerDVAM1Symbol[sellTail->symbol];
    }



// main.cpp   
    struct messageData // Defining one row of message data
    {
        // Message Format: timestamp;symbol;order-id;operation;side;volume;price
    
        std::string timestamp;
        std::string symbol;
        unsigned long long int orderid;
        char operation; 
        char side;  
        unsigned long int volume; 
        float price;
    };
    
    int main(int argc, const char * argv[])
    {
        messageData inMsg;
    
        std::ifstream messageStream("test1.txt"); //orders coding test developer.dat
        std::ofstream logBook("_LogBook.txt", std::ios::out);
    
        int64_t orderCountDVAM1, orderCountDVAM2, orderCountOther, orderCount0, orderCount1, orderCount2, orderCount3, orderCount4, orderCount5, orderCount6, orderCount7;
    
    while (std::getline(messageStream, messageLine)) // Loop until last line of message stream
        {
            std::stringstream lineStream(messageLine); // Read each message line
            std::cout << "Message row no." << messageRow << ":\n" << messageLine << '\n'; // For test purposes
    
            cellID = 0; // cellID: 0(timestamp); 1(symbol); 2(order-id), 3(operation); 4(side); 5(volume); 6(price)
            while (std::getline(lineStream, messageCell, ';') && cellID < 8) // Loop & Quality check until last cell of message line
            {
              ....
               cellID++;
            }
            if(messageTestPassed)
            {
              vol.push_back(inMsg.symbol);
            }
            // Next message line will be read from the stream and the cycle repeated.
            messageRow++; // Increment message row number.
            std::cout << '\n';
    
            for (auto symbol : vol) {
                orderCount = inventory.orderCounts(symbol);
                std::cout << "\t\t\t\tTOTAL Order count = " << orderCount << '\n';
                
            }
        } // End of the while loop over message lines.

        logBook << "\t\t\t\tTOTAL Order count = " << myLTOS(orderCount) << '\n';
        logBook.close();
        return 0;
    }

The problem is that I get the wrong occurrence count for each key, as the output below shows:

Msg #1  09:00:00.440000;AMD;2837174;I;SELL;72;36.30
                TOTAL AMD Order count = 
Msg #2  09:00:00.690000;TEST8;2837175;I;BUY;9;9.60
                TOTAL AMD Order count = 1
Msg #3  09:00:00.730000;AMD;2837176;I;SELL;5;36.30
                TOTAL AMD Order count = 3
Msg #4  09:00:01.040000;AMD;2837177;I;SELL;2;36.60
                TOTAL AMD Order count = 6
Msg #5  09:00:01.170000;AMD;2837174;A;SELL;72;36.00             
                TOTAL AMD Order count = 9
Msg #6  09:00:01.580000;AMD;2837178;I;SELL;620;36.00
                TOTAL AMD Order count = 13
Msg #7  09:00:02.030000;AMD;2837179;I;BUY;59;35.20
                TOTAL AMD Order count = 23
Msg #8  09:00:02.270000;OTHER;2837180;I;BUY;7;18.40 
                TOTAL AMD Order count = 33
Msg #9  09:00:03.040000;DVAM2;2837181;I;SELL;36;9.30
                TOTAL AMD Order count = 43
Msg #10 09:00:03.240000;TEST0;2837182;I;SELL;76;23.20   
                TOTAL AMD Order count = 5
Msg #11 09:00:03.410000;AMD;2837177;C;SELL;2;36.60              
                TOTAL AMD Order count = 10
Msg #12 09:00:03.600000;AMD;2837174;C;SELL;72;36.00             
                TOTAL Order count = 16
Msg #13 09:00:04.170000;OTHER;2837183;I;BUY;8;18.20 
                TOTAL AMD Order count = 22
Msg #14 09:00:04.340000;AMD;2837179;C;BUY;59;35.20              
                TOTAL AMD Order count = 29
Msg #15 09:00:04.580000;AMD;2837184;I;BUY;3;35.10   
                TOTAL AMD Order count = 37

Expected: after Msg 1~15 the result should be TOTAL AMD Order count = 10. The wrong result above is 37.

Even if the result were correct, this method is inefficient: checking each ticker name (AMD, NVDA, TSLA, ...) one by one is very time-consuming, and there are thousands of ticker names. So incrementing a counter through if() or switch-case each time a name is found in the map does not scale. Or is unordered_map a good fit for this application, and the wrong result is simply due to my implementation? I hope you can help me modify the code or point me in the right direction.

CodePudding user response:

I've solved it myself by using switch(cellID) to pick up the name when we reach column 2, and I now get the expected result. Still, I would welcome suggestions for a more optimized solution, as mine may not be the best one.
