Problem in removing duplicate reverse lines from file

Time:10-07

I have a file which contains the following lines,

connection list
current check OK

connect "A" to "B"
connect "A" to "C"
connect "A" to "D"
connect "C" to "A"
connect "A" to "E"

Here, connect "C" to "A" is the reverse of connect "A" to "C". The requirement is to remove such duplicate reverse connections.

I am new to C++ and std::vector. I tried the following:

First, I defined a structure of two strings, con1 and con2 (connectPair), and then a vector of that structure.

Next, I save the file's lines to a vector, rawFileLines, and operate on rawFileLines to find the connection components.

I store the connection components in another vector, values.

Here is my code:

#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

typedef struct {
    std::string con1;
    std::string con2;
} ConnectPair;

void RemoveReversePairs(std::string inputFile) {
    std::vector<std::string> fileData;
    std::string line, scan, token1, token2;
    std::size_t tokenLeft, tokenRight, maxLines, lineNumber = 0, pos = 0;
    std::size_t found = 0, storeCount = 0;
    
    std::vector<std::string> rawFileLines;
    
    ConnectPair connectPair = {};
    std::vector<ConnectPair> values;
    


    std::ifstream source(inputFile.c_str());
    while (std::getline(source, line)) {
        rawFileLines.push_back(line);
    }
    source.close();
    maxLines = rawFileLines.size();

    for (size_t i = 0; i < maxLines; i++) {
        line = rawFileLines[i];
        pos = 0;
        scan = "\"";
        found = 0;

        while (found < 2) /*line.find(scan, pos) != std::string::npos*/ {
            tokenLeft = line.find(scan, pos);
            tokenRight = line.find(scan, tokenLeft + 1);

            if ((tokenLeft != std::string::npos) && (tokenRight != std::string::npos)) {
                found++;
                if (found == 1) {
                    connectPair.con1 = line.substr(tokenLeft + 1, (tokenRight - tokenLeft) - 1);
                }
                else if (found == 2) {
                    connectPair.con2 = line.substr(tokenLeft + 1, (tokenRight - tokenLeft) - 1);
                    values.push_back(connectPair);
                    storeCount++;
                }
                pos = tokenRight + 1;
            }
            else {
                connectPair.con1 = "  ";
                connectPair.con2 = "  ";
                values.push_back(connectPair);

                fileData.push_back(line);
                break;
            }
        }
    }
}

Now I am having trouble comparing the connections. Please suggest how to proceed.

Thank you.

Answer:

Since you mentioned in the comments that your code for reading in the connections works, I'll leave that part to you. Consider an STL solution using std::find_if from the <algorithm> header along with a lambda.

For simplicity I have a std::vector<std::pair<std::string, std::string>> populated with your sample connections data.

I use a loop to print it to make sure the data is what I expect. This loop uses C++17 structured bindings (auto &[k, v]) to cut out a lot of annoying boilerplate.

Then comes the real meat of the solution. We use an explicit iterator to loop over the vector, using std::find_if to check the rest of the vector for connections that are either identical, or are identical when reversed. If std::find_if returns the end iterator, it didn't find anything, and we can push that pair back onto the map2 vector. If an equivalent does exist in the rest of the vector, the current pair does not get pushed onto the map2 vector.

In the lambda it's important that we capture the current iter so we can compare it to the rest of them (represented by the argument to the lambda b).

[&iter](const auto &b) {
    return (iter->first == b.first  && iter->second == b.second) ||
           (iter->first == b.second && iter->second == b.first );
}

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <utility>

int main() {
    std::vector<std::pair<std::string, std::string>> map, map2;

    map.push_back({"A", "B"});
    map.push_back({"A", "C"});
    map.push_back({"A", "D"});
    map.push_back({"C", "A"});
    map.push_back({"A", "E"});

    std::cout << "Before:" << std::endl;

    for (auto &[k, v] : map) {
        std::cout << k << " -> " << v << std::endl;
    }

    auto end = map.end();

    for (auto iter = map.begin(); iter != end; ++iter) {
        if (std::find_if(iter + 1, end,
                         [&iter](const auto &b) {
                             return (iter->first == b.first  && iter->second == b.second) ||
                                    (iter->first == b.second && iter->second == b.first );
                         }) == end) {
            map2.push_back(*iter);
        }
    }

    std::cout << "After: " << std::endl;

    for (auto &[k, v] : map2) {
        std::cout << k << " -> " << v << std::endl;
    }
}

Result:

Before:
A -> B
A -> C
A -> D
C -> A
A -> E
After:
A -> B
A -> D
C -> A
A -> E

Answer:

Since your connections are implicitly bidirectional, I suggest using std::unordered_map<std::string, std::unordered_set<std::string>> if the data has a large number of connections to handle.

Both unordered_map and unordered_set have average constant-time lookup. Insertion is also average constant time, but each insert has to hash the key and may allocate memory or trigger a rehash, so it costs more per element than appending to a vector.

I borrowed Chris's code to construct the data.

Note that Chris's example is good enough if your data is not large; it compares each pair with every later pair, which is O(n²).


#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <unordered_map>
#include <unordered_set>

int main() {
    std::vector<std::pair<std::string, std::string>> map;

    map.push_back({"A", "B"});
    map.push_back({"A", "C"});
    map.push_back({"A", "D"});
    map.push_back({"D", "A"});
    map.push_back({"C", "A"});
    map.push_back({"A", "E"});
    map.push_back({"E", "A"});

    std::cout << "Before:" << std::endl;

    for (auto &[k, v] : map) {
        std::cout << k << " -> " << v << std::endl;
    }
    std::unordered_map<std::string,std::unordered_set<std::string>> connection;
    for (auto &[k, v] : map) {

        // Skip connections that already exist in either direction
        if((connection[k].find(v) != connection[k].end()) || (connection[v].find(k) != connection[v].end()) ){
            continue;
        }
        connection[k].insert(v);
    }

    std::cout << "After: " << std::endl;

    for (auto &[k, v] : connection) {

        for(auto& item : v){
            std::cout << k << " -> " << item << std::endl;
        }
    }
}

Answer:

First, I think your code is somewhat more complicated than it needs to be.

To remove the duplicates, you can use the erase/remove_if idiom.

The code fragment to put at the end of your function could be:

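    // Keep the first occurrence of each connection and erase every later
    // duplicate, in either direction (needs #include <algorithm>).
    for (std::size_t i = 0; i < values.size(); ++i) {
        // Only elements after i are examined, so current is never moved or removed.
        const ConnectPair& current = values[i];
        values.erase(std::remove_if(values.begin() + i + 1, values.end(),
            [&current](const ConnectPair& cp) -> bool {
                return (cp.con1 == current.con1 && cp.con2 == current.con2) ||
                       (cp.con1 == current.con2 && cp.con2 == current.con1);
            }),
            values.end());
    }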

The important part is the comparison function: it checks for an exact one-to-one match and additionally compares con1 with con2 and vice versa.


But life can be easier. You can add a comparison function directly to your struct, which is the more object-oriented approach, and then store your struct in a suitable container such as std::set, which does not allow duplicates.

And because we are dealing with an undirected connection rather than a direction, we can simply sort the first and second elements. This makes the comparison trivial.

Then reading the data and doing the whole task can be done in a single line of code in main.

Please see:

#include <string>
#include <vector>
#include <iostream>
#include <fstream>
#include <regex>
#include <utility>
#include <set>
#include <iterator>
#include <algorithm>

const std::regex re{ R"(\"(\w+)\")" };

struct Terminal {
    // Store the undirected connection in a normal std::pair
    std::pair<std::string, std::string> end{};

    // Read new connection from stream
    friend std::istream& operator >> (std::istream& is, Terminal& t) {
        bool found{};
        // Read lines until we find a connection or reach EOF
        for (std::string line{}; not found and std::getline(is, line);) 

            // Get connection end names (submatch 1 is the text between the quotes)
            if (std::vector ends(std::sregex_token_iterator(line.begin(), line.end(), re, 1), {}); found = (ends.size() == 2)) 
                t.end = std::minmax(ends[0], ends[1]);
        return is;
    }
    bool operator < (const Terminal& other) const { return end < other.end;  }
};

int main() {
    // Open file and check, if it could be opened
    if (std::ifstream inputFileStream{ "r:\\data.txt" }; inputFileStream) {

        // Read complete data, without doubles into our container
        std::set data(std::istream_iterator<Terminal>(inputFileStream), {});

        // Debug output
        for (const auto& d : data) std::cout << d.end.first << " <-> " << d.end.second << '\n';
    }
}

Please note: if you need the original data, you can add one more member holding the original pair to the struct.
