How to read a CSV dataset in which each row has a different length (C++)

Time:01-29

I'm new to C++ and am learning how to read data from a CSV file. I want to read the following CSV data into a vector of vectors, where each row becomes one vector. The file name is path.csv:

0
0 1
0 2 4
0 3 6 7

I use the following function:

vector<vector<int>> read_multi_int(string path) {
    vector<vector<int>> user_vec;
    ifstream fp(path); 
    string line;
    getline(fp, line); 
    while (getline(fp, line)) { 
        vector<int> data_line;
        string number;
        istringstream readstr(line); 
        
        while (getline(readstr, number, ',')) { 
            //getline(readstr, number, ','); 
            data_line.push_back(atoi(number.c_str())); 
        }
        user_vec.push_back(data_line); 
    }
    return user_vec;
}

vector<vector<int>> path = read_multi_int("C:/Users/data/paths.csv");

Print function:

template <typename T>
void print_multi(T u)
{
    for (int i = 0; i < u.size(); ++i) {
        if (u[i].size() > 1) {
            for (int j = 0; j < u[i].size(); ++j) {
                //printf("%d ", u[i][j]);
                cout << u[i][j] << " ";
            }
            printf("\n");
        }
    }
    printf("\n");
}

Then I get

0 0 0
0 1 0
0 2 4
0 3 6 7

Zeros are added at the end of the shorter rows. Is it possible to read the data from the CSV file without those extra zeros? Thanks!

Answer:

Based on the output you are seeing and the fact that your code splits on commas (','), I believe your actual input data looks like this:

A,B,C,D
0,,,
0,1,,
0,2,4,
0,3,6,7

So the main change is to replace atoi with strtol. atoi returns 0 whenever it cannot parse a number, so every empty field between two commas turns into a 0; with strtol we can check whether the parse actually succeeded and skip the field if it did not.

That means that the solution is as follows:

#include <cerrno>
#include <climits>
#include <cstdlib>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

using namespace std;

vector<vector<int>> read_multi_int(const string& path) {
    vector<vector<int>> user_vec;
    ifstream fp(path);
    string line;
    getline(fp, line);                  // skip the header line
    while (getline(fp, line)) {
        vector<int> data_line;
        string number;
        istringstream readstr(line);

        while (getline(readstr, number, ',')) {
            char* end = nullptr;
            errno = 0;                  // reset so a stale ERANGE is not misread
            const long value = strtol(number.c_str(), &end, 10);
            if (end == number.c_str() || *end != '\0' || errno == ERANGE ||
                value < INT_MIN || value > INT_MAX)
            {
                // Empty or invalid field: skip it instead of storing 0
            } else {
                data_line.emplace_back(static_cast<int>(value));
            }
        }
        user_vec.emplace_back(data_line);
    }
    return user_vec;
}

Then to display your results:

vector<vector<int>> path = read_multi_int("C:/Users/data/paths.csv");

for (const auto& row : path)
{
    for (const auto& s : row) std::cout << s << ' ';
    std::cout << std::endl;
}

This gives the expected output:

0
0 1
0 2 4
0 3 6 7

Answer:

Your code is already very good, but there is one obvious error in it and another error in your print function. Note how I output the values below, with simple range-based for loops.

If your source file does not use a comma (',') as the delimiter but something else, you need to call std::getline with that delimiter instead; in your case it is a blank (' '). See the std::getline documentation for details.

If we then use the following input

Header
0
0 1
0 2 4
0 3 6 7

with the corrected program

#include <cstdlib>   // atoi
#include <vector>
#include <fstream>
#include <iostream>
#include <string>
#include <sstream>

using namespace std;

vector<vector<int>> read_multi_int(string path) {
    vector<vector<int>> user_vec;
    ifstream fp(path);
    string line;
    getline(fp, line);
    while (getline(fp, line)) {
        vector<int> data_line;
        string number;
        istringstream readstr(line);

        while (getline(readstr, number, ' ')) {
            //getline(readstr, number, ','); 
            data_line.push_back(atoi(number.c_str()));
        }
        user_vec.push_back(data_line);
    }
    return user_vec;
}

int main() {
    vector<vector<int>> path = read_multi_int("C:/Users/data/paths.csv");
    for (vector<int>& v : path) {
        for (int i : v) std::cout << i << ' ';
        std::cout << '\n';
    }
}

then we receive this as output:

0
0 1
0 2 4
0 3 6 7

This is correct, but unfortunately different from the output you showed.

So your output routine, or some other part of your code, may also have a problem.

Besides, if there is no comma, you can take advantage of formatted input with the extraction operator >>. It skips leading whitespace, reads up to the next whitespace, and converts the text to a number automatically.

Additionally, it is strongly recommended to initialize all variables when you define them. You should always do this.

Using formatted input, initialization, and perhaps better variable names, your code could look like this:

#include <vector>
#include <fstream>
#include <iostream>
#include <string>
#include <sstream>

using namespace std;

vector<vector<int>> multipleLinesWithIntegers(const string& path) {

    // Here we will store the resulting 2d vector
    vector<vector<int>> result{};

    // Open the file
    ifstream fp{ path };

    // Read header line
    string line{};
    getline(fp, line);

    // Now read all lines with numbers in the file
    while (getline(fp, line)) {

        // Here we will store all numbers of one line
        vector<int> numbers{};

        // Put the line into an istringstream for easier extraction
        istringstream sline{ line };

        int number{};
        while (sline >> number) {
            numbers.push_back(number);
        }
        result.push_back(numbers);
    }
    return result;
}

int main() {
    vector<vector<int>> values = multipleLinesWithIntegers("C:/Users/data/paths.csv");
    for (const vector<int>& v : values) {
        for (const int i : v) std::cout << i << ' ';
        std::cout << '\n';
    }
}

The next step would be to use a somewhat more advanced style (this version requires C++17):

#include <vector>
#include <fstream>
#include <iostream>
#include <string>
#include <sstream>
#include <iterator>

auto multipleLinesWithIntegers(const std::string& path) {

    // Here we will store the resulting 2d vector
    std::vector<std::vector<int>> result{};

    // Open the file and check, if it could be opened
    if (std::ifstream fp{ path }; fp) {

        // Read header line
        if (std::string line{}; getline(fp, line)) {

            // Now read all lines with numbers in the file
            while (getline(fp, line)) {

                // Put the line into an istringstream for easier extraction
                std::istringstream sline{ line };
                // Get the numbers and add them to the result
                result.emplace_back(std::vector(std::istream_iterator<int>(sline), {}));
            }
        }
        else std::cerr << "\n\nError: Could not read header line '" << line << "'\n\n";
    }
    else std::cerr << "\n\nError: Could not open file '" << path << "'\n\n";
    return result;
}

int main() {
    const std::vector<std::vector<int>> values{ multipleLinesWithIntegers("C:/Users/data/paths.csv") };
    for (const std::vector<int>& v : values) {
        for (const int i : v) std::cout << i << ' ';
        std::cout << '\n';
    }
}

Edit


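Now that you have shown your output routine: it should be changed as follows, so that rows containing a single element are no longer skipped (the condition was > 1 instead of > 0):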

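#include <cstddef>
#include <iostream>
#include <vector>

void printMulti(const std::vector<std::vector<int>>& u)
{
    for (std::size_t i = 0; i < u.size(); ++i) {
        if (u[i].size() > 0) {  // was > 1, which skipped single-element rows
            for (std::size_t j = 0; j < u[i].size(); ++j) {
                std::cout << u[i][j] << ' ';
            }
            std::cout << '\n';
        }
    }
    std::cout << '\n';
}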