How to parse CSV by columns and save into arrays in C++

Time:09-24

I'm new to C++. I have some data saved in Data.csv like this:

   |  S0001    |   S0002   | S0003  | S0004  |  ...
0  | 10.289461 | 17.012874 |        |
1  | 11.491483 | 13.053712 |        |
2  | 10.404887 | 12.190057 |        | 
3  | 10.502540 | 16.363996 | ...    |  ...
4  | 11.102104 | 12.795502 |        | 
5  | 13.205706 | 13.707030 |        |
6  | 10.544555 | 12.173467 |        | 
7  | 10.380928 | 12.578932 |        | 
8  | 10.962240 | 12.615608 |        | 
9  | 10.690547 | 17.678212 |        | 
10 | 12.416197 | 13.769609 |        |
...    ...         ...         ...     ...  

I need the values column by column to do calculations (column average, moving average, etc.), so I want to parse the file by columns and save each column into an array.

I read these answers but didn't find what I want. What I did is:

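#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

// Read the whole file into v, one string per line.
void read_file(std::vector<std::string>& v, const std::string& filename){
    std::ifstream inFile(filename);
    std::string temp;
    if (!inFile.is_open()) {
        std::cout<<"Unable to open the file"<<std::endl;
        std::exit(1);
    }
    while (std::getline(inFile, temp)) {
        v.push_back(temp);
    }
    inFile.close();
}

int main()
{
    std::string filename = "Book1.csv";
    std::vector<std::string> vec;
    read_file(vec, filename);
    // vec[0] is the header line, vec[1] the first data line (if present)
    if (vec.size() > 1) std::cout<<vec[1]<<std::endl;
}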

I can only get values line by line. But how can I parse the file and get values by column?

CodePudding user response:

A text file can only be read sequentially, line by line. If you only need ONE column (unlikely), you can parse that value out of each line and push it into a vector.

If you need to load ALL columns in one pass, create a vector of vectors and push each parsed value into the vector for its column.
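As a rough illustration of the vector-of-vectors idea, here is a minimal sketch. The function name read_columns is hypothetical, and it assumes a comma-delimited file with one header line and a row index in the first field; adjust the delimiter if your file differs. Cells that are empty or not numeric are simply skipped, so columns with gaps will no longer line up by row.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: returns one vector<double> per data column.
// Assumes one header line and a leading row-index field on every data line.
std::vector<std::vector<double>> read_columns(std::istream& in, char delim = ',') {
    std::vector<std::vector<double>> columns;
    std::string line;
    std::getline(in, line);                       // skip the header line
    while (std::getline(in, line)) {
        std::istringstream iss(line);
        std::string field;
        std::getline(iss, field, delim);          // discard the row index
        for (std::size_t c = 0; std::getline(iss, field, delim); ++c) {
            if (c >= columns.size()) columns.resize(c + 1);
            try {
                columns[c].push_back(std::stod(field));
            } catch (const std::exception&) {
                // empty or non-numeric cell: skipped in this sketch
            }
        }
    }
    return columns;
}
```

After opening the file with std::ifstream and passing it to read_columns, columns[0] would hold the S0001 values, columns[1] the S0002 values, and so on.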

CodePudding user response:

I would start by trying to build classes that can represent the data in the file properly. Looking at this, I see three potential classes:

#include <fstream>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct header {    // to hold the header data
    std::vector<std::string> cols;
};
struct datarow {   // to hold the data in one row
    unsigned rowno;
    std::vector<double> cols;
};
struct csvfile {   // to hold one header + a number of data rows
    header head;
    std::vector<datarow> rows;
};

To simplify the extraction from the file, you could define some overloads for operator>>.

header
  • Read a line from an istream
  • Put the line in an istringstream for easy extraction
  • Use getline with the delimiter | (or whatever you choose) to extract column by column.
  • Store each extracted value in the header.cols vector.
std::istream& operator>>(std::istream& is, header& h) {   
    if(std::string line; std::getline(is, line)) {
        h.cols.clear();
        std::istringstream iss(line);
        while(std::getline(iss, line, '|')) {
            h.cols.push_back(line);
        }
    }
    return is;
}
datarow
  • Very similar to header but here we need to read the rowno separately and skip the delimiter in a slightly different way.
  • Use formatted input to extract the floating point values
  • Put the result in the datarow.cols vector
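std::istream& operator>>(std::istream& is, datarow& dr) {
    if(std::string line; std::getline(is, line)) {
        std::istringstream iss(line);
        if(iss >> dr.rowno) {
            dr.cols.clear();
            char delim;
            double tmp;
            // each value is preceded by one delimiter character
            while(iss >> delim >> tmp) {
                dr.cols.push_back(tmp);
            }
        } else {
            // no row number (e.g. a blank line): report failure instead of
            // leaving the previous contents of dr in place
            is.setstate(std::ios::failbit);
        }
    }
    return is;
}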
csvfile
  • This one is simpler since it's only using the two operator>>s we defined above.
  • Read one header
  • Read and store rows until there are no more
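std::istream& operator>>(std::istream& is, csvfile& cf) {
    if(is >> cf.head) {
        cf.rows.clear();
        datarow tmp;
        while(is >> tmp) cf.rows.push_back(tmp);
        // Reaching the end of the input sets failbit as well as eofbit.
        // Clear failbit so that a fully read file counts as success.
        if(is.eof()) is.clear(is.rdstate() & ~std::ios::failbit);
    }
    return is;
}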

With these additions, you should be able to open your file and just stream it into a csvfile instance.

if(std::ifstream file("filename.csv"); file) {
    csvfile cf;
    if(file >> cf) {
       // success - all the data is now available in cf
    }
}
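Since the question asks for column-wise access, the row-oriented data can be turned into columns afterwards. The following is only a sketch: column, average and moving_average are hypothetical helper names, and the datarow struct is repeated here just to keep the example self-contained.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Same layout as the datarow struct above, repeated for self-containment.
struct datarow {
    unsigned rowno;
    std::vector<double> cols;
};

// Collect the value at position idx from every row that has such a column.
std::vector<double> column(const std::vector<datarow>& rows, std::size_t idx) {
    std::vector<double> out;
    for (const auto& r : rows)
        if (idx < r.cols.size()) out.push_back(r.cols[idx]);
    return out;
}

// Arithmetic mean; returns 0.0 for an empty column (an arbitrary choice).
double average(const std::vector<double>& v) {
    if (v.empty()) return 0.0;
    return std::accumulate(v.begin(), v.end(), 0.0) / static_cast<double>(v.size());
}

// Simple moving average over a sliding window of `window` values.
// Returns an empty vector if the window is 0 or larger than the column.
std::vector<double> moving_average(const std::vector<double>& v, std::size_t window) {
    std::vector<double> out;
    if (window == 0 || v.size() < window) return out;
    double sum = std::accumulate(v.begin(),
                                 v.begin() + static_cast<std::ptrdiff_t>(window), 0.0);
    out.push_back(sum / static_cast<double>(window));
    for (std::size_t i = window; i < v.size(); ++i) {
        sum += v[i] - v[i - window];              // slide the window by one
        out.push_back(sum / static_cast<double>(window));
    }
    return out;
}
```

With a filled csvfile cf, column(cf.rows, 0) would give the S0001 values as one vector, which can then be passed to average or moving_average.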
