My read.txt file contains:
//start of read.txt
The first matrix is:
1, 2, 3, 4;
5, 6, 7, 8;
9, 8, 1, 2;
//end of read.txt
Note the spaces, commas, and semicolons.
The matrix size can vary (that's why I describe it as a matrix of unknown size).
My very inefficient approach: using getline(File, string) in a while(getline(File, str)) loop, I go through the file once to count the rows and columns. Then I use those counts to create int arr[row][col], and go through the same file a second time to assign each integer to arr[row][col].
I also considered malloc(), but the array's contents are lost every time I allocate new memory (for example, to add more rows).
My method above is very slow and messy, so I'm looking for advice on how to make it more efficient.
CodePudding user response:
If you can't change the input file format, let me know. In modern C++, don't use malloc, and avoid new/delete. C-style arrays can't be resized at runtime, so use std::vector for dynamic arrays.
Example:
#include <cassert>
#include <string>
#include <fstream>
#include <sstream>
#include <vector>
auto load_matrix()
{
    // mimic input file
    // to improve reading from file
    // - add size specifying line
    // - remove the commas and ;
    // 3 rows, 4 columns
    std::istringstream input_file{ "3 4\n1 2 3 4\n5 6 7 8\n9 8 1 2" };
    // replace with std::ifstream input_file{"matrix.txt"};
    std::size_t rows;
    std::size_t cols;
    input_file >> rows;
    input_file >> cols;
    // create a dynamically sized 2d array of the correct size
    std::vector<std::vector<int>> matrix(rows, std::vector<int>(cols));
    // loop over all rows all values (in columns) and load them
    for (auto& row : matrix)
    {
        for (auto& value : row)
        {
            input_file >> value;
        }
    }
    return matrix;
}
int main()
{
    auto matrix = load_matrix();
    assert(matrix[0][0] == 1);
    assert(matrix[1][1] == 6);
    assert(matrix[2][1] == 8);
}
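If the file format cannot be changed, the original layout can still be read in a single pass. The following is only a minimal sketch under some assumptions: parse_matrix and the Matrix alias are hypothetical names, lines without any integers (such as the header text) are skipped, and every character that is not a digit or a minus sign is treated as a separator.

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Hypothetical helper: reads rows such as "1, 2, 3, 4;" from any input stream.
// Lines that yield no integers (e.g. "The first matrix is:") are skipped.
Matrix parse_matrix(std::istream& in)
{
    Matrix matrix;
    for (std::string line; std::getline(in, line);)
    {
        // Assumption: anything that is not a digit or '-' is a separator,
        // so replace it with a space to let operator>> do the splitting.
        for (char& c : line)
        {
            if (!std::isdigit(static_cast<unsigned char>(c)) && c != '-') c = ' ';
        }
        std::istringstream fields{ line };
        std::vector<int> row;
        for (int value; fields >> value;) row.push_back(value);
        // Rows may end up with different lengths; no size check is done here.
        if (!row.empty()) matrix.push_back(std::move(row));
    }
    return matrix;
}
```

It could be called with std::ifstream file{"read.txt"}; auto matrix = parse_matrix(file);. Because the vectors grow as needed, the file is read only once and no size line is required.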
CodePudding user response:
You need to select the correct approach to solve that problem.
If you want to store an unknown number of columns, you can use a std::vector. It will grow dynamically, as you like.
And if you want to store an unknown number of rows with columns in them, you again use a std::vector. But this time it is a vector of vectors, so a 2-dimensional vector: std::vector<std::vector<std::string>>.
This will store any number of rows, each with any number of columns.
Next, to extract the data from a line, or better said, to split the line:
There is a special dedicated iterator for this, the std::sregex_token_iterator. You may define a pattern for what you are looking for. Or you may define a pattern for what you are not looking for, the separator.
And since regexes are very versatile, you can build complex patterns that fit your needs.
To search positively for digits you can use R"(\d+)"; to search negatively for separators you could use R"([\.;\\])".
If you search for separators, pass -1 as the last argument to the constructor, so that the iterator returns the text between the matches.
To get the result of splitting the line, we will use the std::vector's range constructor. Here you can specify a start iterator and an end iterator, and the constructor, together with the std::sregex_token_iterator, will do all the work for you.
See the following simple example:
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <regex>
using Columns = std::vector<std::string>;
using Rows = std::vector<Columns>;
const std::string fileName{ "data.txt" };
const std::regex re{ R"(\d+)" };
int main() {
    // Open file and check, if it could be opened
    if (std::ifstream inputFileStream{ fileName }; inputFileStream) {
        // Here we will store the result
        Rows rows{};
        // Read all complete text lines from text file
        for (std::string line{}; std::getline(inputFileStream, line);) {
            // Get the columns
            Columns columns(std::sregex_token_iterator(line.begin(), line.end(), re), {});
            // Add the columns to rows
            rows.push_back(columns);
        }
        // Debug output
        for (const auto& row : rows) {
            for (const auto& column : row) std::cout << column << ' ';
            std::cout << '\n';
        }
    } // Error message, if file could not be opened
    else std::cerr << "\nError: Could not open file '" << fileName << "'\n\n";
    return 0;
}
To be compiled with C++17.
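The separator-based variant with -1 mentioned above could look roughly like the sketch below. It is an illustration rather than part of the example above: split_to_ints is a hypothetical helper, the separator pattern (commas, semicolons, whitespace) is an assumption chosen to fit the file in the question, and the line is assumed to contain only integers and separators, because std::stoi throws on any other text.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits one line on runs of commas, semicolons and
// whitespace, and converts each remaining token to int.
std::vector<int> split_to_ints(const std::string& line)
{
    static const std::regex separator{ R"([,;\s]+)" };
    std::vector<int> values;
    // Submatch index -1 selects the text between the separator matches
    for (std::sregex_token_iterator it(line.begin(), line.end(), separator, -1), end;
         it != end; ++it)
    {
        const std::string token = it->str();
        // A line that starts with a separator produces an empty first token
        if (!token.empty()) values.push_back(std::stoi(token));
    }
    return values;
}
```

Calling split_to_ints("9, 8, 1, 2;") gives a vector holding 9, 8, 1 and 2. A header line such as "The first matrix is:" would have to be skipped before calling it.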
CodePudding user response:
On a Linux operating system you could use mmap(2) to map your text file into your address space, and then use standard parsing techniques, maybe ANTLR or GNU bison.
I believe the performance won't change much: most of the time will be spent doing the I/O (e.g. waiting for the rotating disk, if you have one). Unless you have to parse a huge matrix (e.g. many millions of integers), I believe your approach is good enough in practice in 2021.
I recommend using a profiler (e.g. GNU gprof) before making any reengineering decisions. It is likely that the CPU time would be spent elsewhere in your program.
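Before reaching for a full profiler, a rough wall-clock measurement can already show whether the parsing step matters at all. This is only a sketch: elapsed_microseconds is a hypothetical helper built on std::chrono::steady_clock, it times a single run, and it is not a replacement for gprof.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Hypothetical helper: runs `work` once and returns the elapsed
// wall-clock time in microseconds.
template <typename Work>
long long elapsed_microseconds(Work&& work)
{
    const auto start = std::chrono::steady_clock::now();
    std::forward<Work>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Wrapping the file-reading code in a lambda and passing it to this helper gives a number to compare against the rest of the program. A single measurement is noisy, so several runs are more informative.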
You may want to study for inspiration the source code of existing open source programs with some parsing, e.g. GCC, ninja, RefPerSys, fish.
Of course, read a good C++ programming book and use C++ containers.
You could use partial evaluation techniques and generate machine code at runtime with asmjit or libgccjit suited to the particular size of your matrix. I believe it is not worth the effort.
I recommend writing a fully correct program first, debugging it, and only later optimizing it.
Consider using static analyzers like Frama-C.