I am trying to read tokens from user input, the way a compiler's lexer does. The tokenization works fine, but after printing all the tokens for one input line I want to print a single newline.
Here is my code:
#include <iostream>
#include <map>
#include <vector>
//import for using std::getline()
#include <string>
//DIGITs
const std::string DIGITS = "0123456789";
const std::string WHITESPACE = " \t\n\r";
//TOKENS
const std::string TT_INT = "INT";
const std::string TT_FLOAT = "FLOAT";
const std::string TT_PLUS = "PLUS";
const std::string TT_MINUS = "MINUS";
const std::string TT_MUL = "MUL";
const std::string TT_DIV = "DIV";
const std::string TT_LPAREN = "LPAREN";
const std::string TT_RPAREN = "RPAREN";
const std::string TT_INVALID_NUMBER = "INVALID_NUMBER_LITERAL";
class Token{
public:
    std::string type;
    std::string value;
    //print the token as TYPE:value
    void repr(){
        std::cout << type << ":" << value << "\n";
    }
};
class Lexer{
public:
    std::string text;
    int position = -1;
    std::string current_char;
    void advance(){
        //move to the next character; at the end of the text, text[length] is '\0'
        this->position += 1;
        this->current_char = this->text[this->position];
    }
    void make_digit(std::string *type, std::string *value){
        //if its number or floating point
        std::string digit = "";
        //counts the decimal points seen so far
        int is_float = 0;
        while(DIGITS.find(this->current_char) != std::string::npos || this->current_char == "."){
            digit += this->current_char;
            if(this->current_char == "."){
                is_float += 1;
            }
            this->advance();
        }
        *value = digit;
        if(is_float == 0){
            *type = TT_INT;
        } else if((0 < is_float) && (is_float < 2)){
            *type = TT_FLOAT;
        } else {
            //more than one decimal point
            *type = TT_INVALID_NUMBER;
        }
    }
    std::vector<std::string> make_tokens(){
        //tokens are stored as consecutive (type, value) pairs
        std::vector<std::string> tokens;
        this->advance();
        while (this->position < static_cast<int>(this->text.length()))
        {
            if(WHITESPACE.find(this->current_char) != std::string::npos){
                //dont add a token
                this->advance();
            } else if(DIGITS.find(this->current_char) != std::string::npos){
                std::string type;
                std::string value;
                this->make_digit(&type, &value);
                tokens.push_back(type);
                tokens.push_back(value);
            } else if(this->current_char == "+"){
                tokens.push_back(TT_PLUS);
                tokens.push_back(this->current_char);
                this->advance();
            } else if(this->current_char == "-"){
                tokens.push_back(TT_MINUS);
                tokens.push_back(this->current_char);
                this->advance();
            } else if(this->current_char == "*"){
                tokens.push_back(TT_MUL);
                tokens.push_back(this->current_char);
                this->advance();
            } else if(this->current_char == "/"){
                tokens.push_back(TT_DIV);
                tokens.push_back(this->current_char);
                this->advance();
            } else if(this->current_char == "("){
                tokens.push_back(TT_LPAREN);
                tokens.push_back(this->current_char);
                this->advance();
            } else if(this->current_char == ")"){
                tokens.push_back(TT_RPAREN);
                tokens.push_back(this->current_char);
                this->advance();
            } else {
                //unknown character: skip it
                this->advance();
            }
        }
        return tokens;
    }
};
int main(){
    std::string input;
    //previous: while(true) with std::cin >> input;
    //fix: read a whole line per iteration and stop when reading fails
    while(std::getline(std::cin, input)){
        Lexer mylexer;
        mylexer.text = input;
        std::vector<std::string> output = mylexer.make_tokens();
        //print each (type, value) pair on its own line
        for (std::size_t i = 0; i + 1 < output.size(); i += 2){
            std::cout << output.at(i) << ":" << output.at(i + 1) << std::endl;
        }
        std::cout << "\n";
    }
}
When entering 1 + 2
What I expected:
1 + 2
INT:1
PLUS:+
INT:2

here is the cursor
What I got:
1 + 2
INT:1

PLUS:+

INT:2

here is the cursor
When I remove the newline at the end I get the output below, but then a second input line is printed directly after the first with no empty line in between, which is not what I want:
1 + 2
INT:1
PLUS:+
INT:2
here is the cursor
But I want it to look like this:
1 + 2
INT:1
PLUS:+
INT:2

3 + 4
INT:3
PLUS:+
INT:4
Can anyone explain what this strange behaviour is? Am I missing something? Note that I don't have much C++ experience. I'm on Windows, compiling with clang-cl.exe. I'm also wondering what the throw_bad_array_new_lengthv error means when compiling with MSYS2 g++.exe.
Answer:
The reason for the extra line breaks in your output is that you are using operator>> to read the input. operator>> reads only 1 word at a time: it stops reading as soon as it encounters whitespace.
So, when you enter 1 + 2 as your input, you end up calling make_tokens() with only the first word 1 as the mylexer.text. Your loop prints out INT:1 followed by a line break, and then you print out another line break after the loop exits. Then you read in the next word +, tokenize it, and print out PLUS:+ followed by 2 line breaks. Then you read in the next word 2, tokenize it, and print out INT:2 followed by 2 line breaks.
Use std::getline(std::cin, input); instead. Then you will tokenize the entire input 1 + 2 in one call to make_tokens(), and you will print out the kind of output you are expecting: all 3 tokens, with 1 line break after each, and then 1 more line break after the end.
On a side note: you should not be using a while(true) loop, especially since you are ignoring whether or not std::cin was even successful in reading. If a read fails (for example, when the input ends), the loop never stops, and that endless loop can crash the code.
You should use std::cin's error state to stop the loop when there is no more input to read, eg:
std::string input;
while (std::cin >> input){
    // use input as needed...
}
Or, in the case of std::getline():
std::string input;
while (std::getline(std::cin, input)){
    // use input as needed...
}