How do I properly return an object and set it to a new object?

Hey everyone, I'm trying to make a simple lexer that splits a given string into tokens. As you can see below, I have a Token class and a Lexer class.

I want the Lexer class to interact with the Token class and create new Token objects. But when I try to return the object and use the returned object to initialize a new one in main, I get a compiler error (shown below). If someone could point me in the right direction, that would be great. Thanks!

main.cpp:
#include "includes/token.h"
#include "lexer.cpp"
#include <iostream>

using namespace std;

int main() {

    string input = "Hello World!";
    Lexer lexer(input);
    Token token = lexer.getToken();

    while(token.TokenType != "EOF") {
        cout << token.TokenType << endl;
        lexer.getToken();
    }   

    return 0;
}


token.h:
#pragma once

class Token {
    public:

        const char* TokenValue;
        const char* TokenType;

        Token(const char* value, const char* type) {

            TokenValue = value;
            TokenType = type;

        }

};


lexer.h:
#pragma once
#include "token.h"
#include <iostream>
#include <vector>
#include <string>

using namespace std;

class Lexer {
    public:

        std::vector<string> TokenType = {

            "EOF",
            "NEWLINE",
            "STRING",
            "IDENTIFIER",
            
            "PRINT"

        };

        std::vector<char> value;
        string source;
        char currentCharacter;
        int currentPosition;
        string keyword;
        Token token;

        Lexer(string input);
        void nextCharacter();
        char peek();
        void skipWhitespace();
        Token getToken();
        string checkIfKeywordIsToken(string value);

};


lexer.cpp:
#include "includes/lexer.h"

Lexer::Lexer(string input) {
    source = input + '\n';
    currentPosition = -1;
    nextCharacter();
}

void Lexer::nextCharacter() {

    currentPosition += 1;

    if(currentPosition >= source.size()) {
        currentCharacter = '\0';
    }

    else {
        currentCharacter = source[currentPosition];
    }

}

char Lexer::peek() {

    if(currentPosition + 1 >= source.size()) {
        return '\0';
    }

    return source[currentPosition + 1];

}

void Lexer::skipWhitespace() {

    while(currentCharacter == ' ' || currentCharacter == '\t' || currentCharacter == '\r') {
        nextCharacter();
    }

}

//============================================
//             START OF TOKENIZER
//============================================

Token Lexer::getToken() {

    skipWhitespace();

    if(currentCharacter == '\n') {

        keyword = "NEWLINE";
        for(int i = 0; i < TokenType.size(); i++) {
            if(TokenType[i] == keyword) {
                Token token(&currentCharacter, TokenType[i].c_str());
            }
        }

    }

    return token;

}

//============================================
//               END OF TOKENIZER
//============================================

string Lexer::checkIfKeywordIsToken(string value) {

    for(int i = 0; i < TokenType.size(); i++) {
        if(TokenType[i] == value) {
            return TokenType[i];
        }
    } 

    return "IDENTIFIER";

}


Error:
In file included from basic-compiler.cpp:2:0:
lexer.cpp: In constructor 'Lexer::Lexer(std::__cxx11::string)':
lexer.cpp:3:26: error: no matching function for call to 'Token::Token()'
 Lexer::Lexer(string input) {
                          ^
In file included from basic-compiler.cpp:1:0:
includes/token.h:9:9: note: candidate: Token::Token(const char*, const char*)
         Token(const char* value, const char* type) {
         ^~~~~
includes/token.h:9:9: note:   candidate expects 2 arguments, 0 provided
includes/token.h:3:7: note: candidate: constexpr Token::Token(const Token&)
 class Token {
       ^~~~~
includes/token.h:3:7: note:   candidate expects 1 argument, 0 provided
includes/token.h:3:7: note: candidate: constexpr Token::Token(Token&&)
includes/token.h:3:7: note:   candidate expects 1 argument, 0 provided


Update:
When I run this program, it should endlessly print "NEWLINE" to the console until I stop it manually.
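In lexer.cpp, inside getToken(), the statement `Token token(&currentCharacter, TokenType[i].c_str());` declares a new local variable named token. It shadows the member of the same name and goes out of scope (and is destroyed) at the very next }. The compiler error itself comes from the member `Token token;` in lexer.h: Token has no default constructor, so Lexer's constructor has no way to initialize that member.

Either create the variable outside the loop and assign to it inside the loop, or return the object directly from inside the loop. Think about what to do if the keyword is not found: do you return some kind of "empty" token, throw an exception, or something else?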
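
A minimal sketch of the scoping problem, assuming a simplified Token with std::string members (not the thread's exact class). `findBuggy` and `findFixed` are hypothetical names: the first declares a new `token` inside the block, which shadows the outer one and is destroyed at the closing brace; the second assigns to the existing variable instead.

```cpp
#include <string>

// Hypothetical stand-in for the thread's Token, using std::string members.
struct Token {
    std::string value;
    std::string type;
};

// Buggy: the inner declaration creates a NEW local named `token` that
// shadows the outer one and is destroyed at the closing brace.
Token findBuggy(bool found) {
    Token token{"", "UNKNOWN"};
    if (found) {
        Token token{"\n", "NEWLINE"};  // shadows the outer `token`
    }                                  // inner `token` is destroyed here
    return token;                      // still the outer, unchanged token
}

// Fixed: assign to the existing variable instead of declaring a new one.
Token findFixed(bool found) {
    Token token{"", "UNKNOWN"};
    if (found) {
        token = Token{"\n", "NEWLINE"};
    }
    return token;
}
```

With the buggy version, `findBuggy(true)` still returns the "UNKNOWN" token, which is exactly the symptom described above.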
@Peter87

Darn, what a rookie mistake.
Thanks so much for pointing it out!

How would I directly create an object outside the loop and assign it inside the loop if you don't mind me asking?

Passing NULL or "" to the constructor just gives me an error.
Sorry, I'm still pretty new to C++ and OOP in general.
How would I directly create an object outside the loop and assign it inside the loop if you don't mind me asking?
Token token; // assuming there is a default constructor
if(...) {
  for(...) {
    if(...) {
      token = Token(&currentCharacter, TokenType[i].c_str());
      break; // no need to continue searching since we have already found what we're looking for
    }
  }
}
return token;


But in this case I suspect it's better to simply return the object from inside the loop. Then there would be no need for a default constructor.

if(...) {
  for(...) {
    if(...) {
      return Token(&currentCharacter, TokenType[i].c_str());
    }
  }
}

But this raises the question of what to do if currentCharacter is not equal to the newline character, or if the loop finishes without returning. You should probably handle those cases, because flowing off the end of a non-void function (other than main) leads to undefined behavior (UB).
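
One way to handle the "loop finished without a match" case, sketched under assumptions: `makeToken` is a hypothetical free function, Token is a simplified struct with std::string members, and throwing std::runtime_error is just one of the options mentioned above (returning an "unknown" token would work too). The point is that every path out of the function either returns a Token or throws, so control never falls off the end.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical simplified Token that owns its text.
struct Token {
    std::string value;
    std::string type;
};

// Return as soon as a matching type is found. If the loop finishes
// without a match, throw instead of falling off the end of the function.
Token makeToken(const std::vector<std::string>& types,
                const std::string& wanted, char ch) {
    for (const std::string& t : types) {
        if (t == wanted) {
            return Token{std::string(1, ch), t};  // copy the single character
        }
    }
    throw std::runtime_error("unknown token type: " + wanted);
}
```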
@Peter87

Thanks once again for the answer, I really appreciate it.
I'm building a simple BASIC-to-C compiler as an exercise to understand C/C++ a bit better.

What you see above is extremely early in development.
There will be many more statements checking whether the current character equals something else. If it doesn't match anything, the lexer will report an error and stop the program; otherwise, when the current character reaches the end of the source, it will return '\0' and stop.
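
The plan described above could look roughly like this. This is a hypothetical `MiniLexer`, not the thread's Lexer: it walks a std::string by index, returns an "EOF" token once it runs past the end of the source (so the caller can stop), and throws std::runtime_error for any character it doesn't recognize. It also advances past the newline; without that, every call would return the same NEWLINE token.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical simplified Token that owns its text.
struct Token {
    std::string value;
    std::string type;
};

class MiniLexer {
public:
    explicit MiniLexer(std::string input) : source_(std::move(input) + '\n') {}

    Token getToken() {
        skipWhitespace();
        char c = current();
        if (c == '\0') {          // past the end of the source
            return Token{"", "EOF"};
        }
        if (c == '\n') {
            ++pos_;               // consume it so the next call moves on
            return Token{"\n", "NEWLINE"};
        }
        // Nothing matched: report an error and stop.
        throw std::runtime_error(std::string("unexpected character: ") + c);
    }

private:
    char current() const { return pos_ < source_.size() ? source_[pos_] : '\0'; }

    void skipWhitespace() {
        while (current() == ' ' || current() == '\t' || current() == '\r') {
            ++pos_;
        }
    }

    std::string source_;
    std::size_t pos_ = 0;
};
```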

Thanks once again for the help!
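Also, don't #include .cpp files; including the appropriate header file is sufficient. Compile lexer.cpp separately and link it with main.cpp. If lexer.cpp is both #included and compiled on its own, its functions end up defined twice and the linker reports multiple-definition errors.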
A couple of points:

1) token.h, the assignments `TokenValue = value;` and `TokenType = type;` in the constructor: these copy only the pointers, not the contents pointed to. This means the passed value and type must exist and remain valid for the lifetime of the token object. Is this what you mean, or do you really intend to copy the contents?

2) lexer.h: as there's only one TokenType list, used by all instances of the Lexer class, it can be made static const.

3) When parsing, it is common to use an integer value for the token type rather than a string literal. So for each token parsed, you assign a numeric token type: e.g. one value for each keyword, one for a string literal, one for a number, one for each special symbol ( , ; [ etc.), one for end of line, one for unknown, and so on. These are often defined as an enum. You'd have something like (example only):

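#include <string>
#include <vector>

// One enumerator per kind of token (EOI = end of input, NL = newline, ...)
enum class Tokens {UNKNOWN, EOI, NL, STR, NUM, ID, PRINT, INPUT};

// Maps a keyword's spelling to its token value
struct KeyWords {
	std::string name;
	Tokens token;
};

const std::vector<KeyWords> keywrds { {"PRINT", Tokens::PRINT}, {"INPUT", Tokens::INPUT} };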


Note that EOF is already defined as a macro (typically -1), so it can't be used as an enumerator name!

So getToken() returns a Tokens value, and a program (e.g. the BASIC program) is then parsed into a sequence of Tokens. When parsing you have getToken(), peekToken(), etc.
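
For point 1, a sketch of a Token that owns copies of its text, assuming std::string members are acceptable. This avoids two problems with the original: passing `&currentCharacter` hands the constructor a pointer to a single char that is not null-terminated and that changes as the lexer advances; and with `const char*` members, a test such as `token.TokenType != "EOF"` compares addresses, not characters. With std::string, `==` and `!=` compare the contents.

```cpp
#include <string>
#include <utility>

class Token {
public:
    std::string TokenValue;  // owns a copy of the text
    std::string TokenType;   // owns a copy of the type name

    Token(std::string value, std::string type)
        : TokenValue(std::move(value)), TokenType(std::move(type)) {}
};

// Usage inside the lexer might look like:
//   Token tok(std::string(1, currentCharacter), "NEWLINE");
```

Because the token holds its own copies, it stays valid even after the buffer it was built from changes or goes away.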
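
For point 2, one way to share a single list across all Lexer objects, assuming C++17 (`inline static` data members; before C++17 you would declare the member in the class and define it once in lexer.cpp). Making `checkIfKeywordIsToken` static as well is an extra assumption here: it only reads the shared list, so it doesn't need an object.

```cpp
#include <string>
#include <vector>

class Lexer {
public:
    // One list shared by every Lexer object (C++17 inline static member).
    inline static const std::vector<std::string> TokenType = {
        "EOF", "NEWLINE", "STRING", "IDENTIFIER", "PRINT"
    };

    // Same logic as the thread's checkIfKeywordIsToken(), made static
    // because it only uses the shared list.
    static std::string checkIfKeywordIsToken(const std::string& value) {
        for (const std::string& t : TokenType) {
            if (t == value) {
                return t;
            }
        }
        return "IDENTIFIER";
    }
};
```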
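
For point 3, a small sketch of getToken()/peekToken() returning `Tokens` values. It repeats the enum and keyword table from the example so it compiles on its own. `TokenStream` is a hypothetical class: it recognizes only words, newlines, spaces/tabs and end of input, and maps anything else to `Tokens::UNKNOWN`. Its `peekToken()` simply saves the position, reads a token, and restores the position.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

enum class Tokens {UNKNOWN, EOI, NL, STR, NUM, ID, PRINT, INPUT};

struct KeyWords {
    std::string name;
    Tokens token;
};

const std::vector<KeyWords> keywrds { {"PRINT", Tokens::PRINT}, {"INPUT", Tokens::INPUT} };

class TokenStream {
public:
    explicit TokenStream(std::string src) : src_(std::move(src)) {}

    // Look at the next token without consuming it.
    Tokens peekToken() {
        std::size_t saved = pos_;
        Tokens t = getToken();
        pos_ = saved;
        return t;
    }

    Tokens getToken() {
        while (pos_ < src_.size() && (src_[pos_] == ' ' || src_[pos_] == '\t')) ++pos_;
        if (pos_ >= src_.size()) return Tokens::EOI;
        char c = src_[pos_];
        if (c == '\n') { ++pos_; return Tokens::NL; }
        if (std::isalpha(static_cast<unsigned char>(c))) {
            std::string word;  // read a whole word, then check the keyword table
            while (pos_ < src_.size() && std::isalnum(static_cast<unsigned char>(src_[pos_])))
                word += src_[pos_++];
            for (const KeyWords& kw : keywrds)
                if (kw.name == word) return kw.token;
            return Tokens::ID;
        }
        ++pos_;  // skip the unrecognized character
        return Tokens::UNKNOWN;
    }

private:
    std::string src_;
    std::size_t pos_ = 0;
};
```

For the input "PRINT x\nINPUT", successive getToken() calls yield PRINT, ID, NL, INPUT, then EOI.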
@seeplus

Thank you so much for all the tips!