The type of string literals

What is the type of string literals, say, "Hello", in C++, please?
Is it a const char* as Stroustrup says in the book Tour of C++ or is it an array of constant characters?
Both are correct, because there's an ambiguity when someone says "a string literal". There's the syntactical element "Hello" at the point of usage and the data Hello\0 (which "Hello" points to) which is stored wherever. Both can be correctly said to be string literals.
The more complete answer is that "Hello" is a pointer to an array of constant characters containing Hello\0.
sizeof "hello" is 6, so I would interpret it as the latter of your two options, i.e. it is the array itself, which follows normal array-degrading-into-pointer rules.
You can also think of it like this: when you write code like...

const char *mystr = "Hello";

...then the compiler allocates an array of constant characters in the static memory of the executable file, which contains the data Hello\0, and it puts the address (pointer) of that array into the mystr variable.

It's pretty much a shorthand for:
static const char _mystr_buffer[6] = { 'H', 'e', 'l', 'l', 'o', '\0' };
const char *mystr = &_mystr_buffer[0];
The type of the expression "Hello" is char const[6]
https://eel.is/c++draft/lex.string#1
I guess the following consideration may be correct, as Stroustrup says:
"hello" is an array of constant characters but its type is a const char*.
We can also dereference it: *"hello".
"hello" is of type const char[6]. It can be dereferenced via * because the array decays to a pointer to its first element. *"hello" is an lvalue of type const char with the value 'h'.

If your IDE supports IntelliSense, hover the cursor over "hello" and see what it reports as its type.
> "hello" is an array of constant characters but its type is a const char*

No, its type is not const char*, though it may decay to const char* at the slightest provocation.

Whether the IDE has intellisense or not, to cement your understanding, write a small program. (Ideally run it on more than one implementation.)

#include <iostream>
#include <type_traits>
#include <cstring>

template < typename T > requires std::is_bounded_array_v<T>
void test( const T& ) { std::cout << "bounded array of size " << std::extent_v<T> << '\n' ; }

template < typename T > requires std::is_same_v< T, const char* >
void test( const T& cstr ) { std::cout << "pointer (c-style string of length "  << std::strlen(cstr) << ")\n" ; }

int main()
{
    test( "Hello" ) ; // bounded array of size 6
    test( ("Hello") ) ; // bounded array of size 6

    auto& x = "ab\0cd\0ef\0gh" ; // type of x is 'reference to array of 12 const char'
    test(x) ; // bounded array of size 12

    // the expression "ab\0cd\0ef\0gh" is an lvalue of type 'array of 12 const char'
    decltype( "ab\0cd\0ef\0gh" ) y = "ab\0cd\0ef\0gh" ; // type of y is 'reference to array of 12 const char'
    test(y) ; // bounded array of size 12

    // arrays are not copyable; ergo type of z is const char*
    auto z = "ab\0cd\0ef\0gh" ; // array to pointer decay: implicit conversion from string literal to const char*
    test(z) ; // pointer (c-style string of length 2) (the string literal has a size of 12, but std::strlen(z) == 2)

    // unary + applied to an array operand triggers the implicit array-to-pointer conversion.
    test( +"ab\0cd\0ef\0gh" ) ; // pointer (c-style string of length 2) (array to pointer decay)
}

http://coliru.stacked-crooked.com/a/a732ec0112d7bac9
Consider:

#include <typeinfo>
#include <iostream>

int main() {
	std::cout << typeid("hello").name() << '\n';

	const auto str { "hello" };

	std::cout << typeid(str).name() << '\n';

	const char str1[] { "hello" };

	std::cout << typeid(str1).name() << '\n';
}


which displays (for a 64-bit build):


char const [6]
char const * __ptr64
char const [6]


which shows the difference between the pointer type const char* (str) and the array type const char[6] (str1).

If a type is a bounded array (i.e. its size is part of the type), then std::begin(), std::end(), std::size() etc. will work on it. If it is not (e.g. a pointer such as const char*), they will not compile.
@JLBorges
Do you mean "decompose" by "decay"?
If so, what makes that decomposition? (in simple language please)
> Do you mean "decompose" by "decay"?
No, he means that most valid expressions involving "Hello" implicitly convert it to const char *.

> const char str1[] { "hello" };
Note that this means something very different: str1 is a separate array initialized with a copy of the literal's characters, so str1 != "hello". To preserve the type and the identity you'd have to do
 
auto &str2 = "hello";
By decay, I meant 'array to pointer decay'.

There is an implicit conversion from lvalues and rvalues of array type to rvalues of pointer type: it constructs a pointer to the first element of an array. This conversion is used whenever arrays appear in context where arrays are not expected, but pointers are.
https://en.cppreference.com/w/cpp/language/array#Array-to-pointer_decay


It's interesting that in:
auto str2{ "hello" };
std::cout << str2 << '\n';

IntelliSense on the second line shows const char* as the deduced type for str2!
Add an ampersand and it will show array type: auto& str2{ "hello" };.
Consider:

#include <typeinfo>
#include <iostream>

int main() {
	std::cout << typeid("hello").name() << '\n';

	const auto str { "hello" };

	std::cout << typeid(str).name() << '\n';

	char str1[] { "hello" };

	std::cout << typeid(str1).name() << '\n';

	auto& str2 { "hello" };

	std::cout << typeid(str2).name() << '\n';

	const char(&str3)[6] {"hello"};

	std::cout << typeid(str3).name() << '\n';

	std::cout << (void*)str << '\n';
	std::cout << (void*)str1 << '\n';
}



char const [6]
char const * __ptr64
char [6]
char const [6]
char const [6]
000000013F0533C0
000000000023FAA0


Note how str3 is defined.

"hello" is of type const char[6]
str1 is of type char[6] here (it was const char[6] in the earlier example)

but the address of "hello" (str) is a pointer into the code, whereby the address of str1 is within the stack.

Also this time, str1 is not const - which is OK. The others need to be const as they refer to "hello" in the code - which cannot be changed!

str1 contains the data "hello" - but this is not the same "hello" as in the .exe code. It is a copy and hence need not be const as its contents can be changed.
@Cubbi, What does that ampersand do that makes the type change?

@seeplus:
> but the address of "hello" (str) is a pointer into the code, whereby the address of str1 is within the stack.
I didn't understand this very well I think. And why don't you say the same thing for str as the first two types, e.g., str is of type const char* as shown in the code output?
frek wrote:
> What does that ampersand do that makes the type change?
It makes auto deduce a reference type rather than an object type, and reference types can be initialized from arrays so it doesn't have to invent a pointer. Same as on lines 15 and 19 in seeplus's example (the declarations of str2 and str3) and line 16 in JLBorges's example (the declaration of x).
Another way you can avoid decay with auto is decltype(auto) str2{ "hello" };
> reference types can be initialized from arrays
Interesting. Any clear reason?

> it doesn't have to invent a pointer
So without an ampersand there will be an implicit conversion to invent a pointer. What do you mean by "decay"? That is, which of its meanings is meant here: "decompose", "spoil", "damage", ...?
Decay in this context means that when an expression has a bounded array type (e.g. "hello", which is const char[6]) and the code expects a pointer to const char (i.e. type const char*), the array is implicitly converted to a const char* pointing at its first element. The size information is lost in the conversion. This is array decay.
> but the address of "hello" (str) is a pointer into the code, whereby the address of str1 is within the stack.


When a program is executed it is first read into memory. Hence in this case str is a pointer into the memory the program was loaded into, and it points to "hello" within that memory.

The stack is where the contents of local, non-static variables are stored (such as str1).

There's also heap memory which is used for dynamic memory (ie when using new etc). Global and static data are stored in yet another memory area.

The only really important thing to remember about this is that the size of the stack is limited. If you try to use too much stack memory (the limit depends upon the compiler, OS etc.) then you'll get an error at run time. The size of the heap/static/global areas is limited by the amount of available memory (and whether you're doing 32/64 bit compiling). Also remember that the memory holding the program's code and its string literals can't be changed.
