class template
<codecvt>

std::codecvt_utf8

template <class Elem, unsigned long MaxCode = 0x10ffffUL, codecvt_mode Mode = (codecvt_mode)0>
  class codecvt_utf8 : public codecvt<Elem, char, mbstate_t>;
Convert UTF-8

Converts between multibyte sequences encoded in UTF-8 and sequences of their equivalent fixed-width characters of type Elem (either UCS-2 or UCS-4).

Note that if Elem is a 32-bit character type (such as char32_t) and MaxCode is 0x10ffff, the conversion performed is between UTF-8 and UTF-32. For 16-bit character types, this class only generates code points that do not require surrogates (plain UCS-2). To convert between UTF-8 and UTF-16 (both variable-width encodings), see codecvt_utf8_utf16 instead.
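The difference between the UCS-2 and UTF-16 views can be sketched as follows. This is a minimal illustration, not a definitive implementation: the helper names utf8_to_ucs2 and utf8_to_utf16 are hypothetical, and std::wstring_convert is used only as a convenient driver for the facets. What codecvt_utf8<char16_t> does with input outside the Basic Multilingual Plane may vary between library implementations, so the sketch feeds it BMP text only.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: UCS-2 view, one char16_t per code point.
// Only reliable for characters inside the Basic Multilingual Plane.
std::u16string utf8_to_ucs2(const std::string& bytes)
{
    std::wstring_convert<std::codecvt_utf8<char16_t>, char16_t> conv;
    return conv.from_bytes(bytes);
}

// Hypothetical helper: UTF-16 view, where code points above U+FFFF
// are represented as surrogate pairs (two char16_t elements).
std::u16string utf8_to_utf16(const std::string& bytes)
{
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.from_bytes(bytes);
}
```

For example, utf8_to_utf16("\xF0\x9F\x98\x80") (U+1F600 encoded as UTF-8) yields two elements, the surrogate pair 0xD83D 0xDE00, while utf8_to_ucs2("\xE4\xBD\xA0") yields the single element U+4F60.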

The facet uses Elem as its internal character type, and char as its external character type (encoded as UTF-8). Therefore:
  • Member in converts from UTF-8 to its fixed-width character equivalent.
  • Member out converts from the fixed-width wide character encoding to UTF-8.
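The two directions listed above can be exercised without a stream by letting std::wstring_convert drive the facet. The sketch below is illustrative only: to_utf8 and from_utf8 are hypothetical helper names, and the code assumes a standard library that still ships <codecvt> (the header is deprecated since C++17, so a compiler may emit deprecation warnings).

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: encode UTF-32 text as UTF-8.
// wstring_convert::to_bytes calls the facet's out() member internally.
std::string to_utf8(const std::u32string& text)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.to_bytes(text);
}

// Hypothetical helper: decode UTF-8 bytes into UTF-32.
// wstring_convert::from_bytes calls the facet's in() member internally.
std::u32string from_utf8(const std::string& bytes)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.from_bytes(bytes);
}
```

With these helpers, to_utf8(U"\u4f60\u597d") produces the six bytes E4 BD A0 E5 A5 BD, and from_utf8 applied to those bytes returns the original two code points.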

Template parameters

Elem
The internal character type, aliased as member intern_type. This shall be a wide character type: wchar_t, char16_t or char32_t.
For 16-bit character types, converting characters outside the Basic Multilingual Plane with member in may cause conversion errors.
The external character type in this facet is always char.
MaxCode
The largest code point that will be translated without reporting a conversion error.
Mode
Bitmask value of type codecvt_mode:
label            value  description
consume_header   4      An optional initial header sequence (BOM) is read to determine whether a multibyte sequence converted in is big-endian or little-endian.
generate_header  2      An initial header sequence (BOM) shall be generated to indicate whether a multibyte sequence converted out is big-endian or little-endian.
little_endian    1      The multibyte sequence generated on conversions out shall be little-endian (as opposed to the default big-endian).
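The effect of MaxCode and of the header flags can be sketched as below. The helper names are hypothetical, and the sketch rests on two assumptions: that for UTF-8 the header is the three-byte sequence EF BB BF, and that std::wstring_convert (constructed without an error string) reports a failed conversion by throwing std::range_error. Exact behavior may differ between library implementations.

```cpp
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: generate_header prefixes the output with the UTF-8 BOM.
std::string to_utf8_with_bom(const std::u32string& text)
{
    std::wstring_convert<
        std::codecvt_utf8<char32_t, 0x10ffff, std::generate_header>,
        char32_t> conv;
    return conv.to_bytes(text);
}

// Hypothetical helper: consume_header skips a leading UTF-8 BOM, if present.
std::u32string from_utf8_skip_bom(const std::string& bytes)
{
    std::wstring_convert<
        std::codecvt_utf8<char32_t, 0x10ffff, std::consume_header>,
        char32_t> conv;
    return conv.from_bytes(bytes);
}

// Hypothetical helper: with MaxCode = 0xffff, code points above U+FFFF are
// reported as conversion errors. Malformed input also returns false here.
bool decodes_within_bmp(const std::string& bytes)
{
    std::wstring_convert<std::codecvt_utf8<char32_t, 0xffff>, char32_t> conv;
    try {
        conv.from_bytes(bytes);
        return true;
    } catch (const std::range_error&) {
        return false;
    }
}
```

Lowering MaxCode is a simple way to reject input a downstream consumer cannot represent, at the cost of treating otherwise valid UTF-8 as an error.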

Member types

The following aliases are member types of codecvt_utf8, inherited from codecvt:

member type   definition                           notes
intern_type   The first template parameter (Elem)  The internal character type (wide character type).
extern_type   char                                 The external character type (multibyte character type).
state_type    mbstate_t                            Conversion state type (see mbstate_t).
result        codecvt_base::result                 Enum type with the result of a conversion operation (see codecvt_base::result).

Public member functions inherited from codecvt


Conversion functions:
  • in
  • out
  • unshift

Character encoding properties:
  • always_noconv
  • encoding
  • length
  • max_length

Virtual protected member functions

The class defines its functionality through its virtual protected member functions:
member function    behavior in codecvt_utf8
do_always_noconv   Returns false (not all conversions will yield a noconv result).
do_encoding        Returns 0 (the external encoding is not fixed-width).
do_in              Converts from UTF-8 to the fixed-width equivalent of type Elem.
do_length          Returns the number of external characters (bytes) that would be consumed to produce at most a given number of internal characters (for codecvt::length).
do_max_length      Returns the maximum number of bytes that may be needed to produce one internal character.
do_out             Converts from the fixed-width wide character encoding (UCS-2 / UCS-4) to UTF-8.
do_unshift         Brings the mbstate_t object to an initial state.
(destructor)       Releases resources.
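The virtual functions above are reached through the public members inherited from codecvt. The sketch below calls them directly on a facet object rather than through a stream or std::wstring_convert. The helper names decode_with_facet and is_variable_width are hypothetical, and treating every result other than ok as an error is a simplification made for this example.

```cpp
#include <codecvt>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 by calling the public in() member,
// which forwards to do_in.
std::u32string decode_with_facet(const std::string& bytes)
{
    std::codecvt_utf8<char32_t> cvt;
    std::mbstate_t state = std::mbstate_t();

    // Each code point takes at least one byte, so this buffer is large enough.
    std::u32string out(bytes.size(), U'\0');
    const char* from_next = nullptr;
    char32_t* to_next = nullptr;

    std::codecvt_base::result r = cvt.in(
        state,
        bytes.data(), bytes.data() + bytes.size(), from_next,
        &out[0], &out[0] + out.size(), to_next);

    if (r != std::codecvt_base::ok)
        throw std::runtime_error("UTF-8 decoding failed");

    out.resize(to_next - &out[0]);  // keep only the elements actually written
    return out;
}

// encoding() == 0 means the external encoding is not fixed-width;
// always_noconv() == false means conversions are not all no-ops.
bool is_variable_width()
{
    std::codecvt_utf8<char32_t> cvt;
    return cvt.encoding() == 0 && !cvt.always_noconv();
}
```

Calling decode_with_facet("\xE4\xBD\xA0") returns a one-element string holding U+4F60, and is_variable_width() returns true, in line with the table above.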

Example

// codecvt_utf8: writing UTF-32 string as UTF-8
#include <iostream>
#include <locale>
#include <string>
#include <codecvt>
#include <fstream>

int main ()
{
  std::u32string str ( U"\U00004f60\U0000597d" );  // ni hao (你好)

  std::locale loc (std::locale(), new std::codecvt_utf8<char32_t>);
  std::basic_ofstream<char32_t> ofs ("test.txt");
  ofs.imbue(loc);

  std::cout << "Writing to file (UTF-8)... ";
  ofs << str;
  std::cout << "done!\n";

  return 0;
}

Output
Writing to file (UTF-8)... done!

