cgul_hchar.h File Reference

cgul 16-bit wide-character support

#include "cgul_common.h"
#include "cgul_exception.h"
#include "cgul_int.h"

Macros

#define CGUL_HCHAR__NUL   ((cgul_hchar_t)0)
 
#define CGUL_HCHAR__BOM   ((cgul_hchar_t)0xfeff)
 
#define CGUL_HCHAR__MB_LEN_MAX   (4)
 

Functions

CGUL_EXPORT size_t cgul_hchar__hcslen (cgul_exception_t *cex, const cgul_hchar_t *hs)
 
CGUL_EXPORT void cgul_hchar__hcscpy (cgul_exception_t *cex, cgul_hchar_t *dst, const cgul_hchar_t *src)
 
CGUL_EXPORT int cgul_hchar__hcscmp (const cgul_hchar_t *hs1, const cgul_hchar_t *hs2)
 
CGUL_EXPORT int cgul_hchar__isspace (cgul_exception_t *cex, cgul_hchar_t hc)
 

Variables

CGUL_BEGIN_C typedef cgul_uint16_t cgul_hchar_t
 
CGUL_EXPORT const cgul_hchar_t CGUL_HCHAR__EMPTY_STRING [1]
 

Detailed Description

This file defines cgul_hchar_t which is a 16-bit, wide-character type. It also defines the basic set of functions that manipulate raw, 16-bit, wide-character strings. It is called "hchar" because it is half the size of cgul_wchar_t (similar to how the printf() format string "%hu" describes an unsigned short).

Author
Paul Serice
See also
cgul_hstring
cgul_unicode
cgul_wchar

Macro Definition Documentation

§ CGUL_HCHAR__NUL

#define CGUL_HCHAR__NUL   ((cgul_hchar_t)0)

The NUL character in a cgul_hchar_t string.

§ CGUL_HCHAR__BOM

#define CGUL_HCHAR__BOM   ((cgul_hchar_t)0xfeff)

The Unicode Byte-Order Mark (BOM) character.

§ CGUL_HCHAR__MB_LEN_MAX

#define CGUL_HCHAR__MB_LEN_MAX   (4)

The maximum number of bytes needed to hold any UTF-8 multi-byte sequence that represents one Unicode character. A surrogate pair is required to represent Unicode values outside of the BMP (0x10000 through 0x10ffff). A surrogate pair is two 16-bit characters or 4 bytes.
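The arithmetic behind that limit can be sketched as follows. This is an illustration only, not code from cgul: `hchar_t`, `to_surrogates()`, and `utf8_len()` are hypothetical names, and `hchar_t` is a local stand-in for cgul_hchar_t so the example compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for cgul_hchar_t (an unsigned 16-bit type). */
typedef uint16_t hchar_t;

/* Split a code point in 0x10000..0x10ffff into a UTF-16 surrogate pair. */
void to_surrogates(uint32_t cp, hchar_t out[2])
{
    assert(cp >= 0x10000 && cp <= 0x10ffff);
    cp -= 0x10000;                              /* 20 significant bits remain */
    out[0] = (hchar_t)(0xd800 + (cp >> 10));    /* high surrogate: top 10 bits */
    out[1] = (hchar_t)(0xdc00 + (cp & 0x3ff));  /* low surrogate: low 10 bits */
}

/* Number of bytes the UTF-8 encoding of a code point needs (1 to 4). */
int utf8_len(uint32_t cp)
{
    assert(cp <= 0x10ffff);
    if (cp < 0x80)    return 1;
    if (cp < 0x800)   return 2;
    if (cp < 0x10000) return 3;  /* the rest of the BMP */
    return 4;                    /* outside the BMP */
}
```

Every code point that needs a surrogate pair also needs exactly 4 bytes in UTF-8, which is why a buffer of CGUL_HCHAR__MB_LEN_MAX bytes is enough for one character in either form.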

Function Documentation

§ cgul_hchar__hcslen()

CGUL_EXPORT size_t cgul_hchar__hcslen ( cgul_exception_t * cex,
const cgul_hchar_t * hs
)

This function returns the number of 16-bit wide characters in hs. The return value is determined by counting all the wide characters that come before the trailing NUL wide character. The caller is responsible for ensuring that hs is NUL-terminated.

Note that this function assumes the string is UCS-2 encoded, not UTF-16 encoded. Thus, if the string is actually UTF-16 encoded, each surrogate pair counts as two characters. If you need an exact count of Unicode characters, use cgul_unicode__get_hchar_count() instead.

Parameters
[in]  cex  C-style exception
[in]  hs   16-bit wide-character string
Returns
number of 16-bit wide characters in hs

Referenced by cgul_hchar_cxx::hcslen().
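The difference between counting 16-bit units and counting Unicode characters can be sketched as below. This is a minimal illustration under stated assumptions, not the cgul implementation: `hchar_t`, `unit_count()`, and `code_point_count()` are hypothetical names, and the exception parameter is omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for cgul_hchar_t. */
typedef uint16_t hchar_t;

/* UCS-2 view: count every 16-bit unit before the terminating NUL. */
size_t unit_count(const hchar_t *hs)
{
    size_t n = 0;
    assert(hs != NULL);
    while (hs[n] != 0) {
        n++;
    }
    return n;
}

/* UTF-16 view: a well-formed surrogate pair counts as one character. */
size_t code_point_count(const hchar_t *hs)
{
    size_t n = 0;
    assert(hs != NULL);
    while (*hs != 0) {
        /* hs[1] is safe to read: *hs is not NUL, so hs[1] is at worst the NUL. */
        if (*hs >= 0xd800 && *hs <= 0xdbff && hs[1] >= 0xdc00 && hs[1] <= 0xdfff) {
            hs += 2;
        } else {
            hs += 1;
        }
        n++;
    }
    return n;
}
```

For a string holding 'A' followed by one surrogate pair, the first function reports 3 and the second reports 2.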

§ cgul_hchar__hcscpy()

CGUL_EXPORT void cgul_hchar__hcscpy ( cgul_exception_t * cex,
cgul_hchar_t * dst,
const cgul_hchar_t * src
)

This function copies the 16-bit wide characters in src to dst. The caller is responsible for making sure that src is NUL-terminated and that dst is large enough to hold all the characters in src including the trailing NUL character.

Parameters
[in]  cex  C-style exception
[in]  dst  destination 16-bit wide-character string
[in]  src  source 16-bit wide-character string

Referenced by cgul_hchar_cxx::hcscpy().
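Because the caller owns the sizing of dst, one way to meet that requirement is to measure src first and allocate one extra element for the trailing NUL. The sketch below shows the idea with standard C only; `hchar_t` and `dup_hstr()` are hypothetical names, not part of cgul.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for cgul_hchar_t. */
typedef uint16_t hchar_t;

/* Return a heap-allocated copy of src, or NULL if allocation fails.
 * The caller releases the copy with free(). */
hchar_t *dup_hstr(const hchar_t *src)
{
    size_t n = 0;
    size_t i;
    hchar_t *dst;

    assert(src != NULL);
    while (src[n] != 0) {
        n++;                              /* length without the NUL */
    }
    dst = malloc((n + 1) * sizeof *dst);  /* +1 for the trailing NUL */
    if (dst == NULL) {
        return NULL;
    }
    for (i = 0; i <= n; i++) {            /* <= copies the NUL as well */
        dst[i] = src[i];
    }
    return dst;
}
```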

§ cgul_hchar__hcscmp()

CGUL_EXPORT int cgul_hchar__hcscmp ( const cgul_hchar_t * hs1,
const cgul_hchar_t * hs2
)

Perform a case-significant, 16-bit, wide-character string comparison of hs1 and hs2. Return -1, 0, or 1 if hs1 is less than, equal to, or greater than hs2 respectively.

For simplicity, this function compares strings strictly by the ordinal value of each character. This should be sufficient, for example, when the strings are inserted as keys into a cgul_rbtree. However, the results are unlikely to match what strcoll() would return for the UTF-8 versions of the strings.

Note
This function does not take a cgul_exception object as its first parameter in order to make it easier to use the function as the comparison function of cgul_rbtree objects.
Parameters
[in]  hs1  left-hand side
[in]  hs2  right-hand side
Returns
-1, 0, or 1 if hs1 is less than, equal to, or greater than hs2 respectively
See also
cgul_hstring__compare()

Referenced by cgul_hchar_cxx::hcscmp().
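An ordinal comparison of this kind can be sketched as follows. This is an illustration, not the cgul source: `hchar_t` and `ordinal_cmp()` are hypothetical names. Note that comparing 16-bit units is not the same as comparing code points when surrogate pairs are present, because surrogates (0xd800 through 0xdfff) sort below BMP values 0xe000 through 0xffff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for cgul_hchar_t. */
typedef uint16_t hchar_t;

/* Compare two NUL-terminated strings unit by unit.
 * Returns -1, 0, or 1, mirroring the contract described above. */
int ordinal_cmp(const hchar_t *a, const hchar_t *b)
{
    assert(a != NULL && b != NULL);
    while (*a != 0 && *a == *b) {
        a++;
        b++;
    }
    /* Both operands are unsigned 16-bit values, so this is a plain
     * numeric comparison of the first differing units. */
    return (*a > *b) - (*a < *b);
}
```

Because the function takes only the two strings, it has the two-argument shape that tree and sort comparators usually expect.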

§ cgul_hchar__isspace()

CGUL_EXPORT int cgul_hchar__isspace ( cgul_exception_t * cex,
cgul_hchar_t  hc 
)

Return whether the 16-bit wide character hc is considered to be white-space. To avoid locale dependencies, white-space is defined as any character from the following list: ' ', '\t', '\n', '\r', '\f', '\v'.

Parameters
[in]  cex  C-style exception
[in]  hc   16-bit wide character
Returns
whether the character is white-space

Referenced by cgul_hchar_cxx::isspace().
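A locale-independent test over that fixed list might look like the sketch below. The names `hchar_t`, `is_space()`, and `skip_space()` are hypothetical, the exception parameter is omitted, and the code assumes an ASCII-compatible execution character set so that the character constants match their Unicode values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for cgul_hchar_t. */
typedef uint16_t hchar_t;

/* Return 1 if hc is one of the six listed white-space characters, else 0. */
int is_space(hchar_t hc)
{
    switch (hc) {
    case ' ':
    case '\t':
    case '\n':
    case '\r':
    case '\f':
    case '\v':
        return 1;
    default:
        return 0;
    }
}

/* Typical use: return a pointer to the first non-white-space unit. */
const hchar_t *skip_space(const hchar_t *hs)
{
    assert(hs != NULL);
    while (is_space(*hs)) {   /* stops at NUL, which is not white-space */
        hs++;
    }
    return hs;
}
```

Characters such as the no-break space (0x00a0) are deliberately not matched, since they are not in the list.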

Variable Documentation

§ cgul_hchar_t

CGUL_BEGIN_C typedef cgul_uint16_t cgul_hchar_t

The cgul_hchar_t typedef always defines a 16-bit, wide character which means it is always large enough to hold any Unicode value in the Basic Multilingual Plane (BMP). Surrogate pairs are required for characters outside the BMP.

If you need to convert between cgul_hchar_t and wchar_t, there are routines in cgul_unicode that convert between UTF-16 and UTF-8. You can then use an operating-system-dependent mechanism to convert from UTF-8 to wchar_t. Alternatively, it may be easier to write a simple loop that copies each character from a cgul_hchar_t string to a wchar_t string or vice versa.

Referenced by cgul_hstring_cxx::get_value(), cgul_hstring_cxx::get_value_at(), cgul_hstring_cxx::operator cgul_hchar_t *(), cgul_hstring_cxx::operator+=(), cgul_hstring_cxx::operator[](), and cgul_hstring_cxx::take_value().
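The "simple loop" approach could be sketched as below. `hchar_t` and `widen()` are hypothetical names used only for illustration. The loop copies 16-bit units one for one, so it is only faithful for BMP text: where wchar_t is 32 bits wide, a surrogate pair would be copied as two separate values rather than combined into one character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical stand-in for cgul_hchar_t. */
typedef uint16_t hchar_t;

/* Copy at most cap - 1 units from src to dst and NUL-terminate dst.
 * Returns the number of units copied, not counting the NUL. */
size_t widen(wchar_t *dst, size_t cap, const hchar_t *src)
{
    size_t i = 0;
    assert(dst != NULL && src != NULL && cap > 0);
    while (i + 1 < cap && src[i] != 0) {
        dst[i] = (wchar_t)src[i];   /* unit-for-unit copy, no pair merging */
        i++;
    }
    dst[i] = L'\0';
    return i;
}
```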

§ CGUL_HCHAR__EMPTY_STRING

CGUL_EXPORT const cgul_hchar_t CGUL_HCHAR__EMPTY_STRING[1]

The most annoying thing about using cgul_hchar_t instead of wchar_t is that we lose the compiler's support for generating wide-character strings by simply prepending 'L' to a string literal. Because the empty string is needed often, this file provides one as a convenience.
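Without the 'L' prefix, 16-bit string constants have to be spelled out or built at run time. The sketch below shows both options; `hchar_t`, `EMPTY`, `GREETING`, and `from_ascii()` are hypothetical names, and the helper assumes its input is plain 7-bit ASCII.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for cgul_hchar_t. */
typedef uint16_t hchar_t;

/* Option 1: array initializers, one element per 16-bit unit. */
const hchar_t EMPTY[1] = { 0 };
const hchar_t GREETING[] = { 'h', 'i', 0 };

/* Option 2: widen an ASCII literal into a caller-supplied buffer.
 * Copies at most cap - 1 characters and always NUL-terminates dst. */
void from_ascii(hchar_t *dst, size_t cap, const char *src)
{
    size_t i = 0;
    assert(dst != NULL && src != NULL && cap > 0);
    while (i + 1 < cap && src[i] != '\0') {
        dst[i] = (hchar_t)(unsigned char)src[i];
        i++;
    }
    dst[i] = 0;
}
```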