C++ bindings for cgul_crlf
#include <cgul_crlf_cxx.h>
Public Member Functions | |
cgul_crlf_cxx () | |
virtual | ~cgul_crlf_cxx () |
virtual void | reset (unsigned long offset=0) |
virtual int | get_strip_utf8_bom () const |
virtual void | set_strip_utf8_bom (int strip_utf8_bom) |
virtual void | convert (char *buf, unsigned long int bsize) |
virtual const char * | get_line () const |
virtual unsigned long | get_line_count () const |
virtual unsigned long | get_line_offset () const |
virtual const char * | get_remainder () |
virtual void | convert_file (FILE *fin, FILE *fout, const char *eol) |
virtual cgul_crlf_t | get_obj () const |
virtual cgul_crlf_t | take_obj () |
virtual void | set_obj (cgul_crlf_t rhs) |
This class provides the C++ bindings for C cgul_crlf objects. The main purpose of this class is to convert the C-style function calls and exception handling in cgul_crlf into C++-style function calls and exception handling.
cgul_crlf_cxx () [inline]
Default Constructor. If memory cannot be allocated, an exception is thrown.
References cgul_crlf__new().
Referenced by set_obj().
virtual ~cgul_crlf_cxx () [inline, virtual]
Destructor.
References cgul_crlf__delete().
virtual void reset (unsigned long offset=0) [inline, virtual]
This method is used to reset the object so that it can process a new stream of text or process the same stream of text after seeking to a different location in the stream.
The client must inform this class of the new offset into the underlying file so that subsequent calls to get_line_offset() can return correct values. The value of offset should be zero-based. Thus, to start processing at the beginning of a new file, call this method with offset set to 0.
Calling this method resets the line count to zero. This can be used by the client to implement a line counter that does not overflow.
Calling this method does not reset whether a leading UTF-8 byte-order mark (BOM) is stripped.
[in] | offset | zero-based offset |
References cgul_crlf__reset().
virtual int get_strip_utf8_bom () const [inline, virtual]
This method returns whether the leading UTF-8 byte-order mark (BOM) should be stripped from the first line if it is present.
References cgul_crlf__get_strip_utf8_bom().
virtual void set_strip_utf8_bom (int strip_utf8_bom) [inline, virtual]
By default, this class detects the leading UTF-8 byte-order mark (BOM) and strips it from the first line returned by get_line() if it is present. It then clears its internal flag so that BOMs internal to the text file will be returned. This is generally what you want because the leading BOM is not significant but the internal BOMs are.
You can alter the way this class handles the leading BOM by calling this method with strip_utf8_bom set to 0. This will cause the leading BOM to be returned as part of the first line. This can be useful, for example, if you just want to convert the text file and are not interested in its contents.
The value of strip_utf8_bom is remembered across calls to reset().
It should be noted that most operating systems do not save UTF-8 text files with a leading BOM because UTF-8 is a character stream and, as such, does not have byte-order problems; however, Microsoft Windows adds the BOM to its UTF-8 text files presumably to help distinguish UTF-8 text files from text files with different encodings.
[in] | strip_utf8_bom | whether to strip leading UTF-8 byte-order mark |
References cgul_crlf__set_strip_utf8_bom().
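To make the notion of a "leading BOM" concrete, here is a minimal standalone sketch of what stripping it from a first line amounts to. It is an illustration only, not the library's implementation; skip_utf8_bom is a hypothetical helper name, and the only fact it relies on is that the UTF-8 BOM is the byte sequence EF BB BF.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper (not part of cgul): returns a pointer just past a
// leading UTF-8 BOM (bytes EF BB BF) if the line starts with one, otherwise
// returns the line unchanged. BOMs elsewhere in the line are left alone.
inline const char* skip_utf8_bom(const char* line)
{
    assert(line != nullptr);
    static const char bom[] = "\xEF\xBB\xBF";
    return std::strncmp(line, bom, 3) == 0 ? line + 3 : line;
}
```

A client that disables stripping with set_strip_utf8_bom(0) could apply a check like this to the first line only, which mirrors the default behaviour described above.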
virtual void convert (char *buf, unsigned long int bsize) [inline, virtual]
The caller feeds this method a block of text in buf of size bsize. The blocks you feed this method can end anywhere; they do not have to end exactly on a line boundary. This method knows how to splice together partial lines from the last call to form arbitrarily long lines using any of the common EOL markers: "\n", "\r", or "\r\n".
After each call to this method, you MUST call get_line() iteratively until it returns NULL before feeding this method another block.
Do not alter buf until after you have exhausted get_line(). This prevents convert() from having to make a copy of each block because this method often (but not always) inserts NUL characters directly into buf to produce the lines returned by get_line().
After feeding the last block to this method and exhausting get_line(), you should call get_remainder() to fetch what remained if the last line had no trailing EOL marker.
This method dynamically allocates space to hold the lines that are split across calls to this method. If an error occurs, an exception is thrown, and the object will be in an undefined state.
WARNING: Because this method embeds NUL characters directly into buf to produce the lines returned by get_line(), it goes without saying that buf must be writable. What might not be obvious is that this means buf probably should not be allocated on the stack because many operating systems have security mechanisms to prevent unexpected writes to stack variables.
[in] | buf | buffer |
[in] | bsize | buffer size |
References cgul_crlf__convert().
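The in-place splitting described above can be illustrated with a small standalone sketch. It is not the library's implementation: split_in_place is a hypothetical helper, it handles a single block only (no splicing of partial lines across calls), and it assumes a "\r" at the very end of the block is a complete EOL marker, which a streaming converter could not assume.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of cgul). Splits buf[0..n) in place: the
// first byte of every EOL marker ("\n", "\r" or "\r\n") is overwritten with
// NUL, and a pointer to the start of each complete line is appended to
// lines. Returns the index of the first byte of the unterminated tail, or n
// if the block ended exactly on an EOL marker.
inline std::size_t split_in_place(char* buf, std::size_t n,
                                  std::vector<const char*>& lines)
{
    assert(buf != nullptr || n == 0);
    std::size_t start = 0;  // start of the line currently being scanned
    std::size_t i = 0;
    while (i < n) {
        if (buf[i] == '\n' || buf[i] == '\r') {
            // "\r\n" counts as one marker, so skip both bytes.
            const bool crlf =
                buf[i] == '\r' && i + 1 < n && buf[i + 1] == '\n';
            buf[i] = '\0';
            lines.push_back(buf + start);
            i += crlf ? 2 : 1;
            start = i;
        } else {
            ++i;
        }
    }
    return start;
}
```

The returned pointers alias buf, which is why the text above requires that buf be writable and remain unmodified until every line has been consumed.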
virtual const char * get_line () const [inline, virtual]
After seeding this object by calling convert(), you call this method to fetch the next line. If a line is ready, this method returns it. If no line is ready, this method returns NULL. The caller should not try to call free() on the line returned because it is really just a pointer back to the contents of the buffer passed into convert().
If this method does not return NULL, you should keep calling it until it does. Once it returns NULL, you can either refill this object by calling convert() with the next block or call get_remainder() to finish.
References cgul_crlf__get_line().
virtual unsigned long get_line_count () const [inline, virtual]
This method returns the total number of lines returned by get_line() and get_remainder(). The line count is one-based. No attempt is made to prevent the return value from overflowing, so the caller is responsible for verifying the return value.
Calls to reset() reset the line count to zero. This can be used by the client to implement a line counter that does not overflow.
References cgul_crlf__get_line_count().
virtual unsigned long get_line_offset () const [inline, virtual]
This method returns the offset of the last line returned by get_line() or get_remainder(). The offset is zero-based. If you are feeding a binary stream into convert() and if the stream is also a random access stream, you can use the return value to directly seek to the line as follows:
fseek(f, offset, SEEK_SET);
Because the prototype for fseek() requires a long for the offset parameter, no attempt is made to prevent the return value from overflowing, so the caller is responsible for verifying the return value.
Note that the offset returned is basically the number of bytes from the start of the file to the current line. This is not necessarily the same as the number of characters which depends on how the file is encoded.
To get the offset of the remainder, just call get_remainder() before calling this method.
This method throws an exception if, after converting a new block, it is called before get_line() is called.
References cgul_crlf__get_line_offset().
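Since the return type is unsigned long while fseek() takes a long, one way the caller might verify the value before seeking is sketched below. seek_to_line is a hypothetical helper name, and the sketch assumes the stream was opened in binary mode so that byte offsets are meaningful.

```cpp
#include <cassert>
#include <climits>
#include <cstdio>

// Hypothetical helper (not part of cgul). Seeks f to a byte offset such as
// the one returned by get_line_offset(). Returns false, without seeking, if
// the offset cannot be represented as the long that fseek() requires.
inline bool seek_to_line(std::FILE* f, unsigned long offset)
{
    assert(f != nullptr);
    if (offset > static_cast<unsigned long>(LONG_MAX)) {
        return false;  // would overflow fseek()'s offset parameter
    }
    return std::fseek(f, static_cast<long>(offset), SEEK_SET) == 0;
}
```

For files that can exceed LONG_MAX bytes, a platform-specific 64-bit seek function would be needed instead; that is outside the scope of this sketch.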
virtual const char * get_remainder () [inline, virtual]
This is the last method you should call, and it should only be called once. It should be called only after all the blocks have been fed to convert() and only after get_line() has been exhausted. At this point, all that is left is the remainder.
This method returns NULL if a remainder does not exist. The only time a remainder exists is if the last line in the file is missing the final EOL marker.
After calling this function, use get_line_offset() to get the offset of the remainder.
The caller should not try to call free() on the pointer returned because it points to an internal string that will be freed when this object is destroyed.
References cgul_crlf__get_remainder().
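Taken together, the descriptions of convert(), get_line() and get_remainder() imply the calling sequence sketched below. To keep the sketch compilable without the library headers, the converter is a template parameter; it is assumed to expose those three methods with the signatures listed on this page. read_all_lines and the 4096-byte default block size are illustrative choices, not part of cgul.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper (not part of cgul). Crlf is assumed to provide
// convert(char*, unsigned long), get_line() and get_remainder() as
// documented for cgul_crlf_cxx. fin must be open in binary mode.
template <class Crlf>
std::vector<std::string> read_all_lines(Crlf& crlf, std::FILE* fin,
                                        std::size_t block_size = 4096)
{
    assert(fin != nullptr && block_size > 0);
    std::vector<std::string> lines;
    std::vector<char> block(block_size);  // writable heap storage
    std::size_t n;
    while ((n = std::fread(block.data(), 1, block.size(), fin)) > 0) {
        crlf.convert(block.data(), static_cast<unsigned long>(n));
        // Drain every ready line before the block is overwritten.
        while (const char* line = crlf.get_line()) {
            lines.push_back(line);  // copy: the pointer aliases block
        }
    }
    // Called exactly once, after the last block has been drained.
    if (const char* rest = crlf.get_remainder()) {
        lines.push_back(rest);
    }
    return lines;
}
```

Each line is copied into a std::string because the pointers returned by get_line() refer back to the block, which the next fread() overwrites.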
virtual void convert_file (FILE *fin, FILE *fout, const char *eol) [inline, virtual]
This method copies fin to fout, stripping the original EOL markers and replacing them with eol. fin and fout must have been opened in binary mode. This method internally uses a cgul_crlf object to perform the conversion. If an error occurs, an exception is thrown.
Note that this method uses C-style files because portably dealing with C++ files is very difficult.
[in] | fin | input file |
[out] | fout | output file |
[in] | eol | new EOL marker |
References cgul_crlf__convert_file().
virtual cgul_crlf_t get_obj () const [inline, virtual]
Get the underlying cgul_crlf object.
virtual cgul_crlf_t take_obj () [inline, virtual]
Take the underlying cgul_crlf object. This means the underlying object will not be deleted when the wrapper goes out of scope. Also, because you have taken the underlying object, no other methods should be called on this wrapper's instance. Lastly, after taking the underlying object, it is the caller's responsibility to delete the underlying object by calling cgul_crlf__delete().
virtual void set_obj (cgul_crlf_t rhs) [inline, virtual]
Set the new underlying object to rhs. This causes the old underlying object to be deleted, which invalidates any outstanding pointers to or iterators for the old underlying object.
This instance takes ownership of rhs, which means rhs will be automatically deleted when the C++ wrapper is deleted. To prevent automatic deletion of rhs, call take_obj() when the C++ wrapper is no longer needed.
[in] | rhs | right-hand side |
References cgul_crlf__delete(), and cgul_crlf_cxx().
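The ownership rules for get_obj(), take_obj() and set_obj() follow a common pattern for C++ wrappers around C handles. The sketch below shows that pattern with stand-in types so that it compiles on its own. raw_handle, raw_delete and owning_wrapper are hypothetical names, and the real wrapper may differ in details such as how it treats a NULL rhs.

```cpp
#include <cassert>

// Stand-ins for a C handle type and its delete function (hypothetical).
struct raw_handle { int id; };
static int g_deleted = 0;  // counts deletions so the behaviour is observable
inline void raw_delete(raw_handle* h)
{
    if (h) { ++g_deleted; delete h; }
}

// Generic illustration of the ownership pattern, not the cgul wrapper.
class owning_wrapper {
public:
    owning_wrapper() : obj_(new raw_handle{0}) {}
    ~owning_wrapper() { raw_delete(obj_); }  // no-op after take_obj()
    owning_wrapper(const owning_wrapper&) = delete;
    owning_wrapper& operator=(const owning_wrapper&) = delete;

    // Borrow: the wrapper still owns the handle.
    raw_handle* get_obj() const { return obj_; }

    // Release: the caller now owns the handle and must delete it.
    raw_handle* take_obj()
    {
        raw_handle* h = obj_;
        obj_ = nullptr;
        return h;
    }

    // Replace: the old handle is deleted and rhs becomes owned.
    void set_obj(raw_handle* rhs)
    {
        raw_delete(obj_);
        obj_ = rhs;
    }

private:
    raw_handle* obj_;
};
```

In this pattern, a pointer obtained from get_obj() before a set_obj() call dangles afterwards, which corresponds to the invalidation warning above.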