Parser for comma-separated values (CSV) files.
#include "cgul_common.h"
#include "cgul_exception.h"
#include "cgul_crlf_file.h"
#include "cgul_rbtree.h"
#include "cgul_string.h"
#include "cgul_vector.h"
Typedefs
typedef struct cgul_csv * cgul_csv_t
This class is a parser for Comma-Separated Values (CSV) files or Tab-Separated Values (TSV) files. This class works automatically with DOS, Mac, and Unix text files. See RFC 4180.
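As an illustration of what handling DOS, Mac, and Unix text files involves, here is a minimal sketch that finds the end of the first line in a buffer regardless of which terminator is used. The helper name `first_line_length` is hypothetical and is not part of cgul_csv; the sketch also ignores the RFC 4180 rule that a quoted field may itself contain line breaks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of cgul_csv.  Returns the number of bytes
 * in the first line of buf, excluding its terminator, and stores the
 * terminator length in *term_len: 0 at end of input, 1 for "\n" or "\r",
 * 2 for "\r\n". */
static size_t first_line_length(const char *buf, size_t buf_size,
                                size_t *term_len)
{
    size_t i;

    assert(buf != NULL && term_len != NULL);
    *term_len = 0;
    for (i = 0; i < buf_size; i++) {
        if (buf[i] == '\n') {           /* Unix */
            *term_len = 1;
            break;
        }
        if (buf[i] == '\r') {           /* DOS "\r\n" or classic Mac "\r" */
            *term_len = (i + 1 < buf_size && buf[i + 1] == '\n') ? 2 : 1;
            break;
        }
    }
    return i;
}
```

A caller would advance by the returned length plus `*term_len` to reach the start of the next line.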
typedef struct cgul_csv * cgul_csv_t
Opaque pointer to a cgul_csv instance.
CGUL_EXPORT cgul_csv_t cgul_csv__new(cgul_exception_t *cex)
This method creates a new cgul_csv instance. After calling this method, an input source should be opened before calling any other method. Use cgul_csv__set_separator() to change the default separator, which is a comma. The client is responsible for calling cgul_csv__delete() on the value returned. If an error occurs, NULL is returned, and an exception is thrown.
[in,out] | cex | c-style exception |
Returns: cgul_csv instance
Referenced by cgul_csv_cxx::cgul_csv_cxx().
CGUL_EXPORT cgul_csv_t cgul_csv__new_from_fname(cgul_exception_t *cex, const char *fname, int is_block_buffered)
Create a new cgul_csv object and call cgul_csv__open_fname() passing it fname. This method opens fname so that input can be read from it. For regular files, is_block_buffered should be set to true; for character-oriented or line-oriented files, is_block_buffered should be set to false. Use cgul_csv__set_separator() to change the default separator, which is a comma. The client is responsible for freeing the object by calling cgul_csv__delete(). If an error occurs, NULL is returned, and an exception is thrown.
[in,out] | cex | c-style exception |
[in] | fname | file name of CSV file |
[in] | is_block_buffered | whether fname is block buffered |
Returns: cgul_csv instance
Referenced by cgul_csv_cxx::cgul_csv_cxx().
CGUL_EXPORT cgul_csv_t cgul_csv__new_from_file(cgul_exception_t *cex, FILE *f, int is_block_buffered)
Create a new cgul_csv object and call cgul_csv__open_file() passing it f. This method uses f when reading input. This class does not take ownership of f. Thus, the client is still responsible for calling fclose() on f. For regular files, is_block_buffered should be set to true; for character-oriented or line-oriented files, is_block_buffered should be set to false. Use cgul_csv__set_separator() to change the default separator, which is a comma. The client is responsible for freeing the object by calling cgul_csv__delete(). If an error occurs, NULL is returned, and an exception is thrown.
[in,out] | cex | c-style exception |
[in] | f | file |
[in] | is_block_buffered | whether f is block buffered |
Returns: cgul_csv instance
Referenced by cgul_csv_cxx::cgul_csv_cxx().
CGUL_EXPORT cgul_csv_t cgul_csv__new_from_memory(cgul_exception_t *cex, const char *buf, size_t buf_size)
Create a new cgul_csv object and call cgul_csv__open_memory() passing it buf and buf_size. This method uses buf when reading input. This class does not take ownership of buf. Thus, the client is still responsible for calling free() on buf if necessary. Use cgul_csv__set_separator() to change the default separator, which is a comma. The client is responsible for freeing the object by calling cgul_csv__delete(). If an error occurs, NULL is returned, and an exception is thrown.
[in,out] | cex | c-style exception |
[in] | buf | memory buffer |
[in] | buf_size | size of buf in bytes |
Returns: cgul_csv instance
Referenced by cgul_csv_cxx::cgul_csv_cxx().
CGUL_EXPORT cgul_csv_t cgul_csv__new_from_crlf_file(cgul_exception_t *cex, cgul_crlf_file_t crlf_file)
Create a new cgul_csv object and call cgul_csv__open_crlf_file() passing it crlf_file. This makes it possible to read from any input type supported by cgul_crlf_file, such as memory. Use cgul_csv__set_separator() to change the default separator, which is a comma. The client is responsible for freeing the object returned by calling cgul_csv__delete(). If an error occurs, NULL is returned, and an exception is thrown.
The crlf_file object is used as the interface for reading lines of text from a data source. It should already be initialized before calling this method. This class does not take ownership of crlf_file, so the client is still responsible for calling cgul_crlf_file__delete() on it.
[in,out] | cex | c-style exception |
[in] | crlf_file | interface used for reading lines of text |
Returns: cgul_csv instance
Referenced by cgul_csv_cxx::cgul_csv_cxx().
CGUL_EXPORT void cgul_csv__delete(cgul_csv_t csv)
This method deletes csv, releasing all internally allocated resources. The client must not use csv after calling this method.
[in] | csv | cgul_csv instance |
Referenced by cgul_csv_cxx::set_obj(), and cgul_csv_cxx::~cgul_csv_cxx().
CGUL_EXPORT void cgul_csv__open_fname(cgul_exception_t *cex, cgul_csv_t csv, const char *fname, int is_block_buffered)
Open the file with name fname so that cgul_csv__load_next() can be called. If an input source is already open, it will be closed before this method attempts to open the new source. The new file will be closed when this instance is deleted. For regular files, is_block_buffered should be set to true; for character-oriented or line-oriented files, is_block_buffered should be set to false. If an error occurs, an exception is thrown.
[in,out] | cex | c-style exception |
[in] | csv | cgul_csv instance |
[in] | fname | file name |
[in] | is_block_buffered | whether fname is block buffered |
Referenced by cgul_csv_cxx::open_fname().
CGUL_EXPORT void cgul_csv__open_file(cgul_exception_t *cex, cgul_csv_t csv, FILE *f, int is_block_buffered)
Open f so that cgul_csv__load_next() can be called. If an input source is already open, it will be closed before this method attempts to open the new source. This class does not take ownership of f. Thus, the client is still responsible for calling fclose() on f. For regular files, is_block_buffered should be set to true; for character-oriented or line-oriented files, is_block_buffered should be set to false. If an error occurs, an exception is thrown.
[in,out] | cex | c-style exception |
[in] | csv | cgul_csv instance |
[in] | f | file |
[in] | is_block_buffered | whether f is block buffered |
Referenced by cgul_csv_cxx::open_file().
CGUL_EXPORT void cgul_csv__open_memory(cgul_exception_t *cex, cgul_csv_t csv, const char *buf, size_t buf_size)
Open buf so that cgul_csv__load_next() can be called. If an input source is already open, it will be closed before this method attempts to open the new source. This class does not take ownership of buf. Thus, the client is still responsible for calling free() on buf if necessary. If an error occurs, an exception is thrown.
[in,out] | cex | c-style exception |
[in] | csv | cgul_csv instance |
[in] | buf | memory buffer |
[in] | buf_size | size of buf in bytes |
Referenced by cgul_csv_cxx::open_memory().
CGUL_EXPORT void cgul_csv__open_crlf_file(cgul_exception_t *cex, cgul_csv_t csv, cgul_crlf_file_t crlf_file)
Open crlf_file so that CSV records can be read from it. If an error occurs, an exception is thrown.
The crlf_file object is used as the interface for reading lines of text from a data source. It should already be initialized before calling this method. This class does not take ownership of crlf_file, so the client is still responsible for calling cgul_crlf_file__delete() on it.
[in,out] | cex | c-style exception |
[in] | csv | cgul_csv instance |
[in] | crlf_file | interface used for reading lines of text |
Referenced by cgul_csv_cxx::open_crlf_file().
CGUL_EXPORT void cgul_csv__close(cgul_exception_t *cex, cgul_csv_t csv)
This method closes csv. After calling this method, an input source should be opened before calling any other method.
[in,out] | cex | c-style exception |
[in] | csv | cgul_csv instance |
Referenced by cgul_csv_cxx::close().
CGUL_EXPORT char cgul_csv__get_separator(cgul_exception_t *cex, cgul_csv_t csv)
Get the character that is currently being used as the separator. By default, the separator is a comma.
[in,out] | cex | c-style exception |
[in] | csv | cgul_csv instance |
Referenced by cgul_csv_cxx::get_separator().
CGUL_EXPORT void cgul_csv__set_separator(cgul_exception_t *cex, cgul_csv_t csv, char separator)
Set the character that will be used as the separator. By default, the separator is a comma.
[in,out] | cex | c-style exception |
[in] | csv | cgul_csv instance |
[in] | separator | separator |
Referenced by cgul_csv_cxx::set_separator().
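To show how the choice of separator affects the way a record breaks into fields, here is a minimal, self-contained sketch of splitting one line under the RFC 4180 quoting rules. The function `split_record` is hypothetical and is not how cgul_csv is implemented; it assumes the line terminator has already been removed, and it modifies the line in place.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of cgul_csv.  Splits line in place on
 * separator.  Enclosing quotes are removed and "" inside a quoted field
 * becomes a single quote.  Stores up to max_fields pointers in fields and
 * returns the total number of fields found. */
static size_t split_record(char *line, char separator,
                           char **fields, size_t max_fields)
{
    size_t count = 0;
    char *src = line;

    assert(line != NULL && fields != NULL);
    for (;;) {
        char *dst = src;      /* write position; never ahead of src */
        char *start = src;    /* where this field's text begins */
        int at_end;

        if (*src == '"') {    /* quoted field */
            src++;            /* skip the opening quote */
            while (*src != '\0') {
                if (*src == '"') {
                    if (src[1] == '"') {   /* "" is a literal quote */
                        *dst++ = '"';
                        src += 2;
                        continue;
                    }
                    src++;    /* skip the closing quote */
                    break;
                }
                *dst++ = *src++;
            }
        }
        /* Unquoted text runs up to the next separator or end of line. */
        while (*src != '\0' && *src != separator) {
            *dst++ = *src++;
        }
        at_end = (*src == '\0');  /* read before dst may overwrite *src */
        *dst = '\0';
        if (count < max_fields) {
            fields[count] = start;
        }
        count++;
        if (at_end) {
            break;
        }
        src++;                /* skip the separator */
    }
    return count;
}
```

Passing ',' treats the line as CSV, while passing '\t' treats the same bytes as TSV, so a comma inside a TSV field is ordinary data.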
CGUL_EXPORT int cgul_csv__load_next(cgul_exception_t *cex, cgul_csv_t csv)
This method loads the next CSV record. The client can use cgul_csv__lookup() or cgul_csv__take_value() to look up the value for a particular field, or cgul_csv__get_vector() to iterate over all the fields.
The client can also use cgul_csv__lookup_by_header() or cgul_csv__take_value_by_header() to look up the value associated with a particular header.
On success, 1 is returned. On EOF, 0 is returned. If an error occurs, 0 is returned, and an exception is thrown.
[in,out] | cex | c-style exception |
[in] | csv | cgul_csv instance |
Referenced by cgul_csv_cxx::load_next().
CGUL_EXPORT unsigned long int cgul_csv__get_number_of_fields(cgul_exception_t *cex, cgul_csv_t csv)
This method returns the number of fields that are present in the current record. This method throws an exception only if an input source has not been opened yet.
[in,out] | cex | c-style exception |
[in] | csv | cgul_csv instance |
Referenced by cgul_csv_cxx::get_number_of_fields().
CGUL_EXPORT const char* cgul_csv__lookup(cgul_exception_t *cex, cgul_csv_t csv, unsigned long int field)
This method returns the value in the field for the current record. The pointer returned is owned by this class and will be invalidated the next time cgul_csv__load_next() is called. If field is out of bounds, NULL is returned, and an exception is thrown.
If the client is going to make a copy of the value returned, it should probably use cgul_csv__take_value() instead, which functions the same as this method except that ownership is transferred to the client, thus avoiding a copy.
If you need more functionality than what is provided by this method, you can use cgul_csv__get_vector() and iterate over each field instead.
[in,out] | cex | c-style exception |
[in] | csv | cgul_csv instance |
[in] | field | field |
Referenced by cgul_csv_cxx::lookup().
CGUL_EXPORT const char* cgul_csv__lookup_by_header(cgul_exception_t *cex, cgul_csv_t csv, const char *header)
This method returns the value in the field associated with header for the current record. The pointer returned is owned by this class and will be invalidated the next time cgul_csv__load_next() is called. If header does not exist, NULL is returned.
The basic idea is that many CSV files use the first row to store the headers for each column. The first time cgul_csv__load_next() is called, it remembers the position of each header, allowing this method to retrieve subsequent values by using the string for the header instead of the integer for the position.
If two columns have the same header, the value in the first column will be returned.
If the client is going to make a copy of the value returned, it should probably use cgul_csv__take_value_by_header() instead, which functions the same as this method except that ownership is transferred to the client, thus avoiding a copy.
If headers are manually added to the CSV file, remember that space is significant. So for this method to work, the headers in the CSV file should usually be separated only by commas (or tabs), with no surrounding spaces.
This method throws an exception only if an input source has not been opened yet.
[in,out] | cex | c-style exception |
[in] | csv | cgul_csv instance |
[in] | header | header |
header
Referenced by cgul_csv_cxx::lookup_by_header().
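The two header rules described for cgul_csv__lookup_by_header(), that the first of two identical headers wins and that spaces are significant, can be illustrated with a small sketch. The function `find_header` is hypothetical; it uses a linear scan with an exact string comparison for simplicity, whereas the documentation for cgul_csv__get_headers() says the library keeps its header-to-index mapping in a cgul_rbtree.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of cgul_csv.  Returns the zero-based index
 * of the first column whose header equals name exactly, or -1 if no column
 * has that header. */
static long find_header(const char *const *headers, size_t n,
                        const char *name)
{
    size_t i;

    assert(headers != NULL && name != NULL);
    for (i = 0; i < n; i++) {
        /* Exact comparison: " name" and "name" are different headers. */
        if (strcmp(headers[i], name) == 0) {
            return (long)i;   /* the first match wins for duplicates */
        }
    }
    return -1;
}
```

With a header row written as `id, name,name`, the second column's header is " name" with a leading space, so a search for "name" finds the third column instead.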
CGUL_EXPORT char* cgul_csv__take_value(cgul_exception_t *cex, cgul_csv_t csv, unsigned long int field)
This method returns the value in the field for the current record. Ownership of the pointer returned is transferred to the client. Thus, the client must arrange for free() to be called on the pointer. If field is out of bounds, NULL is returned, and an exception is thrown.
Note that once the value for a particular field has been taken, subsequent lookups on that field return NULL for the value. To do otherwise would require making a copy of the value, which would defeat the purpose of this method.
If the client only needs to inspect the value, it should probably use cgul_csv__lookup() instead, allowing the cgul_csv instance to retain ownership and thereby guaranteeing that the pointer will be freed.
If you need more functionality than what is provided by this method, you can use cgul_csv__get_vector() and iterate over each field instead.
[in,out] | cex | c-style exception |
[in] | csv | cgul_csv instance |
[in] | field | field |
Referenced by cgul_csv_cxx::take_value().
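The ownership rule described for cgul_csv__take_value() can be sketched in isolation: taking a value hands the pointer to the caller and leaves NULL in its place, so the container's own cleanup never frees it a second time. The names `take_field` and `free_fields` and the plain `char *` array are assumptions made for illustration only; the library stores its fields in a cgul_vector and reports an out-of-bounds field through the exception argument, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper, not part of cgul_csv.  Transfers ownership of
 * fields[i] to the caller, who must free() it.  Returns NULL if i is out
 * of bounds or the value was already taken. */
static char *take_field(char **fields, size_t n, size_t i)
{
    char *value;

    assert(fields != NULL);
    if (i >= n) {
        return NULL;
    }
    value = fields[i];
    fields[i] = NULL;     /* the container no longer owns this string */
    return value;
}

/* Hypothetical cleanup run by the container.  Taken slots hold NULL, and
 * free(NULL) is a no-op, so taken values are not freed twice. */
static void free_fields(char **fields, size_t n)
{
    size_t i;

    assert(fields != NULL);
    for (i = 0; i < n; i++) {
        free(fields[i]);
        fields[i] = NULL;
    }
}
```

A second `take_field` call on the same index returns NULL, which mirrors the documented behavior that later lookups on a taken field return NULL.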
CGUL_EXPORT char* cgul_csv__take_value_by_header(cgul_exception_t *cex, cgul_csv_t csv, const char *header)
This method returns the value in the field associated with header for the current record. Ownership of the pointer returned is transferred to the client. Thus, the client must arrange for free() to be called on the pointer. If header does not exist, NULL is returned.
Note that once the value for a particular header has been taken, subsequent lookups on that field return NULL for the value. To do otherwise would require making a copy of the value, which would defeat the purpose of this method.
If the client only needs to inspect the value, it should probably use cgul_csv__lookup_by_header() instead, allowing the cgul_csv instance to retain ownership and thereby guaranteeing that the pointer will be freed.
If headers are manually added to the CSV file, remember that space is significant. So for this method to work, the headers in the CSV file should usually be separated only by commas (or tabs), with no surrounding spaces.
This method throws an exception only if an input source has not been opened yet.
[in,out] | cex | c-style exception |
[in] | csv | cgul_csv instance |
[in] | header | header |
header
Referenced by cgul_csv_cxx::take_value_by_header().
CGUL_EXPORT cgul_vector_t cgul_csv__get_vector(cgul_exception_t *cex, cgul_csv_t csv)
This method returns the internal cgul_vector that is used to hold all the fields of the current record. The client only needs to get the vector once. It will not be invalidated until cgul_csv__delete() is called.
The fields in the vector are char* types.
The client is free to take ownership of the string for any field, provided the client notifies this class that ownership has been transferred by setting the corresponding field in the vector to NULL. The client must call free() on any pointer taken.
This method throws an exception only if an input source has not been opened yet.
[in,out] | cex | c-style exception |
[in] | csv | cgul_csv instance |
CGUL_EXPORT cgul_rbtree_t cgul_csv__get_headers(cgul_exception_t *cex, cgul_csv_t csv)
Get the headers for the CSV file. This method returns a pointer to the internal cgul_rbtree that maps each header to the zero-based index of its column. This method can be called at any time, but the tree will not be initialized until cgul_csv__load_next() has been called the first time.
To iterate over the headers in the order they appear in the CSV file, use cgul_rbtree__get_oldest() and cgul_rbtree_node__get_younger(). The key for each cgul_rbtree_node is the header for each column.
This method throws an exception only if an input source has not been opened yet.
[in,out] | cex | c-style exception |
[in] | csv | cgul_csv instance |
CGUL_EXPORT void cgul_csv__quote(cgul_exception_t *cex, const char *bare_field, cgul_string_t quoted_field)
This static method is used to quote bare_field when writing a CSV file. It appends the result to the quoted_field string. This method only quotes bare_field. It does not append the commas (or tabs) that need to appear between fields. If an error occurs, an exception is thrown.
To write a CSV file, simply use fopen() to open a text file for writing. Use this method to quote each field before writing it to the CSV file, and do not forget to manually insert commas (or tabs) between fields as needed.
The tests/main_csv_sort.c program demonstrates the process. It reorders a CSV file by sorting the columns alphabetically based on the headers for each column.
[in,out] | cex | c-style exception |
[in] | bare_field | bare field |
[out] | quoted_field | quoted field |
Referenced by cgul_csv_cxx::quote().
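For readers who want to see what quoting a field involves, here is a minimal sketch of the RFC 4180 rule: a field that contains the separator, a double quote, or a line break is wrapped in double quotes, and each embedded double quote is doubled. The function `quote_field` is hypothetical and writes into a plain char buffer rather than a cgul_string; whether cgul_csv__quote() quotes every field or only those that need it is not stated here, so quoting only when necessary is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of cgul_csv.  Writes the quoted form of
 * bare into out and returns its length, excluding the terminating NUL.
 * out must have room for 2 * strlen(bare) + 3 bytes. */
static size_t quote_field(const char *bare, char separator, char *out)
{
    char special[] = "\"\r\n?";   /* '?' is a placeholder for separator */
    int needs_quotes;
    size_t len = 0;

    assert(bare != NULL && out != NULL);
    special[3] = separator;
    needs_quotes = (strpbrk(bare, special) != NULL);

    if (needs_quotes) {
        out[len++] = '"';
    }
    for (; *bare != '\0'; bare++) {
        if (*bare == '"') {
            out[len++] = '"';     /* double each embedded quote */
        }
        out[len++] = *bare;
    }
    if (needs_quotes) {
        out[len++] = '"';
    }
    out[len] = '\0';
    return len;
}
```

As with cgul_csv__quote(), the caller remains responsible for writing the separator between fields and the line terminator after each record.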