cgul_csv.h File Reference

Parser for comma-separated values (CSV) files.

#include "cgul_common.h"
#include "cgul_exception.h"
#include "cgul_crlf_file.h"
#include "cgul_rbtree.h"
#include "cgul_string.h"
#include "cgul_vector.h"

Typedefs

typedef struct cgul_csv * cgul_csv_t
 

Functions

CGUL_EXPORT cgul_csv_t cgul_csv__new (cgul_exception_t *cex)
 
CGUL_EXPORT cgul_csv_t cgul_csv__new_from_fname (cgul_exception_t *cex, const char *fname, int is_block_buffered)
 
CGUL_EXPORT cgul_csv_t cgul_csv__new_from_file (cgul_exception_t *cex, FILE *f, int is_block_buffered)
 
CGUL_EXPORT cgul_csv_t cgul_csv__new_from_memory (cgul_exception_t *cex, const char *buf, size_t buf_size)
 
CGUL_EXPORT cgul_csv_t cgul_csv__new_from_crlf_file (cgul_exception_t *cex, cgul_crlf_file_t crlf_file)
 
CGUL_EXPORT void cgul_csv__delete (cgul_csv_t csv)
 
CGUL_EXPORT void cgul_csv__open_fname (cgul_exception_t *cex, cgul_csv_t csv, const char *fname, int is_block_buffered)
 
CGUL_EXPORT void cgul_csv__open_file (cgul_exception_t *cex, cgul_csv_t csv, FILE *f, int is_block_buffered)
 
CGUL_EXPORT void cgul_csv__open_memory (cgul_exception_t *cex, cgul_csv_t csv, const char *buf, size_t buf_size)
 
CGUL_EXPORT void cgul_csv__open_crlf_file (cgul_exception_t *cex, cgul_csv_t csv, cgul_crlf_file_t crlf_file)
 
CGUL_EXPORT void cgul_csv__close (cgul_exception_t *cex, cgul_csv_t csv)
 
CGUL_EXPORT char cgul_csv__get_separator (cgul_exception_t *cex, cgul_csv_t csv)
 
CGUL_EXPORT void cgul_csv__set_separator (cgul_exception_t *cex, cgul_csv_t csv, char separator)
 
CGUL_EXPORT int cgul_csv__load_next (cgul_exception_t *cex, cgul_csv_t csv)
 
CGUL_EXPORT unsigned long int cgul_csv__get_number_of_fields (cgul_exception_t *cex, cgul_csv_t csv)
 
CGUL_EXPORT const char * cgul_csv__lookup (cgul_exception_t *cex, cgul_csv_t csv, unsigned long int field)
 
CGUL_EXPORT const char * cgul_csv__lookup_by_header (cgul_exception_t *cex, cgul_csv_t csv, const char *header)
 
CGUL_EXPORT char * cgul_csv__take_value (cgul_exception_t *cex, cgul_csv_t csv, unsigned long int field)
 
CGUL_EXPORT char * cgul_csv__take_value_by_header (cgul_exception_t *cex, cgul_csv_t csv, const char *header)
 
CGUL_EXPORT cgul_vector_t cgul_csv__get_vector (cgul_exception_t *cex, cgul_csv_t csv)
 
CGUL_EXPORT cgul_rbtree_t cgul_csv__get_headers (cgul_exception_t *cex, cgul_csv_t csv)
 
CGUL_EXPORT void cgul_csv__quote (cgul_exception_t *cex, const char *bare_field, cgul_string_t quoted_field)
 

Detailed Description

This class is a parser for Comma-Separated Values (CSV) files and Tab-Separated Values (TSV) files. It automatically handles DOS, Mac, and Unix text files, that is, files whose lines end in CRLF, CR, or LF. See RFC 4180.
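To illustrate what handling all three line-ending conventions involves, the following standalone sketch scans a memory buffer and treats "\r\n", "\r", and "\n" each as a single line terminator. It is not cgul's implementation, which reads lines through cgul_crlf_file. The helpers next_line() and count_lines() are hypothetical and were written only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of cgul: find the next line in buf[pos..size).
 * The buffer does not need to be NUL-terminated.  On success, stores the start
 * offset and length of the line (without its terminator) and returns the
 * offset just past the terminator.  A line may end in "\r\n" (DOS), "\r"
 * (Mac), or "\n" (Unix).  Returns 0 when no input remains at or after pos. */
size_t next_line(const char *buf, size_t size, size_t pos,
                 size_t *start, size_t *len)
{
    size_t i = pos;
    if (pos >= size) {
        return 0;
    }
    while (i < size && buf[i] != '\r' && buf[i] != '\n') {
        ++i;
    }
    *start = pos;
    *len = i - pos;
    if (i < size && buf[i] == '\r') {
        ++i;
        if (i < size && buf[i] == '\n') {
            ++i; /* "\r\n" counts as one terminator */
        }
    } else if (i < size) {
        ++i; /* plain "\n" */
    }
    return i;
}

/* Hypothetical helper: count the lines in buf by repeatedly calling next_line(). */
size_t count_lines(const char *buf, size_t size)
{
    size_t pos = 0, start, len, n = 0, next;
    while ((next = next_line(buf, size, pos, &start, &len)) != 0) {
        ++n;
        pos = next;
    }
    return n;
}
```

In this sketch, a trailing terminator does not produce an extra empty line. A "\r" immediately followed by "\n" is always treated as one DOS line ending.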

Author
Paul Serice

Typedef Documentation

§ cgul_csv_t

typedef struct cgul_csv* cgul_csv_t

Opaque pointer to a cgul_csv instance.

Function Documentation

§ cgul_csv__new()

CGUL_EXPORT cgul_csv_t cgul_csv__new ( cgul_exception_t * cex)

This method creates a new cgul_csv instance. After calling this method, an input source should be opened before calling any other method. Use cgul_csv__set_separator() to change the default separator, which is a comma. The client is responsible for calling cgul_csv__delete() on the value returned. If an error occurs, NULL is returned, and an exception is thrown.

Parameters
[in,out] cex: C-style exception
Returns
new cgul_csv instance

Referenced by cgul_csv_cxx::cgul_csv_cxx().

§ cgul_csv__new_from_fname()

CGUL_EXPORT cgul_csv_t cgul_csv__new_from_fname ( cgul_exception_t * cex,
const char *  fname,
int  is_block_buffered
)

Create a new cgul_csv object and call cgul_csv__open_fname() passing it fname. This method opens fname so that input can be read from it. For regular files, is_block_buffered should be set to true; for character-oriented or line-oriented files, is_block_buffered should be set to false. Use cgul_csv__set_separator() to change the default separator, which is a comma. The client is responsible for freeing the object by calling cgul_csv__delete(). If an error occurs, NULL is returned, and an exception is thrown.

Parameters
[in,out] cex: C-style exception
[in] fname: name of the CSV file
[in] is_block_buffered: whether fname is block buffered
Returns
new cgul_csv instance

Referenced by cgul_csv_cxx::cgul_csv_cxx().

§ cgul_csv__new_from_file()

CGUL_EXPORT cgul_csv_t cgul_csv__new_from_file ( cgul_exception_t * cex,
FILE *  f,
int  is_block_buffered
)

Create a new cgul_csv object and call cgul_csv__open_file() passing it f. This method uses f when reading input. This class does not take ownership of f. Thus, the client is still responsible for calling fclose() on f. For regular files, is_block_buffered should be set to true; for character-oriented or line-oriented files, is_block_buffered should be set to false. Use cgul_csv__set_separator() to change the default separator, which is a comma. The client is responsible for freeing the object by calling cgul_csv__delete(). If an error occurs, NULL is returned, and an exception is thrown.

Parameters
[in,out] cex: C-style exception
[in] f: file
[in] is_block_buffered: whether f is block buffered
Returns
new cgul_csv instance

Referenced by cgul_csv_cxx::cgul_csv_cxx().

§ cgul_csv__new_from_memory()

CGUL_EXPORT cgul_csv_t cgul_csv__new_from_memory ( cgul_exception_t * cex,
const char *  buf,
size_t  buf_size
)

Create a new cgul_csv object and call cgul_csv__open_memory() passing it buf and buf_size. This method uses buf when reading input. This class does not take ownership of buf. Thus, the client is still responsible for calling free() on buf if necessary. Use cgul_csv__set_separator() to change the default separator, which is a comma. The client is responsible for freeing the object by calling cgul_csv__delete(). If an error occurs, NULL is returned, and an exception is thrown.

Parameters
[in,out] cex: C-style exception
[in] buf: memory buffer
[in] buf_size: size of buf in bytes
Returns
new cgul_csv instance

Referenced by cgul_csv_cxx::cgul_csv_cxx().

§ cgul_csv__new_from_crlf_file()

CGUL_EXPORT cgul_csv_t cgul_csv__new_from_crlf_file ( cgul_exception_t * cex,
cgul_crlf_file_t  crlf_file
)

Create a new cgul_csv object and call cgul_csv__open_crlf_file() passing it crlf_file. This makes it possible to read from any file type supported by cgul_crlf_file, such as memory. Use cgul_csv__set_separator() to change the default separator, which is a comma. The client is responsible for freeing the object returned by calling cgul_csv__delete(). If an error occurs, NULL is returned, and an exception is thrown.

The crlf_file object is used as the interface for reading lines of text from a data source. It should already be initialized before calling this method. This class does not take ownership of crlf_file, so the client is still responsible for calling cgul_crlf_file__delete() on it.

Parameters
[in,out] cex: C-style exception
[in] crlf_file: interface used for reading lines of text
Returns
new cgul_csv instance

Referenced by cgul_csv_cxx::cgul_csv_cxx().

§ cgul_csv__delete()

CGUL_EXPORT void cgul_csv__delete ( cgul_csv_t  csv)

This method deletes csv, releasing all internally allocated resources. The client must not use csv after calling this method.

Parameters
[in] csv: cgul_csv instance

Referenced by cgul_csv_cxx::set_obj(), and cgul_csv_cxx::~cgul_csv_cxx().

§ cgul_csv__open_fname()

CGUL_EXPORT void cgul_csv__open_fname ( cgul_exception_t * cex,
cgul_csv_t  csv,
const char *  fname,
int  is_block_buffered
)

Open the file with name fname so that cgul_csv__load_next() can be called. If an input source is already open, it will be closed before this method attempts to open the new source. The new file will be closed when this instance is deleted. For regular files, is_block_buffered should be set to true; for character-oriented or line-oriented files, is_block_buffered should be set to false. If an error occurs, an exception is thrown.

Parameters
[in,out] cex: C-style exception
[in] csv: cgul_csv instance
[in] fname: file name
[in] is_block_buffered: whether fname is block buffered

Referenced by cgul_csv_cxx::open_fname().

§ cgul_csv__open_file()

CGUL_EXPORT void cgul_csv__open_file ( cgul_exception_t * cex,
cgul_csv_t  csv,
FILE *  f,
int  is_block_buffered
)

Open f so that cgul_csv__load_next() can be called. If an input source is already open, it will be closed before this method attempts to open the new source. This class does not take ownership of f. Thus, the client is still responsible for calling fclose() on f. For regular files, is_block_buffered should be set to true; for character-oriented or line-oriented files, is_block_buffered should be set to false. If an error occurs, an exception is thrown.

Parameters
[in,out] cex: C-style exception
[in] csv: cgul_csv instance
[in] f: file
[in] is_block_buffered: whether f is block buffered

Referenced by cgul_csv_cxx::open_file().

§ cgul_csv__open_memory()

CGUL_EXPORT void cgul_csv__open_memory ( cgul_exception_t * cex,
cgul_csv_t  csv,
const char *  buf,
size_t  buf_size
)

Open buf so that cgul_csv__load_next() can be called. If an input source is already open, it will be closed before this method attempts to open the new source. This class does not take ownership of buf. Thus, the client is still responsible for calling free() on buf if necessary. If an error occurs, an exception is thrown.

Parameters
[in,out] cex: C-style exception
[in] csv: cgul_csv instance
[in] buf: memory buffer
[in] buf_size: size of buf in bytes

Referenced by cgul_csv_cxx::open_memory().

§ cgul_csv__open_crlf_file()

CGUL_EXPORT void cgul_csv__open_crlf_file ( cgul_exception_t * cex,
cgul_csv_t  csv,
cgul_crlf_file_t  crlf_file
)

Open crlf_file so that CSV records can be read from it. If an error occurs, an exception is thrown.

The crlf_file object is used as the interface for reading lines of text from a data source. It should already be initialized before calling this method. This class does not take ownership of crlf_file, so the client is still responsible for calling cgul_crlf_file__delete() on it.

Parameters
[in,out] cex: C-style exception
[in] csv: cgul_csv instance
[in] crlf_file: interface used for reading lines of text

Referenced by cgul_csv_cxx::open_crlf_file().

§ cgul_csv__close()

CGUL_EXPORT void cgul_csv__close ( cgul_exception_t * cex,
cgul_csv_t  csv
)

This method closes the input source currently open on csv. After calling this method, an input source should be opened before calling any other method.

Parameters
[in] cex: C-style exception
[in] csv: cgul_csv instance

Referenced by cgul_csv_cxx::close().

§ cgul_csv__get_separator()

CGUL_EXPORT char cgul_csv__get_separator ( cgul_exception_t * cex,
cgul_csv_t  csv
)

Get the character that is currently being used as the separator. By default, the separator is a comma.

Parameters
[in] cex: C-style exception
[in] csv: cgul_csv instance
Returns
separator

Referenced by cgul_csv_cxx::get_separator().

§ cgul_csv__set_separator()

CGUL_EXPORT void cgul_csv__set_separator ( cgul_exception_t * cex,
cgul_csv_t  csv,
char  separator
)

Set the character that will be used as the separator. By default, the separator is a comma.

Parameters
[in] cex: C-style exception
[in] csv: cgul_csv instance
[in] separator: separator character

Referenced by cgul_csv_cxx::set_separator().

§ cgul_csv__load_next()

CGUL_EXPORT int cgul_csv__load_next ( cgul_exception_t * cex,
cgul_csv_t  csv
)

This method loads the next CSV record. The client can use cgul_csv__lookup() or cgul_csv__take_value() to look up the value for a particular field, or cgul_csv__get_vector() to iterate over all the fields.

The client can also use cgul_csv__lookup_by_header() or cgul_csv__take_value_by_header() to look up the value associated with a particular header.

On success, 1 is returned. On EOF, 0 is returned. If an error occurs, 0 is returned, and an exception is thrown.

Parameters
[in,out] cex: C-style exception
[in] csv: cgul_csv instance
Returns
1 if a record was loaded; 0 on EOF or error

Referenced by cgul_csv_cxx::load_next().
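cgul_csv__load_next() returns 0 both at EOF and on error. A read loop therefore cannot tell the two cases apart from the return value alone and has to inspect the exception after the loop ends. The sketch below models only that control flow. fake_load_next() is a hypothetical stand-in for cgul_csv__load_next(), and a plain int flag stands in for a thrown cgul exception, because the exact way to test a cgul_exception_t is not described on this page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record source, used only to illustrate the return-value
 * convention of cgul_csv__load_next(): 1 = record loaded, 0 = EOF or error. */
struct fake_source {
    size_t next;     /* index of the next record to hand out */
    size_t count;    /* number of records available */
    size_t fail_at;  /* simulate an error at this index ((size_t)-1 = never) */
};

/* Returns 1 if a record was "loaded", or 0 on EOF or error.  On error, sets
 * *error to 1, standing in for a thrown cgul exception. */
int fake_load_next(int *error, struct fake_source *src)
{
    if (src->next == src->fail_at) {
        *error = 1;
        return 0;
    }
    if (src->next >= src->count) {
        return 0; /* EOF: no error is reported */
    }
    ++src->next;
    return 1;
}

/* Reads every record.  Returns how many records were read; *error tells the
 * caller whether the loop stopped because of EOF (0) or a failure (1). */
size_t read_all(struct fake_source *src, int *error)
{
    size_t n = 0;
    *error = 0;
    while (fake_load_next(error, src)) {
        ++n; /* a real client would inspect the current record here */
    }
    /* The loop ended on 0; only the error flag distinguishes EOF from failure. */
    return n;
}
```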

§ cgul_csv__get_number_of_fields()

CGUL_EXPORT unsigned long int cgul_csv__get_number_of_fields ( cgul_exception_t * cex,
cgul_csv_t  csv
)

This method returns the number of fields that are present in the current record. This method throws an exception only if an input source has not been opened yet.

Parameters
[in,out] cex: C-style exception
[in] csv: cgul_csv instance
Returns
number of fields in the current record

Referenced by cgul_csv_cxx::get_number_of_fields().
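To make "the fields in the current record" concrete, here is a standalone sketch that splits one record into fields following the RFC 4180 rules. Fields are delimited by the separator, a field may be enclosed in double quotes, and a doubled quote inside a quoted field stands for one literal quote. split_record() is hypothetical and is not cgul's parser. It handles one line of input only and does not attempt quoted fields that contain line breaks. The separator is a parameter, in the spirit of cgul_csv__set_separator(), so passing '\t' splits TSV.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not cgul's parser: split the record in `line`
 * (modified in place) into at most max_fields fields separated by `sep`.
 * A quoted field may contain the separator, and "" inside quotes becomes ".
 * Text after a closing quote is kept as-is (lenient).  An empty line yields
 * one empty field.  Returns the number of fields stored in fields[]. */
size_t split_record(char *line, char sep, char **fields, size_t max_fields)
{
    size_t n = 0;
    char *rd = line; /* read cursor */
    char *wr = line; /* write cursor; never ahead of rd, since unquoting shrinks text */

    for (;;) {
        char *start = wr;
        if (*rd == '"') {            /* quoted field */
            ++rd;
            while (*rd != '\0') {
                if (*rd == '"') {
                    if (rd[1] == '"') { /* escaped quote */
                        *wr++ = '"';
                        rd += 2;
                        continue;
                    }
                    ++rd;               /* closing quote */
                    break;
                }
                *wr++ = *rd++;
            }
        }
        while (*rd != '\0' && *rd != sep) { /* unquoted text up to the separator */
            *wr++ = *rd++;
        }
        if (n < max_fields) {
            fields[n++] = start;
        }
        if (*rd == sep) {
            ++rd;
            *wr++ = '\0'; /* terminate this field and continue */
        } else {
            *wr = '\0';   /* end of record */
            break;
        }
    }
    return n;
}
```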

§ cgul_csv__lookup()

CGUL_EXPORT const char* cgul_csv__lookup ( cgul_exception_t * cex,
cgul_csv_t  csv,
unsigned long int  field
)

This method returns the value of field in the current record. The pointer returned is owned by this class and will be invalidated the next time cgul_csv__load_next() is called. If field is out of bounds, NULL is returned, and an exception is thrown.

If the client is going to make a copy of the value returned, it should probably use cgul_csv__take_value() instead, which behaves the same as this method except that ownership is transferred to the client, thus avoiding a copy.

If you need more functionality than what is provided by this method, you can use cgul_csv__get_vector() and iterate over each field instead.

Parameters
[in,out] cex: C-style exception
[in] csv: cgul_csv instance
[in] field: index of the field
Returns
value present in the field

Referenced by cgul_csv_cxx::lookup().

§ cgul_csv__lookup_by_header()

CGUL_EXPORT const char* cgul_csv__lookup_by_header ( cgul_exception_t * cex,
cgul_csv_t  csv,
const char *  header
)

This method returns the value in the field associated with header for the current record. The pointer returned is owned by this class and will be invalidated the next time cgul_csv__load_next() is called. If header does not exist, NULL is returned.

The basic idea is that many CSV files use the first row to store the headers for each column. The first time cgul_csv__load_next() is called, it remembers the position of each header, allowing this method to retrieve subsequent values by using the string for the header instead of the integer for the position.

If two columns have the same header, the value in the first column will be returned.

If the client is going to make a copy of the value returned, it should probably use cgul_csv__take_value_by_header() instead, which behaves the same as this method except that ownership is transferred to the client, thus avoiding a copy.

If headers are added to the CSV file by hand, remember that spaces are significant. For this method to work, the headers should usually be separated only by commas (or tabs), with no extra spaces.

This method throws an exception only if an input source has not been opened yet.

Parameters
[in,out] cex: C-style exception
[in] csv: cgul_csv instance
[in] header: header
Returns
value present in the field associated with header

Referenced by cgul_csv_cxx::lookup_by_header().
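The header-lookup rules above (exact string match, spaces are significant, and the first matching column wins) can be sketched with a linear search over the header row. header_index() and value_by_header() are hypothetical helpers. cgul itself keeps the headers in a cgul_rbtree rather than scanning an array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the zero-based column of `header` in the header
 * row, or -1 if it is absent.  Matching uses strcmp(), so " city" and "city"
 * are different headers.  When a header repeats, the scan stops at the first
 * match, so the first column wins. */
long header_index(const char *const *headers, size_t n, const char *header)
{
    size_t i;
    for (i = 0; i < n; ++i) {
        if (strcmp(headers[i], header) == 0) {
            return (long)i;
        }
    }
    return -1;
}

/* Hypothetical helper mirroring lookup by header: returns NULL when the
 * header is absent, otherwise the value in that header's column. */
const char *value_by_header(const char *const *headers,
                            const char *const *record, size_t n,
                            const char *header)
{
    long col = header_index(headers, n, header);
    return col < 0 ? NULL : record[col];
}
```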

§ cgul_csv__take_value()

CGUL_EXPORT char* cgul_csv__take_value ( cgul_exception_t * cex,
cgul_csv_t  csv,
unsigned long int  field
)

This method returns the value of field in the current record. Ownership of the pointer returned is transferred to the client. Thus, the client must arrange for free() to be called on the pointer. If field is out of bounds, NULL is returned, and an exception is thrown.

Note that once the value for a particular field has been taken, subsequent lookups on that field return NULL. Doing otherwise would require making a copy of the value, which would defeat the purpose of this method.

If the client only needs to inspect the value, it should probably use cgul_csv__lookup() instead, allowing the cgul_csv instance to retain ownership and thereby guaranteeing that the pointer will be freed.

If you need more functionality than what is provided by this method, you can use cgul_csv__get_vector() and iterate over each field instead.

Parameters
[in,out] cex: C-style exception
[in] csv: cgul_csv instance
[in] field: index of the field
Returns
value present in the field

Referenced by cgul_csv_cxx::take_value().
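The ownership rule for taken values can be sketched with a hypothetical record type. The same rule is described for the vector returned by cgul_csv__get_vector(). Taking a field hands its heap pointer to the caller and leaves NULL in the slot. Later lookups therefore return NULL, and clearing the record does not free the taken string a second time. struct record, record_lookup(), record_take(), record_clear(), and dup_string() are illustrative names only, not cgul functions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type holding heap-allocated field strings. */
struct record {
    char *fields[4];
    size_t n;
};

/* Portable strdup() replacement (strdup is not in ISO C before C23). */
char *dup_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *p = malloc(len);
    if (p != NULL) {
        memcpy(p, s, len);
    }
    return p;
}

/* Borrow a field: the record keeps ownership.  NULL if out of bounds or taken. */
const char *record_lookup(const struct record *r, size_t field)
{
    return field < r->n ? r->fields[field] : NULL;
}

/* Take a field: ownership moves to the caller, who must free() it.  The slot
 * is set to NULL so that the record will not free it again. */
char *record_take(struct record *r, size_t field)
{
    char *value;
    if (field >= r->n) {
        return NULL;
    }
    value = r->fields[field];
    r->fields[field] = NULL;
    return value;
}

/* Free whatever the record still owns; free(NULL) is a no-op for taken slots. */
void record_clear(struct record *r)
{
    size_t i;
    for (i = 0; i < r->n; ++i) {
        free(r->fields[i]);
        r->fields[i] = NULL;
    }
    r->n = 0;
}
```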

§ cgul_csv__take_value_by_header()

CGUL_EXPORT char* cgul_csv__take_value_by_header ( cgul_exception_t * cex,
cgul_csv_t  csv,
const char *  header
)

This method returns the value in the field associated with header for the current record. Ownership of the pointer returned is transferred to the client. Thus, the client must arrange for free() to be called on the pointer. If header does not exist, NULL is returned.

Note that once the value for a particular header has been taken, subsequent lookups on that field return NULL. Doing otherwise would require making a copy of the value, which would defeat the purpose of this method.

If the client only needs to inspect the value, it should probably use cgul_csv__lookup_by_header() instead, allowing the cgul_csv instance to retain ownership and thereby guaranteeing that the pointer will be freed.

If headers are added to the CSV file by hand, remember that spaces are significant. For this method to work, the headers should usually be separated only by commas (or tabs), with no extra spaces.

This method throws an exception only if an input source has not been opened yet.

Parameters
[in,out] cex: C-style exception
[in] csv: cgul_csv instance
[in] header: header
Returns
value present in the field associated with header

Referenced by cgul_csv_cxx::take_value_by_header().

§ cgul_csv__get_vector()

CGUL_EXPORT cgul_vector_t cgul_csv__get_vector ( cgul_exception_t * cex,
cgul_csv_t  csv
)

This method returns the internal cgul_vector that is used to hold all the fields of the current record. The client only needs to get the vector once. It will not be invalidated until cgul_csv__delete() is called.

Each element of the vector is a char* holding the value of one field.

The client is free to take ownership of the string for any field, provided the client notifies this class that ownership has been transferred by setting the corresponding element of the vector to NULL. The client must call free() on any pointer taken.

This method throws an exception only if an input source has not been opened yet.

Parameters
[in,out] cex: C-style exception
[in] csv: cgul_csv instance
Returns
vector of fields

§ cgul_csv__get_headers()

CGUL_EXPORT cgul_rbtree_t cgul_csv__get_headers ( cgul_exception_t * cex,
cgul_csv_t  csv
)

Get the headers for the CSV file. This method returns a pointer to the internal cgul_rbtree that maps each header to the zero-based index of its column. This method can be called at any time, but the tree will not be initialized until cgul_csv__load_next() has been called for the first time.

To iterate over the headers in the order they appear in the CSV file, use cgul_rbtree__get_oldest() and cgul_rbtree_node__get_younger(). The key for each cgul_rbtree_node is the header for each column.

This method throws an exception only if an input source has not been opened yet.

Parameters
[in,out] cex: C-style exception
[in] csv: cgul_csv instance
Returns
tree that maps each header to the zero-based index for its column
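Given a mapping from headers to column indices like the one this tree provides, visiting the headers in sorted order yields a column permutation. This is the kind of reordering that the tests/main_csv_sort.c program performs. The sketch below computes that permutation with qsort() over a plain array instead of walking a cgul_rbtree. sorted_columns() and struct header_col are hypothetical, and ties between equal headers are broken by column index.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pairing of a header with its zero-based column index. */
struct header_col {
    const char *name;
    size_t col;
};

/* Order by header name, then by column so that duplicate headers keep file order. */
static int cmp_header_col(const void *a, const void *b)
{
    const struct header_col *x = a;
    const struct header_col *y = b;
    int c = strcmp(x->name, y->name);
    if (c != 0) {
        return c;
    }
    return (x->col > y->col) - (x->col < y->col);
}

/* Fill order[0..n) with column indices sorted alphabetically by header.
 * Returns 0 on success, or -1 if memory could not be allocated. */
int sorted_columns(const char *const *headers, size_t n, size_t *order)
{
    struct header_col *tmp;
    size_t i;
    if (n == 0) {
        return 0;
    }
    tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL) {
        return -1;
    }
    for (i = 0; i < n; ++i) {
        tmp[i].name = headers[i];
        tmp[i].col = i;
    }
    qsort(tmp, n, sizeof *tmp, cmp_header_col);
    for (i = 0; i < n; ++i) {
        order[i] = tmp[i].col;
    }
    free(tmp);
    return 0;
}
```

Writing each record's fields in the order given by order[] then produces a file whose columns are sorted by header.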

§ cgul_csv__quote()

CGUL_EXPORT void cgul_csv__quote ( cgul_exception_t * cex,
const char *  bare_field,
cgul_string_t  quoted_field
)

This static method is used to quote bare_field when writing a CSV file. It appends the result to the quoted_field string. This method only quotes bare_field; it does not append the commas (or tabs) that need to appear between fields. If an error occurs, an exception is thrown.

To write a CSV file, simply use fopen() to open a text file for writing. Use this method to quote each field before writing it to the CSV file, and do not forget to manually insert commas (or tabs) between fields as needed.

The tests/main_csv_sort.c program demonstrates the process. It reorders a CSV file by sorting the columns alphabetically based on the headers for each column.

Parameters
[in,out] cex: C-style exception
[in] bare_field: unquoted field
[out] quoted_field: string to which the quoted field is appended

Referenced by cgul_csv_cxx::quote().
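The writing process described above can be sketched as follows, assuming the RFC 4180 rules: a field that contains the separator, a double quote, a carriage return, or a line feed is wrapped in double quotes, and each embedded double quote is doubled. quote_field() and write_record() are hypothetical. They write into a caller-supplied buffer or FILE* instead of appending to a cgul_string_t, and they take the separator as a parameter, whereas cgul_csv__quote() does not. cgul_csv__quote() may also quote in more cases, so its exact output can differ.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append ch at index *need when it fits (leaving room for the NUL); always count it. */
static void put(char *out, size_t cap, size_t *need, char ch)
{
    if (*need + 1 < cap) {
        out[*need] = ch;
    }
    ++*need;
}

/* Hypothetical helper: write the RFC 4180 quoted form of `bare` into out
 * (capacity cap; NUL-terminated when cap > 0).  Quotes are added only when
 * the field contains sep, '"', '\r', or '\n'; embedded quotes are doubled.
 * Returns the length the result needs, so a return value >= cap means the
 * output was truncated. */
size_t quote_field(const char *bare, char sep, char *out, size_t cap)
{
    size_t need = 0;
    const char *p;
    int must_quote = strchr(bare, '"') != NULL || strchr(bare, '\r') != NULL
                     || strchr(bare, '\n') != NULL
                     || (sep != '\0' && strchr(bare, sep) != NULL);

    if (must_quote) put(out, cap, &need, '"');
    for (p = bare; *p != '\0'; ++p) {
        if (*p == '"') put(out, cap, &need, '"'); /* double embedded quotes */
        put(out, cap, &need, *p);
    }
    if (must_quote) put(out, cap, &need, '"');
    if (cap > 0) {
        out[need < cap ? need : cap - 1] = '\0';
    }
    return need;
}

/* Hypothetical helper: write one record, quoting each field and inserting sep
 * between fields, then '\n' (a text-mode stream translates it to the platform
 * line ending).  Returns 0 on success, or -1 on a write error or a field too
 * long for this sketch's fixed buffer. */
int write_record(FILE *f, const char *const *fields, size_t n, char sep)
{
    char buf[256];
    size_t i;
    for (i = 0; i < n; ++i) {
        if (quote_field(fields[i], sep, buf, sizeof buf) >= sizeof buf) {
            return -1;
        }
        if (i > 0 && fputc(sep, f) == EOF) {
            return -1;
        }
        if (fputs(buf, f) == EOF) {
            return -1;
        }
    }
    return fputc('\n', f) == EOF ? -1 : 0;
}
```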