cgul_mime.h File Reference

Parser for multipart MIME messages.

#include "cgul_common.h"
#include "cgul_exception.h"

Typedefs

typedef size_t(* cgul_mime__produce_t) (cgul_exception_t *cex, char *block, size_t block_size, void *data)
 
typedef int(* cgul_mime__consume_t) (cgul_exception_t *cex, unsigned long int part, cgul_mime__section_t section, char *block, size_t block_size, void *data)
 

Enumerations

enum  cgul_mime__section_t {
  CGUL_MIME__SECTION_INVALID = 0, CGUL_MIME__SECTION_PREAMBLE, CGUL_MIME__SECTION_HEADER, CGUL_MIME__SECTION_BODY,
  CGUL_MIME__SECTION_EPILOGUE, CGUL_MIME__SECTION_EOF
}
 

Functions

CGUL_EXPORT unsigned long int cgul_mime__parse (cgul_exception_t *cex, const char *boundary, const char *eol, cgul_mime__produce_t produce, void *produce_data, cgul_mime__consume_t consume, void *consume_data)
 
CGUL_EXPORT unsigned long int cgul_mime__parse_memory (cgul_exception_t *cex, char *block, size_t block_size, const char *boundary, const char *eol, cgul_mime__consume_t consume, void *consume_data)
 
CGUL_EXPORT unsigned long int cgul_mime__parse_file (cgul_exception_t *cex, FILE *fin, const char *boundary, const char *eol, cgul_mime__consume_t consume, void *consume_data)
 
CGUL_EXPORT unsigned long int cgul_mime__parse_fname (cgul_exception_t *cex, const char *fname, const char *boundary, const char *eol, cgul_mime__consume_t consume, void *consume_data)
 
CGUL_EXPORT unsigned long int cgul_mime__split_memory (cgul_exception_t *cex, char *block, size_t block_size, const char *dname, const char *pname, const char *hname, const char *bname, const char *ename, const char *boundary, const char *eol)
 
CGUL_EXPORT unsigned long int cgul_mime__split_file (cgul_exception_t *cex, FILE *fin, const char *dname, const char *pname, const char *hname, const char *bname, const char *ename, const char *boundary, const char *eol)
 
CGUL_EXPORT unsigned long int cgul_mime__split_fname (cgul_exception_t *cex, const char *fname, const char *dname, const char *pname, const char *hname, const char *bname, const char *ename, const char *boundary, const char *eol)
 

Detailed Description

Parser for multipart MIME messages. This multipart MIME parser should be sufficient for handling HTTP uploads. It should scale well and handle large multipart MIME messages efficiently.

Author
Paul Serice
See also
cgul_base64

Typedef Documentation

§ cgul_mime__produce_t

typedef size_t(* cgul_mime__produce_t) (cgul_exception_t *cex, char *block, size_t block_size, void *data)

This typedef is the interface for the callback function invoked by cgul_mime__parse() to produce blocks from the original MIME source so that the blocks can be parsed.

Functions that implement this interface should read bytes into the buffer block, which extends for block_size bytes, and return the number of bytes read into block. This function should return 0 if and only if EOF is reached. It is permissible to return a short read count greater than zero; if the parser still needs more data, it will simply invoke the callback again. If an error occurs, this function should throw an exception.

The value of data is the same as what was initially passed into cgul_mime__parse() when the callback was registered, allowing the client to pass arbitrary data to the producer.

Parameters
[in,out]  cex         C-style exception
[out]     block       block to fill
[in]      block_size  block size
[in]      data        client data
Returns
number of bytes read into block or zero if EOF
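
As an illustration of the producer contract described above, the following is a minimal sketch of a callback that serves bytes from an in-memory buffer. The struct mem_source and the function mem_produce are hypothetical names, and cgul_exception_t is declared here as an opaque stand-in so that the sketch compiles without the real headers; it is not the library's own producer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the real type from cgul_exception.h (assumption: kept
 * opaque here so the sketch compiles on its own). */
typedef struct cgul_exception_stub cgul_exception_t;

/* Hypothetical client state: a read cursor over an in-memory message. */
struct mem_source {
    const char *bytes;  /* start of the MIME message */
    size_t size;        /* total number of bytes */
    size_t offset;      /* bytes already handed to the parser */
};

/* Has the same signature as cgul_mime__produce_t. */
size_t mem_produce(cgul_exception_t *cex, char *block, size_t block_size,
                   void *data)
{
    struct mem_source *src = data;
    size_t remaining;
    size_t n;

    (void)cex;  /* this producer cannot fail, so nothing is thrown */
    assert(src != NULL && block != NULL && block_size > 0);

    remaining = src->size - src->offset;
    n = remaining < block_size ? remaining : block_size;
    if (n > 0) {
        memcpy(block, src->bytes + src->offset, n);
        src->offset += n;
    }
    return n;   /* 0 only once the source is exhausted (EOF) */
}
```

A pointer to the struct mem_source would be passed as produce_data so that it arrives in the callback as data.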

§ cgul_mime__consume_t

typedef int(* cgul_mime__consume_t) (cgul_exception_t *cex, unsigned long int part, cgul_mime__section_t section, char *block, size_t block_size, void *data)

This typedef is the interface for the callback function invoked by the following functions to consume the blocks generated by the parser:

    cgul_mime__parse()
    cgul_mime__parse_file()
    cgul_mime__parse_fname()
    cgul_mime__parse_memory()

Functions that implement this interface are passed each block block, together with its size in bytes block_size, as the MIME file is parsed, along with the MIME section type section from which the data originated. The function is also passed the current part part of the multipart MIME file that is being parsed. The value of part is 0 for the preamble and the epilogue, 1 for the first part, 2 for the second part, and so on.

The value of data is the same as what was initially passed in when the callback was registered, allowing the client to pass arbitrary data to the consumer. If an error occurs, this function should throw an exception.

Parameters
[in,out]  cex         C-style exception
[in]      part        MIME part of the multipart message
[in]      section     MIME section where the block originates
[in]      block       block
[in]      block_size  block size
[in]      data        client data
Returns
whether to continue
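
The consumer contract described above can be sketched as a callback that totals the body bytes it sees and remembers the highest part number. The names body_stats and count_body_bytes are hypothetical, and the typedefs are stand-in copies so that the sketch compiles without the real headers. The return value is documented only as "whether to continue"; treating nonzero as "continue" is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the real declarations (assumption: copied from this
 * reference so the sketch compiles on its own). */
typedef struct cgul_exception_stub cgul_exception_t;
typedef enum {
    CGUL_MIME__SECTION_INVALID = 0,
    CGUL_MIME__SECTION_PREAMBLE,
    CGUL_MIME__SECTION_HEADER,
    CGUL_MIME__SECTION_BODY,
    CGUL_MIME__SECTION_EPILOGUE,
    CGUL_MIME__SECTION_EOF
} cgul_mime__section_t;

/* Hypothetical client state passed in through consume_data. */
struct body_stats {
    unsigned long last_part;  /* highest part number seen in a body */
    size_t body_bytes;        /* total bytes across all body blocks */
    int saw_eof;              /* set once the EOF section arrives */
};

/* Has the same signature as cgul_mime__consume_t. */
int count_body_bytes(cgul_exception_t *cex, unsigned long int part,
                     cgul_mime__section_t section, char *block,
                     size_t block_size, void *data)
{
    struct body_stats *stats = data;

    (void)cex;    /* nothing here can fail */
    (void)block;  /* only the sizes are inspected */
    assert(stats != NULL);

    if (section == CGUL_MIME__SECTION_EOF) {
        stats->saw_eof = 1;
    } else if (section == CGUL_MIME__SECTION_BODY) {
        /* A body may arrive in several blocks, so accumulate. */
        stats->body_bytes += block_size;
        if (part > stats->last_part) {
            stats->last_part = part;
        }
    }
    return 1;  /* ASSUMPTION: nonzero asks the parser to continue */
}
```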

Enumeration Type Documentation

§ cgul_mime__section_t

The sections of a multipart MIME file.

Enumerator
CGUL_MIME__SECTION_INVALID 

Invalid section type.

CGUL_MIME__SECTION_PREAMBLE 

Preamble section. The "preamble" is defined in RFC 2046 section 5.1.1 as the part of the file "prior to the first boundary delimiter line." Note that when parsing an RFC 822 e-mail message, the e-mail headers will be part of the preamble because the headers necessarily come before the first boundary delimiter line.

CGUL_MIME__SECTION_HEADER 

Header section. This always indicates a header that immediately follows a boundary delimiter line. MIME allows for empty headers in which case the length of the block will be zero.

CGUL_MIME__SECTION_BODY 

Body section. This always indicates a body that immediately follows a header that immediately follows a boundary delimiter line.

CGUL_MIME__SECTION_EPILOGUE 

Epilogue section. The "epilogue" is defined in RFC 2046 section 5.1.1 as the part of the file "following the final boundary delimiter line."

CGUL_MIME__SECTION_EOF 

End of file. The block associated with EOF is always empty.
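
When logging or debugging a consumer, it can help to turn a section value into a readable label. The following is a small sketch; section_name is a hypothetical helper, not part of the library, and the enum is a stand-in copy so that the sketch compiles without the real header.

```c
/* Stand-in copy of the enum documented above (assumption: same order
 * and values as in cgul_mime.h). */
typedef enum {
    CGUL_MIME__SECTION_INVALID = 0,
    CGUL_MIME__SECTION_PREAMBLE,
    CGUL_MIME__SECTION_HEADER,
    CGUL_MIME__SECTION_BODY,
    CGUL_MIME__SECTION_EPILOGUE,
    CGUL_MIME__SECTION_EOF
} cgul_mime__section_t;

/* Hypothetical helper: map a section value to a short label. */
const char *section_name(cgul_mime__section_t section)
{
    switch (section) {
    case CGUL_MIME__SECTION_INVALID:  return "invalid";
    case CGUL_MIME__SECTION_PREAMBLE: return "preamble";
    case CGUL_MIME__SECTION_HEADER:   return "header";
    case CGUL_MIME__SECTION_BODY:     return "body";
    case CGUL_MIME__SECTION_EPILOGUE: return "epilogue";
    case CGUL_MIME__SECTION_EOF:      return "eof";
    }
    return "unknown";  /* value outside the documented enumerators */
}
```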

Function Documentation

§ cgul_mime__parse()

CGUL_EXPORT unsigned long int cgul_mime__parse ( cgul_exception_t *  cex,
const char *  boundary,
const char *  eol,
cgul_mime__produce_t  produce,
void *  produce_data,
cgul_mime__consume_t  consume,
void *  consume_data 
)

This is the generic parsing function that works by parsing blocks of data supplied by the produce function. For each block supplied by produce, it invokes the client's callback function consume at least once passing in a block that holds a subset of bytes from the current MIME message, the size of the block, the MIME section type, and the client data consume_data. This function returns the total number of MIME parts. If an error occurs, 0 is returned, and an exception is thrown.

The MIME parts are separated from each other by the MIME boundary boundary. Unfortunately, the boundary separators are required to start with "--", but the "Content-Type" header specifies the boundary without the implicit "--" prefix. To make it possible to directly pass the boundary read from the "Content-Type" header into this function, boundary must not include the implicit "--" prefix; instead, the "--" prefix will automatically be inserted by this function.
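
Since boundary is meant to be taken directly from the "Content-Type" header, a client typically has to pull the boundary parameter out of the header value first. The following is a minimal sketch of one way to do that; extract_boundary and starts_with_nocase are hypothetical helpers, not part of the library, and the sketch handles only the plain and double-quoted forms without escape sequences.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive check that s begins with prefix. */
static int starts_with_nocase(const char *s, const char *prefix)
{
    while (*prefix != '\0') {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix)) {
            return 0;
        }
        ++s;
        ++prefix;
    }
    return 1;
}

/* Hypothetical helper: copy the boundary parameter of a Content-Type
 * value into out.  Returns 0 on success, -1 if the parameter is
 * missing, empty, unterminated, or does not fit in out. */
int extract_boundary(const char *content_type, char *out, size_t out_size)
{
    static const char key[] = "boundary=";
    const char *p;

    assert(content_type != NULL && out != NULL);

    for (p = content_type; *p != '\0'; ++p) {
        const char *start;
        const char *end;
        size_t len;

        /* Only match at the start of a parameter name. */
        if (p != content_type && p[-1] != ';'
            && !isspace((unsigned char)p[-1])) {
            continue;
        }
        if (!starts_with_nocase(p, key)) {
            continue;
        }
        start = p + strlen(key);
        if (*start == '"') {
            ++start;                      /* quoted value */
            end = strchr(start, '"');
            if (end == NULL) {
                return -1;
            }
        } else {
            end = start;                  /* bare token */
            while (*end != '\0' && *end != ';'
                   && !isspace((unsigned char)*end)) {
                ++end;
            }
        }
        len = (size_t)(end - start);
        if (len == 0 || len + 1 > out_size) {
            return -1;
        }
        memcpy(out, start, len);
        out[len] = '\0';
        return 0;
    }
    return -1;
}
```

Any dashes that are part of the parameter value itself are kept as they are; only the implicit "--" delimiter prefix is left out.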

According to the MIME standard, all EOL sequences should be "\r\n", but as a practical matter, MIME messages often use the native EOL sequence instead. As a result, the client needs to pass in the correct EOL sequence eol. If eol is NULL this function will attempt to automatically detect the EOL sequence by scanning the first 16K of the file. If it cannot automatically detect the EOL sequence, this function will use "\r\n" as the EOL sequence.
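
The exact detection rule used by the library is not documented here. As a rough illustration only, the following sketch shows one plausible heuristic: look at the first line ending found within the first 16K and fall back to "\r\n" if none is found. The helper detect_eol and the constant EOL_SCAN_LIMIT are hypothetical, and reading "16K" as 16384 bytes is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTION: "16K" is taken to mean 16384 bytes. */
#define EOL_SCAN_LIMIT 16384u

/* Hypothetical helper, not the library's implementation: guess the EOL
 * sequence from the first newline within the scan window. */
const char *detect_eol(const char *buf, size_t size)
{
    size_t limit = size < EOL_SCAN_LIMIT ? size : EOL_SCAN_LIMIT;
    size_t i;

    assert(buf != NULL || size == 0);

    for (i = 0; i < limit; ++i) {
        if (buf[i] == '\n') {
            /* A preceding '\r' means the message uses CRLF. */
            return (i > 0 && buf[i - 1] == '\r') ? "\r\n" : "\n";
        }
    }
    return "\r\n";  /* documented fallback when detection fails */
}
```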

Parameters
[in,out]  cex           C-style exception
[in]      boundary      MIME boundary
[in]      eol           EOL sequence
[in]      produce       callback function to produce MIME blocks
[in]      produce_data  client data for produce
[in]      consume       callback function to consume parsed blocks
[in]      consume_data  client data for consume
Returns
total number of parts
See also
cgul_mime__parse_memory()
cgul_mime__parse_file()

Referenced by cgul_mime_cxx::parse().

§ cgul_mime__parse_memory()

CGUL_EXPORT unsigned long int cgul_mime__parse_memory ( cgul_exception_t *  cex,
char *  block,
size_t  block_size,
const char *  boundary,
const char *  eol,
cgul_mime__consume_t  consume,
void *  consume_data 
)

This function is an adapter for cgul_mime__parse() for parsing multipart MIME messages already resident in memory. It registers a special "producer" that allows the parser to directly access the MIME message in memory rather than having to copy it block-by-block into the parser's internal buffer.

This function works by parsing a multipart MIME message that is in memory at block and extends for block_size bytes. The client's callback function consume is invoked once for each MIME section passing in a block that holds all the bytes from the current MIME section, the size of the block, the MIME section type, and the client data consume_data. This function returns the total number of MIME parts. If an error occurs, 0 is returned, and an exception is thrown.

Even though this function passes all the bytes from each MIME section to the consume callback, it may be worthwhile to write consume so that it can also be used with cgul_mime__parse_file() which is similar but generally requires multiple calls to consume in order to pass in the same amount of data.
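
A consumer that works with both functions has to tolerate a section arriving in several blocks. The following is a minimal sketch of such a callback that appends header blocks for the current part into a fixed buffer. The names header_buf and collect_header are hypothetical, the typedefs are stand-in copies so that the sketch compiles without the real headers, and treating a nonzero return as "continue" is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for the real declarations (assumption: copied from this
 * reference so the sketch compiles on its own). */
typedef struct cgul_exception_stub cgul_exception_t;
typedef enum {
    CGUL_MIME__SECTION_INVALID = 0,
    CGUL_MIME__SECTION_PREAMBLE,
    CGUL_MIME__SECTION_HEADER,
    CGUL_MIME__SECTION_BODY,
    CGUL_MIME__SECTION_EPILOGUE,
    CGUL_MIME__SECTION_EOF
} cgul_mime__section_t;

/* Hypothetical client state: header text of the most recent part. */
struct header_buf {
    unsigned long part;  /* part the text belongs to (0 = none yet) */
    char text[256];      /* NUL-terminated header text */
    size_t len;          /* bytes stored in text */
    int truncated;       /* set if the header did not fit */
};

/* Has the same signature as cgul_mime__consume_t. */
int collect_header(cgul_exception_t *cex, unsigned long int part,
                   cgul_mime__section_t section, char *block,
                   size_t block_size, void *data)
{
    struct header_buf *hb = data;
    size_t room;
    size_t n;

    (void)cex;
    assert(hb != NULL);

    if (section != CGUL_MIME__SECTION_HEADER) {
        return 1;  /* ASSUMPTION: nonzero asks the parser to continue */
    }
    if (part != hb->part) {
        /* First header block of a new part: start over. */
        hb->part = part;
        hb->len = 0;
        hb->truncated = 0;
    }
    room = sizeof hb->text - 1 - hb->len;
    n = block_size < room ? block_size : room;
    if (n < block_size) {
        hb->truncated = 1;
    }
    if (n > 0) {
        memcpy(hb->text + hb->len, block, n);
        hb->len += n;
    }
    hb->text[hb->len] = '\0';
    return 1;
}
```

Written this way, the callback gives the same result whether a header arrives in one block or in several.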

The MIME parts are separated from each other by the MIME boundary boundary. Unfortunately, the boundary separators are required to start with "--", but the "Content-Type" header specifies the boundary without the implicit "--" prefix. To make it possible to directly pass the boundary read from the "Content-Type" header into this function, boundary must not include the implicit "--" prefix; instead, the "--" prefix will automatically be inserted by this function.

According to the MIME standard, all EOL sequences should be "\r\n", but as a practical matter, MIME messages often use the native EOL sequence instead. As a result, the client needs to pass in the correct EOL sequence eol. If eol is NULL this function will attempt to automatically detect the EOL sequence by scanning the first 16K of the MIME message. If it cannot automatically detect the EOL sequence, this function will use "\r\n" as the EOL sequence.

Parameters
[in,out]  cex           C-style exception
[in]      block         block
[in]      block_size    block size
[in]      boundary      MIME boundary
[in]      eol           EOL sequence
[in]      consume       callback function to consume parsed blocks
[in]      consume_data  client data for consume
Returns
total number of parts
See also
cgul_mime__split_memory()

Referenced by cgul_mime_cxx::parse_memory().

§ cgul_mime__parse_file()

CGUL_EXPORT unsigned long int cgul_mime__parse_file ( cgul_exception_t *  cex,
FILE *  fin,
const char *  boundary,
const char *  eol,
cgul_mime__consume_t  consume,
void *  consume_data 
)

This function is an adapter for cgul_mime__parse() for parsing multipart MIME messages from a file. It registers a "producer" that copies the MIME message from file block-by-block into the parser's internal buffer.

This function works by parsing a multipart MIME message in the file fin. For each block read from the file, it invokes the client's callback function consume at least once passing in a block that holds a subset of bytes from the current MIME section, the size of the block, the MIME section type, and the client data consume_data. This function returns the total number of MIME parts. If an error occurs, 0 is returned, and an exception is thrown.

The MIME parts are separated from each other by the MIME boundary boundary. Unfortunately, the boundary separators are required to start with "--", but the "Content-Type" header specifies the boundary without the implicit "--" prefix. To make it possible to directly pass the boundary read from the "Content-Type" header into this function, boundary must not include the implicit "--" prefix; instead, the "--" prefix will automatically be inserted by this function.

According to the MIME standard, all EOL sequences should be "\r\n", but as a practical matter, MIME messages often use the native EOL sequence instead. As a result, the client needs to pass in the correct EOL sequence eol. If eol is NULL this function will attempt to automatically detect the EOL sequence by scanning the first 16K of the file. If it cannot automatically detect the EOL sequence, this function will use "\r\n" as the EOL sequence.

Parameters
[in,out]  cex           C-style exception
[in]      fin           input file
[in]      boundary      MIME boundary
[in]      eol           EOL sequence
[in]      consume       callback function to consume parsed blocks
[in]      consume_data  client data for consume
Returns
total number of parts
See also
cgul_mime__split_file()

Referenced by cgul_mime_cxx::parse_file().

§ cgul_mime__parse_fname()

CGUL_EXPORT unsigned long int cgul_mime__parse_fname ( cgul_exception_t *  cex,
const char *  fname,
const char *  boundary,
const char *  eol,
cgul_mime__consume_t  consume,
void *  consume_data 
)

This function is an adapter for cgul_mime__parse() for parsing multipart MIME messages from a file. It registers a "producer" that copies the MIME message from file block-by-block into the parser's internal buffer.

This function works by parsing a multipart MIME message in the file with name fname. For each block read from the file, it invokes the client's callback function consume at least once passing in a block that holds a subset of bytes from the current MIME section, the size of the block, the MIME section type, and the client data consume_data. This function returns the total number of MIME parts. If an error occurs, 0 is returned, and an exception is thrown.

The MIME parts are separated from each other by the MIME boundary boundary. Unfortunately, the boundary separators are required to start with "--", but the "Content-Type" header specifies the boundary without the implicit "--" prefix. To make it possible to directly pass the boundary read from the "Content-Type" header into this function, boundary must not include the implicit "--" prefix; instead, the "--" prefix will automatically be inserted by this function.

According to the MIME standard, all EOL sequences should be "\r\n", but as a practical matter, MIME messages often use the native EOL sequence instead. As a result, the client needs to pass in the correct EOL sequence eol. If eol is NULL this function will attempt to automatically detect the EOL sequence by scanning the first 16K of the file. If it cannot automatically detect the EOL sequence, this function will use "\r\n" as the EOL sequence.

Parameters
[in,out]  cex           C-style exception
[in]      fname         name of the input file
[in]      boundary      MIME boundary
[in]      eol           EOL sequence
[in]      consume       callback function to consume parsed blocks
[in]      consume_data  client data for consume
Returns
total number of parts
See also
cgul_mime__split_fname()

Referenced by cgul_mime_cxx::parse_fname().

§ cgul_mime__split_memory()

CGUL_EXPORT unsigned long int cgul_mime__split_memory ( cgul_exception_t *  cex,
char *  block,
size_t  block_size,
const char *  dname,
const char *  pname,
const char *  hname,
const char *  bname,
const char *  ename,
const char *  boundary,
const char *  eol 
)

This function is an adapter for cgul_mime__parse_memory() that splits a multipart MIME message in memory into multiple files on the host file system.

Split the multipart MIME message in memory starting at block and extending for block_size bytes into preamble, header, body, and epilogue sections. The directory where the files should be written is given by dname. The name of the preamble file is given by pname, and the name of the epilogue file is given by ename. The names for the header files will have the form hname-%lu, and the names for the body files will have the form bname-%lu, where "%lu" will be replaced by the part number (starting with 1). If any of pname, hname, bname, or ename is NULL, the corresponding files will not be written. This function returns the total number of MIME parts. If an error occurs, 0 is returned, and an exception is thrown.
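
To make the part-numbered file names concrete, the following sketch builds such a name with snprintf(). The helper part_path is hypothetical, and joining dname and the base name with "/" is an assumption; how the library itself combines the directory and file names is not specified here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: build "<dname>/<base>-<part>" in out.
 * ASSUMPTION: "/" is the directory separator.
 * Returns 0 on success, -1 if the result does not fit. */
int part_path(char *out, size_t out_size, const char *dname,
              const char *base, unsigned long part)
{
    int n;

    assert(out != NULL && dname != NULL && base != NULL);

    /* "%lu" is replaced by the part number, which starts at 1. */
    n = snprintf(out, out_size, "%s/%s-%lu", dname, base, part);
    if (n < 0 || (size_t)n >= out_size) {
        return -1;
    }
    return 0;
}
```

For example, with dname "out" and hname "header", the third part's header would be named "out/header-3" under these assumptions.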

The MIME parts are separated from each other by the MIME boundary boundary. Unfortunately, the boundary separators are required to start with "--", but the "Content-Type" header specifies the boundary without the implicit "--" prefix. To make it possible to directly pass the boundary read from the "Content-Type" header into this function, boundary must not include the implicit "--" prefix; instead, the "--" prefix will automatically be inserted by this function.

According to the MIME standard, all EOL sequences should be "\r\n", but as a practical matter, MIME messages often use the native EOL sequence instead. As a result, the client needs to pass in the correct EOL sequence eol. If eol is NULL this function will attempt to automatically detect the EOL sequence by scanning the MIME message. If it cannot automatically detect the EOL sequence, this function will use "\r\n" as the EOL sequence.

Parameters
[in,out]  cex         C-style exception
[in]      block       block
[in]      block_size  block size
[in]      dname       directory name where output files will be created
[in]      pname       name for the preamble file
[in]      hname       base name for header files
[in]      bname       base name for body files
[in]      ename       name for the epilogue file
[in]      boundary    MIME boundary
[in]      eol         EOL sequence
Returns
total number of parts
See also
cgul_mime__parse_memory()

Referenced by cgul_mime_cxx::split_memory().

§ cgul_mime__split_file()

CGUL_EXPORT unsigned long int cgul_mime__split_file ( cgul_exception_t *  cex,
FILE *  fin,
const char *  dname,
const char *  pname,
const char *  hname,
const char *  bname,
const char *  ename,
const char *  boundary,
const char *  eol 
)

This function is an adapter for cgul_mime__parse_file() that splits a multipart MIME message in a file into multiple files on the host file system.

Split the multipart MIME message in the file fin into preamble, header, body, and epilogue sections. The directory where the files should be written is given by dname. The name of the preamble file is given by pname, and the name of the epilogue file is given by ename. The names for the header files will have the form hname-%lu, and the names for the body files will have the form bname-%lu, where "%lu" will be replaced by the part number (starting with 1). If any of pname, hname, bname, or ename is NULL, the corresponding files will not be written. This function returns the total number of MIME parts. If an error occurs, 0 is returned, and an exception is thrown.

The MIME parts are separated from each other by the MIME boundary boundary. Unfortunately, the boundary separators are required to start with "--", but the "Content-Type" header specifies the boundary without the implicit "--" prefix. To make it possible to directly pass the boundary read from the "Content-Type" header into this function, boundary must not include the implicit "--" prefix; instead, the "--" prefix will automatically be inserted by this function.

According to the MIME standard, all EOL sequences should be "\r\n", but as a practical matter, MIME messages often use the native EOL sequence instead. As a result, the client needs to pass in the correct EOL sequence eol. If eol is NULL this function will attempt to automatically detect the EOL sequence by scanning the first 16K of the file. If it cannot automatically detect the EOL sequence, this function will use "\r\n" as the EOL sequence.

Parameters
[in,out]  cex       C-style exception
[in]      fin       MIME input file
[in]      dname     directory name where output files will be created
[in]      pname     name for the preamble file
[in]      hname     base name for header files
[in]      bname     base name for body files
[in]      ename     name for the epilogue file
[in]      boundary  MIME boundary
[in]      eol       EOL sequence
Returns
total number of parts
See also
cgul_mime__parse_file()

Referenced by cgul_mime_cxx::split_file().

§ cgul_mime__split_fname()

CGUL_EXPORT unsigned long int cgul_mime__split_fname ( cgul_exception_t *  cex,
const char *  fname,
const char *  dname,
const char *  pname,
const char *  hname,
const char *  bname,
const char *  ename,
const char *  boundary,
const char *  eol 
)

This function is an adapter for cgul_mime__parse_fname() that splits a multipart MIME message in a file into multiple files on the host file system.

Split the multipart MIME message in the file with name fname into preamble, header, body, and epilogue sections. The directory where the files should be written is given by dname. The name of the preamble file is given by pname, and the name of the epilogue file is given by ename. The names for the header files will have the form hname-%lu, and the names for the body files will have the form bname-%lu, where "%lu" will be replaced by the part number (starting with 1). If any of pname, hname, bname, or ename is NULL, the corresponding files will not be written. This function returns the total number of MIME parts. If an error occurs, 0 is returned, and an exception is thrown.

The MIME parts are separated from each other by the MIME boundary boundary. Unfortunately, the boundary separators are required to start with "--", but the "Content-Type" header specifies the boundary without the implicit "--" prefix. To make it possible to directly pass the boundary read from the "Content-Type" header into this function, boundary must not include the implicit "--" prefix; instead, the "--" prefix will automatically be inserted by this function.

According to the MIME standard, all EOL sequences should be "\r\n", but as a practical matter, MIME messages often use the native EOL sequence instead. As a result, the client needs to pass in the correct EOL sequence eol. If eol is NULL this function will attempt to automatically detect the EOL sequence by scanning the first 16K of the file. If it cannot automatically detect the EOL sequence, this function will use "\r\n" as the EOL sequence.

Parameters
[in,out]  cex       C-style exception
[in]      fname     name of MIME input file
[in]      dname     directory name where output files will be created
[in]      pname     name for the preamble file
[in]      hname     base name for header files
[in]      bname     base name for body files
[in]      ename     name for the epilogue file
[in]      boundary  MIME boundary
[in]      eol       EOL sequence
Returns
total number of parts
See also
cgul_mime__parse_fname()

Referenced by cgul_mime_cxx::split_fname().