Camel.MimeUtils

This is not a separate object, but a collection of MIME and related decoding utilities used extensively internally by Camel and useful for general purpose mail handling.

It is a bit of a mishmash of everything used internally: from RFC-specific encoders and decoders, to ctype-like functions for processing mail text, to structured object lists and highly optimized content-encoding translators.

Many of these functions are highly optimized and very robust; they represent 5+ years of production mail client testing. Many have obscure and hard-to-use interfaces because they are meant for internal use; in almost every case a more general interface exists to perform the same task.

This page just summarizes the functions available, along with some internal information where appropriate. Consult the source file camel/camel-mime-utils.c and its header for more information.

Type functions

The type functions are used to detect character classes of individual characters. They are used both in parsing code, as part of lexical analysis, and in encoding functions, which must determine how various fields may be escaped. The assumption is that a simple array lookup will create more maintainable and efficient code than checking each character class separately.

The lookup is driven by an array of 16-bit bit-fields, indexed by character value, which determines the types. This array is initialized from static data.

The bits are a combination of RFC given types, like atom or CTRL, and other more code-friendly types like <code>CAMEL_MIME_IS_PSAFE</code> which indicates the character is safe to use in an encoded word in a phrase.

 enum {
        CAMEL_MIME_IS_CTRL      = 1<<0,
        CAMEL_MIME_IS_LWSP      = 1<<1,
        CAMEL_MIME_IS_TSPECIAL  = 1<<2,
        CAMEL_MIME_IS_SPECIAL   = 1<<3,
        CAMEL_MIME_IS_SPACE     = 1<<4,
        CAMEL_MIME_IS_DSPECIAL  = 1<<5,
        CAMEL_MIME_IS_QPSAFE    = 1<<6,
        CAMEL_MIME_IS_ESAFE     = 1<<7,
        CAMEL_MIME_IS_PSAFE     = 1<<8,
        CAMEL_MIME_IS_ATTRCHAR  = 1<<9,
 };

The table itself is generated by a Perl script, <tt>gentables.pl</tt>, which is driven by the Makefile; any changes to the tables must be made there.

The actual type lookup is normally performed by a set of macros, but the above bit definitions can be used to create custom tests efficiently.

These macros map directly to the RFC character classes, where they exist.

 #define camel_mime_is_ctrl(x) ((camel_mime_special_table[(unsigned char)(x)] & CAMEL_MIME_IS_CTRL) != 0)
 #define camel_mime_is_lwsp(x) ((camel_mime_special_table[(unsigned char)(x)] & CAMEL_MIME_IS_LWSP) != 0)
 #define camel_mime_is_tspecial(x) ((camel_mime_special_table[(unsigned char)(x)] & CAMEL_MIME_IS_TSPECIAL) != 0)
 #define camel_mime_is_ttoken(x) ((camel_mime_special_table[(unsigned char)(x)] & (CAMEL_MIME_IS_TSPECIAL|CAMEL_MIME_IS_LWSP|CAMEL_MIME_IS_CTRL)) == 0)
 #define camel_mime_is_atom(x) ((camel_mime_special_table[(unsigned char)(x)] & (CAMEL_MIME_IS_SPECIAL|CAMEL_MIME_IS_SPACE|CAMEL_MIME_IS_CTRL)) == 0)
 #define camel_mime_is_dtext(x) ((camel_mime_special_table[(unsigned char)(x)] & CAMEL_MIME_IS_DSPECIAL) == 0)
 #define camel_mime_is_fieldname(x) ((camel_mime_special_table[(unsigned char)(x)] & (CAMEL_MIME_IS_CTRL|CAMEL_MIME_IS_SPACE)) == 0)
 #define camel_mime_is_qpsafe(x) ((camel_mime_special_table[(unsigned char)(x)] & CAMEL_MIME_IS_QPSAFE) != 0)
 #define camel_mime_is_especial(x) ((camel_mime_special_table[(unsigned char)(x)] & CAMEL_MIME_IS_ESPECIAL) != 0)
 #define camel_mime_is_psafe(x) ((camel_mime_special_table[(unsigned char)(x)] & CAMEL_MIME_IS_PSAFE) != 0)
 #define camel_mime_is_attrchar(x) ((camel_mime_special_table[(unsigned char)(x)] & CAMEL_MIME_IS_ATTRCHAR) != 0)

The camel_mime_is_type macro can be used to check a custom bitmask.

 #define camel_mime_is_type(x, t) ((camel_mime_special_table[(unsigned char)(x)] & (t)) != 0)
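
As a rough illustration of the table-lookup technique, the sketch below builds a tiny stand-in table at run time. The names demo_table, DEMO_IS_*, demo_is_type, demo_table_init and demo_span are invented for this example; the real camel_mime_special_table is generated by gentables.pl and holds different classes.

```c
#include <stddef.h>

/* Hypothetical character classes, for this sketch only. */
enum {
	DEMO_IS_DIGIT = 1 << 0,
	DEMO_IS_SPACE = 1 << 1,
	DEMO_IS_PUNCT = 1 << 2
};

/* One 16-bit entry per possible byte value. */
static unsigned short demo_table[256];

/* Same shape as camel_mime_is_type(), but against the toy table. */
#define demo_is_type(x, t) ((demo_table[(unsigned char)(x)] & (t)) != 0)

/* Fill the toy table; Camel uses pre-generated static data instead. */
void demo_table_init(void)
{
	const char *p;

	for (p = "0123456789"; *p; p++)
		demo_table[(unsigned char)*p] |= DEMO_IS_DIGIT;
	for (p = " \t\r\n"; *p; p++)
		demo_table[(unsigned char)*p] |= DEMO_IS_SPACE;
	for (p = ".,;:"; *p; p++)
		demo_table[(unsigned char)*p] |= DEMO_IS_PUNCT;
}

/* Count the leading characters of s that match any class in mask. */
size_t demo_span(const char *s, int mask)
{
	size_t n = 0;

	while (s[n] && demo_is_type(s[n], mask))
		n++;
	return n;
}
```

Combining bits in the mask tests several classes with a single lookup, which is the point of keeping the classes as bit-fields.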

Tokenisers and Encoders

A number of token parsing and creation functions exist. Most of them are internal but some are exposed to client code.

The internal functions for parsing token types generally take binary data - that is, a pointer and a length. In addition, the pointer is passed by reference, so that the parsing function will parse that token, and advance the pointer if it found one.
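
The pointer-by-reference convention can be sketched roughly as follows. demo_parse_uint is a hypothetical function written for this page, not one of Camel's internal parsers; it only shows how a parser consumes a token from a pointer and length pair and leaves both untouched when no token is found.

```c
#include <stddef.h>

/*
 * Hypothetical parser in the style described above: *inp points at the
 * unparsed data and *lenp is the number of bytes left.  On success the
 * value is stored in *value, the pointer and length are advanced past
 * the token, and 1 is returned.  On failure nothing is changed and 0 is
 * returned.  Overflow is not checked in this sketch.
 */
int demo_parse_uint(const char **inp, size_t *lenp, unsigned int *value)
{
	const char *p = *inp;
	size_t left = *lenp;
	unsigned int v = 0;
	int digits = 0;

	/* Skip leading linear whitespace. */
	while (left > 0 && (*p == ' ' || *p == '\t')) {
		p++;
		left--;
	}

	/* Accumulate decimal digits. */
	while (left > 0 && *p >= '0' && *p <= '9') {
		v = v * 10 + (unsigned int)(*p - '0');
		p++;
		left--;
		digits++;
	}

	if (digits == 0)
		return 0;	/* no token here: leave the caller's pointer alone */

	*inp = p;
	*lenp = left;
	*value = v;
	return 1;
}
```

Because the caller's pointer only moves on success, several such parsers can be tried in turn on the same position.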

The external tokenising functions just work on C strings.

The names of these functions could use some work; they are pretty inconsistent, again due to different contributors.

 char *camel_header_token_decode(const char *in);
 int camel_header_decode_int(const char **in);

For "string" headers, RFC2047 strings.

 char *camel_header_decode_string(const char *in, const char *default_charset);
 char *camel_header_encode_string(const unsigned char *in);

For strings which may include comments, which are stripped. Commonly used to parse otherwise structured headers in a non-structured way purely for display purposes, which is why it is called, rather incorrectly, format_ctext.

 char *camel_header_format_ctext(const char *in, const char *default_charset);

Encode a phrase part, e.g. the real-name part of an email address.

 char *camel_header_encode_phrase(const unsigned char *in);

Content-Transfer encoding functions

These functions are the lowest-level base64, quoted-printable, and uuencode encoder and decoder functions used in Camel. They were supposed to all have the same form and function; however, inconsistencies have crept in due to lazy programmers trying to extend the functionality in unnecessary ways.

But basically, there is an encode_step and an encode_close function, which maintain any inter-block state using a pair of integer pointers. On a stream of data, encode_step is called for each block and then encode_close is called on the last block of data.

Then there is a decode_step function which works similarly, but has no matching decode_close function - any incompletely encoded data is silently dropped.
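
To make the step/close pattern concrete, here is a minimal base64 sketch. demo_b64_encode_step and demo_b64_encode_close are hypothetical names, and the code is not Camel's implementation: it omits line breaking and assumes unsigned int is at least 24 bits wide. It only shows how leftover bytes are carried between blocks in state and save.

```c
#include <stddef.h>

static const char demo_b64[] =
	"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/*
 * Encode one block.  Bytes that do not fill a complete 3-byte group are
 * kept in *save, with their count in *state, until the next call.
 * Both must be zero before the first call.
 * Returns the number of bytes written to out.
 */
size_t demo_b64_encode_step(const unsigned char *in, size_t len,
			    unsigned char *out, int *state, unsigned int *save)
{
	unsigned char *o = out;
	size_t i;

	for (i = 0; i < len; i++) {
		*save = (*save << 8) | in[i];
		if (++*state == 3) {
			/* A full group: emit four output characters. */
			*o++ = demo_b64[(*save >> 18) & 0x3f];
			*o++ = demo_b64[(*save >> 12) & 0x3f];
			*o++ = demo_b64[(*save >> 6) & 0x3f];
			*o++ = demo_b64[*save & 0x3f];
			*state = 0;
			*save = 0;
		}
	}
	return (size_t)(o - out);
}

/* Encode the final block, then flush pending bytes with '=' padding. */
size_t demo_b64_encode_close(const unsigned char *in, size_t len,
			     unsigned char *out, int *state, unsigned int *save)
{
	unsigned char *o = out + demo_b64_encode_step(in, len, out, state, save);

	if (*state == 1) {
		*o++ = demo_b64[(*save >> 2) & 0x3f];
		*o++ = demo_b64[(*save << 4) & 0x3f];
		*o++ = '=';
		*o++ = '=';
	} else if (*state == 2) {
		*o++ = demo_b64[(*save >> 10) & 0x3f];
		*o++ = demo_b64[(*save >> 4) & 0x3f];
		*o++ = demo_b64[(*save << 2) & 0x3f];
		*o++ = '=';
	}
	*state = 0;
	*save = 0;
	return (size_t)(o - out);
}
```

The output buffer must hold four bytes for every three input bytes, rounded up, including any bytes carried over from earlier blocks.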

Note that you would normally use Camel.MimeFilter#Camel.MimeFilterBasic instead of these low-level interfaces. But these all have API documentation which should be consulted for the details if required.

 size_t camel_base64_decode_step(unsigned char *in, size_t len, unsigned char *out, int *state, unsigned int *save);
 size_t camel_base64_encode_step(unsigned char *in, size_t len, gboolean break_lines, unsigned char *out, int *state, int *save);
 size_t camel_base64_encode_close(unsigned char *in, size_t len, gboolean break_lines, unsigned char *out, int *state, int *save);
 
 size_t camel_uudecode_step(unsigned char *in, size_t len, unsigned char *out, int *state, guint32 *save);
 size_t camel_uuencode_step(unsigned char *in, size_t len, unsigned char *out, unsigned char *uubuf, int *state, guint32 *save);
 size_t camel_uuencode_close(unsigned char *in, size_t len, unsigned char *out, unsigned char *uubuf, int *state, guint32 *save);
 
 size_t camel_quoted_decode_step(unsigned char *in, size_t len, unsigned char *out, int *savestate, int *saveme);
 size_t camel_quoted_encode_step(unsigned char *in, size_t len, unsigned char *out, int *state, int *save);
 size_t camel_quoted_encode_close(unsigned char *in, size_t len, unsigned char *out, int *state, int *save);

And because it is used so often, there are one-line helpers for base64 strings.

 char *camel_base64_encode_simple (const char *data, size_t len);
 size_t camel_base64_decode_simple (char *data, size_t len);

Raw Headers

A suite of functions exists for maintaining lists of headers. These are used by Camel.MimeParser and the Camel.DataWrapper implementations to maintain their internal header lists, which in some cases are exposed to client code.

 void camel_header_raw_append(struct _camel_header_raw **list, const char *name, const char *value, int offset);
 void camel_header_raw_append_parse(struct _camel_header_raw **list, const char *header, int offset);
 const char *camel_header_raw_find(struct _camel_header_raw **list, const char *name, int *offset);
 const char *camel_header_raw_find_next(struct _camel_header_raw **list, const char *name, int *offset, const char *last);
 void camel_header_raw_replace(struct _camel_header_raw **list, const char *name, const char *value, int offset);
 void camel_header_raw_remove(struct _camel_header_raw **list, const char *name);
 void camel_header_raw_fold(struct _camel_header_raw **list);
 void camel_header_raw_clear(struct _camel_header_raw **list);

This is a special function which scans a set of headers for mailing list markers - used to create the <code>mlist</code> token used for vFolder searches.

 char *camel_header_raw_check_mailing_list (struct _camel_header_raw **list);

Structured Headers

All the functions to decode, and perhaps encode, each of the structured headers in email and news messages are also defined in this module.

Simple Headers

A bunch of headers just consist of some content with some optional comments.

MIME:

 void camel_header_mime_decode(const char *in, int *maj, int *min);

Content-Location:

 char *camel_header_location_decode(const char *in);

Content-Id:

 char *camel_header_contentid_decode(const char *in);

Message-Id:

 char *camel_header_msgid_decode(const char *in);
 char *camel_header_msgid_generate(void);

Content-Transfer-Encoding: the first two functions should really be named encode/decode. The camel_content_transfer_encoding_decode function just parses an atom; perhaps it should be removed if it has no other reason to exist.

Apart from the basic standardised encodings, x-uuencode is also supported for convenience.

 typedef enum _CamelTransferEncoding {
        CAMEL_TRANSFER_ENCODING_DEFAULT,
        CAMEL_TRANSFER_ENCODING_7BIT,
        CAMEL_TRANSFER_ENCODING_8BIT,
        CAMEL_TRANSFER_ENCODING_BASE64,
        CAMEL_TRANSFER_ENCODING_QUOTEDPRINTABLE,
        CAMEL_TRANSFER_ENCODING_BINARY,
        CAMEL_TRANSFER_ENCODING_UUENCODE,
        CAMEL_TRANSFER_NUM_ENCODINGS
 } CamelTransferEncoding;
 
 const char *camel_transfer_encoding_to_string(CamelTransferEncoding encoding);
 CamelTransferEncoding camel_transfer_encoding_from_string(const char *string);
 
 char *camel_content_transfer_encoding_decode(const char *in);

Date headers

Dates are stored in RFC822 format - although try telling some shareware authors that. The decode_date will try to work around some of the more common date formatting bugs out in the wild, including using a ctime(3) string, but it is pointless and wasteful for it to attempt to decode everything.

Internally, dates are handled as a GMT-relative time plus an offset. The offset is a sort of binary coded decimal. That is, 930 means the timezone +0930, not 930 anything. This is so dates can be reliably compared (using the GMT format), and reliably reconstructed for locale and timezone-specific display.

 time_t camel_header_decode_date(const char *in, int *saveoffset);
 char *camel_header_format_date(time_t time, int offset);
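
A small sketch of how such an offset can be converted, assuming it is simply hours * 100 + minutes as described above. Both function names are hypothetical and not part of Camel; the code relies on C99 semantics, where integer division truncates toward zero.

```c
/*
 * Convert a decimal-coded timezone offset such as 930 (+0930) or
 * -330 (-0330) into seconds east of GMT.
 */
long demo_offset_to_seconds(int offset)
{
	int hours = offset / 100;	/* 930 -> 9, -330 -> -3 */
	int minutes = offset % 100;	/* 930 -> 30, -330 -> -30 */

	return hours * 3600L + minutes * 60L;
}

/* The reverse: seconds east of GMT back to the +hhmm style integer. */
int demo_seconds_to_offset(long seconds)
{
	long minutes = seconds / 60;

	return (int)((minutes / 60) * 100 + minutes % 60);
}
```

Adding the converted offset to the GMT time gives the sender's local time for display, while comparisons can use the GMT value alone.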

Address Headers

A comprehensive set of utility functions for parsing, encoding, and formatting RFC822 address lists also exists. This interface maintains the group syntax which Camel.Address#Camel.InternetAddress does not.

 typedef enum _camel_header_address_t {
        CAMEL_HEADER_ADDRESS_NONE,
        CAMEL_HEADER_ADDRESS_NAME,
        CAMEL_HEADER_ADDRESS_GROUP
 } camel_header_address_t;
 
 struct _camel_header_address {
        struct _camel_header_address *next;
        camel_header_address_t type;
        char *name;
        union {
                char *addr;
                struct _camel_header_address *members;
        } v;
        unsigned int refcount;
 };
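
The type field selects which member of the union is valid. The sketch below walks such a list using a simplified stand-in struct; demo_address, DEMO_ADDRESS_* and demo_count_mailboxes are hypothetical names that mirror the layout above without the refcount.

```c
#include <stddef.h>

typedef enum {
	DEMO_ADDRESS_NONE,
	DEMO_ADDRESS_NAME,
	DEMO_ADDRESS_GROUP
} demo_address_t;

/* Simplified copy of the layout above, for illustration only. */
struct demo_address {
	struct demo_address *next;
	demo_address_t type;
	const char *name;
	union {
		const char *addr;		/* valid for DEMO_ADDRESS_NAME */
		struct demo_address *members;	/* valid for DEMO_ADDRESS_GROUP */
	} v;
};

/* Count the mailboxes in a list, descending into groups. */
size_t demo_count_mailboxes(const struct demo_address *list)
{
	size_t n = 0;

	for (; list != NULL; list = list->next) {
		switch (list->type) {
		case DEMO_ADDRESS_NAME:
			n++;
			break;
		case DEMO_ADDRESS_GROUP:
			n += demo_count_mailboxes(list->v.members);
			break;
		default:
			break;
		}
	}
	return n;
}
```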

These functions are used to create lists of addresses, including groups.

 struct _camel_header_address *camel_header_address_new(void);
 struct _camel_header_address *camel_header_address_new_name(const char *name, const char *addr);
 struct _camel_header_address *camel_header_address_new_group(const char *name);
 void camel_header_address_ref(struct _camel_header_address *addrlist);
 void camel_header_address_unref(struct _camel_header_address *addrlist);
 void camel_header_address_set_name(struct _camel_header_address *addrlist, const char *name);
 void camel_header_address_set_addr(struct _camel_header_address *addrlist, const char *addr);
 void camel_header_address_set_members(struct _camel_header_address *addrlist, struct _camel_header_address *group);
 void camel_header_address_add_member(struct _camel_header_address *addrlist, struct _camel_header_address *member);
 void camel_header_address_list_append_list(struct _camel_header_address **addrlistp, struct _camel_header_address **addrs);
 void camel_header_address_list_append(struct _camel_header_address **addrlistp, struct _camel_header_address *addr);
 void camel_header_address_list_clear(struct _camel_header_address **addrlistp);

And converting those lists to text form.

 char *camel_header_address_list_encode(struct _camel_header_address *addrlist);
 char *camel_header_address_list_format(struct _camel_header_address *addrlist);

These are the primary entry points for parsing raw address headers. The charset argument provides a character set override for decoding incorrectly encoded headers.

 struct _camel_header_address *camel_header_address_decode(const char *in, const char *charset);
 struct _camel_header_address *camel_header_mailbox_decode(const char *in, const char *charset);

Header parameters

Because some structured headers consist of some fixed, header-specific portion, followed by a list of optional parameters, functions for processing parameters are available separately.

These conform to RFC2184.

Typed lists are used to make them easier to manage than using GLists, etc.

 struct _camel_header_param {
        struct _camel_header_param *next;
        char *name;
        char *value;
 };

 struct _camel_header_param *camel_header_param_list_decode(const char *in);
 char *camel_header_param(struct _camel_header_param *params, const char *name);
 struct _camel_header_param *camel_header_set_param(struct _camel_header_param **paramsp, const char *name, const char *value);
 void camel_header_param_list_format_append(GString *out, struct _camel_header_param *params);
 char *camel_header_param_list_format(struct _camel_header_param *params);
 void camel_header_param_list_free(struct _camel_header_param *params);
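
Looking up a parameter in such a typed list is a plain linked-list walk. The following is a sketch only: demo_param, demo_ascii_equal and demo_param_find are hypothetical, and the case-insensitive name match is an assumption based on MIME parameter names being case-insensitive, not a statement about how camel_header_param is implemented.

```c
#include <stddef.h>

/* Simplified stand-in for the struct above. */
struct demo_param {
	struct demo_param *next;
	const char *name;
	const char *value;
};

/* Case-insensitive string equality; assumes an ASCII character set. */
static int demo_ascii_equal(const char *a, const char *b)
{
	for (; *a && *b; a++, b++) {
		int ca = (*a >= 'A' && *a <= 'Z') ? *a + 32 : *a;
		int cb = (*b >= 'A' && *b <= 'Z') ? *b + 32 : *b;

		if (ca != cb)
			return 0;
	}
	return *a == *b;
}

/* Return the value of the first parameter called name, or NULL. */
const char *demo_param_find(const struct demo_param *params, const char *name)
{
	for (; params != NULL; params = params->next) {
		if (demo_ascii_equal(params->name, name))
			return params->value;
	}
	return NULL;
}
```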

Content-Type

Content-Type is complex enough to have its own object, which has accessors to the type, subtype, and parameters.

 typedef struct {
        char *type;
        char *subtype;
        struct _camel_header_param *params;
        unsigned int refcount;
 } CamelContentType;
 
 CamelContentType *camel_content_type_new(const char *type, const char *subtype);
 CamelContentType *camel_content_type_decode(const char *in);
 void camel_content_type_unref(CamelContentType *content_type);
 void camel_content_type_ref(CamelContentType *content_type);
 const char *camel_content_type_param(CamelContentType *content_type, const char *name);
 void camel_content_type_set_param(CamelContentType *content_type, const char *name, const char *value);

Always use this function to check for content-type matches. It performs the tests case-insensitively, handles missing types and subtypes in the header, and lets you pass "*" as the subtype to match only against the major type.

 int camel_content_type_is(CamelContentType *content_type, const char *type, const char *subtype);
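
The matching rules described above could look roughly like this. demo_content_type and demo_content_type_is are hypothetical, and treating a missing type as text/plain (the MIME default) is an assumption of this sketch rather than documented behaviour of camel_content_type_is.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in holding only the fields used here. */
struct demo_content_type {
	const char *type;
	const char *subtype;
};

/* Case-insensitive string equality using the C library's tolower(). */
static int demo_case_equal(const char *a, const char *b)
{
	while (*a && *b) {
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return 0;
		a++;
		b++;
	}
	return *a == *b;
}

/* Return 1 if ct matches type/subtype; a subtype of "*" matches anything. */
int demo_content_type_is(const struct demo_content_type *ct,
			 const char *type, const char *subtype)
{
	/* Assumption: no type at all is treated as text/plain. */
	const char *t = "text";
	const char *s = "plain";

	if (ct != NULL && ct->type != NULL) {
		t = ct->type;
		s = ct->subtype;	/* may be NULL */
	}

	if (!demo_case_equal(t, type))
		return 0;
	if (strcmp(subtype, "*") == 0)
		return 1;
	return s != NULL && demo_case_equal(s, subtype);
}
```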

This should really be called <code>encode</code> to be consistent with other functions.

 char *camel_content_type_format(CamelContentType *content_type);
 char *camel_content_type_simple(CamelContentType *content_type);

And even a debugging function for printing out the parsed structure.

 void camel_content_type_dump(CamelContentType *content_type);

Content-Disposition

This is similar to the Content-Type interface, although it is lacking the accessor functions; use the camel_header_param functions to access the parameters instead.

 typedef struct _CamelContentDisposition {
        char *disposition;
        struct _camel_header_param *params;
        unsigned int refcount;
 } CamelContentDisposition;
 
 CamelContentDisposition *camel_content_disposition_decode(const char *in);
 void camel_content_disposition_ref(CamelContentDisposition *disposition);
 void camel_content_disposition_unref(CamelContentDisposition *disposition);

Again, this should be encode.

 char *camel_content_disposition_format(CamelContentDisposition *disposition);

References headers

In-Reply-To, and the News References headers also get special treatment. The id will be the Message-Id with any < or > or quotes stripped.

 struct _camel_header_references {
        struct _camel_header_references *next;
        char *id;
 };
 
 struct _camel_header_references *camel_header_references_inreplyto_decode(const char *in);
 struct _camel_header_references *camel_header_references_decode(const char *in);
 void camel_header_references_list_clear(struct _camel_header_references **list);

This strange interface steals ref, that is, the list takes ownership of the string passed in, to avoid pointless strdup calls.

 void camel_header_references_list_append_asis(struct _camel_header_references **list, char *ref);
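
Ownership transfer of this kind can be sketched as below. demo_ref, demo_ref_list_append_asis and demo_ref_list_clear are hypothetical stand-ins, not Camel's code; the point is only that the list stores the caller's pointer directly and later frees it.

```c
#include <stdlib.h>

/* Simplified stand-in for the references list node. */
struct demo_ref {
	struct demo_ref *next;
	char *id;
};

/*
 * Append id to the end of the list WITHOUT copying it: on success the
 * list owns id and the caller must not free it.  Returns 0 on success,
 * or -1 if allocation fails, in which case id still belongs to the caller.
 */
int demo_ref_list_append_asis(struct demo_ref **list, char *id)
{
	struct demo_ref *node = malloc(sizeof(*node));

	if (node == NULL)
		return -1;
	node->next = NULL;
	node->id = id;

	/* Walk to the tail and link the new node in. */
	while (*list != NULL)
		list = &(*list)->next;
	*list = node;
	return 0;
}

/* Free every node together with the id strings the list took over. */
void demo_ref_list_clear(struct demo_ref **list)
{
	struct demo_ref *node = *list;

	while (node != NULL) {
		struct demo_ref *next = node->next;

		free(node->id);
		free(node);
		node = next;
	}
	*list = NULL;
}
```

The string handed to the append function must therefore be heap-allocated and must not be used by the caller after the list is cleared.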

This function should really be called length.

 int camel_header_references_list_size(struct _camel_header_references **list);

This function should probably be called clone.

 struct _camel_header_references *camel_header_references_dup (const struct _camel_header_references *list);

Newsgroups

The Newsgroups header again has its own typed functions, although only a simplified API exists; Camel.Address#Camel.NNTPAddress is more complete.

 struct _camel_header_newsgroup {
        struct _camel_header_newsgroup *next;
        char *newsgroup;
 };
 
 struct _camel_header_newsgroup *camel_header_newsgroups_decode(const char *in);
 void camel_header_newsgroups_free(struct _camel_header_newsgroup *ng);

Utility functions

And finally a few utility functions which don't fall into other categories.

Header folding

Normally headers are unfolded automatically by Camel.MimeParser, but in some cases you might need to unfold headers directly. A special version exists for address line folding as well.

 char *camel_header_address_fold(const char *in, size_t headerlen);
 char *camel_header_fold(const char *in, size_t headerlen);
 char *camel_header_unfold(const char *in);
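
For illustration, unfolding can be approximated as follows: every line break that is followed by whitespace is collapsed, together with that whitespace, into a single space. demo_unfold is a hypothetical function and only one plausible reading of the behaviour; it is not a copy of camel_header_unfold.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return a newly allocated copy of in with each fold (a line break
 * followed by spaces or tabs) replaced by a single space.  Line breaks
 * not followed by whitespace are copied unchanged.  The caller frees
 * the result; NULL is returned if allocation fails.
 */
char *demo_unfold(const char *in)
{
	/* The result is never longer than the input. */
	char *out = malloc(strlen(in) + 1);
	char *o = out;

	if (out == NULL)
		return NULL;

	while (*in) {
		const char *p = in;

		if (*p == '\r')
			p++;
		if (*p == '\n' && (p[1] == ' ' || p[1] == '\t')) {
			/* A fold: skip the break and all following whitespace. */
			p++;
			while (*p == ' ' || *p == '\t')
				p++;
			*o++ = ' ';
			in = p;
		} else {
			*o++ = *in++;
		}
	}
	*o = '\0';
	return out;
}
```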

In almost all cases, what Camel does with folding is pretty awful. Unfortunately, to get around a completely unrelated display-issue bug, the code which retained folded headers was removed from the mime parser; the legacy of that UI-related change is that Camel should store all headers in the 'raw' form, in a completely unfolded state. This is bad. Altering the headers in this way is not ideal; raw headers should truly be raw.

Some attempts have been made to fix this but the code is a little touchy; so it remains to this day.

Since folding is required for SMTP transport, which has a maximum line length of 998 characters, the above functions exist for manually folding lines; however, they do not (properly) take structured headers into account, so they are quite poor implementations.

Ideally, raw headers should be stored folded. Any functions which create raw header text (i.e. the encode functions) should create folded headers, and so on.

Apps/Evolution/Camel.MimeUtils (last edited 2013-08-08 22:50:06 by WilliamJonMcCann)