MIME filters are like cheese.

Actually they're nothing like cheese. Maybe a cheese grater.

Ok, let's start again. MIME filters are stream processing modules. They can be used individually to process data blocks, attached to streams using Camel.Stream#Camel.StreamFilter, or attached to Camel.MimeParser for pipelined decoding. The base class handles various memory related issues, and each implementation only needs to implement three simple virtual methods.

There are quite a number of built-in camel stream processing modules, and it is easy to add more if your application requires them.

Base class

There are 3 client interfaces of importance, which map to the virtual functions, but not quite directly:

 void camel_mime_filter_filter(CamelMimeFilter *filter,
                              char *in, size_t len, size_t prespace,
                              char **out, size_t *outlen, size_t *outprespace);
 void camel_mime_filter_complete(CamelMimeFilter *filter,
                                char *in, size_t len, size_t prespace,
                                char **out, size_t *outlen, size_t *outprespace);
 void camel_mime_filter_reset(CamelMimeFilter *filter);

The first is used to filter chunks of a stream, and the second is used for the last chunk of the stream, to indicate that the input is finished. After this is called any pending data must also be 'flushed' to the output buffer. <code>camel_mime_filter_reset</code> is used if the filter is to be re-used for another data stream, flushing any queued data.

The arguments require some explanation, and at first, they do not make obvious sense:


  • <code>in</code>: A pointer to the start of the input buffer. Note that this is not const; the buffer must be writable, and may be altered by the call.


  • <code>len</code>: How much data is present in the input buffer.


  • <code>prespace</code>: How much writable space is available immediately before the start of the input buffer. i.e. ideally there should be some number of spare bytes in front of the data, available for use by the filter stage.


  • <code>out</code>: After filtering, a pointer to a buffer containing the converted data. It may end up pointing to part of the input buffer.


  • <code>outlen</code>: How much data was created by the filter.


  • <code>outprespace</code>: How much space is available immediately before the start of the output buffer.

The reason for the prespace arguments is so that limited backup of input can be achieved efficiently. For example, if you are looking for the 6 characters "\nFrom " in the input stream, but only have the 3 characters "\nFr" available before the end of the buffer, you need to back up those 3 characters for the next pass. By having 3 characters available as pre-space, the base class camel_mime_filter_filter function will be able to simply copy 3 characters there, rather than having to copy the whole buffer to a new one. Let's just say that with a little bit of care it is easy to add efficient multistage pipelined processing without requiring redundant data copying, and the filter stages don't need to worry about keeping track of where they were.
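
The saving is easy to see in a standalone sketch that makes no Camel calls. The helper name <code>carry_into_prespace</code> and its arguments are made up for illustration; the real base class does this internally and may well do it differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of Camel: prepend `savedlen` backed-up
 * bytes to the block starting at `data`, using the writable pre-space in
 * front of it. Returns the new start of the block, or NULL when the
 * pre-space is too small and the caller would have to copy everything
 * into a bigger buffer instead.
 */
char *carry_into_prespace(char *data, size_t prespace,
                          const char *saved, size_t savedlen)
{
        assert(data != NULL && saved != NULL);
        if (savedlen > prespace)
                return NULL;
        /* only the few saved bytes move; the block itself stays put */
        memcpy(data - savedlen, saved, savedlen);
        return data - savedlen;
}
```

With "\nFr" saved from the previous pass and a new block beginning "om ", the returned pointer addresses "\nFrom ", and the new block was never copied.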

Note that if the filter is guaranteed to at most keep the data the same size, then it can just re-use the in pointer for the out pointer, and modify the data 'in-place'. For example, a quoted-printable decoder could just decode the data in place, whereas a quoted-printable encoder would potentially need extra space to encode into.

To aid this process, there are a couple of helper functions.

 void camel_mime_filter_backup(CamelMimeFilter *filter, const char *data, size_t length);

This indicates that <code>length</code> bytes of information at <code>data</code> must be kept, and re-attached to the start of the buffer at the next pass. This can be used inside of a <code>filter</code> method implementation to save the data.

 void camel_mime_filter_set_size(CamelMimeFilter *filter, size_t size, int keep);

If the filter needs to copy the content of the input buffer to another buffer, then it should use this function to specify how much space it needs in the output buffer. This will automatically allocate and/or resize the internal buffer, which is available through the outbuf member. The outpre and outsize members will also be adjusted accordingly. If keep is TRUE, then any data already in the buffer will be maintained, and the various pointers updated accordingly. Only keep the data if you're growing a partially processed buffer; if you calculate the maximum potential space required before you start any processing, there is nothing to keep, and skipping the copy is more efficient.

For example, a quoted-printable encoder may at most increase the data size by a factor of 3. The encoder could therefore always pre-allocate 3x the data size, and then it doesn't need to perform any size checking during its processing, leading to a significantly more efficient inner loop.
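
A minimal sketch of that idea, written against plain <code>malloc</code> rather than <code>camel_mime_filter_set_size</code> so that it stands alone. The function name <code>qp_encode_presized</code> is hypothetical, and the encoder is deliberately simplified: it inserts no soft line breaks and does not protect trailing spaces, both of which a real encoder must handle and budget extra room for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Simplified quoted-printable encoder, for illustration only. Each input
 * byte produces at most 3 output bytes ("=XX"), so a single allocation of
 * 3 * len bytes (plus a terminator) covers the worst case and the inner
 * loop needs no bounds checks at all.
 */
char *qp_encode_presized(const unsigned char *in, size_t len, size_t *outlen)
{
        static const char hex[] = "0123456789ABCDEF";
        char *out, *o;
        size_t i;

        assert(in != NULL && outlen != NULL);
        out = malloc(3 * len + 1);
        if (out == NULL)
                return NULL;
        o = out;
        for (i = 0; i < len; i++) {
                unsigned char c = in[i];

                if ((c >= 33 && c <= 126 && c != '=') || c == ' ' || c == '\n') {
                        *o++ = (char)c;         /* passes through unchanged */
                } else {
                        *o++ = '=';             /* escaped as =XX */
                        *o++ = hex[c >> 4];
                        *o++ = hex[c & 15];
                }
        }
        *o = '\0';
        *outlen = (size_t)(o - out);
        return out;
}
```

The caller owns the returned buffer and releases it with <code>free</code>.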

Using MimeFilters

Filters can be used in a variety of ways, directly, for streams or buffers of data, or attached as processing elements to other objects.

Processing individual blocks

To process an individual block of data or a string, <code>camel_mime_filter_complete</code> just needs to be called on the block.

Example: base64 a simple string

Converting a simple buffer is quite simple, you just call the complete function on the buffer.

 char *base64string(const char *in)
 {
        CamelMimeFilter *base64;
        char *buffer, *out;
        size_t len = strlen(in), outpre;
        base64 = camel_mime_filter_basic_new(CAMEL_MIME_FILTER_BASIC_BASE64_ENC);
        /* the filter may modify its input, so work on a writable copy */
        buffer = alloca(len+1);
        strcpy(buffer, in);
        camel_mime_filter_complete(base64, buffer, len, 0, &buffer, &len, &outpre);
        /* buffer now points at the filter's output; copy it and terminate it */
        out = g_malloc(len+1);
        memcpy(out, buffer, len);
        out[len] = 0;
        return out;
 }

Obviously, you would not normally use it in this way, since you cannot know if the conversion function needs a writable string or not. That's why there are often utility functions to perform basic operations on strings; they know whether the conversion function needs a writable string, and can avoid the redundant copy.

The real power comes in stream processing.

Processing Data Streams

To process a data stream directly, you go through the following steps:

# Create the filter object (or objects)
# Read a block of your data
# Pass it to <code>camel_mime_filter_filter</code> (for each object in turn)
#* Output from each pass is available after each call
# If at end of file, call <code>camel_mime_filter_complete</code> with the last block of data
# Clean up

Example: gzip a file using posix i/o

This will gzip in to out at compression level 9. Note that error checking is omitted for clarity.

 void gzipfile(const char *in, const char *out)
 {
        int fdin, fdout;
        char buffer[4096+64];
        char *outbuf;
        size_t outlen, outpre;
        ssize_t len;
        CamelMimeFilter *gzip;
        gzip = camel_mime_filter_gzip_new(CAMEL_MIME_FILTER_GZIP_MODE_ZIP, 9);
        fdin = open(in, O_RDONLY);
        fdout = open(out, O_WRONLY|O_CREAT|O_TRUNC, 0666);
        /* read past 64 bytes of pre-space, so the filter can back up data cheaply */
        while ((len = read(fdin, buffer+64, 4096)) > 0) {
                camel_mime_filter_filter(gzip, buffer+64, (size_t)len, 64, &outbuf, &outlen, &outpre);
                write(fdout, outbuf, outlen);
        }
        /* an empty final block flushes whatever the filter still holds */
        camel_mime_filter_complete(gzip, buffer+64, 0, 64, &outbuf, &outlen, &outpre);
        write(fdout, outbuf, outlen);
        close(fdin);
        close(fdout);
 }

Example: convert character set and gzip a file

Now to a bigger example; this one shows how multiple filters can be cascaded very easily. This example takes in, converts the character set from ISO-8859-1 to UTF-8, compresses the result using gzip, and then writes it to out.

 void gzipconvertfile(const char *in, const char *out)
 {
        int fdin, fdout;
        char buffer[4096+64];
        char *outbuf;
        size_t outlen, outpre;
        ssize_t len;
        CamelMimeFilter *gzip, *charset;
        charset = camel_mime_filter_charset_new("iso-8859-1", "utf-8");
        gzip = camel_mime_filter_gzip_new(CAMEL_MIME_FILTER_GZIP_MODE_ZIP, 9);
        fdin = open(in, O_RDONLY);
        fdout = open(out, O_WRONLY|O_CREAT|O_TRUNC, 0666);
        while ((len = read(fdin, buffer+64, 4096)) > 0) {
                /* the output of the first stage is the input of the second */
                camel_mime_filter_filter(charset, buffer+64, (size_t)len, 64, &outbuf, &outlen, &outpre);
                camel_mime_filter_filter(gzip, outbuf, outlen, outpre, &outbuf, &outlen, &outpre);
                write(fdout, outbuf, outlen);
        }
        /* flush both stages in the same order */
        camel_mime_filter_complete(charset, buffer+64, 0, 64, &outbuf, &outlen, &outpre);
        camel_mime_filter_complete(gzip, outbuf, outlen, outpre, &outbuf, &outlen, &outpre);
        write(fdout, outbuf, outlen);
        close(fdin);
        close(fdout);
 }

Processing Camel Streams

Because of the fiddly details of error checking, and how common this is, there is a stream processing stream which can greatly simplify common use of these stream processors: Camel.Stream#Camel.StreamFilter.

To use this, you create an output stream, then create a filter stream and add whatever processing elements you want, and then just write to it. It also works in the read direction.

Example: gzip a file using Camel streams

The logic is much simpler, even though not many lines of code are saved.

 void gzipfileCamel(const char *in, const char *out)
 {
        CamelStream *outstream, *instream;
        CamelStreamFilter *filter;
        CamelMimeFilter *gzip;
        instream = camel_stream_fs_new_with_name(in, O_RDONLY, 0);
        outstream = camel_stream_fs_new_with_name(out, O_WRONLY|O_CREAT|O_TRUNC, 0666);
        gzip = camel_mime_filter_gzip_new(CAMEL_MIME_FILTER_GZIP_MODE_ZIP, 9);
        /* everything written to the filter stream is gzipped into outstream */
        filter = camel_stream_filter_new_with_stream(outstream);
        camel_stream_filter_add(filter, gzip);
        camel_stream_write_to_stream(instream, (CamelStream *)filter);
 }

Example: convert character set and gzip a file

And for completeness, here is the two-stage pipeline conversion example:

 void gzipconvertfileCamel(const char *in, const char *out)
 {
        CamelStream *outstream, *instream;
        CamelStreamFilter *filter;
        CamelMimeFilter *gzip, *charset;
        instream = camel_stream_fs_new_with_name(in, O_RDONLY, 0);
        outstream = camel_stream_fs_new_with_name(out, O_WRONLY|O_CREAT|O_TRUNC, 0666);
        charset = camel_mime_filter_charset_new("iso-8859-1", "utf-8");
        gzip = camel_mime_filter_gzip_new(CAMEL_MIME_FILTER_GZIP_MODE_ZIP, 9);
        filter = camel_stream_filter_new_with_stream(outstream);
        /* add the stages in processing order: convert first, then compress */
        camel_stream_filter_add(filter, charset);
        camel_stream_filter_add(filter, gzip);
        camel_stream_write_to_stream(instream, (CamelStream *)filter);
 }

Processing MIMEParser data

This is pretty straightforward and works similarly to the CamelStreams example above.

See Camel.MimeParser for more information.

Stream processing Filters

That's really about it for the base class. The rest of this page describes the filters that are available; there are quite a few, some more re-usable than others.


This is a filter which implements a few of the basic encodings used in MIME transport, plus a few other related ones. It does encoding and decoding of Base64, Quoted-Printable, and UUEncoded data.

 typedef enum {
 } CamelMimeFilterBasicType;
 CamelMimeFilterBasic *camel_mime_filter_basic_new_type(CamelMimeFilterBasicType type);

There's not much that needs to be said about this, it is pretty well as advertised.


This is an end-of-line canonicalisation filter, it has a few options.

 enum {
        CAMEL_MIME_FILTER_CANON_CRLF = (1<<0), /* canonicalise end of line to crlf, otherwise canonicalise to lf only */
        CAMEL_MIME_FILTER_CANON_FROM = (1<<1), /* escape "^From " using quoted-printable semantics into "=46rom " */
        CAMEL_MIME_FILTER_CANON_STRIP = (1<<2), /* strip trailing space */
 };
 CamelMimeFilter *camel_mime_filter_canon_new(guint32 flags);


  • CAMEL_MIME_FILTER_CANON_CRLF: If set, then all line endings are canonicalised to \r\n. If not set, then they are all canonicalised to \n. Unix \n, Macintosh \r, and network \r\n line endings are all recognised in the input.


  • CAMEL_MIME_FILTER_CANON_FROM: Any "^From " lines present are escaped in quoted-printable format. i.e. "From " becomes "=46rom ".


  • CAMEL_MIME_FILTER_CANON_STRIP: Any trailing white-space is stripped from lines.

Of course, any combination of the above flags may be supplied at the same time.
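
The line-ending part can be sketched without Camel as follows. The function <code>canon_eol</code> and its <code>crlf</code> argument are hypothetical stand-ins for the filter and its CRLF flag, and the sketch works on one complete buffer; a streaming filter would additionally have to back up a trailing \r until it has seen the next block.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the line-ending part of the canon filter:
 * rewrites "\r\n", a lone "\r" and a lone "\n" to "\r\n" when `crlf` is
 * non-zero, or to "\n" otherwise. `out` must have room for 2 * len bytes.
 * Returns the number of bytes written.
 */
size_t canon_eol(const char *in, size_t len, char *out, int crlf)
{
        size_t i, o = 0;

        assert(in != NULL && out != NULL);
        for (i = 0; i < len; i++) {
                if (in[i] == '\r' || in[i] == '\n') {
                        /* treat "\r\n" as a single line ending */
                        if (in[i] == '\r' && i + 1 < len && in[i + 1] == '\n')
                                i++;
                        if (crlf)
                                out[o++] = '\r';
                        out[o++] = '\n';
                } else {
                        out[o++] = in[i];
                }
        }
        return o;
}
```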


This is a character set conversion filter, which uses iconv(3) to convert a stream of data in one character set into another character set. It tries to recover as best it can from invalid input; it tries to drop unconvertible characters, to ensure the output data is always in a valid format. Note that whether it is successful at this depends highly on the system's underlying iconv() implementation.

 CamelMimeFilterCharset *camel_mime_filter_charset_new_convert(const char *from_charset, const char *to_charset);


This is another line-ending canonicalisation filter, that should probably have its extra functionality merged into the #Camel.MimeFilterCanon filter. But it has one extra feature, the ability to encode lines suitable for SMTP and NNTP transmission. i.e. escaping leading '.' characters with an extra '.'.

 typedef enum {
 } CamelMimeFilterCRLFDirection;
 typedef enum {
 } CamelMimeFilterCRLFMode;
 CamelMimeFilter *camel_mime_filter_crlf_new(CamelMimeFilterCRLFDirection direction, CamelMimeFilterCRLFMode mode);


  • Will ensure the output contains line endings using \r\n always. It only detects network \r\n and unix \n line endings.


  • Converts any instance of \r\n into \n. Individual \r's are copied directly to the output.


  • Escape any '\n.' into '\n..'. This is used for at least SMTP data when sending a message, since '\r\n.\r\n' is used to finish the message content.


  • Uh, don't do the above; only the line endings are converted.
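
The dot escaping is simple enough to sketch in isolation. <code>dot_stuff</code> is a hypothetical name, not the filter's implementation, and it assumes that the very start of the stream also counts as the start of a line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch of SMTP-style dot escaping: copies `in` to `out`,
 * doubling any '.' that begins a line. `out` must have room for 2 * len
 * bytes. `*bol` carries the "at beginning of line" state from one chunk
 * to the next; initialise it to 1 before the first chunk.
 * Returns the number of bytes written.
 */
size_t dot_stuff(const char *in, size_t len, char *out, int *bol)
{
        size_t i, o = 0;

        assert(in != NULL && out != NULL && bol != NULL);
        for (i = 0; i < len; i++) {
                if (*bol && in[i] == '.')
                        out[o++] = '.';         /* the extra escaping dot */
                out[o++] = in[i];
                *bol = (in[i] == '\n');
        }
        return o;
}
```

Keeping the state in <code>*bol</code> means a '.' that lands at the start of a new chunk is still escaped correctly, without having to back up any input.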


This filter will convert the rather rare and obsolete text/enriched or text/richtext into text/html, for display purposes. This is merely here because many of the mail examples in the early RFCs use it as an example. Nobody uses it anymore.

 CamelMimeFilter *camel_mime_filter_enriched_new(guint32 flags);

CAMEL_MIME_FILTER_ENRICHED_IS_RICHTEXT alters the processing slightly to handle the text/richtext format.

And because this is so often used(!), there is a helper function to operate on a single string at a time:

 char *camel_enriched_to_html(const char *in, guint32 flags);


This is a one-way filter for performing Berkeley Mailbox "From" line munging. In that process, lines beginning with "From " are converted to ">From ".

 CamelMimeFilterFrom *camel_mime_filter_from_new(void);

Note that this is not the same as some variants which exist these days, some of which are reversible. i.e. they are escaping and not munging. There is no standard defined for any of this; as such, this just does the lowest-common-denominator version, although a good argument can be made for implementing a reversible process.

This is used by any code which needs to create a Berkeley Mailbox file - e.g. the mbox backend, or the "Save as" function.
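
As a rough sketch of the munging itself, working on a whole buffer rather than a stream: <code>from_munge</code> is a hypothetical name, and treating the first line of the buffer as a candidate is an assumption. A streaming version would need to back up a partial "From " match at the end of each block.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical whole-buffer version of "From " munging: every line that
 * begins with "From " gets a '>' put in front of it. `out` must have room
 * for 2 * len bytes. Returns the number of bytes written.
 */
size_t from_munge(const char *in, size_t len, char *out)
{
        size_t i, o = 0;
        int bol = 1;    /* the start of the buffer counts as a line start */

        assert(in != NULL && out != NULL);
        for (i = 0; i < len; i++) {
                if (bol && len - i >= 5 && memcmp(in + i, "From ", 5) == 0)
                        out[o++] = '>';
                out[o++] = in[i];
                bol = (in[i] == '\n');
        }
        return o;
}
```

Note that nothing marks a line which already began with ">From ", which is exactly why the process is not reversible.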


As its name implies, this is a compression filter. It is doubtful this is used anywhere; it was written as an example and an exercise of writing such a filter.

 typedef enum {
 } CamelMimeFilterGZipMode;
 CamelMimeFilter *camel_mime_filter_gzip_new(CamelMimeFilterGZipMode mode, int level);

The API should be pretty self-explanatory.


This could more correctly be called HTMLStrip. This is a simple filter that strips tags from HTML content and converts it to text, decoding any entities. It doesn't do any fancy formatting and may not do all entities, but is used as part of the body-content indexing code.

 CamelMimeFilterHTML *camel_mime_filter_html_new(void);


This filter will perform word-aligned line-wrapping of the input.

 CamelMimeFilter *camel_mime_filter_linewrap_new (guint preferred_len, guint max_len, char indent_char);


  • <code>preferred_len</code>: If a word-aligned block is longer than this, then insert a \n into the stream and continue.


  • <code>max_len</code>: If the word is longer than max_len, then insert a \n followed by <code>indent_char</code> and then continue.


  • <code>indent_char</code>: The character to insert for lines exceeding max_len.

I am not sure exactly how re-usable this is.


This is a specialised filter which strips the non-content information from an OpenPGP (aka 'inline pgp') signed message.

 CamelMimeFilter *camel_mime_filter_pgp_new(void);


This is a pretty comprehensive filter used to convert plain text into HTML. The input text must be UTF8 format.

It has numerous options.

 #define CAMEL_MIME_FILTER_TOHTML_PRE               (1 << 0)
 #define CAMEL_MIME_FILTER_TOHTML_CONVERT_NL        (1 << 1)
 #define CAMEL_MIME_FILTER_TOHTML_ESCAPE_8BIT       (1 << 6)
 #define CAMEL_MIME_FILTER_TOHTML_CITE              (1 << 7)
 CamelMimeFilter *camel_mime_filter_tohtml_new (guint32 flags, guint32 colour);


  • The output should be in a <PRE> block.


  • Any newlines (\n) in the input are converted to <br>


  • Any spaces or tabs in the input are converted to &nbsp;


  • Detectable URLs present are converted to <a HREF=...> references. The URLs currently detected are listed in the source file. Certain URLs are slightly mis-handled, and 8 bit characters in URL names are not properly converted.


  • Any lines beginning with ">[> ]*" are assumed to be quoted messages, and are highlighted in a different colour (as supplied by the <code>colour</code> argument to the <code>new</code> method).


  • Email addresses are converted to mailto: links.


  • Any unknown 8-bit characters are converted to '?' characters.


  • All lines are preceded with '> ' characters. This option is mutually exclusive with CAMEL_MIME_FILTER_TOHTML_MARK_CITATION.


  • Isn't implemented, unknown origin.


  • Isn't implemented either.

Because this is actually used quite often, there is also a helper function used to convert a single string at a time:

 char *camel_text_to_html(const char *in, guint32 flags, guint32 colour);


"y" encoding is a particularly brain-dead, completely non-standard, badly designed transport encoding sometimes used by news readers. Its purpose is a pretty awful attempt to re-invent the 'binary' encoding type, with less sense involved.

 typedef enum {
 } CamelMimeFilterYencDirection;
 CamelMimeFilter *camel_mime_filter_yenc_new(CamelMimeFilterYencDirection direction);

Don't ever use it to encode data. It was only ever implemented so completely as an exercise.

Stream calculation Filters

A second class of filters are ones which do not alter the data, but perform some sort of processing on the data which passes through them. Most of these are used by creating them, adding them to a stream processor, then passing the stream through them completely. Afterwards, the results are read from the object, and then the whole lot discarded.


This filter can be used to evaluate what is the best type of encoding to use for a given stream. It has a few specialized options specifically for processing mail intended for storage in Berkeley Mailbox format, or SMTP transport.

 typedef enum _CamelBestencRequired {
        CAMEL_BESTENC_LF_IS_CRLF = 1<<8,
        CAMEL_BESTENC_NO_FROM = 1<<9,
 } CamelBestencRequired;
 CamelMimeFilterBestenc *camel_mime_filter_bestenc_new(unsigned int flags);
 void camel_mime_filter_bestenc_set_flags(CamelMimeFilterBestenc *filter, unsigned int flags);

set_flags can be used to re-use the filter object if it has been reset, and another stream with different options is required.


  • The processor should calculate the ideal type of Content-Transfer-Encoding for this data stream. i.e. 'binary', 'base64', 'quoted-printable', etc.


  • The processor should try to calculate the best character set for this stream of data. This is a more complicated task, but it works generally ok for most 8 bit character sets. For most others, or for mixed characters, it will fall back to UTF-8.


  • If set, then line feed (\n) should be treated the same as a canonical crlf (\r\n). Basically, if this isn't set, then the data must be in network format, i.e. CRLF terminated. If any stray LF's are detected, then it will force some form of encoding to ensure they are properly transmitted to the receiving end.


  • If set, then any lines of the form "From ", will enforce at least base-64 encoding, to ensure no From lines are ever present. This is generally not used, but the [[#Camel.MimeFilterCanon]] filter is used instead, which uses a cleaner quoted-printable encoding for "From " lines. "^From " lines are used to separate messages in Berkeley Mailbox format messages, which is why this feature is required.

Once all of the data has passed through the filter, the following functions can be used to query the results:

 typedef enum _CamelBestencEncoding {
        CAMEL_BESTENC_TEXT = 1<<8,
 } CamelBestencEncoding;

 CamelTransferEncoding camel_mime_filter_bestenc_get_best_encoding(CamelMimeFilterBestenc *filter, CamelBestencEncoding required);
 const char *camel_mime_filter_bestenc_get_best_charset(CamelMimeFilterBestenc *filter);

These will only make sense if the appropriate flag was passed to the new method. For get_best_charset, NULL indicates US-ASCII; otherwise a charset name appropriate for use under RFC 2046 will be returned.

get_best_encoding takes more optional values of what type of encoding is required for the output. It will then calculate the minimum encoding required to satisfy both the type of data present, and the required maximum level of output.


  • A fully 7-bit compatible encoding is required. e.g. normal SMTP. The result may be anything from '7bit' to 'base64'.


  • 8-bit encoding is available, which is fully 8bit data, with no NUL characters present. e.g. SMTP with 8 bit available.


  • Binary encoding is available, which is 8bit data with NUL characters allowed. It will fall back to 8bit if no NUL characters are present.


  • An optional flag which applies to the previous types. If set, then the data is intended for use as textual content. Internally this means it will prefer to use quoted-printable rather than base64, where quoted-printable will not expand the data appreciably bigger than base64 will. For non-textual data, it will prefer base64, if that is appropriate. This is because some quoted-printable decoders are a bit relaxed about some of the end of line rules, and may corrupt binary data.
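
The decision described above can be approximated by a toy sketch. Everything in it is an assumption made for illustration: the <code>sketch_</code> enums, the function name, and the size comparison standing in for "not appreciably bigger". It also ignores line lengths and the LF and From options, so it should not be read as the filter's real heuristics.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the Camel enums, to keep the sketch standalone. */
typedef enum { NEED_7BIT, NEED_8BIT, NEED_BINARY } sketch_required;
typedef enum { ENC_7BIT, ENC_8BIT, ENC_BINARY, ENC_QP, ENC_BASE64 } sketch_encoding;

/*
 * Pick the smallest encoding that satisfies both the data and the
 * transport limit given by `required`. `is_text` plays the role of the
 * optional text flag.
 */
sketch_encoding sketch_best_encoding(const unsigned char *data, size_t len,
                                     sketch_required required, int is_text)
{
        size_t i, high = 0, nul = 0, qp_size, b64_size;

        assert(data != NULL);
        for (i = 0; i < len; i++) {
                if (data[i] == 0)
                        nul++;
                else if (data[i] & 0x80)
                        high++;
        }
        if (high == 0 && nul == 0)
                return ENC_7BIT;                /* nothing needs protecting */
        if (required == NEED_BINARY)
                return nul ? ENC_BINARY : ENC_8BIT;
        if (required == NEED_8BIT && nul == 0)
                return ENC_8BIT;
        /* must encode: compare the rough cost of the two candidates */
        qp_size = 3 * (high + nul) + (len - high - nul);
        b64_size = (len + 2) / 3 * 4;
        if (is_text && qp_size <= b64_size)
                return ENC_QP;
        return ENC_BASE64;
}
```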


This filter will take the input characters, adding them to a CamelIndex full content indexer. The indexer will split the content into words as appropriate and add it to the index.

 CamelMimeFilterIndex      *camel_mime_filter_index_new_index(struct _CamelIndex *index);
 void camel_mime_filter_index_set_name (CamelMimeFilterIndex *filter, struct _CamelIndexName *name);
 void camel_mime_filter_index_set_index (CamelMimeFilterIndex *filter, struct _CamelIndex *index);

The name or index may be set to NULL to cause no operation. If both are set, then the words in the stream are indexed under name, which inside camel will be the uid of the message.


This filter will save any content passing through it to a new stream. This may be used to save partial results in a filter pipeline, or just as another way to write to a stream.

 CamelMimeFilter *camel_mime_filter_save_new_with_stream(CamelStream *stream);


Certain Windoze based mailers produce incorrectly labelled emails. Mails will claim to be one of the 8 bit iso- character sets, but will contain glyphs from the upper-control set, which are used by nearly-compatible Windoze based Windows-CP125x character sets.

 CamelMimeFilter *camel_mime_filter_windows_new (const char *claimed_charset);

The claimed_charset is the character set defined in the Content-Type header for this section of data.

After the filter has processed the data, it can be queried:

 gboolean camel_mime_filter_windows_is_windows_charset (CamelMimeFilterWindows *filter);
 const char *camel_mime_filter_windows_real_charset (CamelMimeFilterWindows *filter);

Apps/Evolution/Camel.MimeFilter (last edited 2013-08-08 22:50:02 by WilliamJonMcCann)