Camel.URL

CamelURL is a URL (in fact, URI) abstraction object (it isn't really an object either, it's just a struct). It provides a URI parser, an encoder, and logic to handle merging relative URIs with base URIs.

The API has a few niggles, but this is a robust and reliable URI object which should be used whenever URIs are processed by Camel code or clients.

Each URI string, when parsed, is split into its constituent parts and stored in a decoded form. Note, however, that URI query segments are not decoded.

 typedef struct _CamelURL {
        char  *protocol;
        char  *user;
        char  *authmech;
        char  *passwd;
        char  *host;
        int    port;
        char  *path;
        GData *params;
        char  *query;
        char  *fragment;
 } CamelURL;

It would probably be nicer if the params list was a typed structure rather than the amorphous GData type.

All but protocol may be NULL or unset. A <code>port</code> of 0 (zero) indicates no port is present.

The camel_url_new function requires a URL string. The exception it is passed is largely redundant, and only signifies a missing protocol. So, if you want to create a completely new CamelURL from which to build a URI string, you need to pass at least the protocol and a ':' to camel_url_new.

camel_url_copy will copy a CamelURL creating a new object; it should more consistently be called camel_url_clone.

 CamelURL *camel_url_new(const char *url_string, CamelException *ex);
 CamelURL *camel_url_copy(const CamelURL *in);
 void camel_url_free(CamelURL *url);

Since it is only a structure, it has its own free function.

Additionally, you can create a new CamelURL which is a base URI + a relative URI string. This will handle all the RFC cases properly and a few buggy common cases as well, forming a new URI from the combination.

 CamelURL *camel_url_new_with_base(CamelURL *base, const char *url_string);

A couple of utility functions exist to help store URLs directly in a GHashTable. These only take into account certain fields of the URL: protocol, user, authmech, host, path, query and port. All fields are compared case-sensitively.

 guint camel_url_hash(const void *v);
 int camel_url_equal(const void *v, const void *v2);
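As a rough illustration of how a hash and equality pair over a subset of fields can fit together, here is a standalone C sketch. It does not use Camel or GLib: the <code>MiniURL</code> struct and the <code>mini_url_hash</code> and <code>mini_url_equal</code> functions are hypothetical names invented for this example, and the FNV-1a style mixing is an assumption, not Camel's actual algorithm.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for CamelURL, reduced to the fields that take part
 * in hashing and comparison. This is not the real Camel structure. */
typedef struct {
    const char *protocol;
    const char *user;
    const char *authmech;
    const char *host;
    int         port;
    const char *path;
    const char *query;
} MiniURL;

/* Mix one optional string into the running hash; NULL contributes nothing. */
static unsigned int mix_string(unsigned int h, const char *s)
{
    if (s == NULL)
        return h;
    for (; *s != '\0'; s++) {
        h ^= (unsigned char) *s;
        h *= 16777619u;   /* FNV-1a prime, chosen only for illustration */
    }
    return h;
}

/* Hash callback with a GHashTable-style signature. */
unsigned int mini_url_hash(const void *v)
{
    const MiniURL *u = (const MiniURL *) v;
    unsigned int h = 2166136261u;

    assert(u != NULL);
    h = mix_string(h, u->protocol);
    h = mix_string(h, u->user);
    h = mix_string(h, u->authmech);
    h = mix_string(h, u->host);
    h = mix_string(h, u->path);
    h = mix_string(h, u->query);
    h ^= (unsigned int) u->port;
    return h;
}

/* Two optional strings match if both are NULL or both compare equal,
 * case-sensitively. */
static int same_string(const char *a, const char *b)
{
    if (a == NULL || b == NULL)
        return a == b;
    return strcmp(a, b) == 0;
}

/* Equality callback: returns non-zero when every compared field matches. */
int mini_url_equal(const void *v, const void *v2)
{
    const MiniURL *a = (const MiniURL *) v;
    const MiniURL *b = (const MiniURL *) v2;

    assert(a != NULL && b != NULL);
    return same_string(a->protocol, b->protocol)
        && same_string(a->user, b->user)
        && same_string(a->authmech, b->authmech)
        && same_string(a->host, b->host)
        && same_string(a->path, b->path)
        && same_string(a->query, b->query)
        && a->port == b->port;
}
```

The property that matters for any hash table is that two URLs which compare equal also hash to the same value. The sketch satisfies this because both functions look at exactly the same set of fields.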

Once a URL is created, it may be accessed and modified. All fields apart from the URI parameters are read by reading the member variables of the structure. These fields should never be written to; use the set accessors instead.

 const char *camel_url_get_param (CamelURL *url, const char *name);
 
 void camel_url_set_protocol(CamelURL *url, const char *protocol);
 void camel_url_set_user(CamelURL *url, const char *user);
 void camel_url_set_authmech(CamelURL *url, const char *authmech);
 void camel_url_set_passwd(CamelURL *url, const char *passwd);
 void camel_url_set_host(CamelURL *url, const char *host);
 void camel_url_set_port(CamelURL *url, int port);
 void camel_url_set_path(CamelURL *url, const char *path);
 void camel_url_set_param(CamelURL *url, const char *name, const char *value);
 void camel_url_set_query(CamelURL *url, const char *query);
 void camel_url_set_fragment(CamelURL *url, const char *fragment);

Note that set_path should take an absolute path. Setting values to NULL will remove them.

Once a URL is set up, the usual next step is to convert it into a simple string that can be saved or passed to other functions. camel_url_to_string accomplishes this, optionally hiding some potentially sensitive information in the output.

 #define CAMEL_URL_HIDE_PASSWORD        (1 << 0)
 #define CAMEL_URL_HIDE_PARAMS  (1 << 1)
 #define CAMEL_URL_HIDE_AUTH    (1 << 2)
 
 #define CAMEL_URL_HIDE_ALL (CAMEL_URL_HIDE_PASSWORD | CAMEL_URL_HIDE_PARAMS | CAMEL_URL_HIDE_AUTH)
 
 char *camel_url_to_string(CamelURL *url, guint32 flags);
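To make the effect of the flags more concrete, the following standalone C sketch shows how a serializer might honor them. It does not call Camel: <code>SketchURL</code> and <code>sketch_url_to_string</code> are hypothetical names, the <tt>user;auth=mech:passwd@host</tt> layout and the exact parts each flag suppresses are assumptions inferred from the flag names, and the <tt>use_ssl</tt> parameter in the usage note is made up. No URL encoding is applied here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Flag values copied from the listing above. */
#define CAMEL_URL_HIDE_PASSWORD (1 << 0)
#define CAMEL_URL_HIDE_PARAMS   (1 << 1)
#define CAMEL_URL_HIDE_AUTH     (1 << 2)
#define CAMEL_URL_HIDE_ALL (CAMEL_URL_HIDE_PASSWORD | CAMEL_URL_HIDE_PARAMS | CAMEL_URL_HIDE_AUTH)

/* Hypothetical, simplified stand-in for CamelURL. */
typedef struct {
    const char *protocol;
    const char *user;
    const char *authmech;
    const char *passwd;
    const char *host;
    const char *path;
    const char *params;   /* pre-formatted, e.g. ";name=value" */
} SketchURL;

/* Append src to the NUL-terminated string in buf without overflowing it. */
static void append(char *buf, size_t size, const char *src)
{
    size_t used = strlen(buf);

    if (used < size - 1)
        snprintf(buf + used, size - used, "%s", src);
}

/* Write a string form of u into buf, leaving out whatever flags hide. */
void sketch_url_to_string(const SketchURL *u, unsigned int flags,
                          char *buf, size_t size)
{
    assert(u != NULL && u->protocol != NULL && buf != NULL && size > 0);

    buf[0] = '\0';
    append(buf, size, u->protocol);
    append(buf, size, "://");

    if (u->user != NULL) {
        append(buf, size, u->user);
        if (u->authmech != NULL && !(flags & CAMEL_URL_HIDE_AUTH)) {
            append(buf, size, ";auth=");
            append(buf, size, u->authmech);
        }
        if (u->passwd != NULL && !(flags & CAMEL_URL_HIDE_PASSWORD)) {
            append(buf, size, ":");
            append(buf, size, u->passwd);
        }
        append(buf, size, "@");
    }

    if (u->host != NULL)
        append(buf, size, u->host);
    if (u->path != NULL)
        append(buf, size, u->path);
    if (u->params != NULL && !(flags & CAMEL_URL_HIDE_PARAMS))
        append(buf, size, u->params);
}
```

With a URL holding user <tt>notzed</tt>, mechanism <tt>PLAIN</tt>, password <tt>secret</tt>, host <tt>somewhere.com</tt>, path <tt>/Inbox</tt> and params <tt>;use_ssl=always</tt>, passing <code>CAMEL_URL_HIDE_ALL</code> to this sketch yields <tt>imap://notzed@somewhere.com/Inbox</tt>, which is the kind of string that is safe to log or display.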

Finally, there are some static utility functions for applying and removing 'URL encoding' on string fragments. This is where characters outside of a specific 8-bit range are escaped using the sequence '%XX', where XX are two hexadecimal nibbles representing the character code of the octet.

 char *camel_url_encode(const char *part, const char *escape_extra);
 void camel_url_decode(char *part);
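The '%XX' scheme described above can be sketched in plain C as follows. This is a standalone illustration, not Camel's implementation: <code>sketch_url_encode</code> and <code>sketch_url_decode</code> are hypothetical names that merely mirror the signatures listed above, and the set of characters left unescaped is an assumption made for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Return non-zero if c may appear unescaped. The "safe" set used here
 * (ASCII letters, digits and "-_.~/") is an assumption for this sketch. */
static int is_safe(unsigned char c, const char *escape_extra)
{
    if (escape_extra != NULL && c != '\0' && strchr(escape_extra, c) != NULL)
        return 0;
    return isalnum(c) || (c != '\0' && strchr("-_.~/", c) != NULL);
}

/* Percent-encode part into a newly malloc()ed string; the caller frees it.
 * Characters listed in escape_extra are escaped even if normally safe. */
char *sketch_url_encode(const char *part, const char *escape_extra)
{
    static const char hex[] = "0123456789ABCDEF";
    const unsigned char *p;
    char *out, *o;

    assert(part != NULL);
    out = (char *) malloc(strlen(part) * 3 + 1);  /* worst case: all %XX */
    if (out == NULL)
        return NULL;

    o = out;
    for (p = (const unsigned char *) part; *p != '\0'; p++) {
        if (is_safe(*p, escape_extra)) {
            *o++ = (char) *p;
        } else {
            *o++ = '%';
            *o++ = hex[*p >> 4];     /* high nibble */
            *o++ = hex[*p & 0x0F];   /* low nibble */
        }
    }
    *o = '\0';
    return out;
}

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hex_value(unsigned char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode %XX sequences in place; malformed sequences are copied unchanged. */
void sketch_url_decode(char *part)
{
    unsigned char *in = (unsigned char *) part;
    unsigned char *out = in;
    int hi, lo;

    assert(part != NULL);
    while (*in != '\0') {
        if (*in == '%' && (hi = hex_value(in[1])) >= 0
                       && (lo = hex_value(in[2])) >= 0) {
            *out++ = (unsigned char) (hi * 16 + lo);
            in += 3;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```

Decoding can work in place because the decoded form is never longer than the encoded one, which is presumably why the decode function listed above returns void and modifies its argument. Encoding can triple the length, so it has to allocate a new string.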

Internal Note

By convention, a CamelURL related object is named <code>url</code>, and the string representation of one is named <code>uri</code>, although no hard rule enforces this.

If the <code>host</code> is not present, then the URI created will be of the form <tt>protocol://path</tt>, and not the common form of <tt>protocol:///path</tt>. This is so that the conversion process is fully reversible. If you require a URI that has the 3 /'s, then call <code>camel_url_set_host(url, "")</code> to ensure the host is set.

Remember, don't ever use anything other than CamelURL to create or parse URIs in Camel.

Example: Creating a file URL

In many places, code will just use a <code>sprintf</code>(3) or similar function for creating a filename URL. This is extremely bad practice as you must make sure all of the encoding rules are properly followed. CamelURL simplifies this task.

        CamelURL *url;
        char *uri;
 
        /* No exception - this cannot fail */
        url = camel_url_new("file:", NULL);
        camel_url_set_path(url, "/home/notzed/Documents/Work Document.txt");
        uri = camel_url_to_string(url, 0);
        camel_url_free(url);
        /* g_free(uri) once it is no longer needed */

In this case, <code>uri</code> will be set to <tt>'file://home/notzed/Documents/Work%20Document.txt'</tt>.

Example: Parsing an incoming URL

Parsing a URL is simple:

        CamelURL *url;
 
        url = camel_url_new("imap://notzed@somewhere.com/Inbox", NULL);
        if (url == NULL) {
                /* error ... */
                return;
        }
 
        printf("protocol: %s\n", url->protocol);
        printf("host: %s\n", url->host?url->host:"&lt;unset&gt;");
        printf("path: %s\n", url->path?url->path:"&lt;unset&gt;");
        camel_url_free(url);

When this snippet is run, it should produce:

 protocol: imap
 host: somewhere.com
 path: /Inbox

Example: Resolving a relative URL

This is an HTML-related helper function. For example, if a document located at "<nowiki>http://somewhere.com/examples/example3.html</nowiki>" contains a link to "/examples/example4.html", the following code snippet can be used to calculate the new URI of "<nowiki>http://somewhere.com/examples/example4.html</nowiki>". It also handles the various other resolution rules described in RFC 1808.

        CamelURL *url, *base;
        char *uri;
 
        base = camel_url_new("<nowiki>http://somewhere.com/examples/example3.html</nowiki>", NULL);
        url = camel_url_new_with_base(base, "/examples/example4.html");
        uri = camel_url_to_string(url, 0);
        printf("uri: %s\n", uri);
        g_free(uri);
        camel_url_free(url);
 
        url = camel_url_new_with_base(base, "example4.html");
        uri = camel_url_to_string(url, 0);
        printf("uri: %s\n", uri);
        g_free(uri);
        camel_url_free(url);
        camel_url_free(base);

Should produce:

 <nowiki>uri: http://somewhere.com/examples/example4.html</nowiki>
 <nowiki>uri: http://somewhere.com/examples/example4.html</nowiki>

Both relative URIs resolve to the same location, so the two output lines are identical.

Apps/Evolution/Camel.URL (last edited 2013-08-08 22:50:04 by WilliamJonMcCann)