GXml is an introspectable GObject API for XML programming. Most functionality is provided by wrapping libxml2. It endeavours to comply as closely as it can with the DOM Level 1 Core API (and eventually Level 2, Level 3, etc.). In the future, it may provide its own native implementation, as well as a SAX API, an XPath API, etc.; patches welcome. This summer, my mentor is Owen Taylor.
This summer is dedicated to polishing GXml and addressing the issues that keep people from using it. Planned tasks include:
- fix memory management, moving to a model where the Document alone owns all nodes it creates or that are within its tree (DONE)
- fix names (API breaks!), most prominently GXmlDomNode -> GXmlNode (DONE)
- fix error handling: rather than continue to misuse GError, switch to emitting g_warning ()s when a program tries something it shouldn't (DONE)
- improve documentation by fixing doc bugs and adding more useful information (DONE)
- patch select projects (identified as yelp, glade, and libgdata) to further exercise the library (in progress)
- squash Bugzilla bugs (mostly DONE)
- measure performance (DONE)
GXml's reference handling was previously a mess. It's written in Vala, which makes it a little too easy to be thoughtless about references. Memory issues have been addressed in two parts:
1. Valgrind and tests
Valgrind and gdb were used to better understand where memory was used, which problems were GXml's fault (e.g. reference cycles, and failing to free a lot of libxml2-allocated memory), and which were just a byproduct of GType et al. That work produced a collection of .supp suppression files under tests/valgrind/ that filter out the known noise and help locate actual memory leaks. Consequently, most libxml2 memory leaks were squashed early this summer.
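To illustrate why reference cycles leak under reference counting, here is a minimal standalone C sketch. ToyNode, toy_ref () and toy_unref () are hypothetical stand-ins for GObject's g_object_ref ()/g_object_unref (), not GXml code. When two nodes hold strong references to each other, neither count ever reaches zero; the usual fix is to make one side a non-owning pointer, which is roughly what Vala's weak/unowned annotations express.

```c
#include <stdlib.h>

/* Hypothetical refcounted node standing in for a GObject; not GXml code. */
typedef struct ToyNode {
    int refcount;
    struct ToyNode *owned;     /* strong reference: this node holds a ref */
    struct ToyNode *borrowed;  /* weak reference: no ref held */
} ToyNode;

static int toy_frees = 0;      /* counts frees so leaks are observable */

static ToyNode *toy_new(void) {
    ToyNode *n = calloc(1, sizeof *n);
    if (n) n->refcount = 1;    /* the caller owns the initial reference */
    return n;
}

static ToyNode *toy_ref(ToyNode *n) {
    n->refcount++;
    return n;
}

static void toy_unref(ToyNode *n) {
    if (n == NULL || --n->refcount > 0)
        return;
    ToyNode *owned = n->owned; /* save before freeing n */
    free(n);
    toy_frees++;
    toy_unref(owned);          /* release the reference n held */
}

/* Parent and child hold strong references to each other: after the caller
   drops its own references, both refcounts stay at 1 and nothing is freed.
   (This demo deliberately leaks both nodes.) */
static int demo_cycle(void) {
    toy_frees = 0;
    ToyNode *parent = toy_new(), *child = toy_new();
    if (!parent || !child) { free(parent); free(child); return -1; }
    parent->owned = toy_ref(child);
    child->owned = toy_ref(parent);
    toy_unref(child);
    toy_unref(parent);
    return toy_frees;          /* 0: a leak of the kind Valgrind reports */
}

/* Same shape, but the child only borrows its parent pointer. */
static int demo_weak_backpointer(void) {
    toy_frees = 0;
    ToyNode *parent = toy_new(), *child = toy_new();
    if (!parent || !child) { free(parent); free(child); return -1; }
    parent->owned = toy_ref(child);
    child->borrowed = parent;  /* no toy_ref (): this breaks the cycle */
    toy_unref(child);
    toy_unref(parent);         /* frees parent, which then frees child */
    return toy_frees;          /* 2: both nodes released */
}
```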
2. Changing our ownership model
Previously, the caller owned any node it created. If you then inserted that node into the Document's tree, you and the document both held a reference to it, so you had to unref both the document and your node. However, in GXml nodes only make sense while their document exists (yes, we can imagine situations where you might want Just A Node or part of a tree). So, to greatly simplify reference handling, and with input from my mentor Owen Taylor, the Document alone now owns references to GXml nodes. The user only unrefs the GXmlDocument that owns them, and methods that used to return strong references now return weak references.
GXmlDocument is also responsible for freeing the libxml2 memory we allocate. (It just makes sense.)
We decided against adding a gxml_document_delete_node () operation.
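The document-owns-everything model can be sketched in plain C under simplifying assumptions: ToyDoc and ToyDocNode are hypothetical types, not GXml's; each node's heap-allocated name stands in for the libxml2 memory the document frees; and the tree structure itself is omitted. Creation returns a borrowed pointer, and there is no per-node delete.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of the "document owns every node" model; not GXml's API. */
typedef struct ToyDocNode {
    char *name;                 /* stands in for libxml2-allocated data */
} ToyDocNode;

typedef struct ToyDoc {
    int refcount;
    ToyDocNode **nodes;         /* every node this document has created */
    size_t n_nodes, cap;
} ToyDoc;

static int toydoc_nodes_freed = 0;

static ToyDoc *toydoc_new(void) {
    ToyDoc *d = calloc(1, sizeof *d);
    if (d) d->refcount = 1;     /* the caller's only reference */
    return d;
}

/* Returns a borrowed (weak) pointer, valid only while the document lives.
   The caller must never free it; d must be non-NULL. */
static ToyDocNode *toydoc_create_element(ToyDoc *d, const char *name) {
    if (d->n_nodes == d->cap) {
        size_t cap = d->cap ? d->cap * 2 : 4;
        ToyDocNode **grown = realloc(d->nodes, cap * sizeof *grown);
        if (!grown) return NULL;
        d->nodes = grown;
        d->cap = cap;
    }
    ToyDocNode *n = malloc(sizeof *n);
    if (!n) return NULL;
    size_t len = strlen(name) + 1;
    n->name = malloc(len);
    if (!n->name) { free(n); return NULL; }
    memcpy(n->name, name, len);
    d->nodes[d->n_nodes++] = n; /* the document records ownership */
    return n;
}

/* Dropping the last document reference frees every node and its data. */
static void toydoc_unref(ToyDoc *d) {
    if (d == NULL || --d->refcount > 0) return;
    for (size_t i = 0; i < d->n_nodes; i++) {
        free(d->nodes[i]->name);
        free(d->nodes[i]);
        toydoc_nodes_freed++;
    }
    free(d->nodes);
    free(d);
}
```

In this sketch, calling free () on a returned node would be a double free; the only cleanup the caller performs is the final toydoc_unref () on the document.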
Name changes, API changes
GXmlDomNode -> GXmlNode
It used to be GXmlDomNode for the benefit of Vala programmers: because GLib already has a GLib.Node, calling ours Node would mean always writing GXml.Node, whereas DomNode let us avoid specifying the namespace explicitly. However, that made the C API uglier. C already requires more typing, and you'd end up with common but clunky functions like gxml_dom_node_get_node_name (). So the rename is a small cosmetic win that also brings our node class closer to the DOM Level 1 Core spec, which simply calls it Node.
DomError -> DomException
Mostly for DOM spec compliance, since the spec calls it DOMException. It was called DomError only to fit in with GError naming, but because it's no longer a GError domain (see the error handling changes below), it now follows the proper name.
Early on, one question raised was how to handle errors. GErrors are meant for reporting runtime problems, not programmer errors. Reviewing the DOM exceptions the spec defines, most looked like programmer errors, so we decided to switch to g_warning ()s instead. This has the added benefit of shrinking the API for C users, since those methods no longer need a GError ** parameter.
Completing error checking
Most exception cases from the spec weren't being checked properly, or at all. Consequently, a lot of work has now gone into checking them. They were catalogued in a bug, and I believe all but one are now checked and generate a warning. They generally do not fail immediately or return NULL, though, since in many cases libxml2 actually supports the errant behaviour. There is also a mechanism for finding out the last error reported: GXml.last_error, a DomException value (DomException is an enum). Developers can test GXml.last_error to see whether everything is alright (== DomException.NONE) or, if there is an error, what type it is. Classic Unix, much like errno.
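A minimal C sketch of this warn-and-record pattern follows. The ToyDomException enum, toy_last_error and toy_append_child () are hypothetical names, not GXml's actual symbols; fprintf () to stderr stands in for g_warning (), and resetting the last error at the start of each call is an assumption of the sketch. The check shown is the DOM's WRONG_DOCUMENT_ERR for appending a node created by another document, and, as described above, the operation still proceeds after the warning.

```c
#include <stdio.h>

/* Hypothetical mirror of an errno-style "last error"; names are illustrative. */
typedef enum {
    TOY_DOM_EXCEPTION_NONE = 0,
    TOY_DOM_EXCEPTION_WRONG_DOCUMENT,
    TOY_DOM_EXCEPTION_HIERARCHY_REQUEST
} ToyDomException;

static ToyDomException toy_last_error = TOY_DOM_EXCEPTION_NONE;

typedef struct ToyElem {
    const void *owner_document;  /* which document created this node */
    struct ToyElem *parent;
} ToyElem;

/* Warn (standing in for g_warning ()) and record the error code. */
static void toy_report(ToyDomException code, const char *msg) {
    toy_last_error = code;
    fprintf(stderr, "WARNING: %s\n", msg);
}

static void toy_append_child(ToyElem *parent, ToyElem *child) {
    toy_last_error = TOY_DOM_EXCEPTION_NONE;  /* assumption: reset per call */
    if (child->owner_document != parent->owner_document)
        toy_report(TOY_DOM_EXCEPTION_WRONG_DOCUMENT,
                   "appendChild: node was created by a different document");
    /* Warn, but carry on anyway, as the underlying library would allow it. */
    child->parent = parent;
}
```

A caller then checks toy_last_error == TOY_DOM_EXCEPTION_NONE after a call, the same way C code checks errno.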
- Still want more negative tests.
Usage documentation was expanded and corrected. C developers should now have a better idea of how to manage memory, and JavaScript developers should see more use of properties.
Documentation is generated with Valadoc, but the generated gtk-doc has issues. Some information found in the Valadoc output (like the class hierarchy) is absent from the gtk-doc, and there are strange formatting issues (the table of contents is missing dashes for some classes). Some fixes have been made so far. We also want to link each element to the part of the DOM spec where it is defined, so developers can refer back to it. This has been done widely this summer, but some of the links point to non-unique anchors and need to be updated.
Limitations of Valadoc have kept us from being too picky about formatting.
To popularise GXml, we'll provide patches! So far, tentative patches have been written for three projects: yelp, glade, and libgdata.
The patches are not yet ready to submit because of API instability in GXml and changes in semantics (memory ownership in particular). Most likely, the current patches will soon be replaced by new ones.
- revise patches and push them upstream
A bug-hunting I do go.
There are also two patches floating around: one adding XPath support, by Adam Ples, and one for a new approach to serialization, by Daniel Espinosa. Sadly, both will probably need changes because of the API and ownership changes described above. :P
To see how GXml performs compared to libxml2 and what the penalty for wrapping it is, look here: http://blog.kosmokaryote.org/2013/09/gnome-final-report-for-gxml-in-2013.html
In short, memory usage increases by 15-20%, while the time overhead depends on file size: the smaller the file, the more noticeable the difference, reaching around 50% when loading small files. For larger files, the difference is obscured by all the time libxml2 itself spends parsing.
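One simple way to read the timing pattern, under the assumption (not something the measurements establish) that GXml's extra cost Δ(n) grows more slowly with file size n than libxml2's own parse time:

```latex
T_{\mathrm{GXml}}(n) = T_{\mathrm{libxml2}}(n) + \Delta(n), \qquad
r(n) = \frac{T_{\mathrm{GXml}}(n) - T_{\mathrm{libxml2}}(n)}{T_{\mathrm{libxml2}}(n)}
     = \frac{\Delta(n)}{T_{\mathrm{libxml2}}(n)}
```

If the ratio Δ(n)/T_libxml2(n) tends to zero as n grows, the relative overhead r(n) stays large for small files and shrinks for large ones, matching the pattern above. In the limiting case of a purely fixed overhead Δ(n) = c, doubling libxml2's parse time halves r.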