New Document Loading

What's this about?

Using the new filters (gconverter and utf8 validator), we can convert and validate the text asynchronously. This means that we no longer need an intermediate buffer (or maybe we do, as explained below), so we use less memory. In addition, unsetting the buffer from the view, loading into the buffer, and then setting it back on the view makes loading a lot faster. Loading a file of 100k lines takes about 30 seconds with the current approach and about 0.30 seconds the other way.
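To illustrate why chunk-wise validation removes the need for an intermediate buffer, here is a minimal sketch in plain C. It is not the actual gedit or GIO filter: the names Utf8Stream, utf8_stream_feed and utf8_stream_finish are made up for this example, and the check is simplified (it does not reject overlong forms or surrogates). The only state carried between chunks is the number of continuation bytes still expected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical streaming state: continuation bytes still expected
 * by the multi-byte sequence that was open when the last chunk ended. */
typedef struct {
    int pending;
} Utf8Stream;

/* Feed one chunk. Returns false as soon as a malformed byte is seen.
 * Simplified on purpose: overlong forms and surrogates are not rejected. */
bool
utf8_stream_feed (Utf8Stream *s, const unsigned char *chunk, size_t len)
{
    assert (s != NULL);

    for (size_t i = 0; i < len; i++) {
        unsigned char c = chunk[i];

        if (s->pending > 0) {
            if ((c & 0xC0) != 0x80)
                return false;      /* expected a continuation byte */
            s->pending--;
        } else if (c < 0x80) {
            /* plain ASCII, nothing to track */
        } else if ((c & 0xE0) == 0xC0) {
            s->pending = 1;        /* start of a 2-byte sequence */
        } else if ((c & 0xF0) == 0xE0) {
            s->pending = 2;        /* start of a 3-byte sequence */
        } else if ((c & 0xF8) == 0xF0) {
            s->pending = 3;        /* start of a 4-byte sequence */
        } else {
            return false;          /* stray continuation or invalid lead byte */
        }
    }
    return true;
}

/* At end of input, no sequence may be left open. */
bool
utf8_stream_finish (const Utf8Stream *s)
{
    assert (s != NULL);
    return s->pending == 0;
}
```

Each chunk can be handed to the text buffer as soon as utf8_stream_feed () accepts it, with one caveat: a chunk that ends in the middle of a character has to hold back its trailing bytes until the next chunk arrives.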

Ideas

Create a new document

We create another document when loading the document, and we set it to the view when the document is loaded. Problems:

  • Code currently connects to the document when the tab is created.
  • We would need another signal, document-created or something like that, to know that the document has changed.
  • This breaks the current approach, so lots of plugins and a big part of the core would probably need to change.

Ref the doc, unset it from the view and reset it when loaded

Problems:

  • The main problem here is that lots of things start listening to insert-text when we create the tab, so while the file is loading we emit the insert-text signal all the time, which means that we are screwed.
  • If we could stop the insert-text signal globally, maybe this could work, but it could probably have other side effects internally.

Use an intermediate buffer

This would be a mix of what we have right now and the previous one. Problems:

  • Not sure if it would work, as at least one insert-text would be emitted before the buffer is set on the view.
  • We would use twice the memory.
  • It would be slower.

Avoid firing the signals

You can either stop the signal as it is being handled or just avoid firing it altogether.

The first way is to stop the signal. This is probably the most future-proof as it uses existing API, but not the most efficient/clean:

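 handler = g_signal_connect (buffer, "insert-text", G_CALLBACK (myhack), NULL);
 /* ... insert some text ... */
 g_signal_handler_disconnect (buffer, handler);
 
 /* Stops the emission for the remaining handlers, then chains to the default one. */
 static void
 myhack (GtkTextBuffer *buffer, GtkTextIter *location,
         gchar *text, gint len, gpointer data)
 {
   g_signal_stop_emission_by_name (buffer, "insert-text");
   GTK_TEXT_BUFFER_GET_CLASS (buffer)->insert_text (buffer, location, text, len);
 }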

This will have the effect of blocking all the signal handlers but the default one, which is what you want, as the default one is the one that actually inserts the text (we chain to it in myhack, as you can see). I (SteveFrécinaux) think this should work (but it won't stop the signal machinery from being fired).
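The mechanism can be seen in a toy model that does not use GObject at all. Everything here (ToyBuffer, toy_connect, toy_emit_insert and so on) is invented for illustration. It only approximates real signal emission, where ordering also depends on the signal flags and on when each handler was connected.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_HANDLERS 8

typedef struct ToyBuffer ToyBuffer;
typedef void (*ToyHandler) (ToyBuffer *buf, const char *text);

struct ToyBuffer {
    char       content[256];
    ToyHandler handlers[MAX_HANDLERS]; /* user handlers, in connection order */
    size_t     n_handlers;
    int        stopped;                /* set by toy_stop_emission () */
    int        notified;               /* how many times a listener ran */
};

/* Default ("class") handler: the only one that really inserts text. */
void
toy_default_insert (ToyBuffer *buf, const char *text)
{
    strncat (buf->content, text,
             sizeof buf->content - strlen (buf->content) - 1);
}

void
toy_connect (ToyBuffer *buf, ToyHandler h)
{
    assert (buf->n_handlers < MAX_HANDLERS);
    buf->handlers[buf->n_handlers++] = h;
}

void
toy_stop_emission (ToyBuffer *buf)
{
    buf->stopped = 1;
}

/* Emission: user handlers first, default handler last, unless stopped. */
void
toy_emit_insert (ToyBuffer *buf, const char *text)
{
    buf->stopped = 0;
    for (size_t i = 0; i < buf->n_handlers && !buf->stopped; i++)
        buf->handlers[i] (buf, text);
    if (!buf->stopped)
        toy_default_insert (buf, text);
}

/* The hack: stop the emission, then chain to the default handler by hand. */
void
toy_myhack (ToyBuffer *buf, const char *text)
{
    toy_stop_emission (buf);
    toy_default_insert (buf, text);
}

/* Stands in for a plugin listening to insert-text. */
void
toy_listener (ToyBuffer *buf, const char *text)
{
    (void) text;
    buf->notified++;
}
```

If toy_myhack is connected before toy_listener, emitting an insert still adds the text exactly once and the listener never runs. In this model a listener connected earlier than the hack would still be called, so the order of connection matters.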

The second way is to cut and paste the gtk_text_buffer_insert implementation into gedit, under a name like gedit_text_buffer_no_signal(), and make it call class->insert_text directly instead of firing the signal. Then the loader can use this new function directly. After looking at the source, gtk_text_buffer_insert () only calls emit_insert(), which in turn calls g_signal_emit if the length of the text is > 0, along with some other sanity checks that we will need to reproduce.
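As a rough picture of the difference between the two paths, here is a self-contained toy model in plain C. It does not use GTK: ToyBuffer, ToyBufferClass and the toy_* functions are hypothetical stand-ins, and the "emission" is reduced to a counter. The point is only that the no-signal variant repeats the same sanity checks and then calls the class function directly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct ToyBuffer ToyBuffer;

/* Stand-in for the class vtable holding the default insert_text handler. */
typedef struct {
    void (*insert_text) (ToyBuffer *buf, const char *text, int len);
} ToyBufferClass;

struct ToyBuffer {
    const ToyBufferClass *klass;
    char   content[256];
    size_t used;
    int    emissions;   /* counts simulated "insert-text" emissions */
};

/* The real work: append up to len bytes, truncating if the toy buffer is full. */
void
toy_real_insert (ToyBuffer *buf, const char *text, int len)
{
    size_t room = sizeof buf->content - 1 - buf->used;
    size_t n = (size_t) len < room ? (size_t) len : room;

    memcpy (buf->content + buf->used, text, n);
    buf->used += n;
    buf->content[buf->used] = '\0';
}

const ToyBufferClass toy_class = { toy_real_insert };

/* Signal path: sanity checks, then "emit" (listeners would run here). */
void
toy_insert (ToyBuffer *buf, const char *text, int len)
{
    assert (buf != NULL && text != NULL);

    if (len < 0)
        len = (int) strlen (text);
    if (len > 0) {
        buf->emissions++;
        buf->klass->insert_text (buf, text, len);
    }
}

/* No-signal path: same checks, but the class function is called directly. */
void
toy_insert_no_signal (ToyBuffer *buf, const char *text, int len)
{
    assert (buf != NULL && text != NULL);

    if (len < 0)
        len = (int) strlen (text);
    if (len > 0)
        buf->klass->insert_text (buf, text, len);
}
```

Both functions leave the buffer in the same state. The only observable difference in this model is that toy_insert_no_signal () never bumps the emission counter.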

The third way would be to modify gtk and add a thaw_insertion_signal method to gtktextbuffer, similar to thaw_notify. I'm not sure it would fit in gtk anyway, as I think it could make code that doesn't expect it go crazy (because the state would be inconsistent between the signal listener and the buffer after an insert for which the insert-text signal was never fired).

So I guess we could just take the easy way and have a gedit_text_document_insert_no_signal ().

Note that we might need to actually fire an "insert-text" signal afterwards so that plugins can sync if they need to. This could be done by making our default handler inoperative somehow, but it is probably not needed at first...

Don't do anything

Just load the file async and wait for someone else to fix it :)

The first solution is probably the one I like most, but it should go in for the 3.0 release.

Conclusions

After trying all the proposed ideas, here is what I found:

  1. Using a new document is fast but breaks the API/ABI (about 30 seconds to load a 10 million line file).
  2. Unsetting the document is not a solution: when it is unset, the properties seem to get notified, so we try to get the data from them and we get assertions. The insert-text signal is also emitted, and we try to get the doc, which is NULL, so we get assertions for that too.
  3. Use the hack that Steve wrote. It could be a solution, but some plugins handle the insert-text signal, which would no longer be emitted while the document is loading. Another problem is that the hack relies on an implementation detail of gtk+.
  4. This is related to 3. Maybe we should add some API to gtktextbuffer that inserts text without emitting the insert-text and changed signals, so that we avoid relying on the implementation detail. Another thing that slows things down is that gtk_text_buffer_insert validates the text, and we are already validating it with the utf8 validator, so maybe this new function should take a validate parameter that says whether to validate the text when inserting it.
  5. 2 and 3 together speed things up a lot. We get the doc loaded in 16 seconds, but we would still have the problem of the property assertions.
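The validate parameter mentioned in point 4 could look roughly like the following sketch. This is not a proposal for the actual gtktextbuffer API: ToyBuffer, toy_validate and toy_insert_unsignalled are hypothetical names, and the validator is a deliberately crude stand-in that accepts only 7-bit ASCII, not a real UTF-8 check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    char   content[256];
    size_t used;
    int    validations;  /* how many times the validator ran */
} ToyBuffer;

/* Stand-in validator: accepts only 7-bit ASCII. NOT real UTF-8 validation. */
bool
toy_validate (const char *text, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if ((unsigned char) text[i] >= 0x80)
            return false;
    return true;
}

/* Hypothetical API shape: insert without signals, validating only on request.
 * Returns false if validation fails or the toy buffer has no room left. */
bool
toy_insert_unsignalled (ToyBuffer *buf, const char *text, int len, bool validate)
{
    assert (buf != NULL && text != NULL);

    size_t n = len < 0 ? strlen (text) : (size_t) len;

    if (validate) {
        buf->validations++;
        if (!toy_validate (text, n))
            return false;
    }
    if (n > sizeof buf->content - 1 - buf->used)
        return false;

    memcpy (buf->content + buf->used, text, n);
    buf->used += n;
    buf->content[buf->used] = '\0';
    return true;
}
```

A loader that has already validated its input would pass validate as false and skip the second pass over the text, while other callers keep the check.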

As a main conclusion, none of the possibilities gives us a clear solution that doesn't break API or the behavior we currently have. We should probably wait until 3.0 to incorporate one of them. I'm also leaving here the function I've used for the insert_no_signal so it doesn't get lost. I've also tried the blocking solution provided by Steve for the changed signal, but it doesn't seem to work. Stopping that signal would probably speed things up too.

/**
 * _gedit_document_insert_no_signal:
 * @doc: a #GeditDocument
 * @iter: a position in the buffer
 * @text: UTF-8 format text to insert
 * @len: length of text in bytes, or -1
 *
 * Same as gtk_text_buffer_insert() but without the emission of the insert-text
 * signal and without the utf8 validation. This function is only meant to be
 * called in the document-loader. The reason we don't validate here is that
 * the text is already validated in the loader, and the reason to use this
 * function is that we don't want the insert-text and the changed signals
 * emitted when loading the file. Note that we use a kind of hack to block the
 * emission of the changed signal, because the default handler for the
 * insert-text signal emits this signal.
 *
 * DO NOT TOUCH THIS unless you really know what you are doing!!
 */
void
_gedit_document_insert_no_signal (GeditDocument *doc,
                                  GtkTextIter   *iter,
                                  const gchar   *text,
                                  gint           len)
{
        g_return_if_fail (GTK_IS_TEXT_BUFFER (doc));
        g_return_if_fail (iter != NULL);
        g_return_if_fail (text != NULL);
        g_return_if_fail (gtk_text_iter_get_buffer (iter) == GTK_TEXT_BUFFER (doc));

        if (len < 0)
                len = strlen (text);

        GTK_TEXT_BUFFER_GET_CLASS (doc)->insert_text (GTK_TEXT_BUFFER (doc),
                                                      iter, text, len);
}

Apps/Gedit/NewDocumentLoading (last edited 2013-08-08 17:01:50 by WilliamJonMcCann)