Student: Yevgen Muntyan

The plan

See details below.

  1. Make syntax highlighting work.
  2. Syntax highlighting styles.
  3. Undo manager.
  4. Indentation.
  5. Code completion.


  • The syntax highlighting engine as it stands does not handle multiple modifications in the buffer, and this needs to be fixed.

    My approach (see the irc log for details and background): the context tree is left as it is, with the following changes:

    • Offsets are relative to the previous context, or to the parent if it's the first context.
    • The modifications are stored right in the context tree. For inserted text, a new context node (without an actual context) is inserted into the appropriate parent node; for deleted text, the deleted contexts are removed from the tree and a new zero-length node is inserted to mark the place where the text was deleted. When such a fake node is added, all its ancestors are marked dirty (maybe they should store the number of children marked dirty, for performance reasons, though I doubt it).
    • Then update_syntax() walks the tree to find the first invalid place, and processes it in pretty much the same way as the engine does now. The only essential difference is that it does not give up trying to merge trees after analyzing a batch.
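The tree changes listed above can be sketched roughly as follows. This is only an illustration in Python, not the engine's C code: the `Node` class, the `gap` field (distance from the end of the previous sibling, or from the parent's start for a first child), and the helpers `record_insertion()` and `first_invalid()` are all hypothetical names, and "relative to the previous context" is assumed here to mean "relative to its end". Deletions are not shown.

```python
class Node:
    """A context node; name "" marks a fake node that only records an edit."""

    def __init__(self, name, gap, length):
        self.name = name
        self.gap = gap          # assumed: distance from previous sibling's end
        self.length = length
        self.parent = None
        self.children = []
        self.dirty = False

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child

    def absolute_start(self):
        """Recover the buffer offset by summing relative offsets on the way up."""
        if self.parent is None:
            return self.gap
        pos = self.parent.absolute_start()
        for sibling in self.parent.children:
            pos += sibling.gap
            if sibling is self:
                return pos
            pos += sibling.length
        raise ValueError("node is not attached to its parent")


def record_insertion(parent, index, gap, length):
    """Store an insertion of `length` chars as a fake child of `parent`."""
    fake = Node("", gap, length)
    fake.dirty = True
    if index < len(parent.children):
        # The following sibling is now measured from the fake node's end.
        parent.children[index].gap -= gap
    fake.parent = parent
    parent.children.insert(index, fake)
    node = parent
    while node is not None:     # grow and mark every ancestor dirty
        node.length += length
        node.dirty = True
        node = node.parent
    return fake


def first_invalid(node):
    """Depth-first search for the first dirty node, skipping clean subtrees."""
    if not node.dirty:
        return None
    for child in node.children:
        found = first_invalid(child)
        if found is not None:
            return found
    return node


root = Node("root", 0, 100)
comment = root.add(Node("comment", 10, 20))   # covers 10..30
string = root.add(Node("string", 15, 5))      # covers 45..50
fake = record_insertion(root, 1, 5, 4)        # 4 chars typed at offset 35
```

With relative offsets, the insertion touches only the fake node, the one following sibling, and the ancestors; no other stored offset has to be rewritten, which is what makes keeping several pending modifications in the tree cheap.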

* Highlighting needs to be integrated with code folding. It probably needs to be an extra attribute for contexts, like what Marco proposed. If it's an extra attribute or something similar, it may be added later; otherwise it must be there from the start, or folding will slip to the next century again.

* Miscellaneous stuff.

  • Add something like an includes-end-of-line attribute to syntax rules, so that it's possible to have a "trailing backslash" rule without relying on having the exact text "\\\n" in the buffer (c.lang does so, but it won't work for Windows "\\\r\n").

    I think it should look like <regex pattern="\\$" includes-end-of-line="True"> - "match a backslash at the end of the line, and include the line end in the match", so the enclosing context jumps to the next line automatically.

  • Kate has a special LineContinue rule for a trailing backslash. That sounds reasonable, since a trailing backslash rule is needed in a lot of languages and contexts.
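To illustrate why matching the literal text "\\\n" is fragile, here is a small Python sketch using the standard `re` module. It is not how the engine or Kate implements anything; `LITERAL`, `WITH_EOL` and `is_continued()` are made-up names, and `WITH_EOL` only approximates what an includes-end-of-line rule might do by consuming whichever line terminator follows the backslash.

```python
import re

# Literal approach: a backslash followed by exactly "\n".
LITERAL = re.compile(r"\\\n")

# Approximation of the proposed behaviour: a backslash at the end of the
# line, with the line terminator (whatever it is) included in the match.
WITH_EOL = re.compile(r"\\(?:\r\n|\n|\r)")


def is_continued(line, pattern):
    """True if `line` (terminator included) ends with a match of `pattern`."""
    match = pattern.search(line)
    return match is not None and match.end() == len(line)


unix_line = "#define A \\\n"
windows_line = "#define A \\\r\n"

print(is_continued(unix_line, LITERAL))      # True
print(is_continued(windows_line, LITERAL))   # False
print(is_continued(windows_line, WITH_EOL))  # True
```

The literal pattern silently stops recognising continuations as soon as the buffer uses "\r\n" line ends, while the rule that owns the line end does not care which terminator is present.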

The code

The code is in the soc-2006 branch of the gtksourceview module in GNOME CVS. It can be installed alongside gtksourceview-1 (the current version), since its major version has changed (it's 2.0 now). It's not fully compatible with the old gtksourceview, so applications may or may not build with it.

Random stuff

Here are problems/questions/notes/ideas/whatever, in no particular order.

  • What does tag_table_changed_cb() in gtksourcecontextengine.c do? Looks like it toggles highlighting off if the engine has some tag. ??
  • Lang files format
    1. Add file extensions: the MIME type system is often broken and doesn't know what the user wants; it doesn't work on Windows.
    2. Matching brackets, folding marks, word chars, comment symbols, etc.
  • Style schemes
    1. Keep the interface thing? Let GtkSourceStyleScheme be a normal class, perhaps private.

  • Remove GtkSourceTagTable. Highlighting engine must know better what tags it created.

  • GtkSourceView::char_inserted signal. It's needed by those who need a "user entered a char" event. Using insert-text and measuring the length of the text doesn't work, since it can be a paste, a search-and-replace, or whatever.

  • RelaxNG thing in the lang parser: why validate the lang file every time it's parsed? If one wants to break his lib, let him do it; just provide a script which will validate a lang file (xmllint should do it just fine).
  • Why parse the lang file every time the language is requested?
    • Regarding these two: I have an example of a language with 7000 keywords (and it's not me who made that lang file).
  • Python bindings: let's move them into the library, or at least into a separate module. gnome-* is no good for a library which doesn't depend on GNOME and is going to work on Windows (gtksourceview is, right?)
  • Line ends in the line reader - it must use pango_find_paragraph_boundary; what it does now is broken.
  • Broken cross-references in lang files.
  • Leading and trailing spaces are stripped from patterns in lang files.
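On the line-reader item above: as far as I understand it, pango_find_paragraph_boundary treats "\n", "\r", "\r\n" and U+2029 (PARAGRAPH SEPARATOR) as paragraph delimiters and reports both where the delimiter starts and where the next paragraph begins. The Python sketch below imitates that behaviour to show what a line reader built on it would see. It is an approximation with hypothetical function names, and it works on character indices rather than the byte offsets Pango uses.

```python
PARAGRAPH_SEPARATOR = "\u2029"


def find_paragraph_boundary(text, start=0):
    """Return (delimiter_index, next_paragraph_start) for the paragraph
    beginning at `start`; both equal len(text) if no delimiter is found."""
    end = len(text)
    i = start
    while i < end:
        ch = text[i]
        if ch == "\r":
            # "\r\n" counts as a single delimiter.
            if i + 1 < end and text[i + 1] == "\n":
                return i, i + 2
            return i, i + 1
        if ch == "\n" or ch == PARAGRAPH_SEPARATOR:
            return i, i + 1
        i += 1
    return end, end


def split_lines(text):
    """Split `text` into lines without their terminators."""
    lines = []
    pos = 0
    while pos < len(text):
        delimiter, next_start = find_paragraph_boundary(text, pos)
        lines.append(text[pos:delimiter])
        pos = next_start
    return lines


print(split_lines("a\r\nb\nc\rd\u2029e"))  # ['a', 'b', 'c', 'd', 'e']
```

A reader that only looks for "\n" would leave a stray "\r" at the end of every Windows line and would not split old Mac-style "\r" text at all.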

irc logs


Projects/GtkSourceView/SummerOfCode2006 (last edited 2016-01-28 12:18:58 by SébastienWilmet)