Attachment '2006.06.11.txt'

Download

   1 **** BEGIN LOGGING AT Sun Jun 11 09:03:03 2006
   2 
   3 Jun 11 09:03:03 *	Now talking on #gsv
   4 Jun 11 09:03:07 <muntyan>	hi there
   5 Jun 11 09:03:08 <paolo>	hi
   6 Jun 11 09:03:18 <paolo>	I'd like to speak a bit about SoC
   7 Jun 11 09:03:29 <muntyan>	that's cool :)
   8 Jun 11 09:03:35 <paolo>	can you update me on the progresses?
   9 Jun 11 09:03:48 <paolo>	so we can take some decision about next steps
  10 Jun 11 09:04:29 <pbor>	hi muntyan 
  11 Jun 11 09:04:59 <muntyan>	well, no progress really
  12 Jun 11 09:05:40 <muntyan>	i didn't have time yet to do what we decided to do
  13 Jun 11 09:05:45 <paolo>	hmm... plans, thoughts, etc.  
  14 Jun 11 09:05:56 <muntyan>	the sorting + processing changes that fall into the batch
  15 Jun 11 09:06:12 <muntyan>	um, did you read our conversation with pbor and barisione?
  16 Jun 11 09:06:19 <paolo>	no
  17 Jun 11 09:06:23 <muntyan>	oh, ok
  18 Jun 11 09:06:27 <paolo>	I'm going to read it now
  19 Jun 11 09:06:35 <muntyan>	then, the plan and stuff is:
  20 Jun 11 09:06:39 <paolo>	pbor: is it already on l.g.o?
  21 Jun 11 09:07:23 <pbor>	attaching
  22 Jun 11 09:07:28 <muntyan>	there is a problem with highlighting which seems (seems, nobody knows really why it happens) to occur because sorting the modifications queue is not enough
  23 Jun 11 09:07:38 <muntyan>	so the plan is:
  24 Jun 11 09:07:59 <pbor>	http://live.gnome.org/GtkSourceView/SummerOfCode2006?action=AttachFile&do=view&target=2006-06-07.txt
  25 Jun 11 09:08:00 <paolo>	is it me or l.g.o has broken CSS today?
  26 Jun 11 09:08:24 <muntyan>	make the function which processes a batch take all the modifications from queue which intersect with the batch
  27 Jun 11 09:08:37 <muntyan>	then see what happens - whether the problem is fixed or not
  28 Jun 11 09:09:03 <muntyan>	if the problem is fixed, then it means highlighting basically works (hopefully), and we can think of speed issues and whatnot
  29 Jun 11 09:09:33 <muntyan>	i guess it wasn't really clear :)
  30 Jun 11 09:09:43 <muntyan>	paolo: how much do you know about how the engine works?
  31 Jun 11 09:10:02 <paolo>	little, so I need details to understand the problems
  32 Jun 11 09:10:25 *	paolo is reading the old IRC log 
  33 Jun 11 09:10:25 <muntyan>	okay, i'll try to explain it
  34 Jun 11 09:11:01 <muntyan>	what engine does is: 1) it analyzes the buffer from the beginning to the end; 2) it updates syntax after buffer changes
  35 Jun 11 09:11:17 <muntyan>	1) is done by processing batch by batch, in an obvius way
  36 Jun 11 09:11:21 <muntyan>	now, about modifications:
  37 Jun 11 09:12:01 <muntyan>	when text is inserted or deleted, engine records the modification as offset where it happened, and the length of text inserted/removed
  38 Jun 11 09:12:11 <muntyan>	then it puts this modification into the queue
  39 Jun 11 09:12:47 <muntyan>	then it queues updating (or processes modification immediately, but it's irrelevant)
  40 Jun 11 09:13:30 <muntyan>	idle_worker (for simplicity assume it's always idle_worker who does the analysis and everything) does the following:
  41 Jun 11 09:14:06 <muntyan>	it peeks the first modification from the queue, and updates syntax in a batch which includes changed region
  42 Jun 11 09:14:34 <muntyan>	if it can't merge old and new syntax regions in this one batch, it analyzes the remainder of buffer as it does it with new buffer
  43 Jun 11 09:15:01 <muntyan>	if there are no modifications in the queue, it processes not analyzed remainder of the buffer
  44 Jun 11 09:15:14 <muntyan>	so, there are bunch of problems here:
  45 Jun 11 09:16:28 <muntyan>	1) it needs to sort modifications and update them. for example, if you delete text at position 100, and then delete text at position 10, then the at-pos-10 modification must be processed first, and the second modification should have offset corrected (it was 100, but after second deletion it should be 100-(how many chars deleted))
  46 Jun 11 09:17:29 <muntyan>	2) what seems to be the problem now: when engine processes a modification, it takes a big batch, and potentially it changes syntax in whole batch, so the changed area may include modifications which are still in the queue
  47 Jun 11 09:17:54 <muntyan>	the plan for the nearest future is to fix 2) and see what happen
  48 Jun 11 09:17:59 <muntyan>	the end.
  49 Jun 11 09:18:00 <muntyan>	:)
  50 Jun 11 09:18:37 <paolo>	first question: how are batches computed? What is a batch in this context?
  51 Jun 11 09:18:46 <muntyan>	4000 chars, roughly
  52 Jun 11 09:18:56 <paolo>	ok
  53 Jun 11 09:19:05 <muntyan>	it calculates the length according to how fast it processed previous batches, but that's details
  54 Jun 11 09:19:25 <paolo>	yep
  55 Jun 11 09:20:33 <paolo>	you said the whole buffer is analyzed, batch by batch? Is this a sync operation?
  56 Jun 11 09:20:51 <muntyan>	no, idle_worker processes one batch
  57 Jun 11 09:21:11 <paolo>	what does it happens if a change is queued before the whole buffer has been analyzed for the first time?
  58 Jun 11 09:21:37 <muntyan>	it keeps worker_last_offset - how much of the buffer has been analyzed
  59 Jun 11 09:21:49 <muntyan>	if the modification is after last_offset, it's just ignored
  60 Jun 11 09:22:09 <paolo>	what does it happen if I'm going to "show" a part of the buffer that still need to be analyzed?
  61 Jun 11 09:22:54 <muntyan>	if it's before, then the modification is processed; if it could merge old and new in one batch, it processes the region after last_offset as usual (in next idle run); if not, then it sets last_offset to the end of the batch, and then proceeds as usual
  62 Jun 11 09:23:50 <muntyan>	when part of buffer is exposed, then it's asked to apply tags in the exposed area; if it's not analyzed yet, there are no tags
  63 Jun 11 09:24:48 <muntyan>	i looked at the sources, and:
  64 Jun 11 09:25:44 <muntyan>	it checks last_offset. if exposed area is before last_offset, it applied tags; if not, it records the region to apply tags later, then it will be analyzed
  65 Jun 11 09:26:25 <pbor>	yes, the code there is pretty much what current sourceview does as far as I can tell
  66 Jun 11 09:26:39 <pbor>	(except for some details I don't like)
  67 Jun 11 09:26:44 <muntyan>	also, when it updates syntax somewhere, it emits signal so that view can queue redraw in the updated area; then expose will ask to apply correct tags
  68 Jun 11 09:26:49 <muntyan>	seems to be correct
  69 Jun 11 09:26:54 <paolo>	how fast is the first analysis of the whole buffer (for example on a 10000 lines C file)
  70 Jun 11 09:27:44 <muntyan>	good question
  71 Jun 11 09:28:08 <paolo>	worker_last_offset is the last analyzed offset, right?
  72 Jun 11 09:28:19 <pbor>	yep
  73 Jun 11 09:29:01 <muntyan>	it looks "fast enough", i didn't measure it
  74 Jun 11 09:29:11 <pbor>	the analisys of the whole doc on open seems of acceptable performance to me too
  75 Jun 11 09:30:10 <pbor>	one of the issues with the queue instead is that if you have something like d'n'd, an expose event forces to apply tags between the remove and the add part of the dnd
  76 Jun 11 09:30:21 <pbor>	which leads to flicker
  77 Jun 11 09:30:41 <paolo>	I was thinking about making a very fast first analysis that only splits the document in first level contexts and the re-analyze on demands the current context before displaying it to apply the correct tags
  78 Jun 11 09:30:44 <muntyan>	pbor: no, if you process more than one modification at a time
  79 Jun 11 09:30:57 <muntyan>	this is what's needed to be done
  80 Jun 11 09:31:19 <pbor>	yes, I am just exposing one of the issues there are now
  81 Jun 11 09:31:26 <pbor>	didn't say it's unfixable
  82 Jun 11 09:31:27 <pbor>	:)
  83 Jun 11 09:31:40 <muntyan>	paolo: that would work only if first-level contexts don't need children to know where they end
  84 Jun 11 09:31:49 <muntyan>	this is one of things i wanted to talk about
  85 Jun 11 09:32:23 <paolo>	hmm... what do you mean?
  86 Jun 11 09:33:36 <muntyan>	1) child context may decide to terminate its parent
  87 Jun 11 09:34:09 <muntyan>	2) it seems to be nicer if many rules may decide when to terminate context
  88 Jun 11 09:34:28 <muntyan>	at the moment contexts have start and end conditions (regexes or chars, or something)
  89 Jun 11 09:34:39 <muntyan>	i think they should have more than that
  90 Jun 11 09:34:54 <muntyan>	e.g. here i have a lang file which has #pop#pop
  91 Jun 11 09:35:12 <muntyan>	it means it terminates itself and parent
  92 Jun 11 09:35:19 <muntyan>	did you understand something?
  93 Jun 11 09:36:04 <paolo>	yep, but can you report a specific use case for this?
  94 Jun 11 09:36:29 <muntyan>	    <!-- CaseIn is called when the construct 'case ... in' has been found. -->
  95 Jun 11 09:36:29 <muntyan>	    <context style="Normal Text" eol-context="#stay" name="CaseIn">
  96 Jun 11 09:36:29 <muntyan>	      <Regex style="Keyword" context="#pop#pop" pattern="\besac(?=$|[\s;)])" endRegion="case" />
  97 Jun 11 09:36:47 <muntyan>	some crazy thing from kate
  98 Jun 11 09:36:51 <muntyan>	's bash lang file
  99 Jun 11 09:38:03 <paolo>	Is it so important to support this feature? Are there other ways to support the same feature?
 100 Jun 11 09:38:25 <muntyan>	i have no idea
 101 Jun 11 09:38:37 <paolo>	the .lang files by barisione support this feature?
 102 Jun 11 09:38:44 <muntyan>	but i don't like that contexts have end attribute
 103 Jun 11 09:38:52 <muntyan>	paolo: the engine doesn't know about such a thing
 104 Jun 11 09:39:10 <muntyan>	context has end which is defined by regex, and that's it
 105 Jun 11 09:39:46 <muntyan>	besides, what happens when context and its child are both terminated by same thing, e.g. end of line?
 106 Jun 11 09:40:16 <muntyan>	like
 107 Jun 11 09:40:19 <muntyan>	#ewfwe //ewfwef
 108 Jun 11 09:40:22 <muntyan>	in a C file
 109 Jun 11 09:40:45 <muntyan>	(not with gtksourceview's c.lang, it doesn't know that preprocessor directive is whole line, not only word after #)
 110 Jun 11 09:40:54 <paolo>	it seems the .lang files by barisione do not require an "end" attribute for contexts
 111 Jun 11 09:41:16 <muntyan>	they don't reuiqre it for "simple" contexts
 112 Jun 11 09:41:30 <muntyan>	those simple contexts are just regions matched by certain regex
 113 Jun 11 09:41:49 <muntyan>	but container contexts do require end attribute
 114 Jun 11 09:42:00 <muntyan>	(at least in my reading code and lang files :)
 115 Jun 11 09:42:24 <paolo>	ok
 116 Jun 11 09:43:00 <paolo>	(this is a question to ask to barisione, please collect all the open questions in a mail to send later)
 117 Jun 11 09:43:18 <muntyan>	ok
 118 Jun 11 09:43:21 <pbor>	mmm... isn't the end regex of a container context done by ORing all the possible ends of the child context which terminate the parent?
 119 Jun 11 09:44:09 <paolo>	no
 120 Jun 11 09:44:15 <muntyan>	it can't work, can it?
 121 Jun 11 09:44:33 <muntyan>	you can't just match anything, contexts may jump like crazy
 122 Jun 11 09:44:33 <paolo>	well, yes
 123 Jun 11 09:44:58 <muntyan>	for this to work you need *everything* be controlled by end regexes
 124 Jun 11 09:45:05 <muntyan>	e.g. ends-at-line-end breaks it
 125 Jun 11 09:45:07 <muntyan>	i think
 126 Jun 11 09:45:09 <paolo>	but you need to use a stack of context also during the fast analysis
 127 Jun 11 09:45:55 <paolo>	ends-at-line-end it is one of the ORed possible ends
 128 Jun 11 09:46:24 <muntyan>	but child may opt not to terminate there
 129 Jun 11 09:46:28 <muntyan>	while parent may temrinate
 130 Jun 11 09:46:36 <muntyan>	it's how trailing \ is done
 131 Jun 11 09:46:39 <paolo>	ok
 132 Jun 11 09:47:12 <muntyan>	the parent (preprocessor) ends at line-end, but it has child context that consumes end of line, so the parent won't see it
 133 Jun 11 09:47:14 <paolo>	we can analyse this problem later
 134 Jun 11 09:47:24 <paolo>	let us return to the current problems
 135 Jun 11 09:47:39 <paolo>	it seems to me the mother of all the problems is the queue of changes
 136 Jun 11 09:47:50 <muntyan>	(regular expressions can encapsulate arbitrarily complex thing, but noone will understand it)
 137 Jun 11 09:47:54 <paolo>	I have a question... do we really need it?
 138 Jun 11 09:48:06 <muntyan>	well, yes if we don't want to touch contexts tree
 139 Jun 11 09:48:17 <muntyan>	context tree is the mother of the problems
 140 Jun 11 09:48:29 <muntyan>	it keeps offsets in the nodes
 141 Jun 11 09:48:31 <paolo>	what is the "context tree"?
 142 Jun 11 09:48:39 <paolo>	ok
 143 Jun 11 09:48:46 <paolo>	stupid question
 144 Jun 11 09:48:56 <muntyan>	not really stupid
 145 Jun 11 09:48:59 <paolo>	so the problem is not the context tree per se
 146 Jun 11 09:49:25 <muntyan>	well, it depends on how you look at it
 147 Jun 11 09:49:29 <muntyan>	let me explain
 148 Jun 11 09:49:40 <muntyan>	there are two things in here:
 149 Jun 11 09:50:08 <muntyan>	1) the 'state machine', the tree of contexts which represents how contexts may be there, how parents include children and such
 150 Jun 11 09:50:42 <muntyan>	2) the sequence of segments of text, where each segment corresponds to some state of that state machine, or to some node of the contexts tree
 151 Jun 11 09:50:58 <muntyan>	in the engine these two are one thing, the "contexts tree"
 152 Jun 11 09:51:25 <muntyan>	if it's right, then the queue is the problem, but it can't be removed
 153 Jun 11 09:51:36 <muntyan>	if it's not right, then the contexts tree is the problem
 154 Jun 11 09:51:37 <muntyan>	:)
 155 Jun 11 09:52:54 <muntyan>	the *real* problem is that the engine has no idea about the fact that buffer can be modified many times and in any order; the rest is implementation details ;)
 156 Jun 11 09:53:11 <muntyan>	and we are free to solve it in any way we like
 157 Jun 11 09:53:31 <muntyan>	(saying this just to make sure noone is afraid of removeing queue or something)
 158 Jun 11 09:54:09 <paolo>	well, it seems to me we need a contexts tree (or syntax tree)
 159 Jun 11 09:54:17 <paolo>	do all agree on this?
 160 Jun 11 09:55:11 <paolo>	and if I have understand the problem, the current algorithms does not support multiple changes
 161 Jun 11 09:55:25 <paolo>	but only single changes, right?
 162 Jun 11 09:55:46 <pbor>	well, I guess there are other data structures which could hold the syntax informations
 163 Jun 11 09:56:10 <muntyan>	we do need one tree, for sure
 164 Jun 11 09:56:16 <pbor>	in particular all the regex work is based on analyzing lines
 165 Jun 11 09:56:28 <pbor>	while the context tree has no idea about lines
 166 Jun 11 09:56:35 <muntyan>	but we don't necessarily need to have syntax segments in the contexts tree
 167 Jun 11 09:57:13 <muntyan>	and yes, current algorithms do not really support multiple changes
 168 Jun 11 09:57:39 <paolo>	ok, explain me how do you see the prefect context tree 
 169 Jun 11 09:58:01 <paolo>	I mean we have some code, it is not working for several reasons
 170 Jun 11 09:58:07 <muntyan>	i wish i knew what perfect context tree is :)
 171 Jun 11 09:58:14 <muntyan>	i guess i'd start implementing it
 172 Jun 11 09:58:19 <paolo>	there are not millions of lines
 173 Jun 11 09:58:34 <paolo>	so we can rewrite part of it without too much worrying
 174 Jun 11 09:58:47 <muntyan>	i can describe what i have in my engine:
 175 Jun 11 09:58:52 <paolo>	ok
 176 Jun 11 09:59:21 <muntyan>	1) the contexts tree, which is just a tree of contexts and their children, nodes are added when corresponding context is met in the buffer
 177 Jun 11 09:59:22 <paolo>	may be explain also how it is different from the gsv one
 178 Jun 11 10:00:06 <muntyan>	2) the text part is kept in a separate structure: each line has list of segments, each segment keeps pointer to a node of contexts tree
 179 Jun 11 10:00:45 <muntyan>	this makes it easy to invalidate parts of buffer - just erase segments on the lines
 180 Jun 11 10:01:06 <muntyan>	it's still not convenient for modifications inside of the line though
 181 Jun 11 10:01:20 <muntyan>	err, this is a bit not what i am describing :)
 182 Jun 11 10:01:52 <paolo>	well, if you invalidate part of the buffer you also need to remove nodes from the tree, right?
 183 Jun 11 10:01:54 <muntyan>	so, the main thing is: contexts are the states in the state machine; segments in texts are segments which "are in this state"
 184 Jun 11 10:02:01 <muntyan>	paolo: no
 185 Jun 11 10:02:08 <paolo>	why not?
 186 Jun 11 10:02:18 <muntyan>	i have one node for top-level comments, for example
 187 Jun 11 10:02:29 <muntyan>	regardless of how many comments are in the buffer
 188 Jun 11 10:02:46 <muntyan>	i have many text segments that point to "top-level C comment" node
 189 Jun 11 10:03:17 <pbor>	basically your tree describes the possible state transitions for a language
 190 Jun 11 10:04:01 <muntyan>	right
 191 Jun 11 10:04:17 <muntyan>	(not all possible, only those that actually occur in the text)
 192 Jun 11 10:04:25 <paolo>	it contains a subset of the state machine for the specific language
 193 Jun 11 10:04:31 <muntyan>	right
 194 Jun 11 10:05:01 <muntyan>	what's good in this (or bad in gtksourceview's) is the following:
 195 Jun 11 10:05:17 <muntyan>	in mine, i can merge two segments simply by comparing the nodes
 196 Jun 11 10:05:32 <muntyan>	in gtksourceview's, it needs nodes to be the same with all parents
 197 Jun 11 10:06:17 <muntyan>	not sure it's really very good, but it is simple
 198 Jun 11 10:06:54 <muntyan>	there is a potential problem in gtksourceview, related to this:
 199 Jun 11 10:07:24 <muntyan>	when text is modified, the engine takes a batch of text, and reanalyzes it, trying to merge old and new nodes
 200 Jun 11 10:07:44 <muntyan>	if it can't do it in this batch, it simply reanalyzes whole remainder of the buffer
 201 Jun 11 10:08:01 <muntyan>	even if contexts can be merged in the next batch
 202 Jun 11 10:08:23 <paolo>	hmmm... actually I think we need what is normally called a "concrete syntax tree"
 203 Jun 11 10:08:32 <muntyan>	what's that?
 204 Jun 11 10:08:36 <paolo>	see http://en.wikipedia.org/wiki/Concrete_syntax_tree
 205 Jun 11 10:09:24 <muntyan>	it looks like the tree in gtksourcecontextengine
 206 Jun 11 10:09:39 <pbor>	yep, that's what the context tree is right now
 207 Jun 11 10:09:50 <muntyan>	the problem is that it's not modification-friendly
 208 Jun 11 10:10:11 <pbor>	that is a tree that represents the current state of the whole buffer
 209 Jun 11 10:10:12 <muntyan>	just coloring is only half of work; we need to recolor as well :)
 210 Jun 11 10:11:45 <pbor>	building a tree like muntyan's one would also give us the chance to share it across files
 211 Jun 11 10:11:48 <muntyan>	hm, does colorer support modifications?
 212 Jun 11 10:11:56 <pbor>	that is one tree for all C files open
 213 Jun 11 10:12:12 <pbor>	muntyan: yes, it's the engine used in eclipse I think
 214 Jun 11 10:13:10 <paolo>	colorer should support modifications
 215 Jun 11 10:13:14 <paolo>	it is not used in eclipse
 216 Jun 11 10:13:44 <paolo>	when barisione started his work we planned to rewrite colorer in a glib friendly way
 217 Jun 11 10:14:01 <paolo>	but it seems it was not a good solution
 218 Jun 11 10:14:08 <paolo>	I don't remember the details
 219 Jun 11 10:15:19 <pbor>	paolo: http://colorer.sourceforge.net/sshots/e2.png
 220 Jun 11 10:15:30 <pbor>	paolo: though maybe it's just an additional plugin
 221 Jun 11 10:15:39 <paolo>	there is a plugin for ecplise based on colorer
 222 Jun 11 10:17:03 <pbor>	muntyan: can you tell us more about the other data structure you use? that is the one that holds the segments...
 223 Jun 11 10:17:22 <paolo>	it is not clear to me why a tree like the one I said should not work
 224 Jun 11 10:17:28 <pbor>	muntyan: I know it's a btree, but I don't totally grok why
 225 Jun 11 10:17:30 <paolo>	?
 226 Jun 11 10:17:42 <muntyan>	paolo: it's already there, and it doesn't work :)
 227 Jun 11 10:17:55 <muntyan>	paolo: the tree itself will work, why not? but it needs some stuff to support it
 228 Jun 11 10:18:10 <paolo>	yep, I agree on this
 229 Jun 11 10:18:36 <paolo>	well, it could also work without additional data structures
 230 Jun 11 10:18:40 <muntyan>	pbor: btree because it has easy insertion and deletion
 231 Jun 11 10:18:50 <muntyan>	paolo: how to deal woth modifications?
 232 Jun 11 10:18:54 <muntyan>	wih
 233 Jun 11 10:18:56 <muntyan>	with
 234 Jun 11 10:19:22 <paolo>	invalidating part of the tree when a modification happens
 235 Jun 11 10:19:37 <muntyan>	don't forget about tags
 236 Jun 11 10:19:54 <muntyan>	'tags are applied in the buffer, and they need to be removed when new highlighting is ready
 237 Jun 11 10:20:00 <pbor>	muntyan: I mean, what do you cache in the nodes? for instance GtkTextBTree allows to know if there is a tag without checking all the leafs/segments
 238 Jun 11 10:20:18 <muntyan>	at the moment tree keeps offsets, and with modifications queue you always can tell where tags are applied
 239 Jun 11 10:20:41 <muntyan>	pbor: let me look at it :)
 240 Jun 11 10:21:30 <paolo>	I don't think tags are a problem here
 241 Jun 11 10:21:52 <paolo>	suppose you invalidate a part of the tree (a subtree)
 242 Jun 11 10:21:59 <muntyan>	pbor: i keep number of line marks in nodes; and leaf nodes contain highlighting stuff and the marks
 243 Jun 11 10:22:06 <paolo>	a subtree represent a part of the buffer
 244 Jun 11 10:22:13 <pbor>	paolo: tags are not a problem, the problem is having offsets in the tree nodes
 245 Jun 11 10:22:21 <paolo>	you have to remove the tags from that part of the buffer
 246 Jun 11 10:22:34 <muntyan>	paolo: but you can't do it until you have new tags
 247 Jun 11 10:22:48 <paolo>	muntyan: why not?
 248 Jun 11 10:23:02 <muntyan>	because there is nothing to delete until then
 249 Jun 11 10:23:07 <paolo>	pbor: since we need to use relative offset
 250 Jun 11 10:23:12 <pbor>	paolo: yes, but how do you map a subtree to an area of the buffer?
 251 Jun 11 10:23:13 <muntyan>	if you delete text, there is no region to delete tags from
 252 Jun 11 10:23:13 <paolo>	pbor: not absolute ones
 253 Jun 11 10:23:22 <muntyan>	you need to analyze it first
 254 Jun 11 10:23:34 <muntyan>	or if you insert text, same thing
 255 Jun 11 10:24:02 <muntyan>	you can mark position as "invalid", but you don't know in advance what *area* really changed
 256 Jun 11 10:24:42 <paolo>	hmm... it is difficult to explain without figures
 257 Jun 11 10:25:09 <paolo>	let me try step by step
 258 Jun 11 10:25:16 <paolo>	don't think to the current implementation
 259 Jun 11 10:25:23 <muntyan>	hm, actually i don't see why what i'm saying makes sense
 260 Jun 11 10:25:38 <paolo>	suppose you have a syntax tree
 261 Jun 11 10:25:57 <paolo>	like the one in the wikipedia figure
 262 Jun 11 10:27:41 <paolo>	suppose you add a piece of text between "the" and "ball", then you invalidate a part of the tree
 263 Jun 11 10:28:12 <paolo>	if the text does not create a new context, then you only need to add a new node (without invalidating other nodes)
 264 Jun 11 10:28:31 <paolo>	otherwise you will have to invalidate part (or all) the tree
 265 Jun 11 10:29:34 <paolo>	let us speak about text offset now
 266 Jun 11 10:29:57 <paolo>	ATM we use absolute offset
 267 Jun 11 10:30:04 <muntyan>	i'll be back in ten minutes, son is hungry
 268 Jun 11 10:30:36 <paolo>	so "Jonh" is [0-4], "hit" is [5-8]" and so on
 269 Jun 11 10:30:55 <paolo>	I think we should use relative offset
 270 Jun 11 10:31:08 <paolo>	where relative is according to the "father" node
 271 Jun 11 10:31:46 <paolo>	in this way it will be easier to make changes without visiting all the tree
 272 Jun 11 10:32:10 <paolo>	and the queue will contain "nodes" and not text segments
 273 Jun 11 10:32:16 <paolo>	what do you think?
 274 Jun 11 10:32:34 <pbor>	using relative offsets may be better for the tree merging, but I don't see how it helps us:
 275 Jun 11 10:32:57 <pbor>	the fact is that the real analisys happens async, in the idle
 276 Jun 11 10:33:25 <pbor>	so a second modification can happen before you analyzed the first one
 277 Jun 11 10:33:47 <paolo>	wait
 278 Jun 11 10:33:51 <pbor>	and you can't put the nodes in queue, since it means analyzing right away
 279 Jun 11 10:34:19 <paolo>	we can split analysis in a sync part (i.e. updating or invalidating the tree)
 280 Jun 11 10:34:45 <paolo>	and a second aync part "rebuilding" the invalidate part of the tree
 281 Jun 11 10:35:48 <paolo>	i,e. always having a fast analysis computing only the "first" level of the subtree
 282 Jun 11 10:36:04 <pbor>	well, that depends on how fast the tree invalidation is...
 283 Jun 11 10:36:37 <paolo>	it only works on the tree, so I think it should be fast enough
 284 Jun 11 10:37:26 <pbor>	it may also have other problems for performance (though here is all moot without numbers)
 285 Jun 11 10:38:05 <pbor>	for instance a 'paste' inserts many chuncks of text (one for each tag toggle)
 286 Jun 11 10:38:26 <pbor>	so you would do many invalidations before reanalizying
 287 Jun 11 10:38:45 <muntyan>	how do you invalidate part between contexts?
 288 Jun 11 10:38:59 <muntyan>	e.g. top-level no-contexts text
 289 Jun 11 10:39:14 <muntyan>	2) merging is still hard
 290 Jun 11 10:39:29 <paolo>	merging will be always hard :)
 291 Jun 11 10:39:43 <muntyan>	not if you don't keep segments in the tree
 292 Jun 11 10:39:49 <paolo>	invalidating part  between contexts means adding a new context
 293 Jun 11 10:40:03 <paolo>	and updating the offset of the following siblings
 294 Jun 11 10:40:30 <paolo>	it is the easiest update
 295 Jun 11 10:41:41 <paolo>	what I mean is that syntax trees are historically the most important data structure in language analysis
 296 Jun 11 10:41:57 <paolo>	and I don't think we have to invent new structures
 297 Jun 11 10:42:19 <muntyan>	well, let me play angry:
 298 Jun 11 10:42:57 <muntyan>	i have working highlighting without any knowledge of history; and there is broken engine which didn't invent new data structures
 299 Jun 11 10:42:57 <paolo>	What about trying to look at eclipse code and at colorer to see what they are using?)
 300 Jun 11 10:43:03 *	muntyan stops playing angry
 301 Jun 11 10:43:22 <paolo>	hehe :)
 302 Jun 11 10:43:30 <muntyan>	what i want to say is that we need something that will work and is good. "history" is zero weight argument here, imho
 303 Jun 11 10:44:37 <muntyan>	sheesh, color tarball doesn't have single toplevel dir!
 304 Jun 11 10:44:41 <muntyan>	colorer
 305 Jun 11 10:44:45 <muntyan>	good it's in /tmp
 306 Jun 11 10:44:45 <paolo>	well, history is always important... if you want call it a "design pattern" :)
 307 Jun 11 10:45:03 <muntyan>	i don't give a shit to that, it means too little these days
 308 Jun 11 10:45:11 <muntyan>	everybody uses "design patterns" ;)
 309 Jun 11 10:45:35 <pbor>	paolo: well, the fact that a tree can be used to represent a syntax is no news, but the real question is what do we need the tree for?
 310 Jun 11 10:45:40 <muntyan>	anyway, if a tree is good, it's good. but we need to know for sure what we choose, and why
 311 Jun 11 10:46:43 <pbor>	I mean, the engine is keeping a tree of the whole buffer syntax up to date, but as far as I can see it's not really using it
 312 Jun 11 10:47:22 <pbor>	for highlighting it just wants to know which tag apply to each segment
 313 Jun 11 10:47:38 <pbor>	and to do this you have to walk the whole segments
 314 Jun 11 10:47:57 <pbor>	be it be going trough a list, walking a tree or whatever
 315 Jun 11 10:49:00 <muntyan>	i personally worry about modifications, they have proven to be the hard part ;)
 316 Jun 11 10:49:27 <pbor>	maybe the whole syntax tree can be very useful for stuff like folding, or for indenters... but I am missing the compelling reason to have a full tree representation of the syntax of the text in the buffer
 317 Jun 11 10:50:04 <paolo>	to avoid to re-analyze part of the buffer during modifications
 318 Jun 11 10:50:18 <paolo>	and put the basis for further analysis
 319 Jun 11 10:51:22 <pbor>	but for hl I just want to iter through the text and know at which point of the syntax 'stack' I am in
 320 Jun 11 10:52:08 <pbor>	that is, I just need to know the parent context, not all the ancestors
 321 Jun 11 10:52:31 <pbor>	(except for the #pop#pop case muntyan said before)
 322 Jun 11 10:52:32 <muntyan>	in either case we have a tree, just maybe implicitly; question is whether we really need a big tree with number of nodes equal number of syntax regions in the buffer
 323 Jun 11 10:53:00 <muntyan>	pbor: we do need all the ancestors; but once we have parent, we have its parent, and so on
 324 Jun 11 10:53:29 <pbor>	muntyan: yes, sure
 325 Jun 11 10:54:21 <pbor>	anyway, I think that invalidating the tree right away could solve the issues with the queue
 326 Jun 11 10:54:38 <pbor>	what has to be seen is if we can do that fast enough
 327 Jun 11 10:55:28 <muntyan>	should be fast
 328 Jun 11 10:58:08 <pbor>	as far as I can see that is also orthogonal to the relative offset issue
 329 Jun 11 10:59:10 <muntyan>	what do you mean?
 330 Jun 11 10:59:45 <pbor>	that you can invalidate a part of a tree even with absolute offsets
 331 Jun 11 10:59:49 <pbor>	can't you?
 332 Jun 11 11:00:36 <paolo>	yep, they are relative problems
 333 Jun 11 11:00:44 <paolo>	I mean ortogonal
 334 Jun 11 11:00:54 <muntyan>	but if offsets are absoulte, you need to walk all the nodes?
 335 Jun 11 11:01:17 <paolo>	using relative offset enable in some cases very fast changes without the need to walking all the subtree
 336 Jun 11 11:01:35 <muntyan>	relative offsets rule
 337 Jun 11 11:01:41 <muntyan>	absolute offsets suck
 338 Jun 11 11:01:43 <muntyan>	:)
 339 Jun 11 11:02:47 <pbor>	yes, we all agree that relative offsets are better
 340 Jun 11 11:03:01 <paolo>	I'd say: can we try to write down the "modification" algorithm in pseudo code (so we can also use it for documentation)?
 341 Jun 11 11:04:25 <paolo>	I think a modification can only have 3 effects:
 342 Jun 11 11:04:31 <paolo>	- adding a context
 343 Jun 11 11:05:15 <paolo>	- invalidating a subtree of contexts
 344 Jun 11 11:05:50 <paolo>	well they are only 2
 345 Jun 11 11:06:33 <paolo>	actually we could reduce "adding a context" as a sub-case of "invalidating a subtree"
 346 Jun 11 11:06:34 <muntyan>	comments in colorer look like it reparses everything after modification
 347 Jun 11 11:07:09 <paolo>	pbor: do you still have the thesis of barisione?
 348 Jun 11 11:07:15 <pbor>	paolo: yes
 349 Jun 11 11:07:21 <muntyan>	what does "invalidating subtree" mean? invalidating whole subtree and reanalyzing it?
 350 Jun 11 11:07:28 <paolo>	are therec omments con colorer?
 351 Jun 11 11:08:28 <paolo>	it means removing it from the context tree and mark the linked buffer segment as "to-be-analyzed" 
 352 Jun 11 11:09:13 <pbor>	paolo: not really... just a small paragraph where it says why implementing sourceview instead of using colorer
 353 Jun 11 11:09:29 <paolo>	well, no, it means:
 354 Jun 11 11:09:37 <paolo>	- removing the subtree
 355 Jun 11 11:09:41 <pbor>	paolo: and the arguments are: overcomplex, C++, ugly lang files
 356 Jun 11 11:09:59 <muntyan>	i am thinking about inserting text at the top level
 357 Jun 11 11:10:01 <paolo>	- leaving the root (marked as "to-be-analyzed")
 358 Jun 11 11:10:18 <muntyan>	you don't want to remove anything, i.e. adding a node is not invalidating a subtree
 359 Jun 11 11:10:50 <muntyan>	http://rafb.net/paste/results/bXZKT725.html
 360 Jun 11 11:10:55 <paolo>	ok, I agree
 361 Jun 11 11:11:12 <muntyan>	that's from colorer header, looks rather nasty
 362 Jun 11 11:11:36 <paolo>	ok
 363 Jun 11 11:11:53 <muntyan>	anyway, who is colorer?
 364 Jun 11 11:11:59 <muntyan>	does it work in any decent editor?
 365 Jun 11 11:12:12 <paolo>	see the homepage :)
 366 Jun 11 11:12:24 <paolo>	I have never tried the cited editors
 367 Jun 11 11:12:31 <muntyan>	eclipse is shit, mc doesn't have live updating on modifications
 368 Jun 11 11:12:38 <muntyan>	dunno about far
 369 Jun 11 11:13:44 <paolo>	eclipse is great :) 
 370 Jun 11 11:13:51 <paolo>	but it is not using colorer
 371 Jun 11 11:15:00 <muntyan>	eclipse is great? you have strong nerves then :)
 372 Jun 11 11:15:08 <muntyan>	or using java
 373 Jun 11 11:15:52 <paolo>	using java
 374 Jun 11 11:16:04 <muntyan>	it must be great then :)
 375 Jun 11 11:18:14 <paolo>	eclipse is a great IDE for Java... but it sucks for C/C++
 376 Jun 11 11:21:50 <pbor>	so, to sum things up, do we have a plan?
 377 Jun 11 11:24:04 <pbor>	muntyan: how much effort is to 'fix' the queue for the 'debug thing'? if it's not much maybe it would be nice to fix it just to see if something else bites us before touching the context tree...
 378 Jun 11 11:25:25 <paolo>	I think the plan is:
 379 Jun 11 11:25:31 <muntyan>	lot of effort
 380 Jun 11 11:25:59 <paolo>	1. writing the modification algorithm in pseudo-code
 381 Jun 11 11:26:10 <paolo>	2. validating it by hand
 382 Jun 11 11:26:39 <paolo>	3. modifying the current context tree according to 1 (using relative offsets)
 383 Jun 11 11:26:45 <paolo>	do you agree?
 384 Jun 11 11:26:52 <muntyan>	3. is really really hard
 385 Jun 11 11:27:06 <paolo>	why?
 386 Jun 11 11:27:07 <muntyan>	for me it besically means rewriting everything
 387 Jun 11 11:27:22 <paolo>	it could be
 388 Jun 11 11:27:56 <paolo>	trying to preserve most code is possible
 389 Jun 11 11:28:19 <paolo>	or cutting and pasting in the new code most old-code is possible
 390 Jun 11 11:28:38 <paolo>	I'm thinking to the code performing parsing
 391 Jun 11 11:28:51 <muntyan>	one needs to understand the code first ;)
 392 Jun 11 11:28:55 <paolo>	do you guys agree in my action plans?
 393 Jun 11 11:29:07 <muntyan>	honestly, i don't like it
 394 Jun 11 11:29:14 <paolo>	why?
 395 Jun 11 11:29:18 <muntyan>	we did not discuss all the issues
 396 Jun 11 11:29:31 <paolo>	let us discuss them, then
 397 Jun 11 11:29:42 <muntyan>	and this plan is a big plan, it's not something "for now to see what happens"
 398 Jun 11 11:30:09 <muntyan>	how to merge old and new trees?
 399 Jun 11 11:30:24 <muntyan>	do we want to give up after N characters and reanalyze reminder of the buffer?
 400 Jun 11 11:31:45 <pbor>	well, that is something that need measuring
 401 Jun 11 11:32:01 <pbor>	it really depends on how much slower is try to merge the old tree
 402 Jun 11 11:32:20 <muntyan>	it's important
 403 Jun 11 11:32:25 <paolo>	well, this is what point 1 needs to 
 404 Jun 11 11:32:35 <muntyan>	if we want not to give up, we need to keep new and old stuff around
 405 Jun 11 11:33:41 <paolo>	what do you mean with "merging old and new trees"?
 406 Jun 11 11:34:42 <muntyan>	when some text is modified, it potentially changes highlighting after the modification place
 407 Jun 11 11:35:01 <muntyan>	so we need to merge what we get after analyzing modified text and what we had before
 408 Jun 11 11:35:18 <pbor>	paolo: the current algorith is: "when a modification is made find the upper context affected, detach the subtree, fix its offsets. then start reanalyzin. at each step see if the old tree merges, if yes use it and stop reanalyizing"
 409 Jun 11 11:35:29 <muntyan>	what happens is that we have old tree, covering all the buffer; and we have new branches growing from the modified region
 410 Jun 11 11:35:45 <muntyan>	we need to glue those branches and old tree, and throw away wrong old nodes
 411 Jun 11 11:36:30 <pbor>	well, "invalidating" really means marking an 'old' subtree in the tree
 412 Jun 11 11:36:37 <paolo>	why it is useful merging with the old tree? Why we cannot simple replace it?
 413 Jun 11 11:36:53 <muntyan>	paolo: we can't simply reanalyze all the reminder of the buffer
 414 Jun 11 11:36:54 <pbor>	paolo: because analyzing is slow
 415 Jun 11 11:37:01 <muntyan>	insert one char, for example
 416 Jun 11 11:37:07 <muntyan>	you don't want to analyze whole file
 417 Jun 11 11:37:53 <muntyan>	yeah, if it weren't slow, we wouldn't have any problem :)
 418 Jun 11 11:38:57 <paolo>	ok, I see
 419 Jun 11 11:39:23 <paolo>	but still this is what point 1 is needed for
 420 Jun 11 11:39:39 <paolo>	we need to try to write down the algorithm
 421 Jun 11 11:39:45 <muntyan>	point 1 is useful in any situation
 422 Jun 11 11:39:52 <muntyan>	but we need to know the algorythm first
 423 Jun 11 11:40:10 <paolo>	point 1 is "design the real-algorithm"
 424 Jun 11 11:40:41 <paolo>	we have discussed about an intuition, about guide-lines, we need to design the details
 425 Jun 11 11:40:45 <muntyan>	you are assuming that the algorithm is exactly the tree as you described it, and exactly doing what you said
 426 Jun 11 11:40:55 <muntyan>	we didn't decide we do it, did we?
 427 Jun 11 11:41:20 <paolo>	well, it seems to me we decided about two points:
 428 Jun 11 11:41:26 <paolo>	- using relative offset
 429 Jun 11 11:41:56 <paolo>	- using a "concrete syntax tree"
 430 Jun 11 11:42:17 <muntyan>	err, it seems you really decided to do it :)
 431 Jun 11 11:42:19 <paolo>	now we have to figure how the syntax tree must be modified when a modification happens
 432 Jun 11 11:42:46 <pbor>	well, the algorithm is asaics not much different from the current one: leaving relative offset aside, currently we put the modifications in a queue, then the idle runs peeks a modifications find the affected tree node and do the analyze/merge magic
 433 Jun 11 11:43:03 <pbor>	with the new idea we just dump the queue
 434 Jun 11 11:43:12 <paolo>	exactly
 435 Jun 11 11:43:17 <pbor>	do the 'find affected node' sync
 436 Jun 11 11:43:35 <pbor>	and the idle walks the tree and finds the first node that needs updating
 437 Jun 11 11:43:51 <paolo>	the queue becomes a list of "nodes" to be analyzed in a more detailed way
 438 Jun 11 11:44:10 <pbor>	well, we don't even need such queu IMHO
 439 Jun 11 11:44:18 <paolo>	may be no
 440 Jun 11 11:44:24 <pbor>	the idle handler can simply walk the tree
 441 Jun 11 11:44:56 <pbor>	and find the upmost node that needs analisys
 442 Jun 11 11:45:04 <pbor>	that is that is marked invalid
 443 Jun 11 11:45:47 <pbor>	'detaches' the old tree at that node and then do the analyze/merge pretty much as it does now
 444 Jun 11 11:46:38 <pbor>	the problem is that the current analyze algo use the modification 'delt'
 445 Jun 11 11:46:41 <pbor>	*delta
 446 Jun 11 11:47:03 <muntyan>	it uses delta only to update offsets, i think; it's not a problem
 447 Jun 11 11:47:12 <pbor>	okay even better
 448 Jun 11 11:47:21 <muntyan>	but merging won't magically work!
 449 Jun 11 11:47:30 <pbor>	no it won't
 450 Jun 11 11:47:32 <muntyan>	and that question is still open
 451 Jun 11 11:47:43 <muntyan>	do we want to give up after analyzing one batch?
 452 Jun 11 11:48:27 <paolo>	why is this so important?
 453 Jun 11 11:48:30 <pbor>	but as far as I can see the merging works at the moment, doesn't it (apart from the offsets screwed by the async queue)?
 454 Jun 11 11:48:47 <muntyan>	it does work now
 455 Jun 11 11:48:56 <muntyan>	and it will work
 456 Jun 11 11:49:10 <paolo>	how is this going to influence the algorithm we are trying to design? It seems to me an implementation detail we can evaluate later. Am I wrong?
 457 Jun 11 11:49:19 <muntyan>	yes
 458 Jun 11 11:49:51 <pbor>	if you are more confortable you can make it always try to merge
 459 Jun 11 11:49:56 <muntyan>	or no, but it's an extremely important detail
 460 Jun 11 11:50:04 <pbor>	and we can see how it goes
 461 Jun 11 11:50:05 <muntyan>	which takes half of the code
 462 Jun 11 11:50:20 <pbor>	well, not really as far as I can see
 463 Jun 11 11:50:28 <paolo>	sure it is important, but we can see it in a second step
 464 Jun 11 11:50:35 <muntyan>	pbor: the code is simple now because it does give up
 465 Jun 11 11:50:38 <muntyan>	giving up is easy
 466 Jun 11 11:50:45 <muntyan>	"simple"
 467 Jun 11 11:51:09 <paolo>	well, is giving after a batch up so costly?
 468 Jun 11 11:51:59 <muntyan>	it depends on how good the batch is
 469 Jun 11 11:52:23 <muntyan>	with single modification, giving up is right, assuming the tree is a good idea
 470 Jun 11 11:52:39 <muntyan>	i.e. with the tree as it is, it's too expensive not to give up
 471 Jun 11 11:52:46 <muntyan>	and in most cases it works fine
 472 Jun 11 11:52:59 <pbor>	well, giving up at some point seems like a correct euristic to me... if you can't merge after X lines the probability of merging later goes down
 473 Jun 11 11:53:08 <pbor>	what needs tweaking is X
 474 Jun 11 11:53:24 <pbor>	that is how hard try before giving up
 475 Jun 11 11:53:34 <muntyan>	multiple modifications break this euristic :)
 476 Jun 11 11:53:41 <pbor>	but that's just a knob in the Reader objectt
 477 Jun 11 11:54:09 <muntyan>	you are assuming that the modification is in the beginning of the batch, and there is nothing modified after it
 478 Jun 11 11:54:18 <muntyan>	it's the case for hand-editing
 479 Jun 11 11:54:52 <pbor>	if now we mark 'invalid' in the tree directly we should be able to be smarter
 480 Jun 11 11:54:53 <muntyan>	(though even in this case, you have #if 0 ... #endif on 1001 lines, and batch of 1000 line, and boom)
 481 Jun 11 11:55:18 <pbor>	we can see if two modifications are for indipendent branches of the tree
 482 Jun 11 11:55:50 <muntyan>	it doesn't matter, in such bad cases
 483 Jun 11 11:56:19 <muntyan>	thing is that you change one char, and it can modify arbitrarily big part of the buffer
 484 Jun 11 11:56:24 <paolo>	I don't think the algorithm for all seasons exists
 485 Jun 11 11:56:26 <muntyan>	it can kill all the contexts after it
 486 Jun 11 11:56:30 <paolo>	so there are trade off
 487 Jun 11 11:56:34 <muntyan>	it does exist
 488 Jun 11 11:56:43 <paolo>	we can start giving up after a batch
 489 Jun 11 11:56:50 <muntyan>	not having such a tree is such an algorithm :)
 490 Jun 11 11:56:52 <paolo>	and then implement a better algo in a second step
 491 Jun 11 11:57:38 <paolo>	I really don't see why your data structure is better
 492 Jun 11 11:57:54 <muntyan>	because merging is easy
 493 Jun 11 11:57:59 <muntyan>	nothing to merge
 494 Jun 11 11:58:10 <paolo>	why not?
 495 Jun 11 11:58:19 <muntyan>	because there are no nodes to merge
 496 Jun 11 11:58:22 <pbor>	a bit far fetched but it's something like DOM vs SAX, isn't it? :)
 497 Jun 11 11:58:28 <muntyan>	you don't operate with nodes, you operate with segments
 498 Jun 11 11:59:16 <paolo>	ok, but then you cannot use the info for code folding or other context sensitive analysis
 499 Jun 11 11:59:35 <muntyan>	why can't you?
 500 Jun 11 11:59:47 <muntyan>	the state machine is there, you have the tree of contexts
 501 Jun 11 12:00:03 <muntyan>	um, got it
 502 Jun 11 12:00:15 <muntyan>	yes, you need another tree for that
 503 Jun 11 12:00:24 <muntyan>	(though, say for code folding you need it anyway)
 504 Jun 11 12:00:58 <muntyan>	indeed, hacing a tree is good
 505 Jun 11 12:01:04 <muntyan>	but merging is awful
 506 Jun 11 12:01:34 <muntyan>	besides, does it play well with those context-sensitive things? (if we consider it as an argument, then we need to consider what happens when we invalidate nodes)
 507 Jun 11 12:01:52 <paolo>	I'm still not convinced about the need of merging, but may be I'm missing some details
 508 Jun 11 12:02:13 <muntyan>	okay, what happens when you insert a character?
 509 Jun 11 12:02:25 <muntyan>	you invalidate a node, what then?
 510 Jun 11 12:05:26 <pbor>	that's it, you invalidate the node instead of pushing the modification in the queue
 511 Jun 11 12:05:33 <paolo>	if it creates a new context, then invalidate all the following siblings
 512 Jun 11 12:05:35 <muntyan>	yes
 513 Jun 11 12:05:40 <muntyan>	what does the analyzer do
 514 Jun 11 12:06:00 <muntyan>	simple case
 515 Jun 11 12:06:06 <muntyan>	you are typing in:
 516 Jun 11 12:06:08 <muntyan>	"wefwef
 517 Jun 11 12:06:12 <muntyan>	and then you insert "
 518 Jun 11 12:06:14 <muntyan>	in a C file
 519 Jun 11 12:06:36 <muntyan>	you must merge stuff
 520 Jun 11 12:06:51 <paolo>	you invalidate the parent
 521 Jun 11 12:07:09 <paolo>	and then it will be re-analyzed in an async way
 522 Jun 11 12:07:12 <muntyan>	parent is NULL
 523 Jun 11 12:07:20 <muntyan>	so, how it will be reanalyzed?
 524 Jun 11 12:07:30 <muntyan>	okay, let me give more details
 525 Jun 11 12:07:33 <paolo>	no, parent cannot be null
 526 Jun 11 12:07:35 <muntyan>	you have a C file:
 527 Jun 11 12:07:36 <muntyan>	wait
 528 Jun 11 12:07:37 <muntyan>	so
 529 Jun 11 12:07:43 <muntyan>	---------------
 530 Jun 11 12:07:44 <muntyan>	"efwefwefwef
 531 Jun 11 12:07:51 <muntyan>	int a (void)
 532 Jun 11 12:07:52 <muntyan>	------------
 533 Jun 11 12:08:02 <muntyan>	and then you insert " after ewfwef
 534 Jun 11 12:08:17 <muntyan>	you invalidate the top top context, fine
 535 Jun 11 12:08:37 <muntyan>	what will idle_worker do after it closes the string context, what will it do on the next line?
 536 Jun 11 12:08:48 <muntyan>	you don't want to reanalyze the next line, there is nothing to reanalyze
 537 Jun 11 12:09:04 <muntyan>	so you need to merge what was before and what you have now
 538 Jun 11 12:10:30 <pbor>	in the idle handler you find the invalidated node in the tree and split the tree there,
 539 Jun 11 12:10:38 <pbor>	then you analize the first line
 540 Jun 11 12:10:42 <muntyan>	pbor: i am trying to convince we need to do it
 541 Jun 11 12:10:49 <pbor>	then you try to merge the split node
 542 Jun 11 12:10:50 <muntyan>	pbor: the splitting and merging
 543 Jun 11 12:10:57 <muntyan>	err, to convince paolo
 544 Jun 11 12:11:11 <pbor>	yes, we defintely need to split and merge
 545 Jun 11 12:11:53 <paolo>	if you add a ", you have to:
 546 Jun 11 12:12:08 <pbor>	'invalidating in the tree' is just a different way to tell where to split instead of using an offset
 547 Jun 11 12:12:12 <paolo>	- invalidate the parent node (toplevel root in this case)
 548 Jun 11 12:12:50 <paolo>	and analyze the part of the buffer associated with the node
 549 Jun 11 12:12:57 <paolo>	in this case the entire buffer
 550 Jun 11 12:13:06 <paolo>	hmm... no
 551 Jun 11 12:13:30 <paolo>	since in C with have string ending with end of line
 552 Jun 11 12:13:54 <paolo>	so you need to:
 553 Jun 11 12:14:01 <paolo>	- invalidate the root (as before)
 554 Jun 11 12:14:03 <muntyan>	right, and it could be anything ending anywhere, say after char "Z"
 555 Jun 11 12:14:23 <paolo>	- analyzing the node
 556 Jun 11 12:15:10 <paolo>	ok... I see what you mean with merging
 557 Jun 11 12:15:41 <paolo>	but I don't see why it is so difficult to achieve
 558 Jun 11 12:18:08 <muntyan>	because ten nodes go to trash, and you end up with node of depth 8 on the left, and node of depth 4 on the right, having different top-level ancestors
 559 Jun 11 12:18:19 <muntyan>	or vice versa
 560 Jun 11 12:18:24 <muntyan>	it's really nasty
 561 Jun 11 12:18:42 <muntyan>	oh, forgot about important detail
 562 Jun 11 12:18:58 <muntyan>	you also have no idea if you can merge nodes after ten next chars
 563 Jun 11 12:19:18 <muntyan>	i.e. you don't know if you need to throw away everything (give up) or look a bit further
 564 Jun 11 12:19:34 <muntyan>	dunno, it's very dirty business
 565 Jun 11 12:20:36 <muntyan>	besides, about folding
 566 Jun 11 12:20:51 <muntyan>	imagine you have a big function body, longer than a batch
 567 Jun 11 12:21:10 <paolo>	well, this is due to the fact you want to manage merging in a very smart way
 568 Jun 11 12:22:05 <paolo>	I think we should try to manage simple cases as special cases
 569 Jun 11 12:22:14 <muntyan>	good ppint
 570 Jun 11 12:22:16 <muntyan>	point
 571 Jun 11 12:22:19 <paolo>	and then use a generic brutal approach for the general case
 572 Jun 11 12:22:40 <muntyan>	still, it's not obvious to me that cases like " after C string or > in xml file are really easy
 573 Jun 11 12:22:52 *	muntyan draws
 574 Jun 11 12:22:52 <paolo>	I mean adding a single character is a special very frequent case
 575 Jun 11 12:22:58 <paolo>	other are less frequents
 576 Jun 11 12:23:06 <muntyan>	no!
 577 Jun 11 12:23:12 <muntyan>	in gedit, maybe
 578 Jun 11 12:23:19 <muntyan>	it
 579 Jun 11 12:23:29 <muntyan>	s very important to handle search/replace, for example
 580 Jun 11 12:23:40 <muntyan>	or my active strings, i like my active strings
 581 Jun 11 12:23:46 <muntyan>	hm, snippets?
 582 Jun 11 12:24:22 <paolo>	sure, but how frequently are the user using it? May be a more brutal generic approach is fast enough for that cases
 583 Jun 11 12:24:23 <muntyan>	err, "less" is true, but "less frequent" doesn't mean "not frequent"
 584 Jun 11 12:24:40 <muntyan>	paolo: often :)
 585 Jun 11 12:25:08 <pbor>	well, we really need a way to mark 'interactive' changes in a way different from 'one-char-lenght'...
 586 Jun 11 12:25:14 <paolo>	BTW, merging is an important part of the algorithm implementation
 587 Jun 11 12:25:21 <pbor>	but that's a separate issue IMHO
 588 Jun 11 12:25:26 <paolo>	I still would like to see "the" algorithm
 589 Jun 11 12:26:04 <muntyan>	i would LOVE to see it :)
 590 Jun 11 12:26:34 <muntyan>	so, is it the final decision to have the tree as you described it?
 591 Jun 11 12:29:20 <paolo>	the final decision is up to you too
 592 Jun 11 12:29:33 <paolo>	you are the guy, I'm only the "observer" :)
 593 Jun 11 12:29:35 <muntyan>	well...
 594 Jun 11 12:29:40 <muntyan>	you are the boss :)
 595 Jun 11 12:29:51 <paolo>	hehe
 596 Jun 11 12:29:59 <muntyan>	i still do not see why "tree" is better than "no tree"
 597 Jun 11 12:30:11 <muntyan>	historical and "no new stuff" arguments do not count
 598 Jun 11 12:30:32 <muntyan>	the folding and such are good arguments, but it's not clear why the tree will be *really* good for those
 599 Jun 11 12:30:59 <paolo>	since you can decide to fold at the "node" level
 600 Jun 11 12:31:06 <muntyan>	note, i am not saying that my idea is better. if i thought so, i'd say it. but i want to know for sure what i am doing
 601 Jun 11 12:31:17 <muntyan>	paolo: you can do it with no-tree too
 602 Jun 11 12:31:23 <muntyan>	paolo: since the tree is still there
 603 Jun 11 12:31:34 <muntyan>	paolo: and, it's not clear how folding will play with modifications
 604 Jun 11 12:31:38 <paolo>	exactly, this is why I don't see no-tree is better
 605 Jun 11 12:31:39 <muntyan>	in either case
 606 Jun 11 12:31:53 <paolo>	you have the same info in a different, IMO less clean, way
 607 Jun 11 12:32:06 <muntyan>	i'd say it is more clean
 608 Jun 11 12:32:18 <muntyan>	because you don't mix syntax and text segments
 609 Jun 11 12:32:29 <muntyan>	text segments know about syntax, but syntax tree doesn't
 610 Jun 11 12:32:32 <muntyan>	and it doesn't need to
 611 Jun 11 12:32:41 <muntyan>	i.e. it depends on point of view :0
 612 Jun 11 12:32:50 <muntyan>	main thing is that merging is easy with no-tree
 613 Jun 11 12:33:05 <paolo>	why?
 614 Jun 11 12:33:18 <paolo>	explain the above example with your data structure
 615 Jun 11 12:33:44 <muntyan>	you insert a char, you mark line as invalid
 616 Jun 11 12:33:49 <muntyan>	then, you reanalyze the line
 617 Jun 11 12:34:06 <muntyan>	then, if the expected-on-next-line context is the same as the actual one, you stop
 618 Jun 11 12:34:17 <paolo>	and what does it happen if the context span multiple lines?
 619 Jun 11 12:34:28 <muntyan>	same thing, you invalidate next line
 620 Jun 11 12:34:36 <muntyan>	until you have won
 621 Jun 11 12:34:54 <paolo>	honestly, I don't see why it is different
 622 Jun 11 12:35:07 <muntyan>	merging is extremely easy
 623 Jun 11 12:35:14 <muntyan>	no splitting, no merging
 624 Jun 11 12:35:21 <paolo>	actually I think it is less efficient since you always invalidate the entire line
 625 Jun 11 12:35:28 <muntyan>	paolo: you always have to
 626 Jun 11 12:35:41 <muntyan>	in either approach
 627 Jun 11 12:35:55 <muntyan>	you can optimize it by looking at regexes, also with either approach
 628 Jun 11 12:35:58 <paolo>	why?
 629 Jun 11 12:36:02 <paolo>	suppose you have
 630 Jun 11 12:36:04 <muntyan>	there is look-ahead and look-behind
 631 Jun 11 12:36:10 <paolo>	 /* ciao
 632 Jun 11 12:36:13 <muntyan>	you always have to reanalyze whole line
 633 Jun 11 12:36:13 <paolo>	ciao ciao
 634 Jun 11 12:36:45 <paolo>	 */ ciao /  ciao */
 635 Jun 11 12:36:54 <paolo>	if you add a * after /
 636 Jun 11 12:37:02 <paolo>	you don't have to invalidate the entire line
 637 Jun 11 12:37:18 <muntyan>	only because *you* *know* C syntax
 638 Jun 11 12:37:28 <muntyan>	in general case you must analyze whole line
 639 Jun 11 12:37:39 <paolo>	because you are using a tree
 640 Jun 11 12:37:46 <muntyan>	(and it can be optimized, by looking at children, regexes and whatnot)
 641 Jun 11 12:37:48 <muntyan>	paolo: nope
 642 Jun 11 12:37:50 <paolo>	and so you know the syntax
 643 Jun 11 12:38:13 <muntyan>	i know the syntax with my approach, i have the same information as with the tree
 644 Jun 11 12:39:03 <muntyan>	what i meant by "you know" is "you, paolo, know"
 645 Jun 11 12:39:16 <muntyan>	you know that the C syntax doesn't have fancy things in that case
 646 Jun 11 12:39:28 <muntyan>	but in general, one char can change *everything*
 647 Jun 11 12:39:32 <muntyan>	whole buffer
 648 Jun 11 12:39:39 <muntyan>	(in weird cases :)
 649 Jun 11 12:40:21 <muntyan>	you have to analyze the line (note that "analyze" doesn't mean "remove and apply tags", it means "check what's already there")
 650 Jun 11 12:40:49 <paolo>	anyway, I don't see why it is different
 651 Jun 11 12:40:50 <muntyan>	say, shebang line
 652 Jun 11 12:40:52 <muntyan>	#!/eiohfierwog
 653 Jun 11 12:40:57 <muntyan>	! changes whole line
 654 Jun 11 12:41:09 <muntyan>	paolo: the difference is in the merging process
 655 Jun 11 12:41:17 <muntyan>	paolo: with my approach, no merging process
 656 Jun 11 12:41:30 <muntyan>	paolo: with the tree, it's fairly complicated
 657 Jun 11 12:41:58 <muntyan>	ask pbor :)
 658 Jun 11 12:42:42 <paolo>	ATM you invalidate the entire line, ok
 659 Jun 11 12:42:52 <paolo>	why can't you do the same with the tree?
 660 Jun 11 12:43:00 <paolo>	you invalidate all the nodes of the line
 661 Jun 11 12:43:06 <muntyan>	invalidating is easy
 662 Jun 11 12:43:11 <muntyan>	merging is hard
 663 Jun 11 12:43:11 <paolo>	you have the same effect
 664 Jun 11 12:43:38 <paolo>	does you code perform merging?
 665 Jun 11 12:43:59 <muntyan>	it doesn't have to
 666 Jun 11 12:44:16 <muntyan>	it checks whether expected next-line context matches one which is already theer
 667 Jun 11 12:44:17 <muntyan>	there
 668 Jun 11 12:44:27 <muntyan>	if it matches, we are done; if not, analyze next line
 669 Jun 11 12:44:31 <paolo>	why you cannot do it with the tree?
 670 Jun 11 12:45:12 <muntyan>	you can, actually
 671 Jun 11 12:45:18 <muntyan>	but, tree doesn't know about lines
 672 Jun 11 12:45:20 <paolo>	if next sibling matches you are done, if not give invalidate it, and analyze next one
 673 Jun 11 12:46:35 <muntyan>	let me smoke a bit, looks like everything is fine, and i really can throw away the merging part
 674 Jun 11 12:47:08 <paolo>	argh.... smoke is dangerous :)
 675 Jun 11 12:47:19 <muntyan>	yeah, it kills
 676 Jun 11 12:47:43 <paolo>	I hope not before the end of SoC ;)
 677 Jun 11 12:47:55 <muntyan>	hm, if we split all contexts on line breaks, it should work
 678 Jun 11 12:48:40 <muntyan>	but how to do that? do we need line breaks?
 679 Jun 11 12:48:51 <muntyan>	the merging part is what is really killing me
 680 Jun 11 12:48:57 <muntyan>	i hate it, and it hates me
 681 Jun 11 12:49:26 <paolo>	I don't see why you are so "bind" to line breaks... they are chars like the other ones
 682 Jun 11 12:49:39 <muntyan>	no, splitting contexts on line breaks may do all this stuff every inefficient
 683 Jun 11 12:49:45 <paolo>	we you need to split context at line breaks if not needed
 684 Jun 11 12:49:45 <muntyan>	paolo: they are not
 685 Jun 11 12:49:53 <muntyan>	paolo: "\r\n" vs "\n"
 686 Jun 11 12:50:18 <muntyan>	line breaks are very special places in the buffer, unfortunately
 687 Jun 11 12:50:30 <paolo>	why?
 688 Jun 11 12:50:37 <muntyan>	"\r\n" vs "\n"
 689 Jun 11 12:50:42 <muntyan>	they are not chars
 690 Jun 11 12:50:46 <muntyan>	they are special places
 691 Jun 11 12:51:07 <paolo>	ok, but they are not special from the "syntax hl" point of view
 692 Jun 11 12:51:09 <muntyan>	but why do i want to split contexts?
 693 Jun 11 12:51:14 <muntyan>	paolo: they are
 694 Jun 11 12:51:21 <paolo>	why?
 695 Jun 11 12:51:35 <muntyan>	paolo: every normal language knows what "line" is, and doesn't care about line terminators
 696 Jun 11 12:51:41 <muntyan>	or what do you mean?
 697 Jun 11 12:51:52 <paolo>	they can end some kind of contexts
 698 Jun 11 12:52:31 <muntyan>	^blah rules are not rare too
 699 Jun 11 12:52:35 <paolo>	but for other contexts they have no importance and so you don't need to threat them in a special case in those contexts
 700 Jun 11 12:52:44 <muntyan>	anyway, thing is that line end is not a char
 701 Jun 11 12:52:51 <paolo>	think to "multiline C comments"
 702 Jun 11 12:53:16 <paolo>	ok, they are places
 703 Jun 11 12:53:20 <muntyan>	yes, it's not right to split everything
 704 Jun 11 12:53:32 <paolo>	but their importance is related to the current context
 705 Jun 11 12:53:48 <paolo>	so there is no need to always split
 706 Jun 11 12:54:01 <pbor>	(sorry, /me reads the backlog)
 707 Jun 11 12:54:02 <paolo>	you split only if the current context can end with a line break
 708 Jun 11 12:54:23 *	paolo is trying to convert muntyan to the word of trees :)
 709 Jun 11 12:54:49 <paolo>	he will become a naturalist before the end of the project :)
 710 Jun 11 12:54:55 <muntyan>	my research area is groups of tree automorphisms :)
 711 Jun 11 12:55:07 <muntyan>	finite automata, and stuff
 712 Jun 11 12:55:24 <muntyan>	anyway, i don't see now why no-tree is better than tree
 713 Jun 11 12:55:28 <paolo>	so, why I'm speaking about
 714 Jun 11 12:55:37 <muntyan>	so i guess i agree with those points
 715 Jun 11 12:55:40 <paolo>	you should use trees everywhere 
 716 Jun 11 12:56:01 <muntyan>	well, my no-tree approach involves having a BTree :)
 717 Jun 11 12:56:08 <muntyan>	trees are everywhere :)
 718 Jun 11 12:57:42 <paolo>	to summarize (my wife called, so I need to go home, in order to avoid physical damages :)
 719 Jun 11 12:58:19 <paolo>	- we are going to use a tree with relative offset
 720 Jun 11 12:58:39 <paolo>	- probably merging is not so difficult as we thought
 721 Jun 11 12:59:01 <paolo>	- splitting should be not a problem
 722 Jun 11 12:59:50 <paolo>	- queue will not contain the buffer modifications, but (if we really need a queue) the invalidated nodes
 723 Jun 11 13:00:06 <muntyan>	i think queue is not needed at all
 724 Jun 11 13:00:27 <muntyan>	we can have two flags - invalid, and some-child-is-invalid
 725 Jun 11 13:00:31 <muntyan>	and then just walk the tree
 726 Jun 11 13:00:48 <muntyan>	and maybe some first-invalid link, or index, or something
 727 Jun 11 13:00:54 *	muntyan hates the queue :)
 728 Jun 11 13:01:40 <paolo>	ok
 729 Jun 11 13:03:15 <muntyan>	there is one problem with lines, but it desrves an email, i think
 730 Jun 11 13:03:26 <muntyan>	in any case it's not quick, and doesn't change things to do now
 731 Jun 11 13:04:15 <paolo>	muntyan: can you please write down a little summary of the discussion we had today and save the log on l.g.o for future reference
 732 Jun 11 13:04:34 <paolo>	at least the conclusions
 733 Jun 11 13:04:46 <muntyan>	i am not a good writer...
 734 Jun 11 13:04:48 <paolo>	and send it by mail to me, pbor and barisione
 735 Jun 11 13:04:52 <muntyan>	ok
 736 Jun 11 13:04:52 <paolo>	a few lines
 737 Jun 11 13:05:36 <paolo>	them you can start designing/implementing the algorithm
 738 Jun 11 13:06:30 <muntyan>	ok
 739 Jun 11 13:07:35 <muntyan>	err, can we close the meeting? i'm afraid there will be two injured...
 740 Jun 11 13:08:08 <paolo>	yes
 741 Jun 11 13:08:46 <muntyan>	paolo, pbor, see you later then
 742 Jun 11 13:08:54 *	paolo didn't backup his notebook again
 743 Jun 11 13:09:03 <pbor>	later
 744 Jun 11 13:09:10 <paolo>	bye
 745 Jun 11 13:09:10 *	pbor needs to go too
 746 Jun 11 13:09:17 *	paolo has quit (Ex-Chat)
 747 Jun 11 13:10:00 <muntyan>	pbor: four hours!!!
 748 **** ENDING LOGGING AT Sun Jun 11 13:11:19 2006

Attached Files

To refer to attachments on a page, use attachment:filename, as shown below in the list of files. Do NOT use the URL of the [get] link, since this is subject to change and can break easily.
  • [get | view] (2021-02-25 09:59:57, 17.2 KB) [[attachment:2006-06-07.txt]]
  • [get | view] (2021-02-25 09:59:57, 55.0 KB) [[attachment:2006.06.11.txt]]
  • [get | view] (2021-02-25 09:59:57, 12.7 KB) [[attachment:2006.08.18-pbor-paolo.txt]]
  • [get | view] (2021-02-25 09:59:57, 9.1 KB) [[attachment:2006.08.18-pbor.txt]]
 All files | Selected Files: delete move to page copy to page

You are not allowed to attach a file to this page.