Attachment '2006.06.11.txt'
Download 1 **** BEGIN LOGGING AT Sun Jun 11 09:03:03 2006
2
3 Jun 11 09:03:03 * Now talking on #gsv
4 Jun 11 09:03:07 <muntyan> hi there
5 Jun 11 09:03:08 <paolo> hi
6 Jun 11 09:03:18 <paolo> I'd like to speak a bit about SoC
7 Jun 11 09:03:29 <muntyan> that's cool :)
8 Jun 11 09:03:35 <paolo> can you update me on the progresses?
9 Jun 11 09:03:48 <paolo> so we can take some decision about next steps
10 Jun 11 09:04:29 <pbor> hi muntyan
11 Jun 11 09:04:59 <muntyan> well, no progress really
12 Jun 11 09:05:40 <muntyan> i didn't have time yet to do what we decided to do
13 Jun 11 09:05:45 <paolo> hmm... plans, thoughts, etc.
14 Jun 11 09:05:56 <muntyan> the sorting + processing changes that fall into the batch
15 Jun 11 09:06:12 <muntyan> um, did you read our conversation with pbor and barisione?
16 Jun 11 09:06:19 <paolo> no
17 Jun 11 09:06:23 <muntyan> oh, ok
18 Jun 11 09:06:27 <paolo> I'm going to read it now
19 Jun 11 09:06:35 <muntyan> then, the plan and stuff is:
20 Jun 11 09:06:39 <paolo> pbor: is it already on l.g.o?
21 Jun 11 09:07:23 <pbor> attaching
22 Jun 11 09:07:28 <muntyan> there is a problem with highlighting which seems (seems, nobody knows really why it happens) to occur because sorting the modifications queue is not enough
23 Jun 11 09:07:38 <muntyan> so the plan is:
24 Jun 11 09:07:59 <pbor> http://live.gnome.org/GtkSourceView/SummerOfCode2006?action=AttachFile&do=view&target=2006-06-07.txt
25 Jun 11 09:08:00 <paolo> is it me or l.g.o has broken CSS today?
26 Jun 11 09:08:24 <muntyan> make the function which processes a batch take all the modifications from queue which intersect with the batch
27 Jun 11 09:08:37 <muntyan> then see what happens - whether the problem is fixed or not
28 Jun 11 09:09:03 <muntyan> if the problem is fixed, then it means highlighting basically works (hopefully), and we can think of speed issues and whatnot
29 Jun 11 09:09:33 <muntyan> i guess it wasn't really clear :)
30 Jun 11 09:09:43 <muntyan> paolo: how much do you know about how the engine works?
31 Jun 11 09:10:02 <paolo> little, so I need details to understand the problems
32 Jun 11 09:10:25 * paolo is reading the old IRC log
33 Jun 11 09:10:25 <muntyan> okay, i'll try to explain it
34 Jun 11 09:11:01 <muntyan> what engine does is: 1) it analyzes the buffer from the beginning to the end; 2) it updates syntax after buffer changes
35 Jun 11 09:11:17 <muntyan> 1) is done by processing batch by batch, in an obvius way
36 Jun 11 09:11:21 <muntyan> now, about modifications:
37 Jun 11 09:12:01 <muntyan> when text is inserted or deleted, engine records the modification as offset where it happened, and the length of text inserted/removed
38 Jun 11 09:12:11 <muntyan> then it puts this modification into the queue
39 Jun 11 09:12:47 <muntyan> then it queues updating (or processes modification immediately, but it's irrelevant)
40 Jun 11 09:13:30 <muntyan> idle_worker (for simplicity assume it's always idle_worker who does the analysis and everything) does the following:
41 Jun 11 09:14:06 <muntyan> it peeks the first modification from the queue, and updates syntax in a batch which includes changed region
42 Jun 11 09:14:34 <muntyan> if it can't merge old and new syntax regions in this one batch, it analyzes the remainder of buffer as it does it with new buffer
43 Jun 11 09:15:01 <muntyan> if there are no modifications in the queue, it processes not analyzed remainder of the buffer
44 Jun 11 09:15:14 <muntyan> so, there are bunch of problems here:
45 Jun 11 09:16:28 <muntyan> 1) it needs to sort modifications and update them. for example, if you delete text at position 100, and then delete text at position 10, then the at-pos-10 modification must be processed first, and the second modification should have offset corrected (it was 100, but after second deletion it should be 100-(how many chars deleted))
46 Jun 11 09:17:29 <muntyan> 2) what seems to be the problem now: when engine processes a modification, it takes a big batch, and potentially it changes syntax in whole batch, so the changed area may include modifications which are still in the queue
47 Jun 11 09:17:54 <muntyan> the plan for the nearest future is to fix 2) and see what happen
48 Jun 11 09:17:59 <muntyan> the end.
49 Jun 11 09:18:00 <muntyan> :)
50 Jun 11 09:18:37 <paolo> first question: how are batches computed? What is a batch in this context?
51 Jun 11 09:18:46 <muntyan> 4000 chars, roughly
52 Jun 11 09:18:56 <paolo> ok
53 Jun 11 09:19:05 <muntyan> it calculates the length according to how fast it processed previous batches, but that's details
54 Jun 11 09:19:25 <paolo> yep
55 Jun 11 09:20:33 <paolo> you said the whole buffer is analyzed, batch by batch? Is this a sync operation?
56 Jun 11 09:20:51 <muntyan> no, idle_worker processes one batch
57 Jun 11 09:21:11 <paolo> what does it happens if a change is queued before the whole buffer has been analyzed for the first time?
58 Jun 11 09:21:37 <muntyan> it keeps worker_last_offset - how much of the buffer has been analyzed
59 Jun 11 09:21:49 <muntyan> if the modification is after last_offset, it's just ignored
60 Jun 11 09:22:09 <paolo> what does it happen if I'm going to "show" a part of the buffer that still need to be analyzed?
61 Jun 11 09:22:54 <muntyan> if it's before, then the modification is processed; if it could merge old and new in one batch, it processes the region after last_offset as usual (in next idle run); if not, then it sets last_offset to the end of the batch, and then proceeds as usual
62 Jun 11 09:23:50 <muntyan> when part of buffer is exposed, then it's asked to apply tags in the exposed area; if it's not analyzed yet, there are no tags
63 Jun 11 09:24:48 <muntyan> i looked at the sources, and:
64 Jun 11 09:25:44 <muntyan> it checks last_offset. if exposed area is before last_offset, it applied tags; if not, it records the region to apply tags later, then it will be analyzed
65 Jun 11 09:26:25 <pbor> yes, the code there is pretty much what current sourceview does as far as I can tell
66 Jun 11 09:26:39 <pbor> (except for some details I don't like)
67 Jun 11 09:26:44 <muntyan> also, when it updates syntax somewhere, it emits signal so that view can queue redraw in the updated area; then expose will ask to apply correct tags
68 Jun 11 09:26:49 <muntyan> seems to be correct
69 Jun 11 09:26:54 <paolo> how fast is the first analysis of the whole buffer (for example on a 10000 lines C file)
70 Jun 11 09:27:44 <muntyan> good question
71 Jun 11 09:28:08 <paolo> worker_last_offset is the last analyzed offset, right?
72 Jun 11 09:28:19 <pbor> yep
73 Jun 11 09:29:01 <muntyan> it looks "fast enough", i didn't measure it
74 Jun 11 09:29:11 <pbor> the analisys of the whole doc on open seems of acceptable performance to me too
75 Jun 11 09:30:10 <pbor> one of the issues with the queue instead is that if you have something like d'n'd, an expose event forces to apply tags between the remove and the add part of the dnd
76 Jun 11 09:30:21 <pbor> which leads to flicker
77 Jun 11 09:30:41 <paolo> I was thinking about making a very fast first analysis that only splits the document in first level contexts and the re-analyze on demands the current context before displaying it to apply the correct tags
78 Jun 11 09:30:44 <muntyan> pbor: no, if you process more than one modification at a time
79 Jun 11 09:30:57 <muntyan> this is what's needed to be done
80 Jun 11 09:31:19 <pbor> yes, I am just exposing one of the issues there are now
81 Jun 11 09:31:26 <pbor> didn't say it's unfixable
82 Jun 11 09:31:27 <pbor> :)
83 Jun 11 09:31:40 <muntyan> paolo: that would work only if first-level contexts don't need children to know where they end
84 Jun 11 09:31:49 <muntyan> this is one of things i wanted to talk about
85 Jun 11 09:32:23 <paolo> hmm... what do you mean?
86 Jun 11 09:33:36 <muntyan> 1) child context may decide to terminate its parent
87 Jun 11 09:34:09 <muntyan> 2) it seems to be nicer if many rules may decide when to terminate context
88 Jun 11 09:34:28 <muntyan> at the moment contexts have start and end conditions (regexes or chars, or something)
89 Jun 11 09:34:39 <muntyan> i think they should have more than that
90 Jun 11 09:34:54 <muntyan> e.g. here i have a lang file which has #pop#pop
91 Jun 11 09:35:12 <muntyan> it means it terminates itself and parent
92 Jun 11 09:35:19 <muntyan> did you understand something?
93 Jun 11 09:36:04 <paolo> yep, but can you report a specific use case for this?
94 Jun 11 09:36:29 <muntyan> <!-- CaseIn is called when the construct 'case ... in' has been found. -->
95 Jun 11 09:36:29 <muntyan> <context style="Normal Text" eol-context="#stay" name="CaseIn">
96 Jun 11 09:36:29 <muntyan> <Regex style="Keyword" context="#pop#pop" pattern="\besac(?=$|[\s;)])" endRegion="case" />
97 Jun 11 09:36:47 <muntyan> some crazy thing from kate
98 Jun 11 09:36:51 <muntyan> 's bash lang file
99 Jun 11 09:38:03 <paolo> Is it so important to support this feature? Are there other ways to support the same feature?
100 Jun 11 09:38:25 <muntyan> i have no idea
101 Jun 11 09:38:37 <paolo> the .lang files by barisione support this feature?
102 Jun 11 09:38:44 <muntyan> but i don't like that contexts have end attribute
103 Jun 11 09:38:52 <muntyan> paolo: the engine doesn't know about such a thing
104 Jun 11 09:39:10 <muntyan> context has end which is defined by regex, and that's it
105 Jun 11 09:39:46 <muntyan> besides, what happens when context and its child are both terminated by same thing, e.g. end of line?
106 Jun 11 09:40:16 <muntyan> like
107 Jun 11 09:40:19 <muntyan> #ewfwe //ewfwef
108 Jun 11 09:40:22 <muntyan> in a C file
109 Jun 11 09:40:45 <muntyan> (not with gtksourceview's c.lang, it doesn't know that preprocessor directive is whole line, not only word after #)
110 Jun 11 09:40:54 <paolo> it seems the .lang files by barisione do not require an "end" attribute for contexts
111 Jun 11 09:41:16 <muntyan> they don't reuiqre it for "simple" contexts
112 Jun 11 09:41:30 <muntyan> those simple contexts are just regions matched by certain regex
113 Jun 11 09:41:49 <muntyan> but container contexts do require end attribute
114 Jun 11 09:42:00 <muntyan> (at least in my reading code and lang files :)
115 Jun 11 09:42:24 <paolo> ok
116 Jun 11 09:43:00 <paolo> (this is a question to ask to barisione, please collect all the open questions in a mail to send later)
117 Jun 11 09:43:18 <muntyan> ok
118 Jun 11 09:43:21 <pbor> mmm... isn't the end regex of a container context done by ORing all the possible ends of the child context which terminate the parent?
119 Jun 11 09:44:09 <paolo> no
120 Jun 11 09:44:15 <muntyan> it can't work, can it?
121 Jun 11 09:44:33 <muntyan> you can't just match anything, contexts may jump like crazy
122 Jun 11 09:44:33 <paolo> well, yes
123 Jun 11 09:44:58 <muntyan> for this to work you need *everything* be controlled by end regexes
124 Jun 11 09:45:05 <muntyan> e.g. ends-at-line-end breaks it
125 Jun 11 09:45:07 <muntyan> i think
126 Jun 11 09:45:09 <paolo> but you need to use a stack of context also during the fast analysis
127 Jun 11 09:45:55 <paolo> ends-at-line-end it is one of the ORed possible ends
128 Jun 11 09:46:24 <muntyan> but child may opt not to terminate there
129 Jun 11 09:46:28 <muntyan> while parent may temrinate
130 Jun 11 09:46:36 <muntyan> it's how trailing \ is done
131 Jun 11 09:46:39 <paolo> ok
132 Jun 11 09:47:12 <muntyan> the parent (preprocessor) ends at line-end, but it has child context that consumes end of line, so the parent won't see it
133 Jun 11 09:47:14 <paolo> we can analyse this problem later
134 Jun 11 09:47:24 <paolo> let us return to the current problems
135 Jun 11 09:47:39 <paolo> it seems to me the mother of all the problems is the queue of changes
136 Jun 11 09:47:50 <muntyan> (regular expressions can encapsulate arbitrarily complex thing, but noone will understand it)
137 Jun 11 09:47:54 <paolo> I have a question... do we really need it?
138 Jun 11 09:48:06 <muntyan> well, yes if we don't want to touch contexts tree
139 Jun 11 09:48:17 <muntyan> context tree is the mother of the problems
140 Jun 11 09:48:29 <muntyan> it keeps offsets in the nodes
141 Jun 11 09:48:31 <paolo> what is the "context tree"?
142 Jun 11 09:48:39 <paolo> ok
143 Jun 11 09:48:46 <paolo> stupid question
144 Jun 11 09:48:56 <muntyan> not really stupid
145 Jun 11 09:48:59 <paolo> so the problem is not the context tree per se
146 Jun 11 09:49:25 <muntyan> well, it depends on how you look at it
147 Jun 11 09:49:29 <muntyan> let me explain
148 Jun 11 09:49:40 <muntyan> there are two things in here:
149 Jun 11 09:50:08 <muntyan> 1) the 'state machine', the tree of contexts which represents how contexts may be there, how parents include children and such
150 Jun 11 09:50:42 <muntyan> 2) the sequence of segments of text, where each segment corresponds to some state of that state machine, or to some node of the contexts tree
151 Jun 11 09:50:58 <muntyan> in the engine these two are one thing, the "contexts tree"
152 Jun 11 09:51:25 <muntyan> if it's right, then the queue is the problem, but it can't be removed
153 Jun 11 09:51:36 <muntyan> if it's not right, then the contexts tree is the problem
154 Jun 11 09:51:37 <muntyan> :)
155 Jun 11 09:52:54 <muntyan> the *real* problem is that the engine has no idea about the fact that buffer can be modified many times and in any order; the rest is implementation details ;)
156 Jun 11 09:53:11 <muntyan> and we are free to solve it in any way we like
157 Jun 11 09:53:31 <muntyan> (saying this just to make sure noone is afraid of removeing queue or something)
158 Jun 11 09:54:09 <paolo> well, it seems to me we need a contexts tree (or syntax tree)
159 Jun 11 09:54:17 <paolo> do all agree on this?
160 Jun 11 09:55:11 <paolo> and if I have understand the problem, the current algorithms does not support multiple changes
161 Jun 11 09:55:25 <paolo> but only single changes, right?
162 Jun 11 09:55:46 <pbor> well, I guess there are other data structures which could hold the syntax informations
163 Jun 11 09:56:10 <muntyan> we do need one tree, for sure
164 Jun 11 09:56:16 <pbor> in particular all the regex work is based on analyzing lines
165 Jun 11 09:56:28 <pbor> while the context tree has no idea about lines
166 Jun 11 09:56:35 <muntyan> but we don't necessarily need to have syntax segments in the contexts tree
167 Jun 11 09:57:13 <muntyan> and yes, current algorithms do not really support multiple changes
168 Jun 11 09:57:39 <paolo> ok, explain me how do you see the prefect context tree
169 Jun 11 09:58:01 <paolo> I mean we have some code, it is not working for several reasons
170 Jun 11 09:58:07 <muntyan> i wish i knew what perfect context tree is :)
171 Jun 11 09:58:14 <muntyan> i guess i'd start implementing it
172 Jun 11 09:58:19 <paolo> there are not millions of lines
173 Jun 11 09:58:34 <paolo> so we can rewrite part of it without too much worrying
174 Jun 11 09:58:47 <muntyan> i can describe what i have in my engine:
175 Jun 11 09:58:52 <paolo> ok
176 Jun 11 09:59:21 <muntyan> 1) the contexts tree, which is just a tree of contexts and their children, nodes are added when corresponding context is met in the buffer
177 Jun 11 09:59:22 <paolo> may be explain also how it is different from the gsv one
178 Jun 11 10:00:06 <muntyan> 2) the text part is kept in a separate structure: each line has list of segments, each segment keeps pointer to a node of contexts tree
179 Jun 11 10:00:45 <muntyan> this makes it easy to invalidate parts of buffer - just erase segments on the lines
180 Jun 11 10:01:06 <muntyan> it's still not convenient for modifications inside of the line though
181 Jun 11 10:01:20 <muntyan> err, this is a bit not what i am describing :)
182 Jun 11 10:01:52 <paolo> well, if you invalidate part of the buffer you also need to remove nodes from the tree, right?
183 Jun 11 10:01:54 <muntyan> so, the main thing is: contexts are the states in the state machine; segments in texts are segments which "are in this state"
184 Jun 11 10:02:01 <muntyan> paolo: no
185 Jun 11 10:02:08 <paolo> why not?
186 Jun 11 10:02:18 <muntyan> i have one node for top-level comments, for example
187 Jun 11 10:02:29 <muntyan> regardless of how many comments are in the buffer
188 Jun 11 10:02:46 <muntyan> i have many text segments that point to "top-level C comment" node
189 Jun 11 10:03:17 <pbor> basically your tree describes the possible state transitions for a language
190 Jun 11 10:04:01 <muntyan> right
191 Jun 11 10:04:17 <muntyan> (not all possible, only those that actually occur in the text)
192 Jun 11 10:04:25 <paolo> it contains a subset of the state machine for the specific language
193 Jun 11 10:04:31 <muntyan> right
194 Jun 11 10:05:01 <muntyan> what's good in this (or bad in gtksourceview's) is the following:
195 Jun 11 10:05:17 <muntyan> in mine, i can merge two segments simply by comparing the nodes
196 Jun 11 10:05:32 <muntyan> in gtksourceview's, it needs nodes to be the same with all parents
197 Jun 11 10:06:17 <muntyan> not sure it's really very good, but it is simple
198 Jun 11 10:06:54 <muntyan> there is a potential problem in gtksourceview, related to this:
199 Jun 11 10:07:24 <muntyan> when text is modified, the engine takes a batch of text, and reanalyzes it, trying to merge old and new nodes
200 Jun 11 10:07:44 <muntyan> if it can't do it in this batch, it simply reanalyzes whole remainder of the buffer
201 Jun 11 10:08:01 <muntyan> even if contexts can be merged in the next batch
202 Jun 11 10:08:23 <paolo> hmmm... actually I think we need what is normally called a "concrete syntax tree"
203 Jun 11 10:08:32 <muntyan> what's that?
204 Jun 11 10:08:36 <paolo> see http://en.wikipedia.org/wiki/Concrete_syntax_tree
205 Jun 11 10:09:24 <muntyan> it looks like the tree in gtksourcecontextengine
206 Jun 11 10:09:39 <pbor> yep, that's what the context tree is right now
207 Jun 11 10:09:50 <muntyan> the problem is that it's not modification-friendly
208 Jun 11 10:10:11 <pbor> that is a tree that represents the current state of the whole buffer
209 Jun 11 10:10:12 <muntyan> just coloring is only half of work; we need to recolor as well :)
210 Jun 11 10:11:45 <pbor> building a tree like muntyan's one would also give us the chance to share it across files
211 Jun 11 10:11:48 <muntyan> hm, does colorer support modifications?
212 Jun 11 10:11:56 <pbor> that is one tree for all C files open
213 Jun 11 10:12:12 <pbor> muntyan: yes, it's the engine used in eclipse I think
214 Jun 11 10:13:10 <paolo> colorer should support modifications
215 Jun 11 10:13:14 <paolo> it is not used in eclipse
216 Jun 11 10:13:44 <paolo> when barisione started his work we planned to rewrite colorer in a glib friendly way
217 Jun 11 10:14:01 <paolo> but it seems it was not a good solution
218 Jun 11 10:14:08 <paolo> I don't remember the details
219 Jun 11 10:15:19 <pbor> paolo: http://colorer.sourceforge.net/sshots/e2.png
220 Jun 11 10:15:30 <pbor> paolo: though maybe it's just an additional plugin
221 Jun 11 10:15:39 <paolo> there is a plugin for ecplise based on colorer
222 Jun 11 10:17:03 <pbor> muntyan: can you tell us more about the other data structure you use? that is the one that holds the segments...
223 Jun 11 10:17:22 <paolo> it is not clear to me why a tree like the one I said should not work
224 Jun 11 10:17:28 <pbor> muntyan: I know it's a btree, but I don't totally grok why
225 Jun 11 10:17:30 <paolo> ?
226 Jun 11 10:17:42 <muntyan> paolo: it's already there, and it doesn't work :)
227 Jun 11 10:17:55 <muntyan> paolo: the tree itself will work, why not? but it needs some stuff to support it
228 Jun 11 10:18:10 <paolo> yep, I agree on this
229 Jun 11 10:18:36 <paolo> well, it could also work without additional data structures
230 Jun 11 10:18:40 <muntyan> pbor: btree because it has easy insertion and deletion
231 Jun 11 10:18:50 <muntyan> paolo: how to deal woth modifications?
232 Jun 11 10:18:54 <muntyan> wih
233 Jun 11 10:18:56 <muntyan> with
234 Jun 11 10:19:22 <paolo> invalidating part of the tree when a modification happens
235 Jun 11 10:19:37 <muntyan> don't forget about tags
236 Jun 11 10:19:54 <muntyan> 'tags are applied in the buffer, and they need to be removed when new highlighting is ready
237 Jun 11 10:20:00 <pbor> muntyan: I mean, what do you cache in the nodes? for instance GtkTextBTree allows to know if there is a tag without checking all the leafs/segments
238 Jun 11 10:20:18 <muntyan> at the moment tree keeps offsets, and with modifications queue you always can tell where tags are applied
239 Jun 11 10:20:41 <muntyan> pbor: let me look at it :)
240 Jun 11 10:21:30 <paolo> I don't think tags are a problem here
241 Jun 11 10:21:52 <paolo> suppose you invalidate a part of the tree (a subtree)
242 Jun 11 10:21:59 <muntyan> pbor: i keep number of line marks in nodes; and leaf nodes contain highlighting stuff and the marks
243 Jun 11 10:22:06 <paolo> a subtree represent a part of the buffer
244 Jun 11 10:22:13 <pbor> paolo: tags are not a problem, the problem is having offsets in the tree nodes
245 Jun 11 10:22:21 <paolo> you have to remove the tags from that part of the buffer
246 Jun 11 10:22:34 <muntyan> paolo: but you can't do it until you have new tags
247 Jun 11 10:22:48 <paolo> muntyan: why not?
248 Jun 11 10:23:02 <muntyan> because there is nothing to delete until then
249 Jun 11 10:23:07 <paolo> pbor: since we need to use relative offset
250 Jun 11 10:23:12 <pbor> paolo: yes, but how do you map a subtree to an area of the buffer?
251 Jun 11 10:23:13 <muntyan> if you delete text, there is no region to delete tags from
252 Jun 11 10:23:13 <paolo> pbor: not absolute ones
253 Jun 11 10:23:22 <muntyan> you need to analyze it first
254 Jun 11 10:23:34 <muntyan> or if you insert text, same thing
255 Jun 11 10:24:02 <muntyan> you can mark position as "invalid", but you don't know in advance what *area* really changed
256 Jun 11 10:24:42 <paolo> hmm... it is difficult to explain without figures
257 Jun 11 10:25:09 <paolo> let me try step by step
258 Jun 11 10:25:16 <paolo> don't think to the current implementation
259 Jun 11 10:25:23 <muntyan> hm, actually i don't see why what i'm saying makes sense
260 Jun 11 10:25:38 <paolo> suppose you have a syntax tree
261 Jun 11 10:25:57 <paolo> like the one in the wikipedia figure
262 Jun 11 10:27:41 <paolo> suppose you add a piece of text between "the" and "ball", then you invalidate a part of the tree
263 Jun 11 10:28:12 <paolo> if the text does not create a new context, then you only need to add a new node (without invalidating other nodes)
264 Jun 11 10:28:31 <paolo> otherwise you will have to invalidate part (or all) the tree
265 Jun 11 10:29:34 <paolo> let us speak about text offset now
266 Jun 11 10:29:57 <paolo> ATM we use absolute offset
267 Jun 11 10:30:04 <muntyan> i'll be back in ten minutes, son is hungry
268 Jun 11 10:30:36 <paolo> so "Jonh" is [0-4], "hit" is [5-8]" and so on
269 Jun 11 10:30:55 <paolo> I think we should use relative offset
270 Jun 11 10:31:08 <paolo> where relative is according to the "father" node
271 Jun 11 10:31:46 <paolo> in this way it will be easier to make changes without visiting all the tree
272 Jun 11 10:32:10 <paolo> and the queue will contain "nodes" and not text segments
273 Jun 11 10:32:16 <paolo> what do you think?
274 Jun 11 10:32:34 <pbor> using relative offsets may be better for the tree merging, but I don't see how it helps us:
275 Jun 11 10:32:57 <pbor> the fact is that the real analisys happens async, in the idle
276 Jun 11 10:33:25 <pbor> so a second modification can happen before you analyzed the first one
277 Jun 11 10:33:47 <paolo> wait
278 Jun 11 10:33:51 <pbor> and you can't put the nodes in queue, since it means analyzing right away
279 Jun 11 10:34:19 <paolo> we can split analysis in a sync part (i.e. updating or invalidating the tree)
280 Jun 11 10:34:45 <paolo> and a second aync part "rebuilding" the invalidate part of the tree
281 Jun 11 10:35:48 <paolo> i,e. always having a fast analysis computing only the "first" level of the subtree
282 Jun 11 10:36:04 <pbor> well, that depends on how fast the tree invalidation is...
283 Jun 11 10:36:37 <paolo> it only works on the tree, so I think it should be fast enough
284 Jun 11 10:37:26 <pbor> it may also have other problems for performance (though here is all moot without numbers)
285 Jun 11 10:38:05 <pbor> for instance a 'paste' inserts many chuncks of text (one for each tag toggle)
286 Jun 11 10:38:26 <pbor> so you would do many invalidations before reanalizying
287 Jun 11 10:38:45 <muntyan> how do you invalidate part between contexts?
288 Jun 11 10:38:59 <muntyan> e.g. top-level no-contexts text
289 Jun 11 10:39:14 <muntyan> 2) merging is still hard
290 Jun 11 10:39:29 <paolo> merging will be always hard :)
291 Jun 11 10:39:43 <muntyan> not if you don't keep segments in the tree
292 Jun 11 10:39:49 <paolo> invalidating part between contexts means adding a new context
293 Jun 11 10:40:03 <paolo> and updating the offset of the following siblings
294 Jun 11 10:40:30 <paolo> it is the easiest update
295 Jun 11 10:41:41 <paolo> what I mean is that syntax trees are historically the most important data structure in language analysis
296 Jun 11 10:41:57 <paolo> and I don't think we have to invent new structures
297 Jun 11 10:42:19 <muntyan> well, let me play angry:
298 Jun 11 10:42:57 <muntyan> i have working highlighting without any knowledge of history; and there is broken engine which didn't invent new data structures
299 Jun 11 10:42:57 <paolo> What about trying to look at eclipse code and at colorer to see what they are using?)
300 Jun 11 10:43:03 * muntyan stops playing angry
301 Jun 11 10:43:22 <paolo> hehe :)
302 Jun 11 10:43:30 <muntyan> what i want to say is that we need something that will work and is good. "history" is zero weight argument here, imho
303 Jun 11 10:44:37 <muntyan> sheesh, color tarball doesn't have single toplevel dir!
304 Jun 11 10:44:41 <muntyan> colorer
305 Jun 11 10:44:45 <muntyan> good it's in /tmp
306 Jun 11 10:44:45 <paolo> well, history is always important... if you want call it a "design pattern" :)
307 Jun 11 10:45:03 <muntyan> i don't give a shit to that, it means too little these days
308 Jun 11 10:45:11 <muntyan> everybody uses "design patterns" ;)
309 Jun 11 10:45:35 <pbor> paolo: well, the fact that a tree can be used to represent a syntax is no news, but the real question is what do we need the tree for?
310 Jun 11 10:45:40 <muntyan> anyway, if a tree is good, it's good. but we need to know for sure what we choose, and why
311 Jun 11 10:46:43 <pbor> I mean, the engine is keeping a tree of the whole buffer syntax up to date, but as far as I can see it's not really using it
312 Jun 11 10:47:22 <pbor> for highlighting it just wants to know which tag apply to each segment
313 Jun 11 10:47:38 <pbor> and to do this you have to walk the whole segments
314 Jun 11 10:47:57 <pbor> be it be going trough a list, walking a tree or whatever
315 Jun 11 10:49:00 <muntyan> i personally worry about modifications, they have proven to be the hard part ;)
316 Jun 11 10:49:27 <pbor> maybe the whole syntax tree can be very useful for stuff like folding, or for indenters... but I am missing the compelling reason to have a full tree representation of the syntax of the text in the buffer
317 Jun 11 10:50:04 <paolo> to avoid to re-analyze part of the buffer during modifications
318 Jun 11 10:50:18 <paolo> and put the basis for further analysis
319 Jun 11 10:51:22 <pbor> but for hl I just want to iter through the text and know at which point of the syntax 'stack' I am in
320 Jun 11 10:52:08 <pbor> that is, I just need to know the parent context, not all the ancestors
321 Jun 11 10:52:31 <pbor> (except for the #pop#pop case muntyan said before)
322 Jun 11 10:52:32 <muntyan> in either case we have a tree, just maybe implicitly; question is whether we really need a big tree with number of nodes equal number of syntax regions in the buffer
323 Jun 11 10:53:00 <muntyan> pbor: we do need all the ancestors; but once we have parent, we have its parent, and so on
324 Jun 11 10:53:29 <pbor> muntyan: yes, sure
325 Jun 11 10:54:21 <pbor> anyway, I think that invalidating the tree right away could solve the issues with the queue
326 Jun 11 10:54:38 <pbor> what has to be seen is if we can do that fast enough
327 Jun 11 10:55:28 <muntyan> should be fast
328 Jun 11 10:58:08 <pbor> as far as I can see that is also orthogonal to the relative offset issue
329 Jun 11 10:59:10 <muntyan> what do you mean?
330 Jun 11 10:59:45 <pbor> that you can invalidate a part of a tree even with absolute offsets
331 Jun 11 10:59:49 <pbor> can't you?
332 Jun 11 11:00:36 <paolo> yep, they are relative problems
333 Jun 11 11:00:44 <paolo> I mean ortogonal
334 Jun 11 11:00:54 <muntyan> but if offsets are absoulte, you need to walk all the nodes?
335 Jun 11 11:01:17 <paolo> using relative offset enable in some cases very fast changes without the need to walking all the subtree
336 Jun 11 11:01:35 <muntyan> relative offsets rule
337 Jun 11 11:01:41 <muntyan> absolute offsets suck
338 Jun 11 11:01:43 <muntyan> :)
339 Jun 11 11:02:47 <pbor> yes, we all agree that relative offsets are better
340 Jun 11 11:03:01 <paolo> I'd say: can we try to write down the "modification" algorithm in pseudo code (so we can also use it for documentation)?
341 Jun 11 11:04:25 <paolo> I think a modification can only have 3 effects:
342 Jun 11 11:04:31 <paolo> - adding a context
343 Jun 11 11:05:15 <paolo> - invalidating a subtree of contexts
344 Jun 11 11:05:50 <paolo> well they are only 2
345 Jun 11 11:06:33 <paolo> actually we could reduce "adding a context" as a sub-case of "invalidating a subtree"
346 Jun 11 11:06:34 <muntyan> comments in colorer look like it reparses everything after modification
347 Jun 11 11:07:09 <paolo> pbor: do you still have the thesis of barisione?
348 Jun 11 11:07:15 <pbor> paolo: yes
349 Jun 11 11:07:21 <muntyan> what does "invalidating subtree" mean? invalidating whole subtree and reanalyzing it?
350 Jun 11 11:07:28 <paolo> are therec omments con colorer?
351 Jun 11 11:08:28 <paolo> it means removing it from the context tree and mark the linked buffer segment as "to-be-analyzed"
352 Jun 11 11:09:13 <pbor> paolo: not really... just a small paragraph where it says why implementing sourceview instead of using colorer
353 Jun 11 11:09:29 <paolo> well, no, it means:
354 Jun 11 11:09:37 <paolo> - removing the subtree
355 Jun 11 11:09:41 <pbor> paolo: and the arguments are: overcomplex, C++, ugly lang files
356 Jun 11 11:09:59 <muntyan> i am thinking about inserting text at the top level
357 Jun 11 11:10:01 <paolo> - leaving the root (marked as "to-be-analyzed")
358 Jun 11 11:10:18 <muntyan> you don't want to remove anything, i.e. adding a node is not invalidating a subtree
359 Jun 11 11:10:50 <muntyan> http://rafb.net/paste/results/bXZKT725.html
360 Jun 11 11:10:55 <paolo> ok, I agree
361 Jun 11 11:11:12 <muntyan> that's from colorer header, looks rather nasty
362 Jun 11 11:11:36 <paolo> ok
363 Jun 11 11:11:53 <muntyan> anyway, who is colorer?
364 Jun 11 11:11:59 <muntyan> does it work in any decent editor?
365 Jun 11 11:12:12 <paolo> see the homepage :)
366 Jun 11 11:12:24 <paolo> I have never tried the cited editors
367 Jun 11 11:12:31 <muntyan> eclipse is shit, mc doesn't have live updating on modifications
368 Jun 11 11:12:38 <muntyan> dunno about far
369 Jun 11 11:13:44 <paolo> eclipse is great :)
370 Jun 11 11:13:51 <paolo> but it is not using colorer
371 Jun 11 11:15:00 <muntyan> eclipse is great? you have strong nerves then :)
372 Jun 11 11:15:08 <muntyan> or using java
373 Jun 11 11:15:52 <paolo> using java
374 Jun 11 11:16:04 <muntyan> it must be great then :)
375 Jun 11 11:18:14 <paolo> eclipse is a great IDE for Java... but it sucks for C/C++
376 Jun 11 11:21:50 <pbor> so, to sum things up, do we have a plan?
377 Jun 11 11:24:04 <pbor> muntyan: how much effort is to 'fix' the queue for the 'debug thing'? if it's not much maybe it would be nice to fix it just to see if something else bites us before touching the context tree...
378 Jun 11 11:25:25 <paolo> I think the plan is:
379 Jun 11 11:25:31 <muntyan> lot of effort
380 Jun 11 11:25:59 <paolo> 1. writing the modification algorithm in pseudo-code
381 Jun 11 11:26:10 <paolo> 2. validating it by hand
382 Jun 11 11:26:39 <paolo> 3. modifying the current context tree according to 1 (using relative offsets)
383 Jun 11 11:26:45 <paolo> do you agree?
384 Jun 11 11:26:52 <muntyan> 3. is really really hard
385 Jun 11 11:27:06 <paolo> why?
386 Jun 11 11:27:07 <muntyan> for me it besically means rewriting everything
387 Jun 11 11:27:22 <paolo> it could be
388 Jun 11 11:27:56 <paolo> trying to preserve most code is possible
389 Jun 11 11:28:19 <paolo> or cutting and pasting in the new code most old-code is possible
390 Jun 11 11:28:38 <paolo> I'm thinking to the code performing parsing
391 Jun 11 11:28:51 <muntyan> one needs to understand the code first ;)
392 Jun 11 11:28:55 <paolo> do you guys agree in my action plans?
393 Jun 11 11:29:07 <muntyan> honestly, i don't like it
394 Jun 11 11:29:14 <paolo> why?
395 Jun 11 11:29:18 <muntyan> we did not discuss all the issues
396 Jun 11 11:29:31 <paolo> let us discuss them, then
397 Jun 11 11:29:42 <muntyan> and this plan is a big plan, it's not something "for now to see what happens"
398 Jun 11 11:30:09 <muntyan> how to merge old and new trees?
399 Jun 11 11:30:24 <muntyan> do we want to give up after N characters and reanalyze reminder of the buffer?
400 Jun 11 11:31:45 <pbor> well, that is something that need measuring
401 Jun 11 11:32:01 <pbor> it really depends on how much slower is try to merge the old tree
402 Jun 11 11:32:20 <muntyan> it's important
403 Jun 11 11:32:25 <paolo> well, this is what point 1 needs to
404 Jun 11 11:32:35 <muntyan> if we want not to give up, we need to keep new and old stuff around
405 Jun 11 11:33:41 <paolo> what do you mean with "merging old and new trees"?
406 Jun 11 11:34:42 <muntyan> when some text is modified, it potentially changes highlighting after the modification place
407 Jun 11 11:35:01 <muntyan> so we need to merge what we get after analyzing modified text and what we had before
408 Jun 11 11:35:18 <pbor> paolo: the current algorith is: "when a modification is made find the upper context affected, detach the subtree, fix its offsets. then start reanalyzin. at each step see if the old tree merges, if yes use it and stop reanalyizing"
409 Jun 11 11:35:29 <muntyan> what happens is that we have old tree, covering all the buffer; and we have new branches growing from the modified region
410 Jun 11 11:35:45 <muntyan> we need to glue those branches and old tree, and throw away wrong old nodes
411 Jun 11 11:36:30 <pbor> well, "invalidating" really means marking an 'old' subtree in the tree
412 Jun 11 11:36:37 <paolo> why it is useful merging with the old tree? Why we cannot simple replace it?
413 Jun 11 11:36:53 <muntyan> paolo: we can't simply reanalyze all the reminder of the buffer
414 Jun 11 11:36:54 <pbor> paolo: because analyzing is slow
415 Jun 11 11:37:01 <muntyan> insert one char, for example
416 Jun 11 11:37:07 <muntyan> you don't want to analyze whole file
417 Jun 11 11:37:53 <muntyan> yeah, if it weren't slow, we wouldn't have any problem :)
418 Jun 11 11:38:57 <paolo> ok, I see
419 Jun 11 11:39:23 <paolo> but still this is what point 1 is needed for
420 Jun 11 11:39:39 <paolo> we need to try to write down the algorithm
421 Jun 11 11:39:45 <muntyan> point 1 is useful in any situation
422 Jun 11 11:39:52 <muntyan> but we need to know the algorythm first
423 Jun 11 11:40:10 <paolo> point 1 is "design the real-algorithm"
424 Jun 11 11:40:41 <paolo> we have discussed about an intuition, about guide-lines, we need to design the details
425 Jun 11 11:40:45 <muntyan> you are assuming that the algorithm is exactly the tree as you described it, and exactly doing what you said
426 Jun 11 11:40:55 <muntyan> we didn't decide we do it, did we?
427 Jun 11 11:41:20 <paolo> well, it seems to me we decided about two points:
428 Jun 11 11:41:26 <paolo> - using relative offset
429 Jun 11 11:41:56 <paolo> - using a "concrete syntax tree"
430 Jun 11 11:42:17 <muntyan> err, it seems you really decided to do it :)
431 Jun 11 11:42:19 <paolo> now we have to figure how the syntax tree must be modified when a modification happens
432 Jun 11 11:42:46 <pbor> well, the algorithm is asaics not much different from the current one: leaving relative offset aside, currently we put the modifications in a queue, then the idle runs peeks a modifications find the affected tree node and do the analyze/merge magic
433 Jun 11 11:43:03 <pbor> with the new idea we just dump the queue
434 Jun 11 11:43:12 <paolo> exactly
435 Jun 11 11:43:17 <pbor> do the 'find affected node' sync
436 Jun 11 11:43:35 <pbor> and the idle walks the tree and finds the first node that needs updating
437 Jun 11 11:43:51 <paolo> the queue becomes a list of "nodes" to be analyzed in a more detailed way
438 Jun 11 11:44:10 <pbor> well, we don't even need such queu IMHO
439 Jun 11 11:44:18 <paolo> may be no
440 Jun 11 11:44:24 <pbor> the idle handler can simply walk the tree
441 Jun 11 11:44:56 <pbor> and find the upmost node that needs analisys
442 Jun 11 11:45:04 <pbor> that is that is marked invalid
443 Jun 11 11:45:47 <pbor> 'detaches' the old tree at that node and then do the analyze/merge pretty much as it does now
444 Jun 11 11:46:38 <pbor> the problem is that the current analyze algo use the modification 'delt'
445 Jun 11 11:46:41 <pbor> *delta
446 Jun 11 11:47:03 <muntyan> it uses delta only to update offsets, i think; it's not a problem
447 Jun 11 11:47:12 <pbor> okay even better
448 Jun 11 11:47:21 <muntyan> but merging won't magically work!
449 Jun 11 11:47:30 <pbor> no it won't
450 Jun 11 11:47:32 <muntyan> and that question is still open
451 Jun 11 11:47:43 <muntyan> do we want to give up after analyzing one batch?
452 Jun 11 11:48:27 <paolo> why is this so important?
453 Jun 11 11:48:30 <pbor> but as far as I can see the merging works at the moment, doesn't it (apart from the offsets screwed by the async queue)?
454 Jun 11 11:48:47 <muntyan> it does work now
455 Jun 11 11:48:56 <muntyan> and it will work
456 Jun 11 11:49:10 <paolo> how is this going to influence the algorithm we are trying to design? It seems to me an implementation detail we can evaluate later. Am I wrong?
457 Jun 11 11:49:19 <muntyan> yes
458 Jun 11 11:49:51 <pbor> if you are more confortable you can make it always try to merge
459 Jun 11 11:49:56 <muntyan> or no, but it's an extremely important detail
460 Jun 11 11:50:04 <pbor> and we can see how it goes
461 Jun 11 11:50:05 <muntyan> which takes half of the code
462 Jun 11 11:50:20 <pbor> well, not really as far as I can see
463 Jun 11 11:50:28 <paolo> sure it is important, but we can see it in a second step
464 Jun 11 11:50:35 <muntyan> pbor: the code is simple now because it does give up
465 Jun 11 11:50:38 <muntyan> giving up is easy
466 Jun 11 11:50:45 <muntyan> "simple"
467 Jun 11 11:51:09 <paolo> well, is giving after a batch up so costly?
468 Jun 11 11:51:59 <muntyan> it depends on how good the batch is
469 Jun 11 11:52:23 <muntyan> with single modification, giving up is right, assuming the tree is a good idea
470 Jun 11 11:52:39 <muntyan> i.e. with the tree as it is, it's too expensive not to give up
471 Jun 11 11:52:46 <muntyan> and in most cases it works fine
472 Jun 11 11:52:59 <pbor> well, giving up at some point seems like a correct euristic to me... if you can't merge after X lines the probability of merging later goes down
473 Jun 11 11:53:08 <pbor> what needs tweaking is X
474 Jun 11 11:53:24 <pbor> that is how hard try before giving up
475 Jun 11 11:53:34 <muntyan> multiple modifications break this euristic :)
476 Jun 11 11:53:41 <pbor> but that's just a knob in the Reader objectt
477 Jun 11 11:54:09 <muntyan> you are assuming that the modification is in the beginning of the batch, and there is nothing modified after it
478 Jun 11 11:54:18 <muntyan> it's the case for hand-editing
479 Jun 11 11:54:52 <pbor> if now we mark 'invalid' in the tree directly we should be able to be smarter
480 Jun 11 11:54:53 <muntyan> (though even in this case, you have #if 0 ... #endif on 1001 lines, and batch of 1000 line, and boom)
481 Jun 11 11:55:18 <pbor> we can see if two modifications are for indipendent branches of the tree
482 Jun 11 11:55:50 <muntyan> it doesn't matter, in such bad cases
483 Jun 11 11:56:19 <muntyan> thing is that you change one char, and it can modify arbitrarily big part of the buffer
484 Jun 11 11:56:24 <paolo> I don't think the algorithm for all seasons exists
485 Jun 11 11:56:26 <muntyan> it can kill all the contexts after it
486 Jun 11 11:56:30 <paolo> so there are trade off
487 Jun 11 11:56:34 <muntyan> it does exist
488 Jun 11 11:56:43 <paolo> we can start giving up after a batch
489 Jun 11 11:56:50 <muntyan> not having such a tree is such an algorithm :)
490 Jun 11 11:56:52 <paolo> and then implement a better algo in a second step
491 Jun 11 11:57:38 <paolo> I really don't see why your data structure is better
492 Jun 11 11:57:54 <muntyan> because merging is easy
493 Jun 11 11:57:59 <muntyan> nothing to merge
494 Jun 11 11:58:10 <paolo> why not?
495 Jun 11 11:58:19 <muntyan> because there are no nodes to merge
496 Jun 11 11:58:22 <pbor> a bit far fetched but it's something like DOM vs SAX, isn't it? :)
497 Jun 11 11:58:28 <muntyan> you don't operate with nodes, you operate with segments
498 Jun 11 11:59:16 <paolo> ok, but then you cannot use the info for code folding or other context sensitive analysis
499 Jun 11 11:59:35 <muntyan> why can't you?
500 Jun 11 11:59:47 <muntyan> the state machine is there, you have the tree of contexts
501 Jun 11 12:00:03 <muntyan> um, got it
502 Jun 11 12:00:15 <muntyan> yes, you need another tree for that
503 Jun 11 12:00:24 <muntyan> (though, say for code folding you need it anyway)
504 Jun 11 12:00:58 <muntyan> indeed, hacing a tree is good
505 Jun 11 12:01:04 <muntyan> but merging is awful
506 Jun 11 12:01:34 <muntyan> besides, does it play well with those context-sensitive things? (if we consider it as an argument, then we need to consider what happens when we invalidate nodes)
507 Jun 11 12:01:52 <paolo> I'm still not convinced about the need of merging, but may be I'm missing some details
508 Jun 11 12:02:13 <muntyan> okay, what happens when you insert a character?
509 Jun 11 12:02:25 <muntyan> you invalidate a node, what then?
510 Jun 11 12:05:26 <pbor> that's it, you invalidate the node instead of pushing the modification in the queue
511 Jun 11 12:05:33 <paolo> if it creates a new context, then invalidate all the following siblings
512 Jun 11 12:05:35 <muntyan> yes
513 Jun 11 12:05:40 <muntyan> what does the analyzer do
514 Jun 11 12:06:00 <muntyan> simple case
515 Jun 11 12:06:06 <muntyan> you are typing in:
516 Jun 11 12:06:08 <muntyan> "wefwef
517 Jun 11 12:06:12 <muntyan> and then you insert "
518 Jun 11 12:06:14 <muntyan> in a C file
519 Jun 11 12:06:36 <muntyan> you must merge stuff
520 Jun 11 12:06:51 <paolo> you invalidate the parent
521 Jun 11 12:07:09 <paolo> and then it will be re-analyzed in an async way
522 Jun 11 12:07:12 <muntyan> parent is NULL
523 Jun 11 12:07:20 <muntyan> so, how it will be reanalyzed?
524 Jun 11 12:07:30 <muntyan> okay, let me give more details
525 Jun 11 12:07:33 <paolo> no, parent cannot be null
526 Jun 11 12:07:35 <muntyan> you have a C file:
527 Jun 11 12:07:36 <muntyan> wait
528 Jun 11 12:07:37 <muntyan> so
529 Jun 11 12:07:43 <muntyan> ---------------
530 Jun 11 12:07:44 <muntyan> "efwefwefwef
531 Jun 11 12:07:51 <muntyan> int a (void)
532 Jun 11 12:07:52 <muntyan> ------------
533 Jun 11 12:08:02 <muntyan> and then you insert " after ewfwef
534 Jun 11 12:08:17 <muntyan> you invalidate the top top context, fine
535 Jun 11 12:08:37 <muntyan> what will idle_worker do after it closes the string context, what will it do on the next line?
536 Jun 11 12:08:48 <muntyan> you don't want to reanalyze the next line, there is nothing to reanalyze
537 Jun 11 12:09:04 <muntyan> so you need to merge what was before and what you have now
538 Jun 11 12:10:30 <pbor> in the idle handler you find the invalidated node in the tree and split the tree there,
539 Jun 11 12:10:38 <pbor> then you analize the first line
540 Jun 11 12:10:42 <muntyan> pbor: i am trying to convince we need to do it
541 Jun 11 12:10:49 <pbor> then you try to merge the split node
542 Jun 11 12:10:50 <muntyan> pbor: the splitting and merging
543 Jun 11 12:10:57 <muntyan> err, to convince paolo
544 Jun 11 12:11:11 <pbor> yes, we defintely need to split and merge
545 Jun 11 12:11:53 <paolo> if you add a ", you have to:
546 Jun 11 12:12:08 <pbor> 'invalidating in the tree' is just a different way to tell where to split instead of using an offset
547 Jun 11 12:12:12 <paolo> - invalidate the parent node (toplevel root in this case)
548 Jun 11 12:12:50 <paolo> and analyze the part of the buffer associated with the node
549 Jun 11 12:12:57 <paolo> in this case the entire buffer
550 Jun 11 12:13:06 <paolo> hmm... no
551 Jun 11 12:13:30 <paolo> since in C with have string ending with end of line
552 Jun 11 12:13:54 <paolo> so you need to:
553 Jun 11 12:14:01 <paolo> - invalidate the root (as before)
554 Jun 11 12:14:03 <muntyan> right, and it could be anything ending anywhere, say after char "Z"
555 Jun 11 12:14:23 <paolo> - analyzing the node
556 Jun 11 12:15:10 <paolo> ok... I see what you mean with merging
557 Jun 11 12:15:41 <paolo> but I don't see why it is so difficult to achieve
558 Jun 11 12:18:08 <muntyan> because ten nodes go to trash, and you end up with node of depth 8 on the left, and node of depth 4 on the right, having different top-level ancestors
559 Jun 11 12:18:19 <muntyan> or vice versa
560 Jun 11 12:18:24 <muntyan> it's really nasty
561 Jun 11 12:18:42 <muntyan> oh, forgot about important detail
562 Jun 11 12:18:58 <muntyan> you also have no idea if you can merge nodes after ten next chars
563 Jun 11 12:19:18 <muntyan> i.e. you don't know if you need to throw away everything (give up) or look a bit further
564 Jun 11 12:19:34 <muntyan> dunno, it's very dirty business
565 Jun 11 12:20:36 <muntyan> besides, about folding
566 Jun 11 12:20:51 <muntyan> imagine you have a big function body, longer than a batch
567 Jun 11 12:21:10 <paolo> well, this is due to the fact you want to manage merging in a very smart way
568 Jun 11 12:22:05 <paolo> I think we should try to manage simple cases as special cases
569 Jun 11 12:22:14 <muntyan> good ppint
570 Jun 11 12:22:16 <muntyan> point
571 Jun 11 12:22:19 <paolo> and then use a generic brutal approach for the general case
572 Jun 11 12:22:40 <muntyan> still, it's not obvious to me that cases like " after C string or > in xml file are really easy
573 Jun 11 12:22:52 * muntyan draws
574 Jun 11 12:22:52 <paolo> I mean adding a single character is a special very frequent case
575 Jun 11 12:22:58 <paolo> other are less frequents
576 Jun 11 12:23:06 <muntyan> no!
577 Jun 11 12:23:12 <muntyan> in gedit, maybe
578 Jun 11 12:23:19 <muntyan> it
579 Jun 11 12:23:29 <muntyan> s very important to handle search/replace, for example
580 Jun 11 12:23:40 <muntyan> or my active strings, i like my active strings
581 Jun 11 12:23:46 <muntyan> hm, snippets?
582 Jun 11 12:24:22 <paolo> sure, but how frequently are the user using it? May be a more brutal generic approach is fast enough for that cases
583 Jun 11 12:24:23 <muntyan> err, "less" is true, but "less frequent" doesn't mean "not frequent"
584 Jun 11 12:24:40 <muntyan> paolo: often :)
585 Jun 11 12:25:08 <pbor> well, we really need a way to mark 'interactive' changes in a way different from 'one-char-lenght'...
586 Jun 11 12:25:14 <paolo> BTW, merging is an important part of the algorithm implementation
587 Jun 11 12:25:21 <pbor> but that's a separate issue IMHO
588 Jun 11 12:25:26 <paolo> I still would like to see "the" algorithm
589 Jun 11 12:26:04 <muntyan> i would LOVE to see it :)
590 Jun 11 12:26:34 <muntyan> so, is it the final decision to have the tree as you described it?
591 Jun 11 12:29:20 <paolo> the final decision is up to you too
592 Jun 11 12:29:33 <paolo> you are the guy, I'm only the "observer" :)
593 Jun 11 12:29:35 <muntyan> well...
594 Jun 11 12:29:40 <muntyan> you are the boss :)
595 Jun 11 12:29:51 <paolo> hehe
596 Jun 11 12:29:59 <muntyan> i still do not see why "tree" is better than "no tree"
597 Jun 11 12:30:11 <muntyan> historical and "no new stuff" arguments do not count
598 Jun 11 12:30:32 <muntyan> the folding and such are good arguments, but it's not clear why the tree will be *really* good for those
599 Jun 11 12:30:59 <paolo> since you can decide to fold at the "node" level
600 Jun 11 12:31:06 <muntyan> note, i am not saying that my idea is better. if i thought so, i'd say it. but i want to know for sure what i am doing
601 Jun 11 12:31:17 <muntyan> paolo: you can do it with no-tree too
602 Jun 11 12:31:23 <muntyan> paolo: since the tree is still there
603 Jun 11 12:31:34 <muntyan> paolo: and, it's not clear how folding will play with modifications
604 Jun 11 12:31:38 <paolo> exactly, this is why I don't see no-tree is better
605 Jun 11 12:31:39 <muntyan> in either case
606 Jun 11 12:31:53 <paolo> you have the same info in a different, IMO less clean, way
607 Jun 11 12:32:06 <muntyan> i'd say it is more clean
608 Jun 11 12:32:18 <muntyan> because you don't mix syntax and text segments
609 Jun 11 12:32:29 <muntyan> text segments know about syntax, but syntax tree doesn't
610 Jun 11 12:32:32 <muntyan> and it doesn't need to
611 Jun 11 12:32:41 <muntyan> i.e. it depends on point of view :0
612 Jun 11 12:32:50 <muntyan> main thing is that merging is easy with no-tree
613 Jun 11 12:33:05 <paolo> why?
614 Jun 11 12:33:18 <paolo> explain the above example with your data structure
615 Jun 11 12:33:44 <muntyan> you insert a char, you mark line as invalid
616 Jun 11 12:33:49 <muntyan> then, you reanalyze the line
617 Jun 11 12:34:06 <muntyan> then, if the expected-on-next-line context is the same as the actual one, you stop
618 Jun 11 12:34:17 <paolo> and what does it happen if the context span multiple lines?
619 Jun 11 12:34:28 <muntyan> same thing, you invalidate next line
620 Jun 11 12:34:36 <muntyan> until you have won
621 Jun 11 12:34:54 <paolo> honestly, I don't see why it is different
622 Jun 11 12:35:07 <muntyan> merging is extremely easy
623 Jun 11 12:35:14 <muntyan> no splitting, no merging
624 Jun 11 12:35:21 <paolo> actually I think it is less efficient since you always invalidate the entire line
625 Jun 11 12:35:28 <muntyan> paolo: you always have to
626 Jun 11 12:35:41 <muntyan> in either approach
627 Jun 11 12:35:55 <muntyan> you can optimize it by looking at regexes, also with either approach
628 Jun 11 12:35:58 <paolo> why?
629 Jun 11 12:36:02 <paolo> suppose you have
630 Jun 11 12:36:04 <muntyan> there is look-ahead and look-behind
631 Jun 11 12:36:10 <paolo> /* ciao
632 Jun 11 12:36:13 <muntyan> you always have to reanalyze whole line
633 Jun 11 12:36:13 <paolo> ciao ciao
634 Jun 11 12:36:45 <paolo> */ ciao / ciao */
635 Jun 11 12:36:54 <paolo> if you add a * after /
636 Jun 11 12:37:02 <paolo> you don't have to invalidate the entire line
637 Jun 11 12:37:18 <muntyan> only because *you* *know* C syntax
638 Jun 11 12:37:28 <muntyan> in general case you must analyze whole line
639 Jun 11 12:37:39 <paolo> because you are using a tree
640 Jun 11 12:37:46 <muntyan> (and it can be optimized, by looking at children, regexes and whatnot)
641 Jun 11 12:37:48 <muntyan> paolo: nope
642 Jun 11 12:37:50 <paolo> and so you know the syntax
643 Jun 11 12:38:13 <muntyan> i know the syntax with my approach, i have the same information as with the tree
644 Jun 11 12:39:03 <muntyan> what i meant by "you know" is "you, paolo, know"
645 Jun 11 12:39:16 <muntyan> you know that the C syntax doesn't have fancy things in that case
646 Jun 11 12:39:28 <muntyan> but in general, one char can change *everything*
647 Jun 11 12:39:32 <muntyan> whole buffer
648 Jun 11 12:39:39 <muntyan> (in weird cases :)
649 Jun 11 12:40:21 <muntyan> you have to analyze the line (note that "analyze" doesn't mean "remove and apply tags", it means "check what's already there")
650 Jun 11 12:40:49 <paolo> anyway, I don't see why it is different
651 Jun 11 12:40:50 <muntyan> say, shebang line
652 Jun 11 12:40:52 <muntyan> #!/eiohfierwog
653 Jun 11 12:40:57 <muntyan> ! changes whole line
654 Jun 11 12:41:09 <muntyan> paolo: the difference is in the merging process
655 Jun 11 12:41:17 <muntyan> paolo: with my approach, no merging process
656 Jun 11 12:41:30 <muntyan> paolo: with the tree, it's fairly complicated
657 Jun 11 12:41:58 <muntyan> ask pbor :)
658 Jun 11 12:42:42 <paolo> ATM you invalidate the entire line, ok
659 Jun 11 12:42:52 <paolo> why can't you do the same with the tree?
660 Jun 11 12:43:00 <paolo> you invalidate all the nodes of the line
661 Jun 11 12:43:06 <muntyan> invalidating is easy
662 Jun 11 12:43:11 <muntyan> merging is hard
663 Jun 11 12:43:11 <paolo> you have the same effect
664 Jun 11 12:43:38 <paolo> does you code perform merging?
665 Jun 11 12:43:59 <muntyan> it doesn't have to
666 Jun 11 12:44:16 <muntyan> it checks whether expected next-line context matches one which is already theer
667 Jun 11 12:44:17 <muntyan> there
668 Jun 11 12:44:27 <muntyan> if it matches, we are done; if not, analyze next line
669 Jun 11 12:44:31 <paolo> why you cannot do it with the tree?
670 Jun 11 12:45:12 <muntyan> you can, actually
671 Jun 11 12:45:18 <muntyan> but, tree doesn't know about lines
672 Jun 11 12:45:20 <paolo> if next sibling matches you are done, if not give invalidate it, and analyze next one
673 Jun 11 12:46:35 <muntyan> let me smoke a bit, looks like everything is fine, and i really can throw away the merging part
674 Jun 11 12:47:08 <paolo> argh.... smoke is dangerous :)
675 Jun 11 12:47:19 <muntyan> yeah, it kills
676 Jun 11 12:47:43 <paolo> I hope not before the end of SoC ;)
677 Jun 11 12:47:55 <muntyan> hm, if we split all contexts on line breaks, it should work
678 Jun 11 12:48:40 <muntyan> but how to do that? do we need line breaks?
679 Jun 11 12:48:51 <muntyan> the merging part is what is really killing me
680 Jun 11 12:48:57 <muntyan> i hate it, and it hates me
681 Jun 11 12:49:26 <paolo> I don't see why you are so "bind" to line breaks... they are chars like the other ones
682 Jun 11 12:49:39 <muntyan> no, splitting contexts on line breaks may do all this stuff every inefficient
683 Jun 11 12:49:45 <paolo> we you need to split context at line breaks if not needed
684 Jun 11 12:49:45 <muntyan> paolo: they are not
685 Jun 11 12:49:53 <muntyan> paolo: "\r\n" vs "\n"
686 Jun 11 12:50:18 <muntyan> line breaks are very special places in the buffer, unfortunately
687 Jun 11 12:50:30 <paolo> why?
688 Jun 11 12:50:37 <muntyan> "\r\n" vs "\n"
689 Jun 11 12:50:42 <muntyan> they are not chars
690 Jun 11 12:50:46 <muntyan> they are special places
691 Jun 11 12:51:07 <paolo> ok, but they are not special from the "syntax hl" point of view
692 Jun 11 12:51:09 <muntyan> but why do i want to split contexts?
693 Jun 11 12:51:14 <muntyan> paolo: they are
694 Jun 11 12:51:21 <paolo> why?
695 Jun 11 12:51:35 <muntyan> paolo: every normal language knows what "line" is, and doesn't care about line terminators
696 Jun 11 12:51:41 <muntyan> or what do you mean?
697 Jun 11 12:51:52 <paolo> they can end some kind of contexts
698 Jun 11 12:52:31 <muntyan> ^blah rules are not rare too
699 Jun 11 12:52:35 <paolo> but for other contexts they have no importance and so you don't need to threat them in a special case in those contexts
700 Jun 11 12:52:44 <muntyan> anyway, thing is that line end is not a char
701 Jun 11 12:52:51 <paolo> think to "multiline C comments"
702 Jun 11 12:53:16 <paolo> ok, they are places
703 Jun 11 12:53:20 <muntyan> yes, it's not right to split everything
704 Jun 11 12:53:32 <paolo> but their importance is related to the current context
705 Jun 11 12:53:48 <paolo> so there is no need to always split
706 Jun 11 12:54:01 <pbor> (sorry, /me reads the backlog)
707 Jun 11 12:54:02 <paolo> you split only if the current context can end with a line break
708 Jun 11 12:54:23 * paolo is trying to convert muntyan to the word of trees :)
709 Jun 11 12:54:49 <paolo> he will become a naturalist before the end of the project :)
710 Jun 11 12:54:55 <muntyan> my research area is groups of tree automorphisms :)
711 Jun 11 12:55:07 <muntyan> finite automata, and stuff
712 Jun 11 12:55:24 <muntyan> anyway, i don't see now why no-tree is better than tree
713 Jun 11 12:55:28 <paolo> so, why I'm speaking about
714 Jun 11 12:55:37 <muntyan> so i guess i agree with those points
715 Jun 11 12:55:40 <paolo> you should use trees everywhere
716 Jun 11 12:56:01 <muntyan> well, my no-tree approach involves having a BTree :)
717 Jun 11 12:56:08 <muntyan> trees are everywhere :)
718 Jun 11 12:57:42 <paolo> to summarize (my wife called, so I need to go home, in order to avoid physical damages :)
719 Jun 11 12:58:19 <paolo> - we are going to use a tree with relative offset
720 Jun 11 12:58:39 <paolo> - probably merging is not so difficult as we thought
721 Jun 11 12:59:01 <paolo> - splitting should be not a problem
722 Jun 11 12:59:50 <paolo> - queue will not contain the buffer modifications, but (if we really need a queue) the invalidated nodes
723 Jun 11 13:00:06 <muntyan> i think queue is not needed at all
724 Jun 11 13:00:27 <muntyan> we can have two flags - invalid, and some-child-is-invalid
725 Jun 11 13:00:31 <muntyan> and then just walk the tree
726 Jun 11 13:00:48 <muntyan> and maybe some first-invalid link, or index, or something
727 Jun 11 13:00:54 * muntyan hates the queue :)
728 Jun 11 13:01:40 <paolo> ok
729 Jun 11 13:03:15 <muntyan> there is one problem with lines, but it desrves an email, i think
730 Jun 11 13:03:26 <muntyan> in any case it's not quick, and doesn't change things to do now
731 Jun 11 13:04:15 <paolo> muntyan: can you please write down a little summary of the discussion we had today and save the log on l.g.o for future reference
732 Jun 11 13:04:34 <paolo> at least the conclusions
733 Jun 11 13:04:46 <muntyan> i am not a good writer...
734 Jun 11 13:04:48 <paolo> and send it by mail to me, pbor and barisione
735 Jun 11 13:04:52 <muntyan> ok
736 Jun 11 13:04:52 <paolo> a few lines
737 Jun 11 13:05:36 <paolo> them you can start designing/implementing the algorithm
738 Jun 11 13:06:30 <muntyan> ok
739 Jun 11 13:07:35 <muntyan> err, can we close the meeting? i'm afraid there will be two injured...
740 Jun 11 13:08:08 <paolo> yes
741 Jun 11 13:08:46 <muntyan> paolo, pbor, see you later then
742 Jun 11 13:08:54 * paolo didn't backup his notebook again
743 Jun 11 13:09:03 <pbor> later
744 Jun 11 13:09:10 <paolo> bye
745 Jun 11 13:09:10 * pbor needs to go too
746 Jun 11 13:09:17 * paolo has quit (Ex-Chat)
747 Jun 11 13:10:00 <muntyan> pbor: four hours!!!
748 **** ENDING LOGGING AT Sun Jun 11 13:11:19 2006
Attached Files
To refer to attachments on a page, use attachment:filename, as shown below in the list of files. Do NOT use the URL of the [get] link, since this is subject to change and can break easily.You are not allowed to attach a file to this page.