Attachment '2006-06-07.txt'
1 (20:34:17) Conversation with #sourceview at 2006-06-07 203417 on pbor
2 (20:34:17) You joined #sourceview
3 (20:34:17) irc.acc.umu.se sets mode: +nt
4 (20:53:48) barisione (~demian@81-208-36-94.ip.fastwebnet.it) joined #sourceview
5 (20:54:27) <pbor> hi
6 (20:54:34) <barisione> hi
7 (20:54:54) <pbor> while we wait for muntyan to arrive, I'll paste you my tomboy note
8 (20:55:00) <barisione> ok
9 (20:55:05) <pbor> NewEngineIssues
10 (20:55:05) <pbor> - async queue: offsets break. This is fixed in muntyan's tree, but a) the solution is not exactly trivial b) the solution is potentially expensive if the queue grows large (think of search&replace)
11 (20:55:05) <pbor> - highlighting breaks: we thought this was due to the queue issue, but there are still problems which we cannot pinpoint to an 'obvious' cause. In particular see the "debug thing"
12 (20:55:05) <pbor> - performance: even with the queue fixed, and although there are some good heuristics to speed up the 'interactive' case, we can still come up with cases that fool the heuristic: if 0.
13 (20:55:10) nud (~sf@d83-182-30-136.cust.tele2.be) joined #sourceview
14 (20:55:57) <barisione> I don't understand the second point
15 (20:56:19) <pbor> barisione: the 'hl breaks' point?
16 (20:56:27) <barisione> yes
17 (20:56:33) <pbor> barisione: do you have a checkout of muntyan repo?
18 (20:57:01) <barisione> i'm downloading it now
19 (20:57:07) <pbor> ok
20 (20:57:15) muntyan (~muntyan@pool-71-113-224-59.herntx.dsl-w.verizon.net) joined #sourceview
21 (20:57:17) <muntyan> hi there
22 (20:57:36) <pbor> basically there are one or more cases where the hl is not properly updated
23 (20:58:08) <pbor> those cases were crashing before the queue fixes, so they were 'hidden'
24 (20:58:13) <barisione> hi muntyan
25 (20:58:32) <pbor> muntyan: I pasted barisione the tomboy note I showed you before
26 (20:58:40) <muntyan> good
27 (20:58:59) <muntyan> so, who's talking?
28 (20:59:20) <muntyan> pbor: you rule the meeting
29 (20:59:25) <pbor> ok
30 (20:59:45) <pbor> I'd say point 1) the queue thingie is mostly a statement
31 (21:00:05) <pbor> so there is not much to discuss about it (at least at the beginning)
32 (21:00:17) <pbor> there was an issue-
33 (21:00:22) <pbor> there is a solution
34 (21:00:46) <pbor> the solution is not very nice so we need to evaluate its cost
35 (21:01:03) <muntyan> err, i have a correction
36 (21:01:11) <pbor> ok, shoot
37 (21:01:38) <muntyan> we do not know if it's a solution; and we do not know what exactly to do with the queue, since sorting doesn't seem to be enough
38 (21:01:56) <muntyan> barisione: what do you think about this, will it work?
39 (21:02:20) <pbor> muntyan: yeah, that was what I was coming to...
40 (21:02:43) <muntyan> well, you started far away from it then :)
41 (21:02:50) <pbor> I was trying to introduce stuff: the summary is
42 (21:02:56) <pbor> we have a bunch of issues
43 (21:03:06) <barisione> (I was reading your last commits)
44 (21:03:25) <pbor> they are not normal bugs, but apparently go fairly deep into how things work
45 (21:03:34) <pbor> so my idea was:
46 (21:03:49) <pbor> - list the issues (at least the ones that we know)
47 (21:04:05) <pbor> - list the possible approaches to correct them
48 (21:04:23) <pbor> - evaluate if the result is not an ugly hack
49 (21:05:06) <pbor> obviously if we can't fulfill the second, we need more drastic decisions and there is no point discussing the third :)
50 (21:05:55) <pbor> the issues we know are the ones I pasted above
51 (21:06:01) *muntyan hopes the queue will save us
52 (21:06:31) <barisione> muntyan: in your last email you wrote:
53 (21:06:31) <barisione> The obvious one is to have timer and make update_syntax have
54 (21:06:31) <barisione> an extra argument - the time it's allowed to spend. It doesn't look
55 (21:06:31) <barisione> pretty though.
56 (21:06:42) <barisione> how do you plan to do it?
57 (21:07:52) <muntyan> well, the idle_worker creates a timer, and calls update_syntax(..., how_much_time_left)
58 (21:08:00) <muntyan> in a loop
59 (21:08:15) <barisione> how big should the batch you read be?
60 (21:08:18) <muntyan> update_syntax in turn processes batches of fixed size until it runs out of time
61 (21:08:27) <muntyan> i think 4k chars or so
62 (21:08:33) <barisione> you don't know how much text you will need
63 (21:08:34) <muntyan> the initial batch size looks good
64 (21:08:42) <muntyan> but i guess it needs experimenting
65 (21:09:06) <muntyan> right, i don't know
66 (21:09:12) <muntyan> but it's not a problem
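The time-budgeted loop muntyan describes above could look roughly like the following. This is only a minimal sketch in Python, not the engine's code: the names `idle_worker` and `update_syntax` come from the log, while `analyze_batch`, the `state` dictionary, the 30 ms budget and the injectable `clock` are assumptions made for illustration. The 4096 batch size mirrors the "4k chars or so" guess and would need experimenting.

```python
import time

BATCH_SIZE = 4096  # "4k chars or so" from the log; a guess, needs experimenting


def update_syntax(text, start, time_left, analyze_batch, clock=time.monotonic):
    """Analyze fixed-size batches from `start` until the text ends or time runs out.

    Always processes at least one batch, then returns the next unprocessed offset.
    `analyze_batch` is a hypothetical callback standing in for the real analysis.
    """
    deadline = clock() + time_left
    pos = start
    while pos < len(text):
        end = min(pos + BATCH_SIZE, len(text))
        analyze_batch(text, pos, end)
        pos = end
        if clock() >= deadline:
            break
    return pos


def idle_worker(text, state, analyze_batch, budget=0.03, clock=time.monotonic):
    """One idle callback: create a timer and call update_syntax in a loop.

    Returns True if there is still text left, i.e. the callback should run again.
    """
    deadline = clock() + budget
    while state["pos"] < len(text):
        time_left = deadline - clock()
        if time_left <= 0:
            break
        state["pos"] = update_syntax(text, state["pos"], time_left,
                                     analyze_batch, clock)
    return state["pos"] < len(text)
```

The sketch does not try to answer barisione's objection that the amount of text needed is unknown in advance; it only shows how a time limit and a fixed batch size can coexist.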
67 (21:09:23) <barisione> i tested gtktextbuffer, reading a big batch is twice as fast as reading it line by line
68 (21:09:41) <barisione> it's for this reason i wrote the ugly LineReader stuff
69 (21:09:59) <muntyan> i read line by line in my thing, and the most time is spent in regex searching and redrawing
70 (21:10:20) <muntyan> so i don't think line vs big batch will be an issue
71 (21:10:24) <muntyan> plus, why line?
72 (21:10:39) <muntyan> many lines at once, as many as will fit into a batch
73 (21:11:31) <muntyan> anyway, this is *not* the most important issue, imo
74 (21:11:32) <barisione> I tested it line by line, it's the opposite case of reading a big batch
75 (21:11:38) <barisione> it was just testing
76 (21:11:54) <muntyan> it doesn't affect how we deal with modifications; and modifications is the thing that needs to be fixed
77 (21:12:27) <muntyan> i mean, once we know how to make it *work*, we can think how to make it work fast and nice
78 (21:13:09) *muntyan is not good at those idle things and batches; the stuff about loops and timers is just some vague ideas
79 (21:13:21) <barisione> however i like your idea on how to modify the engine
80 (21:13:53) <muntyan> barisione: tell more :)
81 (21:14:16) <muntyan> barisione: you know the code, i know only some tiny bits
82 (21:15:17) <barisione> I don't know how to handle modifications that don't fall completely in a batch
83 (21:15:57) <muntyan> looks like splitting them should work fine
84 (21:16:37) <muntyan> say, we process now chars from 0 to 1000; if there are twenty chars deleted from 990 to 1010, it's okay to process first deletion from 990 to 1000, and then from 1000 to 1010
85 (21:16:52) <muntyan> since those chars after 1000 won't fall into the batch, and analysis won't touch them
86 (21:17:35) <barisione> yes it should work
87 (21:17:36) <muntyan> (we do need to avoid analyzing-whole-remainder)
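The splitting idea from muntyan's example can be sketched as below. This is a minimal illustration in Python under stated assumptions: the function name `split_at_batch_end` and the half-open `(start, end)` ranges are hypothetical, and the sketch keeps both pieces in the original coordinates, as in the 990/1000/1010 example, without modelling how offsets shift once the first piece has been applied.

```python
def split_at_batch_end(mod_start, mod_end, batch_end):
    """Split the modified range [mod_start, mod_end) at the end of the current batch.

    Returns one range if the modification lies entirely inside or entirely
    after the batch, otherwise the piece inside the batch followed by the
    piece that is left for a later batch.
    """
    if mod_end <= batch_end or mod_start >= batch_end:
        return [(mod_start, mod_end)]
    return [(mod_start, batch_end), (batch_end, mod_end)]


# The example from the log: batch covers chars 0..1000, deletion covers 990..1010.
pieces = split_at_batch_end(990, 1010, 1000)
```

With the numbers from the log, `pieces` holds the part from 990 to 1000, which belongs to the current batch, and the part from 1000 to 1010, which the analysis of this batch does not touch.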
88 (21:18:42) <muntyan> what i worry about is that 'CHANGED' modification. i looked at the code, and apparently it should just work (with corrections in the places that modify offsets in the tree). but i don't know if it works. do you think it will work?
89 (21:19:39) <barisione> i'm trying this approach on paper :)
90 (21:19:49) <muntyan> good :)
91 (21:19:54) <muntyan> you know what to draw :D
92 (21:24:31) <muntyan> by the way, LineReader is an auxiliary thing to avoid slow reading from buffer, and it can't be removed if we don't care about the performance (i don't want to remove it, i want to understand what it is). correct?
93 (21:24:45) <muntyan> err, it can be removed
94 (21:24:58) <barisione> yes
95 (21:25:22) <muntyan> then we can safely read big batches regardless of how much we analyze
96 (21:25:49) <muntyan> so reading lines vs reading a batch won't be an issue
97 (21:26:02) <barisione> it's here because reading line by line is slow but reading a big batch is often not needed if the user just inserted a char
98 (21:26:11) <barisione> if you read big batches it works
99 (21:26:19) <barisione> without any problem
100 (21:26:29) <muntyan> well, it does read big batches now, and it doesn't seem to hurt; so reading big batches is good
101 (21:26:42) <muntyan> (maybe not huge batches, of course)
102 (21:27:46) <muntyan> it's better to read big chunk since after inserting a char we potentially modify big region. so if it's not expensive to always read big chunk of text, we should do it. and it doesn't seem to be expensive
103 (21:28:23) <pbor> muntyan: well, but you analyze the chunk line by line anyway... no?
104 (21:28:31) <barisione> i tried it and the engine was slower but gtktextbuffer seems faster than one year ago
105 (21:28:32) <muntyan> pbor: yes
106 (21:29:01) <muntyan> barisione: no no, i have gtk-2.6 as minimal requirement :)
107 (21:29:16) <barisione> are you using 2.6?
108 (21:29:17) <muntyan> pbor: my department got gtk-2.6! no more gtk-2.2!
109 (21:29:22) <muntyan> barisione: yes
110 (21:31:03) <barisione> i tried to read 10K of normal C code and reading it line by line was much slower than reading it in a single batch
111 (21:31:32) <pbor> mmm
112 (21:31:35) <muntyan> so let LineReader do its job!
113 (21:32:36) <pbor> which is a bit weird since TextBTree stores segments line by line, so reading a whole block should just read line by line and concat...
114 (21:33:29) <muntyan> maybe offsets are the problem?
115 (21:33:57) <muntyan> hm, if there is an iter, then forward_line should work fine
116 (21:34:12) <muntyan> well, maybe it was a textbuffer bug
117 (21:34:17) <muntyan> it's full of bugs, you know ;)
118 (21:34:25) *pbor knows
119 (21:34:32) <muntyan> anyway, it's not the most important thing!
120 (21:34:37) <pbor> indeed
121 (21:34:56) <pbor> I am most concerned about the "debug thing"
122 (21:35:04) <muntyan> what I need now is to know whether my approach is feasible so that I try to do that stuff. if not, i need to know what to do :)
123 (21:35:41) <muntyan> i hope that debug thing is caused by modification which falls into the batch but is not processed in it
124 (21:35:53) <muntyan> (it is the case with debug thing)
125 (21:36:11) <muntyan> barisione: what does your paper say?
126 (21:37:07) <barisione> your idea should work, CHANGED should work
127 (21:37:32) <barisione> but how do you plan to use the old tree?
128 (21:37:59) <muntyan> heh, this is what i was asking about :)
129 (21:38:13) <muntyan> i have no idea how to use it, i hoped that the code will just do it right
130 (21:39:07) <muntyan> it can merge new nodes and old nodes now for deletions and insertions, right? i hope that it will just do it for 'changed'
131 (21:39:57) <pbor> well, as far as I can see the tree doesn't care about insert and delete
132 (21:40:05) <muntyan> hm, actually, this can be easily tested by manually injecting a CHANGED modification into the queue and seeing what happens
133 (21:40:05) <pbor> so it should not care about changed
134 (21:40:13) <muntyan> yeah, it only adjusts offsets accordingly
135 (21:40:35) <pbor> yep
136 (21:40:35) <barisione> yes for changed it should work because, as pbor said, it doesn't care about insert or delete
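The point that the tree "only adjusts offsets accordingly" can be illustrated with a small sketch. Everything here is an assumption made for illustration: the log does not define CHANGED, so the sketch takes it to mean a range whose content changed without changing length, and the function name, the string kinds and the boundary policy (an insert exactly at an offset pushes that offset forward) are hypothetical.

```python
def shift_offset(offset, mod_kind, mod_pos, mod_len):
    """Return where a stored tree offset ends up after one modification.

    insert:  offsets at or after the insertion point move forward.
    delete:  offsets after the deleted range move back; offsets inside
             the deleted range collapse to its start.
    changed: assumed to keep the length, so nothing moves.
    """
    if mod_kind == "insert":
        return offset + mod_len if offset >= mod_pos else offset
    if mod_kind == "delete":
        if offset <= mod_pos:
            return offset
        return max(mod_pos, offset - mod_len)
    if mod_kind == "changed":
        return offset
    raise ValueError(f"unknown modification kind: {mod_kind!r}")
```

Under these assumptions a CHANGED modification needs no offset arithmetic at all, which fits the expectation in the log that the tree "should not care about changed"; whether the old nodes can then be reused is a separate question that the sketch does not address.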
137 (21:40:54) <muntyan> barisione: could you tell in a couple of words how tags are applied and removed in the reanalyzed regions?
138 (21:41:14) <barisione> but if the tree cannot be used do you plan to discard it as the current engine does?
139 (21:41:29) <muntyan> no, i don't want to discard everything
140 (21:41:45) <muntyan> i think it should keep trying till the end
141 (21:43:30) <barisione> the code to apply tags is mostly copied from the old engine
142 (21:43:45) <pbor> as far as I understood from the code, the reasoning is that if it doesn't merge "soon" it will probably not merge at all
143 (21:44:00) <barisione> pbor: yep
144 (21:44:06) <muntyan> err, is it really the case?
145 (21:44:20) <barisione> I copied from other engines (colorer?)
146 (21:44:55) <muntyan> well, i personally don't see a reason to give up after one batch
147 (21:44:56) <barisione> after something like 100 lines of code it is really, really difficult to reuse the old contexts
148 (21:45:27) <muntyan> hm
149 (21:45:51) <pbor> barisione: does colorer use a full tree like the new-engine?
150 (21:45:52) <muntyan> the tree is mixed with the text
151 (21:46:29) <muntyan> i like more an approach of having tree and the segments separately
152 (21:46:35) <muntyan> then it is easy to reuse old stuff
153 (21:46:59) <muntyan> since two segments are 'the same' if and only if they point to the same node
154 (21:47:07) <muntyan> but it would require lot of changes
155 (21:47:31) <barisione> I don't remember exactly how colorer works
156 (21:52:17) <pbor> barisione: any opinion on the debug thing miscoloring? do you think it's related to the queue?
157 (21:53:34) <barisione> i can't test it now as I'm using windows
158 (21:53:50) <pbor> okay
159 (21:54:12) <barisione> and i cannot reboot as booting in windows does not always work on my old laptop
160 (21:54:23) <pbor> no prob
161 (21:54:59) <pbor> the third issue in the list has to do with batch estimation...
162 (21:55:06) <barisione> and sadly i need windows until i finish a stupid vnc clone written in C# for university :(
163 (21:55:35) <pbor> (or at least that was our guess)
164 (21:55:44) <pbor> barisione: server or client?
165 (21:56:03) *pbor would love a decent vnc client, even on mono :)
166 (21:56:27) <barisione> server and client, it's similar to vnc but it doesn't use the vnc protocol. it's not really usable
167 (21:56:30) <muntyan> so, what about the simplest thing now: adding CHANGED, sorting the queue, getting the modifications in a batch correct, and leaving the rest as it is
168 (21:56:46) <barisione> vnc works surely better
169 (21:56:54) <muntyan> then we can see what happens, if it works we can think of batch size and stuff
170 (21:57:15) <barisione> muntyan: I agree with you
171 (21:57:17) <pbor> muntyan: sounds like a plan
172 (21:57:29) <muntyan> good
173 (21:57:47) <muntyan> then i'm leaving. real life calling :)
174 (21:57:54) <pbor> ok
175 (21:57:54) <muntyan> thanks guys, see you, etc.
176 (21:57:55) <barisione> bye
177 (21:58:13) <pbor> thanks for your help marco
178 (21:58:18) <barisione> pbor: I need to ask you something about ErrRegex
179 (21:58:23) <barisione> EggRegex
180 (21:58:35) <pbor> sure (though I know little about it)
181 (21:59:37) <muntyan> oh, i forgot about that! i have a question about regex too
182 (21:59:57) pbor set topic on #sourceview: regex!
183 (22:00:03) <barisione> muntyan: ask your question
184 (22:00:12) <muntyan> barisione: remember there was a discussion in bugzilla about offsets vs indices; and you and mclasen seem to have agreed that it should be changed to use offsets
185 (22:00:27) <muntyan> barisione: so, did you do it in gtksourceview's eggregex?
186 (22:00:35) <barisione> it's what I was going to ask :)
187 (22:00:57) <barisione> mclasen, paolo and I preferred offsets
188 (22:01:05) <muntyan> don't change it! there are people (me) who use eggregex a lot, as it is
189 (22:01:07) <muntyan> :)
190 (22:01:29) <barisione> but it has several problems as pcre always uses indices
191 (22:01:38) <barisione> are you using offsets or indices?
192 (22:01:46) <muntyan> indices are better because it's easier to work with the subject string when you have indices. you don't have to use g_utf8_stuff
193 (22:01:52) <muntyan> i am using indices
194 (22:02:23) <muntyan> if one works with "words" and "chars", then offsets are better. but who in C works with "words" and "chars"?
195 (22:02:46) <barisione> the current version uses offsets
196 (22:02:59) <muntyan> um, then i have old version?
197 (22:03:14) <barisione> the version in gtksourceview
198 (22:03:24) <muntyan> specifically, egg_regex_fetch_pos()
199 (22:03:33) <muntyan> does it return char offsets or byte indices?
200 (22:03:46) <muntyan> (my copy returns bytes)
201 (22:04:29) <pbor> is the api big? to me it would sound sane have both...
202 (22:04:55) <muntyan> hm, offsets are good for textbuffer
203 (22:05:03) <pbor> yep
204 (22:05:04) <muntyan> since you can't work with bytes in a textbuffer
205 (22:05:15) <pbor> my point exactly
206 (22:05:15) <barisione> the version in gtksourceview uses chars
207 (22:05:29) <barisione> i converted both the engine and eggregex to that
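The difference between byte indices and character offsets discussed above only shows up with non-ASCII text. A minimal Python sketch, assuming UTF-8 text; the helper names are hypothetical, and in C the corresponding GLib helpers are `g_utf8_offset_to_pointer()` and `g_utf8_pointer_to_offset()`:

```python
def offset_to_index(text, char_offset):
    """Character offset -> byte index in the UTF-8 encoding of `text`."""
    return len(text[:char_offset].encode("utf-8"))


def index_to_offset(data, byte_index):
    """Byte index into UTF-8 `data` -> character offset.

    Raises UnicodeDecodeError if `byte_index` falls inside a multi-byte character.
    """
    return len(data[:byte_index].decode("utf-8"))


text = "naïve café"
data = text.encode("utf-8")

# "café" starts at character offset 6, but at byte index 7,
# because "ï" takes two bytes in UTF-8.
char_offset = text.index("café")
byte_index = data.index("café".encode("utf-8"))
```

For pure ASCII text the two numbers coincide, which is why code that mixes them up tends to break only on accented or non-Latin input.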
208 (22:05:50) <muntyan> api is quite big, like almost every function is affected :)
209 (22:05:56) <barisione> the old engine was really ugly because it kept in memory both indices and offsets
210 (22:06:04) <pbor> so it means that muntyan copied eggregex from sourceview... so it's using it patched
211 (22:06:18) <pbor> s/it/he
212 (22:06:34) <muntyan> what do you mean?
213 (22:06:51) <pbor> <muntyan> i am using indices
214 (22:07:03) <muntyan> no, i am using non-patched version :)
215 (22:07:03) <pbor> gah
216 (22:07:07) <pbor> sorry
217 (22:07:15) <muntyan> i have eggregex copied long time ago
218 (22:07:20) <barisione> contextengine.c uses only offsets, converting it back to indices (if we decide to do so) should not be difficult
219 (22:07:20) <muntyan> before it was converted to offsets
220 (22:07:29) <pbor> yes
221 (22:07:40) <muntyan> well, converting my code shouldn't be difficult either :)
222 (22:07:47) <muntyan> whatever is better will be good
223 (22:07:53) <barisione> in the cvs on sourceforge you can find the conversion from indices to offsets
224 (22:08:00) <muntyan> and perhaps offsets are really better for textbuffer
225 (22:08:19) <muntyan> barisione: you mean eggregex changes?
226 (22:08:22) <barisione> yes they are better but they could be terribly slow!!!
227 (22:08:31) <barisione> no changes to the engine
228 (22:08:45) <muntyan> barisione: i am not using gtksourceview :)
229 (22:08:54) <muntyan> are they slow or they could be slow?
230 (22:09:40) <muntyan> hm, wait
231 (22:09:44) <barisione> my version of eggregex needs to do some conversions from offsets to indices but if the text you are analyzing is big then this is slow!
232 (22:09:46) <muntyan> if engine uses LineReader
233 (22:09:55) <muntyan> then it uses char*, and it doesn't care about textbuffer
234 (22:10:03) <muntyan> offsets are needed only for tags
235 (22:10:15) <muntyan> ah, i got why you kept both offsets and indices
236 (22:10:28) <muntyan> hm, situation
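The cost barisione mentions comes from the fact that converting between a UTF-8 byte index and a character offset means scanning the text, so repeated conversions from the start of a big buffer add up. One way to soften this, sketched below in Python, is to remember the last converted position and scan only the new bytes. This is an illustration of the general idea, not what eggregex or the engine does; the class name `Utf8Cursor` and its method are hypothetical.

```python
class Utf8Cursor:
    """Remembers the last (byte index, char offset) pair for UTF-8 `data`.

    Converting a byte index that lies ahead of the remembered position
    only scans the bytes in between; going backwards restarts from zero.
    Byte indices are assumed to fall on character boundaries.
    """

    def __init__(self, data: bytes):
        self.data = data
        self.index = 0
        self.offset = 0

    def offset_at(self, byte_index: int) -> int:
        if byte_index < self.index:
            self.index = 0
            self.offset = 0
        chunk = self.data[self.index:byte_index]
        # Every character has exactly one byte that is not a continuation
        # byte (continuation bytes look like 0b10xxxxxx).
        self.offset += sum(1 for b in chunk if (b & 0xC0) != 0x80)
        self.index = byte_index
        return self.offset
```

Keeping both numbers around, as the old engine apparently did, trades memory and bookkeeping for not having to rescan; the cursor above is a middle ground that only helps when matches are visited in increasing order.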
237 (22:11:41) <muntyan> okay guys, now really have to go
238 (22:11:44) <muntyan> see you
239 (22:11:50) <barisione> bye muntyan
240 (22:11:57) <barisione> i will write a mail on this
241 (22:14:51) <pbor> ok
242 (22:14:55) <pbor> I have to go too
243 (22:14:57) <pbor> bye
244 (22:15:16) <pbor> I'll pass the logs to paolo if you are ok
245 (22:16:23) <barisione> ok
246 (22:16:26) <barisione> bye
247 (22:16:52) You are now known as pbor|out
248 (22:27:06) barisione (~demian@81-208-36-94.ip.fastwebnet.it) left #sourceview
249 (00:10:03) nud (~sf@d83-182-30-136.cust.tele2.be) left #sourceview
250 (01:23:25) muntyan (~muntyan@pool-71-113-224-59.herntx.dsl-w.verizon.net) left #sourceview