(20:34:17) Conversation with #sourceview at 2006-06-07 203417 on pbor
(20:34:17) You joined #sourceview
(20:34:17) irc.acc.umu.se sets mode: +nt
(20:53:48) barisione (~firstname.lastname@example.org) joined #sourceview
(20:54:27) <pbor> hi
(20:54:34) <barisione> hi
(20:54:54) <pbor> while muntyan is on his way, I'll paste you my tomboy note
(20:55:00) <barisione> ok
(20:55:05) <pbor> NewEngineIssues
(20:55:05) <pbor> - async queue: offsets break. This is fixed in muntyan's tree, but a) the solution is not exactly trivial b) the solution is potentially expensive if the queue grows large (think of search&replace)
(20:55:05) <pbor> - highlighting breaks: we thought this was due to the queue issue, but there are still problems which we cannot pinpoint to an 'obvious' cause. In particular see the "debug thing"
(20:55:05) <pbor> - performance: even with the queue fixed and despite some good heuristics to speed up the 'interactive' case, we can still come up with cases that fool the heuristic: if 0.
(20:55:10) nud (~email@example.com) joined #sourceview
(20:55:57) <barisione> I don't understand the second point
(20:56:19) <pbor> barisione: the 'hl breaks' point?
(20:56:27) <barisione> yes
(20:56:33) <pbor> barisione: do you have a checkout of muntyan repo?
(20:57:01) <barisione> i'm downloading it now
(20:57:07) <pbor> ok
(20:57:15) muntyan (~firstname.lastname@example.org) joined #sourceview
(20:57:17) <muntyan> hi there
(20:57:36) <pbor> basically there is a (or maybe more) cases where the hl is not properly updated
(20:58:08) <pbor> those cases were crashing before the queue fixes, so they were 'hidden'
(20:58:13) <barisione> hi muntyan
(20:58:32) <pbor> muntyan: I pasted barisione the tomboy note I showed you before
(20:58:40) <muntyan> good
(20:58:59) <muntyan> so, who's talking?
(20:59:20) <muntyan> pbor: you rule the meeting
(20:59:25) <pbor> ok
(20:59:45) <pbor> I'd say point 1) the queue thingie is mostly a statement
(21:00:05) <pbor> so there is not much to discuss about it (at least at the beginning)
(21:00:17) <pbor> there was an issue-
(21:00:22) <pbor> there is a solution
(21:00:46) <pbor> the solution is not very nice so we need to evaluate its cost
(21:01:03) <muntyan> err, i have a correction
(21:01:11) <pbor> ok, shoot
(21:01:38) <muntyan> we do not know if it's a solution; and we do not know what exactly to do with the queue, since sorting doesn't seem to be enough
(21:01:56) <muntyan> barisione: what do you think about this, will it work?
(21:02:20) <pbor> muntyan: yeah, that was what I was coming to...
(21:02:43) <muntyan> well, you started far away from it then :)
(21:02:50) <pbor> I was trying to introduce stuff: the summary is
(21:02:56) <pbor> we have a bunch of issues
(21:03:06) <barisione> (I was reading your last commits)
(21:03:25) <pbor> they are not normal bugs, but are fairly deep in how things work, apparently
(21:03:34) <pbor> so my idea was:
(21:03:49) <pbor> - list the issues (at least the ones that we know)
(21:04:05) <pbor> - list the possible approaches to correct them
(21:04:23) <pbor> - evaluate if the result is not an ugly hack
(21:05:06) <pbor> obviously if we can't fulfill the second, we need more drastic decisions and there is no point discussing the third :)
(21:05:55) <pbor> the issues we know are the ones I pasted above
(21:06:01) *muntyan hopes the queue will save us
(21:06:31) <barisione> muntyan: in your last email you wrote:
(21:06:31) <barisione> The obvious one is to have timer and make update_syntax have
(21:06:31) <barisione> an extra argument - the time it's allowed to spend. It doesn't look
(21:06:31) <barisione> pretty though.
(21:06:42) <barisione> how do you plan to do it?
(21:07:52) <muntyan> well, the idle_worker creates a timer, and calls update_syntax(..., how_much_time_left)
(21:08:00) <muntyan> in a loop
(21:08:15) <barisione> how big should the batch you read be?
(21:08:18) <muntyan> update_syntax in turn processes batches of fixed size until it runs out of time
(21:08:27) <muntyan> i think 4k chars or so
(21:08:33) <barisione> you don't know how much text you will need
(21:08:34) <muntyan> the initial batch size looks good
(21:08:42) <muntyan> but i guess it needs experimenting
(21:09:06) <muntyan> right, i don't know
(21:09:12) <muntyan> but it's not a problem
(21:09:23) <barisione> i tested gtktextbuffer, reading a big batch is two times faster than reading it line by line
(21:09:41) <barisione> it's for this reason i wrote the ugly LineReader stuff
(21:09:59) <muntyan> i read line by line in my thing, and the most time is spent in regex searching and redrawing
(21:10:20) <muntyan> so i don't think line vs big batch will be an issue
(21:10:24) <muntyan> plus, why line?
(21:10:39) <muntyan> many lines at once, how many will fit into a batch
(21:11:31) <muntyan> anyway, this is *not* the most important issue, imo
(21:11:32) <barisione> I tested it line by line, it's the opposite case of reading a big batch
(21:11:38) <barisione> it was just testing
(21:11:54) <muntyan> it doesn't affect how we deal with modifications; and modifications are the thing that needs to be fixed
(21:12:27) <muntyan> i mean, once we know how to make it *work*, we can think how to make it work fast and nice
(21:13:09) *muntyan is not good at those idle things and batches, the stuff about loops and timers is just some vague ideas
(21:13:21) <barisione> however i like your idea on how to modify the engine
(21:13:53) <muntyan> barisione: tell more :)
(21:14:16) <muntyan> barisione: you know the code, i know only some tiny bits
(21:15:17) <barisione> I don't know how to handle modifications that don't fall completely in a batch
(21:15:57) <muntyan> looks like splitting them should work fine
(21:16:37) <muntyan> say, we process now chars from 0 to 1000; if there are twenty chars deleted from 990 to 1010, it's okay to process first deletion from 990 to 1000, and then from 1000 to 1010
(21:16:52) <muntyan> since those chars after 1000 won't fall into the batch, and analysis won't touch them
(21:17:35) <barisione> yes it should work
(21:17:36) <muntyan> (we do need to avoid analyzing-whole-remainder)
(21:18:42) <muntyan> what i worry about is that 'CHANGED' modification. i looked at the code, and apparently it should just work (with corrections in the places that modify offsets in the tree). but i don't know if it works. do you think it will work?
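
The splitting muntyan describes at 21:16 can be sketched as follows. This is only an illustration in Python, not code from either tree: the `Modification` type and the names `split_at_batch_end` and `modifications_for_batch` are hypothetical, and all offsets are assumed to be expressed in the same coordinates as the batch boundary.

```python
from typing import List, NamedTuple, Optional, Tuple


class Modification(NamedTuple):
    """Hypothetical queue entry: a kind plus a half-open range [start, end)."""
    kind: str   # "insert", "delete" or "changed"
    start: int
    end: int


def split_at_batch_end(
    mod: Modification, batch_end: int
) -> Tuple[Optional[Modification], Optional[Modification]]:
    """Split `mod` into the part inside the batch and the part after it."""
    if mod.end <= batch_end:
        return mod, None            # falls completely inside the batch
    if mod.start >= batch_end:
        return None, mod            # falls completely after the batch
    # Straddles the boundary: process the head now, keep the tail queued.
    return mod._replace(end=batch_end), mod._replace(start=batch_end)


def modifications_for_batch(
    queue: List[Modification], batch_end: int
) -> Tuple[List[Modification], List[Modification]]:
    """Sort the queue by start offset and split every entry at `batch_end`."""
    inside: List[Modification] = []
    deferred: List[Modification] = []
    for mod in sorted(queue, key=lambda m: m.start):
        head, tail = split_at_batch_end(mod, batch_end)
        if head is not None:
            inside.append(head)
        if tail is not None:
            deferred.append(tail)
    return inside, deferred
```

With the numbers from the log, a deletion from 990 to 1010 and a batch ending at 1000 yield a deletion 990..1000 to process now and a deletion 1000..1010 left in the queue for the next batch.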
(21:19:39) <barisione> i'm trying this approach on paper :)
(21:19:49) <muntyan> good :)
(21:19:54) <muntyan> you know what to draw :D
(21:24:31) <muntyan> by the way, LineReader is an auxiliary thing to avoid slow reading from buffer, and it can't be removed if we don't care about the performance (i don't want to remove it, i want to understand what it is). correct?
(21:24:45) <muntyan> err, it can be removed
(21:24:58) <barisione> yes
(21:25:22) <muntyan> then we can safely read big batches regardless of how much we analyze
(21:25:49) <muntyan> so the reading lines vs reading a batch won't be an issue
(21:26:02) <barisione> it's here because reading line by line is slow but reading a big batch is often not needed if the user just inserted a char
(21:26:11) <barisione> if you read big batches it works
(21:26:19) <barisione> without any problem
(21:26:29) <muntyan> well, it does read big batches now, and it doesn't seem to hurt; so reading big batches is good
(21:26:42) <muntyan> (maybe not huge batches, of course)
(21:27:46) <muntyan> it's better to read big chunk since after inserting a char we potentially modify big region. so if it's not expensive to always read big chunk of text, we should do it. and it doesn't seem to be expensive
(21:28:23) <pbor> muntyan: well, but you analyze the chunk line by line anyway... no?
(21:28:31) <barisione> i tried it and the engine was slower but gtktextbuffer seems faster than one year ago
(21:28:32) <muntyan> pbor: yes
(21:29:01) <muntyan> barisione: no no, i have gtk-2.6 as minimal requirement :)
(21:29:16) <barisione> are you using 2.6?
(21:29:17) <muntyan> pbor: my department got gtk-2.6! no more gtk-2.2!
(21:29:22) <muntyan> barisione: yes
(21:31:03) <barisione> i tried to read 10K of normal C code and reading it line by line was really slower than reading it in a single batch
(21:31:32) <pbor> mmm
(21:31:35) <muntyan> so let LineReader do its job!
(21:32:36) <pbor> which is a bit weird since TextBTree stores segments line by line, so reading a whole block should just read line by line and concat...
(21:33:29) <muntyan> maybe offsets are the problem?
(21:33:57) <muntyan> hm, if there is an iter, then forward_line should work fine
(21:34:12) <muntyan> well, maybe it was a textbuffer bug
(21:34:17) <muntyan> it's full of bugs, you know ;)
(21:34:25) *pbor knows
(21:34:32) <muntyan> anyway, it's not the most important thing!
(21:34:37) <pbor> indeed
(21:34:56) <pbor> I am most concerned about the "debug thing"
(21:35:04) <muntyan> what I need now is to know whether my approach is feasible so that I try to do that stuff. if not, i need to know what to do :)
(21:35:41) <muntyan> i hope that debug thing is caused by modification which falls into the batch but is not processed in it
(21:35:53) <muntyan> (it is the case with debug thing)
(21:36:11) <muntyan> barisione: what does your paper say?
(21:37:07) <barisione> your idea should work, CHANGED should work
(21:37:32) <barisione> but how do you plan to use the old tree?
(21:37:59) <muntyan> heh, this is what i was asking about :)
(21:38:13) <muntyan> i have no idea how to use it, i hoped that the code will just do it right
(21:39:07) <muntyan> it can merge new nodes and old nodes now for deletions and insertions, right? i hope that it will just do it for 'changed'
(21:39:57) <pbor> well, as far as I can see the tree doesn't care about insert and delete
(21:40:05) <muntyan> hm, actually, this can be easily tested by manually injecting CHANGED modification into the queue and seeing what happens
(21:40:05) <pbor> so it should not care about changed
(21:40:13) <muntyan> yeah, it only adjusts offsets accordingly
(21:40:35) <pbor> yep
(21:40:35) <barisione> yes for changed it should work because, as pbor said, it doesn't care about insert or delete
(21:40:54) <muntyan> barisione: could you tell in couple of words how tags are applied and removed in the reanalyzed regions?
(21:41:14) <barisione> but if the tree cannot be used do you plan to discard it as the current engine does?
(21:41:29) <muntyan> no, i don't want to discard everything
(21:41:45) <muntyan> i think it should keep trying till the end
(21:43:30) <barisione> the code to apply tags is mostly copied from the old engine
(21:43:45) <pbor> as far as I understood from the code, the reasoning is that if it doesn't merge "soon" it will prolly not merge at all
(21:44:00) <barisione> pbor: yep
(21:44:06) <muntyan> err, is it really the case?
(21:44:20) <barisione> I copied from other engines (colorer?)
(21:44:55) <muntyan> well, i personally don't see a reason to give up after one batch
(21:44:56) <barisione> after something like 100 lines of code it is really really difficult to reuse the old contexts
(21:45:27) <muntyan> hm
(21:45:51) <pbor> barisione: does colorer use a full tree like the new-engine?
(21:45:52) <muntyan> the tree is mixed with the text
(21:46:29) <muntyan> i like more an approach of having tree and the segments separately
(21:46:35) <muntyan> then it is easy to reuse old stuff
(21:46:59) <muntyan> since two segments are 'the same' if and only if they point to the same node
(21:47:07) <muntyan> but it would require lot of changes
(21:47:31) <barisione> I don't remember exactly how colorer works
(21:52:17) <pbor> barisione: any opinion on the debug thing miscoloring? do you think it's related to the queue?
(21:53:34) <barisione> i can't test it now as I'm using windows
(21:53:50) <pbor> okay
(21:54:12) <barisione> and i cannot reboot as booting in windows does not always work on my old laptop
(21:54:23) <pbor> no prob
(21:54:59) <pbor> the third issue in the list has to do with batch estimation...
(21:55:06) <barisione> and sadly i need windows until i finish a stupid vnc clone written in C# for university :(
(21:55:35) <pbor> (or at least that was our guess)
(21:55:44) <pbor> barisione: server or client?
(21:56:03) *pbor would love a decent vnc client, even on mono :)
(21:56:27) <barisione> server and client, it's similar to vnc but it doesn't use the vnc protocol. it's not really usable
(21:56:30) <muntyan> so, what about the simplest thing now: adding CHANGED, sorting the queue, getting the modifications in a batch correct, and leaving the rest as it is
(21:56:46) <barisione> vnc works surely better
(21:56:54) <muntyan> then we can see what happens, if it works we can think of batch size and stuff
(21:57:15) <barisione> muntyan: I agree with you
(21:57:17) <pbor> muntyan: sounds like a plan
(21:57:29) <muntyan> good
(21:57:47) <muntyan> then i'm leaving. real life calling :)
(21:57:54) <pbor> ok
(21:57:54) <muntyan> thanks guys, see you, etc.
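
For the "batch size and stuff" left for later, the time-budget idea from 21:07 (an idle worker that calls update_syntax with the time it is allowed to spend, while update_syntax processes fixed-size batches until it runs out of time) might look roughly like this. It is a sketch in Python under stated assumptions, not the engine's code: only the names `idle_worker` and `update_syntax` and the "4k chars or so" batch size come from the log, while `analyze_batch`, `slice_seconds`, `clock` and the generator-based idle loop are hypothetical.

```python
import time

BATCH_SIZE = 4096  # "4k chars or so"; the log says this needs experimenting


def update_syntax(text, start, time_left, analyze_batch, clock=time.monotonic):
    """Analyze fixed-size batches from `start` until the time budget is spent.

    At least one batch is always processed, so every call makes progress.
    Returns the offset at which analysis stopped.
    """
    deadline = clock() + time_left
    pos = start
    while pos < len(text):
        end = min(pos + BATCH_SIZE, len(text))
        analyze_batch(text, pos, end)   # hypothetical per-batch analysis hook
        pos = end
        if clock() >= deadline:
            break
    return pos


def idle_worker(text, analyze_batch, slice_seconds=0.03, clock=time.monotonic):
    """Hypothetical idle handler: each step spends at most about one time slice.

    Written as a generator so a main loop can resume it whenever it is idle.
    """
    pos = 0
    while pos < len(text):
        pos = update_syntax(text, pos, slice_seconds, analyze_batch, clock)
        yield pos   # hand control back to the main loop
```

The budget is checked only between batches, so a single slow batch can still overrun the slice; that is one reason the batch size matters.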
(21:57:55) <barisione> bye
(21:58:13) <pbor> thanks for your help marco
(21:58:18) <barisione> pbor: I need to ask you something about ErrRegex
(21:58:23) <barisione> EggRegex
(21:58:35) <pbor> sure (though I know little about it)
(21:59:37) <muntyan> oh, i forgot about that! i have a question about regex too
(21:59:57) pbor set topic on #sourceview: regex!
(22:00:03) <barisione> muntyan: ask your question
(22:00:12) <muntyan> barisione: remember there was a discussion in bugzilla about offsets vs indices; and you with mclasen seem to have agreed that it should be changed to use offsets
(22:00:27) <muntyan> barisione: so, did you do it in gtksourceview's eggregex?
(22:00:35) <barisione> it's what I was going to ask :)
(22:00:57) <barisione> mclasen, paolo and I preferred offsets
(22:01:05) <muntyan> don't change it! there are people (me) who use eggregex a lot, as it is
(22:01:07) <muntyan> :)
(22:01:29) <barisione> but it has several problems as pcre always uses indices
(22:01:38) <barisione> are you using offsets or indices?
(22:01:46) <muntyan> indices are better because it's easier to work with the subject string when you have indices. you don't have to use g_utf8_stuff
(22:01:52) <muntyan> i am using indices
(22:02:23) <muntyan> if one works with "words" and "chars", then offsets are better. but who in C works with "words" and "chars"?
(22:02:46) <barisione> the current version uses offsets
(22:02:59) <muntyan> um, then i have old version?
(22:03:14) <barisione> the version in gtksourceview
(22:03:24) <muntyan> specifically, egg_regex_fetch_pos()
(22:03:33) <muntyan> does it return char offsets or byte indices?
(22:03:46) <muntyan> (my copy returns bytes)
(22:04:29) <pbor> is the api big? to me it would sound sane to have both...
(22:04:55) <muntyan> hm, offsets are good for textbuffer
(22:05:03) <pbor> yep
(22:05:04) <muntyan> since you can't work with bytes in a textbuffer
(22:05:15) <pbor> my point exactly
(22:05:15) <barisione> the version in gtksourceview uses chars
(22:05:29) <barisione> i converted both the engine and eggregex to that
(22:05:50) <muntyan> api is quite big, like almost every function is affected :)
(22:05:56) <barisione> the old engine was really ugly because it kept in memory both indices and offsets
(22:06:04) <pbor> so it means that muntyan copied eggregex from sourceview... so it's using it patched
(22:06:18) <pbor> s/it/he
(22:06:34) <muntyan> what do you mean?
(22:06:51) <pbor> <muntyan> i am using indices
(22:07:03) <muntyan> no, i am using non-patched version :)
(22:07:03) <pbor> gah
(22:07:07) <pbor> sorry
(22:07:15) <muntyan> i have eggregex copied long time ago
(22:07:20) <barisione> contextengine.c uses only offsets, converting it back to indices (if we decide to do so) should not be difficult
(22:07:20) <muntyan> before it was converted to offsets
(22:07:29) <pbor> yes
(22:07:40) <muntyan> well, converting my code shouldn't be difficult either :)
(22:07:47) <muntyan> whatever is better will be good
(22:07:53) <barisione> in the cvs on sourceforge you can find the conversion from indices to offsets
(22:08:00) <muntyan> and perhaps offsets are really better for textbuffer
(22:08:19) <muntyan> barisione: you mean eggregex changes?
(22:08:22) <barisione> yes they are better but they could be terribly slow!!!
(22:08:31) <barisione> no changes to the engine
(22:08:45) <muntyan> barisione: i am not using gtksourceview :)
(22:08:54) <muntyan> are they slow or they could be slow?
(22:09:40) <muntyan> hm, wait
(22:09:44) <barisione> my version of eggregex needs to do some conversions from offsets to indices but if the text you are analyzing is big then this is slow!
(22:09:46) <muntyan> if engine uses LineReader
(22:09:55) <muntyan> then it uses char*, and it doesn't care about textbuffer
(22:10:03) <muntyan> offsets are needed only for tags
(22:10:15) <muntyan> ah, i got why you kept both offsets and indices
(22:10:28) <muntyan> hm, situation
(22:11:41) <muntyan> okay guys, now really have to go
(22:11:44) <muntyan> see you
(22:11:50) <barisione> bye muntyan
(22:11:57) <barisione> i will write a mail on this
(22:14:51) <pbor> ok
(22:14:55) <pbor> I have to go too
(22:14:57) <pbor> bye
(22:15:16) <pbor> I'll pass the logs to paolo if you are ok
(22:16:23) <barisione> ok
(22:16:26) <barisione> bye
(22:16:52) You are now known as pbor|out
(22:27:06) barisione (~email@example.com) left #sourceview
(00:10:03) nud (~firstname.lastname@example.org) left #sourceview
(01:23:25) muntyan (~email@example.com) left #sourceview
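
The offsets-versus-indices question comes down to the fact that in UTF-8 text a character offset and a byte index differ as soon as a non-ASCII character appears, and converting between them means walking the string. The sketch below illustrates this in Python rather than with the GLib or EggRegex API; the helper names `offset_to_index`, `index_to_offset` and `search_offsets` are hypothetical, and Python's `re` module on bytes stands in for a matcher that reports byte indices.

```python
import re


def offset_to_index(text: str, offset: int) -> int:
    """Character offset -> UTF-8 byte index (linear in `offset`)."""
    return len(text[:offset].encode("utf-8"))


def index_to_offset(data: bytes, index: int) -> int:
    """UTF-8 byte index -> character offset (linear in `index`).

    Raises UnicodeDecodeError if `index` falls inside a multi-byte character.
    """
    return len(data[:index].decode("utf-8"))


def search_offsets(pattern: str, text: str):
    """Match on the UTF-8 bytes, then report the span as character offsets."""
    data = text.encode("utf-8")
    match = re.search(pattern.encode("utf-8"), data)
    if match is None:
        return None
    # The matcher works in byte indices; each conversion rescans the prefix.
    return index_to_offset(data, match.start()), index_to_offset(data, match.end())
```

For the string "naïve café", the "é" sits at character offset 9 but byte index 10, because "ï" takes two bytes. Each conversion rescans the text from the start, which matches barisione's concern that converting on large texts is slow.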