(20:34:17) Conversation with #sourceview at 2006-06-07 20:34:17 on pbor
(20:34:17) You joined #sourceview
(20:34:17) irc.acc.umu.se sets mode: +nt
(20:53:48) barisione (~demian@81-208-36-94.ip.fastwebnet.it) joined #sourceview
(20:54:27) hi
(20:54:34) hi
(20:54:54) while we wait for muntyan to arrive, I'll paste you my tomboy note
(20:55:00) ok
(20:55:05) NewEngineIssues
(20:55:05) - async queue: offsets break. This is fixed in muntyan's tree, but a) the solution is not exactly trivial b) the solution is potentially expensive if the queue grows large (think of search&replace)
(20:55:05) - highlighting breaks: we thought this was due to the queue issue, but there are still problems which we cannot pinpoint to an 'obvious' cause. In particular see the "debug thing"
(20:55:05) - performance: even with the queue fixed and although there are some good heuristics to speed up the 'interactive' case, we can still come up with cases that fool the heuristic: if 0.
(20:55:10) nud (~sf@d83-182-30-136.cust.tele2.be) joined #sourceview
(20:55:57) I don't understand the second point
(20:56:19) barisione: the 'hl breaks' point?
(20:56:27) yes
(20:56:33) barisione: do you have a checkout of muntyan's repo?
(20:57:01) i'm downloading it now
(20:57:07) ok
(20:57:15) muntyan (~muntyan@pool-71-113-224-59.herntx.dsl-w.verizon.net) joined #sourceview
(20:57:17) hi there
(20:57:36) basically there is a case (or maybe more) where the hl is not properly updated
(20:58:08) those cases were crashing before the queue fixes, so they were 'hidden'
(20:58:13) hi muntyan
(20:58:32) muntyan: I pasted barisione the tomboy note I showed you before
(20:58:40) good
(20:58:59) so, who's talking?
(20:59:20) pbor: you rule the meeting
(20:59:25) ok
(20:59:45) I'd say point 1) the queue thingie is mostly a statement
(21:00:05) so there is not much to discuss about it (at least at the beginning)
(21:00:17) there was an issue
(21:00:22) there is a solution
(21:00:46) the solution is not very nice so we need to evaluate its cost
(21:01:03) err, i have a correction
(21:01:11) ok, shoot
(21:01:38) we do not know if it's a solution; and we do not know what exactly to do with the queue, since sorting doesn't seem to be enough
(21:01:56) barisione: what do you think about this, will it work?
(21:02:20) muntyan: yeah, that was what I was coming to...
(21:02:43) well, you started far away from it then :)
(21:02:50) I was trying to introduce stuff: the summary is
(21:02:56) we have a bunch of issues
(21:03:06) (I was reading your last commits)
(21:03:25) they are not normal bugs, but apparently go fairly deep into how things work
(21:03:34) so my idea was:
(21:03:49) - list the issues (at least the ones that we know)
(21:04:05) - list the possible approaches to correct them
(21:04:23) - evaluate if the result is not an ugly hack
(21:05:06) obviously if we can't fulfill the second, we need more drastic decisions and there is no point discussing the third :)
(21:05:55) the issues we know are the ones I pasted above
(21:06:01) *muntyan hopes the queue will save us
(21:06:31) muntyan: in your last email you wrote:
(21:06:31) The obvious one is to have timer and make update_syntax have
(21:06:31) an extra argument - the time it's allowed to spend. It doesn't look
(21:06:31) pretty though.
(21:06:42) how do you plan to do it?
(21:07:52) well, the idle_worker creates a timer, and calls update_syntax(..., how_much_time_left)
(21:08:00) in a loop
(21:08:15) how big should the batch you read be?
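A minimal Python sketch of the timer idea described just above, for readers following the log. It is not the engine's code (which is C): `analyze_batch`, the `BATCH_SIZE` value, and the exact signatures of `update_syntax` and `idle_worker` are assumptions made for illustration. The only parts taken from the discussion are that the idle worker owns a timer and passes the remaining time to `update_syntax` in a loop.

```python
import time

BATCH_SIZE = 4096  # assumed value; the log only says it needs experimenting


def analyze_batch(text, start, end):
    """Hypothetical stand-in for the real analysis of text[start:end]."""
    return end - start


def update_syntax(text, start, time_left, batch_size=BATCH_SIZE):
    """Analyze fixed-size batches from `start` until the budget is spent.

    Always processes at least one batch, so every call makes progress.
    Returns the offset where analysis stopped.
    """
    deadline = time.monotonic() + time_left
    pos = start
    while pos < len(text):
        end = min(pos + batch_size, len(text))
        analyze_batch(text, pos, end)
        pos = end
        if time.monotonic() >= deadline:
            break
    return pos


def idle_worker(text, start, budget):
    """One idle invocation: start a timer, then call update_syntax in a
    loop, passing how much time is left. Returns the offset reached."""
    timer_start = time.monotonic()
    pos = start
    while pos < len(text):
        time_left = budget - (time.monotonic() - timer_start)
        if time_left <= 0:
            break
        pos = update_syntax(text, pos, time_left)
    return pos
```

With a zero budget, `update_syntax` still handles exactly one batch, which keeps the loop from stalling; a real engine would also need to remember `pos` between idle invocations.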
(21:08:18) update_syntax in turn processes batches of fixed size until it runs out of time
(21:08:27) i think 4k chars or so
(21:08:33) you don't know how much text you will need
(21:08:34) the initial batch size looks good
(21:08:42) but i guess it needs experimenting
(21:09:06) right, i don't know
(21:09:12) but it's not a problem
(21:09:23) i tested gtktextbuffer, reading a big batch is two times faster than reading it line by line
(21:09:41) it's for this reason i wrote the ugly LineReader stuff
(21:09:59) i read line by line in my thing, and most of the time is spent in regex searching and redrawing
(21:10:20) so i don't think line vs big batch will be an issue
(21:10:24) plus, why line?
(21:10:39) many lines at once, as many as will fit into a batch
(21:11:31) anyway, this is *not* the most important issue, imo
(21:11:32) I tested it line by line, it's the opposite case of reading a big batch
(21:11:38) it was just testing
(21:11:54) it doesn't affect how we deal with modifications; and modifications are the thing that needs to be fixed
(21:12:27) i mean, once we know how to make it *work*, we can think how to make it work fast and nice
(21:13:09) *muntyan is not good at those idle things and batches, the stuff about loops and timers is just some vague ideas
(21:13:21) however i like your idea on how to modify the engine
(21:13:53) barisione: tell more :)
(21:14:16) barisione: you know the code, i know only some tiny bits
(21:15:17) I don't know how to handle modifications that don't fall completely in a batch
(21:15:57) looks like splitting them should work fine
(21:16:37) say, we now process chars from 0 to 1000; if there are twenty chars deleted from 990 to 1010, it's okay to process first the deletion from 990 to 1000, and then from 1000 to 1010
(21:16:52) since those chars after 1000 won't fall into the batch, and analysis won't touch them
(21:17:35) yes it should work
(21:17:36) (we do need to avoid analyzing-whole-remainder)
(21:18:42) what i worry about is that 'CHANGED' modification. i looked at the code, and apparently it should just work (with corrections in the places that modify offsets in the tree). but i don't know if it works. do you think it will work?
(21:19:39) i'm trying this approach on paper :)
(21:19:49) good :)
(21:19:54) you know what to draw :D
(21:24:31) by the way, LineReader is an auxiliary thing to avoid slow reading from buffer, and it can't be removed if we don't care about the performance (i don't want to remove it, i want to understand what it is). correct?
(21:24:45) err, it can be removed
(21:24:58) yes
(21:25:22) then we can safely read big batches regardless of how much we analyze
(21:25:49) so reading lines vs reading a batch won't be an issue
(21:26:02) it's here because reading line by line is slow but reading a big batch is often not needed if the user just inserted a char
(21:26:11) if you read big batches it works
(21:26:19) without any problem
(21:26:29) well, it does read big batches now, and it doesn't seem to hurt; so reading big batches is good
(21:26:42) (maybe not huge batches, of course)
(21:27:46) it's better to read a big chunk since after inserting a char we potentially modify a big region. so if it's not expensive to always read a big chunk of text, we should do it. and it doesn't seem to be expensive
(21:28:23) muntyan: well, but you analyze the chunk line by line anyway... no?
(21:28:31) i tried it and the engine was slower but gtktextbuffer seems faster than one year ago
(21:28:32) pbor: yes
(21:29:01) barisione: no no, i have gtk-2.6 as minimal requirement :)
(21:29:16) are you using 2.6?
(21:29:17) pbor: my department got gtk-2.6! no more gtk-2.2!
(21:29:22) barisione: yes
(21:31:03) i tried to read 10K of normal C code and reading it line by line was much slower than reading it in a single batch
(21:31:32) mmm
(21:31:35) so let LineReader do its job!
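The splitting idea from 21:16:37 (a deletion from 990 to 1010 against a batch ending at 1000) can be sketched as plain range arithmetic. This is a Python illustration only: `split_modification` is a hypothetical helper, ranges are treated as half-open, and the offset bookkeeping needed after the first half has actually been applied is deliberately left out.

```python
def split_modification(mod_start, mod_end, batch_end):
    """Split the half-open range [mod_start, mod_end) at batch_end.

    Returns (inside, outside): the part that falls within the current
    batch and the part left for a later batch. Either may be None.
    """
    if mod_end <= batch_end:
        # Entirely inside the batch: nothing to defer.
        return (mod_start, mod_end), None
    if mod_start >= batch_end:
        # Entirely past the batch: defer the whole thing.
        return None, (mod_start, mod_end)
    # Straddles the boundary: cut it exactly at batch_end.
    return (mod_start, batch_end), (batch_end, mod_end)
```

For the example in the log, `split_modification(990, 1010, 1000)` gives `((990, 1000), (1000, 1010))`, so the current batch never has to look past its own end.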
(21:32:36) which is a bit weird since TextBTree stores segments line by line, so reading a whole block should just read line by line and concat...
(21:33:29) maybe offsets are the problem?
(21:33:57) hm, if there is an iter, then forward_line should work fine
(21:34:12) well, maybe it was a textbuffer bug
(21:34:17) it's full of bugs, you know ;)
(21:34:25) *pbor knows
(21:34:32) anyway, it's not the most important thing!
(21:34:37) indeed
(21:34:56) I am most concerned about the "debug thing"
(21:35:04) what I need now is to know whether my approach is feasible so that I try to do that stuff. if not, i need to know what to do :)
(21:35:41) i hope that debug thing is caused by a modification which falls into the batch but is not processed in it
(21:35:53) (it is the case with debug thing)
(21:36:11) barisione: what does your paper say?
(21:37:07) your idea should work, CHANGED should work
(21:37:32) but how do you plan to use the old tree?
(21:37:59) heh, this is what i was asking about :)
(21:38:13) i have no idea how to use it, i hoped that the code would just do it right
(21:39:07) it can merge new nodes and old nodes now for deletions and insertions, right? i hope that it will just do it for 'changed'
(21:39:57) well, as far as I can see the tree doesn't care about insert and delete
(21:40:05) hm, actually, this can be easily tested by manually injecting a CHANGED modification into the queue and seeing what happens
(21:40:05) so it should not care about changed
(21:40:13) yeah, it only adjusts offsets accordingly
(21:40:35) yep
(21:40:35) yes for changed it should work because, as pbor said, it doesn't care about insert or delete
(21:40:54) barisione: could you tell in a couple of words how tags are applied and removed in the reanalyzed regions?
(21:41:14) but if the tree cannot be used do you plan to discard it as the current engine does?
(21:41:29) no, i don't want to discard everything
(21:41:45) i think it should keep trying till the end
(21:43:30) the code to apply tags is mostly copied from the old engine
(21:43:45) as far as I understood from the code, the reasoning is that if it doesn't merge "soon" it will prolly not merge at all
(21:44:00) pbor: yep
(21:44:06) err, is it really the case?
(21:44:20) I copied from other engines (colorer?)
(21:44:55) well, i personally don't see a reason to give up after one batch
(21:44:56) after something like 100 lines of code it is really really difficult to reuse the old contexts
(21:45:27) hm
(21:45:51) barisione: does colorer use a full tree like the new-engine?
(21:45:52) the tree is mixed with the text
(21:46:29) i prefer an approach of having the tree and the segments separately
(21:46:35) then it is easy to reuse old stuff
(21:46:59) since two segments are 'the same' if and only if they point to the same node
(21:47:07) but it would require a lot of changes
(21:47:31) I don't remember exactly how colorer works
(21:52:17) barisione: any opinion on the debug thing miscoloring? do you think it's related to the queue?
(21:53:34) i can't test it now as I'm using windows
(21:53:50) okay
(21:54:12) and i cannot reboot as booting in windows does not always work on my old laptop
(21:54:23) no prob
(21:54:59) the third issue in the list has to do with batch estimation...
(21:55:06) and sadly i need windows until i finish a stupid vnc clone written in C# for university :(
(21:55:35) (or at least that was our guess)
(21:55:44) barisione: server or client?
(21:56:03) *pbor would love a decent vnc client, even on mono :)
(21:56:27) server and client, it's similar to vnc but it doesn't use the vnc protocol. it's not really usable
(21:56:30) so, what about the simplest thing now: adding CHANGED, sorting the queue, getting the modifications in a batch correct, and leaving the rest as it is
(21:56:46) vnc surely works better
(21:56:54) then we can see what happens, if it works we can think of batch size and stuff
(21:57:15) muntyan: I agree with you
(21:57:17) muntyan: sounds like a plan
(21:57:29) good
(21:57:47) then i'm leaving. real life calling :)
(21:57:54) ok
(21:57:54) thanks guys, see you, etc.
(21:57:55) bye
(21:58:13) thanks for your help marco
(21:58:18) pbor: I need to ask you something about ErrRegex
(21:58:23) EggRegex
(21:58:35) sure (though I know little about it)
(21:59:37) oh, i forgot about that! i have a question about regex too
(21:59:57) pbor set topic on #sourceview: regex!
(22:00:03) muntyan: ask your question
(22:00:12) barisione: remember there was a discussion in bugzilla about offsets vs indices; and you and mclasen seem to have agreed that it should be changed to use offsets
(22:00:27) barisione: so, did you do it in gtksourceview's eggregex?
(22:00:35) it's what I was going to ask :)
(22:00:57) mclasen, paolo and I preferred offsets
(22:01:05) don't change it! there are people (me) who use eggregex a lot, as it is
(22:01:07) :)
(22:01:29) but it has several problems as pcre always uses indices
(22:01:38) are you using offsets or indices?
(22:01:46) indices are better because it's easier to work with the subject string when you have indices. you don't have to use g_utf8_stuff
(22:01:52) i am using indices
(22:02:23) if one works with "words" and "chars", then offsets are better. but who in C works with "words" and "chars"?
(22:02:46) the current version uses offsets
(22:02:59) um, then i have an old version?
(22:03:14) the version in gtksourceview
(22:03:24) specifically, egg_regex_fetch_pos()
(22:03:33) does it return char offsets or byte indices?
(22:03:46) (my copy returns bytes)
(22:04:29) is the api big? to me it would sound sane to have both...
(22:04:55) hm, offsets are good for textbuffer
(22:05:03) yep
(22:05:04) since you can't work with bytes in a textbuffer
(22:05:15) my point exactly
(22:05:15) the version in gtksourceview uses chars
(22:05:29) i converted both the engine and eggregex to that
(22:05:50) api is quite big, like almost every function is affected :)
(22:05:56) the old engine was really ugly because it kept in memory both indices and offsets
(22:06:04) so it means that muntyan copied eggregex from sourceview... so it's using it patched
(22:06:18) s/it/he
(22:06:34) what do you mean?
(22:06:51) i am using indices
(22:07:03) no, i am using non-patched version :)
(22:07:03) gah
(22:07:07) sorry
(22:07:15) i have eggregex copied a long time ago
(22:07:20) contextengine.c uses only offsets, converting it back to indices (if we decide to do so) should not be difficult
(22:07:20) before it was converted to offsets
(22:07:29) yes
(22:07:40) well, converting my code shouldn't be difficult either :)
(22:07:47) whatever is better will be good
(22:07:53) in the cvs on sourceforge you can find the conversion from indices to offsets
(22:08:00) and perhaps offsets are really better for textbuffer
(22:08:19) barisione: you mean eggregex changes?
(22:08:22) yes they are better but they could be terribly slow!!!
(22:08:31) no changes to the engine
(22:08:45) barisione: i am not using gtksourceview :)
(22:08:54) are they slow or could they be slow?
(22:09:40) hm, wait
(22:09:44) my version of eggregex needs to do some conversions from offsets to indices but if the text you are analyzing is big then this is slow!
(22:09:46) if engine uses LineReader
(22:09:55) then it uses char*, and it doesn't care about textbuffer
(22:10:03) offsets are needed only for tags
(22:10:15) ah, i got why you kept both offsets and indices
(22:10:28) hm, situation
(22:11:41) okay guys, now i really have to go
(22:11:44) see you
(22:11:50) bye muntyan
(22:11:57) i will write a mail on this
(22:14:51) ok
(22:14:55) I have to go too
(22:14:57) bye
(22:15:16) I'll pass the logs to paolo if you are ok
(22:16:23) ok
(22:16:26) bye
(22:16:52) You are now known as pbor|out
(22:27:06) barisione (~demian@81-208-36-94.ip.fastwebnet.it) left #sourceview
(00:10:03) nud (~sf@d83-182-30-136.cust.tele2.be) left #sourceview
(01:23:25) muntyan (~muntyan@pool-71-113-224-59.herntx.dsl-w.verizon.net) left #sourceview
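For readers unfamiliar with the offsets vs indices distinction debated above: in UTF-8 text a character offset and a byte index diverge as soon as a non-ASCII character appears, and converting between them means walking the text up to that position. The Python sketch below only illustrates that point; `offset_to_index` and `index_to_offset` are hypothetical helper names, not EggRegex or GLib API.

```python
def offset_to_index(text, offset):
    """Character offset -> UTF-8 byte index (scans the prefix)."""
    return len(text[:offset].encode("utf-8"))


def index_to_offset(data, index):
    """UTF-8 byte index -> character offset (scans the prefix).

    `index` must fall on a character boundary, otherwise decoding fails.
    """
    return len(data[:index].decode("utf-8"))


# "héllo wörld", written with escapes so the source stays ASCII.
text = "h\u00e9llo w\u00f6rld"
data = text.encode("utf-8")

char_pos = text.find("w")   # position counted in characters
byte_pos = data.find(b"w")  # position counted in bytes
```

Here `char_pos` is 6 and `byte_pos` is 7, because the accented letter before the space takes two bytes. Each conversion costs time proportional to the position, which is consistent with the worry that converting on every match gets slow on large texts.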