Attachment '2006-06-07.txt'

   1 (20:34:17) Conversation with #sourceview at 2006-06-07 203417 on pbor
   2 (20:34:17) You joined #sourceview
   3 (20:34:17) irc.acc.umu.se sets mode: +nt
   4 (20:53:48) barisione (~demian@81-208-36-94.ip.fastwebnet.it) joined #sourceview
   5 (20:54:27) <pbor> hi
   6 (20:54:34) <barisione> hi
   7 (20:54:54) <pbor> while we wait for muntyan to arrive, I'll paste you my tomboy note
   8 (20:55:00) <barisione> ok
   9 (20:55:05) <pbor> NewEngineIssues
  10 (20:55:05) <pbor> - async queue: offsets break. This is fixed in muntyan's tree, but a) the solution is not exactly trivial b) the solution is potentially expensive if the queue grows large (think of search&replace)
  11 (20:55:05) <pbor> - highlighting breaks: we thought this was due to the queue issue, but there are still problems which we cannot pinpoint to an 'obvious' cause. In particular see the "debug thing"
  12 (20:55:05) <pbor> - performance: even with the queue fixed, and although there are some good heuristics to speed up the 'interactive' case, we can still come up with cases that fool the heuristics: if 0.
  13 (20:55:10) nud (~sf@d83-182-30-136.cust.tele2.be) joined #sourceview
  14 (20:55:57) <barisione> I don't understand the second point
  15 (20:56:19) <pbor> barisione: the 'hl breaks' point?
  16 (20:56:27) <barisione> yes
  17 (20:56:33) <pbor> barisione: do you have a checkout of muntyan repo?
  18 (20:57:01) <barisione> i'm downloading it now
  19 (20:57:07) <pbor> ok
  20 (20:57:15) muntyan (~muntyan@pool-71-113-224-59.herntx.dsl-w.verizon.net) joined #sourceview
  21 (20:57:17) <muntyan> hi there
  22 (20:57:36) <pbor> basically there is a (or maybe more) cases where the hl is not properly updated
  23 (20:58:08) <pbor> those cases were crashing before the queue fixes, so they were 'hidden'
  24 (20:58:13) <barisione> hi muntyan
  25 (20:58:32) <pbor> muntyan: I pasted barisione the tomboy note I showed you before
  26 (20:58:40) <muntyan> good
  27 (20:58:59) <muntyan> so, who's talking?
  28 (20:59:20) <muntyan> pbor: you rule the meeting
  29 (20:59:25) <pbor> ok
  30 (20:59:45) <pbor> I'd say point 1) the queue thingie is mostly a statement
  31 (21:00:05) <pbor> so there is not much to discuss about it (at least at the beginning)
  32 (21:00:17) <pbor> there was an issue-
  33 (21:00:22) <pbor> there is a solution
  34 (21:00:46) <pbor> the solution is not very nice so we need to evaluate its cost
  35 (21:01:03) <muntyan> err, i have a correction
  36 (21:01:11) <pbor> ok, shoot
  37 (21:01:38) <muntyan> we do not know if it's a solution; and we do not know what exactly to do with the queue, since sorting doesn't seem to be enough
  38 (21:01:56) <muntyan> barisione: what do you think about this, will it work?
  39 (21:02:20) <pbor> muntyan: yeah, that was what I was coming to...
  40 (21:02:43) <muntyan> well, you started far away from it then :)
  41 (21:02:50) <pbor> I was trying to introduce stuff: the summary is
  42 (21:02:56) <pbor> we have a bunch of issues
  43 (21:03:06) <barisione> (I was reading your last commits)
  44 (21:03:25) <pbor> they are not normal bugs, but are apparently fairly deep in how things work
  45 (21:03:34) <pbor> so my idea was:
  46 (21:03:49) <pbor> - list the issues (at least the ones that we know)
  47 (21:04:05) <pbor> - list the possible approaches to correct them
  48 (21:04:23) <pbor> - evaluate if the result is not an ugly hack
  49 (21:05:06) <pbor> obviously if we can't fulfill the second, we need more drastic decisions and there is no point discussing the third :)
  50 (21:05:55) <pbor> the issues we know are the ones I pasted above
  51 (21:06:01) *muntyan hopes the queue will save us
  52 (21:06:31) <barisione> muntyan: in your last email you wrote:
  53 (21:06:31) <barisione> The obvious one is to have timer and make update_syntax have
  54 (21:06:31) <barisione> an extra argument - the time it's allowed to spend. It doesn't look
  55 (21:06:31) <barisione> pretty though.
  56 (21:06:42) <barisione> how do you plan to do it?
  57 (21:07:52) <muntyan> well, the idle_worker creates a timer, and calls update_syntax(..., how_much_time_left)
  58 (21:08:00) <muntyan> in a loop
  59 (21:08:15) <barisione> how big should the batch you read be?
  60 (21:08:18) <muntyan> update_syntax in turn processes batches of fixed size until it runs out of time
  61 (21:08:27) <muntyan> i think 4k chars or so
  62 (21:08:33) <barisione> you don't know how much text you will need
  63 (21:08:34) <muntyan> the initial batch size looks good
  64 (21:08:42) <muntyan> but i guess it needs experimenting
  65 (21:09:06) <muntyan> right, i don't know
  66 (21:09:12) <muntyan> but it's not a problem
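
A minimal sketch of the timer idea muntyan describes above: an idle worker that repeatedly calls update_syntax with a time allowance, and update_syntax processing fixed-size batches until that allowance is used up. It is written in Python for brevity, not taken from the engine. The names analyze_batch, clock, and slice_seconds, and the 30 ms slice, are assumptions made for illustration; only the "4k chars or so" batch size comes from the discussion.

```python
import time

BATCH_SIZE = 4096  # "4k chars or so" from the discussion; the value needs experimenting


def update_syntax(text, start, time_left, analyze_batch, clock=time.monotonic):
    """Analyze fixed-size batches from `start` until the time budget runs out.

    Always processes at least one batch, then returns the offset where it stopped.
    `analyze_batch(chunk, offset)` is a hypothetical callback standing in for
    the real analysis.
    """
    deadline = clock() + time_left
    pos = start
    while pos < len(text):
        end = min(pos + BATCH_SIZE, len(text))
        analyze_batch(text[pos:end], pos)
        pos = end
        if clock() >= deadline:
            break
    return pos


def idle_worker(text, analyze_batch, slice_seconds=0.03):
    """Hypothetical idle loop: one time slice per iteration.

    A real idle callback would return to the main loop between slices
    instead of looping here. Returns the number of slices used.
    """
    pos = 0
    slices = 0
    while pos < len(text):
        pos = update_syntax(text, pos, slice_seconds, analyze_batch)
        slices += 1
    return slices
```

Because at least one batch is processed per call, the loop always makes progress even when the time allowance is very small.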
  67 (21:09:23) <barisione> i tested gtktextbuffer, reading a big batch is two times faster than reading it line by line
  68 (21:09:41) <barisione> it's for this reason i wrote the ugly LineReader stuff
  69 (21:09:59) <muntyan> i read line by line in my thing, and the most time is spent in regex searching and redrawing
  70 (21:10:20) <muntyan> so i don't think line vs big batch will be an issue
  71 (21:10:24) <muntyan> plus, why line?
  72 (21:10:39) <muntyan> many lines at once, how many will fit into a batch
  73 (21:11:31) <muntyan> anyway, this is *not* the most important issue, imo
  74 (21:11:32) <barisione> I tested it line by line, it's the opposite case of reading a big batch
  75 (21:11:38) <barisione> it was just testing
  76 (21:11:54) <muntyan> it doesn't affect how we deal with modifications; and modifications is the thing that needs to be fixed
  77 (21:12:27) <muntyan> i mean, once we know how to make it *work*, we can think how to make it work fast and nice
  78 (21:13:09) *muntyan is not good in those idle things and batches, the stuff about loops and timers is just some vague ideas
  79 (21:13:21) <barisione> however i like your idea on how to modify the engine
  80 (21:13:53) <muntyan> barisione: tell more :)
  81 (21:14:16) <muntyan> barisione: you know the code, i know only some tiny bits
  82 (21:15:17) <barisione> I don't know how to handle modifications that don't fall completely in a batch
  83 (21:15:57) <muntyan> looks like splitting them should work fine
  84 (21:16:37) <muntyan> say, we process now chars from 0 to 1000; if there are twenty chars deleted from 990 to 1010, it's okay to process first deletion from 990 to 1000, and then from 1000 to 1010
  85 (21:16:52) <muntyan> since those chars after 1000 won't fall into the batch, and analysis won't touch them
  86 (21:17:35) <barisione> yes it should work
  87 (21:17:36) <muntyan> (we do need to avoid analyzing-whole-remainder)
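
The splitting muntyan proposes can be sketched as below, using his example of a deletion from 990 to 1010 with a batch ending at 1000. The function name and the half-open range convention are assumptions made for illustration. The sketch works in the original coordinates and deliberately ignores how offsets shift once the first half of a deletion has been applied.

```python
def split_at_batch_end(mod_start, mod_end, batch_end):
    """Split the modified range [mod_start, mod_end) at batch_end.

    Returns (inside, outside): the part that falls into the current batch
    and the part left for a later batch. Either part is None when empty.
    """
    if mod_end <= batch_end:
        return (mod_start, mod_end), None
    if mod_start >= batch_end:
        return None, (mod_start, mod_end)
    return (mod_start, batch_end), (batch_end, mod_end)


# The example from the discussion: batch covers 0..1000, deletion covers 990..1010.
inside, outside = split_at_batch_end(990, 1010, 1000)
```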
  88 (21:18:42) <muntyan> what i worry about is that 'CHANGED' modification. i looked at the code, and apparently it should just work (with corrections in the places that modify offsets in the tree). but i don't know if it works. do you think it will work?
  89 (21:19:39) <barisione> i'm trying this approach on paper :)
  90 (21:19:49) <muntyan> good :)
  91 (21:19:54) <muntyan> you know what to draw :D
  92 (21:24:31) <muntyan> by the way, LineReader is an auxiliary thing to avoid slow reading from buffer, and it can't be removed if we don't care about the performance (i don't want to remove it, i want to understand what it is). correct?
  93 (21:24:45) <muntyan> err, it can be removed
  94 (21:24:58) <barisione> yes
  95 (21:25:22) <muntyan> then we can safely read big batches regardless of how much we analyze
  96 (21:25:49) <muntyan> so reading lines vs reading a batch won't be an issue
  97 (21:26:02) <barisione> it's here because reading line by line is slow but reading a big batch is often not needed if the user just inserted a char
  98 (21:26:11) <barisione> if you read big batches it works
  99 (21:26:19) <barisione> without any problem
 100 (21:26:29) <muntyan> well, it does read big batches now, and it doesn't seem to hurt; so reading big batches is good
 101 (21:26:42) <muntyan> (maybe not huge batches, of course)
 102 (21:27:46) <muntyan> it's better to read big chunk since after inserting a char we potentially modify big region. so if it's not expensive to always read big chunk of text, we should do it. and it doesn't seem to be expensive
 103 (21:28:23) <pbor> muntyan: well, but you analyze the chunk line by line anyway... no?
 104 (21:28:31) <barisione> i tried it and the engine was slower but gtktextbuffer seems faster than one year ago
 105 (21:28:32) <muntyan> pbor: yes
 106 (21:29:01) <muntyan> barisione: no no, i have gtk-2.6 as minimal requirement :)
 107 (21:29:16) <barisione> are you using 2.6?
 108 (21:29:17) <muntyan> pbor: my department got gtk-2.6! no more gtk-2.2!
 109 (21:29:22) <muntyan> barisione: yes
 110 (21:31:03) <barisione> i tried to read 10K of normal C code and reading it line by line was really slower than reading it in a single batch
 111 (21:31:32) <pbor> mmm
 112 (21:31:35) <muntyan> so let LineReader do its job!
 113 (21:32:36) <pbor> which is a bit weird since TextBTree stores segments line by line, so reading a whole block should just read line by line and concat...
 114 (21:33:29) <muntyan> maybe offsets are the problem?
 115 (21:33:57) <muntyan> hm, if there is an iter, then forward_line should work fine
 116 (21:34:12) <muntyan> well, maybe it was a textbuffer bug
 117 (21:34:17) <muntyan> it's full of bugs, you know ;)
 118 (21:34:25) *pbor knows
 119 (21:34:32) <muntyan> anyway, it's not the most important thing!
 120 (21:34:37) <pbor> indeed
 121 (21:34:56) <pbor> I am most concerned about the "debug thing"
 122 (21:35:04) <muntyan> what I need now is to know whether my approach is feasible so that I try to do that stuff. if not, i need to know what to do :)
 123 (21:35:41) <muntyan> i hope that debug thing is caused by modification which falls into the batch but is not processed in it
 124 (21:35:53) <muntyan> (it is the case with debug thing)
 125 (21:36:11) <muntyan> barisione: what does your paper say?
 126 (21:37:07) <barisione> your idea should work, CHANGED should work
 127 (21:37:32) <barisione> but how do you plan to use the old tree?
 128 (21:37:59) <muntyan> heh, this is what i was asking about :)
 129 (21:38:13) <muntyan> i have no idea how to use it, i hoped that the code will just do it right
 130 (21:39:07) <muntyan> it can merge new nodes and old nodes now for deletions and insertions, right? i hope that it will just do it for 'changed'
 131 (21:39:57) <pbor> well, as far as I can see the tree doesn't care about insert and delete
 132 (21:40:05) <muntyan> hm, actually, this can be easily tested by manual injecting CHANGED modification into the queue and seeing what happens
 133 (21:40:05) <pbor> so it should not care about changed
 134 (21:40:13) <muntyan> yeah, it only adjusts offsets accordingly
 135 (21:40:35) <pbor> yep
 136 (21:40:35) <barisione> yes for changed it should work because, as pbor said, it doesn't care about insert or delete
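
To illustrate "it only adjusts offsets accordingly": a rough sketch in which stored segments are shifted when text is inserted or deleted, without caring what kind of modification caused the shift. The flat list of Segment records is an assumption that stands in for the engine's tree, and all names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: int    # character offset, inclusive
    end: int      # character offset, exclusive
    context: str  # e.g. "comment" or "string"


def shift_for_insert(segments, pos, length):
    """Move or grow segments after `length` chars are inserted at `pos`."""
    for seg in segments:
        if seg.start >= pos:
            # Segment lies entirely at or after the insertion point: move it.
            seg.start += length
            seg.end += length
        elif seg.end > pos:
            # Insertion falls inside the segment: grow it.
            seg.end += length


def _map_deleted(offset, pos, end):
    """Map an old offset to its position after deleting [pos, end)."""
    if offset <= pos:
        return offset
    if offset >= end:
        return offset - (end - pos)
    return pos  # offsets inside the deleted range collapse to its start


def shift_for_delete(segments, pos, length):
    """Move or shrink segments after `length` chars are deleted at `pos`."""
    end = pos + length
    for seg in segments:
        seg.start = _map_deleted(seg.start, pos, end)
        seg.end = _map_deleted(seg.end, pos, end)
```

Nothing in the offset adjustment depends on whether the modification is an insert, a delete, or a 'CHANGED' record, which is the point pbor and barisione make above.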
 137 (21:40:54) <muntyan> barisione: could you tell in couple of words how tags are applied and removed in the reanalyzed regions?
 138 (21:41:14) <barisione> but if the tree cannot be used do you plan to discard it as the current engine does?
 139 (21:41:29) <muntyan> no, i don't want to discard everything
 140 (21:41:45) <muntyan> i think it should keep trying till the end
 141 (21:43:30) <barisione> the code to apply tags is mostly copied from the old engine
 142 (21:43:45) <pbor> as far as I understood from the code, the reasoning is that if it doesn't merge "soon" it will probably not merge at all
 143 (21:44:00) <barisione> pbor: yep
 144 (21:44:06) <muntyan> err, is it really the case?
 145 (21:44:20) <barisione> I copied from other engines (colorer?)
 146 (21:44:55) <muntyan> well, i personally don't see a reason to give up after one batch
 147 (21:44:56) <barisione> after something like 100 lines of code is really really difficult to reuse the old contexts
 148 (21:45:27) <muntyan> hm
 149 (21:45:51) <pbor> barisione: does colorer use a full tree like the new-engine?
 150 (21:45:52) <muntyan> the tree is mixed with the text
 151 (21:46:29) <muntyan> i like more an approach of having tree and the segments separately
 152 (21:46:35) <muntyan> then it is easy to reuse old stuff
 153 (21:46:59) <muntyan> since two segments are 'the same' if and only if they point to the same node
 154 (21:47:07) <muntyan> but it would require lot of changes
 155 (21:47:31) <barisione> I don't remember exactly how colorer works
 156 (21:52:17) <pbor> barisione: any opinion on the debug thing miscoloring? do you think it's related to the queue?
 157 (21:53:34) <barisione> i can't test it now as I'm using windows
 158 (21:53:50) <pbor> okay
 159 (21:54:12) <barisione> and i cannot reboot as booting in windows does not always work on my old laptop
 160 (21:54:23) <pbor> no prob
 161 (21:54:59) <pbor> the third issue in the list has to do with batch estimation...
 162 (21:55:06) <barisione> and sadly i need windows until i finish a stupid vnc clone written in C# for university  :(
 163 (21:55:35) <pbor> (or least that was our guess)
 164 (21:55:44) <pbor> barisione: server or client?
 165 (21:56:03) *pbor would love a decent vnc client, even on mono :)
 166 (21:56:27) <barisione> server and client, it's similar to vnc but it doesn't use the vnc protocol. it's not really usable
 167 (21:56:30) <muntyan> so, what about the simplest thing now: adding CHANGED, sorting the queue, getting the modifications in a batch correct, and leaving the rest as it is
 168 (21:56:46) <barisione> vnc works surely better
 169 (21:56:54) <muntyan> then we can see what happens, if it works we can think of batch size and stuff
 170 (21:57:15) <barisione> muntyan: I agree with you
 171 (21:57:17) <pbor> muntyan: sounds like a plan
 172 (21:57:29) <muntyan> good
 173 (21:57:47) <muntyan> then i'm leaving. real life calling :)
 174 (21:57:54) <pbor> ok
 175 (21:57:54) <muntyan> thanks guys, see you, etc.
 176 (21:57:55) <barisione> bye
 177 (21:58:13) <pbor> thanks for your help marco
 178 (21:58:18) <barisione> pbor: I need to ask you something about ErrRegex
 179 (21:58:23) <barisione> EggRegex
 180 (21:58:35) <pbor> sure (though I know little about it)
 181 (21:59:37) <muntyan> oh, i forgot about that! i have a question about regex too
 182 (21:59:57) pbor set topic on #sourceview: regex!
 183 (22:00:03) <barisione> muntyan: ask your question
 184 (22:00:12) <muntyan> barisione: remember there was a discussion in bugzilla about offsets vs indices; and you and mclasen seem to have agreed that it should be changed to use offsets
 185 (22:00:27) <muntyan> barisione: so, did you do it in gtksourceview's eggregex?
 186 (22:00:35) <barisione> it's what I was going to ask :)
 187 (22:00:57) <barisione> mclasen, paolo and I preferred offsets
 188 (22:01:05) <muntyan> don't change it! there are people (me) who use eggregex a lot, as it is
 189 (22:01:07) <muntyan> :)
 190 (22:01:29) <barisione> but it has several problems as pcre always uses indices
 191 (22:01:38) <barisione> are you using offsets or indices?
 192 (22:01:46) <muntyan> indices are better because it's easier to work with the subject string when you have indices. you don't have to use g_utf8_stuff
 193 (22:01:52) <muntyan> i am using indices
 194 (22:02:23) <muntyan> if one works with "words" and "chars", then offsets are better. but who in C works with "words" and "chars"?
 195 (22:02:46) <barisione> the current version uses offsets
 196 (22:02:59) <muntyan> um, then i have old version?
 197 (22:03:14) <barisione> the version in gtksourceview
 198 (22:03:24) <muntyan> specifically, egg_regex_fetch_pos()
 199 (22:03:33) <muntyan> does it return char offsets or byte indices?
 200 (22:03:46) <muntyan> (my copy returns bytes)
 201 (22:04:29) <pbor> is the api big? to me it would sound sane have both...
 202 (22:04:55) <muntyan> hm, offsets are good for textbuffer
 203 (22:05:03) <pbor> yep
 204 (22:05:04) <muntyan> since you can't work with bytes in a textbuffer
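
The distinction being argued about, shown on a small UTF-8 string: a character offset counts characters, a byte index counts bytes, and the two diverge as soon as the text contains a multi-byte character. This is a Python illustration with hypothetical helper names, not eggregex code. In C the same conversions are done with GLib's g_utf8_offset_to_pointer() and g_utf8_pointer_to_offset().

```python
def offset_to_index(text, char_offset):
    """Character offset -> byte index into the UTF-8 encoding of `text`."""
    return len(text[:char_offset].encode("utf-8"))


def index_to_offset(data, byte_index):
    """Byte index -> character offset.

    `data` is UTF-8 encoded bytes; `byte_index` must lie on a character
    boundary, otherwise decoding raises UnicodeDecodeError.
    """
    return len(data[:byte_index].decode("utf-8"))


text = "na\u00efve caf\u00e9"  # 10 characters, two of them 2 bytes long in UTF-8
data = text.encode("utf-8")    # 12 bytes
```

Here the character at offset 3 ("v") sits at byte index 4, because the preceding "\u00ef" takes two bytes.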
 205 (22:05:15) <pbor> my point exactly
 206 (22:05:15) <barisione> the version in gtksourceview uses chars
 207 (22:05:29) <barisione> i converted both the engine and eggregex to that
 208 (22:05:50) <muntyan> api is quite big, like almost every function is affected :)
 209 (22:05:56) <barisione> the old engine was really ugly because it kept in memory both indices and offsets
 210 (22:06:04) <pbor> so it means that muntyan copied eggregex from sourceview... so it's using it patched
 211 (22:06:18) <pbor> s/it/he
 212 (22:06:34) <muntyan> what do you mean?
 213 (22:06:51) <pbor> <muntyan> i am using indices
 214 (22:07:03) <muntyan> no, i am using non-patched version :)
 215 (22:07:03) <pbor> gah
 216 (22:07:07) <pbor> sorry
 217 (22:07:15) <muntyan> i have eggregex copied long time ago
 218 (22:07:20) <barisione> contextengine.c uses only offsets, converting it back to indices (if we decide to do so) should not be difficult
 219 (22:07:20) <muntyan> before it was converted to offsets
 220 (22:07:29) <pbor> yes
 221 (22:07:40) <muntyan> well, converting my code shouldn't be difficult either :)
 222 (22:07:47) <muntyan> whatever is better will be good
 223 (22:07:53) <barisione> in the cvs on sourceforge you can find the conversion from indices to offsets
 224 (22:08:00) <muntyan> and perhaps offsets are really better for textbuffer
 225 (22:08:19) <muntyan> barisione: you mean eggregex changes?
 226 (22:08:22) <barisione> yes they are better but they could be terribly slow!!!
 227 (22:08:31) <barisione> no changes to the engine
 228 (22:08:45) <muntyan> barisione: i am not using gtksourceview :)
 229 (22:08:54) <muntyan> are they slow or they could be slow?
 230 (22:09:40) <muntyan> hm, wait
 231 (22:09:44) <barisione> my version of eggregex needs to do some conversions from offsets to indices but if the text you are analyzing is big then this is slow!
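
One reason such conversions get slow on big text is that each conversion done from the start of the string costs time proportional to the position, so converting many match positions adds up quickly. The sketch below shows one possible way to reduce that cost when positions are converted in increasing order, by continuing from the last converted position. It is an assumption made for illustration, not what eggregex or the engine does, and the class name is hypothetical.

```python
class IncrementalConverter:
    """Convert increasing byte indices to character offsets in one forward pass."""

    def __init__(self, data: bytes):
        self.data = data   # UTF-8 encoded text
        self.index = 0     # last converted byte index
        self.offset = 0    # character offset corresponding to self.index

    def offset_at(self, byte_index):
        """Return the character offset for `byte_index`.

        `byte_index` must lie on a character boundary and must not be
        smaller than the previously converted index.
        """
        if byte_index < self.index:
            raise ValueError("indices must be converted in increasing order")
        chunk = self.data[self.index:byte_index]
        # Every byte that is not a UTF-8 continuation byte (10xxxxxx)
        # starts a new character.
        self.offset += sum(1 for b in chunk if (b & 0xC0) != 0x80)
        self.index = byte_index
        return self.offset
```

With this approach each byte of the text is examined at most once, no matter how many positions are converted.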
 232 (22:09:46) <muntyan> if engine uses LineReader
 233 (22:09:55) <muntyan> then it uses char*, and it doesn't care about textbuffer
 234 (22:10:03) <muntyan> offsets are needed only for tags
 235 (22:10:15) <muntyan> ah, i got why you kept both offsets and indices
 236 (22:10:28) <muntyan> hm, situation
 237 (22:11:41) <muntyan> okay guys, now really have to go
 238 (22:11:44) <muntyan> see you
 239 (22:11:50) <barisione> bye muntyan
 240 (22:11:57) <barisione> i will write a mail on this
 241 (22:14:51) <pbor> ok
 242 (22:14:55) <pbor> I have to go too
 243 (22:14:57) <pbor> bye
 244 (22:15:16) <pbor> I'll pass the logs to paolo if you are ok
 245 (22:16:23) <barisione> ok
 246 (22:16:26) <barisione> bye
 247 (22:16:52) You are now known as pbor|out
 248 (22:27:06) barisione (~demian@81-208-36-94.ip.fastwebnet.it) left #sourceview
 249 (00:10:03) nud (~sf@d83-182-30-136.cust.tele2.be) left #sourceview
 250 (01:23:25) muntyan (~muntyan@pool-71-113-224-59.herntx.dsl-w.verizon.net) left #sourceview
