<<16-09-2012>>

01:12:42*Trixar_za is now known as Trix[a]r_za
06:47:58*Boscop quit (*.net *.split)
06:47:59*ccssnet quit (*.net *.split)
06:48:00*reactormonk quit (*.net *.split)
06:48:01*mal`` quit (*.net *.split)
06:48:01*fowl quit (*.net *.split)
06:48:01*Roin quit (*.net *.split)
06:48:01*comex quit (*.net *.split)
06:48:01*silven quit (*.net *.split)
06:48:02*CodeBlock quit (*.net *.split)
06:48:02*Owner quit (*.net *.split)
06:48:02*moxie quit (*.net *.split)
06:48:03*Amrykid quit (*.net *.split)
06:48:05*rking quit (*.net *.split)
06:48:05*JStoker quit (*.net *.split)
06:48:05*dom96 quit (*.net *.split)
06:48:05*Araq quit (*.net *.split)
06:48:05*Trix[a]r_za quit (*.net *.split)
06:48:07*XAMPP quit (*.net *.split)
06:48:08*Reisen_ quit (*.net *.split)
06:48:08*shevy quit (*.net *.split)
07:11:42*moxie joined #nimrod
07:11:42*XAMPP joined #nimrod
07:11:42*shevy joined #nimrod
07:11:42*Owner joined #nimrod
07:11:42*reactormonk joined #nimrod
07:11:42*Reisen_ joined #nimrod
07:11:42*silven joined #nimrod
07:11:42*Amrykid joined #nimrod
07:11:42*mal`` joined #nimrod
07:11:42*fowl joined #nimrod
07:11:42*rking joined #nimrod
07:11:42*Araq joined #nimrod
07:11:42*JStoker joined #nimrod
07:11:42*Roin joined #nimrod
07:11:42*dom96 joined #nimrod
07:11:42*CodeBlock joined #nimrod
07:11:42*Trix[a]r_za joined #nimrod
07:11:42*comex joined #nimrod
07:11:45*Boscop joined #nimrod
07:11:46*ccssnet joined #nimrod
09:45:33*moxie quit (Ping timeout: 276 seconds)
09:47:00dom96hello
09:47:05Araqhi dom96
09:47:17dom96Araq: Did you not read scrollback?
09:47:51AraqI think I did, why?
09:47:59dom96graphics.nim/DrawLineAA -- sdl_gfx (as fowl pointed out) contains a function which does this.
09:48:14dom96Should we alias DrawLineAA to that or remove DrawLineAA altogether?
09:48:19Araqso what, native nimrod code is nicer
09:48:38dom96Yeah, but it's incorrect :P
09:48:44dom96it doesn't work half the time
09:48:46Araqsdl_gfx is an additional package?
09:49:00Araqwell why does it not work?
09:53:42dom96it fails on some coordinates for some reason
10:02:29dom96Araq: Did you see my reminder that we should test everything in taint mode? :P
10:05:42Araqthe tester itself is compiled in taint mode
10:05:49Araqso there is some checking
10:06:08Araqhowever, ftpclient etc. indeed need to be tested in taint mode
10:08:46dom96mmm
11:20:47*apriori| joined #nimrod
11:21:00apriori|hi guys
11:21:04Araqhi apriori|
11:21:15apriori|is it me being unable to count, or is the user count actually increasing?
11:21:35Araqincreasing
11:21:41apriori|good :)
11:21:55apriori|Araq: I've got a question concerning your GC
11:22:03Araqalright
11:22:21apriori|now I'm definitely no expert on this topic.. but you said, slices are bad for it because of the interior pointers they represent
11:22:28Araqyeah
11:22:36apriori|aren't there actually gc adaptions that can deal in a sane way with it?
11:23:10apriori|yesterday I found a paper, for example, about that exact topic.. though I don't quite know whether that is supposed to be used with your GC variant
11:23:49apriori|funny enough, even though it's apparently from after 2006, it references tons of papers from the mid 70s
11:23:56apriori|http://benediktmeurer.de/files/fast-garbage-compaction-with-interior-pointers.pdf
11:26:16Araqyou can do quite a lot to keep overhead negligible
11:26:29Araqjust like you can to have only 1 pointer type like D does
11:27:38Araqbut it seems much wiser to disallow both
11:28:08AraqGo and D both start with an incredibly permissive memory model
11:28:27Araqand GC performance is left as an implementation detail
11:28:34Araqthis is not Nimrod's design at all
11:29:31apriori|while I can agree with that... I think, since I also used D for quite a while, slices offer a really convenient way to deal with subranges in an array ...
11:30:12apriori|and that efficiently, without copying, etc.
11:31:29Araqwalter claims D's design allows for a precise copying collector
11:31:47Araquntil that is actually implemented and works (!) I doubt it
11:32:54apriori|I understand... well, the GC is definitely a major topic in D, which the devs seem to have real problems with...
11:33:12apriori|D community devs have been complaining about the GC for a long time already ..
11:34:46apriori|I guess for the matrix stuff (which is my current motivation why I'd like to have efficient slices) I will just stick to pointer + offsetrange and create respective function overloads
11:35:06apriori|Araq: did you happen to see eigen3? the linear algebra library?
11:35:45Araqadrian benchmarked slices and on a modern CPU ptr+offset is as efficient as 'ptr' only
11:36:07Araqso you can do the slices just fine and keep the base pointer around for the GC
11:36:17apriori|yeah, that's what I meant
11:36:26apriori|so you won't have interior pointers
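To make the base-pointer idea above concrete, here is a minimal sketch. `SeqSlice` and `toSlice` are made-up names; note that modern Nim's value-semantics seqs would copy the base here, whereas in the Nimrod of this log seqs were GC'd references, which is the case the conversation assumes.

```nim
type
  SeqSlice[T] = object
    base: seq[T]        # GC-visible base reference, no interior pointer
    offset, len: int

proc toSlice[T](s: seq[T]; first, last: int): SeqSlice[T] =
  SeqSlice[T](base: s, offset: first, len: last - first + 1)

proc `[]`[T](s: SeqSlice[T]; i: int): T =
  s.base[s.offset + i]  # index as base + offset, as Araq describes

let data = @[10, 20, 30, 40, 50]
let mid = toSlice(data, 1, 3)
assert mid[0] == 20 and mid.len == 3
```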
11:37:21apriori|well, about eigen3... I absolutely like its design. it's a C++ header-only library, which uses a hell of a lot of expression template magic to create a really convenient API
11:37:22Araqbtw the GC supports interior pointers on the stack
11:37:35apriori|hm
11:37:46Araqand it caused huge pains
11:37:56apriori|what causes huge pains?
11:38:05Araqand it took me a month to get the overhead down to almost 0
11:38:15apriori|oh, ok.. the interior pointers
11:38:50Araqthe GC has to support it because we lack the control over the generated assembler code
11:38:51apriori|so its fine to pass in ptr+range in a function and calculate that down to pointers in a function and use these...
11:38:58apriori|or what would be the real consequence of that?
11:39:20AraqI'm just explaining why I hate them
11:39:25apriori|okk
11:39:26apriori|ok
11:39:45apriori|about lack of control...
11:39:59apriori|do you plan to write a real backend sooner or later?
11:40:03apriori|like using llvm or so?
11:40:17Araqnot anymore
11:40:29Araqat least not before 1.0 is out
11:40:40apriori|yeah, I understand
11:41:04apriori|well, some languages also dealt quite well with compiling down to C for a while until a real backend was developed
11:41:07apriori|(haskell?)
11:41:20Araqeiffel still does that
11:41:36apriori|in a way it really makes sense...
11:41:37Araqthough I don't know about the "dealt quite well" ;-)
11:41:50apriori|one automatically inherits all improvements on the compilers
11:42:14apriori|well, I think.. one problem is the lack of proper debugging
11:42:39AraqI plan to patch the generated debug info
11:42:44apriori|if one compiles down to C, one would at least need a plugin for the debugger to "demangle" (if one could call it that) the generated code
11:42:48Araqso that it shows demangled names
11:43:03apriori|yeah, that would be nice
11:43:17apriori|I like the edb idea though...
11:43:19Araqon the other hand, I can easily debug the generated C already
11:43:29apriori|at least one debugger that would be exactly the same, everywhere
11:43:57Araqedb is nice, but it's often too slow
11:44:10apriori|and needs a recompile of the source
11:44:22Araqplus having the debugger in the same address space is insane ;-)
11:44:27apriori|hehe, yeah
11:44:42apriori|one crash/messing with the registers tearing it all down
11:46:58apriori|btw., about the snippets of foldl/foldr etc. I once showed you...
11:47:04Araqbtw one motivating example for the TR macros is copy elimination for slices
11:47:16apriori|actually it should be possible to optimize foldl because it is a tail call
11:47:37Araqyeah one of the 4 was a tail call iirc
11:48:12Araqbut I dislike tail calls sometimes ;-)
11:48:17apriori|why?
11:48:30fowlcan i set up a tr so that array[#, type] expands to array[0..#-1, type]
11:48:56Araqfowl: perhaps, I guess not
11:49:21Araqwe could add it to the language if it constantly annoys you though
11:49:29apriori|I would second that
11:49:39AraqI know ;-)
11:49:49fowl# would be an int literal only
11:49:50apriori|it is a nice feature to allow a range in there (btw... is a set also allowed?)
11:50:03fowlapriori|: you can use enums
11:50:10apriori|fowl: yup, I know that
11:50:15apriori|I meant an ordinal type with "holes" in it
11:50:35Araqno, how could that work?
11:50:42apriori|only via mapping
11:50:55Araqthat's not an 'array' then
11:51:05Araquse a table for that
11:51:13apriori|okay
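Araq's table suggestion might look like this; the key/value pairs are made up for illustration.

```nim
import tables

# An ordinal type with "holes" (e.g. values 1, 4, 8) can't index an
# array, but a table maps the sparse keys directly.
let names = {1: "A", 4: "B", 8: "C"}.toTable   # 2, 3, 5.. are holes
assert names[4] == "B"
assert 2 notin names
```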
11:51:40Araqfowl: if it's important enough for you, make a feature request
11:51:40apriori|Araq: about using # instead of a...b
11:51:56apriori|I think one actually rarely uses the a..b special case, in which indexing doesn't start with 0
11:52:12Araqabout the tail calls: principle of least power; recursion can do much more than a simple loop
11:52:36apriori|sure.. but recursion is also convenient ;)
11:52:44Araqso a loop is easier to read when you care about performance
11:53:17Araqthere are also no tail calls in debug mode due to the stack tracing
11:53:23apriori|ouch
11:53:45apriori|that could make code run a hell of a lot slower in debug mode ;)
11:54:17Araqyeah
11:54:32Araqthe a..b special case is more common if the index type is an enum
11:54:39apriori|yep
11:54:56apriori|but for "normal" arrays it's not
11:55:06Araqbut there are also lots of people who prefer 1..N
11:55:22apriori|yeah, well I currently constantly write using low/high only..
11:55:34Araqhowever they have a hard time already as strings and sequences start with 0 ..
11:55:56apriori|and with the matrix multiplication for example, I would even have to make sure, that the len of the index space is identical, no matter how wicked it was setup
11:56:11apriori|not quite sure though, whether that's actually a good idea there
11:56:26Araqyou'll soon write m.first .. m.last everywhere
11:56:33apriori|so currently I more work on low(A)+offset and low(B)+offset etc.
11:56:41Araqto support efficient slicing ...
11:57:42apriori|okay :)
12:00:07Araqwell if you want efficient slicing that is
12:00:21AraqI'm not sure it's useful all the time
12:00:47apriori|one needs exact control over when copies are done and when not
12:01:27apriori|and, well... for linear algebra stuff efficient slicing definitely rocks
12:02:00Araqhow can you program efficient generic matrixes anyway? 4x4 vs. 100x100 is a big difference, right?
12:02:26apriori|sure it is
12:02:42apriori|with bigger matrices one would switch over to non-full representations
12:02:49apriori|coordinate matrix, etc.
12:03:05Araqyep
12:03:13Araqso what should the stdlib support?
12:03:39apriori|but those assume that matrices are actually sparse.. and there also exist cases in which they simply aren't.. and full sparse matrices are of course a hell of a lot slower than simple 2-dim arrays (or 1 flat array)
12:03:54apriori|the stdlib should of course support both
12:04:30apriori|and I also wouldn't include some automatic logic to switch over to e.g. sparse matrix for bigger matrices
12:04:41apriori|that would be a bad idea
12:05:23apriori|because usually one implements coordinate matrices as a map of a triple (i, j, V), keeping it sorted by one coordinate
12:05:52apriori|and this would result in insertion order being very important for the construction performance of the sparse matrix
12:07:18apriori|the stdlib should also make sure, that the matrix representation is compatible with other stuff out there...
12:07:46apriori|e.g. column-major in its representation for passing it directly without transposing into OpenGL
12:08:39Araqah, didn't know about that
12:09:34apriori|yeah, well.. I guess that column major stuff is a relic of the early fortran times, though I really don't know where that convention comes from
12:10:24Araqbecause it's "intuitive"? a[x, y] == a[x][y]
12:11:10apriori|yeah, I guess so
12:11:25apriori|but it constantly confuses me..because.. AFAIK c++ is row-major
12:11:40Araqyes
12:12:09Araqin fact, I think of an array of strings to get it right ;-)
12:12:18apriori|hehe
12:14:12apriori|I'm not quite sure, whether we should later add support for common BLAS and *PACK packages
12:14:33apriori|I mean.. they are widely used, yes.. but hell, their API sucks so bad, one definitely would need to wrap them.
12:15:06Araqbtw I've implemented a simple stack trace profiler :D
12:15:08apriori|BLAS is like a library for basic operations on the types (e.g. vector operations) and usually it is replaced on the ABI level
12:15:30Araqthe "embedding" part makes much more sense for a profiler than a debugger
12:15:47apriori|yeah, when I read that commit message I thought: "wtf.. didn't he just write ""I guess I write one on my own"" this very evening?!"
12:16:08Araqyeah, didn't feel like hunting bugs ;-)
12:16:10apriori|agreed.. but such a profiler should definitely support sampling
12:16:42Araqwhat do you mean? it already does
12:17:17apriori|with sampling I actually mean, not really counting every instruction one by one.. which usually results in quite slow behaviour
12:17:27apriori|but only take a sample of an "area of interest"
12:17:38Araq{.push profiler:off.} :P
12:17:57apriori|one should invert the logic
12:17:59Araqbut yeah, that doesn't do quite the right thing
12:18:04apriori|some global flag.. and some local regions
12:18:20Araqyeah, good point; I'm adding it
12:18:35apriori|you're just insane, man ;)
12:18:43apriori|pulling off such a huge pile of work on your own
12:20:02apriori|as an addition, I meant: https://en.wikipedia.org/wiki/Profiling_(computer_programming)
12:20:11apriori|read the part about "statistical profilers"
12:21:04apriori|while not extremely precise, sampling profilers can at least give you a good hint about bottlenecks without impacting runtime performance too much
12:21:34Araqwell it is a sampling profiler
12:21:49apriori|interesting..that MIPS even had hardware support for it
12:21:53apriori|ok, good :)
12:22:25fowli had to rearrange an enum so that it was in order
12:22:42fowllast enum on this page: http://enet.bespin.org/protocol_8h.html
12:22:53Araqit used to be 'flat' profiler but then there's already gprof
12:22:53fowlis it still compatible if rearranged?
12:23:20Araqfowl: I'd think so
12:23:46Araqhowever it looks like you want an enum + a set to represent it in Nimrod
12:24:29apriori|Araq: btw.. couldn't one just generate a static internal mapping for such enums with "holes"?
12:24:44apriori|like.. one has e.g. the actual enum values 2, 4, 8, etc.
12:24:52Araqwhy not do instead:
12:25:01apriori|there is no problem to actually interpret those as 0, 1, 2 in an array
12:25:31Araqtype TMyEnum = enum meA = 0, meB, meUnused, meUnused, meC
12:25:47fowlwhy's that? i have ENET_PROTOCOL_HEADER_FLAG_MASK = ENET_PROTOCOL_HEADER_FLAG_COMPRESSED.cint or ENET_PROTOCOL_HEADER_FLAG_SENT_TIME.cint in the enum and it's not raised any problems
12:25:48Araqer, meUnused2 of course for the second
12:26:17apriori|Araq: bad thing.. because if people use e.g. 2^x, then you get a hell of a lot of such unused sections to write
12:26:34apriori|but, yeah, you already said it.. that's definitely a "flagging" case, which is better described using a set
12:26:38Araq2^x means you want a set of an enum anyway
12:27:15Araqand yeah if you can't map it easily you have to live with 'const's and inferior type checking
12:27:32Araqthough you can also do:
12:27:38Araqtype TEnum = distinct cint
12:27:46Araqconst valueA = TEnum(0)
12:28:10apriori|interesting
12:28:12Araqproc `or`(a, b: TEnum): TEnum {.borrow.}
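Assembled into a runnable form, Araq's fragments above might read as follows; the nonzero values and the borrowed `==` are additions for the sake of a checkable example.

```nim
# Flag constants as a distinct type over cint, with the needed
# operators borrowed from the base type.
type TEnum = distinct cint

const
  valueA = TEnum(1)
  valueB = TEnum(2)

proc `or`(a, b: TEnum): TEnum {.borrow.}
proc `==`(a, b: TEnum): bool {.borrow.}

assert (valueA or valueB) == TEnum(3)
```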
12:28:46apriori|Araq: not wanting to annoy you again.. but.. pleeaase start documenting every single pragma there is ;)
12:30:39Araqit's all documented
12:30:45Araqbut you have to read the manual
12:30:51apriori|ok
12:30:52Araqand skip the verbose introduction
12:31:08Araqthe tuts simply don't cut it
12:31:11Araqbrb
12:31:18apriori|k
13:01:25Araqback
13:06:36*apriori| quit (Ping timeout: 255 seconds)
13:31:06*apriori| joined #nimrod
13:43:35*XAMPP quit (Ping timeout: 272 seconds)
13:52:32apriori|Araq: one question..
13:52:39apriori|about constrained general types
13:52:53Araqthey're a bit limited for now, yes
13:53:02Araqbasically you can do:
13:53:08apriori|would you prefer to have that constraint (e.g. square matrix) expressed by using only the matching arguments in the function
13:53:13apriori|or should I check with assert?
13:53:42Araqhu?
13:53:46Araqcan't follow
13:53:51Araqdo you mean:
13:53:52apriori|like: matrixmultiply instead of 4 dimensions only 3.. thereby already enforcing that matrixes are compatible
13:54:25AraqI have no opinion about it
13:54:38apriori|http://pastebin.com/5q2zZNgK
13:55:13apriori|I think, assert might be better, because it allows an actual non-cryptic error message
13:55:27Araqresul[i, j] = cast[T](0) # unnecessary, but if you want to do it, use:
13:55:36apriori|whereas the other variant will just spit out a not necessarily helpful compiler error
13:55:42Araqresult[i, j] = T(0)
13:55:49apriori|ok
13:55:55apriori|but usually it would implicitly cast?
13:55:58Araqand yeah the identifier lookup in generics needs to be fixed
13:56:17Araqtypo: 'resul' vs 'result'
13:56:27apriori|yeah.. not compiled yet
13:56:28Araqand the compiler doesn't complain until instantiation time
13:56:34apriori|yes
13:56:36apriori|but...
13:56:43Araqit's really horrible and I apologize ;-)
13:56:52apriori|it has no real clue about a constraint in place in version 2
13:57:11Araqand yeah, usually it would implicitly cast
13:57:12apriori|it would just say "can't find a matching function"
13:57:40Araqyou should be able to do:
13:58:05apriori|hm.. I guess this is a good usecase for when + hint
13:58:08apriori|+ assert
13:58:12Araqyeah
13:58:19Araqwas about to suggest that
13:58:22Araq:-)
13:58:24apriori|:)
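The "only 3 dimensions instead of 4" signature apriori| describes could be sketched like this in later Nim syntax; the `static[int]` generic parameters and `..<` are assumptions of this sketch, while the paste in the log used range-typed indices.

```nim
# Sharing N between both operands makes incompatible matrices a
# compile-time error, so no runtime assert is needed for the shapes.
type
  TMatrixNM[M, N: static[int]; T] = array[M, array[N, T]]

proc mul[M, N, K: static[int]; T](a: TMatrixNM[M, N, T];
                                  b: TMatrixNM[N, K, T]): TMatrixNM[M, K, T] =
  for i in 0 ..< M:
    for j in 0 ..< K:
      var acc = T(0)        # conversion, not cast, per the advice above
      for n in 0 ..< N:
        acc = acc + a[i][n] * b[n][j]
      result[i][j] = acc

let id: TMatrixNM[2, 2, int] = [[1, 0], [0, 1]]
let m:  TMatrixNM[2, 2, int] = [[3, 4], [5, 6]]
assert mul(id, m) == m
```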
14:04:47Araqapriori|: this program is pretty bad for the GC as it generates lots of garbage
14:04:59Araqfor i in 0.. 1_000_000:
14:05:15Araq let s = formatFloat(i)
14:06:01Araqthere are lots of ways to deal with this, but they all have problems:
14:06:41Araqa) change formatFloat to take a 'result: var string' so that the memory can be re-used within formatFloat
14:07:17Araqb) fix the string implementation to have an STL-like "short string" optimization
14:07:36Araqwhere a short string is directly embedded in the object
14:08:25apriori|hm
14:08:45apriori|do you have a minimal format string example?
14:08:57Araqstrutils.formatFloat
14:08:59apriori|e.g. the equivalent of printf("something %i", bla)"
14:09:29Araqc) make strings reference counted like in Delphi
14:10:02Araqc) is really nice when the output target is C++ because we have fast exception safe destructors there
14:10:52Araqthe real problem is that we want proc p: string for convenience
14:10:54apriori|(btw.. % operator similar to pythons... that rocks ;)
14:11:00apriori|yeah...
14:11:08Araqbut need proc p(result: var string) for performance
14:11:41Araqwould be nice if we could transform one into the other
14:12:06AraqTR macros can do the transformation
14:12:23Araqbut that still requires 2 different 'p' implementations
14:13:19apriori|yes
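A hand-written sketch of the performance-oriented form discussed above, assuming a hypothetical `formatIntoBuf` that reuses the caller's buffer instead of returning a fresh string each iteration:

```nim
import strutils

# The caller owns the buffer, so the loop below does not allocate a
# new string per iteration (only buffer growth can allocate).
proc formatIntoBuf(result: var string; x: float) =
  result.setLen(0)
  result.add formatFloat(x, ffDecimal, 2)

var buf = ""
for i in 0 .. 3:
  formatIntoBuf(buf, float(i))
assert buf == "3.00"
```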
14:13:24apriori|btw...
14:13:30apriori|how are those tr macros actually applied?
14:13:39apriori|because.. in term rewriting order should matter
14:13:50Araqin an extra pass over the AST after semantic checking
14:14:31Araqthe longest match should win but currently it's applied in reverse definition order
14:14:39apriori|ok
14:15:06Araqwell hm
14:15:10*zahary joined #nimrod
14:15:28AraqI'm wrong, it should already implement "longest match wins"
14:15:47apriori|okay
14:18:22Araqhi zahary
14:18:44zaharyhi
14:21:12apriori|Araq: btw: say I got those types:
14:21:15apriori| TMatrixNM*[M, N, T] = array[M, array[N, T]]
14:21:16apriori| TVectorN*[A, T] = array[A, T]
14:22:12apriori|assume I got a matrix with only say 1 row or 1 column.. will the codegen for array[0..1, array[0..n, T]] expand to the same as array[0..n, T] ?
14:23:04apriori|I mean.. is that obvious useless extra dimension stripped?
14:23:16Araqno but GCC may do that
14:23:26Araqit should ;-)
14:24:01apriori|hm, because I currently think I could also just treat vectors as special matrices
14:24:32Araqjust try it and look the generated assembler
14:24:37Araq*look at
14:24:39apriori|yeah, I guess so
14:24:47Araqit's the only thing that works with modern optimizers
14:25:15apriori|okay
14:25:30*q66_ joined #nimrod
14:25:36Araqbenchmarking may not help as much as that could trigger some cache anomalies
14:26:05Araqwhere the less efficient code runs faster because it's at an address that happens to not produce a cache conflict
14:26:22apriori|yeah... cache aware programming sucks ;)
14:27:16*q66_ is now known as q66
14:30:12Araqzahary: read the history please about the formatFloat issue
14:31:07Araqany ideas how to solve the "excessive proc duplications for strings" problem?
14:38:25zaharywhat is the issue again?
14:38:42Araqfor i in 0.. 1_000_000:
14:38:51Araq let s = formatFloat(i)
14:39:25Araqproduces lots of garbage
14:39:58Araqand the GC is not really good for this
14:43:20apriori|btw: how would I need to write an overload for a len function for ranges?
14:43:33apriori|I mean.. what is the proper type of a range[T]?
14:43:42*zahary quit (Read error: No route to host)
14:43:51apriori|because something like len*[T](x: range[T]): int does not work
14:43:52*zahary joined #nimrod
14:44:07zaharyI guess it's possible to statically determine that this string is temporary and then some alternative allocation strategy could be used for it
14:45:07Araqyeah and you can do it with a TR macro
14:45:15Araqbut the problem is you want
14:45:21Araqproc p: string
14:45:25Araqfor convenience
14:45:27Araqand
14:45:38Araqproc p(result: var string) for performance
14:45:57Araqand you need to provide both
14:46:04Araqand then the TR macro can do its job
14:48:33Araqapriori|: proc len*(x: typedesc[range]): int
14:48:38zaharyi've said before that I think RVO should be used everywhere and when not explicitly told otherwise procs should be able to construct objects both on the stack and on the heap (if the result is a new heap location, the compiler just places an allocation call before the proc call)
14:49:39AraqI remember now, but that's not the only solution
14:50:00Araqif you have 'result: var string' you can easily do:
14:50:05AraqsetLen(result, 0)
14:50:14Araq# fill 'result' then
14:50:34Araqwith no new allocations unless it happens to need a buffer resize
14:50:51Araqand it doesn't matter that it's been placed on the heap
14:51:04zaharyyou mean, var string is more optimal when there is a chain of operations?
14:51:05*Boscop quit (Ping timeout: 244 seconds)
14:51:15zaharylet me think about it
14:51:19Araqin fact, it may be beneficial to have it on the heap because it can stay there
14:51:50Araqwhereas if it's on the stack you likely end up copying it to the heap later
14:55:59zaharythe question boils down to whether it's possible to optimise a proc like uppercase(x: string): string to sometimes reuse existing memory
14:57:33*Boscop joined #nimrod
14:59:30zaharyin C++, the current thinking is that if you accept the input as copy it would be possible for the compiler to reuse its buffer in the result string. also sometimes it would be possible to avoid copying the input string when calling the function (when the input is a rvalue)
14:59:53zaharyobviously uppercase(x: var string) is different interface so we are ignoring it here
15:03:20zaharyC++ is not completely optimal as there is still code for resource stealing and noop destructors being executed even in the best case - in theory it's possible to compile uppercase(x: string): string automatically in 2 forms (one of them being uppercase(result: var string))
15:10:37apriori|Araq: that does not work :/
15:12:44apriori|http://pastebin.com/Jcm06fDj
15:13:33apriori|I know that test is silly, because there is array.len, but its just a test for extracting the range and passing that over to the proc
15:15:02zaharywell, apriori|, why are you passing the array to the test function. you should pass the T type
15:15:36apriori|yep, right
15:15:42apriori|but it doesn't change the error message
15:16:16zaharylet me try it. it's probably my fault :)
15:18:53zaharythis works for me: http://pastebin.com/6zbeNhDz
15:19:10zaharyindeed, there is some error when you say typedesc[range] - I'll look into it later
15:20:39apriori|ok, testing
15:21:09apriori|yup, working, thank you
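The working pattern from zahary's paste might be condensed as follows; `rangeLen` is a made-up name to avoid clashing with `system.len`.

```nim
# Compute the length of a range type from its bounds via a typedesc
# parameter, the approach zahary's paste demonstrates.
proc rangeLen(T: typedesc): int =
  high(T).ord - low(T).ord + 1

assert rangeLen(range[3..7]) == 5
type TIdx = range[0..9]
assert rangeLen(TIdx) == 10
```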
15:21:23Araqzahary: well yeah the problem is the optimization changes semantics
15:21:43Araq'result: var T' is an inout as opposed to an 'out'
15:25:24zaharythe optimisations I'm describing are possible if you know that the input string lifetime ends at the proc call - in C++ you can then move it to the param instead of copying it and in the other hypothetical optimisation you then choose the alternative "var version" of uppercase
15:30:33Araqit's not at all about copying vs. moving IMO
15:30:57Araqin fact the code snippet already does not copy, but move
15:36:40Araqthe problem is that the current semantics of a string result prohibit a variant of loop invariant code motion
15:36:43apriori|hrm
15:36:56apriori|Error: illformed AST: m1[0, 0] .. what the hell? :/
15:37:12Araqfor i: x = p(alloc())
15:37:20Araq-->
15:37:23Araqy = alloc()
15:37:28Araqfor i: p(x)
15:38:03Araqer, loop hoisting, it's not really "invariant"
15:38:08*Trix[a]r_za is now known as Trixar_za
15:41:47Araqwe could do this:
15:42:16Araqproc toUpper(x: string, result: var string) {.asfunction.} # in stdlib
15:42:23Araqand then allow usage as
15:42:35Araqa function that returns something
15:42:43Araqecho toUpper("abc")
15:42:57Araqand the compiler introduces a temporary for us:
15:43:07Araqvar tmp1 = ""
15:43:22AraqtoUpper("abc", tmp1)
15:43:26Araqecho tmp1
15:43:49Araqwhich is something along the lines what you suggested I guess?
15:44:52*moxie joined #nimrod
15:44:56Araqwe can't hide it completely I think because it affects proc var compatibility
15:45:22Araqplus for 'int' it's stupid to transform it into 'result: var int'
15:47:13Araqthinking about it more, all we need a 'snoopResult' pragma
15:47:36Araqproc toUpper(s: string): string {.snoopResult.} =
15:47:56Araq if isNil(result): result = newStringOfCap(s.len)
15:48:14Araq else: result.ensureCap(s.len)
15:48:30Araq for i in 0.. <s.len: result[i] = toUpper(s[i])
15:49:28Araq'snoopResult' ensures it's passed as 'var T' and that it's not reset/initialized to binary zero
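Written out by hand, the two procs that `snoopResult` is meant to derive from a single body could look like this; `toUpperInto` and `toUpperConv` are hypothetical names.

```nim
import strutils

# In-place version for performance: the caller's buffer is reused.
proc toUpperInto(s: string; result: var string) =
  result.setLen(s.len)
  for i in 0 ..< s.len:
    result[i] = toUpperAscii(s[i])

# Convenience wrapper returning a fresh string.
proc toUpperConv(s: string): string =
  result = newStringOfCap(s.len)
  toUpperInto(s, result)

var buf = ""
toUpperInto("abc", buf)     # repeated calls reuse buf's storage
assert buf == "ABC"
assert toUpperConv("nim") == "NIM"
```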
15:50:26apriori|I guess, another bug:http://pastebin.com/893ZZ0ZJ
15:51:34Araqit's not a really bug, if you overload [] for arrays the compiler is confused
15:51:54Araqand uses the builtin [] which does not support ',' notation
15:52:05apriori|Araq: its not overloaded anywhere there
15:52:08Araqso it's an "illformed AST" ;-)
15:52:15apriori|hm...
15:52:30AraqI can fix it easily I think
15:52:42apriori|well, if that's the case, how should I overload [] for a matrix which is array[..,array[.., T]]?
15:53:03Araqwell I expected people to wrap it in an object
15:53:14Araqbut I don't mind supporting it for arrays
15:53:23apriori|hm, ok
15:54:39apriori|https://github.com/Araq/Nimrod/issues/205 I wonder how that would actually work there...
15:54:46apriori|return type inference out of nowhere?
15:55:04Araqyeah somehow people simply expect that to work ...
15:58:09apriori|zahary: as an update to the ticket you were working on: https://github.com/Araq/Nimrod/issues/202
15:58:24apriori|name doesn't cause ICE anymore, but segfaults
15:58:50zaharyapriori|, 2) works for me. can you send me the exact code for which you get a segfault
16:01:28zaharyAraq, what problem do you see if there isn't snoopResult, but rather the compiler implements it for you automatically in certain cases?
16:02:18apriori|zahary: sry, I'm stupid today.. it does segfault if you pass in a variable instead of a type. which should be expected. but maybe we should provide an overload
16:05:09zahary"we can't hide it completely I think because it affects proc var compatibility" - you mean that when the proc is converted to a proc var the optimization is no longer applicable? sure, but this sounds like a rarely triggered edge case. snoopResult can not be overloaded with regular upperCase proc I think, which is somewhat more limiting
16:05:54zaharyapriori|, the compiler should never segfault so it's bug after all
16:07:13apriori|zahary: yup, right
16:11:18zahary… also you can have a pragma on the proc var rather than the proc itself (much like you define the calling convention)
16:23:36Araq'(result: var int)' is not compatible to '(): int'
16:24:02Araq '(): int' is returned in a register
16:24:14Araqso that's why it affects proc var compatibility
16:24:28Araqand yeah 'snoopResult' is for proc var types too
16:24:52Araqyou can't make it the default because it changes semantics within toUpper's body
16:25:08Araqit's this semantic change that allows the optimization
16:25:19Araqin toUpper's body you can then do:
16:25:32AraqsetLen(result, 0); fill result
16:25:48Araqthis is otherwise not possible because 'result' starts with a 'nil' value
16:26:23Araq'snoopResult' transforms the 'result' from an 'out T' into an 'inout T'
16:26:29fowlAraq: i think you said not to, but im storing the address of a dereferenced ref and casting it back to a ref and its working, so whats the reason i shouldn't do this?
16:27:22Araqdepends on what you mean exactly
16:27:41fowli'm using it for c callbacks
16:27:47Araqit could be fine but addr x[] is not any different from cast[T](x)
16:27:59Araqexcept more confusing to read IMHO
16:28:25Araqso I'd use the 'cast' version
16:29:06fowl? i need a generic pointer to store
16:29:47fowlbut all the related functions expect a ref type so casting it to ptr T isnt useful
16:33:07Araqbut 'addr x[]' produces a ptr T
16:33:14Araqso I can't follow
16:33:36fowlbecause i cant store ref T as a generic pointer
16:34:34Araqwell yeah so you need a 'ptr T'
16:34:49Araqso you can do: cast[ptr T](myRefT)
16:35:02Araqno need to do: addr myRefT[]
16:35:21fowlooo ok
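The cast round-trip Araq suggests, as a self-contained sketch; `Ctx` and `callback` are made-up names, and in real C-callback use the ref must be kept alive on the Nim side (here by the local `c`, otherwise e.g. via `GC_ref`).

```nim
# Store a ref as a raw `pointer` for a C-style callback and cast it
# back on the other side, instead of `addr myRef[]`.
type Ctx = ref object
  hits: int

proc callback(userData: pointer) =
  let ctx = cast[Ctx](userData)   # cast back to the ref type
  inc ctx.hits

var c = Ctx(hits: 0)
callback(cast[pointer](c))
assert c.hits == 1
```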
16:35:25zaharyAraq, sure, I understand the implications. What I'm saying is that it's possible to compile 2 versions of the proc from a single definition. The compiler automatically uses the "snoop" version in regular chained calls like "foo".toUpper.trim.reverse and when converting the proc to a procvar, it just looks at the procvar type to determine which implementation to pick
16:36:14zaharyno need for the user to mark the proc as snoop (thus disabling some of its uses in regular code)
16:36:28Araqwell that's my suggestion as well
16:36:48Araqa snoop proc can be called like every other proc
16:37:03Araqclient code is not affected except for proc vars
16:37:15*shevy quit (Read error: Operation timed out)
16:37:34Araqbut in the implementation of toUpper the programmer needs to provide the optimized version
16:37:42zaharyok, but why should I enable my proc for snooping - this should be the default - have a pragma to disable it if you want
16:38:09Araqyeah maybe, I don't know
16:38:43zaharymaybe we still disagree somewhere - snoop only in certain situations in my point of view
16:39:11zahary… when the input is a var about to be destroyed
16:40:09zaharywhat happens when you call a snoop proc in expression like
16:40:10zaharyvar a = x.toUpper ?
16:41:04zaharyx is copied and then passed to the snooped proc?
16:48:07Araqwhy? x ain't copied today
16:48:44Araqit's not even a problem if you do: x = x.toUpper
16:49:04Araqthough it could become a problem for other examples
16:49:22Araqbut then we already have an alias analysis
16:51:22zaharyah, poor example indeed
16:51:57Araqwell I think 'snoop' shouldn't be the default because it changes semantics in a subtle way
16:52:18*shevy joined #nimrod
16:52:40Araqand you can do another interesting thing with 'snoop': the proc can actually *add* to the result buffer instead of replacing its content
16:53:00Araqwhich is very useful for string handling
16:54:05Araqthough I think it's still not enough to transform $(x: tree) into toStringAux(x: tree, result)
16:54:18Araqin the recursive case
16:54:47Araqa snoop proc should be allowed to be called like:
16:55:02Araqp(x, res)
16:55:10Araqinstead of: res = p(x)
16:55:53zaharywhat about?
16:55:54zaharyvar a = x.reverse.toUpper # 1) both reverse and toUpper are snoop procs. 2) reverse is not snoop
16:55:54zaharyso you see snoop as different semantics in which you expect the result to be already constructed so you are supposed to append to it? I seem to care more about automatic copy elision in certain situations
16:57:11Araqwell it's at least an interesting idea
16:58:03Araq1) both are snoops: where is the problem?
16:58:12Araqvar tmp = ""
16:58:21Araqx.reverse(tmp)
16:58:31zaharysyntax-assisted append semantics are a cool idea, but somewhat orthogonal to what I'm talking about
16:58:54Araqvar tmp2 = ""; toUpper(tmp, tmp2)
16:59:04Araqa = tmp2
16:59:18Araqor maybe directly: toUpper(tmp, a)
16:59:34Araq2) reverse is not snoop:
16:59:54Araqvar tmp = x.reverse
17:00:08AraqtoUpper(tmp, a)
17:00:44Araqit's all very easy to do with TR macros except that the TR macro can't optimize the ordinary toUpper easily
17:01:07Araqbecause it can't transform its body
17:01:30zaharyI suggest that it's possible to do
17:01:30zaharyvar a = x
17:01:30zaharyreverse(var a)
17:01:30zaharytoUpper(var a)
17:01:59Araqwe often have:
17:02:12Araqproc `$`(x): string =
17:02:14Araq result = ""
17:02:19Araq toStringAux(x, result)
17:03:02Araqbut the compiler can easily transform toStringAux to a proc with result type
17:03:31Araqthe other direction is MUCH harder to do for the compiler
17:03:50Araqso the programmer should only have to provide toStringAux
17:04:14Araqwhich is what 'snoopResult' enables
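A minimal Nimrod sketch of the pattern being discussed (the `snoopResult` pragma itself is hypothetical; the tree type and only the hand-written delegation below are illustrative of today's code):

```nimrod
type
  PTree = ref TTree
  TTree = object
    value: string
    kids: seq[PTree]

# The programmer provides only the appending version, which reuses the
# caller's buffer instead of allocating on every recursive call.
proc toStringAux(x: PTree, result: var string) =
  result.add(x.value)
  for kid in x.kids:
    result.add(" ")
    toStringAux(kid, result)

# Today this wrapper must be written by hand; with 'snoopResult' the
# compiler could derive it from toStringAux automatically.
proc `$`(x: PTree): string =
  result = ""
  toStringAux(x, result)
```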
17:05:23Araqyour transformation only works with 1 argument procs I think?
17:05:35AraqI mean unary procs
17:06:04zaharyno, the var part is the result - other arguments can still be passed along
17:06:57zaharywhat is hard in my suggestion is compiling the correct proc body to mutate the input in place
17:10:13Araqyeah ;-)
17:12:31zaharywell, the body can be supplied by the user and then the rule is just that
17:12:31zaharyproc (x: var T, …) is interchangeable to proc(x: T, …): T in certain situations
17:13:46zaharystill thinking about automatically transforming the body - seems to be possible if you cache reads of certain fields into local variables
17:15:34zaharyyou know the input is immutable, so if you write to some field in the result and then attempt to read the same field from the input, the compiler has to ensure that the original value was cached before the mutation (sure, this will actually result in worse code sometimes)
17:18:42zaharywhat about my first rule instead of snoopResult? the benefit is that intermediate variables in x.reverse.toUpper are eliminated this way
17:19:30Araqsorry, what is your first rule again?
17:19:40zaharyproc (x: var T, …) is interchangeable to proc(x: T, …): T in certain situations
17:19:55apriori|ohm.. how would I overload the [] operator so that it could be used for any operation (like +=, -= etc.) on the scalar type? (matrix scenario)
17:20:10Araqproc []: var T
17:20:38fowlhow about making them return var strings, then variants that take var strings
17:21:04Araqbut I will implement a[i, j] to mean a[j][i] if 'a' is an array, so that you don't have to overload any [] at all
17:21:37apriori|Araq: I thought about your wrapping of the array as object
17:21:40zaharythere are 2 definitions of toUpper supplied by me
17:21:41zaharyproc toUpper(x: string): string
17:21:41zaharyproc toUpper(x: var string)
17:21:41zaharythe compiler notices that x.trim.toUpper.reverse matches the first overload, but it will nevertheless use the second one, because it can determine that x.trim produces a temporary value that can be mutated in place
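The two definitions zahary names could be sketched as below (distinct proc names are used here only to keep the sketch independent of var/non-var overload resolution, which is exactly what is under debate):

```nimrod
import strutils  # for toUpper(c: char)

# the ordinary value version: allocates a fresh string
proc toUpperCopy(x: string): string =
  result = newString(x.len)
  for i in 0 .. x.len-1:
    result[i] = toUpper(x[i])

# the in-place version the compiler would silently substitute when the
# argument is a temporary it owns, e.g. the result of x.trim
proc toUpperInPlace(x: var string) =
  for i in 0 .. x.len-1:
    x[i] = toUpper(x[i])
```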
17:22:09apriori|it makes sense.. because I could then "annotate" the matrix, e.g. as being trilinear, etc... and allow procs to use special algorithms then
17:22:15Araqwell, should a[x,y] mean a[x][y] or a[y][x]?
17:23:25Araqzahary: 2 definitions of toUpper is exactly the problem that I want to avoid
17:23:53zaharywell, my first thesis was that it's possible to derive the second definition from the first one
17:23:55Araqyour rule is easily done with a TR macro, no need to implement it in the compiler
17:24:14Araqbut I want to get rid of the 2 definitions
17:24:33Araqbecause it means a strutils of twice the size it is today
17:25:17Araqand I doubt it's possible to derive the second implementation from the first one
17:25:34apriori|Araq: [x][y] for column major
17:26:55Araqok
17:27:06Araqthat's easier to remember anyway
17:27:10apriori|indeed
17:28:27zaharyit certainly is, only not necessarily in an efficient way
17:28:27zaharyproc (x: var T) =
17:28:27zahary input = x.copy
17:28:27zahary … replace any occurrence of x with input in the original body
17:29:34Araqinteresting
17:30:04zaharybut that misses the point of reusing existing resources (which is a hard problem, I agree)
17:32:05zaharymy point is that snoop doesn't buy that much in these chained calls - you still have temporary objects and reallocations?
17:32:08Araqapriori|: there is a codegen bug returning a 'var array', watch out
17:32:22apriori|I dont even get to that now
17:32:31apriori|got some issues with [] for component access already
17:33:05apriori|[]= catches the assignment just fine.. but I dont get other stuff to work (and actually, only a proc that returns the ref should do, but that's not working)
17:33:20apriori|and I guess.. because [] is not allowed as a normal proc name? just a guess
17:33:43Araqdunno -.-
17:35:08apriori|yup, [] not allowed as normal non-op proc
17:38:44apriori|oh fuck
17:39:24fowlproc foo[]() = is foo() with no generics
17:39:41apriori|yeah
17:42:56apriori|hm
17:43:02apriori|this _might_ be a bug too
17:43:12apriori|in a function with a value type as its return type
17:43:20apriori|shouldnt result be actually "var T"?
17:43:26apriori|in the context of the function
17:43:49Araqwhat makes you think it isn't?
17:44:05apriori|wait a sec
17:45:06apriori|Araq: http://pastebin.com/jGGWiKb6
17:46:06Araqwhy do you provide the ': T' version?
17:46:18Araqand the '[]=' version?
17:46:28apriori|[]= is needed for the assigment
17:46:35apriori| result[i, j] = 0
17:46:36Araqthat's the bug then
17:47:24apriori|and []: T is needed for a[i, k] and b[k, j] in the line, because those are not var variables
17:48:02Araqhm
17:48:55AraqI see, we need overloading of 'var T' and 'T' I think
17:49:06apriori|yep
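For reference, the workaround being discussed needs both accessors; a rough sketch (the matrix type and the column-major mapping are illustrative choices, not the actual paste):

```nimrod
type
  TMat3 = object
    data: array[0..2, array[0..2, float]]

# read access, used for a[i, k] on the right-hand side of expressions
proc `[]`(m: TMat3, i, j: int): float =
  result = m.data[j][i]   # column-major: a[i, j] maps to a[j][i]

# write access, used for result[i, j] = 0; with 'var T' vs 'T'
# overloading a single accessor returning 'var float' could cover both
proc `[]=`(m: var TMat3, i, j: int, v: float) =
  m.data[j][i] = v
```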
17:49:24AraqC++ is actually much better designed than most people realize
17:49:37apriori|yeah, I figured :P
17:49:45apriori|because of ref etc...
17:50:09apriori|brb.. and thank you guys :)
17:57:06Araqzahary: I'm still not worried about elimination of temporaries, it'll be done once I implemented the optimizer I have in mind
17:57:58Araqespecially call chains are simple to optimize
17:58:24zaharyhow will it work?
17:59:16AraqGCSE with the effect analysis that I outlined days ago
17:59:41Araqplus copy elimination
18:00:01Araqtemporaries can be dealt with much like 'let' variables
18:00:01zaharycan you provide some example transformations?
18:00:11*Trixar_za is now known as Trix[a]r_za
18:00:26Araqessentially 'let' gives us an SSA representation
18:00:33zaharythis is not about eliminating chained assignments
18:00:49Araqthe lack of 'goto' also helps ;-)
18:01:47Araqzahary: but it is the same problem as register allocation, only simpler
18:01:56Araqas the number of registers is unbound
18:02:16Araqyou simply merge temporaries whose lifetime does not overlap
18:02:33zaharyit's about mutating values in place. I don't see how the low-level SSA optimisations can help here as the optimization makes use of high-level knowledge about the string type.
18:02:33zaharydescribe how x.trim.reverse will work?
18:03:02Araqwell if we have snooping procs:
18:03:20Araqtrim(x, tmp1)
18:03:28Araqreverse(tmp1, tmp2)
18:03:53Araqdest = tmp2
18:04:25Araq-->
18:04:42Araqtrim(x, tmp1); reverse(tmp1, dest)
18:05:13Araqand then ... ok, this needs more thoughts ;-)
18:05:56Araqdamn you :P
18:06:19zahary:)
18:07:16zaharybtw check out this. I was a little bit blown away / humbled by the description of what kind of optimisation is carried out by gcc
18:07:17zaharyhttp://stackoverflow.com/questions/10250419/why-does-gcc-generate-such-radically-different-assembly-for-nearly-the-same-c-co/10251097#10251097
18:13:23Araqand yet I never saw a compiler perform anything like that with string ops ;-)
18:14:20zaharywell, that would be because it requires high-level knowledge of the type as I said :)
18:14:46Araqwhich is why our optimizer should be concerned only with high level optimizations
18:15:17AraqI can't see anything expensive in fast_trunc_one btw, I wouldn't worry about it at all
18:15:33Araqit doesn't access memory
18:15:48Araqit's irrelevant ;-)
18:18:20zaharywhat amazed me is how the compiler figured out from sign = i & 0x80000000; that sign = -sign and then used that to make another non-trivial transformation
18:19:34AraqI saw the microsoft compiler transform a nontrivial recursion that implements integer add into the 'add' asm instruction :-)
18:19:59Araqthere was still an unnecessary 'jmp' involved iirc
18:20:09Araqbut the recursion had been completely eliminated
18:20:16Araqand no, it was *not* a tail call
18:21:21zaharyso we should leave the SSA stuff to these guys for a while :)
18:21:43Araqwell I don't plan to transform anything to SSA
18:22:02Araqbut we get it for free: let vars are in SSA
18:22:18Araqand every compiler introduced temporary is in SSA too
18:22:36zaharyyes, and the C compiler will happily take care of them
18:23:04zaharywe should strive to eliminate high-level code (calls to user copy procs, etc)
18:23:45Araqtrue but I still think copy elimination for ints and strings is essentially the same
18:24:03Araqexcept that you get it for 'int' for free in the C backend
18:24:06zaharysure
18:28:21Araqthe new profiler hooks work exactly like a "stop the world" mechanism could work
18:28:40Araqevery proc entry and every loop body is injected with a call to 'nimProfile()'
18:29:58Araqbut I have no idea how expensive that is as the profiler requires stack traces to be turned on
18:30:07Araqwhich are expensive
18:33:21zaharyyeah, I snooped the diff :) btw, it may be a worthwhile to support some existing report format
18:33:21zaharyhttp://google-perftools.googlecode.com/svn/trunk/doc/cpuprofile.html
18:33:21zaharythere are some visualization scripts with packages like these
18:33:49zaharymaybe dom96 could be interested in such a feature?
18:35:42Araqhe's away ;-)
18:36:28Araqbut we should do important work instead, I already played too long with it
18:38:12Araqbtw do you happen to know how fixed length software caches work?
18:39:48AraqI mean a count table which keeps the entries with the greatest count values
18:40:00zaharymost used?
18:40:04Araqyeah
18:40:24Araqbut it should only use a fixed amount of memory
18:40:47zaharyI've worked on something similar for textures - let me try to recall what was it all about
18:41:02Araqit can only work stochastically
18:41:12AraqI think
18:41:31zaharywell, you could go with priority queue, but that wasn't the solution then
18:46:08Araqa fixed length priority queue does not work as we don't know the priority upfront
18:46:36Araqthe priority is the usage counter which we don't know when profiling
18:49:15zaharydon't quite follow. priority queue will have log(n) updates on each usage.
18:50:19Araqyes but it's not of a fixed length
18:50:31Araqif we start with [1, 1, 1]
18:51:18Araqwell ok, let me elaborate: we have stack traces
18:51:53Araqand we count them
18:52:06Araqthe stack trace with the highest count is the critical path
18:52:41Araqso if we have 3 slots in our count table, we can keep track of 3 different stack traces
18:52:58Araqbut now a fourth stack trace is encountered
18:53:12Araqand we have to replace 1 slot somehow
18:53:13zaharyI looked at the old texture cache code and it's not overly sophisticated - it doesn't remove one item at a time, so purges are linear scans over the whole cache with some value chosen heuristically as the removal threshold (there is a global counter of texture usages and you can divide the delta of usages since the last purge to get the expected average usage)
18:53:45*shevy left #nimrod ("I'll be back ... maybe")
18:54:12zaharyotherwise, these caches are called LFU (least frequently used) - here is a paper that provides O(1) for all operations
18:54:13zaharyhttp://dhruvbird.com/lfu.pdf
18:54:14Araqbut whatever slot we pick, it could be that that's the stack trace that ends up getting the greatest count value
18:54:29zaharyuses hash table and linked lists
18:55:07AraqI'm not concerned about O(1), I'm concerned about a replacement strategy that guarantees say, "won't be wrong with p probability"
18:55:50Araqok, so I want a fixed size LFU ;-)
18:56:17zaharyyou can have a max size before starting to purge with the above algorithm
18:56:27zaharyseems simple to implement
18:56:27Araqwhich can only be done stochastically
18:57:47zaharythe algorithm is not stochastic in nature, but the results still won't be precise (some stack can be purged and placed back into the cache multiple times)
18:58:59zaharyI've supervised a memory profiler implementation that used stack traces as keys btw - let me try to recall how that worked too :)
19:01:34Araqcurrently I use a hash table that tracks the max chain length
19:02:29Araqif the table is full, one element on the hash chain is replaced
19:02:40Araqthe element with the minimal count
19:03:21Araqthe max chain length is used so that it doesn't compute the minimum over all slots
19:04:12zaharyhow do you get the element with the minimal count?
19:04:53Araqas I said, I do: for i in 0..maxChainLength-1: # follow the chain
19:05:19Araqwhen finding the element in the chain, it's not a new element and we can update its counter
19:05:41Araqbut we also keep the index of the minimum while traversing the chain
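Araq's replacement strategy, as described here, can be sketched roughly as follows (slot layout, hashing, and the fixed probe depth are made up for illustration; the real profiler tracks the maximum chain length dynamically):

```nimrod
const
  cacheLen = 1024   # must be a power of two for the masking below
  maxChain = 8      # fixed probe depth for this sketch

type
  TSlot = object
    trace: string   # the serialized stack trace
    count: int

var slots: array[0..cacheLen-1, TSlot]

proc hashTrace(s: string): int =
  result = 0
  for c in s:
    result = (result * 31 + ord(c)) and (cacheLen - 1)

proc record(trace: string) =
  var h = hashTrace(trace)
  var minIdx = h
  # follow the chain: on a hit bump the counter, on an empty slot
  # insert; otherwise remember the minimum-count slot seen so far
  for i in 0 .. maxChain-1:
    var idx = (h + i) and (cacheLen - 1)
    if slots[idx].trace == trace:
      inc(slots[idx].count)
      return
    if slots[idx].count == 0:
      slots[idx].trace = trace
      slots[idx].count = 1
      return
    if slots[idx].count < slots[minIdx].count:
      minIdx = idx
  # chain exhausted: evict the entry with the minimal count
  slots[minIdx].trace = trace
  slots[minIdx].count = 1
```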
19:05:52zaharythe algorithm from the paper just adds a linked list next to the table that is updated with insertion sort - since the updates always change the usage count by 1, you know that the insertion sort will have O(1) cost
19:06:43Araqthe problem is: every new element starts with counter 1
19:07:12Araqbut it still needs to be added because it could become the critical stack trace
19:07:39zaharyand after some time, the newcomers are always the prime candidates for purging?
19:08:17Araqno ;-)
19:08:28AraqI keep no "age"
19:08:39Araqonce added it's treated equally to everything else
19:08:47Araqhrm, that sounds stupid
19:08:54Araq:D
19:09:26Araqoh well the table is big enough anyway
19:09:43Araqthe code dealing with purging is never executed in practice
19:09:55zaharyso what's the problem again?
19:10:04zaharynewcomers start with 1, so?
19:10:07AraqI'm interested in what's out there
19:10:39Araqwhat I've implemented works good enough already
19:10:40zaharyyou can start newcomers as the current minimum so they don't get unfair disadvantage
19:11:27Araqthat could work, yes
19:11:33zaharyotherwise, O(1) sounds like you can't get better than this
19:12:39Araqas I said I'm interested in some numbers about how probable it is that the critical stack trace will get purged incidentically
19:14:02zaharyaha
19:15:22zaharybtw, most profilers I've worked with don't seem to purge anything (I'm not entirely sure tho) - you see functions that were called only once in the report
19:15:39Araqyeah I'm aware ;-)
19:16:07Araqbut it seemed risky to let it allocate
19:18:31Araqnow it allocates anyway, but I can easily change that
19:21:51zaharybtw I've seen another inaccurate strategy used in some intrusive profilers
19:23:26zaharyyou don't insert the real stack traces in a cache, but rather assign a single record for each proc (this record resides in a static variable, so it's very cheap to update)
19:23:35*XAMPP joined #nimrod
19:24:05zaharyso for functions with low fan-in it works relatively accurately, but for high fan-in you get results that are not very useful
19:24:45AraqI can't follow
19:24:58Araqthere is a static variable per proc, ok
19:25:12Araqwhat does the variable contain? a counter?
19:25:26zaharyand you just measure the time spent in that proc and its children
19:25:44Araqthat's exactly what my old profiler did :-)
19:25:55zaharyaha, ok
19:26:13Araqbut it was as useful as gprof
19:26:17zaharyin practice, when you get to pick your own functions to profile, the inaccuracies potentially don't matter that much
19:26:49zaharyhmm, I disagree with the comparison with gprof
19:27:04Araqso why reinvent gprof? the name mangling is not bad enough to justify it
19:28:03zaharyyou still get meaningful nested report like this:
19:28:03zaharyreadNetwork 10%
19:28:03zaharyupdateWorld 20%
19:28:03zahary simulatePhysics 14%
19:28:04zahary runAI 5%
19:28:04zahary runMobs 3%
19:28:05zaharyrenderWorld 40%
19:28:26zaharygprof only shows leaves
19:28:41Araqno, it also shows total time
19:29:01AraqI think ...
19:29:05zaharybut not for functions up the stack
19:29:17zaharyit just looks at the program counter to see in which function you are currently in
19:32:01Araqhttp://www.linuxselfhelp.com/gnu/gprof/html_chapter/gprof_5.html
19:32:17Araq"the call graph"
19:33:30zaharythe disclaimer is that I haven't actually used it, but that's how people explain it usually:
19:33:30zaharyhttp://stackoverflow.com/questions/6328673/which-is-the-most-reliable-profiling-tool-gprof-or-kachegrind
19:33:30zaharyI've experience only with sampling profilers
19:38:21Araqok
19:38:36Araqbut gprof is also a "sampling" profiler, isn't it?
19:39:45Araqbtw profiling shows the copyTree calls for semchecking of calls are really expensive
19:40:31Araqbut we can get rid of them once I rewrote sigmatch with all the changes we have in mind
19:41:54zaharyyou mean the copyTree in DirectOp?
19:42:41Araqthere is also a copyTree for each arg in sigmatch
19:44:05zaharywhen I tested it in 4 different profilers I got consistent results, but the copyTree in DirectOp at least was very cheap
19:44:19apriori|Araq: can you give me a simple example for a tr macro which wants to replace a call to a generic function (not a specific one) with something?
19:45:37Araqthe copyTree produces much GC pressure I think
19:46:52zaharyor the profiler overestimates recursive procs
19:46:58Araqapriori|: template t{f(X)}(f: expr, X: varargs[expr]): expr = g(X)
19:47:27Araqno because the stack trace did not show copyTree as the leaf
19:47:35zahary… small recursive procs - the issue with intrusive profilers I've brought before
19:47:38Araqit showed it as the *reason* :P
19:47:46apriori|Araq: I meant a function with a generic parameter
19:48:02apriori|Araq: and a named function.. not all possible calls
19:48:08Araqapriori|: it's eliminated
19:48:15apriori|hm?
19:48:35apriori|so I can only match on specific calls to a generic function or what do you mean?
19:48:47Araqtemplate f{nameHere(X)}(X: varargs[expr]): expr = g(X)
19:49:08AraqnameHere[int, int](a, b, c) # should be replaced with:
19:49:14Araqg(a, b, c)
19:49:33apriori|hm, okay, thank you
19:52:19Araqbtw even easier and should also work: template t{nameHere}: expr = g
19:53:02apriori|well, let me be more precise:
19:53:31apriori|template optimizeCrossOfSameVector{cross(a, b)}(a: expr{alias}, b: expr) = [ 0, 0, 0 ]
19:53:37apriori|the problem here, is:
19:54:13apriori| I need to know the actual type of an element in a or b so I can properly construct that array literal
19:54:42Araqa aliases b?
19:54:50apriori|or vise versa
19:57:36Araqproc default[T](): T {.inline.} = nil
19:57:59Araqtemplate ... = default[type(a)]()
19:58:48Araqyou can also construct the [] in a macro but why bother
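Putting the pieces together, a hedged sketch of the special case apriori| ends up implementing (`cross` is defined here just to give the pattern a concrete target; the simpler `cross(a, a)` pattern is the one mentioned later in the log):

```nimrod
proc cross(a, b: array[0..2, float]): array[0..2, float] =
  result[0] = a[1]*b[2] - a[2]*b[1]
  result[1] = a[2]*b[0] - a[0]*b[2]
  result[2] = a[0]*b[1] - a[1]*b[0]

# cross(a, a) is always the zero vector, so the TR macro rewrites the
# call to a literal; the pattern only fires when both arguments are
# syntactically the same expression
template optimizeCrossOfSameVector*{cross(a, a)}(a: expr): expr =
  [0.0, 0.0, 0.0]
```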
20:01:47fowlif you have a generic ref, how do you set the free proc
20:01:57fowlthis is what i tried https://gist.github.com/3734130
20:02:39Araqnew(result, free2[A])
20:03:38fowlah
20:05:59apriori|Araq: btw., are tr macros automatically active?
20:06:13Araqyeah
20:06:25apriori|hm
20:11:52apriori|Araq: what is the scope of tr macros?
20:12:20Araqeverything in the same module after declaration
20:12:30apriori|no way to export it?
20:12:34Araqthey don't respect scope
20:12:40Araqexport with *
20:13:01apriori|grrr...
20:13:15apriori|actually obvious o_O
20:21:08apriori|Araq: alias check doesn't seem to work (or works different than I think):
20:21:15apriori|template optimizeCrossOfSameVector*{cross(a, b)}(a: expr{alias}, b: expr) : expr =
20:21:16apriori| [ 0.0, 0.0, 0.0 ]
20:21:34apriori|this should actually only replace, if a is the same array as b
20:21:51apriori|I now went for the simpler pattern cross(a, a)(a: expr)
20:26:32Araqstrange
20:27:15apriori|I guess tomorrow is bug filing day ;)
20:27:28Araqyeah
20:30:27zaharyAraq, how much of the total time is spent in copyTree according to the built-in profiler?
20:30:52AraqI don't know, I'm improving it and will measure again soon
20:31:16zaharyI did some profiles with Instruments on mac os and Very Sleepy on windows so I can compare
20:31:46Araqwell I can't do "total time spent in"
20:32:05AraqI can only do "frequently encountered stack traces"
20:33:42*moxie quit (Remote host closed the connection)
20:34:47zaharyis there percent of the samples for which it was in the trace?
20:35:08Araqyeah, just wait a sec ;-)
20:41:38Araq copyTree 36/1992 = 1.8%
20:41:52Araqok, can't reproduce now that I improved the profiler
20:42:51Araqwell it's still significant, but not the worst offender,
20:42:55Araqthe stack trace is:
20:42:58AraqEntry: 9/770 Calls: 4/879 = 0.46% [sum: 77; 77/879 = 8.8%]
20:43:00Araq add 11/1992 = 0.55%
20:43:02Araq addZCT 4/1992 = 0.20%
20:43:04Araq doOperation 5/1992 = 0.25%
20:43:06Araq nimGCvisit 5/1992 = 0.25%
20:43:07Araq forAllChildren 5/1992 = 0.25%
20:43:09Araq CollectZCT 8/1992 = 0.40%
20:43:10Araq collectCTBody 17/1992 = 0.85%
20:43:12Araq collectCT 17/1992 = 0.85%
20:43:13Araq rawNewObj 14/1992 = 0.70%
20:43:15Araq newObj 16/1992 = 0.80%
20:43:16Araq newNode 8/1992 = 0.40%
20:43:18Araq copyTree 36/1992 = 1.8%
20:43:19Araq matchesAux 24/1992 = 1.2%
20:43:21Araq matches 23/1992 = 1.2%
20:43:22Araq resolveOverloads 35/1992 = 1.8%
20:43:24Araq semOverloadedCall 44/1992 = 2.2%
20:43:25Araq ... 96/1992 = 4.8%
20:43:27Araq CommandCompileToC 101/1992 = 5.1%
20:43:29Araq MainCommand 101/1992 = 5.1%
20:43:30Araq HandleCmdLine 101/1992 = 5.1%
20:44:14Araqthe CRC stuff wins again
20:44:31AraqEntry: 0/770 Calls: 21/879 = 2.4% [sum: 21; 21/879 = 2.4%]
20:44:32Araq updateCrc32 4/1992 = 0.20%
20:44:34Araq newCrcFromRopeAux 3/1992 = 0.15%
20:44:36Araq crcFromRope 3/1992 = 0.15%
20:44:43Araqis the top entry
20:44:50zaharyhmm, I'm getting much more actually, weird
20:45:40zaharyI'm getting 12.5% now - indeed mostly because of GC functions after it
20:45:53AraqI'm using -d:release --stackTrace:on
20:46:24Araqwell the copyTree is really bad in any case:
20:46:29Araqthink about nested calls
20:46:39Araqwe copy them again and again
20:46:54Araqwe copy the whole call
20:47:04Araqand each argument again
20:47:09zaharymy profile says the worst copyTrees are these in matchesAux
20:47:32Araqthat's what I'm talking about, yeah
20:47:33zaharythis makes sense as they are performed once per-candidate
20:48:06AraqEntry: 10/770 Calls: 4/879 = 0.46% [sum: 81; 81/879 = 9.2%]
20:48:08Araq genericAssignAux 45/1992 = 2.3%
20:48:10Araq genericAssignAux 45/1992 = 2.3%
20:48:11Araq genericAssignAux 45/1992 = 2.3%
20:48:13Araq genericAssignAux 45/1992 = 2.3%
20:48:15Araq genericAssignAux 45/1992 = 2.3%
20:48:16Araq genericAssignAux 45/1992 = 2.3%
20:48:18Araq genericAssignAux 45/1992 = 2.3%
20:48:19Araq genericAssignAux 45/1992 = 2.3%
20:48:21Araq genericAssignAux 45/1992 = 2.3%
20:48:22Araq genericAssign 6/1992 = 0.30%
20:48:24Araq resolveOverloads 35/1992 = 1.8%
20:48:25Araqthis is also pretty bad
20:48:35AraqI'm not even sure where the call to genericAssign comes from
20:48:56zahary1.8% total for genericAssignAux here
20:49:39Araqwell I take it that means my profiler works
20:50:27Araqbut it's not surprising really, calls and nkDotExpr are everywhere
20:50:36Araqso it's important to optimize these
20:51:18Araqit doesn't make sense to optimize semYield because there are so few 'yield' statements in Nimrod code ;-)
20:52:17Araqoh and I compute the percentages wrong :D
20:52:32zaharyhow much do you get for collectbody ?
20:53:15AraqcollectCTBody 17/1992 = 0.85%
20:53:32Araqbut I'm not sure how to compute it
20:53:44Araqthe 1992 refers to the total stack slots
20:53:57Araqbut it should be for unique stack slots, right?
20:54:18zahary1992 total slots in your cache table?
20:54:42zaharyit should be real active slots
20:55:00Araqno 1992 slots in total
20:55:15Araq879 stack traces
20:55:53zaharyso what's 1992? capacity (that's what I meant)
20:56:32Araqwell it occurs in 12 stack traces out of 49
20:56:58Araq1992 is the length of all stack traces together I think
20:57:29zahary12/49 is much closer to what I measure
20:57:46Araqyeah, I should record that instead
20:57:52zaharyI get 22.3
20:57:55zahary%
20:58:23Araqclose enough :-)
20:58:48zaharyyou don't have functions like setjmp or memcpy on your radar, right?
20:59:04zaharyno way to get to them in iterruption point
20:59:20Araqthat doesn't matter, does it?
20:59:35Araqit will be attributed to the proc calling these
21:00:17zaharyyes
21:19:31Araqit's now collectCTBody 67/787 = 8.5%
21:19:55Araqit occurs in 67 out of 787 stack traces
21:20:10Araqhowever, that includes stack traces which occur only once
21:20:20Araqdunno if that's the correct way to count it
21:20:38Araqa stack trace that occurs only once is random noise ...
21:21:16Araqhowever only 44/787 occur more than once ...
21:22:29zaharywhy random noise - a stack trace that occurs once is just half as good as a stack trace that occurs twice
21:23:14Araqhm
21:23:50AraqI am thinking about init time
21:24:06Araqoh, so it's once in processCmdLineArgs() ...
21:26:12Araqnah, it's wrong
21:26:28Araqif it occurs in a stack trace that occured 3x
21:26:35Araqit should have weight 3
21:28:55zaharyif there is 100 traces collected at equal intervals of time and proc X is present on 16 of them, then it's likely that 16% of the time of program can be eliminated by deleting the proc X
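zahary's weighting rule amounts to the following computation (a sketch; the trace storage format and proc name are assumptions for illustration):

```nimrod
# each entry pairs the procs on one recorded stack trace with how many
# times that exact trace was sampled
proc share(traces: seq[tuple[procs: seq[string], count: int]],
           p: string): float =
  var total = 0
  var hits = 0
  for t in traces:
    inc(total, t.count)
    if p in t.procs:        # weight 3 for a trace sampled 3x
      inc(hits, t.count)
  result = float(hits) / float(total)
```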
21:29:38Araqnow that's a useful statement
21:30:18Araqso weight 3 for a stack trace that occured 3x is correct, right?
21:31:15zaharywell, what do you do with that weight?
21:31:24Araqsum over it
21:31:34zaharyok, then it's right
21:31:55AraqI get: collectCTBody 77/789 = 9.8% then
21:32:25Araqnah arghhh
21:32:35Araqit's too late already
21:33:05Araqit should be 900 cause that's the number of samplings
21:33:46Araqmaking it 8.5%
21:33:59Araqthat's not in line with your results, is it?
21:38:35zaharyI'm getting 22% on both systems
21:39:14zaharywhat happens when you change the sampling frequency?
21:41:39Araqwell it's a bit weird
21:41:46Araqthe frequency is 5ms
21:42:06Araqand the program runs for 5.1 secs
21:42:17Araqwhich means I should get 1000 samples
21:42:24Araqbut I get only 899 samples
21:42:50Araqwhich is suspicious
21:44:02AraqI guess sampling itself is that expensive
21:46:22Araq3332 samples for 1ms frequency :-/
21:47:07Araq collectCTBody 278/3332 = 8.3% now
21:47:36Araq semExpr 1475/3332 = 44.% btw
21:50:21zaharywhat's 60% and more?
21:50:29zaharybtw, there is big difference between clean build and rebuild
21:50:50zaharyI usually benchmark the rebuild so the C compiler will be out of picture
21:50:57Araqso do I
21:51:12Araqkoch boot -d:release --profiler:on --stackTrace:on
21:51:16zaharywell, semExpr is 80+% for me
21:52:24AraqCommandCompileToC 3332/3332 = 1.0e+02%
21:52:29Araqaka 100%
21:53:52Araqwell I know why it's only 44% for semExpr
21:54:03AraqI don't record whole stack traces
21:54:16Araqonly 20 entries are in a stack trace
21:54:30zaharyaha, I was about to ask this on the first result for collect
21:54:44Araqso if semExpr is in the "..." section, it is not recorded
21:55:12Araqlast 3 entries are always the stack bottom
21:55:22Araqso it's accurate for CommandCompileToC
21:59:43Araq lookupInRecordAndBuildCheck 136/3332 = 4.1%
22:02:33zahary7% - it's consistently half the time
22:03:02Araqsame as semexpr applies for it I guess
22:03:21Araq newCrcFromRopeAux 184/3332 = 5.5%
22:03:40zahary6.5%
22:03:50AraqcrcFromFile 57/3332 = 1.7%
22:04:12zahary0.8%
22:04:29zaharybut I have SSD disk
22:04:45Araqshould be in RAM anyway
22:05:00Araqbut maybe linux's caching isn't as good as I think it is
22:05:26Araq copyTree 387/3332 = 12.%
22:05:41zaharyyep, me too
22:06:10zaharyrawalloc is leaf for me 84% of the time
22:06:15zaharyshould be shallow for you too
22:06:31Araqindeed it is
22:06:46zahary10%
22:07:07AraqnewObj 423/3332 = 13.%
22:07:47Araq newObjRC1 89/3332 = 2.7%
22:07:48Araq newSeqRC1 114/3332 = 3.4%
22:08:03zaharynewObj 23.3%, leaf only 14%
22:08:37zaharynewSeq 6.8%
22:08:45Araqhuh, how can this ever be a leaf?
22:09:06zahary6.8% total time
22:09:37zaharyany function can be leaf if the profiler interrupted execution somewhere within the body
22:10:01AraqI see
22:10:43Araqso it's accurate to within a factor of 2 :-)
22:10:46zaharytry raising the depth limit to see whether these 2x discrepancies are indeed coming from it
22:11:07Araqgood idea
22:12:34Araq semExpr 2041/3349 = 61.% with a depth limit of 35
22:12:58Araqso I guess it'll become your number if I'd increase it further
22:13:24AraqnewObj 427/3349 = 13.%
22:13:40Araqbut it's always near top anyway
22:13:58AraqcopyTree 373/3349 = 11.% same
22:16:47zaharyhow much do you get for rawalloc?
22:17:01Araqcan't measure it
22:17:20zaharyit has profiling off?
22:17:28zaharyit's a nimrod proc, no?
22:17:38Araqyeah but I turned it off
22:19:19Araq semExpr 2041/3349 = 61.%
22:19:20Araq semExprWithType 1075/3349 = 32.%
22:19:33Araqthis is interesting, I would have guessed they are the same
22:19:33*Trix[a]r_za is now known as Trixar_za
22:21:18zaharyI get the same
22:22:19zaharyif I break down semExpr, I can see that evalImport is called from it, which in turn carries all the passes on the module (sem, cgen, etc)
22:22:33zaharyso that's how it takes all the time
22:22:44Araqah I see
22:23:10Araq newObj 427/3349 = 13.%
22:23:12Araq ropef 249/3349 = 7.4%
22:23:14Araq genTraverseProc 28/3349 = 0.84%
22:23:48zaharyropef 8.9%
22:23:49zaharygenTraverse 0.2%
22:24:24zaharynewObj 23% - this is the last mystery
22:25:13Araqis it? maybe it's less severe when everything runs slower due to the stack tracing
22:26:20AraqscanComment 15/3349 = 0.45%
22:26:53zaharyscanComment 0.2%
22:27:23*Trixar_za is now known as Trix[a]r_za
22:28:45*Trix[a]r_za is now known as Trixar_za
22:32:05Araqbut I have to sleep now, good night
22:32:31Trixar_zaGoodnite Araq
22:53:32*zahary quit (Quit: Leaving.)
22:54:53*q66 quit (Quit: Quit)
23:00:13*apriori| quit (Remote host closed the connection)