01:12:42 | * | Trixar_za is now known as Trix[a]r_za |
06:47:58 | * | Boscop quit (*.net *.split) |
06:47:59 | * | ccssnet quit (*.net *.split) |
06:48:00 | * | reactormonk quit (*.net *.split) |
06:48:01 | * | mal`` quit (*.net *.split) |
06:48:01 | * | fowl quit (*.net *.split) |
06:48:01 | * | Roin quit (*.net *.split) |
06:48:01 | * | comex quit (*.net *.split) |
06:48:01 | * | silven quit (*.net *.split) |
06:48:02 | * | CodeBlock quit (*.net *.split) |
06:48:02 | * | Owner quit (*.net *.split) |
06:48:02 | * | moxie quit (*.net *.split) |
06:48:03 | * | Amrykid quit (*.net *.split) |
06:48:05 | * | rking quit (*.net *.split) |
06:48:05 | * | JStoker quit (*.net *.split) |
06:48:05 | * | dom96 quit (*.net *.split) |
06:48:05 | * | Araq quit (*.net *.split) |
06:48:05 | * | Trix[a]r_za quit (*.net *.split) |
06:48:07 | * | XAMPP quit (*.net *.split) |
06:48:08 | * | Reisen_ quit (*.net *.split) |
06:48:08 | * | shevy quit (*.net *.split) |
07:11:42 | * | moxie joined #nimrod |
07:11:42 | * | XAMPP joined #nimrod |
07:11:42 | * | shevy joined #nimrod |
07:11:42 | * | Owner joined #nimrod |
07:11:42 | * | reactormonk joined #nimrod |
07:11:42 | * | Reisen_ joined #nimrod |
07:11:42 | * | silven joined #nimrod |
07:11:42 | * | Amrykid joined #nimrod |
07:11:42 | * | mal`` joined #nimrod |
07:11:42 | * | fowl joined #nimrod |
07:11:42 | * | rking joined #nimrod |
07:11:42 | * | Araq joined #nimrod |
07:11:42 | * | JStoker joined #nimrod |
07:11:42 | * | Roin joined #nimrod |
07:11:42 | * | dom96 joined #nimrod |
07:11:42 | * | CodeBlock joined #nimrod |
07:11:42 | * | Trix[a]r_za joined #nimrod |
07:11:42 | * | comex joined #nimrod |
07:11:45 | * | Boscop joined #nimrod |
07:11:46 | * | ccssnet joined #nimrod |
09:45:33 | * | moxie quit (Ping timeout: 276 seconds) |
09:47:00 | dom96 | hello |
09:47:05 | Araq | hi dom96 |
09:47:17 | dom96 | Araq: Did you not read scrollback? |
09:47:51 | Araq | I think I did, why? |
09:47:59 | dom96 | graphics.nim/DrawLineAA -- sdl_gfx (as fowl pointed out) contains a function which does this. |
09:48:14 | dom96 | Should we alias DrawLineAA to that or remove DrawLineAA altogether? |
09:48:19 | Araq | so what, native nimrod code is nicer |
09:48:38 | dom96 | Yeah, but it's incorrect :P |
09:48:44 | dom96 | it doesn't work half the time |
09:48:46 | Araq | sdl_gfx is an additional package? |
09:49:00 | Araq | well why does it not work? |
09:53:42 | dom96 | it fails on some coordinates for some reason |
10:02:29 | dom96 | Araq: Did you see my reminder that we should test everything in taint mode? :P |
10:05:42 | Araq | the tester itself is compiled in taint mode |
10:05:49 | Araq | so there is some checking |
10:06:08 | Araq | however, ftpclient etc. indeed need to be tested in taint mode |
10:08:46 | dom96 | mmm |
11:20:47 | * | apriori| joined #nimrod |
11:21:00 | apriori| | hi guys |
11:21:04 | Araq | hi apriori| |
11:21:15 | apriori| | is it me being unable to count, or is the user count actually increasing? |
11:21:35 | Araq | increasing |
11:21:41 | apriori| | good :) |
11:21:55 | apriori| | Araq: I've got a question concerning your GC |
11:22:03 | Araq | alright |
11:22:21 | apriori| | now I'm definitely no expert on this topic.. but you said, slices are bad for it because of the interior pointers they represent |
11:22:28 | Araq | yeah |
11:22:36 | apriori| | aren't there actually GC adaptations that can deal with it in a sane way? |
11:23:10 | apriori| | yesterday I found a paper, for example, about that exact topic.. though I don't quite know whether that is supposed to be used with your GC variant |
11:23:49 | apriori| | funnily enough, even though apparently being from after 2006 it references tons of papers from the mid 70s |
11:23:56 | apriori| | http://benediktmeurer.de/files/fast-garbage-compaction-with-interior-pointers.pdf |
11:26:16 | Araq | you can do quite a lot to keep the overhead negligible |
11:26:29 | Araq | just as you can to get away with only 1 pointer type like D does |
11:27:38 | Araq | but it seems much wiser to disallow both |
11:28:08 | Araq | Go and D both start with an incredibly permissive memory model |
11:28:27 | Araq | and GC performance is left as an implementation detail |
11:28:34 | Araq | this is not Nimrod's design at all |
11:29:31 | apriori| | while I can agree with that... I think, since I also used D for quite a while, slices offer a really convenient way to deal with subranges in an array ... |
11:30:12 | apriori| | and that efficiently, without copying, etc. |
11:31:29 | Araq | walter claims D's design allows for a precise copying collector |
11:31:47 | Araq | until that is actually implemented and works (!) I doubt it |
11:32:54 | apriori| | I understand... well, the GC is definitely a major topic in D, which the devs seem to have real problems with... |
11:33:12 | apriori| | the D community devs have been complaining about the GC for a long time already .. |
11:34:46 | apriori| | I guess for the matrix stuff (which is my current motivation why I'd like to have efficient slices) I will just stick to pointer + offsetrange and create respective function overloads |
11:35:06 | apriori| | Araq: did you happen to see eigen3? the linear algebra library? |
11:35:45 | Araq | adrian benchmarked slices and on a modern CPU ptr+offset is as efficient as 'ptr' only |
11:36:07 | Araq | so you can do the slices just fine and keep the base pointer around for the GC |
11:36:17 | apriori| | yeah, that's what I meant |
11:36:26 | apriori| | so you won't have interior pointers |
11:37:21 | apriori| | well, about eigen3... I absolutely like its design. it's a C++ header-only library, which uses a hell of a lot of expression template magic to create a really convenient API |
11:37:22 | Araq | btw the GC supports interior pointers on the stack |
11:37:35 | apriori| | hm |
11:37:46 | Araq | and it caused huge pains |
11:37:56 | apriori| | what causes huge pains? |
11:38:05 | Araq | and it took me a month to get the overhead down to almost 0 |
11:38:15 | apriori| | oh, ok.. the interior pointers |
11:38:50 | Araq | the GC has to support it because we lack the control over the generated assembler code |
11:38:51 | apriori| | so it's fine to pass ptr+range into a function, calculate that down to pointers there and use these... |
11:38:58 | apriori| | or what would be the real consequence of that? |
11:39:20 | Araq | I'm just explaining why I hate them |
11:39:25 | apriori| | okk |
11:39:26 | apriori| | ok |
11:39:45 | apriori| | about lack of control... |
11:39:59 | apriori| | do you plan to write a real backend sooner or later? |
11:40:03 | apriori| | like using llvm or so? |
11:40:17 | Araq | not anymore |
11:40:29 | Araq | at least not before 1.0 is out |
11:40:40 | apriori| | yeah, I understand |
11:41:04 | apriori| | well, some languages also dealt quite well with compiling down to C for a while until a real backend was developed |
11:41:07 | apriori| | (haskell?) |
11:41:20 | Araq | eiffel still does that |
11:41:36 | apriori| | in a way it really makes sense... |
11:41:37 | Araq | though I don't know about the "dealt quite well" ;-) |
11:41:50 | apriori| | one automatically inherits all improvements to the compilers |
11:42:14 | apriori| | well, I think.. one problem is the lack of proper debugging |
11:42:39 | Araq | I plan to patch the generated debug info |
11:42:44 | apriori| | if one compiles down to C, one would at least need a plugin for the debugger to "demangle" (if one can call it that) the generated code |
11:42:48 | Araq | so that it shows demangled names |
11:43:03 | apriori| | yeah, that would be nice |
11:43:17 | apriori| | I like the edb idea though... |
11:43:19 | Araq | on the other hand, I can easily debug the generated C already |
11:43:29 | apriori| | at least one debugger that would be exactly the same, everywhere |
11:43:57 | Araq | edb is nice, but it's often too slow |
11:44:10 | apriori| | and needs a recompile of the source |
11:44:22 | Araq | plus having the debugger in the same address space is insane ;-) |
11:44:27 | apriori| | hehe, yeah |
11:44:42 | apriori| | one crash/messing with the registers tearing it all down |
11:46:58 | apriori| | btw., about the snippets of foldl/foldr etc. I once showed you... |
11:47:04 | Araq | btw one motivating example for the TR macros is copy elimination for slices |
11:47:16 | apriori| | actually it should be possible to optimize foldl because it is a tail call |
11:47:37 | Araq | yeah one of the 4 was a tail call iirc |
11:48:12 | Araq | but I dislike tail calls sometimes ;-) |
11:48:17 | apriori| | why? |
11:48:30 | fowl | can i set up a tr so that array[#, type] expands to array[0..#-1, type] |
11:48:56 | Araq | fowl: perhaps, I guess not |
11:49:21 | Araq | we could add it to the language if it constantly annoys you though |
11:49:29 | apriori| | I would second that |
11:49:39 | Araq | I know ;-) |
11:49:49 | fowl | # would be an int literal only |
11:49:50 | apriori| | it is a nice feature to allow a range in there (btw... is a set also allowed?) |
11:50:03 | fowl | apriori|: you can use enums |
11:50:10 | apriori| | fowl: yup, I know that |
11:50:15 | apriori| | I meant an ordinal type with "holes" in it |
11:50:35 | Araq | no, how could that work? |
11:50:42 | apriori| | only via mapping |
11:50:55 | Araq | that's not an 'array' then |
11:51:05 | Araq | use a table for that |
11:51:13 | apriori| | okay |
11:51:40 | Araq | fowl: if it's important enough for you, make a feature request |
11:51:40 | apriori| | Araq: about using # instead of a...b |
11:51:56 | apriori| | I think one actually rarely uses the a..b special case, in which indexing doesn't start with 0 |
11:52:12 | Araq | about the tail calls: principle of least power; recursion can do much more than a simple loop |
11:52:36 | apriori| | sure.. but recursion is also convenient ;) |
11:52:44 | Araq | so a loop is easier to read when you care about performance |
11:53:17 | Araq | there are also no tail calls in debug mode due to the stack tracing |
11:53:23 | apriori| | ouch |
11:53:45 | apriori| | that could make code run a hell of a lot slower in debug mode ;) |
11:54:17 | Araq | yeah |
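The foldl case mentioned earlier can be written both ways; a minimal sketch, with proc names and signatures invented for illustration (not stdlib):

```nim
# tail-recursive foldl: the recursive call is the last thing that
# happens, so it could become a jump -- but, per the discussion,
# not in debug mode where stack tracing is active
proc foldlRec[T](s: seq[T], f: proc (a, b: T): T, acc: T, i = 0): T =
  if i >= len(s): return acc
  return foldlRec(s, f, f(acc, s[i]), i + 1)

# the loop form needs no tail-call optimization at all, which is
# the "principle of least power" point made above
proc foldlLoop[T](s: seq[T], f: proc (a, b: T): T, acc: T): T =
  result = acc
  for x in items(s):
    result = f(result, x)
```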
11:54:32 | Araq | the a..b special case is more common if the index type is an enum |
11:54:39 | apriori| | yep |
11:54:56 | apriori| | but for "normal" arrays it's not |
11:55:06 | Araq | but there are also lots of people who prefer 1..N |
11:55:22 | apriori| | yeah, well I currently constantly write using low/high only.. |
11:55:34 | Araq | however they have a hard time already as strings and sequences start with 0 .. |
11:55:56 | apriori| | and with the matrix multiplication for example, I would even have to make sure that the len of the index space is identical, no matter how wickedly it was set up |
11:56:11 | apriori| | not quite sure though, whether that's actually a good idea there |
11:56:26 | Araq | you'll soon write m.first .. m.last everywhere |
11:56:33 | apriori| | so currently I work more with low(A)+offset and low(B)+offset etc. |
11:56:41 | Araq | to support efficient slicing ... |
11:57:42 | apriori| | okay :) |
12:00:07 | Araq | well if you want efficient slicing that is |
12:00:21 | Araq | I'm not sure it's useful all the time |
12:00:47 | apriori| | one needs exact control over when copies are done and when not |
12:01:27 | apriori| | and, well... for linear algebra stuff efficient slicing definitely rocks |
12:02:00 | Araq | how can you program efficient generic matrices anyway? 4x4 vs. 100x100 is a big difference, right? |
12:02:26 | apriori| | sure it is |
12:02:42 | apriori| | with bigger matrices one would switch over to non-full representations |
12:02:49 | apriori| | coordinate matrix, etc. |
12:03:05 | Araq | yep |
12:03:13 | Araq | so what should the stdlib support? |
12:03:39 | apriori| | but those assume that matrices are actually sparse.. and there also exist cases in which they simply aren't.. and sparse representations of full matrices are of course a hell of a lot slower than simple 2-dim arrays (or 1 flat array) |
12:03:54 | apriori| | the stdlib should of course support both |
12:04:30 | apriori| | and I also wouldn't include some automatic logic to switch over to e.g. sparse matrix for bigger matrices |
12:04:41 | apriori| | that would be a bad idea |
12:05:23 | apriori| | because usually one implements coordinate matrices as a map of a triple (i, j, V), keeping it sorted by one coordinate |
12:05:52 | apriori| | and this would result in insertion order being very important for the construction performance of the sparse matrix |
12:07:18 | apriori| | the stdlib should also make sure, that the matrix representation is compatible with other stuff out there... |
12:07:46 | apriori| | e.g. column-major in its representation for passing it directly without transposing into OpenGL |
12:08:39 | Araq | ah, didn't know about that |
12:09:34 | apriori| | yeah, well.. I guess that column major stuff is a relic of the early fortran times, though I really don't know where that convention comes from |
12:10:24 | Araq | because it's "intuitive"? a[x, y] == a[x][y] |
12:11:10 | apriori| | yeah, I guess so |
12:11:25 | apriori| | but it constantly confuses me.. because.. AFAIK C++ is row-major |
12:11:40 | Araq | yes |
12:12:09 | Araq | in fact, I think of an array of strings to get it right ;-) |
12:12:18 | apriori| | hehe |
12:14:12 | apriori| | I'm not quite sure, whether we should later add support for common BLAS and *PACK packages |
12:14:33 | apriori| | I mean.. they are widely used, yes.. but hell, their API sucks so bad, one definitely would need to wrap them. |
12:15:06 | Araq | btw I've implemented a simple stack trace profiler :D |
12:15:08 | apriori| | BLAS is like a library for basic operations on the types (e.g. vector operations) and usually it is replaced on the ABI level |
12:15:30 | Araq | the "embedding" part makes much more sense for a profiler than a debugger |
12:15:47 | apriori| | yeah, when I read that commit message I thought: "wtf.. didn't he just write 'I guess I'll write one on my own' this very evening?!" |
12:16:08 | Araq | yeah, didn't feel like hunting bugs ;-) |
12:16:10 | apriori| | agreed.. but such a profiler should definitely support sampling |
12:16:42 | Araq | what do you mean? it already does |
12:17:17 | apriori| | with sampling I actually mean not counting every instruction one by one.. which usually results in quite slow behaviour |
12:17:27 | apriori| | but only take a sample of an "area of interest" |
12:17:38 | Araq | {.push profiler:off.} :P |
12:17:57 | apriori| | one should invert the logic |
12:17:59 | Araq | but yeah, that doesn't do quite the right thing |
12:18:04 | apriori| | some global flag.. and some local regions |
12:18:20 | Araq | yeah, good point; I'm adding it |
12:18:35 | apriori| | you're just insane, man ;) |
12:18:43 | apriori| | pulling off such a huge pile of work on your own |
12:20:02 | apriori| | as an addition, I meant: https://en.wikipedia.org/wiki/Profiling_(computer_programming) |
12:20:11 | apriori| | read the part about "statistical profilers" |
12:21:04 | apriori| | while not extremely precise, sampling profilers can at least give you a good hint about bottlenecks without impacting runtime performance too much |
12:21:34 | Araq | well it is a sampling profiler |
12:21:49 | apriori| | interesting..that MIPS even had hardware support for it |
12:21:53 | apriori| | ok, good :) |
12:22:25 | fowl | i had to rearrange an enum so that it was in order |
12:22:42 | fowl | last enum on this page: http://enet.bespin.org/protocol_8h.html |
12:22:53 | Araq | it used to be 'flat' profiler but then there's already gprof |
12:22:53 | fowl | is it still compatible if rearranged? |
12:23:20 | Araq | fowl: I'd think so |
12:23:46 | Araq | however it looks like you want an enum + a set to represent it in Nimrod |
12:24:29 | apriori| | Araq: btw.. couldn't one just generate a static internal mapping for such enums with "holes"? |
12:24:44 | apriori| | like.. one has e.g. the actual enum values 2, 4, 8, etc. |
12:24:52 | Araq | why not do instead: |
12:25:01 | apriori| | there is no problem in actually interpreting those as 0, 1, 2 in an array |
12:25:31 | Araq | type TMyEnum = enum meA = 0, meB, meUnused, meUnused, meC |
12:25:47 | fowl | why's that? i have ENET_PROTOCOL_HEADER_FLAG_MASK = ENET_PROTOCOL_HEADER_FLAG_COMPRESSED.cint or ENET_PROTOCOL_HEADER_FLAG_SENT_TIME.cint in the enum and it hasn't raised any problems |
12:25:48 | Araq | er, meUnused2 of course for the second |
12:26:17 | apriori| | Araq: bad thing.. because if people use e.g. 2^x, then you get a hell of a lot of such unused sections to write |
12:26:34 | apriori| | but, yeah, you already said it.. that's definitely a "flagging" case, which is better described using a set |
12:26:38 | Araq | 2^x means you want a set of an enum anyway |
12:27:15 | Araq | and yeah if you can't map it easily you have to live with 'const's and inferior type checking |
12:27:32 | Araq | though you can also do: |
12:27:38 | Araq | type TEnum = distinct cint |
12:27:46 | Araq | const valueA = TEnum(0) |
12:28:10 | apriori| | interesting |
12:28:12 | Araq | proc `or`(a, b: TEnum): TEnum {.borrow.} |
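Putting Araq's fragments together, an enet-style flag type could look like this sketch; the type name, constant names, and bit values here are all made up for illustration:

```nim
type
  TProtocolFlag = distinct cint   # distinct: no accidental mixing with cint

const
  flagA = TProtocolFlag(1 shl 14)   # hypothetical bit values
  flagB = TProtocolFlag(1 shl 15)

# borrow the bitwise operations from the underlying cint
proc `or`(a, b: TProtocolFlag): TProtocolFlag {.borrow.}
proc `and`(a, b: TProtocolFlag): TProtocolFlag {.borrow.}
proc `==`(a, b: TProtocolFlag): bool {.borrow.}

# combined mask, analogous to the enet FLAG_MASK member
let flagMask = flagA or flagB
```

Compared to a plain enum with unused filler members, this keeps type checking for the flag values while allowing arbitrary gaps and combinations.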
12:28:46 | apriori| | Araq: not wanting to annoy you again.. but.. pleeaase start documenting every single pragma there is ;) |
12:30:39 | Araq | it's all documented |
12:30:45 | Araq | but you have to read the manual |
12:30:51 | apriori| | ok |
12:30:52 | Araq | and skip the verbose introduction |
12:31:08 | Araq | the tuts simply don't cut it |
12:31:11 | Araq | brb |
12:31:18 | apriori| | k |
13:01:25 | Araq | back |
13:06:36 | * | apriori| quit (Ping timeout: 255 seconds) |
13:31:06 | * | apriori| joined #nimrod |
13:43:35 | * | XAMPP quit (Ping timeout: 272 seconds) |
13:52:32 | apriori| | Araq: one question.. |
13:52:39 | apriori| | about constrained general types |
13:52:53 | Araq | they're a bit limited for now, yes |
13:53:02 | Araq | basically you can do: |
13:53:08 | apriori| | would you prefer to have that constraint (e.g. square matrix) expressed by using only the matching arguments in the function |
13:53:13 | apriori| | or should I check with assert? |
13:53:42 | Araq | hu? |
13:53:46 | Araq | can't follow |
13:53:51 | Araq | do you mean: |
13:53:52 | apriori| | like: matrixmultiply with only 3 dimensions instead of 4.. thereby already enforcing that the matrices are compatible |
13:54:25 | Araq | I have no opinion about it |
13:54:38 | apriori| | http://pastebin.com/5q2zZNgK |
13:55:13 | apriori| | I think, assert might be better, because it allows an actual non-cryptic error message |
13:55:27 | Araq | resul[i, j] = cast[T](0) # unnecessary, but if you want to do it, use: |
13:55:36 | apriori| | whereas the other variant will just spit out a not necessarily helpful compiler error |
13:55:42 | Araq | result[i, j] = T(0) |
13:55:49 | apriori| | ok |
13:55:55 | apriori| | but usually it would implicitly cast? |
13:55:58 | Araq | and yeah the identifier lookup in generics needs to be fixed |
13:56:17 | Araq | typo: 'resul' vs 'result' |
13:56:27 | apriori| | yeah.. not compiled yet |
13:56:28 | Araq | and the compiler doesn't complain until instantiation time |
13:56:34 | apriori| | yes |
13:56:36 | apriori| | but... |
13:56:43 | Araq | it's really horrible and I apologize ;-) |
13:56:52 | apriori| | it has no real clue about a constraint being in place in version 2 |
13:57:11 | Araq | and yeah, usually it would implicitly cast |
13:57:12 | apriori| | it would just say "can't find a matching function" |
13:57:40 | Araq | you should be able to do: |
13:58:05 | apriori| | hm.. I guess this is a good usecase for when + hint |
13:58:08 | apriori| | + assert |
13:58:12 | Araq | yeah |
13:58:19 | Araq | was about to suggest that |
13:58:22 | Araq | :-) |
13:58:24 | apriori| | :) |
14:04:47 | Araq | apriori|: this program is pretty bad for the GC as it generates lots of garbage |
14:04:59 | Araq | for i in 0.. 1_000_000: |
14:05:15 | Araq | let s = formatFloat(i) |
14:06:01 | Araq | there are lots of ways to deal with this, but they all have problems: |
14:06:41 | Araq | a) change formatFloat to take a 'result: var string' so that the memory can be re-used within formatFloat |
14:07:17 | Araq | b) fix the string implementation to have an STL-like "short string" optimization |
14:07:36 | Araq | where a short string is directly embedded in the object |
14:08:25 | apriori| | hm |
14:08:45 | apriori| | do you have a minimal format string example? |
14:08:57 | Araq | strutils.formatFloat |
14:08:59 | apriori| | e.g. the equivalent of printf("something %i", bla)" |
14:09:29 | Araq | c) make strings reference counted like in Delphi |
14:10:02 | Araq | c) is really nice when the output target is C++ because we have fast exception safe destructors there |
14:10:52 | Araq | the real problem is that we want proc p: string for convenience |
14:10:54 | apriori| | (btw.. % operator similar to python's... that rocks ;) |
14:11:00 | apriori| | yeah... |
14:11:08 | Araq | but need proc p(result: var string) for performance |
14:11:41 | Araq | would be nice if we could transform one into the other |
14:12:06 | Araq | TR macros can do the transformation |
14:12:23 | Araq | but that still requires 2 different 'p' implementations |
14:13:19 | apriori| | yes |
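The two-signature pattern the discussion keeps returning to can be sketched with a hypothetical formatting proc; `fmt` is an invented name, and a real version would write the digits directly into the buffer rather than going through `$`, which still allocates a temporary:

```nim
# convenience form: allocates a fresh string on every call
proc fmt(x: float): string =
  result = $x

# performance form: the caller owns the buffer, so a hot loop
# reuses one allocation instead of producing garbage per iteration
proc fmt(x: float, result: var string) =
  setLen(result, 0)             # keep the buffer, drop the contents
  add(result, $x)               # the $ temp still allocates here

var buf = ""
for i in 0..1_000:
  fmt(float(i), buf)            # buf's storage is reused each time
```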
14:13:24 | apriori| | btw... |
14:13:30 | apriori| | how are those tr macros actually applied? |
14:13:39 | apriori| | because.. in term rewriting order should matter |
14:13:50 | Araq | in an extra pass over the AST after semantic checking |
14:14:31 | Araq | the longest match should win but currently it's applied in reverse definition order |
14:14:39 | apriori| | ok |
14:15:06 | Araq | well hm |
14:15:10 | * | zahary joined #nimrod |
14:15:28 | Araq | I'm wrong, it should already implement "longest match wins" |
14:15:47 | apriori| | okay |
14:18:22 | Araq | hi zahary |
14:18:44 | zahary | hi |
14:21:12 | apriori| | Araq: btw: say I got those types: |
14:21:15 | apriori| | TMatrixNM*[M, N, T] = array[M, array[N, T]] |
14:21:16 | apriori| | TVectorN*[A, T] = array[A, T] |
14:22:12 | apriori| | assume I got a matrix with only say 1 row or 1 column.. will the codegen for array[0..1, array[0..n, T]] expand to the same as array[0..n, T] ? |
14:23:04 | apriori| | I mean.. is that obvious useless extra dimension stripped? |
14:23:16 | Araq | no but GCC may do that |
14:23:26 | Araq | it should ;-) |
14:24:01 | apriori| | hm, because I currently think I could also just treat vectors as special matrices |
14:24:32 | Araq | just try it and look the generated assembler |
14:24:37 | Araq | *look at |
14:24:39 | apriori| | yeah, I guess so |
14:24:47 | Araq | it's the only thing that works with modern optimizers |
14:25:15 | apriori| | okay |
14:25:30 | * | q66_ joined #nimrod |
14:25:36 | Araq | benchmarking may not help as much as that could trigger some cache anomalies |
14:26:05 | Araq | where the less efficient code runs faster because it's at an address that happens to not produce a cache conflict |
14:26:22 | apriori| | yeah... cache aware programming sucks ;) |
14:27:16 | * | q66_ is now known as q66 |
14:30:12 | Araq | zahary: read the history please about the formatFloat issue |
14:31:07 | Araq | any ideas how to solve the "excessive proc duplications for strings" problem? |
14:38:25 | zahary | what is the issue again? |
14:38:42 | Araq | for i in 0.. 1_000_000: |
14:38:51 | Araq | let s = formatFloat(i) |
14:39:25 | Araq | produces lots of garbage |
14:39:58 | Araq | and the GC is not really good for this |
14:43:20 | apriori| | btw: how would I need to write an overload for a len function for ranges? |
14:43:33 | apriori| | I mean.. what is the proper type of a range[T]? |
14:43:42 | * | zahary quit (Read error: No route to host) |
14:43:51 | apriori| | because something like len*[T](x: range[T]): int does not work |
14:43:52 | * | zahary joined #nimrod |
14:44:07 | zahary | I guess it's possible to statically determine that this string is temporary and then some alternative allocation strategy could be used for it |
14:45:07 | Araq | yeah and you can do it with a TR macro |
14:45:15 | Araq | but the problem is you want |
14:45:21 | Araq | proc p: string |
14:45:25 | Araq | for convenience |
14:45:27 | Araq | and |
14:45:38 | Araq | proc p(result: var string) for performance |
14:45:57 | Araq | and you need to provide both |
14:46:04 | Araq | and then the TR macro can do its job |
14:48:33 | Araq | apriori|: proc len*(x: typedesc[range]): int |
14:48:38 | zahary | i've said before that I think RVO should be used everywhere and when not explicitly told otherwise procs should be able to construct objects both on the stack and on the heap (if the result is a new heap location, the compiler just places an allocation call before the proc call) |
14:49:39 | Araq | I remember now, but that's not the only solution |
14:50:00 | Araq | if you have 'result: var string' you can easily do: |
14:50:05 | Araq | setLen(result, 0) |
14:50:14 | Araq | # fill 'result' then |
14:50:34 | Araq | with no new allocations unless it happens to need a buffer resize |
14:50:51 | Araq | and it doesn't matter that it's been placed on the heap |
14:51:04 | zahary | you mean, var string is more optimal when there is a chain of operations? |
14:51:05 | * | Boscop quit (Ping timeout: 244 seconds) |
14:51:15 | zahary | let me think about it |
14:51:19 | Araq | in fact, it may be beneficial to have it on the heap because it can stay there |
14:51:50 | Araq | whereas if it's on the stack you likely end up copying it to the heap later |
14:55:59 | zahary | the question boils down to whether it's possible to optimise a proc like uppercase(x: string): string to sometimes reuse existing memory |
14:57:33 | * | Boscop joined #nimrod |
14:59:30 | zahary | in C++, the current thinking is that if you accept the input as a copy it would be possible for the compiler to reuse its buffer in the result string. also sometimes it would be possible to avoid copying the input string when calling the function (when the input is an rvalue) |
14:59:53 | zahary | obviously uppercase(x: var string) is different interface so we are ignoring it here |
15:03:20 | zahary | C++ is not completely optimal as there is still code for resource stealing and noop destructors being executed even in the best case - in theory it's possible to compile uppercase(x: string): string automatically in 2 forms (one of them being uppercase(result: var string)) |
15:10:37 | apriori| | Araq: that does not work :/ |
15:12:44 | apriori| | http://pastebin.com/Jcm06fDj |
15:13:33 | apriori| | I know that test is silly, because there is array.len, but it's just a test for extracting the range and passing that over to the proc |
15:15:02 | zahary | well, apriori|, why are you passing the array to the test function. you should pass the T type |
15:15:36 | apriori| | yep, right |
15:15:42 | apriori| | but that doesn't change the error message |
15:16:16 | zahary | let me try it. it's probably my fault :) |
15:18:53 | zahary | this works for me: http://pastebin.com/6zbeNhDz |
15:19:10 | zahary | indeed, there is some error when you say typedesc[range] - I'll look into it later |
15:20:39 | apriori| | ok, testing |
15:21:09 | apriori| | yup, working, thank you |
15:21:23 | Araq | zahary: well yeah the problem is the optimization changes semantics |
15:21:43 | Araq | 'result: var T' is an inout as opposed to an 'out' |
15:25:24 | zahary | the optimisations I'm describing are possible if you know that the input string lifetime ends at the proc call - in C++ you can then move it to the param instead of copying it and in the other hypothetical optimisation you then choose the alternative "var version" of uppercase |
15:30:33 | Araq | it's not at all about copying vs. moving IMO |
15:30:57 | Araq | in fact the code snippet already does not copy, but move |
15:36:40 | Araq | the problem is that the current semantics of a string result prohibit a variant of loop invariant code motion |
15:36:43 | apriori| | hrm |
15:36:56 | apriori| | Error: illformed AST: m1[0, 0] .. what the hell? :/ |
15:37:12 | Araq | for i: x = p(alloc()) |
15:37:20 | Araq | --> |
15:37:23 | Araq | y = alloc() |
15:37:28 | Araq | for i: p(x) |
15:38:03 | Araq | er, loop hoisting, it's not really "invariant" |
15:38:08 | * | Trix[a]r_za is now known as Trixar_za |
15:41:47 | Araq | we could do this: |
15:42:16 | Araq | proc toUpper(x: string, result: var string) {.asfunction.} # in stdlib |
15:42:23 | Araq | and then allow usage as |
15:42:35 | Araq | a function that returns something |
15:42:43 | Araq | echo toUpper("abc") |
15:42:57 | Araq | and the compiler introduces a temporary for us: |
15:43:07 | Araq | var tmp1 = "" |
15:43:22 | Araq | toUpper("abc", tmp1) |
15:43:26 | Araq | echo tmp1 |
15:43:49 | Araq | which is something along the lines of what you suggested I guess? |
15:44:52 | * | moxie joined #nimrod |
15:44:56 | Araq | we can't hide it completely I think because it affects proc var compatibility |
15:45:22 | Araq | plus for 'int' it's stupid to transform it into 'result: var int' |
15:47:13 | Araq | thinking about it more, all we need is a 'snoopResult' pragma |
15:47:36 | Araq | proc toUpper(s: string): string {.snoopResult.} = |
15:47:56 | Araq | if isNil(result): result = newStringOfCap(s.len) |
15:48:14 | Araq | else: result.ensureCap(s.len) |
15:48:30 | Araq | for i in 0.. <s.len: result[i] = toUpper(s[i]) |
15:49:28 | Araq | 'snoopResult' ensures it's passed as 'var T' and that it's not reset/initialized to binary zero |
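Collected in one place, Araq's hypothetical 'snoopResult' toUpper from the lines above. Neither the pragma nor `ensureCap` exists; this is a design sketch only, with a `setLen` added so the indexing stays in bounds:

```nim
proc toUpper(s: string): string {.snoopResult.} =
  # under the proposal, 'result' arrives as 'var string' and keeps
  # whatever buffer it had from a previous call instead of nil
  if isNil(result): result = newStringOfCap(s.len)
  else: result.ensureCap(s.len)   # hypothetical capacity helper
  setLen(result, s.len)
  for i in 0 .. s.len-1:
    result[i] = toUpper(s[i])
```

In a loop, the same result buffer would then be filled over and over, which is exactly the formatFloat-garbage problem from earlier in the discussion.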
15:50:26 | apriori| | I guess, another bug: http://pastebin.com/893ZZ0ZJ |
15:51:34 | Araq | it's not really a bug, if you overload [] for arrays the compiler is confused |
15:51:54 | Araq | and uses the builtin [] which does not support ',' notation |
15:52:05 | apriori| | Araq: it's not overloaded anywhere there |
15:52:08 | Araq | so it's an "illformed AST" ;-) |
15:52:15 | apriori| | hm... |
15:52:30 | Araq | I can fix it easily I think |
15:52:42 | apriori| | well, if that's the case, how should I overload [] for a matrix which is array[..,array[.., T]]? |
15:53:03 | Araq | well I expected people to wrap it in an object |
15:53:14 | Araq | but I don't mind supporting it for arrays |
15:53:23 | apriori| | hm, ok |
15:54:39 | apriori| | https://github.com/Araq/Nimrod/issues/205 I wonder how that would actually work there... |
15:54:46 | apriori| | return type inference out of nowhere? |
15:55:04 | Araq | yeah somehow people simply expect that to work ... |
15:58:09 | apriori| | zahary: as an update to the ticket you were working on: https://github.com/Araq/Nimrod/issues/202 |
15:58:24 | apriori| | name doesn't cause ICE anymore, but segfaults |
15:58:50 | zahary | apriori|, 2) works for me. can you send me the exact code for which you get a segfault |
16:01:28 | zahary | Araq, what problem do you see if there isn't snoopResult, but rather the compiler implements it for you automatically in certain cases? |
16:02:18 | apriori| | zahary: sry, I'm stupid today.. it does segfault if you pass in a variable instead of a type, which should be expected. but maybe we should provide an overload |
16:05:09 | zahary | "we can't hide it completely I think because it affects proc var compatibility" - you mean that when the proc is converted to a proc var the optimization is no longer applicable? sure, but this sounds like a rarely triggered edge case. snoopResult cannot be overloaded with a regular upperCase proc I think, which is somewhat more limiting |
16:05:54 | zahary | apriori|, the compiler should never segfault so it's bug after all |
16:07:13 | apriori| | zahary: yup, right |
16:11:18 | zahary | … also you can have a pragma on the proc var rather then the proc itself (much like you define the calling convention) |
16:23:36 | Araq | '(result: var int)' is not compatible to '(): int' |
16:24:02 | Araq | '(): int' is returned in a register |
16:24:14 | Araq | so that's why it affects proc var compatibility |
16:24:28 | Araq | and yeah 'snoopResult' is for proc var types too |
16:24:52 | Araq | you can't make it the default because it changes semantics within toUpper's body |
16:25:08 | Araq | it's this semantic change that allows the optimization |
16:25:19 | Araq | in toUpper's body you can then do: |
16:25:32 | Araq | setLen(result, 0); fill result |
16:25:48 | Araq | this is otherwise not possible because 'result' starts with a 'nil' value |
16:26:23 | Araq | 'snoopResult' transforms the 'result' from an 'out T' into an 'inout T' |
16:26:29 | fowl | Araq: I think you said not to, but I'm storing the address of a dereferenced ref and casting it back to a ref and it's working, so what's the reason I shouldn't do this?
16:27:22 | Araq | depends on what you mean exactly |
16:27:41 | fowl | i'm using it for c callbacks |
16:27:47 | Araq | it could be fine but addr x[] is not any different from cast[T](x) |
16:27:59 | Araq | except more confusing to read IMHO |
16:28:25 | Araq | so I'd use the 'cast' version |
16:29:06 | fowl | ? i need a generic pointer to store |
16:29:47 | fowl | but all the related functions expect a ref type so casting it to ptr T isn't useful
16:33:07 | Araq | but 'addr x[]' produces a ptr T |
16:33:14 | Araq | so I can't follow |
16:33:36 | fowl | because I can't store ref T as a generic pointer
16:34:34 | Araq | well yeah so you need a 'ptr T' |
16:34:49 | Araq | so you can do: cast[ptr T](myRefT) |
16:35:02 | Araq | no need to do: addr myRefT[] |
16:35:21 | fowl | ooo ok |
16:35:25 | zahary | Araq, sure, I understand the implications. What I'm saying is that it's possible to compile 2 versions of the proc from a single definition. The compiler automatically uses the "snoop" version in regular chained calls like "foo".toUpper.trim.reverse and when converting the proc to a procvar, it just looks at the procvar type to determine which implementation to pick |
16:36:14 | zahary | no need for the user to mark the proc as snoop (thus disabling some of its uses in regular code) |
16:36:28 | Araq | well that's my suggestion as well |
16:36:48 | Araq | a snoop proc can be called like every other proc |
16:37:03 | Araq | client code is not affected except for proc vars |
16:37:15 | * | shevy quit (Read error: Operation timed out) |
16:37:34 | Araq | but in the implementation of toUpper the programmer needs to provide the optimized version |
16:37:42 | zahary | ok, but why should I enable my proc for snooping - this should be the default - have a pragma to disable it if you want |
16:38:09 | Araq | yeah maybe, I don't know |
16:38:43 | zahary | maybe we still disagree somewhere - snoop only in certain situations in my point of view |
16:39:11 | zahary | … when the input is a var about to be destroyed |
16:40:09 | zahary | what happens when you call a snoop proc in expression like |
16:40:10 | zahary | var a = x.toUpper ? |
16:41:04 | zahary | x is copied and then passed to the snooped proc? |
16:48:07 | Araq | why? x ain't copied today |
16:48:44 | Araq | it's not even a problem if you do: x = x.toUpper |
16:49:04 | Araq | though it could become a problem for other examples |
16:49:22 | Araq | but then we already have an alias analysis
16:51:22 | zahary | ah, poor example indeed |
16:51:57 | Araq | well I think 'snoop' shouldn't be the default because it changes semantics in a subtle way |
16:52:18 | * | shevy joined #nimrod |
16:52:40 | Araq | and you can do another interesting thing with 'snoop': the proc can actually *add* to the result buffer instead of replacing its content |
16:53:00 | Araq | which is very useful for string handling |
16:54:05 | Araq | though I think it's still not enough to transform $(x: tree) into toStringAux(x: tree, result) |
16:54:18 | Araq | in the recursive case |
16:54:47 | Araq | a snoop proc should be allowed to be called like: |
16:55:02 | Araq | p(x, res) |
16:55:10 | Araq | instead of: res = p(x) |
16:55:53 | zahary | what about? |
16:55:54 | zahary | var a = x.reverse.toUpper # 1) both reverse and toUpper are snoop procs. 2) reverse is not snoop |
16:55:54 | zahary | so you see snoop as different semantics in which you expect the result to be already constructed so you are supposed to append to it? I seem to care more about automatic copy elision in certain situations |
16:57:11 | Araq | well it's at least an interesting idea |
16:58:03 | Araq | 1) both are snoops: where is the problem? |
16:58:12 | Araq | var tmp = "" |
16:58:21 | Araq | x.reverse(tmp) |
16:58:31 | zahary | syntax-assisted append-semantics are a cool idea, but somewhat orthogonal to what I'm talking about
16:58:54 | Araq | var tmp2 = ""; toUpper(tmp, tmp2) |
16:59:04 | Araq | a = tmp2 |
16:59:18 | Araq | or maybe directly: toUpper(tmp, a) |
16:59:34 | Araq | 2) reverse is not snoop: |
16:59:54 | Araq | var tmp = x.reverse |
17:00:08 | Araq | toUpper(tmp, a) |
17:00:44 | Araq | it's all very easy to do with TR macros except that the TR macro can't optimize the ordinary toUpper easily |
17:01:07 | Araq | because it can't transform its body |
17:01:30 | zahary | I suggest that it's possible to do |
17:01:30 | zahary | var a = x |
17:01:30 | zahary | reverse(var a) |
17:01:30 | zahary | toUpper(var a) |
17:01:59 | Araq | we often have: |
17:02:12 | Araq | proc `$`(x): string = |
17:02:14 | Araq | result = "" |
17:02:19 | Araq | toStringAux(x, result) |
17:03:02 | Araq | but the compiler can easily transform toStringAux to a proc with result type |
17:03:31 | Araq | the other direction is MUCH harder to do for the compiler |
17:03:50 | Araq | so the programmer should only have to provide toStringAux |
17:04:14 | Araq | which is what 'snoopResult' enables |
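[editor's note] The toStringAux pattern Araq describes (a `$` wrapper over an append-style helper) can be sketched in Python; the `Node` tree type and function names here are hypothetical, not from the discussion. The point is that the helper appends into one shared buffer instead of allocating a fresh string at every recursion level:

```python
class Node:
    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)

def to_string_naive(node):
    # Allocates a fresh string at every level of the recursion.
    s = str(node.value)
    for child in node.children:
        s += " " + to_string_naive(child)
    return s

def to_string_aux(node, out):
    # Appends into one shared buffer: no per-level string allocations.
    out.append(str(node.value))
    for child in node.children:
        out.append(" ")
        to_string_aux(child, out)

def to_string(node):
    # The '$'-style wrapper: create the buffer once, let the helper fill it.
    out = []
    to_string_aux(node, out)
    return "".join(out)
```

What 'snoopResult' would buy is deriving the wrapper form from the helper form (or vice versa) automatically, so the programmer writes only one of them.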
17:05:23 | Araq | your transformation only works with 1 argument procs I think? |
17:05:35 | Araq | I mean unary procs |
17:06:04 | zahary | no, the var part is the result - other arguments can still be passed along |
17:06:57 | zahary | what is hard in my suggestion is compiling the correct proc body to mutate the input in place |
17:10:13 | Araq | yeah ;-) |
17:12:31 | zahary | well, the body can be supplied by the user and then the rule is just that |
17:12:31 | zahary | proc (x: var T, …) is interchangeable with proc(x: T, …): T in certain situations
17:13:46 | zahary | still thinking about automatically transforming the body - seems to be possible if you cache reads of certain fields into local variables |
17:15:34 | zahary | you know the input is immutable, so if you write to some field in the result and then attempt to read the same field from the input, the compiler has to ensure that the original value was cached before the mutation (sure, this will actually result in worse code sometimes)
17:18:42 | zahary | what about my first rule instead of snoopResult? the benefit is that intermediate variables in x.reverse.toUpper are eliminated this way
17:19:30 | Araq | sorry, what is your first rule again? |
17:19:41 | zahary | proc (x: var T, …) is interchangeable with proc(x: T, …): T in certain situations
17:19:55 | apriori| | ohm.. how would I overload the [] operator so that it could be used for any operation (like +=, -= etc.) on the scalar type? (matrix scenario) |
17:20:10 | Araq | proc []: var T |
17:20:38 | fowl | how about making them return var strings, then variants that take var strings |
17:21:04 | Araq | but I will implement a[i, j] to mean a[j][i] if 'a' is an array, so that you don't have to overload any [] at all |
17:21:37 | apriori| | Araq: I thought about your wrapping of the array as object |
17:21:40 | zahary | there are 2 definitions of toUpper supplied by me
17:21:41 | zahary | proc toUpper(x: string): string |
17:21:41 | zahary | proc toUpper(x: var string) |
17:21:41 | zahary | the compiler notices that x.trim.toUpper.reverse matches the first overload, but it will nevertheless use the second one, because it can determine that x.trim produces a temporary value that can be mutated in place |
17:22:09 | apriori| | it makes sense.. because I could then "annotate" the matrix, e.g. as being trilinear, etc... and allow procs to use special algorithms then |
17:22:15 | Araq | well, should a[x,y] mean a[x][y] or a[y][x]? |
17:23:25 | Araq | zahary: 2 definitions of toUpper is exactly the problem that I want to avoid |
17:23:53 | zahary | well, my first thesis was that it's possible to derive the second definition from the first one |
17:23:55 | Araq | your rule is easily done with a TR macro, no need to implement it in the compiler |
17:24:14 | Araq | but I want to get rid of the 2 definitions |
17:24:33 | Araq | because it means a strutils of twice the size it is today |
17:25:17 | Araq | and I doubt it's possible to derive the second implementation from the first one
17:25:34 | apriori| | Araq: [x][y] for column major |
17:26:55 | Araq | ok |
17:27:06 | Araq | that's easier to remember anyway |
17:27:10 | apriori| | indeed |
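[editor's note] The `[]`/`[]=` overloading under discussion maps naturally onto Python's `__getitem__`/`__setitem__`, which receive the index tuple directly. A minimal sketch of the column-major convention just agreed on, where `a[x, y]` means `a[x][y]` (the `Matrix` class and its sizes are illustrative, not from the log):

```python
class Matrix:
    # Column-major: element (x, y) lives at data[x][y],
    # i.e. data is a list of columns, each column a list of rows.
    def __init__(self, cols, rows):
        self.data = [[0.0] * rows for _ in range(cols)]

    def __getitem__(self, idx):
        # m[x, y] arrives here as the tuple idx == (x, y)
        x, y = idx
        return self.data[x][y]

    def __setitem__(self, idx, value):
        x, y = idx
        self.data[x][y] = value

m = Matrix(2, 3)
m[1, 2] = 5.0
```

In Nimrod the same effect needs separate `[]` and `[]=` overloads (and, as noted below, `var T` vs `T` return overloading for in-place updates like `+=`).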
17:28:27 | zahary | it certainly is, only not necessarily in an efficient way
17:28:27 | zahary | proc (x: var T) = |
17:28:27 | zahary | input = x.copy |
17:28:27 | zahary | … replace any occurrence of x with input in the original body |
17:29:34 | Araq | interesting |
17:30:04 | zahary | but that misses the point of reusing existing resources (which is hard problem I agree) |
17:32:05 | zahary | my point is that snoop doesn't buy that much in these chained calls - you still have temporary objects and reallocations? |
17:32:08 | Araq | apriori|: there is a codegen bug returning a 'var array', watch out |
17:32:22 | apriori| | I don't even get to that now
17:32:31 | apriori| | got some issues with [] for component access already
17:33:05 | apriori| | []= catches the assignment just fine.. but I don't get other stuff to work (and actually, only a proc that returns the ref should do, but that's not working)
17:33:20 | apriori| | and I guess.. because [] is not allowed as a normal proc name? just a guess |
17:33:43 | Araq | dunno -.- |
17:35:08 | apriori| | yup, [] not allowed as normal non-op proc |
17:38:44 | apriori| | oh fuck |
17:39:24 | fowl | proc foo[]() = is foo() with no generics |
17:39:41 | apriori| | yeah |
17:42:56 | apriori| | hm |
17:43:02 | apriori| | this _might_ be a bug too |
17:43:12 | apriori| | in a function, with its return type as value type
17:43:20 | apriori| | shouldnt result be actually "var T"? |
17:43:26 | apriori| | in the context of the function |
17:43:49 | Araq | what makes you think it isn't? |
17:44:05 | apriori| | wait a sec
17:45:06 | apriori| | Araq: http://pastebin.com/jGGWiKb6 |
17:46:06 | Araq | why do you provide the ': T' version?
17:46:18 | Araq | and the '[]=' version?
17:46:28 | apriori| | []= is needed for the assignment
17:46:35 | apriori| | result[i, j] = 0 |
17:46:36 | Araq | that's the bug then |
17:47:24 | apriori| | and []: T is needed for a[i, k] in the line and b[k, j], because those are not var variables
17:48:02 | Araq | hm |
17:48:55 | Araq | I see, we need overloading of 'var T' and 'T' I think |
17:49:06 | apriori| | yep |
17:49:24 | Araq | C++ is actually much better designed than most people realize |
17:49:37 | apriori| | yeah, I figured :P |
17:49:45 | apriori| | because of ref etc... |
17:50:09 | apriori| | brb.. and thank you guys :) |
17:57:06 | Araq | zahary: I'm still not worried about elimination of temporaries, it'll be done once I implemented the optimizer I have in mind |
17:57:58 | Araq | especially call chains are simple to optimize |
17:58:24 | zahary | how will it work? |
17:59:16 | Araq | GCSE with the effect analysis that I outlined days ago |
17:59:41 | Araq | plus copy elimination |
18:00:01 | Araq | temporaries can be dealt with much like 'let' variables |
18:00:01 | zahary | can you provide some example transformations? |
18:00:11 | * | Trixar_za is now known as Trix[a]r_za |
18:00:26 | Araq | essentially 'let' gives us an SSA representation |
18:00:33 | zahary | this is not about eliminating chained assignments |
18:00:49 | Araq | the lack of 'goto' also helps ;-) |
18:01:47 | Araq | zahary: but it is the same problem as register allocation, only simpler |
18:01:56 | Araq | as the number of registers is unbound |
18:02:16 | Araq | you simply merge temporaries whose lifetime does not overlap
18:02:33 | zahary | it's about mutating values in place. I don't see how the low level SSA optimisations can help here as the optimization makes use of high-level knowledge about the string type.
18:02:33 | zahary | describe how x.trim.reverse will work? |
18:03:02 | Araq | well if we have snooping procs: |
18:03:20 | Araq | trim(x, tmp1) |
18:03:28 | Araq | reverse(tmp1, tmp2) |
18:03:53 | Araq | dest = tmp2 |
18:04:25 | Araq | --> |
18:04:42 | Araq | trim(x, tmp1); reverse(tmp1, dest) |
18:05:13 | Araq | and then ... ok, this needs more thoughts ;-) |
18:05:56 | Araq | damn you :P |
18:06:19 | zahary | :) |
18:07:16 | zahary | btw check out this. I was a little bit blown away / humbled by the description of what kind of optimisation is carried out by gcc
18:07:17 | zahary | http://stackoverflow.com/questions/10250419/why-does-gcc-generate-such-radically-different-assembly-for-nearly-the-same-c-co/10251097#10251097 |
18:13:23 | Araq | and yet I never saw a compiler perform anything like that with string ops ;-) |
18:14:20 | zahary | well, that would be because it requires high-level knowledge of the type as I said :) |
18:14:46 | Araq | which is why our optimizer should be concerned only with high level optimizations |
18:15:17 | Araq | I can't see anything expensive in fast_trunc_one btw, I wouldn't worry about it at all |
18:15:33 | Araq | it doesn't access memory |
18:15:48 | Araq | it's irrelevant ;-) |
18:18:20 | zahary | what amazed me is how the compiler figured out from sign = i & 0x80000000; that sign = -sign and then used that to make another non-trivial transformation |
18:19:34 | Araq | I saw the microsoft compiler transform a nontrivial recursion that implements integer add into the 'add' asm instruction :-) |
18:19:59 | Araq | there was still an unnecessary 'jmp' involved iirc |
18:20:09 | Araq | but the recursion had been completely eliminated |
18:20:16 | Araq | and no, it was *not* a tail call |
18:21:21 | zahary | so we should leave the SSA stuff to these guys for a while :) |
18:21:43 | Araq | well I don't plan to transform anything to SSA |
18:22:02 | Araq | but we get it for free: let vars are in SSA |
18:22:18 | Araq | and every compiler introduced temporary is in SSA too |
18:22:36 | zahary | yes, and the C compiler will happily take care of them |
18:23:04 | zahary | we should strive to eliminate high-level code (calls to user copy procs, etc) |
18:23:45 | Araq | true but I still think copy elimination for ints and strings is essentially the same |
18:24:03 | Araq | except that you get it for 'int' for free in the C backend |
18:24:06 | zahary | sure |
18:28:21 | Araq | the new profiler hooks work exactly like a "stop the world" mechanism could work |
18:28:40 | Araq | every proc entry and every loop body is injected with a call to 'nimProfile()' |
18:29:58 | Araq | but I have no idea how expensive that is as the profiler requires stack traces to be turned on |
18:30:07 | Araq | which are expensive |
18:33:21 | zahary | yeah, I snooped the diff :) btw, it may be worthwhile to support some existing report format
18:33:21 | zahary | http://google-perftools.googlecode.com/svn/trunk/doc/cpuprofile.html |
18:33:21 | zahary | there are some visualization scripts with packages like these |
18:33:49 | zahary | maybe dom96 could be interested in such a feature?
18:35:42 | Araq | he's away ;-) |
18:36:28 | Araq | but we should do important work instead, I already played too long with it |
18:38:12 | Araq | btw do you happen to know how fixed length software caches work? |
18:39:48 | Araq | I mean a count table which keeps the entries with the greatest count values
18:40:00 | zahary | most used? |
18:40:04 | Araq | yeah |
18:40:24 | Araq | but it should only use a fixed amount of memory |
18:40:47 | zahary | I've worked on something similar for textures - let me try to recall what was it all about |
18:41:02 | Araq | it can only work stochastically |
18:41:12 | Araq | I think |
18:41:31 | zahary | well, you could go with priority queue, but that wasn't the solution then |
18:46:08 | Araq | a fixed length priority queue does not work as we don't know the priority upfront |
18:46:36 | Araq | the priority is the usage counter which we don't know when profiling |
18:49:15 | zahary | don't quite follow. priority queue will have log(n) updates on each usage. |
18:50:19 | Araq | yes but it's not of a fixed length |
18:50:31 | Araq | if we start with [1, 1, 1] |
18:51:18 | Araq | well ok, let me elaborate: we have stack traces |
18:51:53 | Araq | and we count them |
18:52:06 | Araq | the stack trace with the highest count is the critical path |
18:52:41 | Araq | so if we have 3 slots in our count table, we can keep track of 3 different stack traces |
18:52:58 | Araq | but now a forth stack trace is encountered |
18:53:12 | Araq | and we have to replace 1 slot somehow |
18:53:13 | zahary | I looked at the old texture cache code and it's not overly sophisticated - it doesn't remove one item at a time so purges are linear scans over the whole cache with some value chosen as a threshold for removal in a heuristic way (there is a global counter of texture usages and you can divide the delta usages from the last purge to get the expected average usage)
18:53:45 | * | shevy left #nimrod ("I'll be back ... maybe") |
18:54:12 | zahary | otherwise, these caches are called LFU (least frequently used) - here is a paper that provides O(1) for all operations |
18:54:13 | zahary | http://dhruvbird.com/lfu.pdf |
18:54:14 | Araq | but whatever slot we pick, it could be that that's the stack trace that ends up getting the greatest count value |
18:54:29 | zahary | uses hash table and linked lists |
18:55:07 | Araq | I'm not concerned about O(1), I'm concerned about a replacement strategy that guarantees say, "won't be wrong with p probability" |
18:55:50 | Araq | ok, so I want a fixed size LFU ;-) |
18:56:17 | zahary | you can have a max size before starting to purge with the above algorithm |
18:56:27 | zahary | seems simple to implement |
18:56:27 | Araq | which can only be done stochastically |
18:57:47 | zahary | the algorithm is not stochastic in nature, but the results still won't be precise (some stack can be purged and placed back into the cache multiple times) |
18:58:59 | zahary | I've supervised a memory profiler implementation that used stack traces as keys btw - let me try to recall how that worked too :)
19:01:34 | Araq | currently I use a hash table that tracks the max chain length |
19:02:29 | Araq | if the table is full, one element on the hash chain is replaced |
19:02:40 | Araq | the element with the minimal count |
19:03:21 | Araq | the max chain length is used so that it doesn't compute the minimum over all slots |
19:04:12 | zahary | how do you get the element with the minimal count? |
19:04:53 | Araq | as I said, I do: for i in 0..maxChainLength-1: # follow the chain |
19:05:19 | Araq | when finding the element in the chain, it's not a new element and we can update its counter |
19:05:41 | Araq | but we also keep the index of the minimum while traversing the chain |
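[editor's note] Araq's scheme (probe a bounded chain; bump the key if found, fill an empty slot if any, otherwise overwrite the chain's minimum-count entry) might look roughly like this; the open-addressing details and names are illustrative, not the actual compiler code:

```python
def record(table, counts, key, max_chain, hashfn=hash):
    """Fixed-size count table with chain-minimum replacement:
    probe at most max_chain slots starting at the key's home slot."""
    n = len(table)
    home = hashfn(key) % n
    min_i = home  # index of the smallest count seen on this chain
    for step in range(max_chain):
        i = (home + step) % n
        if table[i] == key:
            counts[i] += 1          # existing entry: update its counter
            return
        if table[i] is None:
            table[i] = key          # free slot: insert
            counts[i] = 1
            return
        if counts[i] < counts[min_i]:
            min_i = i               # remember the chain's minimum
    table[min_i] = key              # chain full: evict the least-counted entry
    counts[min_i] = 1
```

This keeps eviction O(max_chain) without scanning the whole table, which matches the stated motivation for tracking the max chain length.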
19:05:52 | zahary | the algorithm from the paper just adds a linked list next to the table that is updated with insertion sort - since the updates always change the usage count by 1, you know that the insertion sort will have O(1) cost
19:06:43 | Araq | the problem is: every new element starts with counter 1 |
19:07:12 | Araq | but it still needs to be added because it could become the critical stack trace |
19:07:39 | zahary | and after some time, the newcomers are always the prime candidates for purging? |
19:08:17 | Araq | no ;-) |
19:08:28 | Araq | I keep no "age" |
19:08:39 | Araq | once added it's treated equally to everything else |
19:08:47 | Araq | hrm, that sounds stupid |
19:08:54 | Araq | :D |
19:09:26 | Araq | oh well the table is big enough anyway |
19:09:43 | Araq | the code dealing with purging is never executed in practice |
19:09:55 | zahary | so what's the problem again? |
19:10:04 | zahary | newcomers start with 1, so? |
19:10:07 | Araq | I'm interested in what's out there |
19:10:39 | Araq | what I've implemented works good enough already |
19:10:40 | zahary | you can start newcomers as the current minimum so they don't get unfair disadvantage |
19:11:27 | Araq | that could work, yes |
19:11:33 | zahary | otherwise, O(1) sounds like you can't get better than this |
19:12:39 | Araq | as I said I'm interested in some numbers about how probable it is that the critical stack trace will get purged incidentically |
19:14:02 | zahary | aha |
19:15:22 | zahary | btw, most profilers I've worked with don't seem to purge anything (I'm not entirely sure tho) - you see functions that were called only once in the report
19:15:39 | Araq | yeah I'm aware ;-) |
19:16:07 | Araq | but it seemed risky to let it allocate |
19:18:31 | Araq | now it allocates anyway, but I can easily change that |
19:21:51 | zahary | btw I've seen another inaccurate strategy used in some intrusive profilers |
19:23:26 | zahary | you don't insert the real stack traces in a cache, but rather assign a single record for each proc (this record resides in a static variable, so it's very cheap to update) |
19:23:35 | * | XAMPP joined #nimrod |
19:24:05 | zahary | so for functions with low fan-in it works relatively accurately, but for high fan-in you get results that are not very useful
19:24:45 | Araq | I can't follow |
19:24:58 | Araq | there is a static variable per proc, ok |
19:25:12 | Araq | what does the variable contain? a counter? |
19:25:26 | zahary | and you just measure the time spent in that proc and its children |
19:25:44 | Araq | that's exactly what my old profiler did :-) |
19:25:55 | zahary | aha, ok |
19:26:13 | Araq | but it was as useful as gprof |
19:26:17 | zahary | in practice, when you get to pick your own functions to profile, the inaccuracies potentially don't matter that much
19:26:49 | zahary | hmm, I disagree with the comparison with gprof
19:27:04 | Araq | so why reinvent gprof? the name mangling is not bad enough to justify it |
19:28:03 | zahary | you still get meaningful nested report like this: |
19:28:03 | zahary | readNetwork 10% |
19:28:03 | zahary | updateWorld 20% |
19:28:03 | zahary | simulatePhysics 14% |
19:28:04 | zahary | runAI 5% |
19:28:04 | zahary | runMobs 3% |
19:28:05 | zahary | renderWorld 40% |
19:28:26 | zahary | gprof only shows leaves
19:28:41 | Araq | no, it also shows total time |
19:29:01 | Araq | I think ... |
19:29:05 | zahary | but not for functions that are up the stack
19:29:17 | zahary | it just looks at the program counter to see in which function you are currently in |
19:32:01 | Araq | http://www.linuxselfhelp.com/gnu/gprof/html_chapter/gprof_5.html |
19:32:17 | Araq | "the call graph" |
19:33:30 | zahary | the disclaimer is that I haven't actually used it, but that's how people explain it usually: |
19:33:30 | zahary | http://stackoverflow.com/questions/6328673/which-is-the-most-reliable-profiling-tool-gprof-or-kachegrind |
19:33:30 | zahary | I've experience only with sampling profilers |
19:38:21 | Araq | ok |
19:38:36 | Araq | but gprof is also a "sampling" profiler, isn't it? |
19:39:45 | Araq | btw profiling shows the copyTree calls for semchecking of calls is really expensive |
19:40:31 | Araq | but we can get rid of them once I rewrote sigmatch with all the changes we have in mind |
19:41:54 | zahary | you mean the copyTree in DirectOp? |
19:42:41 | Araq | there is also a copyTree for each arg in sigmatch |
19:44:05 | zahary | when I tested it in 4 different profilers I got consistent results, but the copyTree in DirectOp at least was very cheap |
19:44:19 | apriori| | Araq: can you give me a simple example for a tr macro which wants to replace a call to a generic function (not a specific one) with something? |
19:45:37 | Araq | the copyTree produces much GC pressure I think |
19:46:52 | zahary | or the profiler overestimates recursive procs |
19:46:58 | Araq | apriori|: template t{f(X)}(f: expr, X: varargs[expr]): expr = g(X) |
19:47:27 | Araq | no because the stack trace did not show copyTree as the leaf |
19:47:35 | zahary | … small recursive procs - the issue with intrusive profilers I've brought up before
19:47:38 | Araq | it showed it as the *reason* :P |
19:47:46 | apriori| | Araq: I meant a function with a generic parameter |
19:48:02 | apriori| | Araq: and a named function.. not all possible calls |
19:48:08 | Araq | apriori|: it's eliminated |
19:48:15 | apriori| | hm? |
19:48:35 | apriori| | so I can only match on specific calls to a generic function or what do you mean? |
19:48:47 | Araq | template f{nameHere(X)}(X: varargs[expr]): expr = g(X) |
19:49:08 | Araq | nameHere[int, int](a, b, c) # should be replaced with: |
19:49:14 | Araq | g(a, b, c) |
19:49:33 | apriori| | hm, okay, thank you |
19:52:19 | Araq | btw even easier and should also work: template t{nameHere}: expr = g |
19:53:02 | apriori| | well, let me be more precise: |
19:53:31 | apriori| | template optimizeCrossOfSameVector{cross(a, b)}(a: expr{alias}, b: expr) = [ 0, 0, 0 ] |
19:53:37 | apriori| | the problem here, is: |
19:54:13 | apriori| | I need to know the actual type of an element in a or b so I can properly construct that array literal |
19:54:42 | Araq | a aliases b? |
19:54:50 | apriori| | or vise versa |
19:57:36 | Araq | proc default[T](): T {.inline.} = nil |
19:57:59 | Araq | template ... = default[type(a)]() |
19:58:48 | Araq | you can also construct the [] in a macro but why bother |
20:01:47 | fowl | if you have a generic ref, how do you set the free proc |
20:01:57 | fowl | this is what i tried https://gist.github.com/3734130 |
20:02:39 | Araq | new(result, free2[A]) |
20:03:38 | fowl | ah |
20:05:59 | apriori| | Araq: btw., are tr macros automatically active? |
20:06:13 | Araq | yeah |
20:06:25 | apriori| | hm |
20:11:52 | apriori| | Araq: what is the scope of tr macros? |
20:12:20 | Araq | everything in the same module after declaration
20:12:30 | apriori| | no way to export it? |
20:12:34 | Araq | they don't respect scope |
20:12:40 | Araq | export with * |
20:13:01 | apriori| | grrr... |
20:13:15 | apriori| | actually obvious o_O |
20:21:08 | apriori| | Araq: alias check doesn't seem to work (or works different than I think): |
20:21:15 | apriori| | template optimizeCrossOfSameVector*{cross(a, b)}(a: expr{alias}, b: expr) : expr = |
20:21:16 | apriori| | [ 0.0, 0.0, 0.0 ] |
20:21:34 | apriori| | this should actually only replace, if a is the same array as b |
20:21:51 | apriori| | I now went for the simpler pattern cross(a, a)(a: expr) |
20:26:32 | Araq | strange |
20:27:15 | apriori| | I guess tomorrow is bug filing day ;) |
20:27:28 | Araq | yeah |
20:30:27 | zahary | Araq, how much of the total time is spent in copyTree according to the built-in profiler? |
20:30:52 | Araq | I don't know, I'm improving it and will measure again soon |
20:31:16 | zahary | I did some profiles with Instruments on mac os and Very Sleepy on windows so I can compare |
20:31:46 | Araq | well I can't do "total time spent in" |
20:32:05 | Araq | I can only do "frequently encountered stack traces" |
20:33:42 | * | moxie quit (Remote host closed the connection) |
20:34:47 | zahary | is there percent of the samples for which it was in the trace? |
20:35:08 | Araq | yeah, just wait a sec ;-) |
20:41:38 | Araq | copyTree 36/1992 = 1.8% |
20:41:52 | Araq | ok, can't reproduce now that I improved the profiler |
20:42:51 | Araq | well it's still significant, but not the worst offender, |
20:42:55 | Araq | the stack trace is: |
20:42:58 | Araq | Entry: 9/770 Calls: 4/879 = 0.46% [sum: 77; 77/879 = 8.8%] |
20:43:00 | Araq | add 11/1992 = 0.55% |
20:43:02 | Araq | addZCT 4/1992 = 0.20% |
20:43:04 | Araq | doOperation 5/1992 = 0.25% |
20:43:06 | Araq | nimGCvisit 5/1992 = 0.25% |
20:43:07 | Araq | forAllChildren 5/1992 = 0.25% |
20:43:09 | Araq | CollectZCT 8/1992 = 0.40% |
20:43:10 | Araq | collectCTBody 17/1992 = 0.85% |
20:43:12 | Araq | collectCT 17/1992 = 0.85% |
20:43:13 | Araq | rawNewObj 14/1992 = 0.70% |
20:43:15 | Araq | newObj 16/1992 = 0.80% |
20:43:16 | Araq | newNode 8/1992 = 0.40% |
20:43:18 | Araq | copyTree 36/1992 = 1.8% |
20:43:19 | Araq | matchesAux 24/1992 = 1.2% |
20:43:21 | Araq | matches 23/1992 = 1.2% |
20:43:22 | Araq | resolveOverloads 35/1992 = 1.8% |
20:43:24 | Araq | semOverloadedCall 44/1992 = 2.2% |
20:43:25 | Araq | ... 96/1992 = 4.8% |
20:43:27 | Araq | CommandCompileToC 101/1992 = 5.1% |
20:43:29 | Araq | MainCommand 101/1992 = 5.1% |
20:43:30 | Araq | HandleCmdLine 101/1992 = 5.1% |
20:44:14 | Araq | the CRC stuff wins again |
20:44:31 | Araq | Entry: 0/770 Calls: 21/879 = 2.4% [sum: 21; 21/879 = 2.4%] |
20:44:32 | Araq | updateCrc32 4/1992 = 0.20% |
20:44:34 | Araq | newCrcFromRopeAux 3/1992 = 0.15% |
20:44:36 | Araq | crcFromRope 3/1992 = 0.15% |
20:44:43 | Araq | is the top entry |
20:44:50 | zahary | hmm, I'm getting much more actually, weird |
20:45:40 | zahary | I'm getting 12.5% now - indeed mostly because of GC functions after it |
20:45:53 | Araq | I'm using -d:release --stackTrace:on |
20:46:24 | Araq | well the copyTree is really bad in any case: |
20:46:29 | Araq | think about nested calls |
20:46:39 | Araq | we copy them again and again |
20:46:54 | Araq | we copy the whole call |
20:47:04 | Araq | and each argument again |
20:47:09 | zahary | my profile says the worst copyTrees are these in matchesAux |
20:47:32 | Araq | that's what I'm talking about, yeah |
20:47:33 | zahary | this makes sense as they are performed once per-candidate |
20:48:06 | Araq | Entry: 10/770 Calls: 4/879 = 0.46% [sum: 81; 81/879 = 9.2%] |
20:48:08 | Araq | genericAssignAux 45/1992 = 2.3% |
20:48:10 | Araq | genericAssignAux 45/1992 = 2.3% |
20:48:11 | Araq | genericAssignAux 45/1992 = 2.3% |
20:48:13 | Araq | genericAssignAux 45/1992 = 2.3% |
20:48:15 | Araq | genericAssignAux 45/1992 = 2.3% |
20:48:16 | Araq | genericAssignAux 45/1992 = 2.3% |
20:48:18 | Araq | genericAssignAux 45/1992 = 2.3% |
20:48:19 | Araq | genericAssignAux 45/1992 = 2.3% |
20:48:21 | Araq | genericAssignAux 45/1992 = 2.3% |
20:48:22 | Araq | genericAssign 6/1992 = 0.30% |
20:48:24 | Araq | resolveOverloads 35/1992 = 1.8% |
20:48:25 | Araq | this is also pretty bad |
20:48:35 | Araq | I'm not even sure where the call to genericAssign comes from |
20:48:56 | zahary | 1.8% total for genericAssignAux here |
20:49:39 | Araq | well I take it that means my profiler works |
20:50:27 | Araq | but it's not surprising really, calls and nkDotExpr are everywhere
20:50:36 | Araq | so it's important to optimize these |
20:51:18 | Araq | it doesn't make sense to optimize semYield because there are so few 'yield' statements in Nimrod code ;-) |
20:52:17 | Araq | oh and I compute the percentages wrong :D |
20:52:32 | zahary | how much do you get for collectbody ? |
20:53:15 | Araq | collectCTBody 17/1992 = 0.85% |
20:53:32 | Araq | but I'm not sure how to compute it |
20:53:44 | Araq | the 1992 refers to the total stack slots |
20:53:57 | Araq | but it should be for unique stack slots, right? |
20:54:18 | zahary | 1992 total slots in your cache table? |
20:54:42 | zahary | it should be real active slots |
20:55:00 | Araq | no 1992 slots in total |
20:55:15 | Araq | 879 stack traces |
20:55:53 | zahary | so what's 1992? capacity (that's what I meant) |
20:56:32 | Araq | well it occurs in 12 stack traces out of 49 |
20:56:58 | Araq | 1992 is the length of all stack traces together I think |
20:57:29 | zahary | 12/49 is much closer to what I measure |
20:57:46 | Araq | yeah, I should record that instead |
20:57:52 | zahary | I get 22.3 |
20:57:55 | zahary | % |
20:58:23 | Araq | close enough :-) |
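The two denominators being compared above can be sketched as follows (a minimal illustration with hypothetical traces, not the actual profiler code): dividing a proc's frame count by the combined length of all traces (the "1992" figure) understates its share; the meaningful ratio is how many traces it appears in out of the total number of traces (the "12/49" figure).

```python
# Hypothetical sampled stack traces, ordered root-first.
traces = [
    ("main", "sem", "collectCTBody"),
    ("main", "sem", "semExpr", "semStmt"),
    ("main", "cgen"),
]

# Wrong denominator: total stack slots across all traces (9 here).
frames = sum(len(t) for t in traces)
slot_share = sum(t.count("collectCTBody") for t in traces) / frames

# Right denominator: number of traces the proc appears in, over all traces.
trace_share = sum("collectCTBody" in t for t in traces) / len(traces)

print(round(slot_share, 2), round(trace_share, 2))  # 0.11 0.33
```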
20:58:48 | zahary | you don't have functions like setjmp or memcpy on your radar, right? |
20:59:04 | zahary | no way to get to them at an interruption point
20:59:20 | Araq | that doesn't matter, does it? |
20:59:35 | Araq | it will be attributed to the proc calling these |
21:00:17 | zahary | yes |
21:19:31 | Araq | it's now collectCTBody 67/787 = 8.5% |
21:19:55 | Araq | it occurs in 67 out of 787 stack traces |
21:20:10 | Araq | however, that includes stack traces which occur only once |
21:20:20 | Araq | dunno if that's the correct way to count it |
21:20:38 | Araq | a stack trace that occurs only once is random noise ... |
21:21:16 | Araq | however only 44/787 occur more than once ... |
21:22:29 | zahary | why random noise - a stack trace that occurs once is just half as good as a stack trace that occurs twice
21:23:14 | Araq | hm |
21:23:50 | Araq | I am thinking about init time |
21:24:06 | Araq | oh, so it's once in processCmdLineArgs() ... |
21:26:12 | Araq | nah, it's wrong |
21:26:28 | Araq | if it occurs in a stack trace that occurred 3x
21:26:35 | Araq | it should have weight 3 |
21:28:55 | zahary | if there are 100 traces collected at equal intervals of time and proc X is present on 16 of them, then it's likely that 16% of the program's time can be eliminated by deleting proc X
21:29:38 | Araq | now that's a useful statement |
21:30:18 | Araq | so weight 3 for a stack trace that occurred 3x is correct, right?
21:31:15 | zahary | well, what do you do with that weight? |
21:31:24 | Araq | sum over it |
21:31:34 | zahary | ok, then it's right |
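The weighting rule agreed on above can be sketched like this (names and traces are hypothetical): each distinct stack trace carries a count of how often it was sampled, and a proc's share of run time is the summed counts of the traces containing it, divided by the total number of samples.

```python
from collections import Counter

def proc_share(traces: Counter, proc: str) -> float:
    """traces maps a stack trace (tuple of proc names) to its sample count."""
    total = sum(traces.values())
    hits = sum(count for trace, count in traces.items() if proc in trace)
    return hits / total

# 10 samples in total; semExpr appears on a trace seen 3x and one seen 4x,
# so it is weighted 3 + 4 = 7 out of 10 samples.
samples = Counter({
    ("main", "compile", "semExpr"): 3,
    ("main", "compile", "semExpr", "semStmt"): 4,
    ("main", "compile", "cgen"): 3,
})
print(proc_share(samples, "semExpr"))  # 0.7
```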
21:31:55 | Araq | I get: collectCTBody 77/789 = 9.8% then |
21:32:25 | Araq | nah arghhh |
21:32:35 | Araq | it's too late already |
21:33:05 | Araq | it should be 900 cause that's the number of samplings |
21:33:46 | Araq | making it 8.5% |
21:33:59 | Araq | that's not in line with your results, is it? |
21:38:35 | zahary | I'm getting 22% on both systems |
21:39:14 | zahary | what happens when you change the sampling frequency? |
21:41:39 | Araq | well it's a bit weird |
21:41:46 | Araq | the frequency is 5ms |
21:42:06 | Araq | and the program runs for 5.1 secs |
21:42:17 | Araq | which means I should get 1000 samples
21:42:24 | Araq | but I get only 899 samples |
21:42:50 | Araq | which is suspicious |
21:44:02 | Araq | I guess sampling itself is that expensive |
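A back-of-envelope check of the numbers above: a 5 ms sampling interval over a 5.1 s run should yield roughly 1020 samples, so seeing only 899 suggests each sample (the stack walk plus recording) costs real time. This is only an estimate under the assumption that the missing samples are entirely due to per-sample overhead.

```python
run_time_s = 5.1
interval_s = 0.005
expected = run_time_s / interval_s   # ~1020 samples at a perfect 5 ms cadence
observed = 899

# If sampling overhead stretched the effective interval, each sample cost
# roughly this much extra time:
overhead_s = run_time_s / observed - interval_s
print(round(expected), round(overhead_s * 1000, 2))  # 1020 0.67
```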
21:46:22 | Araq | 3332 samples for 1ms frequency :-/ |
21:47:07 | Araq | collectCTBody 278/3332 = 8.3% now |
21:47:36 | Araq | semExpr 1475/3332 = 44.% btw |
21:50:21 | zahary | what's 60% and more? |
21:50:29 | zahary | btw, there is a big difference between a clean build and a rebuild
21:50:50 | zahary | I usually benchmark the rebuild so the C compiler is out of the picture
21:50:57 | Araq | so do I |
21:51:12 | Araq | koch boot -d:release --profiler:on --stackTrace:on |
21:51:16 | zahary | well, semExpr is 80+% for me |
21:52:24 | Araq | CommandCompileToC 3332/3332 = 1.0e+02% |
21:52:29 | Araq | aka 100% |
21:53:52 | Araq | well I know why it's only 44% for semExpr |
21:54:03 | Araq | I don't record whole stack traces |
21:54:16 | Araq | only 20 entries are in a stack trace |
21:54:30 | zahary | aha, I was about to ask this on the first result for collect |
21:54:44 | Araq | so if semExpr is in the "..." section, it is not recorded |
21:55:12 | Araq | last 3 entries are always the stack bottom |
21:55:22 | Araq | so it's accurate for CommandCompileToC |
21:59:43 | Araq | lookupInRecordAndBuildCheck 136/3332 = 4.1% |
22:02:33 | zahary | 7% - it's consistently half the time |
22:03:02 | Araq | same as semexpr applies for it I guess |
22:03:21 | Araq | newCrcFromRopeAux 184/3332 = 5.5% |
22:03:40 | zahary | 6.5% |
22:03:50 | Araq | crcFromFile 57/3332 = 1.7% |
22:04:12 | zahary | 0.8% |
22:04:29 | zahary | but I have SSD disk |
22:04:45 | Araq | should be in RAM anyway |
22:05:00 | Araq | but maybe linux's caching isn't as good as I think it is |
22:05:26 | Araq | copyTree 387/3332 = 12.% |
22:05:41 | zahary | yep, me too |
22:06:10 | zahary | rawalloc is a leaf for me 84% of the time
22:06:15 | zahary | should be shallow for you too |
22:06:31 | Araq | indeed it is |
22:06:46 | zahary | 10% |
22:07:07 | Araq | newObj 423/3332 = 13.% |
22:07:47 | Araq | newObjRC1 89/3332 = 2.7% |
22:07:48 | Araq | newSeqRC1 114/3332 = 3.4% |
22:08:03 | zahary | newObj 23.3%, leaf only 14% |
22:08:37 | zahary | newSeq 6.8% |
22:08:45 | Araq | huh, how can this ever be a leaf?
22:09:06 | zahary | 6.8% total time |
22:09:37 | zahary | any function can be leaf if the profiler interrupted execution somewhere within the body |
22:10:01 | Araq | I see |
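The leaf-versus-total distinction just explained can be illustrated with a few hypothetical samples: a proc is the leaf of a sample when the profiler interrupted execution inside its own body, but it accrues total time whenever it appears anywhere on the stack.

```python
from collections import Counter

# Hypothetical samples, ordered root-first; the last entry is the leaf.
samples = [
    ("main", "newSeq", "rawAlloc"),   # interrupted inside rawAlloc
    ("main", "newSeq", "rawAlloc"),
    ("main", "newSeq"),               # interrupted inside newSeq's own body
    ("main", "copyTree"),
]

leaf = Counter(trace[-1] for trace in samples)             # self time
total = Counter(p for trace in samples for p in set(trace))  # inclusive time
print(leaf["newSeq"], total["newSeq"])  # 1 3
```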
22:10:43 | Araq | so it's accurate with a factor of 2 :-) |
22:10:46 | zahary | try raising the depth limit to see whether these 2x discrepancies are indeed coming from it |
22:11:07 | Araq | good idea |
22:12:34 | Araq | semExpr 2041/3349 = 61.% with a depth limit of 35 |
22:12:58 | Araq | so I guess it'll reach your number if I increase it further
22:13:24 | Araq | newObj 427/3349 = 13.% |
22:13:40 | Araq | but it's always near top anyway |
22:13:58 | Araq | copyTree 373/3349 = 11.% same |
22:16:47 | zahary | how much do you get for rawalloc? |
22:17:01 | Araq | can't measure it |
22:17:20 | zahary | it has profiling off? |
22:17:28 | zahary | it's a nimrod proc, no? |
22:17:38 | Araq | yeah but I turned it off |
22:19:19 | Araq | semExpr 2041/3349 = 61.% |
22:19:20 | Araq | semExprWithType 1075/3349 = 32.% |
22:19:33 | Araq | this is interesting, I would have guessed they are the same |
22:19:33 | * | Trix[a]r_za is now known as Trixar_za |
22:21:18 | zahary | I get the same |
22:22:19 | zahary | if I break down semExpr, I can see that evalImport is called from it, which in turn carries all the passes on the module (sem, cgen, etc) |
22:22:33 | zahary | so that's how it takes all the time |
22:22:44 | Araq | ah I see |
22:23:10 | Araq | newObj 427/3349 = 13.% |
22:23:12 | Araq | ropef 249/3349 = 7.4% |
22:23:14 | Araq | genTraverseProc 28/3349 = 0.84% |
22:23:48 | zahary | ropef 8.9% |
22:23:49 | zahary | genTraverse 0.2% |
22:24:24 | zahary | newObj 23% - this is that last mystery |
22:25:13 | Araq | is it? maybe it's less severe when everything runs slower due to the stack tracing |
22:26:20 | Araq | scanComment 15/3349 = 0.45% |
22:26:53 | zahary | scanComment 0.2% |
22:27:23 | * | Trixar_za is now known as Trix[a]r_za |
22:28:45 | * | Trix[a]r_za is now known as Trixar_za |
22:32:05 | Araq | but I have to sleep now, good night |
22:32:31 | Trixar_za | Goodnite Araq |
22:53:32 | * | zahary quit (Quit: Leaving.) |
22:54:53 | * | q66 quit (Quit: Quit) |
23:00:13 | * | apriori| quit (Remote host closed the connection) |