<<24-07-2012>>

01:41:52*q66 quit (Quit: Leaving..)
02:07:02*JStoker quit (Ping timeout: 244 seconds)
02:07:19*mal`` quit (Ping timeout: 264 seconds)
02:07:36*mal`` joined #nimrod
02:20:58*JStoker joined #nimrod
04:55:49*Boscop is now known as Boscop_
07:33:49*Araq_ joined #nimrod
07:34:14*Araq_ quit (Client Quit)
08:33:54*XAMPP quit (Ping timeout: 260 seconds)
10:49:05*q66 joined #nimrod
11:12:58*zahary left #nimrod (#nimrod)
11:13:28*zahary joined #nimrod
11:14:58*zahary quit (Quit: Leaving.)
11:46:31*zahary joined #nimrod
12:35:11*Trix[a]r_za is now known as Trixar_za
13:25:28*Trixar_za is now known as Trix[a]r_za
13:47:30*zahary quit (Ping timeout: 264 seconds)
13:55:08*zahary joined #nimrod
14:15:29*zahary quit (Quit: Leaving.)
14:16:47*XAMPP joined #nimrod
14:21:42*XAMPP quit (Ping timeout: 264 seconds)
14:23:55*zahary joined #nimrod
14:50:25*XAMPP joined #nimrod
14:50:25*XAMPP quit (Changing host)
14:50:25*XAMPP joined #nimrod
15:02:21*Bosc0p joined #nimrod
15:04:56*Bo5cop joined #nimrod
15:05:00*Boscop_ quit (Ping timeout: 255 seconds)
15:05:14*zahary quit (Quit: Leaving.)
15:08:09*Bosc0p quit (Ping timeout: 255 seconds)
15:12:34*zahary joined #nimrod
15:37:19*zahary quit (Quit: Leaving.)
15:46:40*zahary joined #nimrod
16:38:22*zahary quit (Quit: Leaving.)
16:55:13*zahary joined #nimrod
16:55:28Araqhi zahary
16:56:23zaharyhi
16:56:48Araqthinking about coroutines
16:57:04Araqseems quite some work
16:57:22Araqunless I implement stack copying
16:59:11zaharyyou have plenty of time to implement coroutine library in nimrod when everything else is ready - just pick some existing library
16:59:37Araqas I said, i need to understand how it works for GC support
17:00:28*Bo5cop quit (*.net *.split)
17:00:28*q66 quit (*.net *.split)
17:00:29*jyyou quit (*.net *.split)
17:00:41Araqluacoco uses stack switching and some C++ lib whose name I forgot uses stack copying
17:00:53zaharywell, how complicated could it be? in the worst-case you'll have to modify the source of the library to tell you the current stack bounds
17:01:21zaharydoesn't everyone try to use fibers on windows?
17:01:43Araqwell it's interesting, so let me continue :P
17:02:37*Bosc0p joined #nimrod
17:02:37*Bo5cop joined #nimrod
17:02:37*q66 joined #nimrod
17:02:37*jyyou joined #nimrod
17:02:57*Bo5cop quit (Ping timeout: 247 seconds)
17:03:06Araqfor stack switching you need to reserve an address range and map the last page to some invalid page
17:03:16Araqotherwise stack overflows wouldn't lead to crashes
17:03:22zaharythis version of the coroutines won't be using the existing yield keyword, right?
17:03:29Araqright
17:04:02Araqso you reserve 32K .. 2MB for a coroutine
17:04:27Araqor you register a segfault handler to implement stack growing
17:04:43Araqwhich is even more work
17:04:53Araqand will suck when debugging
17:05:28zaharyand the duff's device approach to iterators is still on the table?
17:05:28zaharyis the a library that is implementing all of this?
17:06:06Araqthe duff's device approach is in addition to that; I'm thinking about proper full coroutines
17:06:37AraqI'll think about how to merge that with duff's device and the 'for' loop later
17:07:16zaharythey just have to share a common "interface"
17:07:26Araqyeah for instance
17:07:46Araqwell the duff's device approach is boring, I think I got it completely :P
17:08:01Araqso I'm thinking about full coroutines/greenthreads now
17:08:46zaharyyou probably have read all of this, but here are some notes from a C++ library that's supposed to become part of boost
17:08:46zaharyhttp://www.crystalclearsoftware.com/soc/coroutine/coroutine/implementation.html
17:10:35zaharyalthough, he is using the less efficient, but more official makecontext/swapcontext API
17:10:36zaharyhttp://www.crystalclearsoftware.com/soc/coroutine/coroutine/linuxasm.html
17:11:20Araqwindows fibers use the fixed amount of stack space approach
17:12:20Araqthanks for the links, haven't read them yet
17:12:41Araqstudied some source code instead
17:13:01Araqthere are two approaches:
17:13:07Araqa) stack copying
17:13:18Araqb) new stack with O(1) switching between them
17:13:38Araqa) can use much less memory/address space
17:14:10Araqand can be done portably without assembly
17:14:15zaharywell, I guess the OS will take care of committing only the used portion of the stack
17:14:31zaharybut reserving the address space, yeah
17:14:37Araqyes if you tell the OS to do so
17:14:52dom96hello
17:14:56Araqbut it may fragment the address space
17:14:58Araqhi dom96
17:15:11zaharyI don't know why everyone wants to use fibers - as I complained before it's quite inconvenient to use the ConvertThreadToFiber function
17:15:29AraqI don't plan to use fibers
17:15:37zaharythere is actually a new little-known API introduced in windows 7 that doesn't have this problem
17:15:42zaharylet me try to dig it up again
17:15:47Araqok
17:20:20AraqI think stack copying can be done this way:
17:20:39Araqold stack --> coroutine invokation --> afterwards:
17:20:54Araqold stack | coroutine's stack
17:21:05Araqcoroutine yields:
17:21:18Araqthe added part is popped off
17:21:24Araqback to:
17:21:26Araqold stack |
17:22:16Araqthis means the copying is O(coroutine's stack size)
17:22:22Araqand not O(stack size)
17:22:25zaharywell, ok, but now you have to move the coroutine stack somewhere otherwise it may get overwritten
17:22:39zaharyah, ok
17:22:51Araqyeah but only the (assumed to be very small) coroutine's stack must be copied around
17:22:53Araqand!
17:23:07Araqwe need to traverse this stack for the GC support anyway
17:23:32Araqthe GC would incRef/decRef potential roots on the coroutine's stack
17:23:49Araqand then the GC does not even need to keep a list of all coroutines
17:24:03Araqas it doesn't trace their stacks
17:24:56zaharyincRef/decRef as part of activation/deactivation?
17:25:05Araqyes
17:25:12Araqbut the stack switching approach allows for a very similar GC optimization
17:25:32Araqa coroutine would then a 'scanned' bool
17:25:46Araqthat only needs to be set to 'false' if the coroutine is resumed
17:26:16Araqas a coroutine can't modify its stack if it hasn't been run since the last GC run
17:26:22zaharywhat's nice is that coroutines as very likely to yield from a consistent stack depths
17:26:33zaharyso expanding the stack after the first copy will be rare I think
17:27:08Araqbut every context switch is done with copying with approach a) ...
17:27:12zaharyit's interesting if we can make the copying lazy too if we know that the coroutine stack won't be overwritten (but that sounds too fragile)
17:27:58zahary* are very likely *
17:28:28AraqI can't follow
17:28:40Araqthe stack looks like:
17:28:54Araqmain stack | coroutine's stack
17:29:11Araqyou need to append the coroutine's stack for a context switch
17:29:21AraqI can't see how you can do that lazily
17:29:48zaharywhen you yield from a coroutine, you immediately copy it's stack, right?
17:29:59Araqyes
17:30:34zaharywhat I'm saying is that you don't need to do that if you know that the stack top wont move past the coroutine user stack frame
17:31:17Araqyou can be sure that'll never happen
17:31:32zaharyfor items in coroutine.produce()
17:31:32zahary no calls here
17:31:47Araqseems unlikely
17:32:40zaharywith some forced inlining you may achieve something like this, but as I said - too fragile
17:32:50zaharyI don't know what the compiler would do
17:33:54Araqyeah plus *cough* Nimrod's write barrier ain't friendly
17:34:07Araqand can potentially call
17:34:37Araqit's of course fine if you don't use GC'ed memory in 'no calls here'
17:35:58Araqbtw can you take a look at bug #175
17:36:07AraqI don't feel like booting windows :-)
17:36:25Araqand the DLL stuff causes me headaches
17:36:51AraqI solved the issue by tinkering on linux as I figured linux doesn't really report errors for broken DLLs
17:36:59Araqinstead the library fails to load ...
17:38:57Araqback to topic:
17:39:21zaharyquite a wtf - ok, I'll try it a bit later - I'm currently booted in mac os
17:39:38AraqI guess it's a simple codegen issue
17:39:50AraqNimMain() invoked twice on windows or something
17:40:08zaharybut why at exit? that's the weird part
17:40:33zaharythat part could just be a mis-report
17:40:51Araqdon't think so, the guy knows what he's doing in general
17:43:32Araqbut back to topic:
17:43:49AraqO(1) context switching seems quite preferable, right?
17:44:03Araqthat requires the GC keeps a list of all coroutines
17:44:14Araqand that it knows their stack pointers
17:44:33Araqknowing the upper bounds of the stack's address space is not enough
17:44:47Araqas the memory page may not exist
17:44:56Araqand it would be too slow to scan the full range anyway
17:46:47Araqhowever this approach feels like implementing a full threading library ... except there is no concurrency ...
17:47:07zaharywell, it's easy to get the stack top when yield is called
17:47:15Araqtrue
17:48:04Araqin my mind there are 2 main use cases:
17:48:30Araq1) tree/graph traversal (potentially deep stacks)
17:48:49Araq2) gaming: mapping a unit to a coroutine
17:48:59zaharyalso, I think the GC will discover the stacks of the coroutines through the coroutines objects themselves which are in reachable memory
17:49:27zaharybut nimrod doesn't do full mark phase so this will not be true?
17:50:16Araqstacks are scanned conservatively
17:50:27Araqyou can't do anything else and have good C interop
17:50:48Araq(ok, full RC, but we talked about it ... lots of work and expensive ...)
17:50:49zaharyI meant the GC doesn't care to scan objects that are 2 lookups away from the stack
17:51:41Araqyes but how does that help?
17:52:04zaharyno, it invalidates my previous statement - I think the GC will discover the stacks of the coroutines through the coroutines objects themselves
17:52:18zaharyso yes, the GC must keep track of them in a list
17:53:23Araqthe stacks are under C's control and can contain roots so yeah
17:53:46Araqback to the use cases:
17:54:11Araq1) for graph iterators recursion is really nice and fast stack switching seems essential
17:54:21Araqbut I dunno for 2)
17:54:40AraqI'd expect the stacks in 2) are very small, a few words perhaps
17:54:48zaharya custom mark proc would be able to scan conservatively the coroutine stack if it was guaranteed the coroutine object itself would be reached in the GC scan
17:55:09Araqyeah
17:55:27Araqbut it's easy too to do:
17:55:35Araqfor c in allCoroutines:
17:55:43Araq scanStack(c)
17:55:50Araqin the GC
17:56:01Araqdoesn't really matter how we do that
17:56:18zaharypersonally I think recursion is much better handled with the 3rd implementation strategy I mentioned - transforming the consumer code into a closure that's passed to the recursive walker
17:56:48AraqI prefer an explicit stack in the iterator; ropes.nim does that for instance
17:57:01Araqit's not that hard and often allows for further optimizations
17:57:15zaharywhat do you mean by explicit stack?
17:57:31Araqa seq[T] in the iteratorr
17:58:51Araqso that leaves us with use case 2)
17:59:21zaharycase 2) is more real value for me too
17:59:25Araqit's nice to support millions of coroutines for 2)
17:59:45Araqand as I said, the stack is likely a few words only, right?
18:00:15zaharywell, can't be sure of such things, but it is likely
18:00:29zaharyalso, it's likely to yield from the same depth as I said before
18:00:48zaharyso if you leave a little room to grow in the first copy it's likely to be enough
18:01:33Araqhrm growing is no problem at all for the copying approach as you always know the required memory usage
18:01:57Araqwe 'reuse' the OS's way to do the stack growing
18:02:06zaharyis guard page the only way to implement growing?
18:02:36zaharywhat about some probing in the code after yield?
18:05:20zaharynah, I take this back - sounds complicated as the code that needs to grow the stack will probably be in some further unknown calls several frames below
18:06:09Araqprobing doesn't work unless you inject every ordinary function with that
18:06:24zaharyso, how can you grow the stack in less than a page granularity? it seems that keeping only a few bytes for the coroutine is not feasible with stack growing
18:06:43Araqthe stack looks like:
18:07:12Araqfunction frames (lots of), co-coroutine's frame, function frames
18:07:34Araqnow a function frame does a 'co.routine.yield'
18:07:55Araqyou have to copy from current stack position to the beginning of coroutine's frame
18:08:10Araqso you can allocate that much memory and be done with it
18:08:22zaharyok, then you resume and the coroutines performs some more calls
18:08:44zaharythe stack space of the coroutine ends and you need to grow it?
18:09:01Araqperhaps
18:09:10Araqit'll work like an ordinary 'seq'
18:09:26Araqno need to care about page boundaries etc.
18:10:28zaharyhow would it detect that the stack ended? we said a guard page would be used? but this means that we must have allocated at least one page to store the stack until it reaches the guard page
18:11:06Araqwith the copying approach you need no additional guard page
18:11:17Araqyou misuse the main thread's stack
18:11:29Araqand that already has the guard page
18:11:33zaharyah, you copy it back on the main stack.
18:11:34zaharyI get it
18:11:50zaharybut that makes switching even more expensive
18:11:58zaharyunless there a just a few bytes
18:12:03Araqyes
18:12:07zaharymaybe there could be some heuristics
18:12:13zaharyif the coroutine is small use copying
18:12:28Araqugh that is nice
18:12:35zaharyif it's big, you fast switching with page boundary stacks
18:12:38Araqbut it also means we have to implement both ways ...
18:13:06Araqand as I said the 'fast stack switching' approach is quite some work ...
18:13:36Araqand re-using some C library only helps so much as we need the GC interaction
18:14:21Araqthe copying approach seems really simple to implement
18:15:13Araqbtw I think in the C# paper you told me about
18:15:39Araqthe nested iterators are expensive because they don't support stack capturing
18:15:50zaharyI usually say that's it's more important to figure out what the best algorithms is now and just settle on interface that doesn't make it impossible to implement in the future
18:17:48zaharywell, they are expensive, because the duff's device approach really doesn't work all that good (there is no long jump, so if are several levels deep, you must first pop, pop, pop to the original consumer and then back enter, enter, enter the iteration)
18:18:26Araqyes
18:18:43zaharytheir nested approach removes such multiple pops and enters by storing a "current" iterator that may actually come from several levels deep
18:22:43Araqif we do the copying we can infact reserve X bytes as stack space directly in the coroutine
18:22:58Araqand need no memory allocation if we're lucky
18:23:37Araqthe API should deal with seqs of coroutines I think
18:24:01Araqso you get really good memory usage
18:24:20zaharyI don't follow. what do you mean by reserve stack space directly in the coroutine?
18:25:16Araqtcoroutine[T] = object
18:25:28Araq f: proc (): T {.closure.} # wrapped function
18:25:41Araq stack: array [0..40, int]
18:25:57Araq dynstack: pointer # != nil, if 'stack' doesn't suffice
18:26:36Araqas I said, you essentially know the needed memory
18:26:48zaharywell, it depends on what are you trying to optimize - the total amount of memory or the number of allocations
18:27:20zaharyI guess, the user knows best about the characteristics of his coroutine so maybe we can give him control over such buffer with a pragma
18:27:59zaharyyou don't know the needed memory statically? (not always at least). is that what you suggest
18:28:35AraqI suggest a common "small string" optimization
18:28:43Araqthat I learned from the STL
18:29:04zaharyyeah, I got that part - I meant you don't know the size of the coroutine at compile time
18:29:25Araqthe size of the stack, no
18:29:38Araqthe size of coroutine is fixed
18:29:41zaharyand I suggested that the user maybe interested in controlling the size of his "small stack" buffer
18:29:51Araqyeah got it
18:29:58Araqmaybe worthwhile, yes
18:36:57Araqabout the state machine iterators:
18:37:05Araq for i in foo(): nil
18:37:12Araqshould be transformed into:
18:37:27Araq while true:
18:37:29Araq let i = foo(cl)
18:37:31Araq if cl.state == -1: break
18:37:46Araqwe already need the closure and the 'state' enum in it
18:38:16Araqno need to use a hidden tuple[hasFinished: bool, i] return type
18:38:38Araqevery local of 'foo' is put into the closure
18:38:50Araqand 'yield' is transformed into:
18:39:20Araq label:
18:39:22Araq c.state = nextLabel
18:39:24Araq return value
18:39:36Araqit requires 2 additional AST kinds
18:39:53Araqparallel iteration works like this:
18:40:02Araq for i in foo(), j in bar():
18:40:03Araq nil
18:40:09AraqIs transformed to:
18:40:11Araq while true:
18:40:12Araq let i = foo(clA)
18:40:13Araq if clA.state == -1: break
18:40:16Araq let j = bar(clB)
18:40:17Araq if clB.state == -1: break
18:40:21Araqthat's it
18:40:30Araqsolved problem :-)
18:40:43Araqthat's why I'm concentrating on full coroutines
18:41:10zaharyyeah, but the exception details are bit more messy as we've discussed
18:41:26Araqyeah, no 'yield' in a 'try'
18:41:28Araq:P
18:41:29zaharyhttp://msdn.microsoft.com/en-us/library/windows/desktop/dd627187(v=vs.85).aspx here is the new windows API I mentioned
18:42:00zaharyhttp://msdn.microsoft.com/en-us/library/windows/desktop/dd627183(v=vs.85).aspx
18:42:33zaharyreading about it now again, it may even be more complicated to wrap actually
18:44:18dom96Araq: Any ideas about that strange tester issue?
18:44:38Araqrun the test on the test machine to see where the warning occurs
18:44:47AraqI'm as puzzled as you
18:44:55dom96will do
18:44:56Araqas it doesn't warn on my machine either
18:45:49Araqit's actually amazing how good the current implementation of Nimrod's iterators is
18:46:08Araqit works with 'try', is as efficient as it can get
18:46:22Araq:-)
18:47:10Araqand its implementation makes my brain hurt whenever I need to look at it :D
18:47:53dom96It meets all the requirements then :D
18:49:48dom96Well the test compiles with no warnings on the VPS too.
18:49:56Araqwhat?
18:50:10Araqthe tester uses an old compiler then?
18:50:25dom96... maybe? But... hrm
18:50:42dom96But on all test machines?
18:51:56dom96wait a second
18:53:28dom96nah, I don't think that's it
18:55:13Araqtry on the test machine to copy the latest nimrod + the tests somewhere
18:55:24Araqand run the test with these
18:56:30dom96hrm, just copy the whole Nimrod directory somewhere else?
18:56:42dom96How would that change anything?
18:57:00dom96hrm, I guess you're thinking $PATH issues
18:57:52Araqalso downloading latest from github may have failed
18:58:03Araqand it downloaded a previous version?
18:58:06Araqhrm unlikely
18:58:20Araqbut you know the saying about insanity
18:58:54dom96The logs say otherwise
18:59:12dom96And then it would have worked after you commited.
18:59:44dom96I'll just run the tester on the run tests locally to see...
19:00:47Araqok
19:07:40Araqzahary: the boost link mentions that longjmp may invoke destructors? o.O
19:11:59zaharyyes, I though about this too - another reason why this try blocks can prove useful
19:12:50zaharythere is one bug with them right now tho - temporary values are not detected yet - we must upgrade them to regular variables and rewrite the code to insert the try block
19:14:09AraqI can't follow
19:14:15Araqwhat's the problem?
19:14:41zaharyfoo(makeDestructableValue(a, b))
19:15:16dom96Araq: I'm really glad the tester now allows you to run single tests now.
19:15:44zaharyin C++, this temporary value will be properly destroyed - nimrod won't do anything about it right now (it looks for destructable variables only in var sections)
19:16:01Araqhrm I see
19:19:07dom96wow. I feel stupid now.
19:19:26dom96The test uses ssl, when you compile with -d:ssl that warning does show up heh
19:19:59Araqthat's why we need to use ttest.nimrod.cfg files
19:20:33dom96perhaps
19:21:00Araqzahary: why don't map Nimrod's destructors to C++'s destructors again?
19:21:29Araqthat would solve the issue too
19:21:41zaharymostly because I wanted for them to work in C and in the evaluator without 3 separate paths in the code
19:21:55Araqoh yeah lol
19:21:58Araqgood point
19:22:25Araqbut you can do the rewrite in transf, right?
19:22:34Araqthat seems the better place to do that
19:22:49Araqevals.nim only works on "tranformed" ASTs already
19:23:12Araqthe expr/stmt merge requires the same logic
19:24:57zaharyyes, it's not really relevant - I don't remember if there was anything in particular that stopped me, but I'm left with the impression that you don't quite like this sem/transf separation and I don't see much value in it either
19:26:37zaharyif I recall correctly I didn't want to bring new dependencies in transf that were already present in sem
19:27:03Araqalright np
19:27:16Araqyou're right the sem/transf split turned out to be annoying
19:27:32Araqon the other hand, inlining iterators in sem is not right either
19:28:03Araqthe seperation makes sense conceptually but the implementation always bites me
19:30:33zaharypasses makes sense in general "first do this, then do this", but this could be represented in number of ways - loop over the statements with semStmt, then execute this proc on the result, etc
19:31:50Araqyeah plus every independent pass is expensive
19:31:52zaharyand the split is obviously useful when you can substitute the parser or the generator, etc
19:32:35Araqdom96: why is openssl not part of tests/compile/tlibs?
19:32:42Araqthat would have caught it ...
19:32:53dom96Araq: Ask yourself that not me.
19:33:47Araqalright, adding it
19:34:15Araqeven 'nimrod check' needs the transf pass now ...
19:34:28Araqfor reasons that I forgot
19:34:36Araqquite annoying though
19:36:47dom96hrm, is github failing?
19:37:05dom96yep: https://status.github.com/
20:08:54*jyyou_ joined #nimrod
20:10:24*jyyou quit (Read error: Operation timed out)
20:10:39*Boscop joined #nimrod
20:11:08*Bosc0p quit (Ping timeout: 255 seconds)
20:11:08*q66 quit (Ping timeout: 255 seconds)
20:12:47*q66 joined #nimrod
20:13:20*zahary quit (Quit: Leaving.)
20:14:49*Bosc0p joined #nimrod
20:16:59*Boscop quit (Ping timeout: 255 seconds)
20:21:02*Bosc0p quit (Quit: OutOfTimeException: Allocation of TimeFrame failed due to lack of time. Free up time by cancelling unimportant events.)
20:34:13*zahary joined #nimrod
20:38:20Araqzahary: turns out we really can only do the copying approach for now
20:38:59Araqor we allocate substacks from the memory region reserved for the main stack
20:39:43AraqGC's unsureAsgnRef tests whether it's a pointer on the stack or not ...
20:40:00Araqcurrently this does a quick test against the stack pointer
20:40:29Araqmake that a list/tree of intervals instead and watch performance suffer ... :-/
20:40:30*zahary quit (Quit: Leaving.)
20:41:02Araqhrm that killed him
21:13:21dom96You murderer :O
21:45:46ccssnetO.o
21:45:49ccssnet -
22:07:21dom96Finally the test succeeds :D
22:08:03Araqyay
22:20:44*zahary joined #nimrod
22:22:41*zahary quit (Client Quit)
22:56:59*zahary joined #nimrod