<< 03-07-2013 >>

00:46:54*q66 quit (Quit: Leaving)
01:00:30*Raynes_ joined #nimrod
01:00:53*Raynes quit (Ping timeout: 246 seconds)
01:00:54*SirSkidmore quit (Ping timeout: 246 seconds)
01:00:55*Raynes_ is now known as Raynes
01:00:56*Raynes quit (Changing host)
01:00:56*Raynes joined #nimrod
01:01:00*Associat0r quit (Ping timeout: 246 seconds)
01:01:00*XAMPP quit (Ping timeout: 246 seconds)
01:01:06*SirSkidmore joined #nimrod
01:02:20*Associat0r joined #nimrod
01:02:20*Associat0r quit (Changing host)
01:02:20*Associat0r joined #nimrod
01:05:18*OnionPK joined #nimrod
01:07:11*BitPuffin quit (Ping timeout: 252 seconds)
01:08:44*comex` joined #nimrod
01:10:33*OrionPK quit (*.net *.split)
01:10:35*comex quit (*.net *.split)
01:10:36*mal`` quit (*.net *.split)
01:10:47apotheonHow does one guarantee that a particular area in memory a Nimrod app uses actually gets cleared at a particular time?
01:11:14*mal`` joined #nimrod
01:19:07*DAddYE quit (Remote host closed the connection)
01:36:18*comex` is now known as comex
02:20:24*DAddYE joined #nimrod
02:27:31*DAddYE quit (Ping timeout: 264 seconds)
03:23:20*DAddYE joined #nimrod
03:30:02*DAddYE quit (Ping timeout: 252 seconds)
03:30:28*DAddYE joined #nimrod
03:40:02*EXetoC joined #nimrod
03:44:40*DAddYE quit (Remote host closed the connection)
03:44:48*DAddYE joined #nimrod
03:44:55*DAddYE quit (Remote host closed the connection)
03:45:28*DAddYE joined #nimrod
03:49:47*DAddYE quit (Ping timeout: 256 seconds)
04:10:01*OnionPK quit (Quit: Leaving)
04:39:52*Associat0r quit (Quit: Associat0r)
04:46:01*DAddYE joined #nimrod
04:52:58*DAddYE quit (Ping timeout: 256 seconds)
05:09:48*DAddYE joined #nimrod
05:12:08*DAddYE quit (Remote host closed the connection)
06:08:23*DAddYE joined #nimrod
07:05:16*Araq_ joined #nimrod
07:07:31*ack006 quit (Quit: Leaving)
07:17:35*Araq_ quit (Remote host closed the connection)
07:30:12*Araq_ joined #nimrod
07:30:56Araq_apotheon: you can do p = alloc(size) and then later zeroMem(p, size); no idea what you have in mind
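A minimal sketch of the pattern Araq describes, using the system procs alloc, zeroMem, and dealloc; the secret-wiping framing is an assumption about apotheon's use case:

    proc withSecret(size: Natural) =
      var p = alloc(size)       # untraced allocation outside the GC heap
      try:
        # ... copy sensitive data into the buffer and work with it ...
        discard
      finally:
        zeroMem(p, size)        # overwrite the whole region with zero bytes
        dealloc(p)              # then release it

Because the buffer never passes through the garbage collector, the zeroing happens exactly when the finally block runs, which addresses the timing part of the question above.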
07:56:54*DAddYE quit (Remote host closed the connection)
08:52:01*Associat0r joined #nimrod
08:52:01*Associat0r quit (Changing host)
08:52:01*Associat0r joined #nimrod
08:58:01*DAddYE joined #nimrod
09:04:23*DAddYE quit (Ping timeout: 240 seconds)
09:08:48*q66 joined #nimrod
10:00:57*DAddYE joined #nimrod
10:07:27*DAddYE quit (Ping timeout: 252 seconds)
11:04:06*DAddYE joined #nimrod
11:10:51*DAddYE quit (Ping timeout: 256 seconds)
12:07:21*DAddYE joined #nimrod
12:13:57*DAddYE quit (Ping timeout: 248 seconds)
12:19:48*Associat0r quit (Quit: Associat0r)
12:29:45*Araq_ quit (Quit: ChatZilla 0.9.90 [Firefox 22.0/20130618035212])
13:10:26*DAddYE joined #nimrod
13:16:48*DAddYE quit (Ping timeout: 245 seconds)
13:24:03*BitPuffin joined #nimrod
13:28:14*Araq_ joined #nimrod
13:53:53*Trix[a]r_za is now known as Trixar_za
13:54:37*Araq_ quit (Quit: ChatZilla 0.9.90 [Firefox 22.0/20130618035212])
14:04:14apotheonAraq: The zeroMem() thing looks like it's probably about what I have in mind, but I'd have to check.
14:13:25*DAddYE joined #nimrod
14:19:59*DAddYE quit (Ping timeout: 260 seconds)
16:06:48*Trixar_za is now known as Trix[a]r_za
16:17:17*DAddYE joined #nimrod
16:24:07*DAddYE quit (Ping timeout: 260 seconds)
16:41:10*q66 quit (Read error: Operation timed out)
16:41:35*q66 joined #nimrod
16:50:50*DAddYE joined #nimrod
17:36:27*gradha joined #nimrod
18:57:33Araqgradha: your gist with the let+case expression works for me
18:58:21gradhaI mentioned that I had a broken compiler at the time, which is why I didn't submit an issue
19:00:13Araqok that's what I thought
19:31:35*Trix[a]r_za is now known as Trixar_za
21:04:51*gradha quit (Quit: bbl, need to watch https://www.youtube.com/watch?v=1ZZC82dgJr8 again)
21:07:17*Reisen quit (Ping timeout: 252 seconds)
21:11:44*Reisen joined #nimrod
21:23:59NimBotnimrod-code/nimforum master a9515d2 Grzegorz Adam Hankiewicz [+0 ±1 -0]: Adds info about libcairo runtime dependency.
21:23:59NimBotnimrod-code/nimforum master 3368b56 Dominik Picheta [+0 ±1 -0]: Merge pull request #12 from gradha/pr_cairo_notes... 2 more lines
21:40:08*Mat2 joined #nimrod
21:40:18Mat2good day
21:40:31Araqhi Mat2
21:43:20Mat2is there a chance nimrod will support GNU's first-class label extension (like clang and ICC do) for manually building efficient jump tables?
21:44:31Araqthe chance is >95%
21:45:13AraqI'm still designing the pragma for it; it's a bit cumbersome
21:45:25Araqcase x
21:45:27Araqof 0:
21:45:32Araq {.jumptable.}
21:45:39Araqdoesn't really cut it
21:46:42Araqfyi: http://www.emulators.com/docs/nx25_nostradamus.htm
21:51:06Araqfor now the best solution looks like:
21:51:12Araqwhile ...:
21:51:20Araq {.interpreterloop.}
21:51:37Araq case opcode
21:51:40Araq ...
21:52:34Araqand let the compiler merge the loop and the case into a jump table implementation
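A sketch of what Araq's proposal might look like in use; the interpreterloop pragma is only a design idea at this point, so this does not compile today:

    type Opcode = enum opPush, opAdd, opHalt

    var
      code = @[opPush, opPush, opAdd, opHalt]   # toy program
      stack: seq[int] = @[]
      pc = 0
      running = true

    while running:
      {.interpreterloop.}      # proposed hint: fuse this loop and the case
                               # below into a single computed-goto jump table
      case code[pc]
      of opPush: stack.add(1)
      of opAdd:
        let b = stack[stack.len-1]
        let a = stack[stack.len-2]
        stack.setLen(stack.len-2)
        stack.add(a + b)
      of opHalt: running = false
      inc pc

Without the pragma the C backend emits an ordinary loop around a switch; with it, each case arm could jump directly to the next handler, saving one branch per dispatched instruction.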
21:53:31Mat2Araq: nice read, I used some of these dispatching techniques for my vm-design in addition to static instruction fusion
21:54:42AraqI'm currently improving nimrod's evaluation engine
21:55:01Araqit's a simple AST interpreter with ugly adhoc special cases
21:56:13AraqI am implementing a simple optimizer that recognizes common patterns and replaces these nodes by specialized operations (called "superops")
21:57:28Araqwith ASTs you can make 'for i in x..y: body' a single superop (that calls eval for the body)
21:58:36Araqno idea how fast it will be once I'm done with it; I aim for python-like speed
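A rough, self-contained illustration of the superop idea on a toy AST; the node kinds and the literal-bounds restriction are assumptions made for the example, not Araq's actual engine:

    type
      NodeKind = enum nkIntLit, nkRange, nkFor, nkForRangeSuperop, nkStmtList
      Node = ref object
        kind: NodeKind
        intVal: int
        sons: seq[Node]

    proc eval(n: Node): int =
      case n.kind
      of nkIntLit: result = n.intVal
      of nkStmtList:
        for s in n.sons: result = eval(s)
      of nkRange: discard                  # handled by the enclosing loop
      of nkFor:
        # generic path: recurse into eval for both bounds on entry
        let r = n.sons[0]
        for i in eval(r.sons[0]) .. eval(r.sons[1]):
          result = eval(n.sons[1])
      of nkForRangeSuperop:
        # fused path: bounds read directly, only the body goes through eval
        let r = n.sons[0]
        for i in r.sons[0].intVal .. r.sons[1].intVal:
          result = eval(n.sons[1])

    # the optimizer pass: recognize 'for i in x..y: body' with literal
    # bounds and swap the node kind for the specialized superop
    proc superopPass(n: Node) =
      for s in n.sons: superopPass(s)
      if n.kind == nkFor and n.sons[0].kind == nkRange and
          n.sons[0].sons[0].kind == nkIntLit and
          n.sons[0].sons[1].kind == nkIntLit:
        n.kind = nkForRangeSuperop

    # toy: for i in 1..3: 42
    let rng = Node(kind: nkRange, sons: @[Node(kind: nkIntLit, intVal: 1),
                                          Node(kind: nkIntLit, intVal: 3)])
    let loop = Node(kind: nkFor, sons: @[rng, Node(kind: nkIntLit, intVal: 42)])
    superopPass(loop)
    echo eval(loop)            # 42, now via the fused path

The real engine would recognize many such patterns; the gain per pattern is the node dispatches and recursive eval calls it no longer performs.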
21:59:44AraqMat2: any opinion on my interpreterloop pragma idea?
22:00:52*Trixar_za is now known as Trix[a]r_za
22:01:12Mat2I see we share some similar ideas. My background: I just apply "superops" at the vm-code level and let them resolve implicitly through an ISA specially designed for this task
22:02:14Mat2in this case the current form with while: ... would work fine
22:03:23*OrionPK joined #nimrod
22:06:19Araqnote that any bytecode requires at least 2 instructions for my 'for'-loop example; then again, you can get rid of the eval recursion, which is likely more expensive than the 1 additional dispatch
22:07:07AraqMat2: if you have superops you likely have variable length instructions, right?
22:07:56*Trix[a]r_za is now known as Trixar_za
22:08:26Mat2I use a packed opcode format (16 instruction slots per 64-bit word, each slot 4 bits wide)
22:09:22Araqwhy?
22:10:07Mat2because it's a design easily implementable in an FPGA and I can hold 16 instructions in a native register
22:11:07Mat2the dispatch is then reduced to shifting out instruction combinations (3 at the moment) and I can optimize out instruction fetching
22:11:59Araqso every instruction is 64 bit?
22:12:12Araqand how many bits do you use for the opcode?
22:13:32Mat24 bits for each instruction; 2-3 instructions form an opcode, so one dispatch can execute up to 16 of them through software pipelining
22:15:05Mat2no virtual "register" references because it's a dual-stack design
22:15:06Araqsorry you lost me. How can an instruction only be 4 bits?
22:15:42Mat2load immediate (push immediate value onto the data stack)
22:16:47Araqthe immediate value itself may require more than 4 bits
22:17:30Araqand what about jumps
22:18:00Mat2immediate values follow each instruction bundle
22:18:31Mat2the jump address must be loaded onto the data-stack before a taken branch
22:19:50Mat2ADD, SHL, SHR, LOAD, STORE, AND, GOR, XOR, NEG, DUP, DROP, SWAP, OVER all handle the top-of-stack value
22:20:22*EXetoC quit (Quit: WeeChat 0.4.1)
22:20:36Mat2some instruction combinations amount to a no-op, like dup+drop
22:21:01Mat2those encodings are reused for additional instructions requiring two slots (for immediate values, for example)
22:21:16Mat2all branch instructions are decoded this way, for example
22:22:01Araqso 64 bits encode 16 opcodes. The immediates follow in other 64bit slots?
22:22:10Mat2yes
22:23:32Araqand you decode 3 opcodes (12 bit) at the same time using a combinatorial table?
22:23:43Mat2correct
22:24:30Mat2(the last two slots are decoded through an 8-bit table)
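A sketch of dispatching such a packed word in Nim; Mat2's design decodes 3-slot combinations at once through a combined table, while this simplified version (with an assumed subset of the instruction list above) shifts out one 4-bit slot at a time:

    type Slot = enum sNop, sAdd, sDup, sDrop, sSwap   # 4-bit slot values (subset)

    proc run(word: uint64, stack: var seq[int]) =
      var w = word
      for slot in 0 .. 15:                # 16 slots of 4 bits each
        let op = Slot(int(w and 0xF))     # low 4 bits pick the instruction;
                                          # toy words must only encode defined slots
        w = w shr 4                       # shift the next slot into place
        case op
        of sAdd:
          let b = stack[stack.len-1]
          let a = stack[stack.len-2]
          stack.setLen(stack.len-2)
          stack.add(a + b)
        of sDup:  stack.add(stack[stack.len-1])
        of sDrop: stack.setLen(stack.len-1)
        of sSwap: swap(stack[stack.len-1], stack[stack.len-2])
        of sNop:  discard

    var st = @[2, 3]
    run(uint64(ord(sAdd)), st)   # slot 0 = sAdd, the rest sNop
    echo st                      # @[5]

Since the whole bundle lives in one register, the inner loop touches memory only for the stack itself, which is the fetch optimization Mat2 mentions.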
22:25:51Araqdo you analyse which combinations actually do occur?
22:27:03Araqfor the rare combinations you can use a default one-at-a-time instruction executor
22:30:14Mat2I have a generator that analyses a given application and creates a reduced instruction set and code format for memory-restricted environments, but the interpreter in its current form compiles to < 100 kB (gcc), so generally I don't see much use for it
22:31:24Araqalright; so how is the speed? does it beat luajit's interpreter?
22:32:58Mat2my reference is the gforth interpreter, which is labeled an interpreter but is in reality a simple native-code compiler
22:34:08Mat2on my old Atom N550 netbook, it beats gforth by a factor of 2, depending on the test (raw dispatch performance)
22:34:43Araqnice
22:35:49Mat2it was once developed as a replacement for ngaro (retroforth's vm)
22:39:42Mat2and it seems to be a never-ending story, because so far I've always found ways to optimize it further
22:42:42*Trixar_za is now known as Trix[a]r_za
22:45:10Araqwhy not JIT it then?
22:50:33Mat2well, one design goal was portability and word-size-agnostic dispatching, because retro was designed to be used on a wide range of platforms (I heard of someone porting it to a board featuring an old 8-bit MCU)
22:51:49Mat2(realistically, my vm is only usable with 32- and 64-bit cpus now)
22:52:42Mat2cross compilation can be a nice feature
22:53:11AraqI see that a stack based VM makes sense for Forth but don't you agree register based is technically superior?
22:55:44Mat2no, because an interpreter is mainly limited by the efficiency of its dispatch handling, and because of this, instruction bundling results in much more performance than the reduction in code size can offer
22:57:17Mat2in addition, combining instructions reduces the code size to a level comparable to a register-based design in my tests
22:57:29Mat2so I do not see any advantage
22:57:52*Associat0r joined #nimrod
22:57:52*Associat0r quit (Changing host)
22:57:52*Associat0r joined #nimrod
22:58:39Mat2what's important is a dual-stack design
22:59:32Mat2the Java VM for example is a bad example here, because it depends on stack addressing through frames mapped to a single stack
23:00:31Araqdual stack: 1 for data, 1 for control flow?
23:01:07Mat2one data stack and a second one for control flow and storing addresses
23:03:19Mat2the separation of data and addresses results in a great reduction of the needed stack arithmetic, and that is the main factor in the increased code size compared to a register-based design
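A minimal Nim sketch of that separation; the call/return handling and names are assumptions for illustration:

    type Vm = object
      dstack: seq[int]   # operand values only
      rstack: seq[int]   # return addresses and other control flow only
      pc: int

    # a call pushes the resume address on the return stack, never the data
    # stack, so operands need no DUP/SWAP/OVER shuffling around the call
    proc doCall(vm: var Vm, target: int) =
      vm.rstack.add(vm.pc + 1)
      vm.pc = target

    proc doRet(vm: var Vm) =
      vm.pc = vm.rstack[vm.rstack.len - 1]
      vm.rstack.setLen(vm.rstack.len - 1)

With a single mixed stack (as in the JVM case above), every call would have to move or step over live operands, which is exactly the extra stack arithmetic being avoided.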
23:04:48Araqinteresting
23:05:35Araqbut for a real CPU other things matter: addi a, b, 4; addi x, y, 5; -- trivial to execute in parallel
23:05:48Mat2that's true
23:07:11Mat2but stack designs for CPUs can combine the fetch and execute stages
23:08:10Mat2most MISC designs for example execute most instructions in 1 clock without needing a pipeline
23:09:11Mat2and have very short branch penalties because of that
23:09:45Mat2(the J1 cpu executes each branch for free)
23:10:39Araqwell that doesn't mean much. A clock cycle can be as long as you need it to be.
23:12:12Mat2it's an advantage if you want to combine such CPU cores into a field matrix (like the GA144, which is a 144-core cpu)
23:14:44Mat2ok, that's an asynchronous design and as such not really comparable to common architectures
23:15:23Mat2however, there exist some experimental cpus with 1024 to 4096 cores
23:17:52Mat2that seems to be an upper limit (performance/watt ratio)
23:19:25Mat2get some sleep, ciao
23:19:36Araqsame here; good night
23:19:40*Mat2 quit (Quit: Verlassend)
23:20:58*DAddYE quit (Ping timeout: 245 seconds)
23:43:16*DAddYE joined #nimrod
23:55:02OrionPKso has anyone used SDL with nimrod on windows?
23:55:17OrionPKit seems to dislike the default main() being generated by nimrod
23:55:38OrionPKon windows, SDL replaces main with SDL_main, and it expects "int SDL_main(int argc, char *argv[])"
23:55:49OrionPKbut nimrod generates "int main(int argc, char** args, char** env)"
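One possible (untested) workaround sketch: compile with --noMain so nimrod emits no main() of its own, export a proc under the name SDL expects, and initialize the runtime by hand; NimMain is the runtime-init function the compiler generates:

    # build with: nimrod c --noMain myprog.nim
    proc NimMain() {.importc.}       # generated runtime/GC initialization

    proc SDL_main(argc: cint, argv: cstringArray): cint {.exportc.} =
      NimMain()                      # must run before any other nimrod code
      # ... SDL_Init, window setup, event loop ...
      result = 0

SDLmain's WinMain should then call this SDL_main instead of clashing with a generated main().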
23:56:04OrionPKfowl