| 00:46:54 | * | q66 quit (Quit: Leaving) |
| 01:00:30 | * | Raynes_ joined #nimrod |
| 01:00:53 | * | Raynes quit (Ping timeout: 246 seconds) |
| 01:00:54 | * | SirSkidmore quit (Ping timeout: 246 seconds) |
| 01:00:55 | * | Raynes_ is now known as Raynes |
| 01:00:56 | * | Raynes quit (Changing host) |
| 01:00:56 | * | Raynes joined #nimrod |
| 01:01:00 | * | Associat0r quit (Ping timeout: 246 seconds) |
| 01:01:00 | * | XAMPP quit (Ping timeout: 246 seconds) |
| 01:01:06 | * | SirSkidmore joined #nimrod |
| 01:02:20 | * | Associat0r joined #nimrod |
| 01:02:20 | * | Associat0r quit (Changing host) |
| 01:02:20 | * | Associat0r joined #nimrod |
| 01:05:18 | * | OnionPK joined #nimrod |
| 01:07:11 | * | BitPuffin quit (Ping timeout: 252 seconds) |
| 01:08:44 | * | comex` joined #nimrod |
| 01:10:33 | * | OrionPK quit (*.net *.split) |
| 01:10:35 | * | comex quit (*.net *.split) |
| 01:10:36 | * | mal`` quit (*.net *.split) |
| 01:10:47 | apotheon | How does one guarantee that a particular area in memory a Nimrod app uses actually gets cleared at a particular time? |
| 01:11:14 | * | mal`` joined #nimrod |
| 01:19:07 | * | DAddYE quit (Remote host closed the connection) |
| 01:36:18 | * | comex` is now known as comex |
| 02:20:24 | * | DAddYE joined #nimrod |
| 02:27:31 | * | DAddYE quit (Ping timeout: 264 seconds) |
| 03:23:20 | * | DAddYE joined #nimrod |
| 03:30:02 | * | DAddYE quit (Ping timeout: 252 seconds) |
| 03:30:28 | * | DAddYE joined #nimrod |
| 03:40:02 | * | EXetoC joined #nimrod |
| 03:44:40 | * | DAddYE quit (Remote host closed the connection) |
| 03:44:48 | * | DAddYE joined #nimrod |
| 03:44:55 | * | DAddYE quit (Remote host closed the connection) |
| 03:45:28 | * | DAddYE joined #nimrod |
| 03:49:47 | * | DAddYE quit (Ping timeout: 256 seconds) |
| 04:10:01 | * | OnionPK quit (Quit: Leaving) |
| 04:39:52 | * | Associat0r quit (Quit: Associat0r) |
| 04:46:01 | * | DAddYE joined #nimrod |
| 04:52:58 | * | DAddYE quit (Ping timeout: 256 seconds) |
| 05:09:48 | * | DAddYE joined #nimrod |
| 05:12:08 | * | DAddYE quit (Remote host closed the connection) |
| 06:08:23 | * | DAddYE joined #nimrod |
| 07:05:16 | * | Araq_ joined #nimrod |
| 07:07:31 | * | ack006 quit (Quit: Leaving) |
| 07:17:35 | * | Araq_ quit (Remote host closed the connection) |
| 07:30:12 | * | Araq_ joined #nimrod |
| 07:30:56 | Araq_ | apotheon: you can p = alloc(size) and then later zeroMem(p, size); no idea what you have in mind |
| 07:56:54 | * | DAddYE quit (Remote host closed the connection) |
| 08:52:01 | * | Associat0r joined #nimrod |
| 08:52:01 | * | Associat0r quit (Changing host) |
| 08:52:01 | * | Associat0r joined #nimrod |
| 08:58:01 | * | DAddYE joined #nimrod |
| 09:04:23 | * | DAddYE quit (Ping timeout: 240 seconds) |
| 09:08:48 | * | q66 joined #nimrod |
| 10:00:57 | * | DAddYE joined #nimrod |
| 10:07:27 | * | DAddYE quit (Ping timeout: 252 seconds) |
| 11:04:06 | * | DAddYE joined #nimrod |
| 11:10:51 | * | DAddYE quit (Ping timeout: 256 seconds) |
| 12:07:21 | * | DAddYE joined #nimrod |
| 12:13:57 | * | DAddYE quit (Ping timeout: 248 seconds) |
| 12:19:48 | * | Associat0r quit (Quit: Associat0r) |
| 12:29:45 | * | Araq_ quit (Quit: ChatZilla 0.9.90 [Firefox 22.0/20130618035212]) |
| 13:10:26 | * | DAddYE joined #nimrod |
| 13:16:48 | * | DAddYE quit (Ping timeout: 245 seconds) |
| 13:24:03 | * | BitPuffin joined #nimrod |
| 13:28:14 | * | Araq_ joined #nimrod |
| 13:53:53 | * | Trix[a]r_za is now known as Trixar_za |
| 13:54:37 | * | Araq_ quit (Quit: ChatZilla 0.9.90 [Firefox 22.0/20130618035212]) |
| 14:04:14 | apotheon | Araq: The zeroMem() thing looks like it's probably about what I have in mind, but I'd have to check. |
| 14:13:25 | * | DAddYE joined #nimrod |
| 14:19:59 | * | DAddYE quit (Ping timeout: 260 seconds) |
| 16:06:48 | * | Trixar_za is now known as Trix[a]r_za |
| 16:17:17 | * | DAddYE joined #nimrod |
| 16:24:07 | * | DAddYE quit (Ping timeout: 260 seconds) |
| 16:41:10 | * | q66 quit (Read error: Operation timed out) |
| 16:41:35 | * | q66 joined #nimrod |
| 16:50:50 | * | DAddYE joined #nimrod |
| 17:36:27 | * | gradha joined #nimrod |
| 18:57:33 | Araq | gradha: your gist with the let+case expression works for me |
| 18:58:21 | gradha | I mentioned that I had a broken compiler at the time, which is why I didn't submit an issue |
| 19:00:13 | Araq | ok that's what I thought |
| 19:31:35 | * | Trix[a]r_za is now known as Trixar_za |
| 21:04:51 | * | gradha quit (Quit: bbl, need to watch https://www.youtube.com/watch?v=1ZZC82dgJr8 again) |
| 21:07:17 | * | Reisen quit (Ping timeout: 252 seconds) |
| 21:11:44 | * | Reisen joined #nimrod |
| 21:23:59 | NimBot | nimrod-code/nimforum master a9515d2 Grzegorz Adam Hankiewicz [+0 ±1 -0]: Adds info about libcairo runtime dependency. |
| 21:23:59 | NimBot | nimrod-code/nimforum master 3368b56 Dominik Picheta [+0 ±1 -0]: Merge pull request #12 from gradha/pr_cairo_notes... 2 more lines |
| 21:40:08 | * | Mat2 joined #nimrod |
| 21:40:18 | Mat2 | good day |
| 21:40:31 | Araq | hi Mat2 |
| 21:43:20 | Mat2 | is there a chance nimrod will support GNU's first-class label extension (like clang and ICC do) for manually building efficient jump tables? |
| 21:44:31 | Araq | the chance is >95% |
| 21:45:13 | Araq | I'm still designing the pragma for it; it's a bit cumbersome |
| 21:45:25 | Araq | case x |
| 21:45:27 | Araq | of 0: |
| 21:45:32 | Araq | {.jumptable.} |
| 21:45:39 | Araq | doesn't really cut it |
| 21:46:42 | Araq | fyi: http://www.emulators.com/docs/nx25_nostradamus.htm |
| 21:51:06 | Araq | for now the best solution looks like: |
| 21:51:12 | Araq | while ...: |
| 21:51:20 | Araq | {.interpreterloop.} |
| 21:51:37 | Araq | case opcode |
| 21:51:40 | Araq | ... |
| 21:52:34 | Araq | and let the compiler merge the loop and the case into a jump table implementation |
| 21:53:31 | Mat2 | Araq: nice read, I used some of these dispatching techniques for my vm-design, in addition to static instruction fusion |
| 21:54:42 | Araq | I'm currently improving nimrod's evaluation engine |
| 21:55:01 | Araq | it's a simple AST interpreter with ugly ad-hoc special cases |
| 21:56:13 | Araq | I am implementing a simple optimizer that recognizes common patterns and replaces these nodes by specialized operations (called "superops") |
| 21:57:28 | Araq | with ASTs you can make 'for i in x..y: body' a single superop (that calls eval for the body) |
| 21:58:36 | Araq | no idea how fast it will be once I'm done with it; I aim for python-like speed |
| 21:59:44 | Araq | Mat2: any opinion on my interpreterloop pragma idea? |
| 22:00:52 | * | Trixar_za is now known as Trix[a]r_za |
| 22:01:12 | Mat2 | I see we share some similar ideas. My background: I just apply "superops" at the vm-code level and let them resolve implicitly through an ISA specially designed for this task |
| 22:02:14 | Mat2 | in this case the current form with while: ... would work fine |
| 22:03:23 | * | OrionPK joined #nimrod |
| 22:06:19 | Araq | note that any bytecode requires at least 2 instructions for my 'for'-loop example; though then you can get rid of the eval recursion, which is likely more expensive than the 1 additional dispatch |
| 22:07:07 | Araq | Mat2: if you have superops you likely have variable length instructions, right? |
| 22:07:56 | * | Trix[a]r_za is now known as Trixar_za |
| 22:08:26 | Mat2 | I use a packed opcode format (16 instruction slots, 64 bit, each 4 bit wide) |
| 22:09:22 | Araq | why? |
| 22:10:07 | Mat2 | because it's a design easily implementable in an FPGA and I can hold 16 instructions in a native register |
| 22:11:07 | Mat2 | the dispatch is then reduced to shifting out instruction combinations (3 at present) and I can optimize out instruction fetching |
| 22:11:59 | Araq | so every instruction is 64 bit? |
| 22:12:12 | Araq | and how many bits do you use for the opcode? |
| 22:13:32 | Mat2 | 4 bit for each instruction, 2-3 instructions build an opcode, so one dispatch can execute up to 16 of them through software pipelining |
| 22:15:05 | Mat2 | no virtual "register" references because it's a dual-stack design |
| 22:15:06 | Araq | sorry you lost me. How can an instruction only be 4 bits? |
| 22:15:42 | Mat2 | load immediate (push immediate value onto the data stack) |
| 22:16:47 | Araq | the immediate value itself may require more than 4 bits |
| 22:17:30 | Araq | and what about jumps |
| 22:18:00 | Mat2 | immediate values following each instruction bundle |
| 22:18:31 | Mat2 | the jump address must be loaded onto the data-stack before a taken branch |
| 22:19:50 | Mat2 | ADD, SHL, SHR, LOAD, STORE, AND, GOR, XOR, NEG, DUP, DROP, SWAP, OVER all handle the top-of-stack value |
| 22:20:22 | * | EXetoC quit (Quit: WeeChat 0.4.1) |
| 22:20:36 | Mat2 | some instruction combinations combine to a no-op, like dup+drop |
| 22:21:01 | Mat2 | these are replaced with additional instructions requiring two slots (for immediate values, for example) |
| 22:21:16 | Mat2 | all branch instructions are decoded this way, for example |
| 22:22:01 | Araq | so 64 bits encode 16 opcodes. The immediates follow in other 64bit slots? |
| 22:22:10 | Mat2 | yes |
| 22:23:32 | Araq | and you decode 3 opcodes (12 bit) at the same time using a combinatorial table? |
| 22:23:43 | Mat2 | correct |
| 22:24:30 | Mat2 | (the last two slots are decoded through an 8-bit table) |
| 22:25:51 | Araq | do you analyse which combinations actually do occur? |
| 22:27:03 | Araq | for the rare combinations you can use a default one-at-a-time instruction executor |
| 22:30:14 | Mat2 | I have a generator analysing a given application and creating a reduced instruction set and code format for some memory-restricted environments, but the interpreter in its current form compiles to < 100 kB (gcc), so generally I don't see much use for it |
| 22:31:24 | Araq | alright; so how is the speed? does it beat luajit's interpreter? |
| 22:32:58 | Mat2 | my reference is the gforth interpreter, which is labeled an interpreter but is in reality a simple native-code compiler |
| 22:34:08 | Mat2 | on my old Atom N550 netbook, it beats gforth by a factor of 2, depending on the test (raw dispatch performance) |
| 22:34:43 | Araq | nice |
| 22:35:49 | Mat2 | it was once developed as a replacement for ngaro (retroforth's vm) |
| 22:39:42 | Mat2 | and it seems to be a never-ending story, because so far I have always found some way to optimize it further |
| 22:42:42 | * | Trixar_za is now known as Trix[a]r_za |
| 22:45:10 | Araq | why not JIT it then? |
| 22:50:33 | Mat2 | well, one design goal was portability and word-size-agnostic dispatching, because retro was designed to be used on a wide range of platforms (I heard of someone porting it to a board featuring an old 8-bit MCU) |
| 22:51:49 | Mat2 | (my vm is realistically only usable with 32 and 64 bit cpus now) |
| 22:52:42 | Mat2 | cross compilation can be a nice feature |
| 22:53:11 | Araq | I see that a stack based VM makes sense for Forth but don't you agree register based is technically superior? |
| 22:55:44 | Mat2 | no, because an interpreter is mainly limited by the efficiency of its dispatch handling, and because of this, instruction bundling results in much more performance than the reduction in code size can offer |
| 22:57:17 | Mat2 | in addition, combining instructions reduces the code size to something comparable to a register-based design in my tests |
| 22:57:29 | Mat2 | so I do not see any advantage |
| 22:57:52 | * | Associat0r joined #nimrod |
| 22:57:52 | * | Associat0r quit (Changing host) |
| 22:57:52 | * | Associat0r joined #nimrod |
| 22:58:39 | Mat2 | what's important is a dual-stack design |
| 22:59:32 | Mat2 | the Java VM is a bad example here, because it depends on stack addressing through frames mapped to a single stack |
| 23:00:31 | Araq | dual stack: 1 for data, 1 for control flow? |
| 23:01:07 | Mat2 | one data stack, and a second one for control-flow and storing addresses |
| 23:03:19 | Mat2 | the separation of data and addresses results in a great reduction of the stack arithmetic needed, and that is the main factor in the increased code size compared to a register-based design |
| 23:04:48 | Araq | interesting |
| 23:05:35 | Araq | but for a real CPU other things matter: addi a, b, 4; addi x, y, 5; -- trivial to execute in parallel |
| 23:05:48 | Mat2 | that's true |
| 23:07:11 | Mat2 | but stack designs for CPUs can combine the fetch and execute stages |
| 23:08:10 | Mat2 | most MISC designs for example execute most instructions in 1 clock without needing a pipeline |
| 23:09:11 | Mat2 | and have very short branch penalties because of that |
| 23:09:45 | Mat2 | (the J1 cpu executes each branch for free) |
| 23:10:39 | Araq | well that doesn't mean much. A clock cycle can be as long as you need it to be. |
| 23:12:12 | Mat2 | it's of advantage if you want to combine such CPU cores into a field matrix (like the GA144, which is a 144-core cpu) |
| 23:14:44 | Mat2 | ok, that's an asynchronous design and as such not really comparable to common architectures |
| 23:15:23 | Mat2 | however, there exist some experimental cpus with 1024 up to 4096 cores |
| 23:17:52 | Mat2 | that seems to be an upper limit (performance/watt ratio) |
| 23:19:25 | Mat2 | get some sleep, ciao |
| 23:19:36 | Araq | same here; good night |
| 23:19:40 | * | Mat2 quit (Quit: Leaving) |
| 23:20:58 | * | DAddYE quit (Ping timeout: 245 seconds) |
| 23:43:16 | * | DAddYE joined #nimrod |
| 23:55:02 | OrionPK | so has anyone used SDL with nimrod on windows? |
| 23:55:17 | OrionPK | it seems to dislike the default main() being generated by nimrod |
| 23:55:38 | OrionPK | on windows, SDL replaces the main with SDL_main, and it expects "int SDL_main(int argc, char *argv[])" |
| 23:55:49 | OrionPK | but nimrod generates "int main(int argc, char** args, char** env)" |
| 23:56:04 | OrionPK | fowl |