00:46:54 | * | q66 quit (Quit: Leaving) |
01:00:30 | * | Raynes_ joined #nimrod |
01:00:53 | * | Raynes quit (Ping timeout: 246 seconds) |
01:00:54 | * | SirSkidmore quit (Ping timeout: 246 seconds) |
01:00:55 | * | Raynes_ is now known as Raynes |
01:00:56 | * | Raynes quit (Changing host) |
01:00:56 | * | Raynes joined #nimrod |
01:01:00 | * | Associat0r quit (Ping timeout: 246 seconds) |
01:01:00 | * | XAMPP quit (Ping timeout: 246 seconds) |
01:01:06 | * | SirSkidmore joined #nimrod |
01:02:20 | * | Associat0r joined #nimrod |
01:02:20 | * | Associat0r quit (Changing host) |
01:02:20 | * | Associat0r joined #nimrod |
01:05:18 | * | OnionPK joined #nimrod |
01:07:11 | * | BitPuffin quit (Ping timeout: 252 seconds) |
01:08:44 | * | comex` joined #nimrod |
01:10:33 | * | OrionPK quit (*.net *.split) |
01:10:35 | * | comex quit (*.net *.split) |
01:10:36 | * | mal`` quit (*.net *.split) |
01:10:47 | apotheon | How does one guarantee that a particular area in memory a Nimrod app uses actually gets cleared at a particular time? |
01:11:14 | * | mal`` joined #nimrod |
01:19:07 | * | DAddYE quit (Remote host closed the connection) |
01:36:18 | * | comex` is now known as comex |
02:20:24 | * | DAddYE joined #nimrod |
02:27:31 | * | DAddYE quit (Ping timeout: 264 seconds) |
03:23:20 | * | DAddYE joined #nimrod |
03:30:02 | * | DAddYE quit (Ping timeout: 252 seconds) |
03:30:28 | * | DAddYE joined #nimrod |
03:40:02 | * | EXetoC joined #nimrod |
03:44:40 | * | DAddYE quit (Remote host closed the connection) |
03:44:48 | * | DAddYE joined #nimrod |
03:44:55 | * | DAddYE quit (Remote host closed the connection) |
03:45:28 | * | DAddYE joined #nimrod |
03:49:47 | * | DAddYE quit (Ping timeout: 256 seconds) |
04:10:01 | * | OnionPK quit (Quit: Leaving) |
04:39:52 | * | Associat0r quit (Quit: Associat0r) |
04:46:01 | * | DAddYE joined #nimrod |
04:52:58 | * | DAddYE quit (Ping timeout: 256 seconds) |
05:09:48 | * | DAddYE joined #nimrod |
05:12:08 | * | DAddYE quit (Remote host closed the connection) |
06:08:23 | * | DAddYE joined #nimrod |
07:05:16 | * | Araq_ joined #nimrod |
07:07:31 | * | ack006 quit (Quit: Leaving) |
07:17:35 | * | Araq_ quit (Remote host closed the connection) |
07:30:12 | * | Araq_ joined #nimrod |
07:30:56 | Araq_ | apotheon: you can p = alloc(size) and then later zeroMem(p); no idea what you have in mind |
07:56:54 | * | DAddYE quit (Remote host closed the connection) |
08:52:01 | * | Associat0r joined #nimrod |
08:52:01 | * | Associat0r quit (Changing host) |
08:52:01 | * | Associat0r joined #nimrod |
08:58:01 | * | DAddYE joined #nimrod |
09:04:23 | * | DAddYE quit (Ping timeout: 240 seconds) |
09:08:48 | * | q66 joined #nimrod |
10:00:57 | * | DAddYE joined #nimrod |
10:07:27 | * | DAddYE quit (Ping timeout: 252 seconds) |
11:04:06 | * | DAddYE joined #nimrod |
11:10:51 | * | DAddYE quit (Ping timeout: 256 seconds) |
12:07:21 | * | DAddYE joined #nimrod |
12:13:57 | * | DAddYE quit (Ping timeout: 248 seconds) |
12:19:48 | * | Associat0r quit (Quit: Associat0r) |
12:29:45 | * | Araq_ quit (Quit: ChatZilla 0.9.90 [Firefox 22.0/20130618035212]) |
13:10:26 | * | DAddYE joined #nimrod |
13:16:48 | * | DAddYE quit (Ping timeout: 245 seconds) |
13:24:03 | * | BitPuffin joined #nimrod |
13:28:14 | * | Araq_ joined #nimrod |
13:53:53 | * | Trix[a]r_za is now known as Trixar_za |
13:54:37 | * | Araq_ quit (Quit: ChatZilla 0.9.90 [Firefox 22.0/20130618035212]) |
14:04:14 | apotheon | Araq: The zeroMem() thing looks like it's probably about what I have in mind, but I'd have to check. |
14:13:25 | * | DAddYE joined #nimrod |
14:19:59 | * | DAddYE quit (Ping timeout: 260 seconds) |
16:06:48 | * | Trixar_za is now known as Trix[a]r_za |
16:17:17 | * | DAddYE joined #nimrod |
16:24:07 | * | DAddYE quit (Ping timeout: 260 seconds) |
16:41:10 | * | q66 quit (Read error: Operation timed out) |
16:41:35 | * | q66 joined #nimrod |
16:50:50 | * | DAddYE joined #nimrod |
17:36:27 | * | gradha joined #nimrod |
18:57:33 | Araq | gradha: your gist with the let+case expression works for me |
18:58:21 | gradha | I mentioned that I had a broken compiler, which is why I didn't submit an issue |
19:00:13 | Araq | ok that's what I thought |
19:31:35 | * | Trix[a]r_za is now known as Trixar_za |
21:04:51 | * | gradha quit (Quit: bbl, need to watch https://www.youtube.com/watch?v=1ZZC82dgJr8 again) |
21:07:17 | * | Reisen quit (Ping timeout: 252 seconds) |
21:11:44 | * | Reisen joined #nimrod |
21:23:59 | NimBot | nimrod-code/nimforum master a9515d2 Grzegorz Adam Hankiewicz [+0 ±1 -0]: Adds info about libcairo runtime dependency. |
21:23:59 | NimBot | nimrod-code/nimforum master 3368b56 Dominik Picheta [+0 ±1 -0]: Merge pull request #12 from gradha/pr_cairo_notes... 2 more lines |
21:40:08 | * | Mat2 joined #nimrod |
21:40:18 | Mat2 | good day |
21:40:31 | Araq | hi Mat2 |
21:43:20 | Mat2 | is there a chance nimrod will support GNU's first-class label extension (like clang and ICC do) for manually building efficient jump-tables ? |
21:44:31 | Araq | the chance is >95% |
21:45:13 | Araq | I'm still designing the pragma for it; it's a bit cumbersome |
21:45:25 | Araq | case x |
21:45:27 | Araq | of 0: |
21:45:32 | Araq | {.jumptable.} |
21:45:39 | Araq | doesn't really cut it |
21:46:42 | Araq | fyi: http://www.emulators.com/docs/nx25_nostradamus.htm |
21:51:06 | Araq | for now the best solution looks like: |
21:51:12 | Araq | while ...: |
21:51:20 | Araq | {.interpreterloop.} |
21:51:37 | Araq | case opcode |
21:51:40 | Araq | ... |
21:52:34 | Araq | and let the compiler merge the loop and the case into a jump table implementation |
21:53:31 | Mat2 | Araq: nice read, I used some of these dispatching techniques for my vm-design in addition to static instruction fusion |
21:54:42 | Araq | I'm currently improving nimrod's evaluation engine |
21:55:01 | Araq | it's a simple AST interpreter with ugly adhoc special cases |
21:56:13 | Araq | I am implementing a simple optimizer that recognizes common patterns and replaces these nodes by specialized operations (called "superops") |
21:57:28 | Araq | with ASTs you can make 'for i in x..y: body' a single superop (that calls eval for the body) |
21:58:36 | Araq | no idea how fast it will be once I'm done with it; I aim for python-like speed |
21:59:44 | Araq | Mat2: any opinion on my interpreterloop pragma idea? |
22:00:52 | * | Trixar_za is now known as Trix[a]r_za |
22:01:12 | Mat2 | I see we share some similar ideas. My background: I just apply "superops" at vm-code level and let them resolve implicitly through an ISA specially designed for this task |
22:02:14 | Mat2 | in this case the current form with while: ... would work fine |
22:03:23 | * | OrionPK joined #nimrod |
22:06:19 | Araq | note that any bytecode requires at least 2 instructions for my 'for'-loop example; though then you can get rid of the eval recursion, which is likely more expensive than the 1 additional dispatch |
22:07:07 | Araq | Mat2: if you have superops you likely have variable length instructions, right? |
22:07:56 | * | Trix[a]r_za is now known as Trixar_za |
22:08:26 | Mat2 | I use a packed opcode format (16 instruction slots, 64 bit, each 4 bit wide) |
22:09:22 | Araq | why? |
22:10:07 | Mat2 | because it's a design easily implementable in an FPGA and I can hold 16 instructions in a native register |
22:11:07 | Mat2 | the dispatch is then reduced to shifting out instruction combinations (3 at present) and I can optimize away instruction fetching |
22:11:59 | Araq | so every instruction is 64 bit? |
22:12:12 | Araq | and how many bits do you use for the opcode? |
22:13:32 | Mat2 | 4 bits for each instruction; 2-3 instructions build an opcode, so one dispatch can execute up to 16 of them through software pipelining |
22:15:05 | Mat2 | no virtual "register" references because it's a dual-stack design |
22:15:06 | Araq | sorry you lost me. How can an instruction only be 4 bits? |
22:15:42 | Mat2 | load immediate (push immediate value onto the data stack) |
22:16:47 | Araq | the immediate value itself may require more than 4 bits |
22:17:30 | Araq | and what about jumps |
22:18:00 | Mat2 | immediate values follow each instruction bundle |
22:18:31 | Mat2 | the jump address must be loaded onto the data-stack before a taken branch |
22:19:50 | Mat2 | ADD, SHL, SHR, LOAD, STORE, AND, GOR, XOR, NEG, DUP, DROP, SWAP, OVER all handle the top-of-stack value |
22:20:22 | * | EXetoC quit (Quit: WeeChat 0.4.1) |
22:20:36 | Mat2 | some instruction combinations combine to a no-op, like dup+drop |
22:21:01 | Mat2 | these are replaced with additional instructions requiring two slots (for immediate values for example) |
22:21:16 | Mat2 | all branch instructions are decoded this way for example |
22:22:01 | Araq | so 64 bits encode 16 opcodes. The immediates follow in other 64bit slots? |
22:22:10 | Mat2 | yes |
22:23:32 | Araq | and you decode 3 opcodes (12 bit) at the same time using a combinatorial table? |
22:23:43 | Mat2 | correct |
22:24:30 | Mat2 | (the last two slots are decoded through an 8-bit table) |
22:25:51 | Araq | do you analyse which combinations actually do occur? |
22:27:03 | Araq | for the rare combinations you can use a default one-at-a-time instruction executor |
22:30:14 | Mat2 | I have a generator that analyses a given application and creates a reduced instruction-set and code format for memory-restricted environments, but the interpreter in its current form compiles to < 100 kB (gcc), so generally I don't see much use for it |
22:31:24 | Araq | alright; so how is the speed? does it beat luajit's interpreter? |
22:32:58 | Mat2 | my reference is the gforth interpreter, which is labeled an interpreter but is in reality a simple native-code compiler |
22:34:08 | Mat2 | on my old Atom N550 netbook, it beats gforth by a factor of 2, depending on the test (raw dispatch performance) |
22:34:43 | Araq | nice |
22:35:49 | Mat2 | it was once developed as a replacement for ngaro (retroforth's vm) |
22:39:42 | Mat2 | and it seems to be a never-ending story, because so far I have always found ways to optimize it further |
22:42:42 | * | Trixar_za is now known as Trix[a]r_za |
22:45:10 | Araq | why not JIT it then? |
22:50:33 | Mat2 | well, one design goal was portability and word-size agnostic dispatching, because retro was designed to be used on a wide range of platforms (I heard of someone porting it to a board featuring an old 8-bit MCU) |
22:51:49 | Mat2 | (my vm is realistically only usable with 32 and 64 bit CPUs now) |
22:52:42 | Mat2 | cross compilation can be a nice feature |
22:53:11 | Araq | I see that a stack-based VM makes sense for Forth, but don't you agree register-based is technically superior? |
22:55:44 | Mat2 | no, because an interpreter is mainly limited by the efficiency of its dispatch handling, and because of this, instruction bundling results in much more performance than the reduction in code-size can offer |
22:57:17 | Mat2 | in addition, combining instructions reduces the code-size to something comparable to a register-based design in my tests |
22:57:29 | Mat2 | so I do not see any advantage |
22:57:52 | * | Associat0r joined #nimrod |
22:57:52 | * | Associat0r quit (Changing host) |
22:57:52 | * | Associat0r joined #nimrod |
22:58:39 | Mat2 | important is a dual-stack design |
22:59:32 | Mat2 | the Java VM, for example, is bad here because it depends on stack addressing through frames mapped onto a single stack |
23:00:31 | Araq | dual stack: 1 for data, 1 for control flow? |
23:01:07 | Mat2 | one data stack and a second one for control-flow and storing addresses |
23:03:19 | Mat2 | the separation of data and addresses results in a great reduction of the needed stack arithmetic, and that is the main factor in the increased code-size compared to a register-based design |
23:04:48 | Araq | interesting |
23:05:35 | Araq | but for a real CPU other things matter: addi a, b, 4; addi x, y, 5; -- trivial to execute in parallel |
23:05:48 | Mat2 | that's true |
23:07:11 | Mat2 | but stack designs for CPUs can combine the fetch and execute stages |
23:08:10 | Mat2 | most MISC designs for example execute most instructions in 1 clock without needing a pipeline |
23:09:11 | Mat2 | and have very short branch penalties because of that |
23:09:45 | Mat2 | (the J1 cpu executes each branch for free) |
23:10:39 | Araq | well that doesn't mean much. A clock cycle can be as long as you need it to be. |
23:12:12 | Mat2 | it's an advantage if you want to combine such CPU cores into a field matrix (like the GA144, which is a 144-core cpu) |
23:14:44 | Mat2 | ok, that's an asynchronous design and as such not really comparable to common architectures |
23:15:23 | Mat2 | however, there exist some experimental cpu's with 1024 up to 4096 cores |
23:17:52 | Mat2 | that seems to be an upper limit (ratio performance/watt) |
23:19:25 | Mat2 | get some sleep, ciao |
23:19:36 | Araq | same here; good night |
23:19:40 | * | Mat2 quit (Quit: Verlassend) |
23:20:58 | * | DAddYE quit (Ping timeout: 245 seconds) |
23:43:16 | * | DAddYE joined #nimrod |
23:55:02 | OrionPK | so has anyone used SDL with nimrod on windows? |
23:55:17 | OrionPK | it seems to dislike the default main() being generated by nimrod |
23:55:38 | OrionPK | on windows, SDL replaces the main with SDL_Main, and it expects "int SDL_main(int argc, char *argv[])" |
23:55:49 | OrionPK | but nimrod generates "int main(int argc, char** args, char** env)" |
23:56:04 | OrionPK | fowl |