Projects

Ticket #424 (new defect)

Opened 2 years ago

Last modified 14 months ago

performance regression from ruby 1.8 and 1.9

Reported by: jordan.breeding@… Owned by: lsansonetti@…
Priority: blocker Milestone: MacRuby Later
Component: MacRuby Keywords:
Cc:

Description

I have some code that previously ran incorrectly on MacRuby 0.4, and until recently crashed MacRuby trunk. Now that it works I would like to report that the performance of it is not as fast as 1.8 or 1.9.

I know that MacRuby working is a primary consideration and speed only comes after that, but as people start looking at speed more and more I wanted to get this on the radar.

This also supersedes #348 which was a smaller also based on this code, but was a reduced file since the main code would not run.

Performance of Ruby 1.8, Ruby 1.9, and MacRuby compared using a faster and a slower version of the same basic algorithm:

35 jordan@thetourist ~/priority_queue_bug > for LOOP in ruby ruby1.9 macruby; do for LOOP2 in ./search.pqueue ./search_slow.pqueue; do eval time "${LOOP}" "${LOOP2}" -m -c a-star input_files/input13_7.txt >/dev/null; done; done
ruby ./search.pqueue -m -c a-star input_files/input13_7.txt  4.51s user 0.10s system 98% cpu 4.657 total
ruby ./search_slow.pqueue -m -c a-star input_files/input13_7.txt  9.84s user 0.15s system 98% cpu 10.087 total
ruby1.9 ./search.pqueue -m -c a-star input_files/input13_7.txt  2.74s user 0.08s system 98% cpu 2.851 total
ruby1.9 ./search_slow.pqueue -m -c a-star input_files/input13_7.txt  5.80s user 0.11s system 99% cpu 5.960 total
macruby ./search.pqueue -m -c a-star input_files/input13_7.txt  14.30s user 0.69s system 171% cpu 8.722 total
macruby ./search_slow.pqueue -m -c a-star input_files/input13_7.txt  27.30s user 0.88s system 150% cpu 18.745 total

performance of various versions of the script specific to MacRuby:

36 jordan@thetourist ~/priority_queue_bug > for LOOP in *.cocoa; do eval time macruby "${LOOP}" -c -m a-star input_files/input13_7.txt >/dev/null; done
macruby search.cocoa -c -m a-star input_files/input13_7.txt  14.38s user 0.79s system 161% cpu 9.392 total
macruby search_group_add.cocoa -c -m a-star input_files/input13_7.txt  13.61s user 0.71s system 171% cpu 8.345 total
macruby search_group_add_slow.cocoa -c -m a-star input_files/input13_7.txt  26.97s user 0.92s system 146% cpu 19.027 total
macruby search_slow.cocoa -c -m a-star input_files/input13_7.txt  27.34s user 0.94s system 150% cpu 18.730 total

MacRuby information:

37 jordan@thetourist ~/priority_queue_bug > macruby -e 'p MACRUBY_VERSION'
"0.5"
38 jordan@thetourist ~/priority_queue_bug > macruby -e 'p MACRUBY_REVISION'
"git commit 19b2d7811662bf636cbcc2fe1ae99afef3243225"

Attachments

priority_queue_bug.zip Download (156.7 KB) - added by jordan.breeding@… 2 years ago.
performance_test.rb Download (6.6 KB) - added by jordan.breeding@… 2 years ago.

Change History

Changed 2 years ago by jordan.breeding@…

Another example (I think this file is the worst case for my sample program):

12 jordan@thetourist ...tificial Intelligence/Assignment No. 1/priority_queue_bug > for LOOP in ruby ruby1.9 macruby; do eval time "${LOOP}" ./search.pqueue -m -c a-star input_files/input13_1.txt >/dev/null; done
ruby ./search.pqueue -m -c a-star input_files/input13_1.txt  12.22s user 0.19s system 99% cpu 12.513 total
ruby1.9 ./search.pqueue -m -c a-star input_files/input13_1.txt  7.10s user 0.13s system 99% cpu 7.251 total
macruby ./search.pqueue -m -c a-star input_files/input13_1.txt  34.86s user 1.52s system 180% cpu 20.195 total

Changed 2 years ago by mattaimonetti@…

Jordan, could you please find out what exactly is slower in your script? If you can extract it out it will be easier to work on the isolated piece of code that's slower.

Thanks,

- Matt

Changed 2 years ago by jordan.breeding@…

Well, unless I am losing my mind it must be something to do with iterating over things in MacRuby. I say this because even in the example below the amount of nodes being pushed onto the queue should be dominated by all of the other work going on, almost eliminating performance differences of the queue itself.

57 jordan@thetourist ~/priority_queue_bug > for LOOP in search.*; do eval time macruby "${LOOP}" -c -m a-star input_files/input13_1.txt >/dev/null; done
macruby search.cocoa -c -m a-star input_files/input13_1.txt  33.69s user 1.58s system 182% cpu 19.363 total
macruby search.pqueue -c -m a-star input_files/input13_1.txt  35.01s user 1.49s system 179% cpu 20.278 total

However, if I use two new scripts that are strictly adding a large amount of nodes the the queue the difference is greater, in this case the .pqueue file would still be taking the hit of iterating, etc. strictly on the MacRuby side (where MacRuby appears to be slower than 1.8 or 1.9), the .cocoa example ends up doing most of the heaving lifting in this case through CFBinaryHeap.

Here are ruby, ruby1.9, and macruby using the .pqueue file:

62 jordan@thetourist ~/priority_queue_bug > for LOOP in ruby ruby1.9 macruby; do eval time "${LOOP}" ./speed_test.pqueue; done
ruby ./speed_test.pqueue  82.38s user 3.10s system 96% cpu 1:28.39 total
ruby1.9 ./speed_test.pqueue  55.25s user 3.02s system 99% cpu 58.845 total
macruby ./speed_test.pqueue  69.08s user 11.54s system 47% cpu 2:49.53 total

Here is macruby using the .pqueue file and the .cocoa file:

63 jordan@thetourist ~/priority_queue_bug > for LOOP in speed_test.*; do eval time macruby "${LOOP}"; done
macruby speed_test.cocoa  6.73s user 0.34s system 117% cpu 6.005 total
macruby speed_test.pqueue  65.32s user 10.76s system 50% cpu 2:31.36 total

Here is the .pqueue file:

#!/usr/bin/env ruby1.9

# put the directory this file is in into the search path so that we
# can get ruby_priority_queue below

$:.unshift(File.dirname($0))

require 'lib/revision'
require 'lib/pqueue'

if (Revision.new(RUBY_VERSION) < Revision.new("1.9"))
    class String
        def chars
            return self.split("")
        end
    end
end

class TestNode
    include Comparable

    attr_reader :priority

    def initialize(newPriority)
        $counter = $counter + 1
        @priority = newPriority
    end

    def <=>(rhs)
        @priority <=> rhs.priority
    end

    def getChildren
        (@priority * 5).upto((@priority * 5) + 2).map { |childPriority|
            TestNode.new(childPriority)
        }
    end
end

$counter = 0

$searchQueue = PQueue.new
$searchQueue.push(TestNode.new(0))

while ((currentNode = $searchQueue.pop) && ($counter < 250000)) do
    currentNode.getChildren.each { |child|
        $searchQueue.push(child)
    }
end

exit 0

Here is the .cocoa file:

#!/usr/bin/env macruby

# put the directory this file is in into the search path so that we
# can get ruby_priority_queue below

$:.unshift(File.dirname($0))

require 'lib/revision'

framework 'lib/JBBPriorityQueue.framework'

if (Revision.new(RUBY_VERSION) < Revision.new("1.9"))
    class String
        def chars
            return self.split("")
        end
    end
end

class TestNode
    include Comparable

    attr_reader :priority

    def initialize(newPriority)
        $counter = $counter + 1
        @priority = newPriority
    end

    def <=>(rhs)
        @priority <=> rhs.priority
    end

    def compareToNode(rhs)
        @priority <=> rhs.priority
    end

    def getChildren
        (@priority * 5).upto((@priority * 5) + 2).map { |childPriority|
            TestNode.new(childPriority)
        }
    end
end

$counter = 0

$searchQueue = JBBPriorityQueue.alloc.init
$searchQueue.push(TestNode.new(0))

while ((currentNode = $searchQueue.pop) && ($counter < 250000)) do
    currentNode.getChildren.each { |child|
        $searchQueue.push(child)
    }
end

exit 0

The library files are in the attached zip file already, these files were run from that directory.

I will keep looking for other places in my main script that might have large differences between ruby, ruby1.9, and macruby. Please let me know if there is anything else I can specifically look for.

Changed 2 years ago by mattaimonetti@…

Please let me know if there is anything else I can specifically look for.

Can you break it down to a smaller example really showing where the problem is without requiring your script?

Thanks,

- Matt

Changed 2 years ago by jordan.breeding@…

I can look into it more later, the ones posted above actually only require two smallish ruby files out of the lib directory from my archive, I could just paste them into the top of the other file and upload it as a new attachment if that would help in any way.

In the mean time I used DTrace to get some timings for functions/methods from ruby and MacRuby

ruby:

13 jordan@thetourist ~/priority_queue_bug > sudo dtrace -s ruby_method_analysis.d -c "ruby ./search.pqueue -m -c a-star input_files/input13_4.txt"
Password:
Target pid: 352


Step 0: bbwxbwbwbwwbw (c=0)
Step 1: move  6 bbwbbwxwbwwbw (c=3)
Step 2: move  5 bbwbbxwwbwwbw (c=1)
Step 3: move  8 bbwbbbwwxwwbw (c=3)
Step 4: move  2 bbxbbbwwwwwbw (c=6)
Step 5: move 11 bbbbbbwwwwwxw (c=9)
Step 6: move  6 bbbbbbxwwwwww (c=5)


                                   CLASS       METHOD                                     COUNT    AVG(us)      SUM(us)
-----------------------------------------------------------------------------------------------------------------------
                                  String       split                                       3298         36       121861
                                      IO       write                                         16         19          307
                                  String       %                                              6         16           98
                               Exception       initialize                                     1         16           16
                         JBBNodePriority       initialize                                  3308         16        53831
                               Exception       set_backtrace                                  1         15           15
                                   Range       each                                        3299         15        51517
                                  String       initialize_copy                            42886         14       619271
                               Exception       backtrace                                      1         14           14
                                  Fixnum       to_s                                          15         14          213
                                 JBBNode       priority                                   38826         14       544505
                                 JBBNode       to_s                                           7         13           97
                                    Hash       []=                                          381         13         5291
                                    Hash       include?                                    8629         13       117828
                                  String       index                                       3298         13        44610
                                  String       length                                      3300         13        43452
                                  String       []=                                        79176         13      1034814
                                   Array       push                                         381         13         4971
                                   Array       to_a                                        3299         12        42753
                                   Array       []=                                        18440         12       237676
                                  Fixnum       div                                         7236         12        93047
                                  Fixnum       <=>                                         5068         12        64921
                                  String       ==                                         53172         12       679790
                                   Array       []                                         87152         12      1109695
                                  String       []                                         79176         12      1007480
                                  Fixnum       ==                                         42887         12       544730
                                  Fixnum       abs                                        20086         12       255014
                                  Fixnum       /                                           9894         12       125471
                                    Hash       []                                         16788         12       211988
                                  Fixnum       >                                          31024         12       390794
                                  Fixnum       *                                          21067         12       264933
                                  Fixnum       <                                          28374         12       355145
                                  Fixnum       -                                          24145         12       302109
                                  Fixnum       +                                          81377         12      1017731

MacRuby:

14 jordan@thetourist ~/priority_queue_bug > sudo dtrace -s macruby_method_analysis.d -c "macruby ./search.pqueue -m -c a-star input_files/input13_4.txt"
Target pid: 360


Step 0: bbwxbwbwbwwbw (c=0)
Step 1: move  6 bbwbbwxwbwwbw (c=3)
Step 2: move  5 bbwbbxwwbwwbw (c=1)
Step 3: move  8 bbwbbbwwxwwbw (c=3)
Step 4: move  2 bbxbbbwwwwwbw (c=6)
Step 5: move 11 bbbbbbwwwwwxw (c=9)
Step 6: move  6 bbbbbbxwwwwww (c=5)


                                   CLASS       METHOD                                     COUNT    AVG(us)      SUM(us)
-----------------------------------------------------------------------------------------------------------------------
                                  PQueue       initialize                                     1       9068         9068
                                   Class       protected:                                     1       6113         6113
                                Revision       initialize:                                    2       1456         2912
                                   Class       private:                                      11       1398        15381
                      OptionParser::List       each_option                                    6        861         5167
                                   Class       attr_reader:                                  11        685         7541
                                   Class       gm:                                            1        663          663
                                   Class       include:                                       5        646         3231
                          NSMutableArray       extend:                                        1        613          613
                          NSMutableArray       <=>:                                           1        571          571
                         NSMutableString       %:                                            13        550         7151
                                   Class       attr_accessor:                                 4        545         2180
                 OptionParser::OptionMap       invert                                        14        524         7346
                              Enumerator       to_a                                        3634        517      1880180
                                 JBBNode       <=>:                                       22168        429      9528017
                                   Class       extend_object:                                 3        422         1267
                          NSMutableArray       each                                        3654        364      1333418
                              FalseClass       ===:                                          20        345         6901
                         NSMutableString       scan:                                          1        344          344
                                NilClass       ===:                                          20        343         6878
                               TrueClass       ===:                                          20        342         6852
                                   Range       to_a                                        3633        323      1176485
                     NSMutableDictionary       update:                                        7        319         2239
                                   Class       attr:                                          1        314          314
                                   Class       attr_writer:                                   5        297         1488
                                   Class       new                                            7        289         2024
                 OptionParser::OptionMap       []:                                            2        283          566
                      OptionParser::List       __send__:                                     86        281        24238
                          NSMutableArray       each_with_index                                1        275          275
                                   Class       new:                                        3639        270       985681
                          NSMutableArray       delete_if                                      7        254         1782
                                    Proc       call:                                          4        253         1015
                                   Class       append_features:                               5        247         1236
                              ByteString       swap!:                                     43596        241     10536678
                                     Set       initialize                                     1        199          199
            OptionParser::CompletingHash       []=:                                           7        182         1279
                      OptionParser::List       initialize                                     3        148          445
                                   Class       readlines:                                     1        139          139
        OptionParser::Switch::NoArgument       initialize:                                    6        135          810
  OptionParser::Switch::RequiredArgument       initialize:                                    1        132          132
                                   Class       alloc                                       3636        124       452700
                         NSMutableString       *:                                             4        121          487
                                  Method       to_proc                                        1        119          119
                                 NSArray       <<:                                            2        119          238
                         NSMutableString       +:                                             2        117          234
                         JBBNodePriority       initialize:                                 3632        116       423058
                 OptionParser::OptionMap       default:                                       2        115          231
                                   Class       extended:                                      2        114          229
                                    Proc       initialize                                     4        114          458
                      OptionParser::List       atype                                         81        114         9275
                                   Range       each                                        3634        114       415928
                                NilClass       ==:                                           20        113         2277
                                   Class       included:                                      5        113          565
                              FalseClass       ==:                                           20        112         2259
                                   Class       >=:                                            2        112          224
                               TrueClass       ==:                                           20        112         2240
                                NilClass       singleton_method_added:                        1        110          110
                                   Class       method_added:                                232        110        25593
                                   Class       ==:                                            4        110          440
                                NSObject       singleton_method_added:                        1        109          109
                                    Proc       to_proc                                      119        108        12946
                                   Class       inherited:                                    23        107         2462
  OptionParser::Switch::OptionalArgument       ==:                                            1        106          106
  OptionParser::Switch::RequiredArgument       ==:                                           12        106         1274
                                 JBBNode       to_s                                           7        105          738
        OptionParser::Switch::NoArgument       ==:                                           33        104         3458
                                  Regexp       ==:                                            6        102          617
                      OptionParser::List       short                                          6        100          605
                         NSMutableString       chars                                          2         94          189
                                   Class       exists?:                                       1         92           92
                              ByteString       chars                                       7266         90       659583
                              ByteString       strip                                          1         89           89
                                   Class       dirname:                                       1         87           87
                              ByteString       downcase                                       1         81           81
                         NSMutableString       =~:                                            1         70           70
                                 JBBNode       puts:                                          7         63          445
                          NSMutableArray       reject                                      3633         59       217170
                          NSMutableArray       method:                                        1         58           58
                              ByteString       count:                                         3         56          169
                          NSMutableArray       join:                                          1         55           55
                                   Class       lambda                                         5         55          275
                          NSMutableArray       instance_eval                                  1         53           53
                               JBBSearch       start_node                                     1         53           53
                                   Class       file?:                                         1         48           48
                                 JBBNode       puts                                           2         47           95
                          NSMutableArray       length                                         2         46           93
                                  Fixnum       to_s                                           1         41           41
                         NSMutableString       split:                                         3         40          121
                                  PQueue       lambda                                         1         40           40
                            OptionParser       version=:                                      1         38           38
                          NSMutableArray       at:                                            6         37          225
            OptionParser::CompletingHash       respond_to?:                                   2         36           72
                          NSMutableArray       unshift:                                       1         35           35
                                   Class       basename:                                      1         34           34
                                NSObject       include?:                                      1         33           33
                         NSMutableString       downcase                                       8         33          267
                                  PQueue       loop                                           1         33           33
                            OptionParser       banner=:                                       1         32           32
                 OptionParser::OptionMap       []=:                                          17         32          546
                     NSMutableDictionary       fetch:                                        81         31         2548
                                 JBBNode       save_path:                                     1         30           30
                              ByteString       []:                                        87192         30      2689254
                              ByteString       =~:                                            1         30           30
                          NSMutableArray       []=:                                           1         29           29
                                Revision       contents                                       1         29           29
                     NSMutableDictionary       keys                                          20         29          587
                                   Class       incompatible_argument_styles:                  1         29           29
                              ByteString       dup                                        47229         29      1378244
                 OptionParser::OptionMap       fetch:                                         5         29          145
                          NSMutableArray       []:                                            1         28           28
                          NSMutableArray       push:                                        426         28        12089
        OptionParser::Switch::NoArgument       block                                          2         28           56
                                   Class       const_set:                                     9         28          254
                                     Set       add:                                         413         27        11538
        OptionParser::Switch::NoArgument       conv                                           4         27          110
  OptionParser::Switch::RequiredArgument       match_nonswitch?:                              2         27           55
        OptionParser::Switch::NoArgument       match_nonswitch?:                             12         27          330
                                 JBBNode       child_for_goal                                 7         26          187
                      OptionParser::List       long                                           3         26           79
                                  Regexp       respond_to?:                                  11         26          289
                            OptionParser       top                                            5         26          130
                            OptionParser       base                                           3         25           77
                      OptionParser::List       list                                           6         25          154
                          NSMutableArray       size                                           1         25           25
                     NSMutableDictionary       freeze                                         1         25           25
                                 JBBNode       state                                        413         25        10589
                            OptionParser       notwice:                                       2         25           51
                         NSMutableString       freeze                                        12         25          300
                      OptionParser::List       block_given?                                   9         25          225
                          NSMutableArray       freeze                                         5         24          124
                              ByteString       index:                                      3633         24        90215
                                   Class       pattern                                        6         24          148
                                   Class       proc                                           3         24           74
                            OptionParser       block_given?                                  30         24          733
                                 JBBNode       node_cost                                      7         24          170
                          NSMutableArray       empty?                                         8         24          194
                                 JBBNode       parent                                       832         24        20204
                          NSMutableArray       shift                                          5         24          121
                                   Class       top                                           13         24          312
                         JBBNodePriority       second_level                                5258         23       126144
                         NSMutableString       to_i                                          12         23          286
                                 JBBNode       expanded_children                            412         23         9833
                     NSMutableDictionary       include?:                                   9376         23       221179
                         JBBNodePriority       first_level                                38373         22       871790
                                 JBBNode       blank                                       3638         22        82600
                                NilClass       nil?                                           3         22           67
                              ByteString       []=:                                       87192         22      1945721
                                 JBBNode       level                                       4103         22        91287
                                 JBBNode       cost                                        7264         22       161085
                                  Fixnum       <=>:                                        5260         21       115443
                                 JBBNode       heuristic                                   3632         21        76630
                                 JBBNode       priority                                   40704         20       850370
                                  Fixnum       abs                                        22199         20       460471
                                  Fixnum       div:                                        7921         19       153603

Changed 2 years ago by jordan.breeding@…

Another interesting data point: The stripped down test that just generates and inserts X number of nodes and pops some along the way (to generate new nodes) always runs incredibly fast when using the CFBinaryHeap version of the script:

54 jordan@thetourist ~/priority_queue_bug > for LOOP in *_{125,187,218,250}.cocoa; do eval time macruby "${LOOP}"; done
macruby speed_test_125.cocoa  3.45s user 0.17s system 145% cpu 2.495 total
macruby speed_test_187.cocoa  4.97s user 0.22s system 150% cpu 3.457 total
macruby speed_test_218.cocoa  5.92s user 0.26s system 152% cpu 4.060 total
macruby speed_test_250.cocoa  6.85s user 0.29s system 153% cpu 4.657 total

When running the pure ruby version it is slower, and periodically can even be slower than ruby and ruby1.9, which are both fairly consistent, even with large data sets (250,000 items):

ruby speed_test_250.pqueue  81.52s user 2.80s system 99% cpu 1:24.36 total
ruby speed_test_250.pqueue  81.48s user 2.82s system 99% cpu 1:24.40 total
ruby speed_test_250.pqueue  81.77s user 3.03s system 99% cpu 1:24.86 total
ruby speed_test_250.pqueue  81.51s user 2.89s system 99% cpu 1:24.44 total

while the MacRuby version tends to fluctuate, especially with large data sets, the interesting part is that when it is slower with large data sets it is using a decent amount of memory (at least according to iStat Menus), and not as much CPU, possibly an issue with garbage collection?

macruby speed_test_250.pqueue  76.62s user 9.61s system 160% cpu 53.618 total
macruby speed_test_250.pqueue  78.58s user 9.93s system 159% cpu 55.512 total
macruby speed_test_250.pqueue  82.74s user 13.04s system 98% cpu 1:36.89 total
macruby speed_test_250.pqueue  78.36s user 11.76s system 58% cpu 2:33.71 total

The _125 variant generates 125000 nodes, _187 generates 187500 nodes, _218 generates 218750 nodes, 250 generates 250000 nodes.

MacRuby tends to perform more slowly with large data sets with several applications open, but not doing anything (they are taking up memory, but not CPU). In the same scenario the timings for ruby and ruby1.9 are still consistent and ruby1.9 still performs quickly.

Changed 2 years ago by jordan.breeding@…

Just tried a new version that generates 300000 nodes.

ruby and ruby1.9 end up slowing down significantly, memory usage is way up, cpu is down, macruby doesn't even finish running:

60 jordan@thetourist ~/priority_queue_bug > for LOOP in ruby ruby1.9 macruby; do eval time "${LOOP}" ./speed_test_300.pqueue; done
ruby ./speed_test_300.pqueue  124.68s user 5.30s system 56% cpu 3:50.52 total
ruby1.9 ./speed_test_300.pqueue  80.61s user 7.13s system 47% cpu 3:05.52 total
macruby(1524,0x7fff70940be0) malloc: *** auto malloc[1524]: agc error: Can not allocate new region
2009-11-10 22:30:32.742 macruby[1524:903] -[Bignum message]: unrecognized selector sent to instance 0x200088220
2009-11-10 22:30:34.888 macruby[1524:903] *** Terminating app due to uncaught exception 'NoMethodError', reason: 'undefined method `message' for #<Bignum:0x200088220>'
*** Call stack at first throw:
(
	0   CoreFoundation                      0x00007fff80849444 __exceptionPreprocess + 180
	1   libobjc.A.dylib                     0x00007fff881a60f3 objc_exception_throw + 45
	2   libmacruby.dylib                    0x000000010017aef5 rb_vm_raise + 437
	3   libmacruby.dylib                    0x0000000100040f79 rb_exc_raise + 9
	4   libmacruby.dylib                    0x000000010017a350 rb_vm_method_missing + 560
	5   libmacruby.dylib                    0x000000010015fbcc rb_vm_call_with_cache2 + 5740
	6   libmacruby.dylib                    0x0000000100161a34 _ZL14method_missingmP13objc_selectorP11rb_vm_blockiPKm29rb_vm_method_missing_reason_t + 324
	7   libmacruby.dylib                    0x00000001001623cd rb_vm_call + 2125
	8   libmacruby.dylib                    0x00000001001796c3 rb_vm_print_current_exception + 99
	9   macruby                             0x0000000100000dec main + 172
	10  macruby                             0x0000000100000d34 start + 52
)
terminate called after throwing an instance of 'NSException'
zsh: abort      macruby ./speed_test_300.pqueue
macruby ./speed_test_300.pqueue  69.00s user 15.48s system 27% cpu 5:03.12 total

Changed 2 years ago by lsansonetti@…

There is definitely something wrong going on but I have higher priority bugs to fix at this point and can't help you now. If you want to help you may want to use shark and identify the slow code paths. Try to reproduce the performance problems with a small snippet and let us know of what you find.

Changed 2 years ago by jordan.breeding@…

Here is a shark profile, I was running 10.6.2 and MacRuby is git revision "git commit 554fe2e5930a8c5cb8d0f798cfb1d3e1cec8b378", it was 2 MB, so I am just linking to it.

 Shark Profile

Changed 2 years ago by jordan.breeding@…

Well, I don't have any new deep insights, I need to run shark or dtrace on the new version later, but I am updating the archive attached to this bug, and upon creating a new completely ruby based version of a priority queue based on my Objective-C version I was able to improve the run times of everything except for MacRuby.

The old pqueue.rb:

/usr/bin/ruby ./search.pqueue.old.rb -c -m a-star input_files/input13_1.txt  19.01s user 0.21s system 99% cpu 19.338 total
/opt/homebrew/bin/ruby ./search.pqueue.old.rb -c -m a-star   13.60s user 0.15s system 99% cpu 13.779 total
macruby ./search.pqueue.old.rb -c -m a-star input_files/input13_1.txt  33.11s user 1.53s system 179% cpu 19.251 total

The new pqueue.rb:

/usr/bin/ruby ./search.pqueue.rb -c -m a-star input_files/input13_1.txt  9.92s user 0.17s system 99% cpu 10.151 total
/opt/homebrew/bin/ruby ./search.pqueue.rb -c -m a-star   6.38s user 0.12s system 99% cpu 6.552 total
macruby ./search.pqueue.rb -c -m a-star input_files/input13_1.txt  32.25s user 1.11s system 142% cpu 23.348 total

For comparison, the Objective-C based version:

macruby ./search.cocoa.rb -c -m a-star input_files/input13_1.txt > /dev/null  28.56s user 1.45s system 179% cpu 16.679 total

I don't think it is JIT overhead since a small sample still runs quickly in macruby, I was going to try AOT but the file from macrubyc gives me errors.

Small sample:

macruby ./search.cocoa.rb -c -m a-star input_files/input3.txt > /dev/null  1.00s user 0.08s system 107% cpu 1.002 total
macruby ./search.cocoa.rb -c -m a-star input_files/input7.txt > /dev/null  1.03s user 0.08s system 107% cpu 1.027 total

error with compile:

286 jordan@thetourist ~/priority_queue_bug > macrubyc -o search.cocoa search.cocoa.rb
287 jordan@thetourist ~/priority_queue_bug > ./search.cocoa
core:in `dirname:': can't convert false into String (TypeError)

Changed 2 years ago by jordan.breeding@…

Changed 2 years ago by jordan.breeding@…

Alright, after taking another look I don't know that this is the only place where it is slower, but String is definitely causing some of the problem. I add a method to String to swap characters inside the String, this seems to be consistently slower in MacRuby.

#!/usr/bin/eval ruby

class String
    def swap(first, second)
        self.dup.swap!(first, second)
    end
    
    def swap!(first, second)
        temp_store = self[first]
        self[first] = self[second]
        self[second] = temp_store
        return self
    end
end

# first test
puts
puts("Test 001 -- String character swap, 2500000 times")

$staticString = "asdfghjkl;123456789"
$firstTime = 0.0
2500000.times {
    randOne = rand($staticString.length)
    randTwo = rand($staticString.length)
    innerTime = Time.new
    newString = $staticString.swap(randOne, randTwo)
    $firstTime += Time.new - innerTime
}

puts("Test 001: %.4f seconds" % [$firstTime])
puts

63 jordan@thetourist ~/Tests/performance > ruby ./small_performance_test.rb 

Test 001 -- String character swap, 2500000 times
Test 001: 7.8250 seconds

64 jordan@thetourist ~/Tests/performance > /opt/homebrew/bin/ruby ./small_performance_test.rb

Test 001 -- String character swap, 2500000 times
Test 001: 7.3780 seconds

65 jordan@thetourist ~/Tests/performance > macruby ./small_performance_test.rb

Test 001 -- String character swap, 2500000 times
Test 001: 23.7841 seconds

Changed 2 years ago by jordan.breeding@…

Changed 2 years ago by jordan.breeding@…

So, I wrote a self contained group of tests to test the performance of two variants of swapping characters in strings, and test the performance of push/pop and pushObjects/pop in combination with inserting into a Set.

The attachment is performance_test.rb Download

Results:

Ruby 1.8

Test 001 -- String character swap, 2500000 times
Test 001: 13.5609 seconds
Overall: 19.4610 seconds


Test 002 -- String character swap (alternate), 2500000 times
Test 002: 8.5599 seconds
Overall: 14.6769 seconds


Test 003 and 004 -- Insert 500000 nodes into a priority queue using pop/push (003), plus Set calls (004)
Test 003: 71.3116 seconds
Test 004: 1.1641 seconds
Overall: 73.5490 seconds


Test 005 and 006 -- Insert 500000 nodes into a priority queue using pop/pushObjects (005), plus Set calls (006)
Test 005: 77.3168 seconds
Test 006: 1.0839 seconds
Overall: 74.4669 seconds

Ruby 1.9

Test 001 -- String character swap, 2500000 times
Test 001: 9.7238 seconds
Overall: 14.9907 seconds


Test 002 -- String character swap (alternate), 2500000 times
Test 002: 8.4750 seconds
Overall: 13.9169 seconds


Test 003 and 004 -- Insert 500000 nodes into a priority queue using pop/push (003), plus Set calls (004)
Test 003: 23.5861 seconds
Test 004: 1.2312 seconds
Overall: 27.4714 seconds


Test 005 and 006 -- Insert 500000 nodes into a priority queue using pop/pushObjects (005), plus Set calls (006)
Test 005: 22.4412 seconds
Test 006: 1.1627 seconds
Overall: 23.1751 seconds

MacRuby

Test 001 -- String character swap, 2500000 times
Test 001: 27.9602 seconds
Overall: 36.8786 seconds


Test 002 -- String character swap (alternate), 2500000 times
Test 002: 24.1841 seconds
Overall: 32.5536 seconds


Test 003 and 004 -- Insert 500000 nodes into a priority queue using pop/push (003), plus Set calls (004)
Test 003: 22.8907 seconds
Test 004: 2.4607 seconds
Overall: 28.7985 seconds


Test 005 and 006 -- Insert 500000 nodes into a priority queue using pop/pushObjects (005), plus Set calls (006)
Test 005: 28.8811 seconds
Test 006: 2.5541 seconds
Overall: 27.3512 seconds

Changed 2 years ago by vincent.isambart@…

MacRuby's strings will be rewritten at least partially to fix lots of problems with the current implementation so it's no use benchmarking them yet.

Changed 2 years ago by jordan.breeding@…

Okay, I will keep it around just in case so I can test again after the changes. String appears to be the biggest loss of performance for me right now, so even though push/pushObjects/pop seem like they could be slightly faster (compared to 1.9) they aren't as far behind to begin with (I am assuming it is really Array which is slightly slower).

I will test again after general performance work has been done and String is reworked. Do you know if the String reworking is targeted for 0.5, or later?

Changed 2 years ago by lsansonetti@…

The new String is for later (0.6).

I didn't shark your swap benchmark but it creates many temporary objects (one String#dup, 2 String#[]) and our object allocator is currently slow, which could explain why it's slower than 1.9.

This should definitely be faster once our Strings are no longer pure CF objects, and can benefit of the thread local GC optimization.

Changed 20 months ago by martinlagardette@…

With the new strings:

ruby 1.9

Test 001 -- String character swap, 2500000 times
Test 001: 4.7923 seconds
Overall: 9.3636 seconds


Test 002 -- String character swap (alternate), 2500000 times
Test 002: 4.1403 seconds
Overall: 8.8592 seconds


Test 003 and 004 -- Insert 500000 nodes into a priority queue using pop/push (003), plus Set calls (004)
Test 003: 10.0445 seconds
Test 004: 0.4814 seconds
Overall: 13.3296 seconds


Test 005 and 006 -- Insert 500000 nodes into a priority queue using pop/pushObjects (005), plus Set calls (006)
Test 005: 10.9869 seconds
Test 006: 0.5723 seconds
Overall: 11.6866 seconds

MacRuby

Test 001 -- String character swap, 2500000 times
Test 001: 13.6413 seconds
Overall: 17.3031 seconds


Test 002 -- String character swap (alternate), 2500000 times
Test 002: 11.3426 seconds
Overall: 15.7529 seconds


Test 003 and 004 -- Insert 500000 nodes into a priority queue using pop/push (003), plus Set calls (004)
Test 003: 12.6970 seconds
Test 004: 1.1502 seconds
Overall: 15.9674 seconds


Test 005 and 006 -- Insert 500000 nodes into a priority queue using pop/pushObjects (005), plus Set calls (006)
Test 005: 16.4765 seconds
Test 006: 1.2515 seconds
Overall: 16.1109 seconds

Maybe still not as fast as ruby19, but still much faster than before :-)

Changed 14 months ago by lsansonetti@…

  • milestone set to MacRuby Later

Screening for Later.

Note: See TracTickets for help on using tickets.