
Ticket #578 (new defect)

Opened 2 years ago

Last modified 8 months ago

Garbage Collector in 0.6

Reported by: jsn@… Owned by: lsansonetti@…
Priority: major Milestone: MacRuby 0.6
Component: MacRuby Keywords:
Cc: babs.devs@…

Description

Reporting this as a bug, since I'm fairly sure this is not the way it's intended to work.

Running the following code on a bunch of files between 600 MB and 2 GB in size will (eventually) cause a SIGSEGV when memory allocation fails.

   def get_digest(file)
     digest = Digest::MD5.new()
     fil = File.open(file,'r')
     while((l = fil.read(READ_BUFFER_SIZE))!=nil)
       digest << l
     end
     fil.close()
     digest.hexdigest
   end

If I run it on a smaller set of files (1.8 GB, to be exact), the memory allocated to the process jumps to 700-900 MB and just stays there. I let the process sit for 30 minutes without noticing any drop in allocated memory.

I've tried various sizes for READ_BUFFER_SIZE, from 16 KB to 32 MB, and also tried File.read(), and it all behaves the same way, although the larger the read buffer, the faster memory usage shoots up. Setting "l=nil" after every read operation and/or "digest=nil" before returning doesn't seem to make a difference. Using Digest::MD5.file(fname).hexdigest results in "writing to non-bytestrings is not supported at this time."
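For anyone trying to reproduce this, a self-contained version of the loop above might look like the sketch below. The 64 KB value for READ_BUFFER_SIZE, the 'rb' open mode, the block form of File.open and the command-line handling are assumptions made for illustration; they are not taken from the original script.

```ruby
require 'digest/md5'

# Assumed value; the report tried sizes from 16 KB to 32 MB.
READ_BUFFER_SIZE = 64 * 1024

# Hash a file in fixed-size chunks, as in the snippet from the report.
def get_digest(file)
  digest = Digest::MD5.new
  # The block form closes the file even if reading raises.
  File.open(file, 'rb') do |fil|
    # IO#read(length) returns nil at end of file, which ends the loop.
    while (chunk = fil.read(READ_BUFFER_SIZE))
      digest << chunk
    end
  end
  digest.hexdigest
end

# Print one digest per path given on the command line (none given: no output).
ARGV.each { |path| puts "#{get_digest(path)}  #{path}" }
```

Running it against a few large files while watching the process in Activity Monitor or top should show whether the memory growth described here still occurs.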

I'm not sure if this is a "string cache" thing gone wrong, or it's simply not collecting garbage.

Change History

Changed 2 years ago by jsn@…

I've been debugging this all day now, so here's a quick status update in case some bright mind goes "AHA" and fixes it :-)

It seems the GC is running just fine, but MacRuby holds on to way more data than it needs to. In the above example, the IO layer will:

*) Cache the entire file in the first call to io_read, regardless of READ_BUFFER_SIZE.
*) Allocate (filesize / READ_BUFFER_SIZE) ByteString objects.
Together, this effectively doubles the amount of RAM needed to read a file.
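The arithmetic behind that "doubling" can be sketched as follows. This is only a back-of-the-envelope estimate per file, assuming nothing is released while the file is being read; the function name and the returned keys are made up for illustration and do not correspond to anything in MacRuby.

```ruby
# Rough per-file estimate based on the two observations above:
# the whole file is cached once, and the chunks handed back to Ruby
# together hold another copy of the same bytes.
def estimate_read_memory(file_size, read_buffer_size)
  # Number of chunk strings, rounding up for a final partial chunk.
  chunk_count = (file_size + read_buffer_size - 1) / read_buffer_size
  {
    cached_bytes: file_size,    # internal cache filled by the first io_read
    chunk_count:  chunk_count,  # ByteString objects allocated
    chunk_bytes:  file_size,    # bytes held by those objects in total
    total_bytes:  2 * file_size
  }
end

estimate = estimate_read_memory(600 * 1024 * 1024, 32 * 1024 * 1024)
puts estimate[:chunk_count]
puts estimate[:total_bytes]
```

The buffer size changes how many objects are allocated, not the total, which is consistent with the problem showing up for every READ_BUFFER_SIZE tried.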

All of the above objects live on long after the block has ended, and the file has been closed.

I've had a little bit of success adding the following code to the end of io_close, which is basically just a brute force attempt at getting the GC to do something to the cached data.

    if (io_struct->buf != NULL) {
        CFDataSetLength(io_struct->buf, 0);
        io_struct->buf = NULL;
        io_struct->buf_offset = 0;
    }

Changed 16 months ago by babs.devs@…

  • cc babs.devs@… added

Cc Me!

Changed 8 months ago by watson1978@…

It looks like this no longer crashes with the latest MacRuby. I tried with 900 MB as READ_BUFFER_SIZE.

When I specified a size over 1 GB, it raised an exception: "get_digest': given size `1073741824' is too big (ArgumentError)".

This exception is raised by the following check, so the behavior seems to be correct.

io.c : line 1337

    if (size > 1000000000) {
        rb_raise(rb_eArgError, "given size `%ld' is too big", size);
    }
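To see why 900 MB passes and 1 GB does not, the same comparison can be written out in Ruby. This is only a sketch of the arithmetic: the constant and method names are hypothetical, and it assumes the 900 MB buffer was specified as 900 * 1024 * 1024 bytes.

```ruby
# Limit taken from the io.c check quoted above; the names are illustrative.
MAX_READ_SIZE = 1_000_000_000

# Mirrors the C condition: sizes strictly greater than the limit are rejected.
def read_size_allowed?(size)
  size <= MAX_READ_SIZE
end

nine_hundred_mb = 900 * 1024 * 1024  # 943_718_400 bytes
one_gb          = 1024**3            # 1_073_741_824 bytes, the size in the error message

puts read_size_allowed?(nine_hundred_mb)
puts read_size_allowed?(one_gb)
```

Note that the limit is a decimal billion rather than a power of two, so a buffer of exactly 1 GiB is already over it.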