modulo optimization | PCs are fast enough | up
From: Joe Keane <jgk@jgk.org>
Subject: Re: garbage collection
Date: 1997/06/15
Message-ID: <5o24nf$rcm@shellx.best.com>
Newsgroups: comp.sys.super,comp.arch,comp.lang.misc
In article <5npmkv$p5b$1@joe.rice.edu>
Preston Briggs <preston@cs.rice.edu> writes:
>GC isn't intended to allow us to loosen up and program sloppily;
Yes it is.
>it's supposed to allow us to accomplish otherwise tedious tasks quickly
>and concisely.
There are better ways.
In article <paik-1306972211500001@192.168.13.2>
"Samuel S. Paik" <paik@webnexus.com> writes:
>That is, reference counting is expensive in cpu cycles but has very
>good locality of reference.
It's not that expensive, and improving locality is worth it.
People constantly miss this point.
Uniform memory access may be the rule on old C-64s or Crays, but it's
not true for any modern machine, even if people try to ignore this.
Good memory allocation should be reasonably fast in terms of CPU cycles,
but the most important goal is to make good use of memory: application
data should have good locality and it should not take up too much space.
It's not worth saving a few cycles in allocation at the cost of wasting
space or scattering data all over the place. It's a bad trade-off.
For example, stack allocation is very nice when it works. One thing
good is that it's fast: adding a fixed-size object to the stack frame
takes no time, and calling `alloca' for a buffer should take a cycle.
But the more important thing is that putting data on the stack gives
near optimal locality; caches and VM work about as well as possible.
That is more important than saving a few cycles.
Conversely, consider the usual `optimization' of keeping a free list of
unused objects. It's also fast, just a couple instructions to add or
remove something from the list. It works well at first. But it doesn't
consider locality, so after a while you end up with a list of randomized
locations. It also wastes space. It's a bad trade-off, don't do it.
Unfortunately, many people don't understand enough to see the problem.
After all, the allocation is always very fast; it's everything else in
the application that becomes slow. It's not so obvious to many people
that bad memory allocation is causing the problems.
This is why, when garbage-collection fans brag about how they can scan a
million zillion words in one instruction, they're missing the point.
>Why else are many garbage collectors proposed for distributed systems
>use reference counts for inter-system references?
That's not garbage collection, it's creating and deleting objects at the
right time. In a distributed system, this is critical, not just sort of
nice, when there are many users updating things and running queries.
In most cases, it's quite easy to see when you should create and delete
objects, if you think about it instead of ignoring it. In other cases,
reference counting may be a useful technique. Of course, it's better to
have some basic classes that handle such details and build from there.
In article <5lsoylqdmg.fsf@tequila.systemsz.cs.yale.edu>
Stefan Monnier <monnier+/news/comp/arch@tequila.cs.yale.edu> writes:
>No, the reason is that "garbage" is a global property for a tracing GC,
>whereas it is a local property for a reference-counting GC.
To be blunt, garbage collection just doesn't work for really large,
distributed systems.
Show me a distributed system that uses garbage collection and i'll show
you a fairly small system. Maybe it works well with 100 users and 50 GB
of data on one machine, but what about when you have something a hundred
times as big running on hundreds of machines? How will it work then?
>Note that you can move the ref-count from the object to the pointer
>itself to get rid of (or at least reduce) the problem.
Or you can use an intermediate object to hold the count. Maybe this
works better, maybe not. The point is, these are different ways to
solve the problem. This is different from not solving the problem.
--
Joe Keane, amateur mathematician