Logo 
Search:

Java Forum

Ask Question   UnAnswered
Home » Forum » Java       RSS Feeds

Article on Garbage Collection

  Date: Nov 24    Category: Java    Views: 514
  

Garbage collection is a very key part of Java . Before reading this article
i only knew that Garbage collector was a low level thread that clears the
memory after the object is no longer needed. This is a very good Article
and is taken from the book Thinking in Java .. which you can find in stores
very easly ...

Level : Intermediate

How a garbage collector works
If you come from a programming language where allocating objects on the
heap is expensive, you may naturally assume that Java's scheme of
allocating everything (except primitives) on the heap is expensive.
However, it turns out that the garbage collector can have a significant
impact on increasing the speed of object creation. This might sound a bit
odd at first?that storage release affects storage allocation?but it's the
way some JVMs work and it means that allocating storage for heap objects in
Java can be nearly as fast as creating storage on the stack in other
languages.
For example, you can think of the C++ heap as a yard where each object
stakes out its own piece of turf. This real estate can become abandoned
sometime later and must be reused. In some JVMs, the Java heap is quite
different; it's more like a conveyor belt that moves forward every time you
allocate a new object. This means that object storage allocation is
remarkably rapid. The "heap pointer" is simply moved forward into virgin
territory, so it's effectively the same as C++'s stack allocation. (Of
course, there's a little extra overhead for bookkeeping but it's nothing
like searching for storage.)
Now you might observe that the heap isn't in fact a conveyor belt, and if
you treat it that way you'll eventually start paging memory a lot (which is
a big performance hit) and later run out. The trick is that the garbage
collector steps in and while it collects the garbage it compacts all the
objects in the heap so that you've effectively moved the "heap pointer"
closer to the beginning of the conveyor belt and further away from a page
fault. The garbage collector rearranges things and makes it possible for
the high-speed, infinite-free-heap model to be used while allocating
storage. To understand how this works, you need to get a little better idea
of the way the different garbage collector (GC) schemes work. A simple but
slow GC technique is reference counting. This means that each object
contains a reference counter, and every time a reference is attached to an
object the reference count is increased. Every time a reference goes out of
scope or is set to null, the reference count is decreased. Thus, managing
reference counts is a small but constant overhead that happens throughout
the lifetime of your program. The garbage collector moves through the
entire
list of objects and when it finds one with a reference count of zero it
releases that storage. The one drawback is that if objects circularly refer
to each other they can have nonzero reference counts while still being
garbage. Locating such self-referential groups requires significant extra
work for the garbage collector. Reference counting is commonly used to
explain one kind of garbage collection but it doesn't seem to be used in
any JVM implementations. In faster schemes, garbage collection is not based
on reference counting. Instead, it is based on the idea that any nondead
object must ultimately be traceable back to a reference that lives either
on the stack or in static storage. The chain might go through several
layers of objects. Thus, if you start in the stack and the static storage
area and walk through all the
references you'll find all the live objects. For each reference that you
find, you must trace into the object that it points to and then follow all
the references in that object, tracing into the objects they point to,
etc., until you've moved through the entire web that originated with the
reference on the stack or in static storage. Each object that you move
through must still
be alive. Note that there is no problem with detached self-referential
groups?these are simply not found, and are therefore automatically garbage.
In the approach described here, the JVM uses an adaptive garbage-collection
scheme, and what it does with the live objects that it locates depends on
the variant currently being used. One of these variants is stop-and-copy.
This means that?for reasons that will become apparent the program is first
stopped (this is not a background collection scheme).
Then, each live object that is found is copied from one heap to another,
leaving behind all the garbage. In addition, as the objects are copied into
the new heap they are packed end-to-end, thus compacting the new heap (and
allowing new storage to simply be reeled off the end as previously
described). Of course, when an object is moved from one place to another,
all references that point at (i.e., that reference) the object must be
changed. The reference that goes from the heap or the static storage area
to the object can be changed right away, but there can be other references
pointing to this object that will be encountered later during the walk.
These are fixed up as they are found (you could imagine a table that maps
old addresses to new ones). There are two issues that make these so-called
"copy collectors inefficient. The first is the idea that you have two heaps
and you slosh allthe memory back and forth between these two separate
heaps, maintaining twice as much memory as you actually need. Some JVMs
deal with this by allocating the heap in chunks as needed and simply
copying from one chunk to another. The second issue is the copying. Once
your program becomes stable it might be generating little or no garbage.
Despite that, a copy collector will still copy all the memory from one
place to another, which is wasteful. To prevent this, some JVMs detect that
no new garbage is being generated and switch to a different scheme (this is
the adaptive part). This other scheme is called mark and sweep, and it's
what earlier versions of Sun's JVM used all the time. For general use, mark
and sweep is fairly slow, but when you know you're generating little or no
garbage it's fast. Mark and sweep follows the same logic of starting from
the stack and
static storage and tracing through all the references to find live objects.
However, each time it finds a live object that object is marked by setting
a flag in it, but the object isn't collected yet. Only when the marking
process is finished does the sweep occur. During the sweep, the dead
objects are released. However, no copying happens, so if the collector
chooses to compact a fragmented heap it does so by shuffling objects
around. The stop-and-copy refers to the idea that this type of garbage
collection is not done in the background; instead, the program is stopped
while the GC occurs. In the Sun literature you'll find many references to
garbage collection as a low-priority background process, but it turns out
that the GC was not implemented that way, at least in earlier versions of
the Sun JVM. Instead, the Sun garbage collector ran when memory got low. In
addition, mark-and-sweep requires that the program be stopped. As
previously mentioned, in the JVM described here memory is allocated in big
blocks. If you allocate a large object, it gets its own block. Strict
stop-and-copy requires copying every live object from the source heap to a
new heap before you could free the old one, which translates to lots of
memory. With blocks, the GC can typically use dead blocks to copy objects
to as it collects. Each block has a generation count to keep track of
whether it's alive. In the normal case, only the blocks created since the
last GC are compacted; all other blocks get their generation count bumped
if they have been referenced from somewhere. This handles the normal case
of lots of short-lived temporary objects. Periodically, a full sweep is
made?large objects are still not copied (just get their generation count
bumped) and blocks containing small objects are copied and compacted. The
JVM monitors the efficiency of GC and if it becomes a waste of time
because all objects are long-lived then it switches to mark-and-sweep.
Similarly, the JVM keeps track of how successful mark-and-sweep is, and if
the heap starts to become fragmented it switches back to stop-and-copy.
This is where the "adaptive" part comes in, so you end up with a mouthful:
"adaptive generational stop-and-copy mark-and-sweep."
There are a number of additional speedups possible in a JVM. An especially
important one involves the operation of the loader and Just-In-Time (JIT)
compiler. When a class must be loaded (typically, the first time you want
to create an object of that class), the .class file is located and the byte
codes for that class are brought into memory. At this point, one
approach is to simply JIT all the code, but this has two drawbacks: it
takes a little more time, which, compounded throughout the life of the
program, can add up; and it increases the size of the executable (byte
codes are
significantly more compact than expanded JIT code) and this might cause
paging, which definitely slows down a program. An alternative approach is
lazy evaluation, which means that the code is not JIT compiled until
necessary. Thus, code that never gets executed might never get JIT
compiled.

Share: 

 

No Answers Found. Be the First, To Post Answer.

 
Didn't find what you were looking for? Find more on Article on Garbage Collection Or get search suggestion and latest updates.




Tagged: