Class Collector

java.lang.Object
  extended by com.faunos.skwish.sys.mgr.Collector

public class Collector
extends Object

Responsible for maintaining a consistent view of committed segments.

The collector maintains a union of segments on the file system that make up the logical whole segment. At any point in time, a subset of the segments under the "committed" directory makes up the segment data. We'll call these the live segments. For any given [entry] ID, at most one live segment contains the corresponding entry [data]. That is, the live segments do not contain any overlapping IDs. Here's an illustration:

The horizontal axis represents entry IDs, and each block of X's represents a segment spanning a range of IDs on the file system. The live segments, together with the union of all [file-backed] delete-sets, make up the complete logical segment.
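
The non-overlap invariant above can be sketched as a small index keyed by base ID. This is only an illustration: the class LiveSegmentIndex and its methods are hypothetical names, not part of the Skwish API, and the sketch assumes each live segment covers a half-open ID range starting at its base ID.

```java
import java.util.Map;
import java.util.OptionalLong;
import java.util.TreeMap;

/** Hypothetical index of live segments; not part of Skwish. */
final class LiveSegmentIndex {
    // Maps a segment's base ID to the number of entries it holds.
    private final TreeMap<Long, Long> countsByBaseId = new TreeMap<>();

    /** Registers a live segment covering [baseId, baseId + entryCount). */
    void addLive(long baseId, long entryCount) {
        if (entryCount <= 0) {
            throw new IllegalArgumentException("empty segment");
        }
        // The closest segment at or below baseId must end at or before baseId.
        Map.Entry<Long, Long> prev = countsByBaseId.floorEntry(baseId);
        if (prev != null && prev.getKey() + prev.getValue() > baseId) {
            throw new IllegalArgumentException("overlaps segment at " + prev.getKey());
        }
        // The closest segment above baseId must start at or after our end.
        Long next = countsByBaseId.ceilingKey(baseId);
        if (next != null && baseId + entryCount > next) {
            throw new IllegalArgumentException("overlaps segment at " + next);
        }
        countsByBaseId.put(baseId, entryCount);
    }

    /** Returns the base ID of the one live segment holding id, if any. */
    OptionalLong segmentFor(long id) {
        Map.Entry<Long, Long> e = countsByBaseId.floorEntry(id);
        if (e != null && id < e.getKey() + e.getValue()) {
            return OptionalLong.of(e.getKey());
        }
        return OptionalLong.empty();
    }
}
```

Because the ranges never overlap, a single floor lookup is enough to find the only candidate segment for an ID.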

Typically, there are other segments under the committed directory that don't contribute to the data model. These are segments that have already been appended to another segment and have consequently left the model (see below). We call a segment on the file system that no longer contributes to the model a dead segment.

Segments and delete-set files are maintained in unit directories (UnitDir). Each unit may contain backing files for a segment, a delete-set, or both. Entry IDs in file-backed delete-sets are periodically applied to (deleted from) the live segments. If all the IDs in a file-backed delete-set have already been applied to the live segments, then that delete-set is considered dead; otherwise, it is considered live.
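
The live/dead rule for delete-sets amounts to a subset check. The following is a minimal sketch under that reading; DeleteSetStatus and its parameters are hypothetical, and Skwish itself keeps this information in files rather than in in-memory sets.

```java
import java.util.Set;

/** Hypothetical helper; not part of Skwish. */
final class DeleteSetStatus {
    /**
     * Returns true if at least one ID in the delete-set has not yet been
     * applied to the live segments. An empty delete-set counts as dead here.
     */
    static boolean isLive(Set<Long> deleteSetIds, Set<Long> appliedIds) {
        return !appliedIds.containsAll(deleteSetIds);
    }
}
```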

At any point in time, a unit directory may contain any combination of live, dead, or no segment, and live, dead, or no delete-set. That is, a unit can even be empty. The collector periodically purges (deletes) dead segment and dead delete-set files and, if the unit directory is then empty, the directory itself.

Operational Invariants


The collector maintains the following lifecycles for segments, delete-sets, and the unit directories they live in. Because segments are merged and delete-sets are processed asynchronously, it's worthwhile to lay out some lifecycle milestones.

Segment lifecycle

  1. Added

    When a new segment (and its parent unit directory) is first added, it contains only the new entries inserted in the committed transaction. Its base ID equals the next ID property of the logical union as it stood immediately before the commit.

    The segment is live. (It's in the data model.) DeleteSets may be applied to it.

  2. Appended (optional)

    Once a segment is added, it may also be appended to. Segments are periodically merged by appending one (the source) to another (the destination). This segment state corresponds to that of the destination segment after it has been appended to.

    The segment continues to be live. (It's in the data model.) DeleteSets may be applied to it--even while the merge is underway.

  3. Overridden (a.k.a. covered)

    This segment state corresponds to that of the source segment after it has been the argument of an append. After the merge has completed, the source segment is no longer in the model.

    The segment is logically dead, even though its backing files still exist. DeleteSets are not applied to source segments while the merge is underway.

  4. Zombie (optional)

    The just-overridden segment is logically dead, but its backing files are still in use by one or more clients. Specifically, UnitDir.Seg.getUnitDir().getUsageCount() is greater than zero.

    Note that once the segment store is closed, all zombie segments become dead the next time the store is loaded.

  5. Dead

    The overridden segment is dead, but its backing files have yet to be purged.

  6. Purged

    The overridden segment is dead, and its backing files have been purged.
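
The segment milestones above can be read as a small state machine. The enum below is a hypothetical sketch of that reading, not a type from Skwish. It assumes that an appended segment may be appended to again and that the two stages marked optional may be skipped.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical model of the segment lifecycle; not part of Skwish. */
enum SegmentState {
    ADDED, APPENDED, OVERRIDDEN, ZOMBIE, DEAD, PURGED;

    /** Only added and appended segments are in the data model. */
    boolean isLive() {
        return this == ADDED || this == APPENDED;
    }

    /** Milestones reachable in one step from this one. */
    Set<SegmentState> next() {
        switch (this) {
            case ADDED:
            case APPENDED:
                // Assumption: a destination segment may be appended to repeatedly.
                return EnumSet.of(APPENDED, OVERRIDDEN);
            case OVERRIDDEN:
                // Zombie is optional: it applies only while clients hold the files.
                return EnumSet.of(ZOMBIE, DEAD);
            case ZOMBIE:
                return EnumSet.of(DEAD);
            case DEAD:
                return EnumSet.of(PURGED);
            default:
                return EnumSet.noneOf(SegmentState.class);
        }
    }

    boolean canMoveTo(SegmentState target) {
        return next().contains(target);
    }
}
```

Under this model, SegmentState.ADDED.canMoveTo(SegmentState.OVERRIDDEN) holds, while nothing leads back from a dead or purged segment to a live one.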

Delete-set lifecycle

  1. Added

    The delete-set file was added as part of a committed transaction in which pre-existing entries were deleted.

  2. Dead

    The entry IDs listed in the delete-set have been deleted in (applied to) the relevant segments. The delete-set is now redundant, but its backing file still exists.

  3. Purged

    The applied delete-set's backing file has been deleted.

Unit directory lifecycle

  1. Added

    Under the collector, a unit directory begins life by being moved to the "committed" directory. The directory initially contains only the transaction data being committed. That is, it will contain either a segment of new entries, a delete-set of pre-existing entries, or both.

    Once added, the segment and/or delete-set in the unit follow the course of their own lifecycles (as described above), until...

  2. Purgeable

    Both the segment and delete-set, if any, have been purged. That is, there are no files in the unit directory, and it is eligible for purging.

  3. Purged

    The directory has been removed.
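
The final two unit directory stages can be sketched with the standard java.nio.file API. UnitPurger and purgeIfEmpty are hypothetical names used for illustration. How the collector actually detects and removes purgeable units is not described here.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper; not part of Skwish. */
final class UnitPurger {
    /**
     * Removes unitDir if it holds no files (the "purgeable" stage).
     * Returns true if the directory was removed, false if it still has content.
     */
    static boolean purgeIfEmpty(Path unitDir) throws IOException {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(unitDir)) {
            if (entries.iterator().hasNext()) {
                return false; // a segment or delete-set file is still present
            }
        }
        Files.delete(unitDir);
        return true;
    }
}
```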

Author:
Babak Farhang
See Also:
An illustration of how the system determines which segments are live.

Constructor Summary
Collector(File committedDir)
Collector(File committedDir, boolean create)
Method Summary
 void close()
 void commitUnit(UnitDir unit)
 File getDirectory()
 Segment getReadOnlySnapShot()
          Returns a read-only snap shot of the committed segments.
 Segment getReadOnlyView()
          Returns a read-only view of the committed segments.
 int nextUnitId()
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail


public Collector(File committedDir)
          throws IOException


public Collector(File committedDir,
                 boolean create)
          throws IOException
Method Detail


public int nextUnitId()


public File getDirectory()


public Segment getReadOnlyView()
                        throws IOException
Returns a read-only view of the committed segments.



public Segment getReadOnlySnapShot()
                            throws IOException
Returns a read-only snap shot of the committed segments.

Note that this is not a true snap shot: deletes made after the snap shot is taken may occasionally become visible. What is frozen is the number of entries, so new insertions are not seen by this view.
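
The semantics described in the note can be illustrated with a toy model in which the entry count is frozen while the set of deleted IDs stays shared. ToyStore and its nested Snapshot are hypothetical and do not reflect how Collector implements this. In particular, the toy makes every later delete visible at once, whereas the note only says such deletes may occasionally become visible.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical toy store; not part of Skwish. */
final class ToyStore {
    private final AtomicLong nextId = new AtomicLong();
    private final Set<Long> deleted = ConcurrentHashMap.newKeySet();

    /** Inserts an entry and returns its ID. */
    long insert() {
        return nextId.getAndIncrement();
    }

    void delete(long id) {
        deleted.add(id);
    }

    /** Freezes the current entry count but keeps sharing the delete set. */
    Snapshot snapshot() {
        return new Snapshot(nextId.get(), deleted);
    }

    static final class Snapshot {
        private final long frozenCount;
        private final Set<Long> sharedDeletes;

        private Snapshot(long frozenCount, Set<Long> sharedDeletes) {
            this.frozenCount = frozenCount;
            this.sharedDeletes = sharedDeletes;
        }

        long entryCount() {
            return frozenCount;
        }

        /** Later insertions are never visible; later deletes are. */
        boolean isVisible(long id) {
            return id >= 0 && id < frozenCount && !sharedDeletes.contains(id);
        }
    }
}
```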



public void commitUnit(UnitDir unit)
                throws IOException


public void close()
           throws IOException