Basic concepts

LVM and JFS are not exactly new in the OS/2 world. They were first introduced
in OS/2 Warp Server for e-business, aka WSeB, about two years ago. But
only a few lucky OS/2 users (including me) use WSeB on their home/office
machines. Thus the upcoming Serenity Systems' eComStation (eCS) will be
the first exposure to LVM/JFS for many OS/2 users. Many prospective
eCS owners have understandable concerns about these new concepts, and some
perhaps even fear them. In this article I will try to explain the
basic concepts introduced by LVM and JFS and some of the logic behind them.

First off, I'll finally explain what those acronyms mean: LVM stands
for Logical Volume Manager and JFS for Journaled File System. Not much
clearer, is it? It will be later - I hope.

LVM and JFS didn't originate on OS/2. They were created for AIX, IBM's
high-end Unix flavor running on IBM's RS/6000 hardware. For the users that
means that most of the really nasty bugs were ironed out long ago.

The role of LVM is to present a simple logical view of the underlying physical
storage space, i.e. the hard drive(s). LVM manages individual physical disks
- or to be more precise, the individual partitions present on them (for a
short glossary of terms, look at the end of the article). LVM hides the
number, size and location of physical partitions from users. Instead it
presents the concept of a logical volume. A logical volume may correspond to
a single physical partition (though that obviously almost defeats the purpose
of LVM), but it doesn't have to. One volume may be composed of several
partitions located on multiple physical disks. Not only that, volumes can
even be extended (but not shrunk - people usually want more space, not less).
They can even be extended while the OS is running and the file system is
being accessed! Of course, most home and SOHO users don't have the hardware
required for this.
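
To make the idea of a volume spanning several partitions more concrete, here is a minimal Python sketch. It assumes the simplest possible scheme, where partitions are concatenated end to end; the names Partition, LogicalVolume, locate and extend are made up for this illustration and say nothing about how OS/2 LVM actually lays out its data on disk.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    disk: int      # physical disk number (hypothetical numbering)
    start: int     # first sector of the partition on that disk
    sectors: int   # partition length in sectors


class LogicalVolume:
    """A volume built by concatenating partitions end to end."""

    def __init__(self, partitions):
        self.partitions = list(partitions)
        self._rebuild()

    def _rebuild(self):
        # bounds[i] is the first logical sector that lies past partition i.
        self.bounds = []
        total = 0
        for p in self.partitions:
            total += p.sectors
            self.bounds.append(total)

    @property
    def size(self):
        return self.bounds[-1] if self.bounds else 0

    def extend(self, partition):
        """Grow the volume; existing logical sector numbers stay valid."""
        self.partitions.append(partition)
        self._rebuild()

    def locate(self, logical_sector):
        """Translate a logical sector to (disk, physical sector)."""
        if not 0 <= logical_sector < self.size:
            raise IndexError("sector outside volume")
        i = bisect_right(self.bounds, logical_sector)
        offset = logical_sector - (self.bounds[i - 1] if i else 0)
        p = self.partitions[i]
        return p.disk, p.start + offset


volume = LogicalVolume([Partition(0, 63, 1000), Partition(1, 63, 500)])
print(volume.locate(999), volume.locate(1000))
volume.extend(Partition(2, 63, 200))
print(volume.size)
```

The point of the sketch is that the file system only ever sees logical sector numbers. Adding a partition at the end changes the size of the volume but not the location of anything already stored, which is why extending is easy and shrinking is not.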

The more experienced readers are now probably wondering how 'traditional'
file systems like FAT or HPFS could be extended at runtime. The answer
is, they can't. To take full advantage of LVM, it is necessary to use a
file system designed for it. This file system is of course JFS. JFS is not
really tied to LVM; both LVM and JFS can exist separately, but only when
working in concert can both reach their full potential.

JFS volume structure

JFS is organized like a traditional Unix-ish file system: it presents a
logical view of files and directories linked together to form a tree-like
structure. This is the concept that spread from the Unix world pretty much
everywhere else and that we all know. I can only speculate about IBM's
motives for incorporating JFS into WSeB, but it has some obvious advantages
when compared to HPFS and HPFS386 (and some shortcomings too). I see two
significant advantages:

  • capacity - JFS allows much larger file and volume sizes than HPFS. Basically
    JFS is a 64-bit file system while HPFS structures are at most 32 bits wide.
  • recovery - thanks to the journaling techniques employed by JFS (described
    in more detail later), CHKDSK times for JFS are significantly shorter
    than for equivalent HPFS volumes. Roughly speaking, where CHKDSK on HPFS
    takes minutes after a crash, on JFS it takes seconds.

JFS is created on top of a logical volume. To maintain information about
files and directories, it uses the following important internal structures:

  • the superblock
  • the i-nodes
  • the data blocks
  • the allocation groups

The superblock lies at the heart of JFS (and many other file systems).
It contains essential information such as the size of the file system, the
number of blocks it contains and the state of the file system (clean, dirty etc.).

The entire file system space is divided into logical blocks that
contain file or directory data. For JFS, the logical blocks are always
4096 bytes (4K) in size, but can be optionally subdivided into smaller
fragments (512, 1024 or 2048 bytes).

An i-node is a logical entity that contains information about
a file or directory. There is a 1:1 relationship between i-nodes and files/directories.
An i-node contains the file type, access permissions, user/group ID (UID/GID
- unused on OS/2) and access times, and it points to the actual logical blocks
where the file contents are stored. The maximum file size allowed in JFS is 2TB
(HPFS and FAT allow 2GB max). It should be noted that the number of i-nodes is
fixed. It is determined at file system creation (FORMAT) time and depends
on the fragment size (which is user selectable). In theory users could run
out of i-nodes, meaning that they would be unable to create more files
even if there were enough free space. In practice this is extremely rare.
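
The i-node limit is easy to picture with a toy model. The Python sketch below is purely illustrative: ToyFileSystem and create_file are hypothetical names, and the counters stand in for whatever bookkeeping the real file system does.

```python
class ToyFileSystem:
    """Toy model: a fixed pool of i-nodes plus a pool of free blocks."""

    def __init__(self, inode_count, block_count):
        self.free_inodes = inode_count   # fixed when the file system is created
        self.free_blocks = block_count

    def create_file(self, blocks_needed):
        # Every file needs exactly one i-node, whatever its size.
        if self.free_inodes == 0:
            raise OSError("no free i-nodes")
        if blocks_needed > self.free_blocks:
            raise OSError("no free space")
        self.free_inodes -= 1
        self.free_blocks -= blocks_needed


fs = ToyFileSystem(inode_count=2, block_count=1000)
fs.create_file(1)
fs.create_file(1)
try:
    fs.create_file(1)
except OSError as err:
    print(err, "- yet", fs.free_blocks, "blocks are still free")
```

The third file is refused although almost all of the blocks are unused, which is exactly the situation described above.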

Fragments were already briefly mentioned in the discussion of
logical blocks. The JFS logical block size is fixed at 4K. This is a reasonable
default but it means that the file system cannot allocate less than 4K
for file storage. If a file system stores large numbers of small files
(< 2K), the wasted disk space becomes significant. We've all got to know
and hate this problem from FAT (a cluster size of 32K leads to massive waste
of space, in some cases over 50%). JFS attacks this by allowing logical
blocks to be subdivided into smaller fragments, as small as 512 bytes (this
is the sector size on hard drives and it is not possible to read or write less
than 512 bytes from/to disk). However, users should be careful because using
fragments incurs additional overhead and hence slows down disk access. I would
recommend using fragments smaller than 4K only when the users know for sure that
they will store very large numbers of small files on the file system.
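
The effect of the allocation unit on wasted space is simple arithmetic, and a short Python sketch shows it. This is a simplified model that just rounds every file up to a whole number of units; the function names allocated_size and wasted_fraction are invented for the example, and the real allocation policy of JFS or FAT may differ in detail.

```python
def allocated_size(file_size, unit):
    """Bytes actually consumed when space is handed out in whole units."""
    if file_size == 0:
        return 0
    units = -(-file_size // unit)   # ceiling division
    return units * unit


def wasted_fraction(file_sizes, unit):
    """Share of the allocated space that holds no file data."""
    used = sum(file_sizes)
    allocated = sum(allocated_size(size, unit) for size in file_sizes)
    return (allocated - used) / allocated if allocated else 0.0


small_files = [700] * 1000   # a thousand 700-byte files (made-up workload)
for unit in (512, 4096, 32768):
    print(unit, round(wasted_fraction(small_files, unit) * 100, 1), "% wasted")
```

For a single 700-byte file, a 512-byte fragment size means two fragments (1024 bytes) are allocated, while a 4K block wastes 3396 of its 4096 bytes. With larger files the difference shrinks quickly, which is why small fragments only pay off for file systems dominated by small files.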

The entire JFS volume space is subdivided into allocation groups. Each
allocation group contains i-nodes and data blocks. This enables the
file system to store i-nodes and their associated data in physical proximity
(HPFS uses a very similar technique). The allocation group size varies
from 8MB to 64MB and depends on fragment size and number of fragments it
contains.
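
As a rough illustration of how the allocation group size follows from these two numbers, consider the Python sketch below. The fragments-per-group values are assumptions chosen only so that the results land on the 8MB and 64MB ends of the range; the function names are hypothetical as well.

```python
def allocation_group_size(fragment_size, fragments_per_group):
    """Size of one allocation group in bytes."""
    return fragment_size * fragments_per_group


def allocation_group_of(fragment_number, fragments_per_group):
    """Index of the allocation group that holds a given fragment."""
    return fragment_number // fragments_per_group


MB = 1024 * 1024
print(allocation_group_size(4096, 2048) // MB, "MB")    # assumed 2048 fragments per group
print(allocation_group_size(4096, 16384) // MB, "MB")   # assumed 16384 fragments per group
print(allocation_group_of(5000, 2048))
```

Because the group index is a plain division, the file system can tell at once which group any fragment belongs to, and it can try to place a file's data in the same group as its i-node.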

Journaling

As the name of JFS implies, journaling is a very important feature of this
file system. It should be noted that journaling is actually independent
of JFS's structure described above. The journaling technique has its roots
in database systems and it is employed to ensure maximum consistency of
the file system, hence minimizing the risk of data loss - a very important
feature for servers, but even home/SOHO users hate to lose data.

JFS uses a special log device to implement a circular journal. On AIX,
several JFS volumes can share a single log device. I'm not sure this is possible
on OS/2. I believe each JFS volume (corresponding to a drive letter) has
its own 'inline' log located inside the JFS volume; its size is
selectable at FORMAT time.

It is important to note that JFS does not log (or journal) everything.
It only logs changes to file system meta-data. Simply speaking,
the log contains a record of changes to everything in the file system except
actual file data, i.e. changes to the superblock, i-nodes, directories and
allocation structures. It is clear that there must be some overhead here
and indeed, performance may suffer when applications are doing lots of
synchronous (uncached) I/O or creating and/or deleting many files in a short
amount of time. The performance loss is however not noticeable in most
cases and is well worth the increased reliability.

The log (or journal) occupies a dedicated area on disk and is written
to immediately when any meta-data change occurs. When the disk becomes
idle, the actual file system structure is updated according to the log.
After a crash, all it usually takes to restore the file system to full
consistency is replaying the log, i.e. performing the recorded transactions.
Of course, if a process was in the middle of writing a file when the system
crashed or power died, the contents of that file could be inconsistent (the
app might not be able to read it again), but you will not lose this file or
other files, as is often the case with other file systems.
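
The replay idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the JFS log format: the meta-data is a plain dictionary, the journal is a list, and the commit marker is borrowed from common database practice rather than taken from any JFS documentation. The names JournaledMetadata, log_transaction, log_partial and replay are invented for the example.

```python
class JournaledMetadata:
    """Toy model: meta-data is a dict, the journal is a list of records."""

    def __init__(self):
        self.journal = []   # stands in for the on-disk log; survives a crash
        self.on_disk = {}   # meta-data as stored in its home location

    def log_transaction(self, changes):
        """Record a group of meta-data changes, then mark it committed."""
        for key, value in changes.items():
            self.journal.append(("set", key, value))
        self.journal.append(("commit",))

    def log_partial(self, changes):
        """Simulate a crash that hits before the commit record is written."""
        for key, value in changes.items():
            self.journal.append(("set", key, value))

    def replay(self):
        """Apply every committed transaction; drop any uncommitted tail."""
        pending = []
        for record in self.journal:
            if record[0] == "set":
                pending.append(record[1:])
            else:   # commit: the transaction is complete, so apply it
                for key, value in pending:
                    self.on_disk[key] = value
                pending = []
        self.journal.clear()


fs = JournaledMetadata()
fs.log_transaction({"inode 7 size": 4096, "free blocks": 99})
fs.log_partial({"inode 8 size": 512})   # interrupted by the "crash"
fs.replay()
print(fs.on_disk)
```

After the replay only the complete transaction has been applied and the half-written one is discarded, so the meta-data is consistent again. Note that nothing in the model records file contents, which matches the point above that a file being written during the crash may still be damaged.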

OS/2 considerations

The above was mostly a generic description of LVM and JFS and applies to
both AIX and OS/2 and perhaps even to Linux (at least the JFS part). Now
I will discuss how exactly LVM/JFS differ from the solutions previously
available on OS/2.

LVM

From the user's point of view LVM replaces FDISK. On WSeB, FDISK is no longer
available. In fact, if you try to run FDISK, you get the following message:

FDISK.COM has been replaced by LVM.EXE and FDISKPM.EXE has been
replaced by LVMGUI.CMD.  Please use one of these utilities.

It should be noted here that LVMGUI is a GUI app (as the name
implies) and requires Java, while LVM is a VIO app and can be
run from a command line boot. It looks and feels similar to FDISK,
but it presents two views: logical and physical. FDISK didn't differentiate
between the two. These views correspond to the concepts
described at the beginning of this article. Basically the physical view
shows physical disks and lets users manage partitions while the logical view
presents volumes. One important concept must be introduced here, and that
is the compatibility volume. A compatibility volume corresponds
to an old FDISK partition. During WSeB installation, the installer automatically
converts all existing partitions to compatibility volumes. This conversion
technically means that the installer writes a special block of LVM data
to the sector following the partition table. OSes other than WSeB won't
see any difference at all. It is however necessary to manage all partitions/volumes
exclusively with LVM after this conversion.

All FAT, HPFS, FAT32 etc. partitions can reside on either compatibility
or LVM volumes; however, other OSes will only be able to access them on
compatibility volumes. JFS on the other hand must be created on LVM
volumes. Those were already described above and enjoy all the flexibility
of LVM, such as spanning multiple physical disks or online expansion.

Each volume, compatibility or LVM, represents a single drive letter
on an OS/2 system. LVM however is significantly more flexible than FDISK
because the drive letters are not assigned by a fixed algorithm. Instead,
users can assign arbitrary drive letters to volumes. The drive letters
can even be changed at runtime, but users have to understand the dangers
before doing that. If you reassign the drive letter of the boot volume,
it doesn't require a genius to understand that a system crash will be the
most likely result.

JFS

OS/2 users often ask what exactly the difference is between the various
file systems available on OS/2. The following table, taken almost verbatim
from WSeB's Quick Beginnings book, summarizes the most important differences
between the file systems available for WSeB from IBM.

Characteristic | Journaled File System (JFS) | 386 High Performance File System (386HPFS) | High Performance File System (HPFS) | FAT File System
--- | --- | --- | --- | ---
Max volume size | 2TB (terabytes) | 64GB (gigabytes) | 64GB (gigabytes) | 2GB (gigabytes)
Max file size | 2TB (terabytes) | 2GB (gigabytes) | 2GB (gigabytes) | 2GB (gigabytes)
Allows spaces and periods in file names | Yes | Yes | Yes | No (8.3 format)
Standard directory and file attributes | Within file system | Within file system | Within file system | Within file system
Extended Attributes (64KB text or binary data with keywords) | Within file system | Within file system | Within file system | In separate file
Max path length | 260 characters 1) | 260 characters | 260 characters | 64 characters
Bootable | No 2) | Yes | Yes | Yes
Allows dynamic volume expansion | Yes | No | No | No
Scales with SMP | Yes | No | No | No
Local security support | No | Yes | No | No
Average wasted space per file | 256 to 2048 bytes | 256 bytes | 256 bytes | 1/2 cluster (1KB to 16KB)
Allocation information for files | Near each file in its i-node | Near each file in its FNODE | Near each file in its FNODE | Centralized near volume beginning
Directory structure | Sorted B+tree | Sorted B-tree | Sorted B-tree | Unsorted linear, must be searched exhaustively
Directory location | Close to files it contains | Near seek center of volume | Near seek center of volume | Root directory at beginning of volume; others scattered
Write-behind (lazy write) | Optional | Optional | Optional | Optional
Maximum cache size | Physical memory available | Physical memory available | 2MB | 14MB
Caching program | None (parameters set in CONFIG.SYS) | CACHE386.EXE | CACHE.EXE | None (parameters set in CONFIG.SYS)
LAN Server access control lists | Within file system | Within file system | In separate file (NET.ACC) | In separate file

1) JFS stores file and directory names in Unicode. This allows
JFS to always maintain proper sort order, regardless of active codepage.

2) This is not a permanent limitation. It is just that no one has written
a JFS micro- and mini-IFS yet.

It might interest some users that JFS also seems to have built-in
support for DASD limits, although I have never tried to use this feature.
DASD limits, aka the Directory Limits feature of LAN Server, allow
administrators to control how much space a directory can take, effectively
enabling them to limit the disk space usage of individual users. Previously
this feature only worked on HPFS386 volumes. Obviously this is of no use to
home users who have all their disk space to themselves, but it can be very
useful for system administrators.

JFS Utilities

WSeB comes with several new JFS-specific utilities, in addition to the
usual ones like CHKDSK and FORMAT. I'll only give a quick
overview of them here; the important ones are documented in the Command
Reference.

  • DEFRAGFS - can be used to defragment and reorganize a JFS volume.
    It is similar in spirit to equivalent FAT or HPFS utilities. It should
    be noted that just like HPFS, JFS tries not to fragment files. However
    especially on nearly full volumes, this is not always possible. In addition
    to defragmenting files, DEFRAGFS will try to rearrange internal
    JFS structures by placing certain pieces of data physically close to each
    other to speed up disk access. DEFRAGFS is designed to be run
    in the background.
  • EXTENDFS - after enlarging an LVM volume, this utility must be
    used to tell the JFS file system that it should take up all the extra space
    now available.
  • CACHEJFS - not documented in Command Reference, this utility can
    be used to query the settings of the JFS cache and set its lazy writer
    parameters.
  • CHKLGJFS - again undocumented. This is a diagnostic tool that
    shows a formatted log of the last (or the one before last) CHKDSK run.
    Not very useful to normal users.

In addition to the above utilities that are supplied with WSeB, I also
managed to build several extra utilities from the OpenJFS sources thanks
to invaluable help from several friends. Those are not available publicly
in binary form to my knowledge, though I could probably e-mail them to
interested readers - but beware, these are for experts only and not guaranteed
to work!

  • LOGDUMP - as the name suggests, this tool dumps formatted contents
    of the current JFS log (journal) to a file.
  • CSTATS - lists current statistics of the JFS cache.
  • XPEEK - perhaps the most useful of the bunch, this one is the
    closest thing to a JFS disk editor I've seen. This utility lets users dump
    and optionally modify various internal JFS structures. It has a very crude
    interface but it worked for me. Needless to say, this utility is extremely
    dangerous and you can easily destroy your data if you don't know exactly
    what you're doing.

Conclusion

I have deliberately skipped some of the more advanced and less widely used
LVM/JFS concepts. Interested readers will find more in the books and files
I listed in the reference section. I hope I managed
to present the features and benefits of LVM and JFS in a clear and concise
manner. I believe these two pieces of software have brought new levels
of flexibility, manageability and reliability to WSeB users and will shortly
bring them to all eCS users. Don't be afraid of them!

Parting note: Everything said here about WSeB will equally apply to
eCS.

Glossary of Terms:

  • Partition - a portion of physical hard disk
    space. A hard disk may contain one or more partitions. Partitions are defined
    by PC BIOS and described by partition tables stored on the hard drive. Every
    PC OS understands partitions.
  • Volume - a logical concept which hides the
    physical organization of storage space. A compatibility volume directly
    corresponds to a partition while an LVM volume may span more than one partition
    on one or more physical disks. A volume is seen by users as a single drive
    letter. Only WSeB and eCS understand LVM volumes.
  • DASD - Direct Access Storage Device. A term often
    used by IBM instead of 'hard disk' to confuse mere mortals.