Tips for Creating CD-ROMs to Distribute Web-Oriented Material
Maintained by David E. Bernholdt, <bernhold@home.com>
Last update $Date: 2001/06/21 16:16:22 $
Disclaimer: This information comes from a combination of
personal experience, research (mostly using the web), and discussions
with others. Despite my best efforts, information here may be
incomplete or inaccurate. Please use it with caution.
Introduction
These guidelines are intended to help people successfully produce
CD-ROMs as a means to distribute primarily web-oriented material. Some
issues arise because of the use of CD-ROMs, others arise because the
content is web-oriented, or from a combination of both factors. These
recommendations can be applied at any stage of the development of the
content, but it will become clear that in many cases, they are most
easily applied from the very beginning.
These guidelines are the result of our experience creating the first
edition of the "Overview of Computational Science: HPCC Technology
and Applications" CD-ROM for the CEWES MSRC PET Program. This CD-ROM
contains more than 300 MB of educational material, standards documents,
technical reports, and two online books.
These guidelines were developed in a particular software/hardware
environment, and it is possible they are influenced by our particular
environment (in which case we would be very interested to hear about
other experience). The web material, originally produced on a variety
of platforms (but probably mostly on unix or Windows machines) was
collected on a unix filesystem. An HTTP server was set up to point to
the filesystem for convenience during development and testing. To
burn discs, files were transferred to a Windows NT filesystem, and
burners were used on machines running either Windows NT or 95. We
used Adaptec DirectCD and Corel (now Adaptec) CD Creator to burn
discs. Actually, after some initial
experiments with DirectCD, we did most burning with CD
Creator. It may be that we didn't delve deep enough, but CD
Creator seemed to offer more control over the type of filesystem
being written. (DirectCD seemed to be ISO9660 Level 3 only.)
It also seemed to us that CD Creator was significantly faster
in actually marshalling and burning files.
I hope you will find this information useful. I would appreciate
feedback on this material, as well as help filling in some of the
holes that obviously remain.
Filesystem and Cross-Platform Issues
There are a number of things to watch out for which stem from
distinctive characteristics of the CD-ROM filesystems available, or
the need for the CD-ROM to work correctly in a multi-platform
environment. Though some observations below may seem redundant
(e.g. "Avoid Mixed-Case Filenames" and "Use Only
Lowercase Filenames"), they are presented separately because they
arise from different sources which may not be relevant to all
situations. For more information on the various CD-ROM filesystems,
please see the appendix "A Primer on CD-ROM
Filesystems".
- Use Only ISO 9660 Level 1 or ISO 9660 Level 2 Filesystems For Portability
- These filesystems should work on (nearly) all platforms. ISO 9660 Level
2, which may be referred to simply as "long filenames" by the
CD burner software, will not work with MS-DOS or Windows 3.11. For
more information, see the appendix "A Primer on CD-ROM Filesystems".
- Limit Length and Character Set of File Names
- For ISO 9660 Level 1 filesystems, limit filenames to 8.3 format,
from the character set [A-Z0-9_].
- For ISO 9660 Level 2 filesystems, use no more than 30 character
filenames from the character set []. When long filenames are selected in
Corel (now Adaptec) CD Creator, it warns that filenames must still be unique
at the 8.3 level or discs/files might not be readable on some Windows 95
systems. We have checked this on a number of Windows 95 machines of varying
ages and found that in practice, it does not appear to be a problem.
- In general use only one period (".") in a filename.
The CD burner software we have worked with seems to truncate longer filenames,
and it is possible for long filenames to map on top of each other.
- Avoid Mixed-Case Filenames
- Use of mixed-case filenames in source material increases the chances
that multiple filenames will map on top of each other on the CD-ROM. The
easiest way to avoid this problem is to use only a single case for filenames
from the beginning.
- Use Only Lowercase Filenames in URLs
- This requirement is driven by the various operating systems' interpretations
of CD-ROM file names. Since unix is case sensitive and generally maps CD-ROM filenames
to lowercase, this is the way to go for portability. Macintosh and Microsoft
systems treat CD-ROM filenames in a case insensitive fashion, so this works
across the board. Of course if you are developing the CD-ROM material on
a unix system, this means you should also make the filenames themselves
entirely lowercase.
- Use Only Relative URLs
- Each operating system has a different way of referring to the root
of the CD-ROM filesystem. Microsoft systems may refer to D:\ or
some other drive letter, while Macintosh uses the CD-ROM's volume name (set
when the disc is burned) at the head of the path (or Untitled if
the disc has no volume name). On unix systems, the system administrator
typically sets the mount point, but /CDROM is one common example.
Consequently, it is impossible to reproduce the server-oriented approach
of referring to the server's document root, as in /icons/pic.gif.
Instead, all local URLs must be relative to the current location, such
as ../../icons/pic.gif.
- Avoid "//" in Path Part of URLs
- Two slashes ("//") instead of one ("/") in the
resource path part of a URL is a common error in web content -- especially
when URLs are constructed or processed mechanically. In general, Windows
and unix platforms will treat a double slash as a single slash; however,
this is not the case with the Macintosh. Errors of this type should be
detected by careful use of the grep command, or by running a link check
on a Macintosh (however, see general cautions below on the use of
link checkers).
- Use only "/" (vs. "\" or ":") in
URLs
- Different operating systems use different symbols to separate the
terms in a file path (unix: "/", MS-DOS and Windows:
"\", Macintosh: ":"). RFC 1630 designates the
forward slash, "/", as the correct symbol to use in all
URLs. It is possible that some browser implementations on some
platforms will accept certain other separators, but this cannot be
relied upon. The Netscape browser under unix, to cite a specific
example, will recognize only the forward slash as a separator.
- Line Termination in Text Files Does Not Seem to Matter
- Operating systems terminate lines variously with CR, LF, or
CR-LF. In our experience, browsers seem to handle all cases
regardless of platform. Developer convenience may, of course, be
another matter.
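The filename rules above lend themselves to a mechanical check before
burning. What follows is only a minimal sketch in Python, not a
description of what any CD burner software actually does. The function
names are hypothetical, and it assumes that the burner folds names to a
single case and, where it truncates, simply cuts each name off after a
fixed number of characters.

```python
import re
from collections import defaultdict

# ISO 9660 Level 1 name: up to 8 characters, optionally followed by a
# period and up to 3 more, all drawn from [A-Z0-9_].
LEVEL1_NAME = re.compile(r"[A-Z0-9_]{1,8}(\.[A-Z0-9_]{1,3})?")


def is_level1_name(name):
    """Return True if `name` fits the 8.3 pattern once uppercased.

    Assumption: lowercase source names are folded to uppercase when the
    disc is written, so case alone does not disqualify a name.
    """
    return LEVEL1_NAME.fullmatch(name.upper()) is not None


def find_collisions(names, max_length=None):
    """Group names that become identical after case folding.

    If `max_length` is given, names are also cut to that many characters
    first. This is a guess at how truncation might behave, not a
    documented rule.
    """
    groups = defaultdict(list)
    for name in names:
        key = name.lower()
        if max_length is not None:
            key = key[:max_length]
        groups[key].append(name)
    # Only keys shared by two or more names are collisions.
    return {key: found for key, found in groups.items() if len(found) > 1}
```

To cover a whole tree, the names in each directory (for example, as
yielded by os.walk) would be passed to find_collisions one directory at
a time, since names only collide within the same directory.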
Web Browser-Related Issues
Using a web browser to access material from a filesystem is not always
the same as accessing it through an HTTP server.
- Make index.html Explicit
- HTTP servers typically append index.html to URLs which end in
a directory rather than a file (i.e. http://host/directory or http://host/directory/).
When used on a filesystem, browsers will instead produce a directory listing,
thus exposing the user to all of the files in the directory, and losing
the desired link. Consequently, all URLs which refer to the CD-ROM should
end with an explicit file name, with index.html being the usual
default. Note that this problem is hard to detect with a link checker or
with basic tools like grep.
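Most of the URL rules in this section and the previous one can be
checked mechanically, at least for links that appear in ordinary HTML
attributes. The following Python sketch is one possible approach, under
stated assumptions: the names check_url, check_html, and LinkCollector
are hypothetical, only the href, src, and background attributes are
examined, and a final path component without a period is taken to be a
directory (a heuristic that will misjudge files with no extension).
URLs built inside JavaScript are not seen at all.

```python
import re
from html.parser import HTMLParser

# Links with one of these schemes point off the disc and are skipped.
EXTERNAL = re.compile(r"^(https?|ftp|mailto|news|javascript):", re.IGNORECASE)
URL_ATTRS = {"href", "src", "background"}


def check_url(url):
    """Return a list of problems with `url` as a local CD-ROM link."""
    if EXTERNAL.match(url):
        return []
    # Drop any fragment ("#...") and query ("?...") before checking.
    path = url.split("#", 1)[0].split("?", 1)[0]
    if not path:
        return []  # fragment-only link such as "#top"
    problems = []
    if path.startswith("/"):
        problems.append("not relative")
    if "//" in path:
        problems.append("double slash")
    if "\\" in path or ":" in path:
        problems.append("separator other than /")
    if path != path.lower():
        problems.append("uppercase characters")
    last = path.rsplit("/", 1)[-1]
    if last in ("", ".", "..") or "." not in last:
        problems.append("no explicit file name")
    return problems


class LinkCollector(HTMLParser):
    """Collect the values of URL-bearing attributes from start tags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in URL_ATTRS and value:
                self.urls.append(value)


def check_html(text):
    """Map each problematic URL found in `text` to its list of problems."""
    collector = LinkCollector()
    collector.feed(text)
    report = {}
    for url in collector.urls:
        problems = check_url(url)
        if problems:
            report[url] = problems
    return report
```

Run over every HTML file in the tree, a report like this complements a
conventional link checker rather than replacing it: it does not verify
that the target of a link actually exists.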
Content Issues
Some of these observations apply to network distribution as well as
CD-ROM distribution, but others are unique to the fact that you're using
a filesystem.
- JavaScript
- Most link checkers and other tools do not deal with JavaScript, which
may contain URLs, so it is easy to miss problems which may crop up in placing
JavaScript-containing material on a CD-ROM. Separate JavaScript source
files may also present problems, but we do not have any direct
experience yet with these.
- Java
- We have not dealt with Java so far, but it seems the
convention of mixed-case naming of classes is likely to be
problematic. One contributor recommends the use of jar files:
"I have a document http://www.npac.syr.edu/projects/k12javaspring98/exercises/graphics.html
which loads numerous applets onto a single page. This page has always
taken a long time to load. As an experiment, I packaged all of the
.class files together in a single .jar file with the command jar
cvf graphics.jar *.class and added the attribute
ARCHIVE="graphics.jar" to each <APPLET> tag in the HTML
document. The results were mildly surprising: the page loaded
noticeably faster and, of course, you're happier since the file
graphics.jar now meets your (stringent) naming standards."
Java is also likely to share problems with JavaScript in terms of
passing URLs -- link checkers and other software probably won't catch
them! Further input on handling Java would be very welcome, as its
importance is sure to increase.
- Graphics File Formats
- GIF and JPEG formats seem to be widely implemented. XBM files were
also read by both Netscape Communicator 4 and MS Internet Explorer 4.
- Use PDF Rather Than PostScript
- Relatively few PC users have access to PostScript printers or
on-line viewers. PDF files can be viewed online and printed using
Adobe's freely available Acrobat Reader software, which is available
for Mac, PC, and many unix platforms. PDF files are also generally a
good deal smaller than their PostScript counterparts. Existing
PostScript files can be "distilled" into PDF format by the
Distiller component of the complete Adobe Acrobat package, or
generated directly from most applications through special
"printer" drivers (on Mac and Windows platforms). The
complete Acrobat package is not free, but is (at this writing)
well under $200 street price, and much lower than that with the
academic discount. The complete Acrobat package is available for Mac,
Windows, and several unix platforms. There is also a web service that
converts many file formats, including PS to PDF, at http://tom.cs.cmu.edu/intro.html.
- Use RealAudio/RealVideo/RealPlayer for Audio and Video
- There are a variety of ways to present audio and video content on the
web. Portability of some formats (e.g. WAV audio players and QuickTime video
players appear to be available only for Mac and Windows platforms) is
a concern, but a substantial portion of this particular issue is simply
the need to decide on one format for all content-contributors on a project
to use.
Tools Issues
These are things we found useful, or "gotchas" we discovered.
- Most Link Checkers Aren't Designed for Filesystems
- Even if a link checker operates on a filesystem rather than actually
accessing the HTTP server (as most seem to, for speed), it generally
will not check for the above problems. On the other hand, link checkers will catch
a lot of basic problems and should definitely be used. Just be aware of
their limitations.
- Beware of Transferring tar Files
- Our development work took place under unix, and the CD burner we used
was on a Windows 95 system. We tried to transfer the entire tree by tarring
it up, FTPing it to the PC (using binary mode), and using WinZip to untar
it. Unfortunately, this seems to have corrupted some PDF files and images.
Our guess is that WinZip was trying to be helpful by converting end-of-line
characters from unix to PC norms, but this is not necessarily the right
thing to do for all files.
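Whatever transfer method is used, one way to catch this kind of silent
corruption is to compare checksums of the tree before and after the
transfer. The Python sketch below is an illustration only, not something
we used at the time. The function names are hypothetical, and it assumes
Python is available on both the unix and the Windows side.

```python
import hashlib
import os


def checksum_manifest(root):
    """Map each file path under `root` to the SHA-256 of its contents.

    Paths are stored relative to `root` with "/" separators so that
    manifests made on unix and on Windows can be compared directly.
    """
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            relative = os.path.relpath(full, root).replace(os.sep, "/")
            # Binary mode: the bytes are read with no end-of-line translation.
            with open(full, "rb") as handle:
                manifest[relative] = hashlib.sha256(handle.read()).hexdigest()
    return manifest


def changed_files(before, after):
    """Return the sorted paths that are missing or differ between manifests."""
    return sorted(
        path
        for path in before.keys() | after.keys()
        if before.get(path) != after.get(path)
    )
```

The manifest from the source tree would be saved (for example, as a text
file carried along with the archive) and compared with one computed on
the unpacked copy. Any path reported by changed_files is a file whose
bytes were altered or lost on the way. Each file is read into memory
whole, which is simple but may be slow for very large files.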
Useful Resources
Acknowledgements
This document was compiled with help from John Eberle, Deepak
Ramanathan, and Tom Scavo.
Appendix: A Primer on CD-ROM Filesystems
- ISO 9660 Level 1
- In practice, nearly all CD-ROMs produced use the ISO 9660 standard
filesystem (derived from the earlier High Sierra format). CD-ROMs produced to this standard
are readable on nearly all modern computers. In order to achieve this portability,
the ISO 9660 filesystem is designed for the lowest common denominator system,
which (at the time of the standard) was MS-DOS. As a result, the basic
ISO 9660 filesystem allows names in "8.3" format (8 character
filename, ".", 3 character extension) with the characters [A-Z0-9_].
Note that only uppercase letters are allowed, and only one period.
Also, only 8 levels of directories are allowed (how the levels are counted remains to be explained).
CD-ROMs written in this format are reported to be readable by all
"interesting" platforms: MS-DOS, Windows3.11, Windows95,
WindowsNT, Macintosh, and unix.
Not surprisingly, given the restrictive nature of the basic ISO 9660
filesystem (ISO 9660 Level 1, though it seems rarely to be referred to
in that way), a number of extensions have been developed. Also not surprisingly,
the extensions do not seem to offer the same breadth of implementation
as ISO 9660 Level 1.
- ISO 9660 Level 2
- ISO 9660 Level 2 offers longer filenames (32 characters total,
but two are taken up by the file version number (see below)), and more
freedom in the character set ([A-Z][0-9]_- ???). Only one
period may appear in a filename. directory depth??? Empirical
evidence indicates ISO 9660 Level 2 filesystems can be read by Windows95,
WindowsNT, Macintosh, and unix platforms. On MS-DOS and Windows3.11 systems,
we expect that an ISO 9660 Level 2 disc should either appear with 8.3-style
names, or be unreadable.
- ISO 9660 Level 3
- There is an ISO 9660 Level 3, but as far as I can tell, it is
primarily about how the CD-R is written ("packetizing"), and
the filename and directory depth limitations are the same as Level 2. It
is not clear, however, if Level 3 discs can be read on "all"
platforms.
- Rockridge Extensions
- The Rockridge Extensions to ISO 9660 were created to allow unix platforms
to capture file permissions and longer POSIX-style filenames. Directory
depth is ???. The Rockridge extensions are widely implemented
on unix platforms, but not elsewhere.
- Joliet Filesystem
- The Joliet filesystem is a Microsoft extension which allows
???, and of course is only implemented on recent Windows
platforms, though there is a Linux kernel patch that apparently supports
the Joliet filesystem. Windows3.11 and MS-DOS systems supposedly see the
truncated 8.3 name of the same form as if they were reading a Windows95
filesystem (what is proper name?)
- Hierarchical File System
- HFS is the Macintosh filesystem, and can be written on CD-ROM as
well. It allows 31 character filenames, mixed case, and a larger character
set.
Mixing Multiple Standards/Extensions
From my reading HFS can be combined with ISO 9660-based
filesystems by writing a "hybrid" disc, with separate partitions
(tracks???) for each. Obviously this cuts the space available
roughly in half.
It also appears that Rockridge and Joliet extensions can
be combined on the same disc, in this case without a price in storage capacity.
From my reading, this will not be useful on WindowsNT 3.51, nor
Windows 3.11 or MS-DOS (of course). It is not clear whether Macintosh systems
widely support either of these two extensions.
Operating System Treatment of CD-ROM Filesystems
One must also consider the fact that different operating systems
may treat CD-ROM filesystems differently. All current Microsoft
operating systems (MS-DOS, Windows3.11, Windows95, WindowsNT), as
well as MacOS are case insensitive. In other words the case of
the filename or any reference to it (e.g. in an operating system
command or HTML document) is irrelevant. By contrast, unix
filesystems are case sensitive, and most unix implementations
of the ISO 9660 filesystem either map all filenames to
lowercase, or offer some control over the mapping at the system
level (root access typically required to change).
File Versioning
The ISO 9660 filesystem provides for (really, requires) a version number on every
filename. These are typically represented as
";1" (or another integer) appended to the filename (in the style
of VMS). Most ISO 9660 implementations on interesting platforms appear
to more or less ignore the file version information. Microsoft systems
don't display it at all; Macintoshes display it in directory listings,
but otherwise seem to ignore it; unix systems either ignore it or offer
some control over the behavior. It appears that from the user (or CD-ROM
producer) point of view, ISO 9660 file versions are irrelevant and can
be safely ignored. Note, however, that in ISO 9660 Level 2, two of the
32 characters allowed for the filename are reserved for the version identifier,
but the user still need not worry about the version number.
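For anyone writing tools that see raw ISO 9660 names (for instance, from
a Macintosh directory listing), the version suffix is easy to separate
from the name. A minimal Python sketch follows. The function name
split_version is hypothetical, and it assumes the suffix always has the
";integer" form described above.

```python
def split_version(raw_name):
    """Split a name such as "README.TXT;1" into (name, version).

    The version is None when the name carries no ";<integer>" suffix.
    """
    name, separator, version = raw_name.rpartition(";")
    if separator and version.isdecimal():
        return name, int(version)
    return raw_name, None
```

Names without a recognizable suffix are returned unchanged, so the
function can be applied to every name in a listing regardless of
whether the platform displays version numbers.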