Copyright © David Ness, 8 November 2001
[Draft #3, 2003/01/12 1534 UTC]
11/8/2001 3:34 AM
The purpose of this note is to begin discussion of some of the problems associated with presenting information in several different visual forms. The note introduces a concept called Super-Literate Programming, an extension of the Literate Programming ideas advanced by Knuth and many others.
The problem considered here has both specific and general characteristics. The general aspects of the problem relate to any programming language; in addition, some characteristics particular to J present special problems of their own.
There are also at least two distinct problem domains. The first is a rather broad category of problems that people use J to discuss. J is, for example, used to develop course material in several different problem areas that could be called computational mathematics. In these domains we are probably most concerned with the way that J output is displayed.
The second is the problem of presenting J code itself, for both expository and documentation purposes. Here we are probably less concerned with how J output is presented and more concerned with how J syntax is presented.
Increasingly, these days, we want to present information not only on paper, but also on a computer screen. The problem is that this is a more complicated proposition than it might appear to be at first glance.
The complications have to do with two principal causes:
Each of these imposes some burden on any design that tries to let us accomplish many different objectives without a lot of extra work.
It may not be conventional to think of paper and screens in terms of bandwidth, but it is instructive. A typical page of typescript is about 100 square inches, and each square inch, printed on a medium-grade laser printer, will contain about 100,000 bits, assuming that we are only concerned with black-and-white images (no gray scale). A full page therefore holds about 10,000,000 pixels.
By contrast, a typical computer screen is about the same size but has only about 1/4 as many pixels in each dimension, and so only about 1/16 as many pixels in total. In addition, there are contexts where very small computer screens are used to display information. Palm devices, for example, have screens of only 160x160 pixels (about 25,000 pixels, or 1/4 of 1% of a full printed page).
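The arithmetic behind these comparisons can be checked in a few lines. This is a sketch in Python, chosen only for illustration, using the round figures given above; the 1/4-per-dimension screen is the text's estimate, not a measurement of any particular display.

```python
# Rough display "bandwidth" comparison using the round figures in the text.
# Assumptions: a page of about 100 square inches and about 100,000
# black-and-white pixels per square inch (roughly a 300 dpi printer,
# since 300 x 300 = 90,000).

PAGE_AREA_SQ_IN = 100
BITS_PER_SQ_IN = 100_000

page_pixels = PAGE_AREA_SQ_IN * BITS_PER_SQ_IN  # 10,000,000

# A quarter of the page's resolution in each dimension is 1/16 of the pixels.
screen_pixels = page_pixels // (4 * 4)  # 625,000

# A Palm screen, 160 x 160 pixels.
palm_pixels = 160 * 160  # 25,600


def percent_of_page(pixels: int) -> float:
    """Express a pixel count as a percentage of the full printed page."""
    return 100 * pixels / page_pixels


for label, pixels in [("page", page_pixels),
                      ("screen", screen_pixels),
                      ("palm", palm_pixels)]:
    print(f"{label:>6}: {pixels:>10,} pixels ({percent_of_page(pixels):.3f}% of a page)")
```

With these figures a screen holds about 6% of a page, and a Palm screen about a quarter of one percent.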
Differences of this magnitude imply more than just a change in degree. They actually imply a change in kind. And yet we would like to be able to have our information presented in many different ways without having to make a special adaptation for each different presentation medium.
Tutorials are sometimes best presented in action. And computers afford us the opportunity to manage this interaction in an effective and instructive way. However, even in circumstances where we have access to interactive tutorial technology, we may still want to be able to make some static presentations. And we would like to be able to construct paper or static screen representations without having to do a massive amount of hand adaptation.
Documents of any type may be accessed in other than front-to-back order. While such random (or, perhaps better, non-sequential) access may be rare in novels and other prose, it is not uncommon in tutorial circumstances.
While the potential for non-sequential access may influence how we expose a particular set of issues in a document, it becomes a particularly complex issue when executable code is involved, since a code fragment may depend on definitions made in fragments the reader has skipped.
The computational milieu also affords us the opportunity to automate processes. This is something that conventional printed documents don't allow. However, worrying about this would introduce so much complexity at this stage that this opportunity will be neglected for the purposes of this document.
Structured documents could also be used to generate other kinds of visual presentations. Slide presentations come immediately to mind, but there are probably other kinds as well. In this early stage of consideration we will not explicitly consider this kind of presentation, but we should keep this, and other possible alternative forms of display, in mind as we structure the approach.
J presents some special opportunities and special problems. J is typographically an easy language to typeset, since it uses only ASCII characters; this is perhaps a response to having had so much trouble with the typesetting of APL, the predecessor of many of the ideas in J.
However, J's use of boxed output presents some challenges. That this is a problem area for J may be signalled by the fact that J has two modes of execution as far as output is concerned, called ASCII and Linedraw. In ASCII mode the boxes that are sometimes needed in output displays are built out of conventional ASCII characters (hyphens, vertical bars, plus signs, etc.). In Linedraw mode the box-drawing characters, which have been part of the computer milieu since the early days of PCs, are used to draw much cleaner-looking boxes.
The tradeoff is straightforward. I can't imagine anyone would prefer the look and feel of the ASCII characters, but they exist in virtually every computational environment and are supported by all operating systems. Thus if ASCII characters are chosen, displays are guaranteed to be adequate, from an appearance standpoint, so long as the font chosen is monospaced.
On the other side, the linedraw characters are very attractive, but they don't exist in many fonts, and the output is particularly bizarre looking if a font that has other characters in the key positions happens to be called into play. In many fonts these characters are accented vowels, and the output is very odd looking indeed when they appear instead of the line characters.
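The difference between the two modes, and the failure just described, can be sketched in Python. The box-drawing sets below are illustrative assumptions rather than J's actual internal tables, and the last step assumes the linedraw characters were stored in their code page 437 (original PC) byte positions but displayed with a Latin-1 font.

```python
# Draw a one-row "boxed" display of some items, in the spirit of J's boxed
# output. These character sets are illustrative, not J's exact tables.
ASCII_CHARS = {"tl": "+", "tm": "+", "tr": "+", "bl": "+", "bm": "+", "br": "+",
               "h": "-", "v": "|"}
LINEDRAW_CHARS = {"tl": "┌", "tm": "┬", "tr": "┐", "bl": "└", "bm": "┴", "br": "┘",
                  "h": "─", "v": "│"}


def box_row(items, chars):
    """Return the top, middle and bottom lines of a single row of boxes."""
    rules = [chars["h"] * len(s) for s in items]
    top = chars["tl"] + chars["tm"].join(rules) + chars["tr"]
    mid = chars["v"] + chars["v"].join(items) + chars["v"]
    bot = chars["bl"] + chars["bm"].join(rules) + chars["br"]
    return [top, mid, bot]


items = ["abc", "1 2 3", "x"]
for line in box_row(items, ASCII_CHARS) + box_row(items, LINEDRAW_CHARS):
    print(line)

# The failure mode: the box characters are stored as single bytes in their
# code page 437 positions, but a Latin-1 font draws different glyphs there.
garbled = [line.encode("cp437").decode("latin-1")
           for line in box_row(items, LINEDRAW_CHARS)]
for line in garbled:
    print(line)
```

The ASCII rows read `+---+-----+-+` and `|abc|1 2 3|x|`; in the garbled rendering the corners and rules turn into characters such as Ú, Ä, Â and À.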
Presenting the syntax of J code should not be a particularly complicated problem, as J is a relatively simple language. One beneficial side effect of using literate programming tools is that it is easy to regularize the presentation of code fragments, as the parsing rules enforced in the code presentation process can be made to conform to a standard.
It is perhaps worth discussing the look of J a bit, in and of itself. J, it must be said, is quite ugly. This doesn't necessarily make it unlovable, as many quite ugly things are loved by someone. But any language that has, as a principal linguistic characteristic, the occurrence of an unbalanced right parenthesis in column 1 of source text can hardly be regarded as beautiful.
I would rank J as being 'about as ugly as TeX', Knuth's language for describing how to (carefully) set type. The substantial ugliness of TeX is perhaps more noteworthy than that of J because TeX's problem domain is quite explicitly the production of beautiful documents, and this makes the disregard for the attractiveness of the input even more jarring. But while the algorithms presented in J code may well be regarded as beautiful by some, it would take a rather perverse sense of beauty to argue that the physical manifestation of the algorithms themselves is pretty, although in many cases it will likely be quite terse.
J also has a tutorial mechanism. This facility is integrated into the IDE and allows easy execution of fragments of J code. However, its facilities for text display are quite limited, and very simplistic in comparison with those available in some other languages.
Some of the problems of information display, particularly in contexts associated with computer programming, have been treated under the name Literate Programming following the lead of Donald Knuth.
Knuth invented Literate Programming to deal with the presentation of his code magnum opus, TeX, a system designed to help specify the typesetting of mathematics.
TeX is a very complex program, and since Knuth is interested not only in mathematics and typesetting, but computer programming as well, he was concerned with describing aspects of his complex program in a way that would allow them to be used not only for their primary purpose, but also in a 'tutorial' role as exemplary computer programs. And they are exemplary programs indeed.
In order to solve this problem, Knuth invented a particular programming style that has been successfully applied in several different circumstances. It has also been taken up, and taught, in a number of different places.
Literate Programming is a 'style' of recording both computer code and its documentation in one single document. The fundamental construct of literate programming is the 'paragraph', which consists of one or more paragraphs of descriptive text followed by a block of code. This is a slight oversimplification of the actual situation, but not in any way that is material to the discussion here.
Central to the notion of literate programming is breaking down the strict ordering that computer code normally imposes on a document. Most code must be presented to a compiler in some fairly carefully managed order. That order may happen to be a reasonable one for expository purposes, but it need not be. Knuth built his Web out of blocks that can be ordered for expository purposes but nevertheless processed into the order appropriate for execution. (Now that the word 'Web' has become common because of the Internet, Knuth's use of the term, which well predated the Internet use, is often confused with it; the two have essentially nothing to do with one another.)
Knuth calls the processor that produces executable code from a Web Tangle. This processor removes the expository sections of a Web and, assuming of course that the Web contains good code, assembles the code into a legal program. This program can then be handed to the appropriate compiler to produce an executable computer program.
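Tangle can be sketched in a few lines of Python, under simplifying assumptions: chunks are (name, code) pairs kept in expository order, a line of the form <<name>> (noweb's notation, not Knuth's own WEB syntax) refers to another chunk, and the root chunk is called '*'. Tangling expands references recursively until only code remains, in the order the program needs.

```python
import re

# A web: (chunk name, code) pairs in *expository* order, which need not be
# the order the program must have. The <<name>> syntax is noweb-style and
# used here only for illustration.
web = [
    ("*", "def main():\n    <<read the input>>\n    <<report the result>>"),
    ("report the result", "print(total)"),
    ("read the input", "total = sum(range(10))"),
]

# A line consisting only of a chunk reference; group 1 keeps its indentation.
REF = re.compile(r"^(\s*)<<(.+?)>>\s*$")


def tangle(chunks, root="*"):
    """Expand chunk references recursively, giving code in execution order."""
    table = {}
    for name, code in chunks:
        # Chunks that share a name are concatenated in the order they appear.
        table[name] = table.get(name, "") + code.rstrip("\n") + "\n"

    def expand(name, indent, active):
        if name in active:
            raise ValueError(f"chunk {name!r} refers to itself")
        lines = []
        for line in table[name].splitlines():
            match = REF.match(line)
            if match:
                # Splice the referenced chunk in at the reference's indentation.
                lines.extend(expand(match.group(2), indent + match.group(1),
                                    active | {name}))
            else:
                lines.append(indent + line)
        return lines

    return "\n".join(expand(root, "", frozenset())) + "\n"


program = tangle(web)
print(program)
```

The chunks are written "read" last, yet the tangled program reads its input before reporting, because order comes from the references rather than from the document.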
The other processor that can be applied to a Web is called Weave. This processor takes a Web and produces readable documentation by typesetting the descriptive paragraphs and carefully composing the code into a standard, readable, form.
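A correspondingly minimal weave, again a Python sketch with a hypothetical plain-text layout rather than what WEAVE actually produces (WEAVE emits TeX), keeps the chunks in expository order and prints each one's description followed by its named, numbered code.

```python
# Each chunk carries its descriptive text, its name and its code, in
# expository order. The <<name>>= layout is noweb-style and illustrative.
web = [
    ("The program reads its input, then reports a result.",
     "*", "def main():\n    <<read the input>>\n    <<report the result>>"),
    ("Reporting is just printing the total.",
     "report the result", "print(total)"),
    ("For this toy example the input is the numbers 0 through 9.",
     "read the input", "total = sum(range(10))"),
]


def weave(chunks):
    """Render each chunk as numbered descriptive text followed by its code."""
    parts = []
    for number, (text, name, code) in enumerate(chunks, start=1):
        parts.append(f"{number}. {text}")
        parts.append(f"   <<{name}>>=")
        parts.extend("       " + line for line in code.splitlines())
        parts.append("")  # blank line between chunks
    return "\n".join(parts)


print(weave(web))
```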
Knuth's concept of a Web that can be processed in two essentially very different ways is somewhat unusual in programming. Of course, there is no such thing as a free lunch, so there is no magic to this approach, but it does allow the information about a computer program to be collected in a particularly effective way. It is quite natural to divide code into units that are small enough to be comprehensible, yet large enough to accomplish something worth mentioning. The paragraphs of text and the blocks of code are the appropriate size to manage for this kind of purpose.
The idea of Super-Literate Programming is to extend this concept into a slightly broader domain. Not only are we concerned with descriptive text and code; we are also interested in presenting the output produced by executing the code. Presenting output is not a concern in the context in which Knuth has used Literate Programming.
Nevertheless, the concepts are tantalizingly close to one another, close enough to suggest that it is worth at least considering how the basic concept of literate programming might be extended.
The main body of the document that drives this process is built from at least four kinds of parts: the name, the descriptive body, the input segment and the output display. We will call each group of these (one occurrence of each part, any of which may be empty) a fragment, and a document may contain many fragments.
The overall document consists of a document header followed by any number of fragments. Each fragment consists of a name, descriptive body, input segment and output display, at least one of which is not null. In my current view it would be unusual for a fragment not to have a name, but at this stage I wouldn't want to rule anything out until there is a practical rendering of these ideas.
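One way to pin this structure down is as a pair of record types. The Python sketch below is a hypothetical data model, not a settled design: each part of a fragment is optional, and the constructor enforces the rule that at least one part is non-null. The J input `+/ i. 10` (the sum of the integers 0 through 9) and its output 45 serve as sample content.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Fragment:
    """One fragment of a hypothetical super-literate document.
    Any part may be None, but not all four."""
    name: Optional[str] = None            # title, also an outline entry
    body: Optional[str] = None            # descriptive text
    input_segment: Optional[str] = None   # code to be run (J, in this note's setting)
    output_display: Optional[str] = None  # what running the input produces

    def __post_init__(self):
        parts = (self.name, self.body, self.input_segment, self.output_display)
        if all(part is None for part in parts):
            raise ValueError("a fragment needs at least one non-null part")


@dataclass
class Document:
    header: str
    fragments: List[Fragment] = field(default_factory=list)

    def unnamed(self):
        """Indices of fragments without a name: allowed, but unusual."""
        return [i for i, f in enumerate(self.fragments) if f.name is None]


doc = Document(
    header="Sums in J",
    fragments=[
        Fragment(name="Summing integers",
                 body="Plus insert applied to the first ten integers.",
                 input_segment="+/ i. 10",
                 output_display="45"),
        Fragment(body="A fragment may consist of descriptive text alone."),
    ],
)
print(doc.unnamed())  # prints [1]
```

Flagging unnamed fragments, rather than forbidding them, keeps the option open until a practical rendering shows whether they are needed.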
The name of a fragment is a title for the fragment (paragraph, input code, output).
Names act not only as titles of fragments; they also have a hierarchical structure that causes them, taken by themselves, to form an outline of the document. This clearly places some constraints on names, most particularly that each name occurs at most one level deeper in the hierarchy than the name that precedes it. This corresponds to normal outline structure, and suggests that the document as a whole is a conforming outline.
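The outline constraint is easy to check mechanically. In this Python sketch a name is represented, by assumption, as a (level, title) pair with level 1 at the top and the first name at level 1; the same levels give the indentation that descriptive text might optionally use.

```python
def check_outline(names):
    """names: list of (level, title) pairs, level 1 being the top.
    Raise ValueError if a name is more than one level deeper than its predecessor."""
    previous = 0  # assumption: the first name must be at level 1
    for level, title in names:
        if level < 1 or level > previous + 1:
            raise ValueError(f"{title!r} at level {level} follows a level-{previous} name")
        previous = level


def render_outline(names, indent="  "):
    """Check the outline, then print each title indented by its level."""
    check_outline(names)
    return "\n".join(indent * (level - 1) + title for level, title in names)


names = [(1, "Introduction"), (2, "Paper and screens"), (2, "J"),
         (3, "Boxed output"), (1, "Literate Programming")]
print(render_outline(names))
```

Moving back up any number of levels is allowed; only skipping a level on the way down is rejected.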
Descriptive bodies are pretty much just normal text. A piece of descriptive text can be several paragraphs long if that is useful.
In some contexts, descriptive text may be displayed with an indentation appropriate for the level of the name corresponding to it. In other situations this indentation may not be desired.
The input segment is code which is related to the purpose of the fragment. Whether this code is executable on a stand-alone basis or not depends on the purpose of the fragment.
Code fragments may not be executable unless they are set in an appropriate context. This can raise a problem for presentation in some interactive situations. Just how this should be managed, and what rules should be imposed on a document to make it work from a sensible engineering standpoint, remain to be clarified.
However the program input is specified, there must be an algorithmic process that takes the representation and produces executable elements that can be submitted to the processor, in this case J, to produce the output that will form a part of the display.
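As a sketch of the execution step, the Python below uses Python's own `exec` as a stand-in for submitting input to J; how to drive a real J session is not shown. All fragments share one namespace, which is one possible answer to the context problem, and the hypothetical `output_for` helper shows the simplest policy for non-sequential access: replay every earlier fragment before the one the reader jumped to. Both choices are assumptions, not rules this note has settled.

```python
import contextlib
import io


def run_fragments(inputs):
    """Run each input segment in order in one shared namespace and return
    the captured output of each (its 'output display')."""
    namespace = {}
    outputs = []
    for source in inputs:
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(source, namespace)  # stand-in for sending the input to J
        outputs.append(buffer.getvalue())
    return outputs


def output_for(inputs, k):
    """Reproduce fragment k's output for a reader who jumps straight to it,
    by replaying every earlier fragment so that its definitions exist."""
    return run_fragments(inputs[: k + 1])[k]


inputs = ["data = [3, 1, 2]", "print(sorted(data))", "print(sum(data))"]
print(run_fragments(inputs))   # prints ['', '[1, 2, 3]\n', '6\n']
print(output_for(inputs, 2))
```

Replaying everything is easy to reason about but can be slow for long documents; a more refined scheme would track which earlier fragments a given fragment actually depends on.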
The output display is an important part of the process being described here, and it represents the most substantial addition to the conventional literate programming technology that has been described elsewhere.
The characteristics that we have outlined in specifying the requirements of the document need to be realized in a physical form.
Next, we need to describe just how the structure of the document that has been proposed can be used to solve the problems presented by each of our target areas.
This involves figuring out how to process the source documents into the appropriate kind of object documents. It also requires that some decisions be made about what tool set is going to be used to build the necessary processor functions.
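To illustrate producing different object documents from one source, the Python sketch below renders a single fragment as plain text for two hypothetical targets, a printed page and a Palm-sized screen; the target widths are assumptions. Only descriptive text is reflowed. Input is indented three spaces, following J's session convention, and neither input nor output is rewrapped, since rewrapping code or boxed output would change its meaning.

```python
import textwrap

# Hypothetical target widths, in characters of a monospaced font.
TARGET_WIDTHS = {"paper": 72, "palm": 26}


def render_text(fragment, target):
    """Render a fragment (a dict with optional name/body/input/output keys)
    as plain text for the given target."""
    width = TARGET_WIDTHS[target]
    lines = []
    if fragment.get("name"):
        lines.append(fragment["name"].upper())
    if fragment.get("body"):
        lines.extend(textwrap.wrap(fragment["body"], width))  # prose reflows
    if fragment.get("input"):
        # Code is indented like J session input and never rewrapped.
        lines.extend("   " + line for line in fragment["input"].splitlines())
    if fragment.get("output"):
        lines.extend(fragment["output"].splitlines())
    return "\n".join(lines)


fragment = {
    "name": "Sum",
    "body": "Summing the integers 0 through 9 uses plus insert (+/) "
            "applied to integers (i.).",
    "input": "+/ i. 10",
    "output": "45",
}
print(render_text(fragment, "paper"))
print()
print(render_text(fragment, "palm"))
```

Narrow targets show why the difference is one of kind: prose reflows gracefully, but a wide boxed display cannot, and would need a different presentation altogether.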
[Note: This section will require some thinking and working out.] In particular the way that this might all fit into the domain of Blogs will require some special hard thinking.
Producing the outline of a document should be one of the easiest tasks. Essentially, all that is necessary is
There are several tools that may be useful in various parts of the process of handling source files. Some people have a strong predilection for using only the target language to build the kind of system described here, but whether that is practical remains to be seen.
J is a good tool for parsing the original source and handling some aspects of the problem. It is also clear that J must be involved in processing the code fragments to convert input into output, so it is bound to be a part of the process at some level.
Perl is a good candidate for some parts of the process as well.
Some literate programming resources are:
Donald E. Knuth, Literate Programming (Stanford, California: Center for the Study of Language and Information, 1992), xvi+368 pp. CSLI Lecture Notes, no. 27. ISBN 0-937073-80-6. Japanese translation by Makoto Arisawa, Bungeiteki Programming (Tokyo: ASCII Corporation, 1994), 463 pp.
Nelson Beebe, bibliography: http://www.math.utah.edu:8080/pub/tex/bib/index-table-l.html
Eitan M. Gurari, TeX and LaTeX: Drawing and Literate Programming, book and disk (McGraw-Hill Programming Tools for Scientists & Engineers, December 1993).
Wayne Sewell, Weaving a Program: Literate Programming in Web.
Conrad Sabourin, Computational character processing: character coding, input, output, synthesis, ordering, conversion, text compression, encryption, display hashing, literate programming: bibliography.