2006-03-29

Celebes Kalossi 2.0

I've decided it's time to post about my object-oriented programming model, Celebes Kalossi, again. All previous statements are inoperative, so you don't have to look back at my earlier postings. This posting will be mostly about terminology.

In CK, there are classes. A class contains declarations of state variables (aka instance variables, fields, data members) and both declarations and definitions of methods. State variables are only accessible from within the class: they are all private in Java terminology.

A declaration of a method specifies the method's name and its signature; that is, the type of its return value and the names and types of its arguments. In the model, no two methods in a class can have the same name; an actual implementation might provide Java-style method overloading, since overloading is resolved at compile time and is basically convenient syntactic sugar.

A definition of a method specifies everything the corresponding declaration does, but also includes the code of the method. If a class contains a definition of a method, it has no need to contain its declaration too.

A method may be public, private, or neither; the third type will be called standard methods here. A public method can be called from anywhere, and can be invoked on any object. A private method cannot be invoked outside the class in which it is defined, so there is no point in declaring one. Basically, it's just a subroutine. The difference between standard and private methods will be explained in another posting.

Standard and private methods can only be invoked on the self (this in Java) object, implicitly or explicitly. The most important rule of CK is that you cannot invoke a method on self that is not declared (not necessarily defined) in the current class.

Finally, by "Java" I mean "Java or C#". More later.

Speaking in Ander-Saxon

Some while back I wrote a posting on partially understanding languages that included a well-known quotation from Old English specialist Tom Shippey about how English became simplified over time.

Here's a translation (by me) of that explanation into Ander-Saxon, a variety of English in which French, Latin, and Greek words and roots are replaced by native English ones.

Reckon what happens when somebody who speaks, shall we say, good Old English from the south of the land runs into somebody from the northeast who speaks good Old Norse. They can without fear pass on with each other, but the hardnesses in both tongues are going to get lost. So if the Anglo-Saxon from the South wants to say (in good Old English) "I'll sell you the horse that pulls my cart", he says: Ic selle the that hors the draegeth minne waegn.
Now the old Norseman -- if he had to say this -- would say: Ek mun selja ther hrossit er dregr vagn mine. So, roughly speaking, they understand each other. One says waegn and the other says vagn. One says hors and draegeth; the other says hros and dregr, but broadly they are onpassing. They understand the root words. What they don't understand are the wizardly bits of the wholespeech.
For a showdeal, the man speaking good Old English says for one horse that hors, but for two horses he says tha hors. Now the Old Norse speaker understands the word hors all right, but he's not sound if it means one or two, byspring in Old English you say "one horse", "two horse". There is no apartness between the two words for "horse". The apartness is carted in the word for "the", and the old Norseman might not understand this, byspring his word for "the" doesn't behave like that. So: are you trying to sell me one horse or are you trying to sell me two horses? If you get enough sittings like that there is a strong drive toward straightening out the tongue.

(I ran this past Professor Shippey in email.)

2006-03-26

On this and that

Here's a few little bits scoured up from here and there.

Boswell on Johnson's Dictionary:

A few of his definitions must be admitted to be erroneous. Thus, Windward and Leeward, though directly of opposite meaning, are defined identically the same way; as to which inconsiderable specks it is enough to observe, that his Preface announces that he was aware there might be many such in so immense a work; nor was he at all disconcerted when an instance was pointed out to him. A lady once asked him how he came to define Pastern the knee of a horse: instead of making an elaborate defence, as she expected, he at once answered, "Ignorance, Madam, pure ignorance." His definition of Network ["Any thing reticulated or decussated, at equal distances, with interstices between the intersections"] has been often quoted with sportive malignity, as obscuring a thing in itself very plain.

To which we may add his definition of lexicographer: "a writer of dictionaries, a harmless drudge".

On the names for people with variously colored hair:

Blond and blonde are masculine and feminine forms, though the latter is rarely used as an adjective nowadays, only as a noun. Brunette, on the other hand, is feminine only; the form brunet which is sometimes found is not French, not English, and entirely barbarous. -ette is inherently both feminine and diminutive (though the latter sense dominates in English, as in cassette, diskette, kitchenette, statuette), and not to be split up into two separate affixes.

On whiteboards:

Whiteboards are common in corporations, but I have never seen one in any educational establishment in the U.S. (which is by no means to say there are none). The coolest variety have a large canvas which can be scrolled left or right, by full screens or by smaller steps, and can even save copies of what's currently in view using a giant scanner; you can hook up a conventional printer for hard copy or (I suppose) put them on a network. I only got to use such a Wundergerät once or twice, alas.

On Latin in Great Britain:

The Great Vowel Shift that changed the pronunciation of the English long vowels in the 15th century affected not only English but also the spoken Latin of the monasteries. Indeed, there was a period where English and Scottish Latiners could not understand one another, because Scottish Latin did not undergo the Shift even though Scots itself (mostly) did!

On how history could have gone:

Could the Internet have been invented if telephones hadn't been invented first? I think so. Telegraphy is a lot simpler than telephony, and telegraph operators had something socially very like the Internet (but involving a lot fewer people, of course) more than a hundred years ago. There were even routers and protocol gateways, instantiated by human beings.

A technical civilization might well go from semaphore telegraphs to electric telegraphs to teletypewriters to Morse-code radio to high-speed wired and wireless digital transmissions, missing analog telephones and radio altogether.

On the root *tag-:

Ruminating over the English words tact and tactics led me to realize how interestingly convergent in meaning they have become, descending from the same PIE root *tag- through different branches, respectively Latin tangere, tactus 'touch(ed)'; Greek taktikh 'deployment < arrangement'.

On tornadoes:

Conventional wisdom says tornadoes never happen in the Eastern U.S. Conventional wisdom, as all too often, does not know its history. Tornadoes have been recorded in all of the fifty states and D.C. Indeed, only the following 10 states have not had a major tornado (causing death or property damage) since 1980:

Alaska (1959), Hawaii (1971), Indiana (1974), Iowa (1979), Kentucky (1974), Minnesota (1978), Missouri (1973), North Dakota (1978), Vermont (1970), West Virginia (1974).

A Creative Choice

I submitted this piece to the mailing list Heroic Stories. It appears here in slightly modified form.

I used to work as a programmer for a news service, a small subsidiary of a larger news and financial information company. We wrote and published medical news over the Internet; our customers included companies with medical websites, pharmaceutical companies, newspapers, and specialized and general-use web portals.

Back in 2002, advertising-supported media (which means most media) had fallen on hard times as a result of the slow economy. Our subsidiary, like many media companies, had to cut back on its staff. For us the need was particularly acute, as most of our customers were Internet-based, and about half of them went belly-up after the dot-com bubble burst in 2001.

We had staved off the problem for about a year, thanks to having annual contracts. But eventually we had to cut costs, and the only way we could do that and still maintain service to our remaining customers was to cut staff. As a result, in August 2002 the "powers that be" declared that one or two people would have to be sacrificed from each department: sales, financial, news-writing, and technical.

The financial department was abolished altogether and its functions transferred to a group in the parent company. Most of the other groups naturally suffered as a result of losing journalists, editors, and salespeople -- but they survived, still able to perform their missions.

Our technical department, however, consisted of just two programmers and a system administrator. Without the programmers, we couldn't maintain our existing systems and implement new ones. Without the system administrator, who doubled as a help-desk person, we would have been unable to support the rest of the subsidiary or keep our production systems backed up and running smoothly. Terminating any of us would have meant a massive workload for the remaining two, much of it work they were not trained to perform. It was an ugly choice to make.

The director of the technical department decided to meet the challenge in a creative way. She was going on maternity leave just after the announcement came out, and decided to terminate herself instead of one of her staff. She said that she considered herself the "most expendable" person in the technical department.

Management was shocked by the idea of losing a department manager instead of regular staff. They protested loudly and tried to make Sandra change her mind, but to no avail. Her clear-headed analysis prevailed, and it was decided that after Sandra's departure we would report jointly to a technical manager within the parent company and the CEO of our subsidiary.

Sandra returned from maternity leave and worked until the end of 2002, then left to devote herself to motherhood and free-lance work. As a result of her selfless action, the three of us who remained were able to fulfill our customers' and employer's needs. In the end, however, both of the other two were let go, leaving me to perform all the remaining technical functions until the end of 2005, when I too was laid off.

2006-03-24

I had to say that

Josiah Willard Gibbs was the greatest American physicist of the 19th century, perhaps the only world-class American physicist of his time.

He was well-known among his friends and peers not only for his brilliance but also for his extraordinary modesty. They were astounded, therefore, when Gibbs, testifying on one occasion as an expert witness, was asked by the opposing lawyer what right he had to say such things and replied, "I am the greatest living authority on the matter."

Gibbs's explanation after the fact: "I had to say that; I was on oath".

French fries are not chips

It's commonly held that English "chips" are American "French fries", but I deny it. McDonald's-style fried potatoes are canonical French fries, but they are not canonical chips. Leftpondians don't eat chips very often, and perhaps think of them as "fat French fries" if they don't know any better, but the point is that the two terms refer to different things. Congress is the American Parliament, no doubt, but it would be absurd to say that the terms had the same referent!

Rightpondians often do claim that the fried potato products sold by McDonald's are "chips". Since they have only one term available, they will tend to use it for all fried potato products other than crisps ("potato chips" in American English), whether the potatoes are julienned (as in French fries proper) or sliced in large wedges or bars (as in chips proper). But taking a transatlantic perspective, the two terms are not really interchangeable, for they have different prototypes. This is not the case with "crisps" vs. "potato chips", which have only one prototype.

But when an American goes to a place (whether in America or elsewhere; in my case, about 600 meters away) where "chips" are served and openly called by that name, s/he will have quite a different gustatory experience from what results from eating "French fries".

To consider the flip side of the issue for the moment, if I went to England and saw an Erithacus rubecula, I would have to call it a "robin", because no other English term is available. That doesn't mean that I don't know it's a different bird from the specimens of Turdus migratorius that I commonly denote by that term.

Ralph vs. the Tortoise

Consider the following statements:

  1. Ralph believes that Ortcutt is not a spy.
  2. Ralph believes that the man in the brown hat is a spy.
  3. The man in the brown hat is Ortcutt.

Therefore:

  1. Ralph believes of Ortcutt that he is not a spy.
  2. Ralph believes of Ortcutt that he is a spy.

This is apparently no problem, as long as Ralph does not believe "Ortcutt is a spy and Ortcutt is not a spy", which he does not. People with appropriate false beliefs or appropriate ignorance can believe (de re) contradictory things.

But now consider Hofstadter's Tortoise:

  1. The Tortoise affirms "My shell is green".
  2. The Tortoise affirms "My shell is not green".
  3. The Tortoise rejects "My shell is green and my shell is not green".

It seems to follow that:

  1. The Tortoise believes of his shell that it is green.
  2. The Tortoise believes of his shell that it is not green.

Must we accept that the Tortoise's beliefs are not contradictory de re, but only de dicto? The de re version seems exactly parallel to Ralph's de re beliefs. Yet Ralph is merely ignorant of a key point (viz. #3), whereas the Tortoise seems to be "logically insane".

Writing out XML

You can't just embed plain text into an XML element or attribute; character content and attribute values have to be escaped in a number of ways, not necessarily obvious. Here's a checklist of things to make sure to do. (Once again, this post will look terrible in RSS readers that don't fully understand Atom.)

  1. Escape all & characters as &amp;.
  2. Escape all < characters as &lt;.
  3. Escape all > characters as &gt;. Technically it's enough to do so only when they are preceded by ]] in character content, but in my opinion making that check is more trouble than it's worth.
  4. Escape all carriage-return characters as &#xD;. These should be very rare in XML content, as they will have been converted to line-feeds on parsing.
  5. Escape all tab characters in attribute values as &#x9;. You can escape them in character content if you want, but it's not necessary.
  6. Escape all line-feed/newline characters in attribute values as &#xA; (not D as I first wrote).
  7. Output all line-feed/newline characters in character content as the local line terminator: carriage-return (on Mac Classic), line-feed (on Unix) or both (on Windows). You can provide alternative line terminators at user option.
  8. Escape all characters that can't be represented in the output character set. If the output character set is UTF-8 or UTF-16 (in any flavor), this step is not necessary.
  9. Directly output everything else.

I'm glad to say that XOM, my favorite XML tree representation, does all these things in its Serializer class.
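
The checklist above can be sketched in a few lines of Python. This is only an illustration, not how XOM's Serializer is actually written: the function names (escape_text, escape_attribute), the default line terminator, and the choice of hexadecimal character references for unrepresentable characters are all my assumptions. The sketch also escapes the double quote in attribute values, which the checklist does not list but which is needed whenever the value is delimited by double quotes.

```python
def _escape(text, in_attribute, encoding, line_terminator):
    """Apply the checklist one character at a time (hypothetical helper)."""
    out = []
    for ch in text:
        if ch == "&":
            out.append("&amp;")                      # step 1
        elif ch == "<":
            out.append("&lt;")                       # step 2
        elif ch == ">":
            out.append("&gt;")                       # step 3 (always, for simplicity)
        elif ch == "\r":
            out.append("&#xD;")                      # step 4
        elif ch == "\t":
            out.append("&#x9;" if in_attribute else ch)               # step 5
        elif ch == "\n":
            out.append("&#xA;" if in_attribute else line_terminator)  # steps 6 and 7
        elif ch == '"' and in_attribute:
            out.append("&quot;")                     # addition: not in the checklist
        else:
            try:
                ch.encode(encoding)                  # step 8: representable?
                out.append(ch)                       # step 9: output directly
            except UnicodeEncodeError:
                out.append("&#x%X;" % ord(ch))       # numeric character reference
    return "".join(out)


def escape_text(text, encoding="utf-8", line_terminator="\n"):
    """Escape character content for output in the given encoding."""
    return _escape(text, False, encoding, line_terminator)


def escape_attribute(text, encoding="utf-8"):
    """Escape an attribute value that will be wrapped in double quotes."""
    return _escape(text, True, encoding, "\n")
```

Passing line_terminator="\r\n" gives Windows-style output; os.linesep would be the natural choice for "the local line terminator".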

2006-03-23

My favorite errata page

viii

ERRATA

p. viii: for "ERRATA" read "ERRATUM".

How to write XHTML even if you don't know how

Warning to RSS users: this post may not be legible in newsreaders that don't understand Atom very well.

  1. Put all tag names and attribute names in lower case.
  2. Make sure every start-tag has an end-tag. This rule does not apply to the HTML empty tags, namely base, basefont, br, area, link, img, meta, param, hr, input, col, frame, and isindex. (If you don't know what some of these are, don't worry about it.)
  3. Replace the > at the end of an empty tag with the three-character sequence " />".
  4. Make sure all start-tags and end-tags are properly nested.
  5. Make sure all attribute values are in quotation marks, either single or double.
  6. Make sure attributes like "checked", that don't have values, are written "checked='checked'".
  7. Any & and < characters, even in scripts or stylesheets, must be replaced by &amp; and &lt; respectively.
  8. Don't wrap scripts in comment markers (<!-- ... -->).
  9. Make sure you use the semicolon after an entity reference like &aacute;.

That's all.
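
A quick way to see what these rules buy you is to run a fragment through an XML parser. The sketch below uses Python's standard xml.etree.ElementTree; the two sample fragments and the helper name is_well_formed are made up for illustration. Passing this check only shows that the markup is well-formed XML (rules 2 through 7); it does not check lower-case names or anything else specific to XHTML.

```python
import xml.etree.ElementTree as ET

# Follows the rules: lower case, quoted attributes, closed empty tags,
# escaped ampersand, and checked='checked' instead of a bare attribute.
GOOD = """<div class="note">
  <p>Fish &amp; chips<br />
  <input type="checkbox" checked='checked' /></p>
</div>"""

# Typical old-style HTML: unquoted attributes, bare ampersand,
# unclosed empty tags, a missing end-tag, and a valueless attribute.
BAD = """<DIV class=note>
  <p>Fish & chips<br>
  <input type=checkbox checked>
</DIV>"""


def is_well_formed(markup):
    """Return True if the markup parses as XML (hypothetical helper)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False
```

Here is_well_formed(GOOD) succeeds and is_well_formed(BAD) fails at the first unquoted attribute value.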

Gorillas in the desert

Here's a story about an anthropologist working among the Yaqui, an Indian nation in northern Mexico. In New World Spanish, the word indio 'Indian' has two senses, one neutral, one derogatory, and it is all too commonplace for speakers to slip between the meanings without really being aware of it.

On this particular occasion, the anthropologist was sitting around a fire with some Yaquis. One of them, who was rather large and rather drunk, got up suddenly and began to circle the campfire, beating his chest and shouting "Soy indio ... soy indio ..." (as much as to say, "I'm an [epithet], and what are you going to do about it?").

The anthropologist felt a bit intimidated by this, and decided he had to do something to deflect possible violence. So he too got up, began to circle the fire, beat his chest, and shout "Soy judío ... soy judío ...". It all ended happily.

2006-03-22

Why per-CPU pricing for software can be sensible

Contra Tim Bray, per-CPU pricing is actually quite a reasonable thing to do if you think your product is not readily replaceable (that is, if all competitive products are actually substitutes only). It's a form of price discrimination, aka "charge the rich high, the poor low".

This approach is sustainable when 1) you can reliably tell who the rich are, and 2) you can prevent a secondary market from arising (so that the poor sell to the rich at a profit, undercutting you). Because (proprietary) software is copyright, and is licensed not sold, it meets the second condition; predicating high prices on expensive features of the platform meets the first condition.

Quine's Paradox

We all (being reasonable persons and not fanatics) are trapped by Quine's Paradox: namely, to believe a statement p is to believe that p is true, so I believe that each of my beliefs is true. Yet I also believe that some of my beliefs (I know not which) will turn out to be false if and when tested. (I believe I left my glasses in my bedroom today, but beliefs of this sort have turned out to be false often enough...)

So:

I believe that each of my beliefs is true;
I believe that some of my beliefs are false.

Saith Quine: "I for one had hoped for better from reasonable persons."

Typographical variety

  1. The Polish acute accent is shorter and stubbier than the Western one.
  2. French likes to put spaces in front of certain terminal punctuations, notably semicolon and colon, and also inside guillemets.
  3. Quotation marks have at least six flavors in Europe alone:
    • 6-quotes ... 9-quotes (English, Dutch, Italian, Spanish, Turkish)
    • 9-quotes ... 9-quotes (Scandinavian languages)
    • low-9-quotes ... 6-quotes (German, Czech, Slovak)
    • low-9-quotes ... 9-quotes (Hungarian, Polish)
    • guillemets pointing in (Slovene, German sometimes)
    • guillemets pointing out (French, Greek, Russian)
  4. Some languages like initial dashes for dialogue, some don't.
  5. French c with cedilla can be written with a detached comma below, but not so in Portuguese or Catalan. Turkish insists on s with cedilla, Romanian on s with a comma below for their respective sh-sounds. (The story for Gagauz, which is a Turkic language spoken in Romania, is still uncertain.)
  6. Inverted punctuation marks are unique to Spanish.
  7. Lojban uses dots at the beginnings of words. :-)

To continue

Thanks, Tim, for getting me off my butt. I was gonna post today already, I was, I was. Been thinking about it all week. Really. No more excuses. Gonna post. Watch this space.

Oh yes. This is post #160.