Linux Creator Torvalds Details Code Differences

Linux creator Linus Torvalds detailed many specific code differences between Linux and Unix in an effort to contradict assertions of copyright violation by The SCO Group.

The SCO Group Inc. recently made some specific claims about programs within Linux that it contends were stolen from the Lindon, Utah, company's Unix intellectual property. Linux founder Linus Torvalds on Monday offered a number of specific code examples that countered SCO's assessment.

Here are some specific areas where Torvalds found differences between Unix and Linux:

Torvalds looked at lib/ctype.c and include/linux/ctype.h.

"First, some background: The ctype name comes from 'character type,' and the whole point of ctype.h and ctype.c is to test what kind of character we're dealing with," Torvalds said.

"In other words, those files implement tests for doing things like asking, 'is this character a digit' or 'is this character an uppercase letter,' etc.

"So you can write something like:

    if (isdigit(c))
        .. we do something with the digit ..

and the ctype files implement that logic."
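
As an illustration of that kind of test, here is a minimal sketch that uses the standard C library's <ctype.h> rather than code from either kernel. The helper name count_digits is hypothetical and does not appear in Torvalds' remarks.

```c
#include <ctype.h>
#include <stddef.h>

/* Count the decimal digits in a NUL-terminated string.
 * count_digits is a made-up helper name used only for this sketch. */
size_t count_digits(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        /* Cast to unsigned char: passing a negative value to isdigit()
         * is undefined behavior. */
        if (isdigit((unsigned char)*s))
            n++;
    }
    return n;
}
```

Called on the string "Linux 0.01, 1991", the function would count seven digits.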

"Those files exist (in very similar form) in the original Linux-0.01 release under the names lib/ctype.c and include/ctype.h. That kernel was released in September of 1991, and contains no code except for mine and Lars Wirzenius', who co-wrote kernel/vsprintf.c."

"In fact, you can look at the files today and 12 years ago, and you can see clearly that they are largely the same: the modern files have been cleaned up and fix a number of really ugly things (tolower/toupper works properly), but they are clearly an incremental improvement on the original one."

Torvalds added that the original Linux version did not look like the Unix source. The similarities it did have, he said, come down to the following reasons:

  • The ctype interfaces are defined by the C standard library.
  • The C standard also specifies what kinds of names a system-library interface can use internally. In particular, the C standard specifies that names that start with an underscore and a capital letter are internal to the library.

    This is important, Torvalds said, because it explains why both the Linux implementation and the Unix implementation used a particular naming scheme for the flags.

  • Algorithmically, there aren't many ways to test whether a character is a number or not. That's especially true in C, where a macro must not use its argument more than once.

    Torvalds provided an example: "The obvious implementation of isdigit() (which tests for whether a character is a digit or not) would be:

    #define isdigit(x) ((x) >= '0' && (x) <= '9')

    but this is not actually allowed by the C standard, because x is used twice."
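
A small sketch of why the double use matters, assuming the macro is handed an argument with a side effect. The names NAIVE_ISDIGIT, fetch and evaluations_for are made up for illustration and come from neither kernel.

```c
/* Naive range check: the argument x appears twice in the expansion. */
#define NAIVE_ISDIGIT(x) ((x) >= '0' && (x) <= '9')

static int fetch_count;   /* how many times fetch() has run */
static int fetch_value;   /* the character fetch() hands back */

/* Stand-in for any argument with a side effect, such as getchar(). */
static int fetch(void)
{
    fetch_count++;
    return fetch_value;
}

/* Report how many times the macro evaluated its argument for character c. */
int evaluations_for(int c)
{
    fetch_count = 0;
    fetch_value = c;
    (void)NAIVE_ISDIGIT(fetch());
    return fetch_count;
}
```

For a digit such as '5', both comparisons run, so the argument is evaluated twice. With a real input function in place of fetch(), that would silently consume two characters instead of one.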

"This explains why both Linux and traditional Unix use the other obvious implementation: having an array that describes what each of the possible 256 characters are, and testing the contents of that array (indexed by the character) instead. That way the macro argument is only used once," Torvalds said.

"This basically explains the similarities. There simply aren't that many ways to do a standard C ctype implementation, in other words," he said.
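
The table-lookup approach Torvalds describes can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from Linux or Unix: the flag names CT_U, CT_L and CT_D and the ct_ prefix are hypothetical, chosen because ordinary programs may not define reserved names such as _U or _D, and the table is filled at run time for brevity.

```c
/* Hypothetical flag bits. A real C library would use reserved names
 * such as _U, _L and _D; ordinary programs may not define those. */
#define CT_U 0x01 /* upper-case letter */
#define CT_L 0x02 /* lower-case letter */
#define CT_D 0x04 /* decimal digit */

/* One entry per character value, assuming 8-bit chars;
 * every entry is zero until ct_init() runs. */
static unsigned char ct_table[256];

/* Mark every character in set with the given flag. */
static void ct_mark(const char *set, unsigned char flag)
{
    for (; *set != '\0'; set++)
        ct_table[(unsigned char)*set] |= flag;
}

/* Fill in the table; call once before using the macros below. */
void ct_init(void)
{
    ct_mark("ABCDEFGHIJKLMNOPQRSTUVWXYZ", CT_U);
    ct_mark("abcdefghijklmnopqrstuvwxyz", CT_L);
    ct_mark("0123456789", CT_D);
}

/* Each macro uses its argument exactly once, as an index into the table. */
#define ct_isupper(c) ((ct_table[(unsigned char)(c)] & CT_U) != 0)
#define ct_islower(c) ((ct_table[(unsigned char)(c)] & CT_L) != 0)
#define ct_isdigit(c) ((ct_table[(unsigned char)(c)] & CT_D) != 0)
#define ct_isalnum(c) ((ct_table[(unsigned char)(c)] & (CT_U | CT_L | CT_D)) != 0)
```

Because the argument appears only as the array index, an expression with side effects is evaluated a single time, and a combined test such as ct_isalnum() is just a check against several flag bits at once.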

Torvalds went on to "look at the differences between Linux and traditional Unix."

"Both Linux and traditional Unix use a naming scheme of underscore and a capital letter for the flag names. There are flags for 'is upper case' (_U) and 'is lower case' (_L), and surprise, surprise, both Unix and Linux use the same name," Torvalds said.

"But think about it: if you wanted to use a short flag name, and you were limited by the C-standard naming, what names would you use? Maybe you'd select U for Upper case and L for Lower case?"

"Looking at the other flags, Linux uses _D for Digit, while traditional Unix instead uses _N for Number. Both make sense, but they are different," he continued.

"I personally think that the Linux naming makes more sense (the function that tests for a digit is called isdigit(), not isnumber()), but on the other hand, I can certainly understand why Unix uses _N: the function that checks for whether a character is alphanumeric is called isalnum(), and that checks whether the character is an upper-case letter, a lower-case letter or a digit (a.k.a. number)."

"In short: there aren't that many ways you can choose the names, and there is lots of overlap, but it's clearly not 100 percent," Torvalds said.
