More Code Examples

By Steven Vaughan-Nichols  |  Posted 2003-12-22

"The original Linux ctype.h/ctype.c file has obvious deficiencies, which pretty much point to somebody new to C making mistakes—(me) rather than any old and respected source," Torvalds observed. "For example, the toupper()/tolower() macros are just totally broken, and nobody would write the isascii() and toascii() the way they were written in that original Linux. And you can see that they got fixed later on in Linux development, even though you can also see that the files otherwise didnt change," he said.

"Remember how C macros must only use their argument once. So lets say that you wanted to change an upper-case character into a lower-case one, which is what tolower() does. Normal use is just a fairly obvious:

newchar = tolower(oldchar)

"And the original Linux code does:

extern char _ctmp;
#define tolower(c)

"This is not very pretty, but notice how we have a temporary character _ctmp (remember that internal header names should start with an underscore and an upper case character—this is already slightly broken in itself). Thats there so that we can use the argument c only once—to assign it to the new temporary—and then later on we use that temporary several times," he said.

Torvalds said the reasons this is broken are:

  • It's not thread-safe. "If two different threads try to do this at once, they will stomp on each other's temporary variable."

  • The argument might be a complex expression, and as such it should really be parenthesized, he said. The above example "gets several valid (but unusual) expressions wrong."
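
The parenthesization point can be sketched with a cast-style macro. The definitions below are hypothetical and are not the original Linux macros, which this article does not reproduce; they only assume a macro that casts its argument without wrapping it in parentheses. When the argument is a conditional expression, the cast attaches to the first operand alone and the result changes.

```c
/* Hypothetical macros for illustration; not the original Linux definitions. */

/* The cast binds only to the first operand of whatever is passed in. */
#define LOOSE_IS_ASCII(c) (((unsigned) c) <= 0x7f)

/* Parenthesizing the argument makes the cast cover the whole expression. */
#define STRICT_IS_ASCII(c) (((unsigned) (c)) <= 0x7f)

int loose_result(int flag)
{
    /* Expands to (((unsigned) flag ? -1 : 0) <= 0x7f): the cast applies to
     * flag alone, the conditional yields the int -1, and -1 <= 0x7f holds. */
    return LOOSE_IS_ASCII(flag ? -1 : 0);
}

int strict_result(int flag)
{
    /* The whole conditional is converted: (unsigned)-1 is UINT_MAX,
     * which is greater than 0x7f, so the test fails as it should. */
    return STRICT_IS_ASCII(flag ? -1 : 0);
}
```

For a nonzero flag the loose macro reports that -1 is ASCII and the strict one reports that it is not. This is the kind of valid but unusual expression that an unparenthesized argument gets wrong.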

According to Torvalds, these are the kinds of mistakes a young programmer would make. "It's classic," he said.

"And I bet its not what the Unix code looked like, even in 1991. Unix by then was 20 years old, and I think that it uses a simple table lookup (which makes a lot more sense anyway and solves all problems)," he suggested. "Id be very surprised if it had those kinds of beginner mistakes in it, but I dont actually have access to the code, so what do I know? I can look up some BSD code on the Web, it definitely does not do anything like the above," he continued.

Torvalds added that the lack of proper parentheses existed in other places of the original Linux ctype.h file. He said isascii() and toascii() were similarly broken.

"In other words: There are lots of indications that the code was not copied, but was written from scratch. Bugs and all," he said.

Torvalds said he had recently searched Google for the term _ctmp. He said the results were Linux-related. "Doing a Google search for _ctmp -linux shows more Linux pages that just don't happen to have Linux in them, except for one which is the L4 microkernel. And that one shows that they used the Linux header file [since] it still says _LINUX_CTYPE_H in it."

"There is definitely a lot of proof that my ctype.h is original work," Torvalds said.

Steven J. Vaughan-Nichols is editor at large for Ziff Davis Enterprise. Prior to becoming a technology journalist, Vaughan-Nichols worked at NASA and the Department of Defense on numerous major technological projects. Since then, he's focused on covering the technology and business issues that make a real difference to the people in the industry.
