
    Linux Creator Torvalds Details Code Differences

    Written by

    Steven J. Vaughan-Nichols
    Published December 22, 2003

      The SCO Group Inc. recently made some specific claims about programs within Linux that it contends were stolen from the Lindon, Utah, company's Unix intellectual property. Linux founder Linus Torvalds on Monday offered eWEEK.com a number of specific code examples that countered SCO's assessment.


      Here are some specific areas where Torvalds found differences between Unix and Linux:

      Torvalds looked at lib/ctype.c and include/linux/ctype.h.

      “First, some background: The ctype name comes from 'character type,' and the whole point of ctype.h and ctype.c is to test what kind of character we're dealing with,” Torvalds said.

      “In other words, those files implement tests for doing things like asking, 'is this character a digit' or 'is this character an uppercase letter,' etc.

      “So you can write something like:

      if (isdigit(c))
      .. we do something with the digit …

      and the ctype files implement that logic.”
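
To make the usage concrete, here is a minimal sketch built on the standard `<ctype.h>` interface. The helper name `count_digits` is hypothetical and does not come from the article or from the Linux sources.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts the decimal digits in a NUL-terminated string. */
static size_t count_digits(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        /* The cast keeps the argument in the range isdigit() accepts. */
        if (isdigit((unsigned char)*s))
            n++;
    }
    return n;
}
```

Calling `count_digits("a1b22")` returns 3 under this sketch.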

      “Those files exist (in very similar form) in the original Linux-0.01 release under the names lib/ctype.c and include/ctype.h. That kernel was released in September of 1991, and contains no code except for mine and Lars Wirzenius', who co-wrote kernel/vsprintf.c.”

      “In fact, you can look at the files today and 12 years ago, and you can see clearly that they are largely the same: the modern files have been cleaned up and fix a number of really ugly things (tolower/toupper works properly), but they are clearly an incremental improvement on the original one.”

      Torvalds added that the original Linux version did not look like the Unix source. He said it had several similarities, which he attributed to the following reasons:

      • The ctype interfaces are defined by the C standard library.
      • The C standard also specifies what kinds of names a system-library interface can use internally. In particular, the C standard specifies that names that start with an underscore and a capital letter are internal to the library.

        This is important, Torvalds said, because it explains why both the Linux implementation and the Unix implementation used a particular naming scheme for the flags.

      • Algorithmically, there aren't many ways to test whether a character is a number or not. That's especially true in C, where a macro must not use its argument more than once.

        Torvalds provided an example: “The obvious implementation of isdigit() (which tests for whether a character is a digit or not) would be:

        #define isdigit(x) ((x) >= '0' && (x) <= '9')

        but this is not actually allowed by the C standard, because x is used twice.”
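
The practical cost of using the argument twice shows up when the argument has a side effect. The sketch below is illustrative only: `NAIVE_ISDIGIT`, `read_seven` and `evaluations_for_one_test` are hypothetical names, not taken from Linux or Unix.

```c
/* Naive range test in the style quoted above: x appears twice. */
#define NAIVE_ISDIGIT(x) ((x) >= '0' && (x) <= '9')

/* Hypothetical character source with a visible side effect. */
static int reads;

static int read_seven(void)
{
    reads++;            /* count each evaluation of the macro argument */
    return '7';
}

/* Returns how many times the argument ran for a single macro use. */
static int evaluations_for_one_test(void)
{
    reads = 0;
    (void)NAIVE_ISDIGIT(read_seven());
    return reads;
}
```

Because `'7'` passes the first comparison, the second comparison runs as well, so the argument expression is evaluated twice for one test.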

      “This explains why both Linux and traditional Unix use the other obvious implementation: having an array that describes what each of the possible 256 characters are, and testing the contents of that array (indexed by the character) instead. That way the macro argument is only used once,” Torvalds said.

      “This basically explains the similarities. There simply aren't that many ways to do a standard C ctype implementation, in other words,” he said.
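
The table-lookup approach Torvalds describes can be sketched roughly as follows. This is not the Linux or Unix source: the `CT_` and `ct_` names are hypothetical stand-ins for the reserved `_U`/`_L`/`_D` style names, and the table is filled at run time for brevity, whereas a real implementation would more likely ship a statically initialized array.

```c
#include <limits.h>

/* One flag bit per character class (hypothetical names). */
#define CT_UPPER 0x01
#define CT_LOWER 0x02
#define CT_DIGIT 0x04

/* One entry for each possible unsigned char value. */
static unsigned char ct_table[UCHAR_MAX + 1];

/* Sets the given flag on every character listed in chars. */
static void ct_mark(const char *chars, unsigned char flag)
{
    for (; *chars != '\0'; chars++)
        ct_table[(unsigned char)*chars] |= flag;
}

/* Fills the table; calling it more than once is harmless. */
static void ct_init(void)
{
    ct_mark("ABCDEFGHIJKLMNOPQRSTUVWXYZ", CT_UPPER);
    ct_mark("abcdefghijklmnopqrstuvwxyz", CT_LOWER);
    ct_mark("0123456789", CT_DIGIT);
}

/* Each macro uses its argument exactly once, as an array index. */
#define ct_isupper(c) ((ct_table[(unsigned char)(c)] & CT_UPPER) != 0)
#define ct_islower(c) ((ct_table[(unsigned char)(c)] & CT_LOWER) != 0)
#define ct_isdigit(c) ((ct_table[(unsigned char)(c)] & CT_DIGIT) != 0)
#define ct_isalnum(c) \
    ((ct_table[(unsigned char)(c)] & (CT_UPPER | CT_LOWER | CT_DIGIT)) != 0)
```

After one call to `ct_init()`, a test such as `ct_isalnum(ch)` is a single indexed load and a mask, which is why independently written implementations tend to converge on this shape.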

      Torvalds went on to “look at the differences between Linux and traditional Unix.”

      “Both Linux and traditional Unix use a naming scheme of underscore and a capital letter for the flag names. There are flags for 'is upper case' (_U) and 'is lower case' (_L), and surprise, surprise, both Unix and Linux use the same name,” Torvalds said.

      “But think about it: if you wanted to use a short flag name, and you were limited by the C-standard naming, what names would you use? Maybe you'd select U for Upper case and L for Lower case?”

      “Looking at the other flags, Linux uses _D for Digit, while traditional Unix instead uses _N for Number. Both make sense, but they are different,” he continued.

      “I personally think that the Linux naming makes more sense (the function that tests for a digit is called isdigit(), not isnumber()), but on the other hand, I can certainly understand why Unix uses _N: the function that checks for whether a character is alphanumeric is called isalnum(), and that checks whether the character is an upper-case letter, a lower-case letter or a digit (a.k.a. number).”

      “In short: there aren't that many ways you can choose the names, and there is lots of overlap, but it's clearly not 100 percent,” Torvalds said.


      More Code Examples

      “The original Linux ctype.h/ctype.c file has obvious deficiencies, which pretty much point to somebody new to C making mistakes (me) rather than any old and respected source,” Torvalds observed.

      “For example, the toupper()/tolower() macros are just totally broken, and nobody would write the isascii() and toascii() the way they were written in that original Linux. And you can see that they got fixed later on in Linux development, even though you can also see that the files otherwise didn't change,” he said.

      “Remember how C macros must only use their argument once. So let's say that you wanted to change an upper-case character into a lower-case one, which is what tolower() does. Normal use is just a fairly obvious:

      newchar = tolower(oldchar)

      “And the original Linux code does:

      extern char _ctmp;
      #define tolower(c) (_ctmp=c,isupper(_ctmp)?_ctmp+('a'+'A'):_ctmp)

      “This is not very pretty, but notice how we have a temporary character _ctmp (remember that internal header names should start with an underscore and an upper case character, so this is already slightly broken in itself). That's there so that we can use the argument c only once, to assign it to the new temporary, and then later on we use that temporary several times,” he said.

      Torvalds said the reasons this is broken are:

      • It's not thread-safe. “If two different threads try to do this at once, they will stomp on each other's temporary variable.”
      • The argument might be a complex expression, and as such it should really be parenthesized, he said. The above example “gets several valid (but unusual) expressions wrong.”
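
As a rough illustration, the sketch below transcribes the quoted macro under different names and sets it beside one possible repair. Everything here is hypothetical: `demo_ctmp`, `DEMO_ISUPPER`, `OLD_TOLOWER` and `fixed_tolower` are invented for the example, the character constants `'a'` and `'A'` are assumed to be what the quoted macro contained, the range tests assume an ASCII-like character set, and the repair is not claimed to be what Linux later adopted.

```c
/* Shared temporary, standing in for _ctmp in the quoted code. */
static char demo_ctmp;

/* Simple range test; assumes 'A'..'Z' are contiguous, as in ASCII. */
#define DEMO_ISUPPER(c) ((c) >= 'A' && (c) <= 'Z')

/* The quoted macro with only the names changed. The argument c is
 * not parenthesized, the result goes through a shared variable, and
 * the offset is a sum of the two constants, not their difference. */
#define OLD_TOLOWER(c) \
    (demo_ctmp=c,DEMO_ISUPPER(demo_ctmp)?demo_ctmp+('a'+'A'):demo_ctmp)

/* One possible repair: a function evaluates its argument once,
 * needs no shared temporary, and so has no state for threads to
 * overwrite. */
static int fixed_tolower(int c)
{
    if (c >= 'A' && c <= 'Z')
        return c + ('a' - 'A');
    return c;
}
```

With these definitions, `OLD_TOLOWER('A')` does not produce `'a'`, while `fixed_tolower('A')` does, and both leave characters that are not upper-case letters unchanged.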

      According to Torvalds, these are the kinds of mistakes a young programmer would make. “It's classic,” he said.

      “And I bet it's not what the Unix code looked like, even in 1991. Unix by then was 20 years old, and I think that it uses a simple table lookup (which makes a lot more sense anyway and solves all problems),” he suggested. “I'd be very surprised if it had those kinds of beginner mistakes in it, but I don't actually have access to the code, so what do I know? I can look up some BSD code on the Web; it definitely does not do anything like the above,” he continued.

      Torvalds added that the lack of proper parentheses existed in other places of the original Linux ctype.h file. He said isascii() and toascii() were similarly broken.

      “In other words: There are lots of indications that the code was not copied, but was written from scratch. Bugs and all,” he said.

      Torvalds said he had recently searched Google for the term _ctmp. He said the results were Linux-related. “Doing a Google search for _ctmp -linux shows more Linux pages that just don't happen to have Linux in them, except for one which is the L4 microkernel. And that one shows that they used the Linux header file [since] it still says _LINUX_CTYPE_H in it.”

      “There is definitely a lot of proof that my ctype.h is original work,” Torvalds said.

