    VorteXML Turns Data Into XML

    Written by

    Timothy Dyck
    Published January 20, 2003

      Datawatch Corp.'s new VorteXML Server 1.0, which started shipping last month, provides a flexible template-based system to extract data from the straw of undifferentiated text files and turn it into XML gold.

      The server's sweet spot is with organizations that have collections of plain-text or HTML files (such as invoices, reports, confirmation e-mail messages or log files) that they want to turn into the more usable XML data format.

      However, eWeek Labs' tests also found a number of important limitations that make this product more difficult to deploy than it should be and could point users toward products from close rivals Whitehill Technologies Inc. and ItemField Inc.

      Our main concern is that the server generates no warnings when it detects formatting errors in input text files. For example, a numeric value lost its cents digits when we added a dollar sign before the value and was silently rounded down to the nearest 10th-of-a-dollar value—despite our tagging the text data as a numeric value with two decimal places.

      Data fields not having the correct input format in test files (we tried with both numeric and date fields) were just skipped, leaving empty elements in our output XML. As a result of our feedback, Datawatch will add a key error-checking feature to its next VorteXML Server release, enabling administrators to prevent the generation of empty elements or attributes.
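
The failure mode described above, where a malformed value is silently truncated or skipped, is the kind of problem a strict field parser can surface. The following is a minimal Python sketch of that idea, not VorteXML's behavior or API; the names `parse_money` and `convert_fields` and the exact two-decimal rule are assumptions made for illustration.

```python
import re
from decimal import Decimal

# Hypothetical rule for illustration: digits, a decimal point, exactly two decimals.
MONEY_PATTERN = re.compile(r"\d+\.\d{2}")


def parse_money(raw: str) -> Decimal:
    """Return the amount as a Decimal, or raise ValueError instead of guessing."""
    text = raw.strip()
    if not MONEY_PATTERN.fullmatch(text):
        raise ValueError(f"not a two-decimal numeric value: {raw!r}")
    return Decimal(text)


def convert_fields(fields: dict[str, str]) -> tuple[dict[str, Decimal], list[str]]:
    """Parse every field, collecting warnings rather than emitting empty values."""
    parsed, warnings = {}, []
    for name, raw in fields.items():
        try:
            parsed[name] = parse_money(raw)
        except ValueError as exc:
            # Record the problem so an administrator can see which field failed.
            warnings.append(f"{name}: {exc}")
    return parsed, warnings
```

With this approach, an input such as `"$1.60"` produces a warning naming the field instead of a rounded value or an empty element.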

      VorteXML Server also has little flexibility in the data formats or platforms it supports. Only ASCII or ANSI text files can be imported; it lacks input filters for nontext data types such as Microsoft Corp.'s Word documents, rich-text-format documents or Adobe Systems Inc.'s PDF. ItemField's ContentMaster has more flexibility in this area.

      Although VorteXML Server supports two older XML metadata formats (Document Type Definition and XML Data Reduced), it doesn't support the current standard, the much more powerful XML Schema. Text, numeric and date data types are supported, and XML Schema support is planned for a future update.

      The server is moderately priced at $7,999 per server for up to two CPUs and $1,999 for each additional two CPUs. A copy of Datawatch's $599 VorteXML Windows-based desktop text file-to-XML conversion tool is required to create the import templates VorteXML Server uses.

      VorteXML Server needs the full Microsoft stack: Windows 2000 or higher and SQL Server 7.0 or later. (A copy of the free Microsoft Data Engine is included for those who don't already have a copy of SQL Server.) Microsoft's IIS (Internet Information Services) is required if VorteXML Server's Simple Object Access Protocol interface is used.

      Converting nonstructured formats such as text files into a structured format such as XML is inherently a hard problem to solve. VorteXML Server's strongest feature is its VorteXML desktop tool, which uses an intuitive, flexible “painting” system to highlight data fields in input text files. (VorteXML can do the XML conversion itself but only on a single input file.)

      VorteXML provides a mechanism to identify data fields through a combination of nearby text field labels, delimiters and absolute line position. It also has an expression language (although not a full programming language) to perform variable manipulation.
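
To make the label-based approach concrete, here is a minimal Python sketch that finds values by the text label preceding them and writes them out as XML. It illustrates the general technique only and is not VorteXML's template format; the `TEMPLATE` mapping, the label strings, the `extract_to_xml` function and the sample invoice are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical template: XML element name -> the text label that precedes the value.
TEMPLATE = {
    "invoice_number": "Invoice No:",
    "customer": "Customer:",
    "total": "Total Due:",
}


def extract_to_xml(text: str, template: dict[str, str]) -> ET.Element:
    """Build an <invoice> element from the values found after each label."""
    root = ET.Element("invoice")
    for element_name, label in template.items():
        # Capture everything after the label up to the end of that line.
        match = re.search(re.escape(label) + r"[ \t]*(.*)", text)
        if match is None:
            raise ValueError(f"label not found: {label!r}")
        ET.SubElement(root, element_name).text = match.group(1).strip()
    return root


sample = """ACME SUPPLY
Invoice No: 10042
Customer:   Jane Roe
Total Due:  19.95
"""

print(ET.tostring(extract_to_xml(sample, TEMPLATE), encoding="unicode"))
```

A real template would also need the delimiter and line-position rules the review mentions; this sketch covers only the label case.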

      VorteXML handles HTML data in an unusual way: HTML files are preparsed to extract only text between tags, and this text is marked with a tag sequence number generated by VorteXML. The sequence number makes it easy to select items that appear only once in a file, but we had to resort to trickery to extract a list of items without the list's column heading (which, with the loss of its heading tag, was not differentiated from the list items). Preserving tag metadata, such as tag type or attribute values, would make this process easier.
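
The effect of discarding tag metadata can be shown with a short Python sketch built on the standard library's `html.parser`. This is an approximation of the behavior described above, not VorteXML's actual preparser; the `TextSequencer` class, the `sequence_text` function and the 1-based numbering are assumptions.

```python
from html.parser import HTMLParser


class TextSequencer(HTMLParser):
    """Collect text between tags, numbering each piece in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.items: list[tuple[int, str]] = []

    def handle_data(self, data: str) -> None:
        text = data.strip()
        if text:  # skip whitespace-only runs between tags
            self.items.append((len(self.items) + 1, text))


def sequence_text(html: str) -> list[tuple[int, str]]:
    """Return (sequence number, text) pairs with all tag information dropped."""
    parser = TextSequencer()
    parser.feed(html)
    parser.close()
    return parser.items


page = (
    "<table><tr><th>Item</th></tr>"
    "<tr><td>Widget</td></tr><tr><td>Gadget</td></tr></table>"
)
print(sequence_text(page))
# prints [(1, 'Item'), (2, 'Widget'), (3, 'Gadget')]
```

Once the `th` and `td` tags are gone, nothing in the output marks "Item" as a heading rather than a list entry, which is the difficulty the review describes.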

      Once we had a template created, we used VorteXML Server's management console to define a conversion project with input and output file directories and an associated conversion template.

      Conversion was a beautifully simple matter of just dropping input files into the input directory. The new files were automatically detected and moved to a processed directory; matching XML files showed up shortly thereafter in the output directory. The ability to put output data directly into a relational database would be a good future addition.
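
The drop-directory workflow described above can be approximated in a few lines of Python. This is a sketch of the general pattern, not how VorteXML Server is implemented; the function name `process_drop_directory`, the `.txt` filter and the `convert` callback are hypothetical.

```python
import shutil
from pathlib import Path
from typing import Callable


def process_drop_directory(
    input_dir: Path,
    processed_dir: Path,
    output_dir: Path,
    convert: Callable[[str], str],
) -> list[Path]:
    """Convert each .txt file in input_dir once, then move it to processed_dir."""
    processed_dir.mkdir(parents=True, exist_ok=True)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for source in sorted(input_dir.glob("*.txt")):
        target = output_dir / (source.stem + ".xml")
        target.write_text(convert(source.read_text(encoding="utf-8")), encoding="utf-8")
        # Moving the source out of the input directory marks it as done.
        shutil.move(str(source), str(processed_dir / source.name))
        written.append(target)
    return written
```

Calling this function on a timer (for example, in a loop with `time.sleep`) gives the automatic pickup behavior; a production version would also need to handle files that are still being written when the scan runs.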

      Performance was slow in tests using VorteXML Server's included Microsoft Data Engine database: Converting 100 files took 33 minutes on a dual Intel Corp. Pentium III server (results that Datawatch didn't see in its replications of our tests). Switching to SQL Server 2000 reduced our processing time to 2 minutes, more than an order of magnitude faster.
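
For reference, the per-file times and the speedup implied by those figures work out as follows. The calculation uses only the numbers reported above; the variable names are illustrative.

```python
FILES = 100
MSDE_MINUTES = 33        # run using the included Microsoft Data Engine
SQL_SERVER_MINUTES = 2   # run using SQL Server 2000

msde_seconds_per_file = MSDE_MINUTES * 60 / FILES
sql_seconds_per_file = SQL_SERVER_MINUTES * 60 / FILES
speedup = MSDE_MINUTES / SQL_SERVER_MINUTES

print(f"Microsoft Data Engine: {msde_seconds_per_file:.1f} s/file")
print(f"SQL Server 2000: {sql_seconds_per_file:.1f} s/file")
print(f"Speedup: {speedup:.1f}x")
```

That is roughly 19.8 seconds per file versus 1.2 seconds per file, a 16.5-fold difference.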

      West Coast Technical Director Timothy Dyck is at [email protected].

      Executive Summary: VorteXML Server 1.0

      Usability: Excellent
      Capability: Good
      Performance: Good
      Interoperability: Poor
      Manageability: Fair
      Scalability: Fair
      Security: Good

      VorteXML makes what can be a difficult job—turning text data into XML—straightforward. The tool is easy to use and will do the hoped-for job in many situations. However, the 1.0 release has a significant number of functional gaps that make it difficult for administrators to detect when input text files contain formatting errors.

      COST ANALYSIS

      At $8,000 per server, VorteXML isn't that expensive, but for one-off jobs we would turn first to text processing languages such as Perl, sed or awk.

      (+) Easy, powerful text file template definition tool and expression language; automatic file-based import system eases data input; Web services interface.

      (-) No mechanism to alert administrators to bad data in import files; no import filters included, so only straight text files can be imported; HTML files are parsed in a way that discards most tag metadata; doesn't support XML Schema; requires Windows 2000, IIS and Microsoft SQL Server (or Microsoft Data Engine).

      EVALUATION SHORT LIST

      • In-house development of small programs to do text transformation
      • Whitehill Technologies' xml Transport
      • ItemField's ContentMaster
      • Data Junction Corp.'s Data Junction Content Extractor
      • vortexml.datawatch.com
      Timothy Dyck
      Timothy Dyck is a Senior Analyst with eWEEK Labs. He has been testing and reviewing application server, database and middleware products and technologies for eWEEK since 1996. Prior to joining eWEEK, he worked at the LAN and WAN network operations center for a large telecommunications firm, in operating systems and development tools technical marketing for a large software company and in the IT department at a government agency. He has an honors bachelor's degree of mathematics in computer science from the University of Waterloo in Waterloo, Ontario, Canada, and a master of arts degree in journalism from the University of Western Ontario in London, Ontario, Canada.
