Dell officials want to enable the “citizen data scientist.”
Dell is rolling out version 13.1 of its Statistica data analytics platform, which adds capabilities in areas such as expanded in-database analytics, improved fraud detection through network analytics, and analytics at the network edge on devices and systems. The edge capability is meant to address the growing amounts of data being generated by the Internet of Things (IoT).
A key part of the new software, which is available now, is making it easier for line-of-business workers to prepare and analyze structured and unstructured data from myriad sources and distribute the results of the analysis. As the data generated by companies continues to skyrocket, demand is growing for tools that let businesses analyze that data quickly and derive useful business information from it.
Data scientists are becoming increasingly important at companies, but skilled data scientists are in short supply. At the same time, line-of-business workers no longer want to bring their data to someone and then have to wait for the results to come back, according to Shawn Rogers, chief research officer at the Information Management Group at Dell. They want the ability to run the analytics themselves on the data they have and then share the results with others.
“Data is at the core of everybody’s company as far as innovation goes,” Rogers said at a recent small event in Boston. “Everybody wants to leverage [data]. Not everybody knows how to do that. This is not about IT anymore. This is coming from business.”
Dell significantly bolstered its big data analytics capabilities in 2014 when it bought StatSoft, a company from Oklahoma with more than 25 years of expertise building analytic software. Soon after the deal, Dell officials changed the name to Statistica and began assessing what they had with their new acquisition. They spent 18 months working on the software and scaling it. In October 2015, Dell released Statistica 13.0, the first real version with Dell’s fingerprints on it, which touched on such areas as the user interface and in-database analytics.
Now comes version 13.1, and that will be followed by several other point upgrades that will build upon what 13.1 offers, according to John Thompson, general manager of Dell Statistica. Version 14.0 will see Dell making the software available in public, private and hybrid clouds, Thompson said during the Boston event.
To highlight the growing importance of data and analytics, Dell officials pointed to research by Gartner analysts who found that by 2018, more than half of large organizations worldwide will “compete using advanced analytics and proprietary algorithms,” and that by 2020, “predictive and prescriptive analytics will attract 40 percent of enterprises’ net new investment in business intelligence and analytics.”
The new version of Statistica has data preparation tools that will enable citizen data scientists to help drive the use of analytics. These are people who “don’t have Ph.D.s in math,” Rogers told eWEEK. “They can’t do algorithms, but they understand the benefits” of analytics.
Using Reusable Process Templates in Statistica, data scientists can create analytic models and workflows and then distribute them to nontechnical users, who can use them to run analytics initiatives and share the results with others, Dell officials said. Those workflow templates can be used repeatedly throughout the organization, making analytics more efficient and solving problems without always having to rely on high levels of technical expertise.
Dell Targets ‘Citizen Data Scientist’ With Statistica 13.1
For Shire, having tools that enable nontechnical employees to run analytics initiatives in repeatable and validated ways is crucial, according to Rob Dimitri, head of process analytics at the global specialty biopharmaceutical company, which develops medicines for people with rare diseases. Traditionally, Shire employees would collect data and manually plug it into spreadsheets or Excel programs, which raised a number of issues, from scalability to validation to compliance, Dimitri told eWEEK.
These challenges not only posed efficiency and cost problems, but also could lead to Shire having to discard a batch of medicine being developed. For a company that produces about 30 batches a year, any lost batch means that medicine doesn’t reach patients. With Statistica, employees have templates and models they can use to run a broad array of analytics initiatives and can then share the results. For Shire executives, this means faster, more relevant business information, processes that can be validated for compliance reasons and a better product in the end. In addition, employees are empowered to more easily get the results and information they need.
“They’re coming to us” with their initiatives, Dimitri said. “We’re not going to them.”
In addition to supporting citizen data scientists, Dell officials also are looking to enable companies to do more IoT analytics at the edge of the network. A number of tech vendors are pushing analytics closer to devices that are generating or storing the data. Having analytics at the network edge enables faster results and response times, and also reduces the amount of data being sent to a central location. Sending all the IoT-generated data over the network to a central place for analysis—and then sending the results back to the edge—not only costs a lot in bandwidth, but also opens up the data to security problems.
In conjunction with Dell’s Boomi business, Statistica can put what officials are calling “analytic atoms” into any edge device or gateway, including Dell’s Edge Gateway 5000 Series, announced in October 2015. The technology enables the decision about what data needs to be sent back to a central repository to be made near the network edge.
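To make the edge-filtering idea concrete, here is a minimal Python sketch of a gateway deciding locally which sensor readings are worth forwarding to a central repository. It is purely illustrative and does not reflect how Statistica or Boomi "analytic atoms" are implemented; the function name `select_for_upload`, the z-score rule and the threshold of 2.0 are all assumptions chosen for the example.

```python
from statistics import mean, pstdev


def select_for_upload(readings, z_threshold=2.0):
    """Return only the readings that deviate strongly from the batch mean.

    Hypothetical edge-side rule: routine readings stay on the device,
    and only outliers are forwarded for central analysis.
    """
    if len(readings) < 2:
        # Too little data to judge; forward whatever is there.
        return list(readings)
    mu = mean(readings)
    sigma = pstdev(readings)
    if sigma == 0:
        # Every reading is identical, so nothing stands out.
        return []
    return [r for r in readings if abs(r - mu) / sigma > z_threshold]


# One batch of temperature readings collected at the gateway (made-up values).
batch = [20.1, 20.3, 19.9, 20.0, 20.2, 35.7, 20.1, 19.8]
to_send = select_for_upload(batch)
print(f"Forwarding {len(to_send)} of {len(batch)} readings: {to_send}")
```

In this sketch, only the one anomalous reading leaves the device, which is the bandwidth and exposure saving the edge approach is aiming for.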
Other capabilities in Statistica 13.1 include an extension of the company’s Native Distributed Analytics Architecture (NDAA) technology to provide in-database analytics to a wider range of databases. Along with Microsoft SQL Server, users can now run in-database analytics on Apache Hive (on Spark), MySQL, Teradata and Oracle databases. The new version also combines predictive analytics with human expertise for improved network analytics, officials said.
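The general idea behind in-database analytics is that the computation runs inside the database engine and only the results travel back to the analyst, rather than the raw rows. The Python sketch below illustrates that idea with the standard library's `sqlite3` module standing in for a production database. It is not NDAA or any Statistica interface, and the `transactions` table and its values are invented for the example.

```python
import sqlite3

# An in-memory SQLite database stands in for a production data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (account TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [("a", 10.0), ("a", 30.0), ("b", 5.0), ("b", 15.0), ("b", 40.0)],
)

# The aggregation runs inside the database engine; only one summary row
# per account comes back, instead of every transaction row.
summary = conn.execute(
    "SELECT account, COUNT(*), AVG(amount) "
    "FROM transactions GROUP BY account ORDER BY account"
).fetchall()

for account, n, avg_amount in summary:
    print(account, n, avg_amount)
```

With five rows the difference is trivial, but the same pattern avoids moving millions of rows out of the database just to compute a handful of summary figures.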
There also are improved visualization dashboards, an upgraded Web UI and enhanced validated data entry, they said.