Shopify’s growth is unparalleled in the e-commerce industry. Taken together, the platform’s merchants would rank as the 7th largest company in the world by revenue, ahead of companies like Apple and Volkswagen.
The company has been at the forefront of machine learning (ML) adoption for anomaly detection and forecasting. That’s particularly impressive given that up to 85% of ML projects ultimately fail to deliver on their initial promises and goals.
At the Cognilytica “Data for the AI Community” seminar, Dr. Ella Hilal, head of data science at Shopify, explained how the e-commerce titan has used ML to scale so effectively. Let’s take a look.
Step 1: Be Clear on Your Forecasting Needs
The fundamental step of leveraging ML is simple enough: Gain a top-level understanding of your forecasting needs.
Dr. Hilal suggests a few questions business leaders can ask to understand their forecasting needs:
- What decisions will be informed by this information?
- When are you making these decisions?
- How far into the future are you looking to forecast?
- How often are you making these decisions?
Keep this decision-first approach in mind: it applies at every later step of building the model.
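To make the four questions concrete, a team might record the answers as a small config object that later steps, such as setting the horizon or the retraining cadence, can read from. The sketch below is a hypothetical illustration, not Shopify’s tooling. The `ForecastSpec` class, its field names, and the coverage-gap check are all assumptions chosen for clarity.

```python
from dataclasses import dataclass


@dataclass
class ForecastSpec:
    """Answers to the four scoping questions, captured as a config object.

    Field names are hypothetical; they are not taken from Shopify's tooling.
    """
    decision: str              # What decision will this forecast inform?
    decision_day: str          # When is the decision made? (e.g. "Monday")
    horizon_days: int          # How far into the future must the forecast reach?
    cadence_days: int          # How often is the decision made (and the forecast refreshed)?
    granularity_days: int = 1  # Length of one forecast period: 1 = daily, 7 = weekly

    def periods_to_forecast(self) -> int:
        # Ceiling division, so a partial final period is still covered.
        return -(-self.horizon_days // self.granularity_days)

    def has_coverage_gap(self) -> bool:
        # If forecasts are refreshed less often than they reach ahead,
        # some days between refreshes are left without any forecast.
        return self.cadence_days > self.horizon_days


spec = ForecastSpec(
    decision="weekly support staffing",
    decision_day="Monday",
    horizon_days=28,
    cadence_days=7,
    granularity_days=7,
)
print(spec.periods_to_forecast())  # 4
print(spec.has_coverage_gap())     # False
```

The gap check reflects one practical link between the questions. A forecast refreshed every `cadence_days` must reach at least that far ahead, or some decisions will be made without a current forecast.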
Step 2: Understand Your Data
Dr. Hilal framed it clearly: “What data sources are available and what are their properties?”
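Understanding the properties of your data sets is a key step in forecasting. Ask foundational questions, such as whether the data is univariate (a single series) or multivariate (several related series). It’s also important to detect non-stationary behavior in your time-series data, such as a shifting trend or changing variance, and to account for it, for example by differencing the series.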
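As a rough illustration of detecting and handling non-stationarity, the sketch below compares the mean and variance of a series’ two halves. It then applies first differencing, a common way to remove a trend. The halves heuristic and its thresholds are illustrative assumptions. Formal tests such as the augmented Dickey-Fuller (ADF) or KPSS tests are the more rigorous choice.

```python
from statistics import mean, pstdev


def looks_nonstationary(series, mean_tol=0.5, var_ratio_tol=2.0):
    """Crude heuristic: compare the mean and variance of the first and second halves.

    The thresholds are illustrative assumptions, not established cut-offs.
    """
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    scale = pstdev(series) or 1.0  # avoid dividing by zero on a flat series
    mean_shift = abs(mean(first) - mean(second)) / scale
    low, high = sorted((pstdev(first) ** 2, pstdev(second) ** 2))
    var_ratio = high / low if low > 0 else (1.0 if high == 0 else float("inf"))
    return mean_shift > mean_tol or var_ratio > var_ratio_tol


def difference(series, lag=1):
    """First differencing (or seasonal differencing with lag=7, etc.): y[t] - y[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


sales = [100 + 5 * t for t in range(28)]  # hypothetical steadily growing daily sales
print(looks_nonstationary(sales))              # True
print(looks_nonstationary(difference(sales)))  # False
```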
“Nobody ever regrets spending the first few cycles of their effort toward a big model that informs big decisions by double-clicking on what data they have.”
In other words, it’s in your best interest to stay curious. Knowing which data sources are not available is just as important as knowing which ones are.
Understanding your data requires a top-level view of not only your operations but also the market you operate in. This foundational knowledge is what lets data scientists pick out anomalies and identify the specific reasons behind them.
Step 3: Choose the Right Forecasting Method
This is where the actionable steps behind forecasting begin. Dr. Hilal listed several properties a forecasting model must have. It should:
- Be easy to interpret.
- Be easy to tune.
- Be relatively fast.
- Produce forecasts automatically.
- Handle multiple types of seasonality and recurring events.
- Handle missing observations and large outliers.
- Handle historical trend changes.
Shopify used Facebook Prophet, an open-source forecasting library built around an additive regression model. The team chose it specifically for its scalability and its ability to generate forecasts quickly over millions of data points.
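Prophet describes a series as a sum of components: trend, seasonality, holidays, and noise. The sketch below is a deliberately tiny, dependency-free stand-in for that additive idea, not Prophet itself. It fits a straight-line trend by least squares and a per-weekday seasonal offset, then adds them back together to forecast. It has none of Prophet’s changepoints, holiday effects, or Fourier-series seasonality, and the weekly pattern in the example is made up.

```python
def fit_additive(y, period=7):
    """Fit a toy additive model y(t) = trend(t) + seasonal(t % period).

    Requires at least one full cycle of data (len(y) >= period).
    """
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    # Trend: ordinary least-squares line a + b*t.
    b = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y)) / sum(
        (t - t_mean) ** 2 for t in range(n)
    )
    a = y_mean - b * t_mean
    # Seasonality: average detrended residual at each position in the cycle.
    buckets = [[] for _ in range(period)]
    for t, v in enumerate(y):
        buckets[t % period].append(v - (a + b * t))
    seasonal = [sum(r) / len(r) for r in buckets]
    return a, b, seasonal


def predict_additive(model, start, horizon):
    """Forecast time steps start .. start + horizon - 1 by adding the components."""
    a, b, seasonal = model
    period = len(seasonal)
    return [a + b * t + seasonal[t % period] for t in range(start, start + horizon)]


weekly = [0, 0, 0, 0, 0, 5, -5]  # hypothetical weekend bump and dip
history = [10 + 2 * t + weekly[t % 7] for t in range(28)]
model = fit_additive(history)
print([round(v, 1) for v in predict_additive(model, start=28, horizon=7)])
```

Because each component is estimated and stored separately, the forecast stays easy to interpret: you can read off how much of a prediction comes from trend and how much from the day of the week.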
Another key decision when selecting a forecasting model is whether to take a top-down or bottom-up approach. Top-down approaches forecast top-line metrics first and then break them down by their underlying drivers. Bottom-up approaches forecast the individual drivers first and aggregate them into the top line. Other questions to consider when selecting a model include:
- Single ML model or an ensemble approach?
- How often do you plan to retrain your model?
- Did you spend enough time tuning? Did you overfit?
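The difference between the two approaches is easiest to see on a two-level hierarchy. In this minimal sketch, bottom-up sums per-driver forecasts into a top line. Top-down splits a top-line forecast using each driver’s historical share of the total, which is only one of several possible ways to disaggregate. The channel names and numbers are hypothetical.

```python
def bottom_up(driver_forecasts):
    """Forecast each driver separately, then sum them into the top-line series."""
    return [sum(step) for step in zip(*driver_forecasts.values())]


def top_down(total_forecast, driver_history):
    """Forecast the top line, then split it by each driver's historical share."""
    totals = {name: sum(hist) for name, hist in driver_history.items()}
    grand_total = sum(totals.values())
    shares = {name: total / grand_total for name, total in totals.items()}
    return {name: [share * f for f in total_forecast] for name, share in shares.items()}


# Hypothetical drivers: sales through two channels.
print(bottom_up({"online": [70, 75], "in_store": [28, 31]}))  # [98, 106]
history = {"online": [60, 70, 80], "in_store": [40, 30, 20]}
print(top_down([100, 110], history))
```

Bottom-up preserves driver-level detail but can be noisy when individual drivers are sparse. Top-down is smoother but assumes the historical mix holds.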
Step 4: Manage Anomaly Bias
“Anomalies are not all bad,” said Dr. Hilal. In fact, if an anomaly can be explained, and especially if it recurs, data scientists may want to amplify it to learn from it.
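One way to act on this is to flag anomalies in a model’s residuals and then check whether they recur at the same point in each cycle. A recurring anomaly suggests an event worth modeling (such as a yearly sale) rather than noise to discard. The z-score threshold, `period`, and `min_cycles` values below are illustrative assumptions. Note that large spikes inflate the standard deviation, so a robust alternative such as a median-based score may work better on real data.

```python
from statistics import mean, pstdev


def flag_anomalies(residuals, z=3.0):
    """Return indices whose residual lies more than z standard deviations from the mean."""
    mu, sd = mean(residuals), pstdev(residuals)
    if sd == 0:
        return []
    return [i for i, r in enumerate(residuals) if abs(r - mu) / sd > z]


def split_recurring(anomalies, period, min_cycles=2):
    """Separate anomalies that hit the same position in several cycles
    (candidate recurring events worth modeling) from one-off spikes."""
    hits = {}
    for i in anomalies:
        hits[i % period] = hits.get(i % period, 0) + 1
    recurring = [i for i in anomalies if hits[i % period] >= min_cycles]
    one_off = [i for i in anomalies if hits[i % period] < min_cycles]
    return recurring, one_off


# Hypothetical forecast residuals: three cycles of 20 steps, spikes at 5, 25, 45 and 33.
residuals = [0.0] * 60
for i in (5, 25, 33, 45):
    residuals[i] = 10.0
flags = flag_anomalies(residuals)
print(split_recurring(flags, period=20))  # ([5, 25, 45], [33])
```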
A large part of managing anomaly bias is studying your data across time and determining which time frames are indicative of future behavior. Shopify had a massive amount of data to choose from, with very clear annual cycles. Even so, the team decided that the last three months were the most indicative of upcoming performance.
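The choice of training window can be made empirically by backtesting candidate lookbacks on a holdout period. The sketch below is an illustration built on simplifying assumptions, not Shopify’s procedure. It uses a naive mean forecast and a single holdout split, whereas a real setup would use the production model and several rolling forecast origins. The series is synthetic, with a level shift so that the recent quarter is more representative than the full year.

```python
def mean_forecast(history, lookback, horizon):
    """Naive forecast: repeat the mean of the last `lookback` observations."""
    window = history[-lookback:]
    level = sum(window) / len(window)
    return [level] * horizon


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def best_lookback(series, candidates, horizon):
    """Hold out the last `horizon` points; return the lookback with the lowest MAE."""
    train, test = series[:-horizon], series[-horizon:]
    scores = {k: mae(test, mean_forecast(train, k, horizon)) for k in candidates}
    return min(scores, key=scores.get), scores


# Hypothetical daily series with a level shift 120 days before the end.
series = [100.0] * 300 + [150.0] * 120
best, scores = best_lookback(series, candidates=(90, 365), horizon=14)
print(best)  # 90
```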
Dr. Hilal also noted that 2020 was clearly an anomalous year. But businesses and data scientists should expect anomalies and unique trends to keep appearing, which is why Shopify accounted for 2021’s unique trends as well.
Step 5: Automate the Model
Finally, like any other ML model in production, a forecasting model is most effective when it is automated.
This automation should address a few questions:
- How is your training data stored? Raw or transformed?
- How will you retrieve the data for training?
- How will you retrieve data for prediction?
- How will you get feedback from the model in production? Are there systematic alerts?
- How are you tracking drift?
Remember, you ultimately want your model to scale and to run with little intervention; automation is what makes that independent operation possible.
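For drift tracking, one widely used metric is the Population Stability Index (PSI). It compares the distribution of recent values (inputs, predictions, or residuals) against a reference window. The sketch below is a minimal standard-library implementation. The `drift_alert` helper and its 0.25 threshold are hypothetical. The threshold is loosely based on a commonly quoted rule of thumb that PSI above roughly 0.25 signals a significant shift, and it should be tuned to your data.

```python
import bisect
import math


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index of `current` against a `reference` sample.

    Bin edges are the reference sample's quantiles; eps keeps empty bins out of log(0).
    """
    ref = sorted(reference)
    # bins - 1 interior cut points at evenly spaced reference quantiles.
    cuts = [ref[len(ref) * k // bins] for k in range(1, bins)]

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            counts[bisect.bisect_right(cuts, x)] += 1
        return [max(c / len(sample), eps) for c in counts]

    expected, actual = proportions(reference), proportions(current)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


def drift_alert(reference, current, threshold=0.25):
    """Hypothetical alert rule: flag drift when PSI exceeds the threshold."""
    return psi(reference, current) > threshold


baseline = [float(x) for x in range(100)]  # hypothetical reference values, e.g. last quarter's residuals
same = list(baseline)
shifted = [x + 50.0 for x in baseline]
print(psi(baseline, same))              # 0.0
print(drift_alert(baseline, shifted))   # True
```

In production, a check like this would typically run on a schedule and feed the systematic alerts mentioned above.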
Conclusion: Understand Your Decisions
It’s clear that the process behind leveraging ML for forecasting and anomaly detection at scale starts with a top-level understanding of your operations, needs, and industry.
In each step, Dr. Hilal emphasized the importance of stepping back and understanding the “what,” “how,” and “when” behind each decision.