Data at Instabee
Instabee simplifies the way consumers and merchants ship and receive parcels. With a presence in six countries, and serving thousands of online merchants, including ASOS, Zalando, Inditex, and H&M, Instabee is on track to become the leading European e-commerce platform, reaching more than 45 million consumers across Europe.
Instabee takes a technology-first approach to online shipping, and near real-time data drives key processes and decisions: from operational data, such as lead times, volumes, and forecasts, to financial data on which activities generate revenue and which do not.
| Industry | Data stack | Company size |
|---|---|---|
| Logistics, eCommerce | Snowflake, dbt Cloud, SYNQ, Tableau | 1,600+ |
Key challenges
- Missing or inaccurate data frequently affected the dashboards and operational KPIs used by terminal managers, putting merchant retention at risk
- Instabee had no central documentation or visibility into data assets, code, and how they connect, which created a lack of trust
Key wins
- By combining dbt tests with SYNQ anomaly monitors, Instabee significantly reduced issue detection time and now resolves most operational issues within 5 minutes
- Established clear data product ownership and a live health overview. Teams across business functions, like finance and operations, are directly notified of relevant data issues
- Migrating critical models into dbt reduced processing time from 8 hours to 1 hour for key models, enhancing efficiency and enabling faster insights for the business
“If we lose too many merchants due to poor data quality, we lose business, and then our jobs. It’s that simple.” – Head of Data
Adopting dbt Cloud and SYNQ to build reliable operational data products
After the Instabox and Budbee merger, Instabee wanted a best-in-class, modern data platform and adopted Snowflake as the data warehouse, Fivetran for ingesting data, Tableau for BI, dbt Cloud for data modeling and documentation, and SYNQ for data observability.
“We have data people working across the spectrum – from analysts doing ad-hoc and strategic analysis, analytics engineers building data models, data scientists running ML models and forecasting algorithms, and tech teams relying on data. Our platform needed to be able to support all of these stakeholders.” – Josefin, Analytics Engineer
Building reliable, well-architected, and documented data models in dbt
The team knew right away that they wanted dbt Cloud as part of the stack. Today, all central reporting and data modeling rely on dbt Cloud. Dozens of jobs run everything from the daily production run to forecasting models.
Instabee has taken a deliberate approach to data modeling, creating consistent layers in the data architecture to make sure sources, staging models, and data marts are distinguishable. “This makes it faster to add new models and easier to reason about potential root causes and see how different tables connect in the lineage.”
dbt Cloud is the control plane to develop, deploy, and document data models. dbt Explorer is the source of truth for understanding and exploring data models.
“We are rigorous in documenting our data models and fields in dbt metadata and use dbt Explorer across the company to expose documentation. This helps everyone relying on data go to one place to find the information they need,” Josefin says. As a result, Instabee has significantly improved data trust and transparency for everyone outside the data team.
The development workflow has sped up drastically since the team adopted dbt Cloud. The team relies heavily on version control to review new changes, SQLFluff linting for faster development, CI/CD to make sure changes are tested before deployment, and dbt Explorer’s built-in lineage to understand dependencies.
“We recently moved a ‘black box’ critical finance model into dbt. Since then, everyone can see all dependencies in the lineage, and we’ve reduced the time it takes to build the model from 8 hours to 1 hour, helping us save money and reduce the time to insight.”
Using data products to manage business-critical data
“Data products have become the lens through which we evaluate and reason about our most important business processes in data. We group data products into areas such as BI and Finance and can instantly see if there are any errors on or upstream of data products,” says Josefin.
“The Data Product overview in SYNQ is the first page we open each morning to check if all the nightly runs have run successfully or if there are any errors across dbt and SYNQ anomaly monitors impacting our key data products.”
“Each data product has a priority ranging from P1 to P3 based on its importance and we use that to decide how urgently we treat issues. With this at hand, the data team brings transparency to the business – from discovering and understanding data assets in dbt Explorer to seeing a live overview of the health of data products in SYNQ.”
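To make the priority idea concrete, here is a minimal sketch of how open issues could be ordered by data product priority. It is an illustration only, not how SYNQ or Instabee implement it: the `Issue` class, the `triage` function, and the data product names are all hypothetical. Only the P1 to P3 range comes from the text above.

```python
from dataclasses import dataclass

# Assumption: a lower number means more urgent. The source only states
# that priorities range from P1 to P3.
PRIORITY_ORDER = {"P1": 1, "P2": 2, "P3": 3}


@dataclass(frozen=True)
class Issue:
    data_product: str  # hypothetical data product name
    priority: str      # "P1", "P2", or "P3"
    description: str


def triage(issues: list[Issue]) -> list[Issue]:
    """Return issues ordered so the most important data products come first."""
    return sorted(issues, key=lambda issue: PRIORITY_ORDER[issue.priority])


# Illustrative issues; none of these names come from Instabee.
open_issues = [
    Issue("marketing_attribution", "P3", "row count lower than usual"),
    Issue("finance_revenue", "P1", "nightly model run failed"),
    Issue("terminal_dashboard", "P2", "freshness delay"),
]

for issue in triage(open_issues):
    print(f"{issue.priority} {issue.data_product}: {issue.description}")
```

Because Python's `sorted` is stable, issues with the same priority keep the order in which they were reported.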
Data products are closely tied to ownership. The Analytics Engineering team is notified of issues on core models in the data warehouse. But ownership isn’t limited to just the data team. The finance team is notified in the #finance-data-quality-monitoring Slack channel if there are issues with finance data products. They’ve also extended it to operational use cases where key models rely on manual input data from spreadsheets for fuel data – input issues on these spreadsheets trigger not_null or unique dbt test errors and are routed directly to the operations manager responsible for the spreadsheets.
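The ownership routing described above can be pictured as a simple lookup from a data product area to whoever should be notified. The sketch below is an assumption-laden illustration, not SYNQ's configuration format: `ROUTES`, `DEFAULT_ROUTE`, `route_failure`, and the model name `fuel_input_sheet` are hypothetical, and only the finance Slack channel name is taken from the text.

```python
# Hypothetical routing table: data product area -> notification target.
# Only the finance channel name comes from the article; the other
# targets are placeholders for "operations manager" and the
# Analytics Engineering team.
ROUTES = {
    "finance": "#finance-data-quality-monitoring",
    "operations": "operations-manager",
    "core": "analytics-engineering",
}

# Assumption: anything without an explicit owner falls back to the data team.
DEFAULT_ROUTE = "analytics-engineering"


def route_failure(area: str, model: str, test: str) -> tuple[str, str]:
    """Return (target, message) for a failed test on a model in a given area."""
    target = ROUTES.get(area, DEFAULT_ROUTE)
    message = f"{test} failed on {model}"
    return target, message


# A not_null failure on a (hypothetical) fuel spreadsheet model goes to
# the operations owner rather than to the data team.
target, message = route_failure("operations", "fuel_input_sheet", "not_null")
print(f"notify {target}: {message}")
```

Keeping the mapping in one place means a new business owner can be added without touching the alerting logic.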
The ‘Data Product Reliability Workflow’ with dbt Cloud and SYNQ
Early on, the data team at Instabee knew they wanted data tooling on par with what they had in engineering – especially when detecting and resolving issues fast and learning from incidents.
Reduced time to detection – “We take testing seriously and learned that the best way to catch both ‘known unknowns’ and ‘unknown unknowns’ was to combine dbt tests with automated SYNQ anomaly monitors. Today, we run more than 1,000 dbt tests each day and combine that with 600 SYNQ anomaly monitors running on key sources and tables every 30 minutes. Combined, these help us be the first to know about issues in most cases”.
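The difference between the two kinds of checks can be sketched in a few lines. Explicit rules such as not_null and unique catch problems the team already knows to look for, while a statistical monitor flags values that deviate from recent history without a rule being written in advance. The code below is a simplified illustration under stated assumptions: the z-score check is a stand-in and not SYNQ's actual anomaly detection, and the function names, column names, and sample data are hypothetical.

```python
from collections import Counter
from statistics import mean, stdev


def not_null_failures(rows: list[dict], column: str) -> int:
    """Count rows where the column is missing or None (a 'known unknown')."""
    return sum(1 for row in rows if row.get(column) is None)


def unique_failures(rows: list[dict], column: str) -> int:
    """Count distinct non-null values that appear more than once."""
    counts = Counter(row[column] for row in rows if row.get(column) is not None)
    return sum(1 for n in counts.values() if n > 1)


def is_anomalous(history: list[float], latest: float, threshold: float = 3.0) -> bool:
    """Flag the latest value if it is more than `threshold` standard
    deviations from the historical mean (an 'unknown unknown')."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > threshold


# Illustrative data only.
rows = [
    {"parcel_id": 1, "fuel_liters": 40.0},
    {"parcel_id": 2, "fuel_liters": None},
    {"parcel_id": 2, "fuel_liters": 35.5},
]
daily_row_counts = [1000, 1020, 990, 1010, 1005]

print(not_null_failures(rows, "fuel_liters"))
print(unique_failures(rows, "parcel_id"))
print(is_anomalous(daily_row_counts, 400))
```

A sudden drop in row count passes both rule-based checks, since no value is null or duplicated, which is why the two approaches complement each other.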
Reduced time to resolution – The restructured dbt architecture makes it easier to trace back issues to source systems and model our data. The combination of dbt Cloud and SYNQ has significantly sped up debugging workflows. “Especially SYNQ’s column-code lineage and the ability to select multiple columns gives us a really good idea about how everything is connected,” says Josefin.
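Tracing an issue back to its source amounts to walking the lineage graph upstream from the affected model. The sketch below shows the idea with a breadth-first traversal over a small, hypothetical lineage. It is not SYNQ's or dbt's lineage implementation, and the model names (with assumed `src_`, `stg_`, and `mart_` prefixes for the layers) are invented for illustration.

```python
from collections import deque

# Hypothetical lineage: each model maps to the models or sources it reads from.
LINEAGE = {
    "mart_finance_revenue": ["stg_orders", "stg_fuel_costs"],
    "stg_orders": ["src_orders"],
    "stg_fuel_costs": ["src_fuel_spreadsheet"],
    "src_orders": [],
    "src_fuel_spreadsheet": [],
}


def upstream(model: str, lineage: dict[str, list[str]]) -> list[str]:
    """Return all upstream nodes of `model`, nearest first (breadth-first)."""
    visited = {model}
    ordered = []
    queue = deque([model])
    while queue:
        current = queue.popleft()
        for parent in lineage.get(current, []):
            if parent not in visited:
                visited.add(parent)
                ordered.append(parent)
                queue.append(parent)
    return ordered


def likely_root_causes(model: str, lineage: dict[str, list[str]], failing: set[str]) -> list[str]:
    """Return failing nodes at or upstream of `model` that have no failing
    upstream of their own, i.e. where the problem most likely started."""
    nodes = [model] + upstream(model, lineage)
    return [
        node
        for node in nodes
        if node in failing
        and not any(parent in failing for parent in upstream(node, lineage))
    ]


failing = {"mart_finance_revenue", "stg_fuel_costs", "src_fuel_spreadsheet"}
print(likely_root_causes("mart_finance_revenue", LINEAGE, failing))
```

With consistent layers, the furthest-upstream failing node is usually the place to start debugging, which is what the filter in `likely_root_causes` approximates.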
“We are now able to solve most issues within 5 minutes of learning about them – something that could have taken us hours in the past. This is a significant factor in us retaining key merchants.” – Head of Data
Learning from incidents – The team relies on SYNQ’s incident management functionality as a knowledge base to follow up and track issues across their data stack. “Some incidents tend to reoccur, so having a log of previous issues and incidents shortens our time to resolve and improves our ability to mitigate issues.”
Incidents managed in SYNQ span both dbt test and model errors, and SYNQ anomaly monitors
“We typically declare a handful of incidents in SYNQ each month. Over time, this has become our knowledge base with valuable information on how we tackled issues the last time they occurred.”
“Since adopting dbt Cloud and SYNQ, we rarely have data issues impact our operational KPIs. Our team has been freed up from the majority of firefighting, and we have created the transparency we needed – from documentation to cross-team ownership,” says Josefin.
Want to learn more about data at Instabee? Watch their talk from the 2024 Data Innovation Summit on ‘Creating a culture of shared ownership & 5-minute issue resolution times’