What 'analytics-ready data' actually means — and how dbt gets you there
by QuackyData, Data Engineering
"Analytics-ready" gets used as if it's obvious, but it hides a real distinction. The data your tools produce and the data you can safely put in front of an executive are two very different things. The gap between them is modeling, and ignoring it is why so many companies have dashboards nobody believes.
Let's make the term concrete.
Raw data is a record, not an answer
When Airbyte pulls data from Stripe, your product database, and your CRM, what lands in storage is a faithful copy of each source — and that's exactly the problem. Raw data reflects how each tool happens to store things, not how your business thinks about them.
Raw data is typically:
- Shaped for the source system, not for analysis. A Stripe charge and a Salesforce opportunity describe the same deal in completely different vocabularies.
- Messy at the edges. Duplicate rows from retried syncs, nulls where you expected values, timestamps in three time zones, test records mixed in with real ones.
- Disconnected. The customer in your billing system and the account in your CRM are the same company, but nothing in the raw data says so.
You can query raw data, but every analyst who does will make slightly different cleanup decisions, which is how you end up with three revenue numbers in one meeting. Raw data is the right thing to keep and the wrong thing to report from.
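To make the "three revenue numbers" problem concrete, here is a minimal Python sketch. The rows and field names (`amount_cents`, `is_test`) are invented for illustration and are not Stripe's actual schema. Each function stands for one analyst's reasonable-looking cleanup choice.

```python
# Hypothetical raw charge rows; field names are illustrative only.
raw_charges = [
    {"id": "ch_1", "amount_cents": 5000, "is_test": False},
    {"id": "ch_1", "amount_cents": 5000, "is_test": False},  # duplicate from a retried sync
    {"id": "ch_2", "amount_cents": 2000, "is_test": True},   # test record
    {"id": "ch_3", "amount_cents": 3000, "is_test": False},
]


def revenue_as_is(rows):
    """Analyst A: sums every row that landed."""
    return sum(r["amount_cents"] for r in rows)


def revenue_deduped(rows):
    """Analyst B: collapses duplicate ids but keeps test records."""
    by_id = {r["id"]: r for r in rows}
    return sum(r["amount_cents"] for r in by_id.values())


def revenue_deduped_no_tests(rows):
    """Analyst C: collapses duplicate ids and drops test records."""
    by_id = {r["id"]: r for r in rows if not r["is_test"]}
    return sum(r["amount_cents"] for r in by_id.values())


print(revenue_as_is(raw_charges))             # 15000
print(revenue_deduped(raw_charges))           # 10000
print(revenue_deduped_no_tests(raw_charges))  # 8000
```

Same four rows, three defensible totals. None of the analysts made a mistake in their own query; the mistake is leaving the cleanup decision to each query.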
The three layers: raw, staging, marts
Analytics-ready data is built in layers, each with one job. This is the structure we implement with dbt on top of the lakehouse.
Raw
The untouched landing zone — an exact copy of each source. We never edit it. It's the audit trail, and the ability to rebuild everything downstream from it is what makes the whole pipeline reproducible.
Staging
One cleaning step per source, sitting directly on top of raw. Staging models do the unglamorous, essential work: rename cryptic columns to consistent names, cast types correctly, standardize timestamps to one time zone, drop test records, and deduplicate. Crucially, staging doesn't combine sources or apply business logic. Each staging model still maps one-to-one to a source — just a clean version of it.
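As a rough illustration of what a staging step does, here is a Python sketch. In a dbt project this would normally be a SQL model; Python is used here only to show the logic. The raw column names (`cust`, `amt`, `created`, `livemode`) and the output names are assumptions made up for the example.

```python
from datetime import datetime, timezone


def stage_charges(raw_rows):
    """Clean one source: rename, cast, normalize to UTC, drop test rows, dedupe.

    No joins and no business logic; the output still maps one-to-one to the source.
    """
    staged = {}
    for row in raw_rows:
        # Drop test records (assumed flag: livemode is False for test data).
        if not row["livemode"]:
            continue
        # Parse an ISO 8601 timestamp with an offset and convert it to UTC.
        created_utc = datetime.fromisoformat(row["created"]).astimezone(timezone.utc)
        # Keying by id deduplicates; the last copy seen for an id wins.
        staged[row["id"]] = {
            "charge_id": row["id"],             # rename to a consistent name
            "customer_id": row["cust"],         # rename a cryptic column
            "amount_cents": int(row["amt"]),    # cast string to integer
            "created_at_utc": created_utc,
        }
    return list(staged.values())


raw = [
    {"id": "ch_1", "cust": "cus_A", "amt": "5000",
     "created": "2024-03-01T09:00:00-05:00", "livemode": True},
    {"id": "ch_1", "cust": "cus_A", "amt": "5000",
     "created": "2024-03-01T09:00:00-05:00", "livemode": True},   # retried sync
    {"id": "ch_2", "cust": "cus_B", "amt": "2000",
     "created": "2024-03-01T15:00:00+01:00", "livemode": False},  # test record
    {"id": "ch_3", "cust": "cus_B", "amt": "3000",
     "created": "2024-03-01T14:00:00+00:00", "livemode": True},
]

for charge in stage_charges(raw):
    print(charge["charge_id"], charge["amount_cents"], charge["created_at_utc"].isoformat())
```

The point of the sketch is what it leaves out: nothing here decides what revenue means or links a charge to a CRM account. That belongs one layer up.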
Marts
The business layer. Marts join the cleaned sources together and encode the definitions your company actually uses: what counts as an active customer, how revenue is recognized, what a "qualified lead" is. A mart is purpose-built — a customers table, a daily revenue table, a subscriptions table — and it's what dashboards and analysts read from.
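A mart can be sketched the same way. The Python below joins two already-staged inputs and encodes one business definition. The rule used for "active" (at least one charge in the last 30 days) is a hypothetical definition chosen for the example, not a recommendation, and all field names are assumptions. In dbt this would typically be a SQL model selecting from staging models.

```python
from datetime import datetime, timedelta, timezone


def build_customers_mart(customers, charges, as_of, active_window_days=30):
    """Join staged customers to staged charges and apply one business definition.

    Assumed definition: a customer is active if they have at least one charge
    on or after (as_of - active_window_days).
    """
    cutoff = as_of - timedelta(days=active_window_days)
    totals = {}
    last_charge_at = {}
    for charge in charges:
        cid = charge["customer_id"]
        totals[cid] = totals.get(cid, 0) + charge["amount_cents"]
        if cid not in last_charge_at or charge["created_at_utc"] > last_charge_at[cid]:
            last_charge_at[cid] = charge["created_at_utc"]

    mart = []
    for customer in customers:
        cid = customer["customer_id"]
        last = last_charge_at.get(cid)
        mart.append({
            "customer_id": cid,
            "customer_name": customer["customer_name"],
            "lifetime_revenue_cents": totals.get(cid, 0),
            "is_active": last is not None and last >= cutoff,
        })
    return mart
```

Because the definition lives in one function with one named parameter, changing "30 days" to "90 days" is a single reviewed change rather than an edit to every dashboard query.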
The discipline matters: business logic lives in marts, cleaning lives in staging, and history lives in raw. When someone asks "where does this number come from?", there's a single, traceable path instead of a tangle of one-off queries.
Tests and freshness: how data earns trust
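Structure organizes the data. Tests are what make it trustworthy. With dbt, the rules a dataset must satisfy live in code, right next to the models. The first four checks below map to dbt's built-in generic tests (unique, not_null, accepted_values, relationships); freshness is declared on sources and checked with its own command. All of them can run on every pipeline execution: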
- Uniqueness — every customer appears once, so you don't silently double-count revenue.
- Not-null — required keys and fields are present, so joins don't quietly drop rows.
- Accepted values — a subscription status is one of the values it's allowed to be, catching surprises from upstream changes.
- Relationships — every order points to a customer that actually exists, so nothing is orphaned.
- Freshness — sources have updated recently, so you know whether a flat dashboard means flat sales or a broken sync.
When a test fails, the pipeline flags it before bad data reaches a dashboard. That's the whole point. Untested data can be wrong without anyone noticing; tested data tells you the moment it stops being reliable.
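The logic behind each of the five checks is small enough to sketch in Python. This is not dbt's syntax (in dbt these are declared in YAML next to the models and sources); the function names and signatures below are invented for illustration. Each row-level check returns the offending values or rows, so an empty list means the check passed.

```python
from datetime import datetime, timedelta, timezone


def failing_unique(rows, column):
    """Values of `column` that appear more than once."""
    seen, duplicates = set(), []
    for row in rows:
        value = row[column]
        if value in seen:
            duplicates.append(value)
        seen.add(value)
    return duplicates


def failing_not_null(rows, column):
    """Rows where `column` is missing or None."""
    return [row for row in rows if row.get(column) is None]


def failing_accepted_values(rows, column, allowed):
    """Rows whose `column` value is outside the allowed set."""
    return [row for row in rows if row[column] not in allowed]


def failing_relationships(child_rows, foreign_key, parent_rows, primary_key):
    """Child rows that point at a parent key that does not exist."""
    parent_keys = {parent[primary_key] for parent in parent_rows}
    return [row for row in child_rows if row[foreign_key] not in parent_keys]


def is_fresh(latest_loaded_at, now, max_age):
    """True if the newest loaded record is no older than `max_age`."""
    return now - latest_loaded_at <= max_age
```

A pipeline would run checks like these after each build and refuse to publish when any of them returns failures, which is the "flag it before it reaches a dashboard" behavior described above.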
What trustworthy data looks like in practice
For a business, analytics-ready data shows up as a set of quiet properties you stop having to worry about:
- One number per metric, with one agreed definition behind it.
- A clear answer to "where did this come from?" for any figure on any dashboard.
- Confidence that today's data is actually today's, not a silently stale copy.
- New questions answered by extending well-built marts, not by re-cleaning raw data from scratch every time.
That's the real deliverable. Dashboards are the visible part, but the trust under them is the product — and modeling is how you build it.
The takeaway
Analytics-ready data isn't a tool you buy; it's a structure you maintain — raw kept honest, staging kept clean, marts kept meaningful, and tests keeping all of it accountable. Done well, it fades into the background and people simply trust the numbers. Building and running that modeling layer is exactly the work we take on, so trustworthy data becomes something your team uses rather than something it argues about.