Managing Data Products in Google Cloud's Knowledge Catalog

Google Cloud's Dataplex Knowledge Catalog finally makes data products first-class objects you can find, understand, and request access to, with no stale wiki page and no week-long access ticket. I recently wired up a handful of data products and tested access through a fictional banking chat front end. Let's dive in!

Finding a data product

The catalog opens on natural-language search across every project you can see, with quick filters along the top. One of those filters is Data Products.

Knowledge Catalog landing with the Data Products quick filter
The Knowledge Catalog landing screen — search across projects, with a one-click Data Products filter

Filter to that type and you get the shelf: every certified product, its description, and the system it lives on. This is the first quiet win — a person who has never met your team can discover what exists and read a one-line description before pinging anyone.

A list of data products filtered by type
Browsing the catalog filtered to Type: Data Product

Describing it — the tabs that matter

Open a product and you land on Overview, with a tab bar that is the whole governance surface: Overview, Assets, Access groups & permissions, Access requests, Contract, Aspects, and Insights.

A data product overview page
Overview: description, documentation, contract summary, owner contact, and the full tab bar

Overview carries the human story — description, documentation, a contract summary, the owner contact, and created/modified timestamps. Assets is the substance: the actual BigQuery tables and views the product is made of.

The Assets tab listing the product's tables
Assets: the BigQuery tables and views bundled into the product

Access groups & permissions is where sharing gets defined. You declare named consumer groups, and each asset shows the IAM role mapped to each group. This is the bridge from "a product" to "an actual grant."

Access groups mapped to IAM roles per asset
Access groups and the per-asset permissions they map to (here, read access for two consumer groups)
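
To make the group-to-role mapping concrete, here is a minimal sketch of the shape that tab describes. Every class, field, asset, and group name below is hypothetical and only mirrors the layout on screen; it is not the Dataplex API. The one real identifier is the BigQuery read role, roles/bigquery.dataViewer.

```python
from dataclasses import dataclass, field


# Hypothetical model of the "Access groups & permissions" tab, not the Dataplex API.
@dataclass(frozen=True)
class AccessGroup:
    name: str       # display name of the consumer group
    principal: str  # e.g. a Google group address (assumed format)


@dataclass
class DataProduct:
    name: str
    # asset name -> {access group name -> IAM role}
    permissions: dict[str, dict[str, str]] = field(default_factory=dict)

    def map_role(self, asset: str, group: AccessGroup, role: str) -> None:
        """Record which IAM role a group holds on one asset."""
        self.permissions.setdefault(asset, {})[group.name] = role

    def roles_for(self, group: AccessGroup) -> dict[str, str]:
        """Return {asset: role} for every asset the group can reach."""
        return {
            asset: roles[group.name]
            for asset, roles in self.permissions.items()
            if group.name in roles
        }


analysts = AccessGroup("analysts", "group:analysts@example.com")
product = DataProduct("deposits")
product.map_role("bank.deposits_daily", analysts, "roles/bigquery.dataViewer")
product.map_role("bank.customer_segments", analysts, "roles/bigquery.dataViewer")
print(product.roles_for(analysts))
```

Keeping the role on the group rather than on individual users is what lets an approval later amount to a membership change.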

Aspects is the structured metadata — business domain, owner, steward, criticality, certification status, SLA, cost center. These are typed fields, not freeform tags, so you can govern and query them consistently across products.

The Aspects tab showing typed metadata fields
Aspects: typed governance metadata — domain, owner, steward, criticality, certification, SLA, cost center
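
The difference between typed fields and freeform tags is easiest to see in code. This is a rough sketch with hypothetical field names and allowed values (the real aspect type is whatever you define in the catalog): bad values are rejected at parse time, and a cross-product query becomes a plain filter.

```python
from dataclasses import dataclass
from enum import Enum


class Criticality(Enum):
    # Hypothetical allowed values; a real aspect type defines its own.
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class GovernanceAspect:
    business_domain: str
    owner: str
    steward: str
    criticality: Criticality
    certified: bool
    sla_hours: int
    cost_center: str


def parse_aspect(raw: dict) -> GovernanceAspect:
    """Turn a raw dict into a typed aspect; invalid values raise instead of slipping through."""
    if not isinstance(raw["certified"], bool):
        raise TypeError("certified must be a boolean")
    return GovernanceAspect(
        business_domain=raw["business_domain"],
        owner=raw["owner"],
        steward=raw["steward"],
        criticality=Criticality(raw["criticality"]),  # ValueError on an unknown level
        certified=raw["certified"],
        sla_hours=int(raw["sla_hours"]),
        cost_center=raw["cost_center"],
    )


def certified_and_critical(aspects: dict[str, GovernanceAspect]) -> list[str]:
    """Names of products that are both certified and high-criticality."""
    return sorted(
        name
        for name, aspect in aspects.items()
        if aspect.certified and aspect.criticality is Criticality.HIGH
    )


catalog = {
    "deposits": parse_aspect({
        "business_domain": "retail-banking", "owner": "dana", "steward": "lee",
        "criticality": "high", "certified": True, "sla_hours": 4, "cost_center": "CC-100",
    }),
    "marketing_leads": parse_aspect({
        "business_domain": "marketing", "owner": "sam", "steward": "kim",
        "criticality": "low", "certified": False, "sla_hours": 48, "cost_center": "CC-200",
    }),
}
print(certified_and_critical(catalog))
```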

Data quality, in the same place

You can stand up Dataplex data-quality and profiling scans alongside the catalog, point them at a product's tables, and publish the results onto the entry. The latest scan score, pass/fail status, and rows-profiled count then surface on the product and, usefully, in search results, so the certification badge is backed by an actual number rather than a promise. Two caveats: the Contract tab and the query-recommendation side of Insights are Google-managed and not writable through the API, so you curate those in the console.
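
As an illustration of what gets published onto the entry, the sketch below rolls per-rule results up into the three numbers mentioned above. The field names, the 0.9 threshold, and the scoring formula (fraction of rules passed) are all assumptions made for this example; they are not necessarily how Dataplex computes or stores its score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleResult:
    rule: str     # hypothetical rule label, e.g. "balance >= 0"
    passed: bool


def summarize_scan(results: list[RuleResult], rows_profiled: int,
                   threshold: float = 0.9) -> dict:
    """Roll rule results up into a score, a pass/fail flag, and a row count.

    Assumption: the score is the fraction of rules that passed.
    """
    if not results:
        raise ValueError("a scan summary needs at least one rule result")
    score = sum(r.passed for r in results) / len(results)
    return {
        "score": round(score, 2),
        "passed": score >= threshold,
        "rows_profiled": rows_profiled,
    }


results = [
    RuleResult("account_id is not null", True),
    RuleResult("balance >= 0", True),
    RuleResult("segment in allowed set", False),
    RuleResult("posted_at is not in the future", True),
]
summary = summarize_scan(results, rows_profiled=12000)
print(summary)
```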

Requesting and granting access

Here is the part that kills the ticket queue. A consumer hits Request access, picks the consumer group that fits, and writes a justification.

The request-access form
Requesting access: pick a consumer group, request for yourself, and add a justification for the owner

The owner sees it under Access requests and approves or rejects, with the justification attached for the audit trail.

The approver's Access requests queue
The owner's Access requests queue: approve or reject, with full request history

Approving a request
Reviewing and approving a consumer's request

Approval drops the requester into the consumer group, which already holds the mapped read role — so membership is the grant. One important guardrail: those groups are organization-scoped, so you can only grant requests to principals inside your org. A request for an outside account fails outright rather than silently widening your perimeter.

An access request for an out-of-org account failing
In-org only: a request for a principal outside the organization is rejected — groups don't accept external members
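
The approve-then-membership flow and the in-org guardrail can be sketched as below. This is a simplified stand-in, not the service's implementation: the domain check, the names, and the returned record are hypothetical, and the real organization check is enforced by the platform, not by a string comparison.

```python
from dataclasses import dataclass, field

ORG_DOMAIN = "example-bank.com"  # hypothetical organization domain


class OutOfOrgError(Exception):
    """Raised when the requester is not a principal inside the organization."""


@dataclass
class ConsumerGroup:
    name: str
    role: str  # the read role the group already holds on the product's assets
    members: set[str] = field(default_factory=set)


def approve(requester: str, group: ConsumerGroup, approver: str,
            org_domain: str = ORG_DOMAIN) -> dict:
    """Add the requester to the group; membership is the grant."""
    domain = requester.rsplit("@", 1)[-1].lower()
    if domain != org_domain:
        # Fail outright instead of silently widening the perimeter.
        raise OutOfOrgError(f"{requester} is outside {org_domain}")
    group.members.add(requester)
    return {
        "principal": requester,
        "group": group.name,
        "role": group.role,
        "approved_by": approver,
    }


analysts = ConsumerGroup("analysts", "roles/bigquery.dataViewer")
record = approve("ana@example-bank.com", analysts, approver="owner@example-bank.com")
print(record)
```

Calling approve with an address such as someone@gmail.com raises OutOfOrgError and leaves the group untouched.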

Approved consumers get an email confirming exactly what was granted, by whom, and when.

The approval notification email
The consumer's approval email — resource, access group, status, and approver, all logged

Exposing it in your own app

None of this has to live only in the cloud console. The catalog is API-addressable, so a front end can let users discover products by description and pull back domain, criticality, certification, contract version, data quality, and owner for each hit.

A front-end app searching the catalog
A custom app querying the catalog: discover data products by description, with governance metadata inline
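
For a sense of what the front end sends, here is a hedged sketch that only builds the search URL. It assumes the Dataplex REST searchEntries method on a projects/*/locations/global parent with query and pageSize parameters; check the current API reference before relying on that shape. The project ID is a placeholder. Authentication, the actual HTTP call, and the filter that narrows results to data products are left out because this post does not spell them out.

```python
from urllib.parse import urlencode

API_ROOT = "https://dataplex.googleapis.com/v1"


def build_search_url(project: str, text: str, page_size: int = 10) -> str:
    """Build a catalog search URL (assumed endpoint shape; verify against the docs)."""
    parent = f"projects/{project}/locations/global"
    params = urlencode({"query": text, "pageSize": page_size})
    return f"{API_ROOT}/{parent}:searchEntries?{params}"


# "my-project" is a placeholder project ID.
url = build_search_url("my-project", "deposit activity")
print(url)
```

The app would send this as an authenticated request on the user's behalf and render each hit with its governance metadata.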

You can then gate features on the same product access. If a user lacks it, the app returns a friendly pointer to request it instead of a raw permission error.

An app denying access and linking to the request flow
Access-aware app: a user without the product grant is pointed to the request flow, not given a stack trace
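
One way to get that behavior is to catch the denial at the edge of the app and translate it. The sketch below is illustrative only: the grants lookup, the exception, the placeholder answer, and the request URL are hypothetical stand-ins for whatever the app actually uses to check access and run the query.

```python
class ProductAccessDenied(Exception):
    """Raised when the user holds no grant on the data product."""

    def __init__(self, product: str):
        super().__init__(product)
        self.product = product


def run_question(user: str, product: str, question: str,
                 grants: dict[str, set[str]]) -> str:
    """Stand-in for the real query path; checks the grant first."""
    if product not in grants.get(user, set()):
        raise ProductAccessDenied(product)
    return f"[answer to {question!r} over {product}]"


def handle(user: str, product: str, question: str,
           grants: dict[str, set[str]], request_url: str) -> dict:
    """Return an answer, or a friendly pointer to the request flow."""
    try:
        return {"ok": True, "answer": run_question(user, product, question, grants)}
    except ProductAccessDenied as exc:
        return {
            "ok": False,
            "message": (
                f"You don't have access to '{exc.product}' yet. "
                f"Request it here: {request_url}"
            ),
        }


grants = {"ana": {"deposits"}}
denied = handle("bo", "deposits", "deposit activity by segment", grants,
                "https://example.com/request-access")  # hypothetical link
allowed = handle("ana", "deposits", "deposit activity by segment", grants,
                 "https://example.com/request-access")
print(denied["message"])
```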

Once access is granted, the same question runs: here, deposit activity by customer segment, answered through conversational analytics over the governed products.

Conversational analytics answering over the granted product
After the grant: the analyst asks in plain language and gets an answer over the governed data products

The masking gotcha nobody warns you about

If you protect columns with column-level security and dynamic masking, lower-privilege tiers see NULL for the masked fields — by design. The trap: a conversational-analytics agent reads those NULLs and confidently reports that the data is missing or the table is empty. It is not wrong about the NULLs; it is wrong about why. The fix is not an IAM change — it is a prompt change. You inject a masking directive into the system instruction before the question reaches the agent, telling it that NULLs in protected columns mean masked-at-your-access-level, and to answer with what is visible (counts, segments, distributions) instead of claiming the data is absent. Without that directive, your governance looks like a data outage.
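
A minimal sketch of that injection step, assuming the app controls the system instruction as a plain string. The wording of the directive, the function name, and the column names are mine; tune the text for your agent.

```python
MASKING_DIRECTIVE = (
    "Some columns are protected by column-level security and dynamic masking: "
    "{columns}. A NULL in any of these columns means the value is masked at the "
    "user's access level, not that the data is missing or the table is empty. "
    "Answer with what is visible (counts, segments, distributions) and state "
    "that the protected fields are masked."
)


def build_system_instruction(base: str, masked_columns: list[str]) -> str:
    """Append the masking directive when the user's tier has masked columns."""
    if not masked_columns:
        return base  # nothing is masked for this user, so leave the prompt alone
    directive = MASKING_DIRECTIVE.format(columns=", ".join(sorted(masked_columns)))
    return f"{base}\n\n{directive}"


# Hypothetical column names for a lower-privilege tier.
instruction = build_system_instruction(
    "You answer questions about governed banking data products.",
    ["customers.ssn", "customers.email"],
)
print(instruction)
```

The list of masked columns should come from the same source that defines the masking policy, so the prompt and the policy do not drift apart.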

Where this shines, and where it gets fuzzy

The wins are real: discoverability without tribal knowledge, self-serve request/approve with an audit trail, certification and data-quality in one pane, and access decoupled from hand-edited IAM. The friction shows up at the edges. A product that bundles tables from several domains has several rightful owners, so "who approves this request" gets blurry — overlapping assets and shared ownership are easy to model and hard to govern cleanly. And the tooling is early: there is no Terraform coverage yet, so most of this is set up through the REST API or the console, and the Contract and query-recommendation pieces are console-only for now.

None of that outweighs the core shift. The durable skill here was never writing the perfect data dictionary — it is making governed data findable and grantable by the people who need it, in a place they already are. That is the part AI can't paper over, and it is exactly where data engineers stay valuable.
