API Reference

Overview

The PostGrad Knowledge API provides programmatic access to structured business knowledge extracted from real meetings. All endpoints return responses in a consistent JSON envelope format.

Base URL: https://postgrad.io/api/v1

Authentication: Most endpoints require a Bearer token (API key) in the Authorization header. See Authentication for details.

Public endpoints (no auth required):

List Categories -- discover available knowledge categories
Get Stats -- knowledge base statistics

Authenticated endpoints:

List Knowledge -- paginated, filterable knowledge entries
Search Knowledge -- full-text search across entries
Get Knowledge by ID -- retrieve a single entry
Get Usage -- current period API usage

Response Envelope

All responses follow a consistent envelope:

{
  "data": "...",
  "pagination": { "total": 100, "limit": 20, "offset": 0, "has_more": true },
  "meta": { "queries_used": 42, "queries_remaining": 958 },
  "error": null
}

data contains the response payload (array or object)
pagination is present on list endpoints, null otherwise
meta includes usage counters on authenticated endpoints, null on public endpoints
error is null on success; on failure, data is null and error contains code and message

This envelope is a stable public contract. Field additions are backwards-compatible (additive); field removals or rename go through a deprecation window with the alias kept for at least one minor release.

Knowledge entry fields

Every knowledge entry returned by /knowledge, /knowledge/search, /knowledge/recent, and /knowledge/{id} follows the same canonical shape:

Field	Type	When set	Notes
`id`	string (uuid)	always	Stable identifier — safe to cache and reference.
`feed_id`	string (uuid)	always	Always populated, including on `?feed=all` cross-feed responses.
`feed_name`	string	always	Human-readable name of the originating feed.
`feed_slug`	string	always	URL-safe slug of the originating feed. Accepted in `X-PostGrad-Feed`.
`title`	string	always	Short headline (≤ 100 chars typical).
`category`	string	always	Normalized to lowercase + underscores (e.g. `deal_evaluation`).
`tags`	string[]	always	Free-form tags.
`content`	string	always	Main body. Can be long — paginate if needed.
`confidence`	number	always	0–1. Static extraction confidence assigned at ingest. Not a relevance score.
`score`	number	search responses only	Public relevance score. Equals `sort_score` (post-freshness adjustment). Clamped to [0,1]. Omitted on list / get-by-id responses — there's no query to score against.
`permalink`	string \| null	always	Direct dashboard URL for sharing / citation.
`created_at`	string (ISO 8601)	`verbosity=full` only	When the entry was first ingested.
`updated_at`	string (ISO 8601)	`verbosity=full` only	When the entry was last revised.
`raw_score`	number \| null	`verbosity=full` search responses only	Pre-adjustment relevance score on the mode's native scale (raw `ts_rank` for keyword, cosine distance for semantic, raw RRF for hybrid). Useful only when re-ranking without the freshness penalty. Omitted on list / get-by-id responses.
`sort_score`	number \| null	`verbosity=full` search responses only	Equals `score`. The number that actually drove the API's sort order. Omitted on list / get-by-id responses.
`freshness_multiplier`	number \| null	`verbosity=full` search responses only	Multiplier applied to derive `sort_score` (e.g. `0.75` for `stale`-tagged entries). `null` when no adjustment fired. Was `freshness_penalty` in earlier versions — both keys ship for one release cycle. Omitted on list / get-by-id responses.

compact is the default verbosity for MCP and small REST defaults; pass ?verbosity=full to surface created_at, updated_at, raw_score, sort_score, and freshness_multiplier. List + get-by-id responses omit score, raw_score, sort_score, and freshness_multiplier entirely — there's no query to score against, so the fields would always be null (noise on the wire).

Which score should I sort on?

Always score. It's the post-adjustment number that drove the API's own sort order, clamped to [0,1], and on the same scale across all three modes. If you set a threshold like score >= 0.7 and later swap modes, the threshold still means roughly the same thing.

Mode	`score`	`raw_score`	`sort_score`	When they diverge
`keyword`	clamped `ts_rank`	same as `score`	same as `score`	Only when a freshness multiplier fires — then `sort_score < raw_score` and `score === sort_score`.
`semantic`	clamped cosine similarity	raw cosine distance (lower = closer)	similarity × freshness_multiplier	`raw_score` is on a different scale (distance, not similarity) — never re-sort using `raw_score` directly.
`hybrid`	normalized RRF	raw RRF	RRF × freshness_multiplier	Same shape as semantic; sort on `score`.

If freshness_multiplier is null, no adjustment fired — score, raw_score, and sort_score are effectively identical (modulo the mode-specific scale of raw_score). The three-field surface is only useful when you want to re-rank without the freshness penalty — drop the multiplier, use raw_score, sort yourself.

Important about within-mode distributions: the [0,1] scale is consistent across modes, but the typical distribution differs. Keyword scores cluster low (0.2–0.4 is a strong match). Semantic similarity tends to cluster in 0.55–0.75 on natural-language queries (the answer is in there if the topic matches). Hybrid mirrors semantic but with a top-result-as-1.0 normalization, so the headline scores look higher even when the underlying ranking is the same. A score >= 0.7 threshold will keep different absolute numbers of results in each mode, but the relative ranking within a single response is what you should trust.

Which mode should I pick?

All three modes are available on every tier. Mode is a per-query tuning choice:

keyword (default): exact-word ts_rank. Best when you know the term ("Supabase", "RAG"). Fastest, no embedding compute. Misses when the user phrases naturally.
semantic: vector similarity over the entry's content. Best for natural-language questions where the user wouldn't use the same words as the entry ("how do I tell if a client is about to churn"). The mode that makes the product feel agent-native.
hybrid: reciprocal-rank fusion of keyword + semantic. In practice we find it's near-identical to semantic on clean operator content; it can win when content has mixed sizes (long reference dumps + short operator notes) by re-balancing toward shorter precise matches. Worth A/B-ing for your specific content; not universally better.

Default to semantic for natural-language agent queries. Use keyword when you're searching for a specific term. Try hybrid if your content varies wildly in length.

`confidence` vs `score` — two different signals, both on every entry

These are the two most commonly confused fields. They answer different questions:

confidence — How sure are we that this entry is true? Set at ingest by the extraction pipeline (reinforced upward when multiple meetings agree). It's a static, per-entry rating — the same 0.85 whether the entry surfaces for the query "pricing" or for "team management." Think of it as the licensor's trust score for the insight itself. Use it to filter low-quality matches with confidence_min=0.7.
score — How well does this entry match this specific query? Set per-query by the search ranker — pure tsrank for keyword mode, cosine similarity for semantic, RRF for hybrid, all clamped to [0,1] and adjusted by freshness_multiplier to derive sort_score. Different on every query. Always null on list/get-by-id responses (there's no query to score against).

Sort, filter, and threshold against the field that matches your use case: confidence for "show me high-quality results," score for "show me the most relevant results to this query." They are independent — a high-confidence entry can be a poor match for a given query, and vice versa.

Error envelope

On failure (error is not null), the response body shape is:

{
  "data": null,
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "Rate limit exceeded. Try again shortly.",
    "details": { "limit": 10, "remaining": 0, "resetAt": 1781897700 }
  }
}

Common error codes:

Code	HTTP	Meaning
`UNAUTHORIZED`	401	Missing / malformed API key.
`INVALID_API_KEY_FORMAT`	401	Key doesn't match `pg_live_*` shape.
`INVALID_FEED`	400	`X-PostGrad-Feed` value is neither a UUID nor a valid slug.
`FEED_NOT_SUBSCRIBED`	403	Valid feed exists but the caller isn't subscribed.
`FEED_NOT_FOUND`	404	No feed exists with the given id (MCP only).
`CATEGORY_RESTRICTED`	403	Scoped key tried to access a category outside its allowed set.
`TIER_INSUFFICIENT`	403	Requested search mode (semantic / hybrid) above the key's tier.
`VALIDATION_ERROR`	400	Bad query param (e.g. `limit=99999`, malformed UUID).
`RATE_LIMIT_EXCEEDED`	429	Per-minute or monthly quota hit. `Retry-After` header indicates seconds to wait.
`METHOD_NOT_ALLOWED`	405	HTTP method not supported on this endpoint. `Allow:` header lists accepted methods.
`INTERNAL_ERROR`	500	Unhandled server failure. Includes a server-side log id where possible.

Rate-limit headers

Every authenticated response (success or error) carries:

X-RateLimit-Limit — per-minute cap for the caller's tier.
X-RateLimit-Remaining — calls remaining in the current minute.
X-RateLimit-Reset — Unix epoch seconds when the window resets.
X-Monthly-Quota-Limit / -Used / -Remaining — monthly counters.

429 responses additionally carry Retry-After (seconds to wait — Unix-spec, not millis).

Public endpoints (/feeds/catalog, /stats, /categories) don't emit these headers — there's no per-caller quota to report. They are cached at the edge for 5 minutes (Cache-Control: public, max-age=300) and rate-limited only at the network level.

HTTP methods

Read endpoints accept GET (and HEAD / OPTIONS for preflight). The three search endpoints additionally accept POST with a JSON body whose keys mirror the query params:

# These two are equivalent
curl "https://postgrad.io/api/v1/knowledge/search?q=pricing&limit=5"

curl -X POST https://postgrad.io/api/v1/knowledge/search \
  -H "Content-Type: application/json" \
  -d '{"q": "pricing", "limit": 5}'

Other methods (PUT, PATCH, DELETE) return 405 Method Not Allowed with the canonical error envelope and an Allow: header listing the methods this endpoint actually accepts.

Per-endpoint `context` block

Every response (success or error) includes a context field. Its contents depend on the endpoint + the variant of the call:

Feed-scoped responses (`X-PostGrad-Feed: <uuid>` or slug)

{
  "context": {
    "feed": {
      "id": "f1a2c3d4-...",
      "name": "Agency Growth Playbook",
      "slug": "agency-growth-playbook",
      "provider": "Big Steele"
    }
  }
}

Cross-feed responses (`X-PostGrad-Feed: all` or omitted)

{
  "context": {
    "all_feeds": true,
    "feeds_searched": [
      { "feed_id": "f1...", "name": "Agency Growth Playbook", "slug": "agency-growth-playbook" },
      { "feed_id": "f2...", "name": "Tech Stack Decisions",   "slug": "tech-stack-decisions" }
    ]
  }
}

The response additionally carries these headers (always emitted on feed=all):

X-PostGrad-Feeds-Searched — count, mirrors context.feeds_searched.length.
X-PostGrad-Feeds-Truncated — count of subscribed feeds beyond the per-request fan-out cap (currently 20). Present only when truncation actually occurred.
X-PostGrad-Feed-Source — scoped / auto-selected / all-feeds / header-uuid / header-slug. How the server resolved the feed scope.

Catalog / stats responses (public)

{
  "context": {
    "catalog": { "total": 9, "cached_seconds": 300 }
  }
}

context.cached_seconds is present on every cached public endpoint so consumers can decide whether to refresh.

Search response `data` shape — extra fields on `feed=all`

Cross-feed responses include these top-level fields alongside data (which is still an array of canonical entry objects):

Field	Type	Notes
`mode_served`	`'keyword' \| 'semantic' \| 'hybrid'`	The mode the server actually ran. Equals `mode` from the request unless tier policy downgraded it.
`fallback`	`boolean`	`true` when `mode_served < mode_requested` (defensive downgrade for unknown/legacy tiers). On Starter/Pro/Scale this is always `false` — current tier policy uses hard 403 `TIER_INSUFFICIENT` instead of silent fallback. Kept on the envelope for forward compatibility.
`feeds_searched_count`	number	Same as `context.feeds_searched.length`. Redundant by design — easier to read on a flat envelope.
`dupes_dropped`	number	Same entry id surfaced by multiple feeds → keep one, count the rest.
`all_feeds`	`true`	Always `true` on the cross-feed branch (single-feed responses omit this).

Catalog feed shape

The /feeds/catalog and /feeds/{slug-or-uuid} endpoints share most fields. Catalog returns a list; the detail endpoint adds recent_entries[] + categories_with_counts[] + provider_bio for the eval-before-subscribe surface.

Field	Type	Notes
`id`	uuid	Stable identifier.
`slug`	string	URL-safe slug. Accepted in `X-PostGrad-Feed`.
`name`	string	Human-readable feed name.
`description`	string	One-paragraph marketing description from the licensor.
`categories`	string[]	Up to 12 categories (catalog) or all categories (detail). See `categories_total` for the un-truncated count.
`categories_total`	number	Catalog only. Total distinct categories on this feed; `categories[]` is truncated to 12.
`price_monthly`	number	USD/month. 0 for free feeds.
`is_curated`	boolean	Legacy flag — see `source_type` instead.
`source_type`	enum	See below.
`provider_name`	string \| null	The licensor's name when `source_type='expert'`; null otherwise (no individual operator on compiled/demo feeds).
`entry_count`	number	Total published entries.
`sample_titles`	string[]	Catalog only. 10 recent titles, filtered for placeholder noise.
`last_updated_at`	timestamp \| null	Most recent entry's `updated_at`. Use this to tell "actively maintained" from "abandoned."

`source_type` enum

The marketplace tier signal. Drives the visual badge on /marketplace and is the right field to filter on for RAG consumers selecting feeds.

Value	Meaning
`expert`	Operator-authored from the licensor's own experience. `provider_name` is populated. Highest-trust tier.
`compiled`	Aggregated by the PostGrad pipeline from public sources (news, articles, transcripts). Summaries + links, not full content. `provider_name` is null.
`demo`	Seed/example content for trying the platform out. Not a real operator's knowledge. Avoid for production RAG.

Filtering the catalog

GET /api/v1/feeds/catalog accepts three optional query params:

?categories=sales_process,deal_evaluation — comma-separated. Feed must contain ALL listed categories in its categories[] (logical AND). Matches whole category names exactly (case-insensitive); use /api/v1/categories to discover valid names.
?source_type=expert — restrict to one tier (expert | compiled | demo).
?q=growth — case-insensitive substring match against name + description.

Unknown query params are silently ignored — tracking params like ?utm_source=... won't 400. Response context.catalog.filter_applied is true when any filter narrowed the result, and total_before_filter exposes the unfiltered count for "0 of 9 feeds match" UX.

API Reference

On this page