
Building a Resume-Native AI Assistant for My Portfolio

Designing an assistant that answers from structured portfolio data, uses semantic retrieval over resume content, and safely proxies GitHub Models through Cloudflare Workers.

ai · github-models · cloudflare-workers · retrieval · nextjs · portfolio

Context

I wanted the portfolio to do more than display information. I wanted it to answer for itself.

Not with a generic chatbot and not with a prompt that tries to improvise from thin air. The goal was narrower and more useful: let a visitor ask questions about my experience, projects, technical strengths, and background, and get answers grounded in the actual content already published on the site.

That design constraint shaped the entire implementation.

The assistant is not a toy layer on top of a marketing site. It is a retrieval system built around structured resume data, deterministic guardrails, and a thin production-safe proxy to GitHub Models.

What I Built

The result is an AI assistant embedded directly into the portfolio experience that:

  • loads a generated static resume dataset from the site itself
  • converts that content into searchable snippets
  • answers common questions locally when a deterministic path is safer and faster
  • ranks the most relevant context using embeddings and keyword fallback
  • folds recent conversation turns into retrieval so follow-up questions still make sense
  • keeps client-side embeddings fresh with cache versioning and TTL invalidation
  • calls GitHub Models through a Cloudflare Worker proxy
  • returns concise answers with citations tied to the underlying portfolio content
  • links cited projects, articles, and case studies back to their actual pages when possible

In practical terms, visitors can ask questions like:

  • What kind of backend systems has he built?
  • Has he worked on payment infrastructure?
  • What technologies does he use most often?
  • What is his current role and background?

That sounds straightforward. The real work is in making the answers accurate, constrained, fast, and maintainable.

The Core Design Decision

The most important decision was to treat the portfolio as a data product, not just a set of pages.

Because this site is statically exported, the assistant cannot rely on traditional server-side Next.js API routes. Instead, the content pipeline generates a static resume payload at build time and publishes it as a site asset.

That gave me a clean contract:

  1. author content once in the portfolio
  2. normalize it into a structured resume payload
  3. let the assistant reason only over that approved dataset

This is a much better engineering tradeoff than scraping rendered HTML or letting a model guess from partial page context.
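To make that contract concrete, here is a sketch of what the generated payload might look like in TypeScript. The field names are assumptions based on the sections described in this article, not the actual schema, and the structural check is deliberately shallow.

```typescript
// Hypothetical shape of the build-time resume payload; field names are
// assumptions inferred from the content categories listed in this article.
interface ResumePayload {
  summary: string;
  about: string;
  skills: string[];
  links: { label: string; url: string }[];
  experience: { role: string; company: string; highlights: string[] }[];
  education: { school: string; degree: string }[];
  projects: { id: string; title: string; description: string; url?: string }[];
  articles: { id: string; title: string; description: string; url?: string }[];
  recommendations: { author: string; text: string }[];
}

// Minimal structural check before the assistant trusts a fetched payload.
function isResumePayload(value: unknown): value is ResumePayload {
  const v = value as Partial<ResumePayload> | null;
  return (
    typeof v === "object" && v !== null &&
    typeof v.summary === "string" &&
    Array.isArray(v.skills) &&
    Array.isArray(v.experience)
  );
}
```

Validating the payload at the boundary keeps the rest of the assistant free to assume a stable shape.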

Abstract Architecture Diagram

Visitor Question
      |
      v
Portfolio Assistant UI
      |
      +--> Load /api/resume.json
      |
      +--> Build normalized snippets
      |
      +--> Try local deterministic answer
      |
      +--> Retrieve relevant context
             | 
             +--> Embeddings available -> semantic ranking
             |
             +--> Otherwise -> keyword ranking fallback
      |
      v
Cloudflare Worker /assistant
      |
      +--> origin validation
      +--> payload validation
      +--> rate limiting
      +--> token isolation
      |
      v
GitHub Models
  - embeddings
  - chat completions
      |
      v
Structured JSON response + citations
      |
      v
Grounded answer rendered in the portfolio

Request Lifecycle Diagram

The full request path is a little more interesting than the high-level diagram suggests, because there are several control points before the model is ever asked to answer.

1. Visitor opens the footer assistant drawer
   |
   v
2. Client loads /api/resume.json
   |
   +--> generated at build time from content/about, content/home,
   |    content/settings, content/recommendations, and content/posts
   |
   v
3. Client normalizes content into snippets
   |
   +--> summary, about, skills, links, contact
   +--> experience and education entries
   +--> projects, articles, case studies, recommendations
   |
   v
4. Client computes a content hash
   |
   +--> used as the cache key for snippet embeddings
   +--> cache entries also carry a version and TTL so stale vectors can expire
   |
   v
5. Visitor submits a question
   |
   +--> local guardrails check scope and reject off-topic or adversarial prompts
   |
   +--> local deterministic answer path tries known patterns first
   |    - current role
   |    - education
   |    - common links
   |    - top technologies
   |    - payment-related experience
   |
   +--> if local answer exists, return immediately with citations
   |
   v
6. Retrieval phase builds the search intent
   |
   +--> combine current question with a small window of recent conversation
   +--> preserve who said what so follow-up turns keep their meaning
   |
   v
7. Retrieval phase selects supporting evidence
   |
   +--> embeddings ready
   |    - embed the conversation-aware retrieval query
   |    - score all snippets by cosine similarity
   |
   +--> embeddings unavailable
   |    - fall back to keyword overlap scoring
   |
   v
8. Only the top-ranked snippets are sent to the model
   |
   +--> recent chat context is included
   +--> response is constrained to a JSON schema
   |
   v
9. Cloudflare Worker /assistant
   |
   +--> validate allowed origin
   +--> enforce POST + application/json
   +--> parse and validate request payload
   +--> apply per-IP rate limiting
   +--> keep GitHub Models token on the server boundary
   |
   v
10. GitHub Models
   |
   +--> embeddings endpoint for semantic retrieval
   +--> chat completions endpoint for grounded answer generation
   |
   v
11. Client validates model output
    |
    +--> parse JSON
    +--> enforce answer shape
    +--> discard invalid citation IDs
    +--> downgrade unsupported answers to "missing"
    +--> if the model path fails but retrieval is strong, show a closest-match fallback
    |
    v
12. UI renders final answer + snippet citations

Why the Resume API Matters

The assistant is powered by a generated resume.json representation of the site content. That payload aggregates the information that actually matters to a recruiter, hiring manager, or engineering leader:

  • summary and about information
  • experience history
  • education
  • links
  • projects
  • articles and case studies
  • recommendations and proof points

That means the assistant is reasoning over normalized, intentional content rather than a noisy DOM.

From an architecture perspective, this matters for three reasons:

  • consistency: the UI and the assistant share the same source of truth
  • reliability: the payload shape is stable and easy to evolve
  • observability: retrieval logic can work on well-defined content chunks instead of ambiguous page fragments

Embeddings and Retrieval Strategy

This is where the assistant becomes meaningfully useful.

I break the resume payload into small, semantically focused snippets. Each snippet represents a coherent unit such as:

  • a role in my experience timeline
  • an education entry
  • a project or article
  • a recommendation
  • a skills or summary block

Those snippets are then embedded through GitHub Models and ranked against the user’s question using vector similarity.
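The ranking step itself is small. A minimal sketch, assuming embeddings arrive as plain number arrays keyed by snippet ID (names and the `topK` default are illustrative):

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

interface ScoredSnippet { id: string; score: number }

// Score every snippet against the question embedding and keep the best few.
function rankSnippets(
  queryEmbedding: number[],
  snippetEmbeddings: Map<string, number[]>,
  topK = 6,
): ScoredSnippet[] {
  return [...snippetEmbeddings.entries()]
    .map(([id, vec]) => ({ id, score: cosineSimilarity(queryEmbedding, vec) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

Sending only the `topK` winners, rather than every snippet, is what keeps the model's context focused on relevant evidence.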

Retrieval flow

  • the client loads the resume data
  • it builds a snippet set and computes a hash of the content
  • snippet embeddings are cached locally by model and content hash
  • when a user asks a question, the question is embedded
  • the system selects the highest-signal snippets and sends only that context to the model

This does two things well:

  1. it improves answer quality because the model is given the most relevant evidence instead of the whole resume
  2. it keeps the design efficient by avoiding unnecessary repeated embedding work for unchanged content

The hash-based cache boundary matters more than it may seem. It means embeddings are tied to the exact published content state. If the resume changes, the cache key changes too, so the assistant does not accidentally reuse vectors generated for older content.

I also added a keyword-based fallback path. If embeddings are unavailable for any reason, the assistant still works by ranking snippets through lexical overlap. That is an important operational detail: degraded mode is still useful mode.
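The fallback can be as simple as token overlap. A sketch of what lexical scoring might look like (the tokenizer and scoring rule here are illustrative, not the production implementation):

```typescript
// Lowercase and split into alphanumeric tokens, deduplicated via a Set.
function tokenize(text: string): Set<string> {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

// Score a snippet by the fraction of question tokens it also contains.
function keywordScore(question: string, snippetText: string): number {
  const q = tokenize(question);
  const s = tokenize(snippetText);
  if (q.size === 0) return 0;
  let overlap = 0;
  for (const token of q) if (s.has(token)) overlap++;
  return overlap / q.size;
}
```

This is crude compared to embeddings, but it costs nothing, never fails, and still surfaces the obviously relevant snippets.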

There is now a second cache-safety layer as well: embeddings stored in the browser are namespaced by a cache version and expire after a TTL window. That matters when the content shape, ranking strategy, or embedding handling changes. A content hash prevents wrong reuse across resume changes, while versioning and TTL help retire older cached vectors after retrieval logic improves.
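The combined cache-freshness rule is easy to express. A minimal sketch, assuming a cache entry stored in the browser; `CACHE_VERSION`, the TTL, and all field names are illustrative assumptions:

```typescript
// Illustrative constants: bump CACHE_VERSION when ranking or embedding
// handling changes; the TTL retires old vectors even without a bump.
const CACHE_VERSION = 2;
const TTL_MS = 7 * 24 * 60 * 60 * 1000; // one week

interface EmbeddingCacheEntry {
  version: number;
  contentHash: string;
  createdAt: number;
  vectors: Record<string, number[]>;
}

// Key embeddings by cache version, embedding model, and content hash.
function cacheKey(model: string, contentHash: string): string {
  return `assistant:embeddings:v${CACHE_VERSION}:${model}:${contentHash}`;
}

// An entry is reusable only if version, content hash, and TTL all agree.
function isFresh(entry: EmbeddingCacheEntry, contentHash: string, now = Date.now()): boolean {
  return (
    entry.version === CACHE_VERSION &&
    entry.contentHash === contentHash &&
    now - entry.createdAt < TTL_MS
  );
}
```

Each of the three checks guards against a different kind of staleness: changed content, changed logic, and plain old age.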

Multi-Turn Retrieval, Not Just Multi-Turn UI

One subtle weakness in many portfolio assistants is that they only use chat history for display, not for retrieval.

I wanted follow-up questions to work more naturally.

So the assistant now uses a small window of recent conversation turns when building the retrieval query. That means questions like:

  • What was his first project?
  • What about his recent work?
  • Did any of that involve payments?

can be interpreted in relation to the previous turns rather than as isolated fragments.

This does not mean the model gets free rein to improvise from conversation memory. The important detail is that retrieval itself becomes context-aware. The system still goes back to the published resume dataset, but it does so with a better understanding of what the visitor is referring to.

The conversation is also persisted locally in the browser, with message role information preserved, so both the UI and retrieval layer know which turns came from the visitor and which came from the assistant.
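Building the conversation-aware retrieval query can be sketched in a few lines. The window size and role labels are illustrative choices, not the actual implementation:

```typescript
interface ChatTurn { role: "visitor" | "assistant"; text: string }

// Fold a small window of recent turns into the retrieval query,
// preserving who said what so follow-up questions keep their referent.
function buildRetrievalQuery(history: ChatTurn[], question: string, window = 4): string {
  const recent = history
    .slice(-window)
    .map((t) => `${t.role}: ${t.text}`)
    .join("\n");
  return recent ? `${recent}\nvisitor: ${question}` : question;
}
```

The resulting string is what gets embedded, so a terse follow-up like "what about his recent work?" is scored against snippets in the context of what was just discussed.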

Citations That Resolve Back to Real Content

Early citations were useful as proof, but they were still mostly internal identifiers.

I tightened that up so references to projects, posts, and case studies can carry real URLs. When the assistant cites one of those snippets, the UI can render a direct link back to the corresponding page on the portfolio.

That improves two things:

  • the answer feels more inspectable because a visitor can immediately open the source material
  • the assistant becomes a navigation surface, not just a response surface

That is an underrated product detail. In a portfolio, a good answer should often lead the visitor deeper into the work.

Better Failure Handling When Retrieval Is Strong but Generation Fails

Another practical improvement was separating retrieval failure from generation failure.

Those are not the same problem.

If the model request fails because of a rate limit or temporary upstream issue, but retrieval already found a strong matching snippet, the assistant now has a more graceful degraded path. Instead of immediately saying the information is unavailable, it can return the closest relevant reference and make that uncertainty explicit.

The wording is intentionally careful. It does not pretend to be a full grounded answer if the full answer path did not complete. It says, in effect: here is the closest relevant source I could find.

That is a much more honest fallback than collapsing every upstream failure into "I don't know."

A Safer Model Boundary with Cloudflare Workers

I did not want model credentials exposed in the browser, and I did not want the frontend talking directly to a third-party inference endpoint.

So I added a Cloudflare Worker route that acts as a narrow proxy for the assistant.

The worker handles:

  • origin validation so requests only come from approved sites
  • schema validation for both chat and embeddings payloads
  • rate limiting to prevent abuse, with a bounded per-IP request window
  • secret isolation so the GitHub Models token never leaves the server boundary
  • uniform response handling across embeddings and chat completions

In practical terms, the Worker is not trying to become an AI orchestration layer. It is intentionally narrow. The frontend decides what context to send, and the Worker decides whether the request is valid and safe to proxy.

This is a simple pattern, but it is the right one. It turns a frontend AI feature into a controlled service interface rather than an open client-side experiment.
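The gatekeeping logic in such a Worker can be sketched roughly as follows. This is an illustrative simplification: the origin list, limits, and an in-memory per-IP window are assumptions, and a real deployment would back the rate limiter with a durable store rather than module state.

```typescript
// Illustrative policy values, not the production configuration.
const ALLOWED_ORIGINS = new Set(["https://example.com"]); // assumed origin
const WINDOW_MS = 60_000;
const MAX_REQUESTS = 20;

// Naive in-memory per-IP sliding window (a sketch; real Workers would
// use KV, Durable Objects, or a rate-limiting binding instead).
const hits = new Map<string, number[]>();

function allowRequest(ip: string, now = Date.now()): boolean {
  const recent = (hits.get(ip) ?? []).filter((t) => now - t < WINDOW_MS);
  if (recent.length >= MAX_REQUESTS) {
    hits.set(ip, recent);
    return false;
  }
  recent.push(now);
  hits.set(ip, recent);
  return true;
}

// Reject anything that is not an approved-origin JSON POST.
function validateRequest(origin: string | null, method: string, contentType: string | null): boolean {
  return (
    origin !== null && ALLOWED_ORIGINS.has(origin) &&
    method === "POST" &&
    (contentType ?? "").startsWith("application/json")
  );
}
```

Everything that passes these checks is forwarded upstream with the GitHub Models token attached server-side; everything else is rejected before any token is touched.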

Why GitHub Models

GitHub Models was a strong fit for this implementation because it provided a clean way to call both embeddings and chat inference through a familiar developer workflow.

That let me keep the integration focused on product and architecture concerns:

  • retrieving the right context
  • structuring the prompt correctly
  • validating the model output
  • controlling cost and abuse boundaries

In other words, the interesting engineering problem here was never “how do I call an LLM API.” It was “how do I make the model answer the right question from the right evidence in a portfolio setting.”

Guardrails and Deterministic Shortcuts

One of the mistakes people make with assistants is sending every query directly to the model.

I took a more disciplined approach.

Before the remote model is involved, the assistant first checks for:

  • off-topic or adversarial prompts
  • prompt-injection style instructions
  • questions that can be answered deterministically from known patterns

That means common questions like current role, core skills, education, or portfolio links can often be handled immediately without a full model round trip.

This improves latency, reduces token usage, and tightens correctness.

It also gives the assistant a more senior-engineered feel: use the model where it adds value, not where a clear rule-based answer already exists.
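The deterministic path can be as simple as a pattern table consulted before any network call. The patterns and canned answers below are illustrative placeholders, not the real rules:

```typescript
// Illustrative local-answer table; tried before any model round trip.
const LOCAL_ANSWERS: { pattern: RegExp; answer: string }[] = [
  { pattern: /current (role|position|job)/i, answer: "See the current role in the experience section." },
  { pattern: /education|degree|studied/i, answer: "See the education section." },
];

// Return a deterministic answer if a known pattern matches, else null
// to signal that the retrieval + model path should run.
function tryLocalAnswer(question: string): string | null {
  for (const { pattern, answer } of LOCAL_ANSWERS) {
    if (pattern.test(question)) return answer;
  }
  return null;
}
```

A `null` result is the handoff signal: only then does the question proceed to retrieval and, if needed, the model.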

There is another important constraint in the implementation: even when the model is used, an answer is only accepted as a real answer if it comes back with valid citations that map to the provided snippet IDs. If the model returns an unsupported answer shape or cites content outside the retrieved context, the client treats that as a missing-information case instead of trusting it.

For development, I also added a local debug view that exposes the retrieval mode, the built retrieval query, the top-ranked snippets, and whether the closest-match fallback was used. That makes it much easier to inspect why a question failed, whether the wrong snippets were selected, or whether the model path failed after retrieval had already succeeded.

Engineering Tradeoffs

There are a few deliberate tradeoffs in this design.

1. Static data over live repository introspection

The assistant answers from the published site content, not from repo activity, commit history, or unstated knowledge. That makes the experience more trustworthy even if it is narrower.

2. Retrieval-first over open-ended conversation

I optimized for factual grounding rather than personality. The assistant is there to help a visitor understand my work quickly and accurately.

3. Progressive enhancement over hard dependency

If embeddings fail, the assistant does not disappear. It falls back gracefully and still answers from keyword-ranked evidence. If the generation path fails after retrieval succeeds, it can still surface the closest grounded reference instead of throwing everything away.

4. Structured output over free-form generation

The response is expected in a constrained JSON shape with answer status and citations. That gives the UI something deterministic to render and makes failure modes easier to manage.

5. Thin worker over backend sprawl

I deliberately kept the backend boundary small. The Worker does not own retrieval, snippet construction, or portfolio-specific business logic. That keeps the moving parts understandable and makes the production surface easier to secure.

6. Freshness controls over forever-cached vectors

Client-side caching is great for latency, but AI features age in more ways than plain assets do. The underlying content can change, the embedding model can change, or the ranking logic can improve. That is why the assistant uses a content hash, a cache version, and a TTL together instead of assuming embeddings in local storage should live forever.

What I Like About This Build

What I like most is that the system stays honest.

It does not pretend to know more than the portfolio actually says. It does not blur the line between retrieval and invention. And it does not rely on backend sprawl to deliver a relatively small but high-leverage user experience.

This is the kind of AI integration I find compelling in real systems: bounded, observable, useful, and aligned with the actual product surface.

Closing Thoughts

There is a broader lesson here.

Good AI product engineering is rarely about adding a model to the stack and calling it innovation. The real work is in designing the data boundary, controlling failure modes, choosing where determinism should win, and making sure the system remains legible when something goes wrong.

That is what this assistant represents for me.

A portfolio is supposed to communicate capability. Building a resume-native assistant on top of structured content is a strong way to demonstrate not just that I can use AI tools, but that I can integrate them with sound engineering judgment.