1 week ago 23 hours ago

Tutorial: Make Your Site Queryable with NL Web

NL Web (Natural Language Web) is Microsoft's open-source layer that makes your existing schema markup, feeds, and HTML queryable by AI agents in plain language — no content changes required. This tutorial walks through how the protocol works architecturally, which structured data assets it draws on, and the three fastest paths to getting your site agentic-ready. A dual-source breakdown from Moz's video overview and the official GitHub documentation.

by marketingagent.io 1 week ago23 hours ago

0views

Make Your Website Queryable by AI Agents with NL Web

Every piece of structured data you’ve added to your site — schema markup, sitemaps, product feeds, heading tags — was already making you ready for the agentic web. NL Web (Natural Language Web) is the open-source layer that puts it all to work, letting AI agents query your site in plain language and get grounded, accurate responses. After completing this tutorial, you’ll understand how NL Web works architecturally, what existing site assets it draws on, and the two fastest paths to implementing it.

NL Web is an open-source project released by Microsoft in spring 2025. It enables natural language querying of websites without requiring any changes to your existing content or CMS. The project is maintained on GitHub by RV Guha — the same developer who created schema.org and RSS — which signals strong long-term pedigree.

NL Web sits as a webapp layer on your existing website, using the ASK Protocol to field natural language queries from AI agents.

When a user or AI agent sends a query to an NL Web-enabled site, it travels through something called the Ask Protocol. This protocol is the communication standard that connects the incoming natural language question to your site’s underlying structured data. Think of it as the handshake layer between the agent and your content.

The NL Web webapp sits on top of your site and exposes the ASK Protocol endpoint AI agents use to query it.

NL Web draws on a broader definition of structured data than most SEOs default to. Any organized, tagged data qualifies: schema markup and JSON-LD, yes — but also H1/H2 heading tags, title tags, HTML lists and tables, product feeds, RSS feeds, XML sitemaps, and custom database content such as CMS-driven dynamic pages.

Schema markup, product feeds, RSS, sitemaps, HTML headings — if it's structured data on your site, NL Web can use it. — Schema markup, product feeds, RSS, sitemaps, HTML headings — if it’s structured data on your site, NL Web can use it.

Once the Ask Protocol surfaces the relevant structured data, an LLM processes the query against that grounding context. Using real site data as the foundation reduces hallucination risk significantly. If the query requires interactivity — say, filtering a product catalog or booking a slot — NL Web can optionally call the site’s MCP (Model Context Protocol) instance to handle those actions.

The LLM processing layer uses your site's structured data to formulate accurate, context-aware responses to agent queries. — The LLM processing layer uses your site’s structured data to formulate accurate, context-aware responses to agent queries.

To implement NL Web, visit the project’s GitHub repository and have your development team follow the official documentation to set it up. The repo is actively maintained and accepts community questions and proposals — a good signal for ongoing support.

How to actually implement NL Web: GitHub self-host, Wix CMS plugin, or third-party NL Web partners — your three paths forward.

If your site runs on Wix with a premium plan and a custom domain, you can enable the Agentic Web integration by toggling a single setting — no developer required.

Warning: this step may differ from current official documentation — see the verified version below.

The complete NL Web architecture: your existing structured data feeds an LLM matching layer that responds to AI agent queries via the ASK Protocol.

How does this compare to the official docs?

The video gives an accessible architectural overview and two concrete implementation starting points, but the official Microsoft documentation and GitHub repository go deeper on configuration, Ask Protocol specifics, and MCP integration — which is where Act 2 picks up.

Here’s What the Official Docs Show

The video’s architectural walkthrough is well-grounded — the core mechanics, data sources, and MCP framing all check out against the official repository and referenced standards. What follows fills in a few specifics the docs surface that will matter when you move from overview to implementation.

1. What NLWeb is and where it lives

The video’s approach here matches the current docs exactly on the essentials: NLWeb is MIT-licensed, open-source, and actively maintained. One address to update: as of June 2026, the canonical GitHub organization is nlweb-ai, not microsoft — the repository lives at github.com/nlweb-ai/NLWeb, with 6.2k stars, 54 contributors, and 1,164 commits confirming healthy community momentum.

On schema.org attribution: the schema.org homepage credits its founding to Google, Microsoft, Yahoo, and Yandex as a collaborative effort. No individual creator is named in the visible documentation.

The NLWeb GitHub repository at github.com/nlweb-ai/NLWeb — MIT-licensed, 6.2k stars, hosted under the 'nlweb-ai' organization. — 📄 The NLWeb GitHub repository at github.com/nlweb-ai/NLWeb — MIT-licensed, 6.2k stars, hosted under the ‘nlweb-ai’ organization.

2. The query-routing layer

No official documentation was found for this step — proceed using the video’s approach and verify independently.

One terminology note worth flagging: the README and visible codebase use the module name AskAgent for the query-routing component. As of June 2026, the phrase “Ask Protocol” does not appear in any visible NLWeb documentation.

NLWeb README — the query-routing module is named 'AskAgent'; 'Ask Protocol' does not appear in current documentation. — 📄 NLWeb README — the query-routing module is named ‘AskAgent’; ‘Ask Protocol’ does not appear in current documentation.

3. Structured data sources NLWeb draws on

The video’s approach here matches the current docs exactly. Schema.org and RSS are named explicitly in the README as foundational formats covering over 100 million websites. JSON-LD is confirmed at json-ld.org as a lightweight linked data format with conformant libraries in Python, JavaScript, Go, PHP, and 7+ other languages.

XML sitemaps could not be verified — sitemaps.org returned HTTP 403 Forbidden during all documentation capture attempts.

📄 Schema.org — a four-company collaborative standard now used by over 45 million domains, supporting RDFa, Microdata, and JSON-LD encoding.

📄 JSON-LD homepage confirming a lightweight linked data format with an interactive Playground available at json-ld.org.

4. LLM processing and MCP integration

The video’s approach here matches the current docs exactly. One precision worth adding: the README describes MCP support as native — “It natively supports MCP, allowing the same natural language APIs to serve both humans and AI agents.” MCP is a built-in architectural capability, not a bolt-on. Whether you expose it in your deployment is a configuration choice.

📄 Official MCP docs confirming it as an open-source standard for connecting AI applications to external systems — broader than NLWeb alone.

5. Implementation paths

The video’s approach here matches the current docs exactly on GitHub-based self-hosting. One addition the repo surfaces: a code/wordpress/nlweb directory represents a WordPress-specific integration path that the video did not cover alongside the developer setup and Wix options.

NLWeb repo file structure: 54 contributors, Python-primary, with a 'code/wordpress/nlweb' WordPress integration directory visible. — 📄 NLWeb repo file structure: 54 contributors, Python-primary, with a ‘code/wordpress/nlweb’ WordPress integration directory visible.

6. Wix one-toggle enablement

No official documentation was found for this step — proceed using the video’s approach and verify independently.

Useful Links

GitHub – nlweb-ai/NLWeb — The canonical NLWeb repository: MIT-licensed Python implementation with 54 contributors and active development across 68 branches.
Schema.org — Collaborative structured data vocabulary standard founded by Google, Microsoft, Yahoo, and Yandex, currently used by over 45 million domains at version 30.0.
JSON-LD – JSON for Linked Data — Official JSON-LD specification home with an interactive Playground and conformant libraries across 11+ languages including Python, Go, and PHP.
What is the Model Context Protocol (MCP)? — Official MCP documentation covering architecture, use cases, and server, client, and app build paths.
sitemaps.org Protocol — XML sitemap protocol reference; this domain returned HTTP 403 Forbidden during documentation capture and may require direct browser access.