Serving AI-Ready Markdown from Sitecore XP 10.4 MVC
As more teams start building AI assistants, retrieval workflows, and search experiences on top of CMS content, one practical question keeps coming up:
How do we expose Sitecore content in a format that’s easy for AI systems to consume?
In many cases, the answer is Markdown.
Markdown is simple, lightweight, readable, and structured enough to work well for indexing, chunking, retrieval, and prompt construction. It strips away most of the presentation noise while preserving the content hierarchy that matters.
In this article, I’ll walk through a technical approach for implementing automatic Markdown generation in Sitecore XP 10.4 with MVC, including:
- handling .md requests through MVC routing,
- using a controller to resolve the requested item and perform the conversion,
- converting Rich Text fields with the ReverseMarkdown DLL,
- expanding internal links and media URLs before conversion,
- and overriding EnsureLoggedInForPreview so preview requests can also support .md.
The goal is to make Sitecore content available in a clean machine-friendly format without forcing authors to maintain a second version of the same content.
Why Markdown makes sense for AI consumers
Sitecore naturally renders HTML, which works well for browsers but is not always ideal for AI systems.
Markdown has a few clear advantages:
- It removes a lot of visual and structural noise.
- It preserves headings, lists, links, and content hierarchy.
- It is easier to store, index, chunk, and pass into LLM workflows.
- It gives you a format that is human-readable and machine-friendly at the same time.
This is not about replacing the standard website output. It is about creating a second representation of the same content, one that is better suited for AI agents, internal knowledge systems, and retrieval pipelines.
What this implementation is trying to solve
The idea is straightforward:
A request like this:
/products/my-page.md
should be handled by Sitecore MVC, mapped to the corresponding content item, converted into Markdown, and returned as a text response.
That gives you an alternate output channel for the same content tree already managed in Sitecore.
This can be useful for:
- AI agents
- RAG pipelines
- semantic search indexing
- content export workflows
- internal assistant experiences
- knowledge base processing
High-level approach
At a high level, the implementation looks like this:
- A request comes in for a URL ending in .md.
- MVC routing intercepts that request.
- A controller receives the route and resolves the Sitecore item.
- A conversion layer reads the relevant fields.
- Rich Text content is normalized and converted into Markdown.
- The final Markdown document is returned with an appropriate response content type.
This keeps the HTML site experience intact while giving you a second, AI-friendly output format from the same source content.
A few implementation decisions worth making early
Before writing code, it helps to define a few boundaries.
1. Decide what content should be exposed as Markdown
Not every page should necessarily have a .md representation. You may want to restrict it by:
- template
- site section
- publishing state
- language
- business rules
That decision is important, especially if some content should remain strictly presentation-based or internal.
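As a rough illustration of such a boundary, the sketch below checks a page against an allow-list of templates, languages, and content path prefixes. The class name `MarkdownExposurePolicy` and its parameters are hypothetical and not part of the implementation described in this article; in a real solution the three values would come from the resolved Sitecore item.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical policy: decides whether a page may be served as Markdown.
public sealed class MarkdownExposurePolicy
{
    private readonly HashSet<string> _allowedTemplates;
    private readonly HashSet<string> _allowedLanguages;
    private readonly List<string> _allowedPathPrefixes;

    public MarkdownExposurePolicy(
        IEnumerable<string> allowedTemplates,
        IEnumerable<string> allowedLanguages,
        IEnumerable<string> allowedPathPrefixes)
    {
        _allowedTemplates = new HashSet<string>(allowedTemplates, StringComparer.OrdinalIgnoreCase);
        _allowedLanguages = new HashSet<string>(allowedLanguages, StringComparer.OrdinalIgnoreCase);
        _allowedPathPrefixes = allowedPathPrefixes.ToList();
    }

    // All three conditions must hold; anything unknown or empty is rejected.
    public bool IsExposable(string templateName, string language, string itemPath)
    {
        if (string.IsNullOrWhiteSpace(templateName) ||
            string.IsNullOrWhiteSpace(language) ||
            string.IsNullOrWhiteSpace(itemPath))
        {
            return false;
        }

        return _allowedTemplates.Contains(templateName)
            && _allowedLanguages.Contains(language)
            && _allowedPathPrefixes.Any(prefix =>
                itemPath.StartsWith(prefix, StringComparison.OrdinalIgnoreCase));
    }
}
```

Ending each prefix with a trailing slash avoids accidentally matching sibling sections whose names merely start with the same text.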
2. Convert from structured content, not from final rendered HTML
It may be tempting to capture the final HTML output of a page and convert that into Markdown. That can work for simple cases, but it often brings along layout noise, repeated components, navigation, and presentation artifacts.
A cleaner approach is to generate Markdown from the item fields or from a dedicated content model. That gives you much tighter control over what is included, in what order, and in what shape.
3. Internal and media links must be expanded before Markdown conversion
This is a critical detail.
If a Rich Text field contains Sitecore internal links or media references, those URLs should be resolved first. Otherwise, the Markdown output may contain incomplete, unresolved, or environment-dependent references.
The correct order is:
- read the Rich Text HTML,
- expand internal and media URLs,
- then convert the resulting HTML to Markdown.
4. ReverseMarkdown helps, but it should not be the whole solution
For Rich Text fields, a significant part of the HTML-to-Markdown conversion can be handled using the ReverseMarkdown DLL. It’s very useful and removes a lot of manual effort.
Still, it works best when the HTML has already been normalized. You’ll usually want at least some pre-processing and, depending on your content, possibly some post-processing as well.
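To make the pre-processing idea concrete, here is a minimal sketch that strips inline presentation attributes and empty paragraphs before the HTML reaches the converter. The class `RichTextHtmlNormalizer` is a hypothetical helper, and the regular expressions are only an approximation: for complex or malformed markup, an HTML parser would be a more robust choice.

```csharp
using System.Text.RegularExpressions;

// Hypothetical pre-processing step for Rich Text HTML.
public static class RichTextHtmlNormalizer
{
    // Inline presentation attributes that carry no meaning in Markdown.
    private static readonly Regex PresentationAttributes = new Regex(
        @"\s+(?:style|class)\s*=\s*(?:""[^""]*""|'[^']*')",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Paragraphs that contain only whitespace, non-breaking spaces, or line breaks.
    private static readonly Regex EmptyParagraphs = new Regex(
        @"<p(?:\s[^>]*)?>(?:\s|&nbsp;|<br\s*/?>)*</p>",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static string Normalize(string html)
    {
        if (string.IsNullOrWhiteSpace(html))
            return string.Empty;

        var result = PresentationAttributes.Replace(html, string.Empty);
        result = EmptyParagraphs.Replace(result, string.Empty);

        // Replace remaining non-breaking space entities with regular spaces.
        result = result.Replace("&nbsp;", " ");
        return result.Trim();
    }
}
```

For example, `<p class="intro" style='color:red'>Hello&nbsp;world</p><p>&nbsp;</p>` is reduced to `<p>Hello world</p>`.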
Suggested architecture
A clean way to organize this is to break the solution into a few focused parts:
- Route registration for .md requests
- Markdown controller to receive and process those requests
- Markdown service to handle content transformation
- Link expansion helper or service to resolve Sitecore links and media items
- Preview pipeline override for .md support during preview
This separation keeps the controller lean and makes the conversion logic easier to test and evolve over time.
Registering MVC routes for .md requests
The first step is to register a route that catches requests ending in .md and forwards them to the right controller.
The exact implementation depends on your current routing setup, but the idea is simple: register the Markdown route before broader MVC routes so it gets a chance to handle the request first.
Example route registration
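    using System.Web.Mvc;
    using System.Web.Routing;
    using Sitecore.Pipelines;

    public class RegisterMarkdownRoute
    {
        // Runs as a processor in the initialize pipeline and registers the Markdown route.
        public void Process(PipelineArgs args)
        {
            RouteTable.Routes.MapRoute(
                name: "MarkdownRoutes",
                url: "{*path}",
                defaults: new { controller = "Markdown", action = "Page" },
                // The constraint limits this catch-all route to URLs ending in .md.
                constraints: new { path = new MarkdownRouteConstraint() }
            );
        }
    }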
The following constraint validates .md requests
    using System;
    using System.Web;
    using System.Web.Routing;

    public class MarkdownRouteConstraint : IRouteConstraint
    {
        public bool Match(HttpContextBase httpContext, Route route, string parameterName,
            RouteValueDictionary values, RouteDirection routeDirection)
        {
            var raw = values[parameterName] as string ?? string.Empty;

            // Normalize (route gives without leading slash)
            var path = raw.TrimStart('/');

            // Only *.md
            if (!path.EndsWith(".md", StringComparison.OrdinalIgnoreCase))
                return false;

            // Avoid Sitecore/system paths (defense-in-depth; IgnoreRoute already covers most).
            // Note: the bare "sitecore" prefix already covers the two checks that follow it,
            // and it also excludes any content path whose first segment starts with "sitecore".
            if (path.StartsWith("sitecore", StringComparison.OrdinalIgnoreCase) ||
                path.StartsWith("sitecore_shell", StringComparison.OrdinalIgnoreCase) ||
                path.StartsWith("sitecoreadmin", StringComparison.OrdinalIgnoreCase) ||
                path.StartsWith("-/", StringComparison.OrdinalIgnoreCase) ||
                path.StartsWith("api/", StringComparison.OrdinalIgnoreCase))
            {
                return false;
            }

            return true;
        }
    }
What this route should do
At minimum, the route should:
- intercept requests ending in .md
- preserve the logical page path
- pass control to a dedicated controller
- allow the controller to resolve the corresponding Sitecore item
In many implementations, the controller will receive the request path, strip the .md extension, and use the remaining path to locate the correct item.
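The path handling described here can be sketched as a small pure function. `MarkdownPathMapper` and the `startPath` parameter are hypothetical names used for illustration; the example controller below delegates this step to a repository method whose implementation is not shown. The sketch also assumes that URL segments match item names exactly, which may not hold if your link provider rewrites spaces or uses display names.

```csharp
using System;

// Hypothetical helper: maps "/products/my-page.md" to a Sitecore item path.
public static class MarkdownPathMapper
{
    private const string MarkdownExtension = ".md";

    // startPath is the site's home item path, for example "/sitecore/content/MySite/Home".
    public static string ToItemPath(string requestPath, string startPath)
    {
        if (string.IsNullOrWhiteSpace(requestPath))
            throw new ArgumentException("Request path is required.", nameof(requestPath));
        if (string.IsNullOrWhiteSpace(startPath))
            throw new ArgumentException("Start path is required.", nameof(startPath));

        var relative = requestPath.Trim();

        // Strip the .md extension, ignoring case.
        if (relative.EndsWith(MarkdownExtension, StringComparison.OrdinalIgnoreCase))
            relative = relative.Substring(0, relative.Length - MarkdownExtension.Length);

        relative = relative.Trim('/');
        var root = startPath.TrimEnd('/');

        // An empty remainder means the request targets the home item itself.
        return relative.Length == 0 ? root : root + "/" + relative;
    }
}
```

With a start path of `/sitecore/content/MySite/Home`, the request `/products/my-page.md` maps to `/sitecore/content/MySite/Home/products/my-page`.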
The MVC controller that receives the route and triggers conversion
The controller is the core of the request flow. It receives the .md request, resolves the Sitecore item behind that path, and returns the Markdown output.
A good controller should stay focused on orchestration:
- validate the request
- resolve the target item
- call a conversion service
- return the generated Markdown response
Example controller
    using System;
    using System.Text;
    using System.Web.Mvc;
    using Sitecore.Diagnostics;
    using Sitecore.Sites;

    // PageMarkdownGenerator, ISitecoreItemRepository, ICacheManager, Pages and Templates
    // are project-specific types that are not shown in this article.
    public class MarkdownController : Controller
    {
        private readonly PageMarkdownGenerator _generator;
        private readonly ISitecoreItemRepository _itemRepository;
        private readonly ICacheManager _cacheManager;

        public MarkdownController(PageMarkdownGenerator generator,
            ISitecoreItemRepository itemRepository,
            ICacheManager cacheManager)
        {
            _generator = generator ?? throw new ArgumentNullException(nameof(generator));
            _itemRepository = itemRepository ?? throw new ArgumentNullException(nameof(itemRepository));
            _cacheManager = cacheManager ?? throw new ArgumentNullException(nameof(cacheManager));
        }

        public ActionResult Page(string path)
        {
            try
            {
                if (!IsMarkdownActive())
                {
                    Log.Info("MarkdownController: Markdown feature is not active. Returning 404.", this);
                    return new HttpStatusCodeResult(404, "Not Found");
                }

                // Get the request path
                var requestPath = HttpContext?.Request?.FilePath ?? string.Empty;
                if (string.IsNullOrWhiteSpace(requestPath))
                {
                    Log.Warn("MarkdownController: Request path is null or empty", this);
                    return new HttpStatusCodeResult(400, "Bad Request");
                }

                // Build the full Sitecore item path using the repository
                var fullItemPath = _itemRepository.BuildFullItemPath(requestPath);

                // Retrieve the item from the repository
                var item = _itemRepository.GetItemByPath(fullItemPath);
                if (item == null)
                {
                    Log.Warn($"MarkdownController: Item not found at path '{fullItemPath}'", this);
                    return new HttpStatusCodeResult(404, "Not Found");
                }

                // Set the context item for downstream processing
                Sitecore.Context.Item = item;

                // Note: this key varies by item path only, not by language or version.
                var cacheKey = $"Markdown_{fullItemPath}";
                var markdown = _cacheManager.GetOrSetInCache(
                    cacheKey,
                    TimeSpan.FromMinutes(20), // Cache duration
                    () =>
                    {
                        var site = SiteContextFactory.GetSiteContext("YOURWEBSITE");
                        using (new SiteContextSwitcher(site))
                        {
                            return _generator.Generate(item);
                        }
                    }
                );

                return MarkdownContent(markdown);
            }
            catch (Exception ex)
            {
                Log.Error($"MarkdownController: Markdown delivery failed for path='{path}'", ex, this);
                return new HttpStatusCodeResult(500, "Internal Server Error");
            }
        }

        private static ContentResult MarkdownContent(string markdown)
        {
            return new ContentResult
            {
                Content = markdown ?? string.Empty,
                ContentType = "text/markdown",
                ContentEncoding = Encoding.UTF8
            };
        }

        private static bool IsMarkdownActive()
        {
            var siteSettingsItem = Sitecore.Configuration.Factory.GetDatabase("web")
                .Items.GetItem(Pages.SiteSettingsId);
            if (siteSettingsItem == null)
            {
                Log.Warn("MarkdownController: Site settings item not found", typeof(MarkdownController));
                return false;
            }

            return siteSettingsItem.GetCheckboxBoolean(Templates.Markdown.Fields.IsMarkdownActive);
        }
    }
What the controller should not do
Try not to place all transformation logic directly in the controller. It’s much easier to maintain this feature if the controller delegates the actual conversion to a service layer.
That becomes especially important if you later want to reuse the same Markdown generation logic from:
- an API endpoint
- a scheduled export process
- a batch indexer
- a content sync integration
Converting Sitecore content into Markdown
This is where the real value of the implementation lives.
Not every page type should be converted in exactly the same way, so it usually makes sense to define a conversion strategy based on:
- template
- content type
- field map
- view model
- business rules
A simple conversion might include:
- page title as #
- section headings as ## or ###
- summary or introduction
- main body content
- related links
- useful metadata for downstream AI workflows
The important thing is consistency. AI consumers usually benefit more from predictable structure than from clever formatting.
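One way to keep that structure predictable is to assemble the document through a single builder. The sketch below is a simplified assumption, not the `PageMarkdownGenerator` used in the controller: `MarkdownSection` and `MarkdownDocumentBuilder` are hypothetical types, and the section bodies are expected to be Markdown already.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical model: one titled block of already-converted Markdown.
public sealed class MarkdownSection
{
    public MarkdownSection(string heading, string body)
    {
        Heading = heading;
        Body = body;
    }

    public string Heading { get; }
    public string Body { get; }
}

public static class MarkdownDocumentBuilder
{
    // Emits: "# title", optional summary, then each section as "## heading" plus body.
    public static string Build(string title, string summary, IEnumerable<MarkdownSection> sections)
    {
        if (string.IsNullOrWhiteSpace(title))
            throw new ArgumentException("A title is required.", nameof(title));

        var sb = new StringBuilder();
        sb.Append("# ").Append(title.Trim()).Append("\n\n");

        if (!string.IsNullOrWhiteSpace(summary))
            sb.Append(summary.Trim()).Append("\n\n");

        foreach (var section in sections ?? Enumerable.Empty<MarkdownSection>())
        {
            // Skip sections without content so the output has no empty headings.
            if (section == null || string.IsNullOrWhiteSpace(section.Body))
                continue;

            if (!string.IsNullOrWhiteSpace(section.Heading))
                sb.Append("## ").Append(section.Heading.Trim()).Append("\n\n");

            sb.Append(section.Body.Trim()).Append("\n\n");
        }

        // Exactly one trailing newline.
        return sb.ToString().TrimEnd() + "\n";
    }
}
```

Because every page type goes through the same builder, heading levels and ordering stay identical across templates, which is what downstream chunking tends to rely on.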
Converting Rich Text fields with ReverseMarkdown
For Rich Text fields, part of the conversion can be handled using the ReverseMarkdown library.
This is a practical choice because it takes care of many common HTML structures such as:
- paragraphs
- headings
- emphasis
- lists
- links
That said, you should not feed raw Sitecore Rich Text HTML straight into the converter without preparing it first.
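Part of that preparation can also happen in the converter's own configuration. The sketch below assumes the ReverseMarkdown NuGet package and uses its `Config` options; the wrapper class `RichTextMarkdownConverter` is a hypothetical name. With `UnknownTags` set to `Bypass`, tags the library does not recognize are dropped while their inner content is still converted, which may reduce the amount of leftover HTML in the output. Treat the chosen options as a starting point and verify them against the version you install.

```csharp
using ReverseMarkdown;

// Hypothetical wrapper around ReverseMarkdown with explicit configuration.
public static class RichTextMarkdownConverter
{
    private static readonly Config ConverterConfig = new Config
    {
        // Ignore unknown tags but keep converting their children.
        UnknownTags = Config.UnknownTagsOption.Bypass,
        // Use GitHub-flavored output for constructs such as fenced code blocks.
        GithubFlavored = true,
        // Drop HTML comments from the output.
        RemoveComments = true,
        // Emit a bare URL when the link text and the href are the same.
        SmartHrefHandling = true
    };

    public static string Convert(string html)
    {
        if (string.IsNullOrWhiteSpace(html))
            return string.Empty;

        // A new converter per call mirrors the usage shown in this article.
        var converter = new Converter(ConverterConfig);
        return converter.Convert(html).Trim();
    }
}
```

Post-processing for specific tags, such as figures, may still be needed when their content requires a custom Markdown shape.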
Expand links before converting
This is one of the most important parts of the pipeline.
Before sending Rich Text HTML into ReverseMarkdown, you should first expand:
- internal Sitecore links
- media item URLs
If you skip this step, the generated Markdown may contain broken references, partial URLs, or internal representations that only make sense inside Sitecore.
So the conversion flow should look like this:
- read the Rich Text HTML from the field
- resolve internal links into final public or canonical URLs
- resolve media references into usable media URLs
- pass the normalized HTML into ReverseMarkdown
- apply any final cleanup rules
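The ordering above can be made explicit by running the steps as a fixed sequence of string transformations. `RichTextPipeline` is a hypothetical class for illustration; in a real solution the steps would be the link expansion, media expansion, and conversion methods shown in this article. The `CollapseBlankLines` method is one example of a final cleanup rule.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical pipeline: runs string transformations in the order they are given.
public sealed class RichTextPipeline
{
    private readonly Func<string, string>[] _steps;

    public RichTextPipeline(params Func<string, string>[] steps)
    {
        if (steps == null || steps.Any(step => step == null))
            throw new ArgumentException("All steps must be non-null.", nameof(steps));

        // Copy the array so later changes by the caller cannot reorder the steps.
        _steps = steps.ToArray();
    }

    public string Run(string html)
    {
        var current = html ?? string.Empty;
        foreach (var step in _steps)
        {
            current = step(current) ?? string.Empty;
        }
        return current;
    }

    // Example cleanup rule: collapse runs of three or more newlines into one blank line.
    public static string CollapseBlankLines(string markdown)
    {
        return Regex.Replace(markdown ?? string.Empty, @"(\r?\n){3,}", "\n\n").Trim();
    }
}
```

A pipeline built as `new RichTextPipeline(ExpandDynamicLinks, ExpandMediaLinks, ConvertToMarkdown, RichTextPipeline.CollapseBlankLines)` would then guarantee that URL expansion always happens before conversion, where `ConvertToMarkdown` stands in for whatever conversion method you use.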
Example of Rich Text conversion with ReverseMarkdown
    /// <summary>
    /// Serializes a Rich Text field to Markdown
    /// </summary>
    private void SerializeRichTextField(StringBuilder sb, Field field)
    {
        var htmlValue = field.Value;

        // Expand dynamic links and media links
        htmlValue = ExpandDynamicLinks(htmlValue);
        htmlValue = ExpandMediaLinks(htmlValue);

        if (!TryConvertHtmlToMarkdown(htmlValue, out var mdValue))
        {
            // Fallback: conversion failed, so output the expanded HTML unchanged
            AppendFieldWithValue(sb, htmlValue);
            return;
        }

        if (!string.IsNullOrWhiteSpace(mdValue))
        {
            AppendFieldWithValue(sb, mdValue.Trim());
        }
    }

    private bool TryConvertHtmlToMarkdown(string htmlValue, out string mdValue)
    {
        mdValue = null;
        try
        {
            var converter = new ReverseMarkdown.Converter();
            mdValue = converter.Convert(htmlValue);

            // Check if the converted markdown still contains HTML <section> tags
            if (!string.IsNullOrEmpty(mdValue) && mdValue.Contains("<section"))
            {
                mdValue = ConvertSectionToMarkdown(mdValue);
            }

            // Same check for leftover <figure> tags
            if (!string.IsNullOrEmpty(mdValue) && mdValue.Contains("<figure"))
            {
                mdValue = ConvertFigureToMarkdown(mdValue);
            }

            return true;
        }
        catch (ArgumentException ex)
        {
            Log.Error("Error converting Rich Text field: Invalid HTML argument", ex, this);
            return false;
        }
        catch (InvalidOperationException ex)
        {
            Log.Error("Error converting Rich Text field: Invalid operation", ex, this);
            return false;
        }
    }
Expanding internal links and media item URLs
This step is what makes the Markdown output reliable outside the normal HTML rendering context.
Sitecore Rich Text content can include internal links and media references that depend on the platform’s own link handling. If you want your Markdown to be useful to external consumers, those references need to be fully resolved first.
What this helper should handle
This logic should ideally take care of:
- converting internal links into public-facing URLs
- resolving media items into usable media URLs
- avoiding environment-specific broken paths
- keeping output consistent across preview and delivery scenarios
Example helper for URL expansion
    private string ExpandDynamicLinks(string text)
    {
        if (string.IsNullOrEmpty(text) || !text.Contains(DynamicLinkPrefix))
            return text;

        var sb = new StringBuilder(text.Length);
        int currentIndex = 0;
        int startIndex = text.IndexOf(DynamicLinkPrefix, StringComparison.InvariantCulture);

        while (startIndex >= 0)
        {
            int endIndex = text.IndexOf(DynamicLinkSuffix, startIndex, StringComparison.InvariantCulture);
            if (endIndex < 0)
            {
                break;
            }

            var endPosition = endIndex + DynamicLinkSuffix.Length;
            var url = TryParseDynamicLink(text, startIndex, endIndex);

            sb.Append(text, currentIndex, startIndex - currentIndex);
            sb.Append(url);

            currentIndex = endPosition;
            startIndex = text.IndexOf(DynamicLinkPrefix, currentIndex, StringComparison.InvariantCulture);
        }

        sb.Append(text.Substring(currentIndex));
        return sb.ToString();
    }

    /// <summary>
    /// Attempts to parse and resolve a dynamic link
    /// </summary>
    private string TryParseDynamicLink(string text, int startIndex, int endIndex)
    {
        try
        {
            var linkLength = endIndex - startIndex;
            var dynamicLink = DynamicLink.Parse(text.Substring(startIndex, linkLength));
            return ResolveDynamicLink(dynamicLink);
        }
        catch (ArgumentException ex)
        {
            Log.Warn($"Error parsing dynamic link at position {startIndex}: Invalid argument", ex, this);
            return text.Substring(startIndex, endIndex + DynamicLinkSuffix.Length - startIndex);
        }
        catch (FormatException ex)
        {
            Log.Warn($"Error parsing dynamic link at position {startIndex}: Format error", ex, this);
            return text.Substring(startIndex, endIndex + DynamicLinkSuffix.Length - startIndex);
        }
    }
    /// <summary>
    /// Expands media links like /media/1D28D6CA27E0447F8EA242CDECABA25E.ashx to their actual URLs
    /// </summary>
    private string ExpandMediaLinks(string text)
    {
        if (string.IsNullOrEmpty(text))
            return text;

        return MediaLinkRegex.Replace(text, match =>
        {
            var mediaIdString = match.Groups[1].Value;
            if (string.IsNullOrEmpty(mediaIdString) || mediaIdString.Length != MediaIdCompactLength)
                return match.Value;

            return TryGetMediaUrl(mediaIdString, match.Value);
        });
    }

    /// <summary>
    /// Attempts to get media URL for a media ID string
    /// </summary>
    private string TryGetMediaUrl(string mediaIdString, string fallbackValue)
    {
        try
        {
            var formattedId = FormatMediaId(mediaIdString);
            var mediaUrl = GetMediaUrlById(formattedId);
            return mediaUrl ?? fallbackValue;
        }
        catch (ArgumentException ex)
        {
            Log.Warn($"Error expanding media link for ID {mediaIdString}: Invalid argument", ex, this);
            return fallbackValue;
        }
        catch (InvalidOperationException ex)
        {
            Log.Warn($"Error expanding media link for ID {mediaIdString}: Invalid operation", ex, this);
            return fallbackValue;
        }
    }
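The helper code above relies on a few members that are not shown: `DynamicLinkPrefix`, `DynamicLinkSuffix`, `MediaLinkRegex`, `MediaIdCompactLength`, and `FormatMediaId`. The sketch below is one plausible set of definitions, and every value in it is an assumption on my part: the prefix and suffix follow the usual shape of Sitecore dynamic links (`~/link.aspx?_id=...&_z=z`), and the regular expression targets compact 32-character media IDs. Adjust them to the link formats your Rich Text fields actually contain.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical definitions for the members referenced by the expansion helpers.
public static class MediaLinkPatterns
{
    // Assumed markers for the start and end of a Sitecore dynamic link.
    public const string DynamicLinkPrefix = "~/link.aspx?";
    public const string DynamicLinkSuffix = "_z=z";

    // A GUID without dashes or braces is 32 hexadecimal characters.
    public const int MediaIdCompactLength = 32;

    // Matches "/media/<32 hex chars>.ashx", with an optional "-" or "~" prefix.
    // Group 1 captures the compact media ID.
    public static readonly Regex MediaLinkRegex = new Regex(
        @"(?:-|~)?/media/([0-9A-Fa-f]{32})\.ashx",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Converts a compact ID to the braced, upper-case GUID form,
    // for example "{1D28D6CA-27E0-447F-8EA2-42CDECABA25E}".
    public static string FormatMediaId(string compactId)
    {
        Guid guid;
        if (!Guid.TryParseExact(compactId, "N", out guid))
            throw new ArgumentException("Expected a 32-character hexadecimal ID.", nameof(compactId));

        return guid.ToString("B").ToUpperInvariant();
    }
}
```

Throwing `ArgumentException` for malformed IDs fits the `catch` blocks in `TryGetMediaUrl`, which fall back to the original link text in that case.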
Overriding EnsureLoggedInForPreview to allow .md requests
If you want .md requests to work during preview, you may need to adjust how Sitecore handles those requests in the preview pipeline.
One of the places this can surface is the EnsureLoggedInForPreview processor. Depending on how the request is interpreted, .md URLs may be blocked or handled differently from standard page requests.
A targeted override can solve that by explicitly allowing valid Markdown requests in preview scenarios.
What this override is meant to solve
This override helps you:
- allow .md requests during preview
- test Markdown output before publishing
- avoid breaking normal preview behavior
- keep the exception narrow and controlled
Example override of EnsureLoggedInForPreview
    public class EnsureLoggedInForPreviewMarkdown : ActionExecutingProcessor
    {
        public override void Process(ActionExecutingArgs args)
        {
            if (IsMarkdownRequest(args.Context))
                return;

            if (Context.PageMode.IsNormal || Context.IsLoggedIn)
                return;

            using (new SiteContextSwitcher(Factory.GetSite("shell")))
            {
                using (new UserSwitcher(Context.User))
                    ShellPage.IsLoggedIn(true, (httpContext, loginUrl) => RedirectToLogin(args.Context, loginUrl));
            }
        }

        private static bool IsMarkdownRequest(ActionExecutingContext context)
        {
            if (context?.HttpContext?.Request == null)
                return false;

            var requestPath = context.HttpContext.Request.FilePath ?? string.Empty;
            return requestPath.EndsWith(".md", StringComparison.OrdinalIgnoreCase);
        }

        private static void RedirectToLogin(ActionExecutingContext actionContext, string loginUrl)
        {
            Assert.ArgumentNotNull(actionContext, nameof(actionContext));
            Assert.ArgumentNotNull(loginUrl, nameof(loginUrl));
            actionContext.Result = new RedirectResult(loginUrl, false);
        }
    }
Keep the security scope narrow
This is worth being careful about.
The goal is not to weaken preview security. The goal is simply to allow a very specific request pattern. Your override should stay strict and validate the request as narrowly as possible.
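As one possible way to tighten the check, the sketch below validates more than the file extension: it also restricts the HTTP method and rejects system-looking paths. `MarkdownPreviewGuard` is a hypothetical helper and the blocked prefixes are assumptions; the override shown earlier only tests for the .md suffix.

```csharp
using System;

// Hypothetical, stricter replacement for a plain "ends with .md" test.
public static class MarkdownPreviewGuard
{
    // Assumed prefixes that should never be treated as Markdown page requests.
    private static readonly string[] BlockedPrefixes = { "/sitecore", "/-/", "/api/" };

    public static bool IsAllowedMarkdownRequest(string httpMethod, string filePath)
    {
        // Only read-only methods qualify.
        var isReadOnly =
            string.Equals(httpMethod, "GET", StringComparison.OrdinalIgnoreCase) ||
            string.Equals(httpMethod, "HEAD", StringComparison.OrdinalIgnoreCase);
        if (!isReadOnly)
            return false;

        if (string.IsNullOrWhiteSpace(filePath))
            return false;

        if (!filePath.EndsWith(".md", StringComparison.OrdinalIgnoreCase))
            return false;

        // Reject path traversal attempts outright.
        if (filePath.Contains(".."))
            return false;

        foreach (var prefix in BlockedPrefixes)
        {
            if (filePath.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
                return false;
        }

        return true;
    }
}
```

Inside the processor, this could be called with `Request.HttpMethod` and `Request.FilePath`, so the exception applies only to read-only requests for content pages.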
Sitecore patch configuration
If you are replacing or inserting a custom processor into the pipeline, you’ll also need a patch config to register it.
Example patch config
    <?xml version="1.0"?>
    <configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:set="http://www.sitecore.net/xmlconfig/set/">
      <sitecore>
        <!-- Note: the assembly name after each comma must match the DLL that contains the type. -->
        <services>
          <configurator type="Foundation.Markdown.ServicesConfigurator, Oshyn.Feature.Markdown" />
        </services>
        <pipelines>
          <initialize>
            <processor type="Foundation.Markdown.Routes.RegisterMarkdownRoute, Oshyn.Feature.Markdown"
                       patch:after="processor[@type='Sitecore.Mvc.Pipelines.Loader.InitializeRoutes, Sitecore.Mvc']" />
          </initialize>
          <mvc.actionExecuting>
            <processor type="Foundation.Markdown.Pipelines.EnsureLoggedInForPreviewMarkdown, Foundation.Markdown"
                       patch:instead="processor[@type='Sitecore.Mvc.Pipelines.Request.ActionExecuting.EnsureLoggedInForPreview, Sitecore.Mvc']" />
          </mvc.actionExecuting>
        </pipelines>
      </sitecore>
    </configuration>
Practical recommendations
Once the basic implementation is in place, a few refinements can make it much more useful in real projects.
Define a stable output contract
If the Markdown is going to feed AI systems, consistency matters a lot.
It helps to define a clear output contract, such as:
- always include the title
- include the canonical URL
- use consistent heading levels
- exclude navigation and decorative content
- include only the fields that add knowledge value
Filter aggressively
Not everything that appears on a web page is useful to an AI consumer.
Menus, breadcrumbs, promos, CTAs, utility blocks, and layout fragments often add noise. The Markdown version should focus on the actual content, not the entire page chrome.
Consider including metadata
Depending on your downstream use case, it may help to include metadata such as:
- item ID
- template name
- language
- canonical URL
- last updated date
That can be useful for indexing, traceability, or debugging.
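One common way to carry such metadata is a YAML front matter block at the top of the Markdown document. The sketch below assumes that format; `MarkdownMetadata`, `FrontMatterWriter`, and the key names are hypothetical choices, not something the implementation in this article prescribes.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical metadata carrier for a generated Markdown document.
public sealed class MarkdownMetadata
{
    public string ItemId { get; set; }
    public string TemplateName { get; set; }
    public string Language { get; set; }
    public string CanonicalUrl { get; set; }
    public DateTime LastUpdatedUtc { get; set; }
}

public static class FrontMatterWriter
{
    public static string Write(MarkdownMetadata metadata)
    {
        if (metadata == null)
            throw new ArgumentNullException(nameof(metadata));

        var sb = new StringBuilder();
        sb.Append("---\n");
        AppendQuoted(sb, "item_id", metadata.ItemId);
        AppendQuoted(sb, "template", metadata.TemplateName);
        AppendQuoted(sb, "language", metadata.Language);
        AppendQuoted(sb, "canonical_url", metadata.CanonicalUrl);

        // ISO 8601 in UTC keeps the timestamp unambiguous for downstream consumers.
        sb.Append("last_updated: ")
          .Append(metadata.LastUpdatedUtc.ToUniversalTime()
              .ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture))
          .Append('\n');
        sb.Append("---\n");
        return sb.ToString();
    }

    // Writes key: "value", escaping backslashes and double quotes; empty values are skipped.
    private static void AppendQuoted(StringBuilder sb, string key, string value)
    {
        if (string.IsNullOrWhiteSpace(value))
            return;

        var escaped = value.Replace("\\", "\\\\").Replace("\"", "\\\"");
        sb.Append(key).Append(": \"").Append(escaped).Append("\"\n");
    }
}
```

The resulting block can be prepended to the generated Markdown, and most indexing tools can either parse it or ignore it.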
Think about caching
If the Markdown output will be requested frequently, caching is worth considering.
Repeatedly resolving Sitecore links, processing Rich Text, and converting HTML can add unnecessary overhead. Caching by item, language, version, or publish state can improve performance significantly.
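The example controller in this article keys its cache on the item path alone. A key that also varies by database, language, and version could look like the sketch below; `MarkdownCacheKey` is a hypothetical helper, and in a real solution the values would be read from the resolved item.

```csharp
using System;
using System.Globalization;

// Hypothetical cache key builder that varies by database, item, language, and version.
public static class MarkdownCacheKey
{
    public static string Build(string database, Guid itemId, string language, int version)
    {
        if (string.IsNullOrWhiteSpace(database))
            throw new ArgumentException("Database name is required.", nameof(database));
        if (string.IsNullOrWhiteSpace(language))
            throw new ArgumentException("Language is required.", nameof(language));

        // The "N" format writes the GUID as 32 lower-case hex characters without dashes.
        return string.Format(
            CultureInfo.InvariantCulture,
            "Markdown_{0}_{1:N}_{2}_{3}",
            database.ToLowerInvariant(),
            itemId,
            language.ToLowerInvariant(),
            version);
    }
}
```

Including the version number means a newly published version produces a new key, so stale entries simply age out instead of being served.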
Why this pattern works well
One of the strengths of this approach is that it lets you reuse the content model you already have.
You are not asking authors to maintain separate Markdown content. You are not duplicating data. You are simply creating an alternate output channel from the same source.
That gives you a practical way to make Sitecore content available to AI systems while staying within a familiar MVC architecture.
It also leaves room to grow. Once you have this pattern in place, it becomes much easier to add other output formats later, such as:
- plain text
- JSON
- structured AI payloads
- RAG-oriented export formats
Final thoughts
Automatic Markdown generation in Sitecore XP 10.4 MVC is a very practical pattern when you need to expose managed content to AI agents and downstream machine consumers.
The key is not to treat this as a simple HTML conversion task. It works better when you treat it as a controlled content projection.
That is why the strongest implementation usually includes:
- dedicated routing for .md requests
- an MVC controller that resolves the item and coordinates the response
- a conversion layer built around content structure, not full page HTML
- ReverseMarkdown for Rich Text field conversion
- URL expansion for internal links and media items before conversion
- and a focused override of EnsureLoggedInForPreview when preview support is required
With that in place, you can turn standard Sitecore-managed content into a clean Markdown representation that is much more useful for AI workflows.
Conclusion
This is a small architectural addition, but it can unlock a lot of value.
By exposing .md versions of Sitecore content, you create a bridge between traditional CMS delivery and AI-ready content consumption. That bridge becomes even stronger when the Markdown is clean, predictable, and generated from the right content sources.
If the implementation is done carefully, this can become a solid foundation for search, retrieval, assistants, and content-aware AI features built on top of Sitecore.