Serving AI-Ready Markdown from Sitecore XP 10.4 MVC

As more teams start building AI assistants, retrieval workflows, and search experiences on top of CMS content, one practical question keeps coming up:

How do we expose Sitecore content in a format that’s easy for AI systems to consume?

In many cases, the answer is Markdown.

Markdown is simple, lightweight, readable, and structured enough to work well for indexing, chunking, retrieval, and prompt construction. It strips away most of the presentation noise while preserving the content hierarchy that matters.

In this article, I’ll walk through a technical approach for implementing automatic Markdown generation in Sitecore XP 10.4 with MVC, including:

handling .md requests through MVC routing,
using a controller to resolve the requested item and perform the conversion,
converting Rich Text fields with the ReverseMarkdown library,
expanding internal links and media URLs before conversion,
and overriding EnsureLoggedInForPreview so preview requests can also support .md.

The goal is to make Sitecore content available in a clean machine-friendly format without forcing authors to maintain a second version of the same content.

Why Markdown makes sense for AI consumers

Sitecore naturally renders HTML, which works well for browsers but is not always ideal for AI systems.

Markdown has a few clear advantages:

It removes a lot of visual and structural noise.
It preserves headings, lists, links, and content hierarchy.
It is easier to store, index, chunk, and pass into LLM workflows.
It gives you a format that is human-readable and machine-friendly at the same time.

This is not about replacing the standard website output. It is about creating a second representation of the same content, one that is better suited for AI agents, internal knowledge systems, and retrieval pipelines.

What this implementation is trying to solve

The idea is straightforward:

A request like this:

/products/my-page.md

should be handled by Sitecore MVC, mapped to the corresponding content item, converted into Markdown, and returned as a text response.

That gives you an alternate output channel for the same content tree already managed in Sitecore.

This can be useful for:

AI agents
RAG pipelines
semantic search indexing
content export workflows
internal assistant experiences
knowledge base processing

High-level approach

At a high level, the implementation looks like this:

A request comes in for a URL ending in .md.
MVC routing intercepts that request.
A controller receives the route and resolves the Sitecore item.
A conversion layer reads the relevant fields.
Rich Text content is normalized and converted into Markdown.
The final Markdown document is returned with an appropriate response content type.

This keeps the HTML site experience intact while giving you a second, AI-friendly output format from the same source content.

A few implementation decisions worth making early

Before writing code, it helps to define a few boundaries.

1. Decide what content should be exposed as Markdown

Not every page should necessarily have a .md representation. You may want to restrict it by:

template
site section
publishing state
language
business rules

That decision is important, especially if some content should remain strictly presentation-based or internal.

2. Convert from structured content, not from final rendered HTML

It may be tempting to capture the final HTML output of a page and convert that into Markdown. That can work for simple cases, but it often brings along layout noise, repeated components, navigation, and presentation artifacts.

A cleaner approach is to generate Markdown from the item fields or from a dedicated content model. That gives you much tighter control over what is included, in what order, and in what shape.

3. Internal and media links must be expanded before Markdown conversion

This is a critical detail.

If a Rich Text field contains Sitecore internal links or media references, those URLs should be resolved first. Otherwise, the Markdown output may contain incomplete, unresolved, or environment-dependent references.

The correct order is:

read the Rich Text HTML,
expand internal and media URLs,
then convert the resulting HTML to Markdown.

4. ReverseMarkdown helps, but it should not be the whole solution

For Rich Text fields, a significant part of the HTML-to-Markdown conversion can be handled by the ReverseMarkdown library. It’s very useful and removes a lot of manual effort.

Still, it works best when the HTML has already been normalized. You’ll usually want at least some pre-processing and, depending on your content, possibly some post-processing as well.

Suggested architecture

A clean way to organize this is to break the solution into a few focused parts:

Route registration for .md requests
Markdown controller to receive and process those requests
Markdown service to handle content transformation
Link expansion helper or service to resolve Sitecore links and media items
Preview pipeline override for .md support during preview

This separation keeps the controller lean and makes the conversion logic easier to test and evolve over time.

Registering MVC routes for `.md` requests

The first step is to register a route that catches requests ending in .md and forwards them to the right controller.

The exact implementation depends on your current routing setup, but the idea is simple: register the Markdown route before broader MVC routes so it gets a chance to handle the request first.

Example route registration

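using System.Web.Mvc;      // MapRoute extension method
using System.Web.Routing;  // RouteTable
using Sitecore.Pipelines;  // PipelineArgs

// Runs in the <initialize> pipeline (see the patch configuration later in this article).
public class RegisterMarkdownRoute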
{
    public void Process(PipelineArgs args)
    {
        RouteTable.Routes.MapRoute(
            name: "MarkdownRoutes",
            url: "{*path}",
            defaults: new { controller = "Markdown", action = "Page" },
            constraints: new { path = new MarkdownRouteConstraint() }
        );
    }
}

The following route constraint validates that an incoming request is a .md call:

using System;
using System.Web;
using System.Web.Routing;

public class MarkdownRouteConstraint : IRouteConstraint
{
    public bool Match(HttpContextBase httpContext, Route route, string parameterName,
        RouteValueDictionary values, RouteDirection routeDirection)
    {
        var raw = values[parameterName] as string ?? string.Empty;

        // Normalize: the route value normally arrives without a leading slash
        var path = raw.TrimStart('/');

        // Only *.md
        if (!path.EndsWith(".md", StringComparison.OrdinalIgnoreCase))
            return false;

        // Avoid Sitecore/system paths (defense-in-depth; IgnoreRoute already covers most)
        if (path.StartsWith("sitecore", StringComparison.OrdinalIgnoreCase) ||
            path.StartsWith("sitecore_shell", StringComparison.OrdinalIgnoreCase) ||
            path.StartsWith("sitecoreadmin", StringComparison.OrdinalIgnoreCase) ||
            path.StartsWith("-/", StringComparison.OrdinalIgnoreCase) ||
            path.StartsWith("api/", StringComparison.OrdinalIgnoreCase))
        {
            return false;
        }

        return true;
    }
}

What this route should do

At minimum, the route should:

intercept requests ending in .md
preserve the logical page path
pass control to a dedicated controller
allow the controller to resolve the corresponding Sitecore item

In many implementations, the controller will receive the request path, strip the .md extension, and use the remaining path to locate the correct item.
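As a rough illustration of that path handling, the sketch below strips the extension and joins the remainder onto a site start path. `MarkdownPathHelper`, `ToItemPath`, and the start path used in the usage note are hypothetical names of mine. The article's `BuildFullItemPath` is not shown, so this is only one way it might work. It does not deal with display names, hyphen-encoded item names, or language prefixes in the URL.

```csharp
using System;

// Hypothetical helper: not part of Sitecore or of the repository class used in this article.
public static class MarkdownPathHelper
{
    private const string Extension = ".md";

    // Maps a request path such as "/products/my-page.md" to an item path
    // below the given start path. Returns null when the path is not a .md request.
    public static string ToItemPath(string requestPath, string siteStartPath)
    {
        if (string.IsNullOrWhiteSpace(requestPath) ||
            !requestPath.EndsWith(Extension, StringComparison.OrdinalIgnoreCase))
        {
            return null;
        }

        // Drop the extension and the surrounding slashes:
        // "/products/my-page.md" becomes "products/my-page".
        var relative = requestPath
            .Substring(0, requestPath.Length - Extension.Length)
            .Trim('/');

        var root = (siteStartPath ?? string.Empty).TrimEnd('/');

        // Nothing left after stripping means the request targets the start item itself.
        return relative.Length == 0 ? root : root + "/" + relative;
    }
}
```

With an assumed start path of `/sitecore/content/MySite/Home`, a request for `/products/my-page.md` would map to `/sitecore/content/MySite/Home/products/my-page`. Keeping this as a pure string function makes it easy to unit test without a Sitecore context.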

The MVC controller that receives the route and triggers conversion

The controller is the core of the request flow. It receives the .md request, resolves the Sitecore item behind that path, and returns the Markdown output.

A good controller should stay focused on orchestration:

validate the request
resolve the target item
call a conversion service
return the generated Markdown response

Example controller

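using System;
using System.Text;
using System.Web.Mvc;
using Sitecore.Diagnostics; // Log
using Sitecore.Sites;       // SiteContextFactory, SiteContextSwitcher

// PageMarkdownGenerator, ISitecoreItemRepository, and ICacheManager are custom types
// from the solution and are not shown in this article.
public class MarkdownController : Controller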
{
    private readonly PageMarkdownGenerator _generator;
    private readonly ISitecoreItemRepository _itemRepository;
    private readonly ICacheManager _cacheManager;

    public MarkdownController(PageMarkdownGenerator generator,
        ISitecoreItemRepository itemRepository,
        ICacheManager cacheManager)
    {
        _generator = generator ?? throw new ArgumentNullException(nameof(generator));
        _itemRepository = itemRepository ?? throw new ArgumentNullException(nameof(itemRepository));
        _cacheManager = cacheManager ?? throw new ArgumentNullException(nameof(cacheManager));
    }

    public ActionResult Page(string path)
    {
        try
        {
            
            if (!IsMarkdownActive())
            {
                Log.Info("MarkdownController: Markdown feature is not active. Returning 404.", this);
                return new HttpStatusCodeResult(404, "Not Found");
            }

            // Get the request path
            var requestPath = HttpContext?.Request?.FilePath ?? string.Empty;

            if (string.IsNullOrWhiteSpace(requestPath))
            {
                Log.Warn("MarkdownController: Request path is null or empty", this);
                return new HttpStatusCodeResult(400, "Bad Request");
            }

            // Build the full Sitecore item path using the repository
            var fullItemPath = _itemRepository.BuildFullItemPath(requestPath);

            // Retrieve the item from the repository
            var item = _itemRepository.GetItemByPath(fullItemPath);

            if (item == null)
            {
                Log.Warn($"MarkdownController: Item not found at path '{fullItemPath}'", this);
                return new HttpStatusCodeResult(404, "Not Found");
            }

            // Set the context item for downstream processing
            Sitecore.Context.Item = item;

            // Include the language in the key so different language versions do not share a cache entry
            var cacheKey = $"Markdown_{fullItemPath}_{item.Language.Name}";

            var markdown = _cacheManager.GetOrSetInCache(
                cacheKey,
                TimeSpan.FromMinutes(20), // Cache duration
                () =>
                {
                    // "YOURWEBSITE" is a placeholder for the name of your site definition
                    var site = SiteContextFactory.GetSiteContext("YOURWEBSITE");
                    using (new SiteContextSwitcher(site))
                    {
                        return _generator.Generate(item);
                    }
                }
            );

            return MarkdownContent(markdown);
        }
        catch (Exception ex)
        {
            Log.Error($"MarkdownController: Markdown delivery failed for path='{path}'", ex, this);
            return new HttpStatusCodeResult(500, "Internal Server Error");
        }
    }

    private static ContentResult MarkdownContent(string markdown)
    {
        return new ContentResult
        {
            Content = markdown ?? string.Empty,
            ContentType = "text/markdown",
            ContentEncoding = Encoding.UTF8
        };
    }

    private static bool IsMarkdownActive()
    {
        var siteSettingsItem = Sitecore.Configuration.Factory.GetDatabase("web")
                              .Items.GetItem(Pages.SiteSettingsId);
        if (siteSettingsItem == null)
        {
            Log.Warn("MarkdownController: Site settings item not found", typeof(MarkdownController));
            return false;
        }
        var isMarkdownActive = siteSettingsItem?.GetCheckboxBoolean(Templates.Markdown.Fields.IsMarkdownActive) ?? false;

        return isMarkdownActive;
    }
}

What the controller should not do

Try not to place all transformation logic directly in the controller. It’s much easier to maintain this feature if the controller delegates the actual conversion to a service layer.

That becomes especially important if you later want to reuse the same Markdown generation logic from:

an API endpoint
a scheduled export process
a batch indexer
a content sync integration

Converting Sitecore content into Markdown

This is where the real value of the implementation lives.

Not every page type should be converted in exactly the same way, so it usually makes sense to define a conversion strategy based on:

template
content type
field map
view model
business rules

A simple conversion might include:

page title as #
section headings as ## or ###
summary or introduction
main body content
related links
useful metadata for downstream AI workflows

The important thing is consistency. AI consumers usually benefit more from predictable structure than from clever formatting.
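To make the idea of a predictable structure more concrete, here is a minimal sketch of a document builder that always emits the same shape: a `#` title, an optional summary, then one `##` block per section. `MarkdownSection` and `MarkdownDocumentBuilder` are hypothetical types of mine, not the `PageMarkdownGenerator` used in the controller. In a real solution the values would come from item fields chosen per template.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical, simplified page model; real values would be read from Sitecore fields.
public sealed class MarkdownSection
{
    public string Heading { get; set; }
    public string Body { get; set; }
}

public static class MarkdownDocumentBuilder
{
    // Emits a fixed layout: "# title", optional summary, then "## heading" + body per section.
    public static string Build(string title, string summary, IEnumerable<MarkdownSection> sections)
    {
        var sb = new StringBuilder();
        sb.Append("# ").Append(title).Append("\n\n");

        if (!string.IsNullOrWhiteSpace(summary))
            sb.Append(summary.Trim()).Append("\n\n");

        foreach (var section in sections ?? Array.Empty<MarkdownSection>())
        {
            // Skip empty sections so the output stays compact.
            if (section == null || string.IsNullOrWhiteSpace(section.Body))
                continue;

            if (!string.IsNullOrWhiteSpace(section.Heading))
                sb.Append("## ").Append(section.Heading.Trim()).Append("\n\n");

            sb.Append(section.Body.Trim()).Append("\n\n");
        }

        // End the document with exactly one newline.
        return sb.ToString().TrimEnd() + "\n";
    }
}
```

Because every page goes through the same builder, downstream chunkers can rely on heading levels meaning the same thing everywhere.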

Converting Rich Text fields with ReverseMarkdown

For Rich Text fields, part of the conversion can be handled using the ReverseMarkdown library.

This is a practical choice because it takes care of many common HTML structures such as:

paragraphs
headings
emphasis
lists
links

That said, you should not feed raw Sitecore Rich Text HTML straight into the converter without preparing it first.

Expand links before converting

This is one of the most important parts of the pipeline.

Before sending Rich Text HTML into ReverseMarkdown, you should first expand:

internal Sitecore links
media item URLs

If you skip this step, the generated Markdown may contain broken references, partial URLs, or internal representations that only make sense inside Sitecore.

So the conversion flow should look like this:

read the Rich Text HTML from the field
resolve internal links into final public or canonical URLs
resolve media references into usable media URLs
pass the normalized HTML into ReverseMarkdown
apply any final cleanup rules

Example of Rich Text conversion with ReverseMarkdown

/// <summary>
/// Serializes a Rich Text field to Markdown
/// </summary>
private void SerializeRichTextField(StringBuilder sb, Field field)
{
    var htmlValue = field.Value;

    // Expand dynamic links and media links
    htmlValue = ExpandDynamicLinks(htmlValue);
    htmlValue = ExpandMediaLinks(htmlValue);

    if (!TryConvertHtmlToMarkdown(htmlValue, out var mdValue))
    {
        // Fallback: conversion failed, so output the expanded HTML unchanged
        AppendFieldWithValue(sb, htmlValue);
        return;
    }

    if (!string.IsNullOrWhiteSpace(mdValue))
    {
        AppendFieldWithValue(sb, mdValue.Trim());
    }
}

/// <summary>
/// Attempts to convert HTML to Markdown; returns false when the converter
/// throws one of the handled exceptions
/// </summary>
private bool TryConvertHtmlToMarkdown(string htmlValue, out string mdValue)
{
    mdValue = null;
    try
    {
        var converter = new ReverseMarkdown.Converter();
        mdValue = converter.Convert(htmlValue);

        // <section> and <figure> tags can survive the conversion as raw HTML,
        // so they are handled by dedicated helpers
        if (!string.IsNullOrEmpty(mdValue) && mdValue.Contains("<section"))
        {
            mdValue = ConvertSectionToMarkdown(mdValue);
        }

        if (!string.IsNullOrEmpty(mdValue) && mdValue.Contains("<figure"))
        {
            mdValue = ConvertFigureToMarkdown(mdValue);
        }

        return true;
    }
    catch (ArgumentException ex)
    {
        Log.Error("Error converting Rich Text field: Invalid HTML argument", ex, this);
        return false;
    }
    catch (InvalidOperationException ex)
    {
        Log.Error("Error converting Rich Text field: Invalid operation", ex, this);
        return false;
    }
}
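The `ConvertFigureToMarkdown` helper is not shown in this article. As a hedged sketch of what such a post-processing step might look like, the code below turns a leftover `<figure>` block into a Markdown image followed by an italic caption. `FigureMarkdownHelper` and its members are hypothetical names. It assumes the figure survives conversion as raw HTML (which the `Contains("<figure")` check implies) and that attributes are double-quoted. A regex is enough for this narrow case, but it is not a general HTML parser.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical stand-in for a figure post-processing step.
public static class FigureMarkdownHelper
{
    private static readonly Regex FigureRegex = new Regex(
        @"<figure\b[^>]*>(?<inner>.*?)</figure>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    private static readonly Regex ImgRegex = new Regex(
        @"<img\b[^>]*>", RegexOptions.IgnoreCase);

    private static readonly Regex CaptionRegex = new Regex(
        @"<figcaption\b[^>]*>(?<text>.*?)</figcaption>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static string ConvertFigures(string markdown)
    {
        if (string.IsNullOrEmpty(markdown))
            return markdown;

        return FigureRegex.Replace(markdown, match =>
        {
            var inner = match.Groups["inner"].Value;
            var img = ImgRegex.Match(inner);
            if (!img.Success)
                return match.Value; // leave figures without an <img> untouched

            var result = "![" + ReadAttribute(img.Value, "alt") + "](" +
                         ReadAttribute(img.Value, "src") + ")";

            var caption = CaptionRegex.Match(inner);
            if (caption.Success)
            {
                // Remove nested tags from the caption, then decode HTML entities.
                var text = Regex.Replace(caption.Groups["text"].Value, "<[^>]+>", string.Empty);
                text = WebUtility.HtmlDecode(text).Trim();
                if (text.Length > 0)
                    result += "\n\n*" + text + "*";
            }

            return result;
        });
    }

    // Reads a double-quoted attribute value; returns an empty string when it is missing.
    private static string ReadAttribute(string tag, string name)
    {
        var m = Regex.Match(tag, @"(?<![\w-])" + name + @"\s*=\s*""(?<v>[^""]*)""",
            RegexOptions.IgnoreCase);
        return m.Success ? WebUtility.HtmlDecode(m.Groups["v"].Value) : string.Empty;
    }
}
```

Figures that do not contain an image are returned unchanged, so the step never drops content it does not understand.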

Expanding internal links and media item URLs

This step is what makes the Markdown output reliable outside the normal HTML rendering context.

Sitecore Rich Text content can include internal links and media references that depend on the platform’s own link handling. If you want your Markdown to be useful to external consumers, those references need to be fully resolved first.

What this helper should handle

This logic should ideally take care of:

converting internal links into public-facing URLs
resolving media items into usable media URLs
avoiding environment-specific broken paths
keeping output consistent across preview and delivery scenarios

Example helper for URL expansion

private string ExpandDynamicLinks(string text)
{
    if (string.IsNullOrEmpty(text) || !text.Contains(DynamicLinkPrefix))
        return text;

    var sb = new StringBuilder(text.Length);
    int currentIndex = 0;
    int startIndex = text.IndexOf(DynamicLinkPrefix, StringComparison.InvariantCulture);

    while (startIndex >= 0)
    {
        int endIndex = text.IndexOf(DynamicLinkSuffix, startIndex, StringComparison.InvariantCulture);
        if (endIndex < 0)
        {
            break;
        }

        var endPosition = endIndex + DynamicLinkSuffix.Length;
        var url = TryParseDynamicLink(text, startIndex, endIndex);

        sb.Append(text, currentIndex, startIndex - currentIndex);
        sb.Append(url);
        currentIndex = endPosition;

        startIndex = text.IndexOf(DynamicLinkPrefix, currentIndex, StringComparison.InvariantCulture);
    }

    sb.Append(text.Substring(currentIndex));
    return sb.ToString();
}

/// <summary>
/// Attempts to parse and resolve a dynamic link
/// </summary>
private string TryParseDynamicLink(string text, int startIndex, int endIndex)
{
    try
    {
        var linkLength = endIndex - startIndex;
        var dynamicLink = DynamicLink.Parse(text.Substring(startIndex, linkLength));
        return ResolveDynamicLink(dynamicLink);
    }
    catch (ArgumentException ex)
    {
        Log.Warn($"Error parsing dynamic link at position {startIndex}: Invalid argument", ex, this);
        return text.Substring(startIndex, endIndex + DynamicLinkSuffix.Length - startIndex);
    }
    catch (FormatException ex)
    {
        Log.Warn($"Error parsing dynamic link at position {startIndex}: Format error", ex, this);
        return text.Substring(startIndex, endIndex + DynamicLinkSuffix.Length - startIndex);
    }
}

/// <summary>
/// Expands media links like /media/1D28D6CA27E0447F8EA242CDECABA25E.ashx to their actual URLs
/// </summary>
private string ExpandMediaLinks(string text)
{
    if (string.IsNullOrEmpty(text))
        return text;

    return MediaLinkRegex.Replace(text, match =>
    {
        var mediaIdString = match.Groups[1].Value;

        if (string.IsNullOrEmpty(mediaIdString) || mediaIdString.Length != MediaIdCompactLength)
            return match.Value;

        return TryGetMediaUrl(mediaIdString, match.Value);
    });
}

/// <summary>
/// Attempts to get media URL for a media ID string
/// </summary>
private string TryGetMediaUrl(string mediaIdString, string fallbackValue)
{
    try
    {
        var formattedId = FormatMediaId(mediaIdString);
        var mediaUrl = GetMediaUrlById(formattedId);
        return mediaUrl ?? fallbackValue;
    }
    catch (ArgumentException ex)
    {
        Log.Warn($"Error expanding media link for ID {mediaIdString}: Invalid argument", ex, this);
        return fallbackValue;
    }
    catch (InvalidOperationException ex)
    {
        Log.Warn($"Error expanding media link for ID {mediaIdString}: Invalid operation", ex, this);
        return fallbackValue;
    }
}
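The helpers above rely on `MediaLinkRegex`, `MediaIdCompactLength`, and `FormatMediaId`, which are not shown. Below is a minimal sketch of how they could be defined. The class name `MediaLinkParsing` and the exact pattern are assumptions of mine: the regex accepts an optional leading slash and either the `-/media/` or `~/media/` prefix, so adjust it to the link format your Rich Text fields actually contain. Resolving the ID to a URL is left out because it needs the Sitecore API.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical definitions for the members referenced by ExpandMediaLinks.
public static class MediaLinkParsing
{
    // A Sitecore item ID without braces or dashes is 32 hexadecimal characters.
    public const int MediaIdCompactLength = 32;

    // Captures the compact media ID in group 1,
    // for example "-/media/1D28D6CA27E0447F8EA242CDECABA25E.ashx".
    public static readonly Regex MediaLinkRegex = new Regex(
        @"/?[-~]/media/([0-9A-Fa-f]{32})\.ashx",
        RegexOptions.Compiled);

    // Converts "1D28D6CA27E0447F8EA242CDECABA25E"
    // into "{1D28D6CA-27E0-447F-8EA2-42CDECABA25E}".
    public static string FormatMediaId(string compactId)
    {
        // "N" is the 32-digit GUID format; ParseExact throws FormatException on malformed input.
        var guid = Guid.ParseExact(compactId, "N");

        // "B" adds braces and dashes; ToString returns lowercase, so convert to uppercase.
        return guid.ToString("B").ToUpperInvariant();
    }
}
```

Because the pattern only matches 32 hexadecimal characters, `FormatMediaId` should not throw for values captured by this regex. If you loosen the pattern, note that `TryGetMediaUrl` as written does not catch `FormatException`.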

Overriding `EnsureLoggedInForPreview` to allow `.md` requests

If you want .md requests to work during preview, you may need to adjust how Sitecore handles those requests in the preview pipeline.

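One place where this surfaces is the EnsureLoggedInForPreview processor in the mvc.actionExecuting pipeline. When the page mode is not Normal and the user is not logged in, it redirects the request to the login page, so .md URLs requested in preview can be blocked instead of reaching the Markdown controller.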

A targeted override can solve that by explicitly allowing valid Markdown requests in preview scenarios.

What this override is meant to solve

This override helps you:

allow .md requests during preview
test Markdown output before publishing
avoid breaking normal preview behavior
keep the exception narrow and controlled

Example override of `EnsureLoggedInForPreview`

public class EnsureLoggedInForPreviewMarkdown : ActionExecutingProcessor
{
    public override void Process(ActionExecutingArgs args)
    {
        if (IsMarkdownRequest(args.Context))
            return;

        if (Context.PageMode.IsNormal || Context.IsLoggedIn)
            return;

        using (new SiteContextSwitcher(Factory.GetSite("shell")))
        {
            using (new UserSwitcher(Context.User))
                ShellPage.IsLoggedIn(true, (httpContext, loginUrl) => RedirectToLogin(args.Context, loginUrl));
        }
    }

    private static bool IsMarkdownRequest(ActionExecutingContext context)
    {
        if (context?.HttpContext?.Request == null)
            return false;

        var requestPath = context.HttpContext.Request.FilePath ?? string.Empty;
        return requestPath.EndsWith(".md", StringComparison.OrdinalIgnoreCase);
    }

    private static void RedirectToLogin(ActionExecutingContext actionContext, string loginUrl)
    {
        Assert.ArgumentNotNull(actionContext, nameof(actionContext));
        Assert.ArgumentNotNull(loginUrl, nameof(loginUrl));
        actionContext.Result = new RedirectResult(loginUrl, false);
    }
}

Keep the security scope narrow

This is worth being careful about.

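The goal is not to weaken preview security. The goal is simply to allow a very specific request pattern. Your override should stay strict and validate the request as narrowly as possible. Note that the example above skips the login check for any request whose path ends in .md, so the controller behind that route has to make sure it never returns unpublished or restricted content to anonymous users.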

Sitecore patch configuration

If you are replacing or inserting a custom processor into the pipeline, you’ll also need a patch config to register it.

Example patch config

<?xml version="1.0"?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:set="http://www.sitecore.net/xmlconfig/set/">
  <sitecore>
    <services>
      <configurator type="Foundation.Markdown.ServicesConfigurator, Oshyn.Feature.Markdown" />
    </services>
    <pipelines>
      <initialize>
        <processor type="Foundation.Markdown.Routes.RegisterMarkdownRoute, Oshyn.Feature.Markdown" 
                   patch:after="processor[@type='Sitecore.Mvc.Pipelines.Loader.InitializeRoutes, Sitecore.Mvc']" />
      </initialize>
      <mvc.actionExecuting>
        <!-- The type and assembly names in this file are examples and must match your own solution. -->
        <processor type="Foundation.Markdown.Pipelines.EnsureLoggedInForPreviewMarkdown, Foundation.Markdown"
                   patch:instead="processor[@type='Sitecore.Mvc.Pipelines.Request.ActionExecuting.EnsureLoggedInForPreview, Sitecore.Mvc']" />
      </mvc.actionExecuting>
    </pipelines>
  </sitecore>
</configuration>

Practical recommendations

Once the basic implementation is in place, a few refinements can make it much more useful in real projects.

Define a stable output contract

If the Markdown is going to feed AI systems, consistency matters a lot.

It helps to define a clear output contract, such as:

always include the title
include the canonical URL
use consistent heading levels
exclude navigation and decorative content
include only the fields that add knowledge value

Filter aggressively

Not everything that appears on a web page is useful to an AI consumer.

Menus, breadcrumbs, promos, CTAs, utility blocks, and layout fragments often add noise. The Markdown version should focus on the actual content, not the entire page chrome.

Consider including metadata

Depending on your downstream use case, it may help to include metadata such as:

item ID
template name
language
canonical URL
last updated date

That can be useful for indexing, traceability, or debugging.
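One possible way to carry that metadata is a YAML front matter block at the top of the Markdown document. Front matter is my assumption here, since nothing above prescribes a format, and `FrontMatterWriter` is a hypothetical helper. The sketch quotes every value so that characters such as colons or quotes do not break the YAML.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper that renders metadata as a YAML front matter block.
public static class FrontMatterWriter
{
    public static string Write(IEnumerable<KeyValuePair<string, string>> metadata)
    {
        var sb = new StringBuilder();
        sb.Append("---\n");

        foreach (var pair in metadata)
        {
            // Omit entries without a value instead of writing empty keys.
            if (string.IsNullOrWhiteSpace(pair.Value))
                continue;

            // Flatten line breaks, then escape backslashes and double quotes
            // so the value is a valid double-quoted YAML scalar.
            var value = pair.Value
                .Replace("\r", " ")
                .Replace("\n", " ")
                .Replace("\\", "\\\\")
                .Replace("\"", "\\\"");

            sb.Append(pair.Key).Append(": \"").Append(value).Append("\"\n");
        }

        // Close the block and leave one blank line before the Markdown body.
        sb.Append("---\n\n");
        return sb.ToString();
    }
}
```

The returned string can be prepended to the generated Markdown. Consumers that do not understand front matter will simply see it as a short preamble, so it is worth checking how your downstream pipeline treats it.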

Think about caching

If the Markdown output will be requested frequently, caching is worth considering.

Repeatedly resolving Sitecore links, processing Rich Text, and converting HTML can add unnecessary overhead. Caching by item, language, version, or publish state can improve performance significantly.
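As a small sketch of what caching by item, language, and version can mean in practice, the helper below builds a cache key from those parts plus the database name. `MarkdownCacheKey` is a hypothetical name. In a Sitecore solution the inputs would typically come from `item.ID.Guid`, `item.Language.Name`, `item.Version.Number`, and `item.Database.Name`, but they are passed in as plain values here so the code has no Sitecore dependency.

```csharp
using System;
using System.Globalization;

// Hypothetical cache key builder for generated Markdown.
public static class MarkdownCacheKey
{
    public static string Build(Guid itemId, string language, int version, string database)
    {
        // Database and language are lower-cased so "en-US" and "en-us" share one entry.
        // "{1:N}" formats the GUID as 32 digits without braces or dashes.
        return string.Format(
            CultureInfo.InvariantCulture,
            "Markdown_{0}_{1:N}_{2}_v{3}",
            (database ?? string.Empty).ToLowerInvariant(),
            itemId,
            (language ?? string.Empty).ToLowerInvariant(),
            version);
    }
}
```

Keying on the item ID rather than the path also means a renamed or moved item keeps a stable key, while a new version or language naturally gets its own entry.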

Why this pattern works well

One of the strengths of this approach is that it lets you reuse the content model you already have.

You are not asking authors to maintain separate Markdown content. You are not duplicating data. You are simply creating an alternate output channel from the same source.

That gives you a practical way to make Sitecore content available to AI systems while staying within a familiar MVC architecture.

It also leaves room to grow. Once you have this pattern in place, it becomes much easier to add other output formats later, such as:

plain text
JSON
structured AI payloads
RAG-oriented export formats

Final thoughts

Automatic Markdown generation in Sitecore XP 10.4 MVC is a very practical pattern when you need to expose managed content to AI agents and downstream machine consumers.

The key is not to treat this as a simple HTML conversion task. It works better when you treat it as a controlled content projection.

That is why the strongest implementation usually includes:

dedicated routing for .md requests
an MVC controller that resolves the item and coordinates the response
a conversion layer built around content structure, not full page HTML
ReverseMarkdown for Rich Text field conversion
URL expansion for internal links and media items before conversion
and a focused override of EnsureLoggedInForPreview when preview support is required

With that in place, you can turn standard Sitecore-managed content into a clean Markdown representation that is much more useful for AI workflows.

Conclusion

This is a small architectural addition, but it can unlock a lot of value.

By exposing .md versions of Sitecore content, you create a bridge between traditional CMS delivery and AI-ready content consumption. That bridge becomes even stronger when the Markdown is clean, predictable, and generated from the right content sources.

If the implementation is done carefully, this can become a solid foundation for search, retrieval, assistants, and content-aware AI features built on top of Sitecore.

Ramiro Batallas

Principal Backend Engineer at Oshyn Inc. I’m a Sitecore and .NET engineer with more than 15 years of experience designing and delivering enterprise content platforms. My work focuses on building maintainable Sitecore XP/XM and composable DXP solutions, with an emphasis on performance, clean architecture, and a strong developer experience. I’ve led implementations across MVC, headless, and hybrid models, integrating Sitecore with search, CDP, commerce, and marketing automation ecosystems. I enjoy translating complex business requirements into practical content models, pipelines, and APIs that empower editors and marketers without sacrificing technical quality. My background includes .NET, SQL, Sitecore, Optimizely/Episerver, cloud infrastructure, and modern delivery practices based on Scrum and DevOps. I’m recognized for clear technical communication, pragmatic problem-solving, and helping teams deliver solutions that are robust, scalable, and genuinely useful.