Serving AI-Ready Markdown from Sitecore XP 10.4 MVC

As more teams start building AI assistants, retrieval workflows, and search experiences on top of CMS content, one practical question keeps coming up:

How do we expose Sitecore content in a format that’s easy for AI systems to consume?

In many cases, the answer is Markdown.

Markdown is simple, lightweight, readable, and structured enough to work well for indexing, chunking, retrieval, and prompt construction. It strips away most of the presentation noise while preserving the content hierarchy that matters.

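In this article, I’ll walk through a technical approach for implementing automatic Markdown generation in Sitecore XP 10.4 with MVC, including MVC route registration for .md requests, the controller that serves them, Rich Text conversion with ReverseMarkdown, expansion of internal and media links, a preview-pipeline override, and the Sitecore patch configuration that wires it all together.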

The goal is to make Sitecore content available in a clean machine-friendly format without forcing authors to maintain a second version of the same content.

Why Markdown makes sense for AI consumers

Sitecore naturally renders HTML, which works well for browsers but is not always ideal for AI systems.

Markdown has a few clear advantages:

This is not about replacing the standard website output. It is about creating a second representation of the same content, one that is better suited for AI agents, internal knowledge systems, and retrieval pipelines.

What this implementation is trying to solve

The idea is straightforward:

A request like this:

/products/my-page.md

should be handled by Sitecore MVC, mapped to the corresponding content item, converted into Markdown, and returned as a text response.

That gives you an alternate output channel for the same content tree already managed in Sitecore.

This can be useful for:

High-level approach

At a high level, the implementation looks like this:

  1. A request comes in for a URL ending in .md.
  2. MVC routing intercepts that request.
  3. A controller receives the route and resolves the Sitecore item.
  4. A conversion layer reads the relevant fields.
  5. Rich Text content is normalized and converted into Markdown.
  6. The final Markdown document is returned with an appropriate response content type.

This keeps the HTML site experience intact while giving you a second, AI-friendly output format from the same source content.

A few implementation decisions worth making early

Before writing code, it helps to define a few boundaries.

1. Decide what content should be exposed as Markdown

Not every page should necessarily have a .md representation. You may want to restrict it by:

That decision is important, especially if some content should remain strictly presentation-based or internal.

2. Convert from structured content, not from final rendered HTML

It may be tempting to capture the final HTML output of a page and convert that into Markdown. That can work for simple cases, but it often brings along layout noise, repeated components, navigation, and presentation artifacts.

A cleaner approach is to generate Markdown from the item fields or from a dedicated content model. That gives you much tighter control over what is included, in what order, and in what shape.

3. Internal and media links must be expanded before Markdown conversion

This is a critical detail.

If a Rich Text field contains Sitecore internal links or media references, those URLs should be resolved first. Otherwise, the Markdown output may contain incomplete, unresolved, or environment-dependent references.

The correct order is:

  1. read the Rich Text HTML,
  2. expand internal and media URLs,
  3. then convert the resulting HTML to Markdown.

4. ReverseMarkdown helps, but it should not be the whole solution

For Rich Text fields, a significant part of the HTML-to-Markdown conversion can be handled by the ReverseMarkdown library (available as a NuGet package). It is very useful and removes a lot of manual effort.

Still, it works best when the HTML has already been normalized. You’ll usually want at least some pre-processing and, depending on your content, possibly some post-processing as well.
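As an example of post-processing, here is a minimal sketch that tidies converter output: it normalizes line endings, turns non-breaking spaces into regular spaces, and collapses runs of blank lines. The class name MarkdownPostProcessor is hypothetical, and the rules you actually need depend on your content. Trailing spaces on lines are deliberately kept, because two trailing spaces mark a hard line break in Markdown.

```csharp
using System.Text.RegularExpressions;

/// <summary>
/// Hypothetical post-processing step applied to ReverseMarkdown output.
/// </summary>
public static class MarkdownPostProcessor
{
    // A newline followed by two or more (optionally whitespace-only) empty lines.
    private static readonly Regex ExcessBlankLines =
        new Regex(@"\n([ \t]*\n){2,}", RegexOptions.Compiled);

    public static string Clean(string markdown)
    {
        if (string.IsNullOrEmpty(markdown))
            return string.Empty;

        // Normalize Windows and old Mac line endings to '\n'.
        var text = markdown.Replace("\r\n", "\n").Replace('\r', '\n');

        // Rich Text often contains &nbsp;, which can survive conversion as U+00A0.
        text = text.Replace('\u00A0', ' ');

        // Keep at most one blank line between blocks.
        text = ExcessBlankLines.Replace(text, "\n\n");

        // Trim the document as a whole and end with exactly one newline.
        return text.Trim() + "\n";
    }
}
```

Keeping the rules in one small class makes it easy to add project-specific cleanups later without touching the controller or the conversion call.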

Suggested architecture

A clean way to organize this is to break the solution into a few focused parts:

This separation keeps the controller lean and makes the conversion logic easier to test and evolve over time.

Registering MVC routes for .md requests

The first step is to register a route that catches requests ending in .md and forwards them to the right controller.

The exact implementation depends on your current routing setup, but the idea is simple: register the Markdown route before broader MVC routes so it gets a chance to handle the request first.

Example route registration

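using System.Web.Mvc;
using System.Web.Routing;
using Sitecore.Pipelines;

public class RegisterMarkdownRoute
{
    // Called from the <initialize> pipeline (registered in the patch config).
    public void Process(PipelineArgs args)
    {
        // Catch-all route; MarkdownRouteConstraint restricts it to *.md requests.
        // MapRoute appends to the route table, so if an earlier route would also
        // match these URLs, insert this route ahead of it instead.
        RouteTable.Routes.MapRoute(
            name: "MarkdownRoutes",
            url: "{*path}",
            defaults: new { controller = "Markdown", action = "Page" },
            constraints: new { path = new MarkdownRouteConstraint() }
        );
    }
}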

The following route constraint ensures that only .md requests are matched:

using System;
using System.Web;
using System.Web.Routing;

public class MarkdownRouteConstraint : IRouteConstraint
{
    public bool Match(HttpContextBase httpContext, Route route, string parameterName,
        RouteValueDictionary values, RouteDirection routeDirection)
    {
        var raw = values[parameterName] as string ?? string.Empty;

        // Normalize (the catch-all route value normally has no leading slash)
        var path = raw.TrimStart('/');

        // Only *.md
        if (!path.EndsWith(".md", StringComparison.OrdinalIgnoreCase))
            return false;

        // Avoid Sitecore/system paths (defense-in-depth; IgnoreRoute already covers most).
        // The "sitecore" prefix also covers "sitecore_shell" and "sitecoreadmin";
        // they are listed explicitly for readability.
        if (path.StartsWith("sitecore", StringComparison.OrdinalIgnoreCase) ||
            path.StartsWith("sitecore_shell", StringComparison.OrdinalIgnoreCase) ||
            path.StartsWith("sitecoreadmin", StringComparison.OrdinalIgnoreCase) ||
            path.StartsWith("-/", StringComparison.OrdinalIgnoreCase) ||
            path.StartsWith("api/", StringComparison.OrdinalIgnoreCase))
        {
            return false;
        }

        return true;
    }
}
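Because the constraint only inspects the path string, its rules can be pulled into a plain static helper and unit tested without an HttpContext. The sketch below is a hypothetical refactoring, not part of the original code. It also compares the first path segment instead of a raw prefix, so a content page such as /sitecore-tips.md is not rejected by the "sitecore" rule.

```csharp
using System;
using System.Collections.Generic;

/// <summary>
/// Hypothetical pure version of the MarkdownRouteConstraint rules.
/// </summary>
public static class MarkdownPathRules
{
    // First path segments that must never be served as Markdown.
    private static readonly HashSet<string> ReservedSegments =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "sitecore", "sitecore_shell", "sitecoreadmin", "-", "api"
        };

    public static bool IsMarkdownPath(string rawPath)
    {
        var path = (rawPath ?? string.Empty).TrimStart('/');

        // Only *.md, and not a bare ".md".
        if (path.Length <= 3 || !path.EndsWith(".md", StringComparison.OrdinalIgnoreCase))
            return false;

        // Compare whole segments so "sitecore-tips.md" is still allowed.
        var slash = path.IndexOf('/');
        var firstSegment = slash < 0 ? path : path.Substring(0, slash);
        return !ReservedSegments.Contains(firstSegment);
    }
}
```

The constraint's Match method could then simply return MarkdownPathRules.IsMarkdownPath(raw).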

What this route should do

At minimum, the route should:

In many implementations, the controller will receive the request path, strip the .md extension, and use the remaining path to locate the correct item.
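A minimal sketch of that mapping, assuming the site's start item path is known (for example, from Sitecore.Context.Site.StartPath) and that URL segments correspond directly to item names. MarkdownItemPath.BuildFullItemPath is a hypothetical stand-in for the repository method used in the controller; if your site uses encodeNameReplacements (for example, spaces rendered as hyphens), those would also need to be reversed, which this sketch does not do.

```csharp
using System;

/// <summary>
/// Hypothetical helper: maps "/products/my-page.md" to a Sitecore item path.
/// </summary>
public static class MarkdownItemPath
{
    public static string BuildFullItemPath(string requestPath, string siteStartPath)
    {
        if (string.IsNullOrWhiteSpace(requestPath))
            throw new ArgumentException("Request path is required.", nameof(requestPath));
        if (string.IsNullOrWhiteSpace(siteStartPath))
            throw new ArgumentException("Site start path is required.", nameof(siteStartPath));

        var path = requestPath.Trim();

        // Drop the ".md" extension (case-insensitive).
        if (path.EndsWith(".md", StringComparison.OrdinalIgnoreCase))
            path = path.Substring(0, path.Length - 3);

        // Normalize slashes so the result is "<start path>/<relative path>".
        var relative = path.Trim('/');
        var root = siteStartPath.TrimEnd('/');

        // An empty relative path maps to the start item itself.
        return relative.Length == 0 ? root : root + "/" + relative;
    }
}
```

Usage: BuildFullItemPath("/products/my-page.md", "/sitecore/content/YourSite/Home") yields "/sitecore/content/YourSite/Home/products/my-page", which can then be passed to a database lookup.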

The MVC controller that receives the route and triggers conversion

The controller is the core of the request flow. It receives the .md request, resolves the Sitecore item behind that path, and returns the Markdown output.

A good controller should stay focused on orchestration:

  1. validate the request
  2. resolve the target item
  3. call a conversion service
  4. return the generated Markdown response

Example controller

using System;
using System.Text;
using System.Web.Mvc;
using Sitecore.Diagnostics;
using Sitecore.Sites;

public class MarkdownController : Controller
{
    private readonly PageMarkdownGenerator _generator;
    private readonly ISitecoreItemRepository _itemRepository;
    private readonly ICacheManager _cacheManager;

    // Project services resolved through Sitecore DI (see the ServicesConfigurator in the patch config)
    public MarkdownController(PageMarkdownGenerator generator,
        ISitecoreItemRepository itemRepository,
        ICacheManager cacheManager)
    {
        _generator = generator ?? throw new ArgumentNullException(nameof(generator));
        _itemRepository = itemRepository ?? throw new ArgumentNullException(nameof(itemRepository));
        _cacheManager = cacheManager ?? throw new ArgumentNullException(nameof(cacheManager));
    }

    public ActionResult Page(string path)
    {
        try
        {
            // Feature switch: behave as if the URL does not exist when Markdown is disabled
            if (!IsMarkdownActive())
            {
                Log.Info("MarkdownController: Markdown feature is not active. Returning 404.", this);
                return new HttpStatusCodeResult(404, "Not Found");
            }

            // Full request path, e.g. "/products/my-page.md"
            var requestPath = HttpContext?.Request?.FilePath ?? string.Empty;

            if (string.IsNullOrWhiteSpace(requestPath))
            {
                Log.Warn("MarkdownController: Request path is null or empty", this);
                return new HttpStatusCodeResult(400, "Bad Request");
            }

            // Build the full Sitecore item path using the repository
            var fullItemPath = _itemRepository.BuildFullItemPath(requestPath);

            // Retrieve the item from the repository
            var item = _itemRepository.GetItemByPath(fullItemPath);

            if (item == null)
            {
                Log.Warn($"MarkdownController: Item not found at path '{fullItemPath}'", this);
                return new HttpStatusCodeResult(404, "Not Found");
            }

            // Set the context item for downstream processing
            Sitecore.Context.Item = item;

            // Vary the cache by language as well as path so language versions do not collide
            var cacheKey = $"Markdown_{item.Language.Name}_{fullItemPath}";

            var markdown = _cacheManager.GetOrSetInCache(
                cacheKey,
                TimeSpan.FromMinutes(20), // Cache duration
                () =>
                {
                    // "YOURWEBSITE" is a placeholder for the name of your site definition
                    var site = SiteContextFactory.GetSiteContext("YOURWEBSITE");
                    using (new SiteContextSwitcher(site))
                    {
                        return _generator.Generate(item);
                    }
                });

            return MarkdownContent(markdown);
        }
        catch (Exception ex)
        {
            Log.Error($"MarkdownController: Markdown delivery failed for path='{path}'", ex, this);
            return new HttpStatusCodeResult(500, "Internal Server Error");
        }
    }

    private static ContentResult MarkdownContent(string markdown)
    {
        return new ContentResult
        {
            Content = markdown ?? string.Empty,
            ContentType = "text/markdown", // media type registered by RFC 7763
            ContentEncoding = Encoding.UTF8
        };
    }

    private static bool IsMarkdownActive()
    {
        // Pages.SiteSettingsId and Templates.Markdown.Fields.IsMarkdownActive are project constants
        var siteSettingsItem = Sitecore.Configuration.Factory.GetDatabase("web")
                              .Items.GetItem(Pages.SiteSettingsId);
        if (siteSettingsItem == null)
        {
            Log.Warn("MarkdownController: Site settings item not found", typeof(MarkdownController));
            return false;
        }

        // GetCheckboxBoolean is a project extension method for reading checkbox fields
        var isMarkdownActive = siteSettingsItem?.GetCheckboxBoolean(Templates.Markdown.Fields.IsMarkdownActive) ?? false;

        return isMarkdownActive;
    }
}
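The controller depends on an ICacheManager abstraction whose implementation is not shown. Purely as an assumption about its shape, here is a minimal in-memory version of GetOrSetInCache with absolute expiration. Both the interface and InMemoryCacheManager are hypothetical; a production build would more likely wrap Sitecore's own caching and clear entries when content is published.

```csharp
using System;
using System.Collections.Concurrent;

/// <summary>
/// Assumed shape of the ICacheManager used by MarkdownController (hypothetical).
/// </summary>
public interface ICacheManager
{
    string GetOrSetInCache(string key, TimeSpan duration, Func<string> factory);
}

/// <summary>
/// Hypothetical in-memory implementation with absolute expiration.
/// </summary>
public class InMemoryCacheManager : ICacheManager
{
    private readonly ConcurrentDictionary<string, (string Value, DateTime ExpiresUtc)> _entries =
        new ConcurrentDictionary<string, (string Value, DateTime ExpiresUtc)>(StringComparer.Ordinal);

    public string GetOrSetInCache(string key, TimeSpan duration, Func<string> factory)
    {
        if (key == null) throw new ArgumentNullException(nameof(key));
        if (factory == null) throw new ArgumentNullException(nameof(factory));

        var now = DateTime.UtcNow;

        // Serve the cached value while it is still fresh.
        if (_entries.TryGetValue(key, out var entry) && entry.ExpiresUtc > now)
            return entry.Value;

        // Otherwise regenerate and store. Two concurrent misses may both run the
        // factory; the last write wins, which is acceptable for deterministic output.
        var value = factory();
        _entries[key] = (value, now.Add(duration));
        return value;
    }

    // Could be called from a publish:end handler to drop stale Markdown.
    public void Clear() => _entries.Clear();
}
```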

What the controller should not do

Try not to place all transformation logic directly in the controller. It’s much easier to maintain this feature if the controller delegates the actual conversion to a service layer.

That becomes especially important if you later want to reuse the same Markdown generation logic from:

Converting Sitecore content into Markdown

This is where the real value of the implementation lives.

Not every page type should be converted in exactly the same way, so it usually makes sense to define a conversion strategy based on:

A simple conversion might include:

The important thing is consistency. AI consumers usually benefit more from predictable structure than from clever formatting.
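One way to enforce that consistency is a small registry that maps each template to the ordered list of fields it exports, with a default for everything else. This is a sketch: the template and field names (ArticlePage, ProductPage, Summary, and so on) are hypothetical placeholders, and a real solution would more likely key the map by template ID than by name.

```csharp
using System;
using System.Collections.Generic;

/// <summary>
/// Hypothetical per-template conversion strategy: which fields to export, in which order.
/// </summary>
public static class MarkdownFieldMap
{
    private static readonly string[] DefaultFields = { "Title", "Body" };

    private static readonly Dictionary<string, string[]> FieldsByTemplate =
        new Dictionary<string, string[]>(StringComparer.OrdinalIgnoreCase)
        {
            ["ArticlePage"] = new[] { "Title", "Summary", "Body" },
            ["ProductPage"] = new[] { "Title", "ShortDescription", "Features", "Specifications" },
        };

    /// <summary>Returns the fields to export for a template, falling back to a default set.</summary>
    public static IReadOnlyList<string> GetFields(string templateName)
    {
        return templateName != null && FieldsByTemplate.TryGetValue(templateName, out var fields)
            ? fields
            : DefaultFields;
    }

    /// <summary>Emits the non-empty fields in the configured order, separated by blank lines.</summary>
    public static string Render(string templateName, IDictionary<string, string> fieldValues)
    {
        if (fieldValues == null) throw new ArgumentNullException(nameof(fieldValues));

        var parts = new List<string>();
        foreach (var name in GetFields(templateName))
        {
            if (fieldValues.TryGetValue(name, out var value) && !string.IsNullOrWhiteSpace(value))
                parts.Add(value.Trim());
        }
        return string.Join("\n\n", parts);
    }
}
```

Because the order lives in data rather than in code paths, every page of the same type produces the same structure, which is what downstream chunking benefits from.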

Converting Rich Text fields with ReverseMarkdown

For Rich Text fields, part of the conversion can be handled using the ReverseMarkdown library.

This is a practical choice because it takes care of many common HTML structures such as:

That said, you should not feed raw Sitecore Rich Text HTML straight into the converter without preparing it first.
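A sketch of light pre-processing, assuming two common kinds of Rich Text noise: HTML comments and empty paragraphs that contain only whitespace or &nbsp;. Regular expressions are acceptable for narrow patterns like these; anything structural is better handled with an HTML parser (ReverseMarkdown itself is built on HtmlAgilityPack). RichTextPreProcessor is a hypothetical name.

```csharp
using System.Text.RegularExpressions;

/// <summary>
/// Hypothetical pre-processing applied to Rich Text HTML before conversion.
/// </summary>
public static class RichTextPreProcessor
{
    // <!-- ... --> comments, including multi-line ones.
    private static readonly Regex HtmlComments =
        new Regex("<!--.*?-->", RegexOptions.Singleline | RegexOptions.Compiled);

    // <p></p>, <p>&nbsp;</p>, <p class="x"> </p> and similar empty paragraphs.
    private static readonly Regex EmptyParagraphs =
        new Regex(@"<p(\s[^>]*)?>(\s|&nbsp;|&#160;|\u00A0)*</p>",
            RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static string Clean(string html)
    {
        if (string.IsNullOrEmpty(html))
            return html;

        var result = HtmlComments.Replace(html, string.Empty);
        result = EmptyParagraphs.Replace(result, string.Empty);
        return result;
    }
}
```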

Expand links before converting

This is one of the most important parts of the pipeline.

Before sending Rich Text HTML into ReverseMarkdown, you should first expand internal (dynamic) links and media references.

If you skip this step, the generated Markdown may contain broken references, partial URLs, or internal representations that only make sense inside Sitecore.

So the conversion flow should look like this:

  1. read the Rich Text HTML from the field
  2. resolve internal links into final public or canonical URLs
  3. resolve media references into usable media URLs
  4. pass the normalized HTML into ReverseMarkdown
  5. apply any final cleanup rules
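The ordering above can be made explicit by composing the steps as a list of string transforms that run in sequence. This is a hypothetical sketch: the delegates stand in for link expansion, media expansion, the ReverseMarkdown call, and cleanup, so the ordering can be unit tested with simple fakes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

/// <summary>
/// Hypothetical ordered pipeline of HTML/Markdown transforms.
/// </summary>
public sealed class RichTextPipeline
{
    private readonly IReadOnlyList<Func<string, string>> _steps;

    public RichTextPipeline(params Func<string, string>[] steps)
    {
        _steps = steps ?? throw new ArgumentNullException(nameof(steps));
    }

    // Runs every step in declaration order, feeding each output into the next step.
    public string Run(string html) =>
        _steps.Aggregate(html ?? string.Empty, (text, step) => step(text));
}
```

With fakes such as a link expander followed by a converter that only recognizes resolved URLs, swapping the first two steps makes the test fail, which is exactly the failure mode the ordering is meant to prevent.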

Example of Rich Text conversion with ReverseMarkdown

/// <summary>
/// Serializes a Rich Text field to Markdown.
/// </summary>
private void SerializeRichTextField(StringBuilder sb, Field field)
{
    var htmlValue = field.Value;

    // Expand dynamic links and media links before conversion
    htmlValue = ExpandDynamicLinks(htmlValue);
    htmlValue = ExpandMediaLinks(htmlValue);

    if (!TryConvertHtmlToMarkdown(htmlValue, out var mdValue))
    {
        // Fallback: emit the expanded HTML unchanged so no content is lost
        AppendFieldWithValue(sb, htmlValue);
        return;
    }

    if (!string.IsNullOrWhiteSpace(mdValue))
    {
        AppendFieldWithValue(sb, mdValue.Trim());
    }
}

private bool TryConvertHtmlToMarkdown(string htmlValue, out string mdValue)
{
    mdValue = null;
    try
    {
        var converter = new ReverseMarkdown.Converter();
        mdValue = converter.Convert(htmlValue);

        // With default settings, tags ReverseMarkdown has no converter for are passed
        // through as raw HTML; project helpers handle <section> and <figure> if they survive
        if (!string.IsNullOrEmpty(mdValue) && mdValue.Contains("<section"))
            mdValue = ConvertSectionToMarkdown(mdValue);

        if (!string.IsNullOrEmpty(mdValue) && mdValue.Contains("<figure"))
            mdValue = ConvertFigureToMarkdown(mdValue);

        return true;
    }
    catch (ArgumentException ex)
    {
        Log.Error("Error converting Rich Text field: Invalid HTML argument", ex, this);
        return false;
    }
    catch (InvalidOperationException ex)
    {
        Log.Error("Error converting Rich Text field: Invalid operation", ex, this);
        return false;
    }
}

Expanding internal links and media item URLs

This step is what makes the Markdown output reliable outside the normal HTML rendering context.

Sitecore Rich Text content can include internal links and media references that depend on the platform’s own link handling. If you want your Markdown to be useful to external consumers, those references need to be fully resolved first.
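To make this concrete without a running Sitecore instance, the sketch below expands links in the usual Rich Text format ~/link.aspx?_id=<32 hex digits>&amp;_z=z. It is a regex-based alternative to prefix/suffix scanning, and the resolveUrl delegate is a hypothetical stand-in for Sitecore link resolution (in practice, loading the item by ID and asking the link manager for its URL). When the resolver returns null, the original link text is kept.

```csharp
using System;
using System.Text.RegularExpressions;

/// <summary>
/// Hypothetical, Sitecore-free dynamic link expander.
/// </summary>
public static class DynamicLinkExpander
{
    // Matches "~/link.aspx?_id=<32 hex>&amp;_z=z" (also accepts an unescaped "&").
    private static readonly Regex DynamicLinkPattern = new Regex(
        @"~/link\.aspx\?_id=(?<id>[0-9A-Fa-f]{32})(?:&amp;|&)_z=z",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    /// <param name="resolveUrl">Maps a compact item ID to a public URL, or returns null.</param>
    public static string Expand(string html, Func<string, string> resolveUrl)
    {
        if (string.IsNullOrEmpty(html) || resolveUrl == null)
            return html;

        // Unresolvable links are left exactly as they were.
        return DynamicLinkPattern.Replace(html, match =>
            resolveUrl(match.Groups["id"].Value) ?? match.Value);
    }
}
```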

What this helper should handle

This logic should ideally take care of:

Example helper for URL expansion

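// Replaces Sitecore dynamic links in Rich Text (typically "~/link.aspx?_id=<id>&amp;_z=z")
// with resolved URLs. DynamicLinkPrefix and DynamicLinkSuffix are class constants (not shown).
private string ExpandDynamicLinks(string text)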
{
    if (string.IsNullOrEmpty(text) || !text.Contains(DynamicLinkPrefix))
        return text;

    var sb = new StringBuilder(text.Length);
    int currentIndex = 0;
    int startIndex = text.IndexOf(DynamicLinkPrefix, StringComparison.InvariantCulture);

    while (startIndex >= 0)
    {
        int endIndex = text.IndexOf(DynamicLinkSuffix, startIndex, StringComparison.InvariantCulture);
        if (endIndex < 0)
        {
            break;
        }

        var endPosition = endIndex + DynamicLinkSuffix.Length;
        var url = TryParseDynamicLink(text, startIndex, endIndex);

        sb.Append(text, currentIndex, startIndex - currentIndex);
        sb.Append(url);
        currentIndex = endPosition;

        startIndex = text.IndexOf(DynamicLinkPrefix, currentIndex, StringComparison.InvariantCulture);
    }

    sb.Append(text.Substring(currentIndex));
    return sb.ToString();
}

/// <summary>
/// Attempts to parse and resolve a dynamic link
/// </summary>
private string TryParseDynamicLink(string text, int startIndex, int endIndex)
{
    try
    {
        var linkLength = endIndex - startIndex;
        var dynamicLink = DynamicLink.Parse(text.Substring(startIndex, linkLength));
        return ResolveDynamicLink(dynamicLink);
    }
    catch (ArgumentException ex)
    {
        Log.Warn($"Error parsing dynamic link at position {startIndex}: Invalid argument", ex, this);
        return text.Substring(startIndex, endIndex + DynamicLinkSuffix.Length - startIndex);
    }
    catch (FormatException ex)
    {
        Log.Warn($"Error parsing dynamic link at position {startIndex}: Format error", ex, this);
        return text.Substring(startIndex, endIndex + DynamicLinkSuffix.Length - startIndex);
    }
}


/// <summary>
/// Expands media links like /media/1D28D6CA27E0447F8EA242CDECABA25E.ashx to their actual URLs
/// </summary>
private string ExpandMediaLinks(string text)
{
    if (string.IsNullOrEmpty(text))
        return text;

    return MediaLinkRegex.Replace(text, match =>
    {
        var mediaIdString = match.Groups[1].Value;

        if (string.IsNullOrEmpty(mediaIdString) || mediaIdString.Length != MediaIdCompactLength)
            return match.Value;

        return TryGetMediaUrl(mediaIdString, match.Value);
    });
}

/// <summary>
/// Attempts to get media URL for a media ID string
/// </summary>
private string TryGetMediaUrl(string mediaIdString, string fallbackValue)
{
    try
    {
        var formattedId = FormatMediaId(mediaIdString);
        var mediaUrl = GetMediaUrlById(formattedId);
        return mediaUrl ?? fallbackValue;
    }
    catch (ArgumentException ex)
    {
        Log.Warn($"Error expanding media link for ID {mediaIdString}: Invalid argument", ex, this);
        return fallbackValue;
    }
    catch (InvalidOperationException ex)
    {
        Log.Warn($"Error expanding media link for ID {mediaIdString}: Invalid operation", ex, this);
        return fallbackValue;
    }
}
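The helpers above rely on MediaLinkRegex, MediaIdCompactLength, and FormatMediaId, which are not shown. One plausible shape, assuming Rich Text stores media links as -/media/<32 hex digits>.ashx (older content may use ~/media/): capture the compact ID, then convert it to Sitecore's braced GUID format before looking up the media item. The class name MediaLinkHelpers is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

/// <summary>
/// Hypothetical versions of the media-link constants and helpers used above.
/// </summary>
public static class MediaLinkHelpers
{
    public const int MediaIdCompactLength = 32;

    // Captures the compact ID from "-/media/<id>.ashx", "~/media/<id>.ashx" or "/media/<id>.ashx".
    public static readonly Regex MediaLinkRegex = new Regex(
        @"(?:-|~)?/media/([0-9A-Fa-f]{32})\.ashx",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    /// <summary>Turns "1D28D6CA27E0447F8EA242CDECABA25E" into "{1D28D6CA-27E0-447F-8EA2-42CDECABA25E}".</summary>
    public static string FormatMediaId(string compactId)
    {
        if (compactId == null || compactId.Length != MediaIdCompactLength)
            throw new ArgumentException("Expected a 32-character media ID.", nameof(compactId));

        // "N" = 32 digits without hyphens; "B" = braced, hyphenated form.
        return Guid.ParseExact(compactId, "N").ToString("B").ToUpperInvariant();
    }
}
```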

Overriding EnsureLoggedInForPreview to allow .md requests

If you want .md requests to work during preview, you may need to adjust how Sitecore handles those requests in the preview pipeline.

One of the places this can surface is the EnsureLoggedInForPreview processor. Depending on how the request is interpreted, .md URLs may be blocked or handled differently from standard page requests.

A targeted override can solve that by explicitly allowing valid Markdown requests in preview scenarios.

What this override is meant to solve

This override helps you:

Example override of EnsureLoggedInForPreview

// Replacement for the EnsureLoggedInForPreview processor that exempts Markdown requests
public class EnsureLoggedInForPreviewMarkdown : ActionExecutingProcessor
{
    public override void Process(ActionExecutingArgs args)
    {
        // Let Markdown requests through without the preview login redirect
        if (IsMarkdownRequest(args.Context))
            return;

        // Otherwise keep the original behavior: anonymous preview/edit requests
        // are redirected to the login page
        if (Context.PageMode.IsNormal || Context.IsLoggedIn)
            return;

        using (new SiteContextSwitcher(Factory.GetSite("shell")))
        {
            using (new UserSwitcher(Context.User))
                ShellPage.IsLoggedIn(true, (httpContext, loginUrl) => RedirectToLogin(args.Context, loginUrl));
        }
    }

    private static bool IsMarkdownRequest(ActionExecutingContext context)
    {
        // Keep the exemption narrow: the action must belong to MarkdownController...
        if (context?.HttpContext?.Request == null || !(context.Controller is MarkdownController))
            return false;

        // ...and the URL must end in .md
        var requestPath = context.HttpContext.Request.FilePath ?? string.Empty;
        return requestPath.EndsWith(".md", StringComparison.OrdinalIgnoreCase);
    }

    private static void RedirectToLogin(ActionExecutingContext actionContext, string loginUrl)
    {
        Assert.ArgumentNotNull(actionContext, nameof(actionContext));
        Assert.ArgumentNotNull(loginUrl, nameof(loginUrl));
        actionContext.Result = new RedirectResult(loginUrl, false);
    }
}

Keep the security scope narrow

This is worth being careful about.

The goal is not to weaken preview security. The goal is simply to allow a very specific request pattern. Your override should stay strict and validate the request as narrowly as possible.

Sitecore patch configuration

If you are replacing or inserting a custom processor into the pipeline, you’ll also need a patch config to register it.

Example patch config

<?xml version="1.0"?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:set="http://www.sitecore.net/xmlconfig/set/">
  <sitecore>
    <services>
      <configurator type="Foundation.Markdown.ServicesConfigurator, Oshyn.Feature.Markdown" />
    </services>
    <pipelines>
      <initialize>
        <processor type="Foundation.Markdown.Routes.RegisterMarkdownRoute, Oshyn.Feature.Markdown" 
                   patch:after="processor[@type='Sitecore.Mvc.Pipelines.Loader.InitializeRoutes, Sitecore.Mvc']" />
      </initialize>
      <mvc.actionExecuting>
        <processor type="Foundation.Markdown.Pipelines.EnsureLoggedInForPreviewMarkdown, Foundation.Markdown"
                   patch:instead="processor[@type='Sitecore.Mvc.Pipelines.Request.ActionExecuting.EnsureLoggedInForPreview, Sitecore.Mvc']" />
      </mvc.actionExecuting>
    </pipelines>
  </sitecore>
</configuration>

Practical recommendations

Once the basic implementation is in place, a few refinements can make it much more useful in real projects.

Define a stable output contract

If the Markdown is going to feed AI systems, consistency matters a lot.

It helps to define a clear output contract, such as:

Filter aggressively

Not everything that appears on a web page is useful to an AI consumer.

Menus, breadcrumbs, promos, CTAs, utility blocks, and layout fragments often add noise. The Markdown version should focus on the actual content, not the entire page chrome.
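A simple way to enforce this is an allow-list: only sources known to carry primary content are serialized, and everything else is dropped by default. The ContentBlock model and the source names below are hypothetical. An allow-list is used rather than a block-list so that a newly added navigation or promo component stays out of the Markdown until someone explicitly opts it in.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

/// <summary>One piece of page content, e.g. one field or one component's output (hypothetical model).</summary>
public sealed class ContentBlock
{
    public ContentBlock(string source, string markdown)
    {
        Source = source;
        Markdown = markdown;
    }

    public string Source { get; }
    public string Markdown { get; }
}

public static class ContentFilter
{
    // Only these sources are exported; menus, breadcrumbs, promos, and CTAs are not listed.
    private static readonly HashSet<string> AllowedSources =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "Hero Title", "Rich Text", "Accordion", "Data Table"
        };

    public static IList<ContentBlock> Filter(IEnumerable<ContentBlock> blocks) =>
        blocks
            .Where(b => b != null && AllowedSources.Contains(b.Source ?? string.Empty))
            .Where(b => !string.IsNullOrWhiteSpace(b.Markdown))
            .ToList();
}
```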

Consider including metadata

Depending on your downstream use case, it may help to include metadata such as:

That can be useful for indexing, traceability, or debugging.
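One common way to carry that metadata is a YAML front matter block at the top of the Markdown file. This sketch assumes the caller passes already-resolved values (for example, item ID, language, canonical URL, and last-updated date); the class name and key names are illustrative, not a standard. Values are written as double-quoted YAML strings so titles containing colons or quotes do not break parsing.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

/// <summary>
/// Hypothetical helper that prepends YAML front matter to a Markdown body.
/// </summary>
public static class FrontMatter
{
    public static string Wrap(IEnumerable<KeyValuePair<string, string>> metadata, string body)
    {
        if (metadata == null) throw new ArgumentNullException(nameof(metadata));

        var sb = new StringBuilder();
        sb.Append("---\n");
        foreach (var pair in metadata)
        {
            // Skip missing values instead of emitting empty keys; keys are assumed to be safe.
            if (string.IsNullOrEmpty(pair.Value))
                continue;
            sb.Append(pair.Key).Append(": \"").Append(Escape(pair.Value)).Append("\"\n");
        }
        sb.Append("---\n\n");
        sb.Append(body ?? string.Empty);
        return sb.ToString();
    }

    // Minimal escaping for YAML double-quoted scalars.
    private static string Escape(string value) =>
        value.Replace("\\", "\\\\").Replace("\"", "\\\"").Replace("\r", "\\r").Replace("\n", "\\n");

    /// <summary>Formats a timestamp as ISO 8601 UTC, e.g. 2024-05-01T08:30:00Z.</summary>
    public static string IsoUtc(DateTime value) =>
        value.ToUniversalTime().ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture);
}
```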

Think about caching

If the Markdown output will be requested frequently, caching is worth considering.

Repeatedly resolving Sitecore links, processing Rich Text, and converting HTML can add unnecessary overhead. Caching by item, language, version, or publish state can improve performance significantly.
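A sketch of a cache key that varies along those dimensions, assuming the caller passes the item ID, language name, version number, and the item's revision. Sitecore stores a __Revision value that changes whenever the item is saved, so once a change is published, the key changes too. MarkdownCacheKey is a hypothetical, Sitecore-free helper so it can be tested in isolation.

```csharp
using System;
using System.Globalization;

/// <summary>
/// Hypothetical cache key builder for generated Markdown.
/// </summary>
public static class MarkdownCacheKey
{
    public static string Build(Guid itemId, string language, int version, string revision)
    {
        if (string.IsNullOrWhiteSpace(language))
            throw new ArgumentException("Language is required.", nameof(language));

        // "N" keeps the ID compact; lower-casing the language avoids "en" vs "EN" duplicates.
        return string.Join("|",
            "Markdown",
            itemId.ToString("N"),
            language.ToLowerInvariant(),
            version.ToString(CultureInfo.InvariantCulture),
            revision ?? string.Empty);
    }
}
```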

Why this pattern works well

One of the strengths of this approach is that it lets you reuse the content model you already have.

You are not asking authors to maintain separate Markdown content. You are not duplicating data. You are simply creating an alternate output channel from the same source.

That gives you a practical way to make Sitecore content available to AI systems while staying within a familiar MVC architecture.

It also leaves room to grow. Once you have this pattern in place, it becomes much easier to add other output formats later, such as:

Final thoughts

Automatic Markdown generation in Sitecore XP 10.4 MVC is a very practical pattern when you need to expose managed content to AI agents and downstream machine consumers.

The key is not to treat this as a simple HTML conversion task. It works better when you treat it as a controlled content projection.

That is why the strongest implementation usually includes:

With that in place, you can turn standard Sitecore-managed content into a clean Markdown representation that is much more useful for AI workflows.

Conclusion

This is a small architectural addition, but it can unlock a lot of value.

By exposing .md versions of Sitecore content, you create a bridge between traditional CMS delivery and AI-ready content consumption. That bridge becomes even stronger when the Markdown is clean, predictable, and generated from the right content sources.

If the implementation is done carefully, this can become a solid foundation for search, retrieval, assistants, and content-aware AI features built on top of Sitecore.