Summary
Over the years, Google's ability to crawl and index web content has changed significantly. Understanding this evolution is important for understanding the current state of SEO for modern web applications. We analyzed over 100,000 Googlebot fetches across various sites to test and validate Google's SEO capabilities.
With a better understanding of what Google is capable of, let's look at some common myths and how they impact SEO. Key aspects of the current system include:

- Universal rendering: Google now attempts to render all HTML pages, not just a subset.
- Stateless rendering: Each page render occurs in a fresh browser session, without retaining cookies or state from previous renders.
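Stateless rendering has a practical consequence: any content gated behind state saved in an earlier session is invisible to Googlebot. A minimal sketch (the component logic and store interface here are hypothetical, not from the study):

```typescript
// Googlebot renders each page in a fresh session, so cookies and
// localStorage written on a previous render are gone. Any branch that
// depends on stored state falls through to its default for Googlebot.
interface KeyValueStore {
  getItem(key: string): string | null;
}

function getGreeting(storage: KeyValueStore): string {
  const name = storage.getItem("visitorName"); // always null in a fresh session
  return name ? `Welcome back, ${name}!` : "Welcome!";
}
```

Googlebot will only ever index the fallback branch, so critical content should not live behind stored state.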
For this article, we primarily focused on data from Googlebot, which provided the largest and most reliable dataset. Our analysis included over 37,000 rendered HTML pages matched with server-beacon pairs. We are still gathering data about other search engines, including AI providers like OpenAI and Anthropic.
Out of over 100,000 Googlebot fetches analyzed on nextjs.org, 100% of HTML pages resulted in full-page renders. All content loaded asynchronously via API calls was successfully indexed. Streamed content via React Server Components (RSCs) was also fully rendered, confirming that streaming does not adversely impact SEO.
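The finding above means that content fetched client-side after the initial load still reaches the index: Google's renderer waits for asynchronous work before capturing the page. A minimal sketch of that pattern (the URL, field name, and injected fetcher are illustrative, not part of the dataset):

```typescript
// Content fetched via a client-side API call after load. Per the
// nextjs.org data, Google's renderer waits for this async work, so the
// resulting text is indexed even though it is absent from the initial HTML.
type Fetcher = (url: string) => Promise<{ json(): Promise<unknown> }>;

async function loadDescription(url: string, fetchImpl: Fetcher): Promise<string> {
  const res = await fetchImpl(url);
  const data = (await res.json()) as { description: string };
  return data.description; // inserted into the DOM after load, yet still indexed
}
```

Injecting the fetcher keeps the sketch self-contained; in a real app this would be the browser's `fetch`.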
Many SEO practitioners believe that JavaScript-heavy pages face significant delays in indexing due to a rendering queue. To test this, our analysis examined how Google processes pages with different status codes (200, 304, 3xx, 4xx, 5xx) and pages with noindex meta tags.
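The status-code and noindex dimensions of the analysis can be sketched as a simple classifier: a page is only a candidate for indexing if it returns 200 and its rendered HTML carries no robots noindex directive. This is an illustrative approximation of how we bucketed pages, not Google's actual logic:

```typescript
// Rough classification used when bucketing pages: non-200 responses
// (304, 3xx, 4xx, 5xx) and pages carrying a robots noindex directive
// are treated as non-indexable.
function shouldIndex(status: number, html: string): boolean {
  if (status !== 200) return false; // 304/3xx/4xx/5xx bucketed separately
  // Look for <meta name="robots" ... noindex ...> in the rendered HTML.
  return !/<meta[^>]+name=["']robots["'][^>]+noindex/i.test(html);
}
```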
To address the impact of the rendering queue and timing on SEO, we investigated:

- Rendering delays: how long Google took to render pages after crawling them.
- How often Google re-renders pages, and whether there were patterns in rendering frequency for different types of content.
- Rendering times for URLs with and without query strings.
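The delay analysis above boils down to pairing each crawl timestamp with its render timestamp and comparing percentiles across URL groups. A sketch under assumed field names (the `Fetch` shape is illustrative, not the actual dataset schema):

```typescript
// Illustrative record: one Googlebot fetch with its crawl and render times.
interface Fetch {
  url: string;
  crawledAt: number; // epoch ms
  renderedAt: number; // epoch ms
}

// Delay at the p-th percentile (nearest-rank style) across a set of fetches.
function percentileDelay(fetches: Fetch[], p: number): number {
  const delays = fetches
    .map((f) => f.renderedAt - f.crawledAt)
    .sort((a, b) => a - b);
  const idx = Math.min(delays.length - 1, Math.floor((p / 100) * delays.length));
  return delays[idx];
}

// Split fetches into the two groups compared in the study:
// URLs with a query string vs. without.
function splitByQueryString(fetches: Fetch[]) {
  return {
    withQuery: fetches.filter((f) => f.url.includes("?")),
    withoutQuery: fetches.filter((f) => !f.url.includes("?")),
  };
}
```

Comparing `percentileDelay` at, say, p50 and p99 for each group is how differences "at higher percentiles" show up.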
Google can discover links in non-rendered JavaScript payloads on the page, such as those in React Server Components or similar structures, because it processes content by identifying strings that look like URLs. Notably, on nextjs.org, pages with ?ref= parameters experienced longer rendering delays, especially at higher percentiles.
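That URL-string scanning can be approximated in a few lines. This is a rough sketch of the behavior we observed, not Google's actual extraction code:

```typescript
// Approximation of URL discovery in a raw payload: scan the text for
// URL-shaped strings, so links inside an RSC payload can be found
// without rendering the page at all.
function extractUrlLike(payload: string): string[] {
  return payload.match(/https?:\/\/[^\s"']+/g) ?? [];
}
```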
The source and format of a link (e.g., in an <a> tag or embedded in a JavaScript payload) did not impact how Google prioritized its crawl. Crawl priority remained consistent regardless of whether a URL was found in the initial crawl or only after rendering.
Rendering prioritization: Google's rendering process isn't strictly first-in-first-out. Factors like content freshness and update frequency influence prioritization more than JavaScript complexity. While Google can effectively render JS-heavy pages, the process is more resource-intensive than for static HTML.
There are some differences in Google's abilities across rendering strategies. These fine-grained differences exist, but Google will quickly discover and index your site regardless of rendering strategy. Focus on creating performant web applications that benefit users, rather than worrying about special accommodations for Google's rendering process.
MERJ is a leading SEO and data engineering consultancy specializing in technical SEO and performance optimization for complex web applications. We can walk you through how it works for your application. If you need assistance with any of the SEO topics raised in this research, don't hesitate to contact MERJ.