April 11, 2025

Technical SEO Audit Guide: Find and Fix Every Issue

Learn how to run a technical SEO audit from start to finish. Covers crawlability, indexation, site speed, and 23 automated checks.

Mehmet Kose
15 min read

What Is a Technical SEO Audit and Why Does It Matter?

A technical SEO audit is a systematic review of every factor that affects how search engines crawl, index, and rank your website. Unlike content strategy or link building, technical SEO focuses on the infrastructure beneath your pages: the HTTP responses, the HTML markup, the site architecture, and the signals that tell Google whether your site is fast, accessible, and trustworthy.

If your pages can’t be crawled, they won’t be indexed. If they aren’t indexed, they won’t rank. It’s that simple. Even the best content in the world is invisible if a misconfigured robots.txt blocks Googlebot, or if a stray noindex tag tells search engines to ignore your most important landing page.

A thorough technical SEO audit checklist covers crawlability, indexation, on-page signals, site speed, structured data, HTTP status codes, and more. Whether you’re running a 50-page marketing site or a 500,000-page e-commerce catalog, the fundamentals are the same. The difference is in how you scale the process.

In this guide, we’ll walk through every major category of a technical SEO audit, explain what to look for, why it matters, and how to automate the entire workflow with a single SEO analysis tool so you never have to audit manually again.

Crawlability and Indexation

Crawlability is the foundation of technical SEO. Before anything else, search engines need to discover and access your pages. Indexation determines which of those crawled pages actually make it into the search index.

Robots.txt

Your robots.txt file is the first thing a crawler reads. It tells bots which paths they’re allowed to visit and which they should skip. A single misplaced Disallow: / can block your entire site from being crawled. During an audit, verify that:

  • Critical pages and directories are not disallowed
  • Staging or admin paths are blocked appropriately
  • The file is accessible at the root (/robots.txt) and returns a 200 status
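Python’s standard library can parse a robots.txt and confirm which paths a given user agent may fetch, which makes for a quick sanity check before a full crawl. A minimal sketch (the robots.txt contents and URLs below are hypothetical):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents -- substitute your live file.
robots_txt = """\
User-agent: *
Disallow: /admin/
Disallow: /staging/
Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Critical paths should be fetchable; private paths should be blocked.
print(parser.can_fetch("Googlebot", "https://example.com/blog/post/"))   # True
print(parser.can_fetch("Googlebot", "https://example.com/admin/login"))  # False
```

Run the same two checks against your real file and a handful of your most important URLs; a False on a money page is exactly the kind of misconfiguration an audit exists to catch.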

Meta Robots Tags (noindex / nofollow)

Even if a page is crawlable, a <meta name="robots" content="noindex"> tag in the <head> will prevent it from appearing in search results. A nofollow directive tells crawlers not to follow outbound links on that page, which affects how link equity flows through your site.

Common issues include:

  • Accidentally applying noindex to important pages after a migration
  • Leaving noindex tags from staging environments in production
  • Using nofollow on internal links, which wastes crawl budget
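Catching a stray noindex is a matter of reading each page’s head section. A small sketch using Python’s stdlib HTML parser, with a hypothetical page as input:

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Collects the content of any <meta name="robots"> tag on a page."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.directives.append((attrs.get("content") or "").lower())

# Hypothetical page that shipped to production with a staging noindex.
html = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
finder = RobotsMetaFinder()
finder.feed(html)
print(any("noindex" in d for d in finder.directives))  # True -> flag this page
```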

Canonical Tags

Canonical tags (<link rel="canonical" href="...">) tell search engines which version of a page is the “original” when duplicate or near-duplicate content exists. Misconfigured canonicals are one of the most common technical SEO problems:

  • Self-referencing canonicals should point to the page’s own URL
  • Pages with query parameters should canonicalize to the clean version
  • Canonical URLs should use the correct protocol (HTTPS, not HTTP)
  • Canonicals should never point to a 404, a redirect, or a noindex page
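A few of these rules can be checked mechanically. The sketch below implements two of them (protocol and parameter canonicalization); a real audit would also fetch the canonical target to rule out 404s, redirects, and noindex:

```python
from urllib.parse import urlsplit

def canonical_issues(page_url, canonical_url):
    """Return a list of problems with a page's canonical tag.
    Illustrative rules only, covering two of the checks above."""
    issues = []
    if not canonical_url:
        return ["missing canonical"]
    if urlsplit(canonical_url).scheme == "http":
        issues.append("canonical uses HTTP, not HTTPS")
    if "?" in page_url and canonical_url.split("?")[0] != page_url.split("?")[0]:
        issues.append("parameterized URL canonicalizes to a different path")
    return issues

print(canonical_issues("https://example.com/p?utm_source=x",
                       "https://example.com/p"))                  # []
print(canonical_issues("https://example.com/p",
                       "http://example.com/p"))                   # protocol issue
```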

XML Sitemaps

An XML sitemap is a roadmap for search engines. It lists every page you want indexed, along with optional metadata like last-modified dates and priority hints. A well-maintained sitemap helps search engines discover new content faster and understand your site’s structure.

During your audit, check that:

  • The sitemap is referenced in robots.txt
  • It only includes pages that return 200 status codes
  • It does not include noindex pages or redirects
  • It’s been updated recently (stale sitemaps signal neglect)
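A quick way to spot-check entries is to parse the sitemap with Python’s standard XML library. The snippet below uses a hypothetical sitemap fragment; in practice you would fetch your live /sitemap.xml and also verify each URL’s status code:

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap fragment -- replace with your live /sitemap.xml.
sitemap = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2025-04-01</lastmod></url>
  <url><loc>http://example.com/old-page</loc></url>
</urlset>"""

root = ET.fromstring(sitemap)
urls = [u.findtext("sm:loc", namespaces=NS) for u in root.findall("sm:url", NS)]
bad = [u for u in urls if not u.startswith("https://")]
print(bad)  # ['http://example.com/old-page'] -- entries to fix or drop
```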

With crawler.sh, you can generate a fresh XML sitemap directly from your crawl results, ensuring it always reflects the actual state of your site. See the sitemap generation feature for details.

Crawl Budget

Crawl budget is the number of pages a search engine will crawl on your site within a given timeframe. For large sites, wasting crawl budget on low-value pages (faceted navigation, session-based URLs, infinite scroll pagination) means important pages get crawled less frequently.

Optimizing crawl budget involves blocking low-value paths in robots.txt, using canonical tags to consolidate duplicates, and ensuring your internal linking prioritizes high-value pages.

On-Page SEO Checks

On-page SEO elements are the HTML signals that directly tell search engines what a page is about. Getting these right is table stakes for ranking.

Title Tags

The <title> tag is arguably the single most important on-page ranking factor. It appears in search results as the clickable headline and strongly influences click-through rates.

Common title tag issues to audit:

  • Missing titles - Pages without a <title> tag have no way to communicate their topic to search engines
  • Short titles (under 30 characters) - These underuse the available space and miss keyword opportunities
  • Long titles (over 60 characters) - Google truncates titles in SERPs, so anything beyond ~60 characters may be cut off
  • Duplicate titles - Multiple pages sharing the same title confuse search engines about which page to rank for a given query

Title tag optimization is one of the highest-ROI activities in SEO. A clear, keyword-rich title with the right length can improve rankings and click-through rates simultaneously.
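The length and duplicate rules above are simple enough to script. A sketch over a hypothetical URL-to-title map, using the thresholds from this guide (under 30 characters is short, over 60 is long):

```python
from collections import Counter

def title_issues(pages):
    """Classify title problems across a crawl.
    `pages` maps URL -> title text (hypothetical crawl output)."""
    counts = Counter(t for t in pages.values() if t)
    issues = {}
    for url, title in pages.items():
        if not title:
            issues[url] = "missing"
        elif len(title) < 30:
            issues[url] = "short"
        elif len(title) > 60:
            issues[url] = "long"
        elif counts[title] > 1:
            issues[url] = "duplicate"
    return issues

pages = {
    "/a": "Technical SEO Audit Guide: Find and Fix Every Issue",
    "/b": "Pricing",   # short
    "/c": "",          # missing
}
print(title_issues(pages))  # {'/b': 'short', '/c': 'missing'}
```

The same structure, with the 70/160-character thresholds, applies to meta descriptions.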

Meta Descriptions

While meta descriptions aren’t a direct ranking factor, they heavily influence click-through rates from search results. Google uses them as the snippet text beneath your title.

Audit for:

  • Missing meta descriptions - Google will auto-generate a snippet, which may not be ideal
  • Short descriptions (under 70 characters) - Missed opportunity to sell the click
  • Long descriptions (over 160 characters) - Will be truncated in SERPs
  • Duplicate descriptions - Suggest templated or thin content

Heading Structure

A logical heading hierarchy (H1 → H2 → H3) helps both users and search engines understand content organization. Check for:

  • Pages missing an H1 tag
  • Multiple H1 tags on a single page
  • Skipped heading levels (H1 → H3 with no H2)
  • Headings that don’t include relevant keywords
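Given a page’s heading levels in document order, the first three checks reduce to a few lines. An illustrative sketch:

```python
def heading_issues(levels):
    """Check a page's heading sequence (e.g. [1, 2, 3, 2]) for the
    structural problems listed above. Illustrative, not exhaustive."""
    issues = []
    if levels.count(1) == 0:
        issues.append("missing H1")
    if levels.count(1) > 1:
        issues.append("multiple H1 tags")
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:
            issues.append(f"skipped level: H{prev} -> H{cur}")
    return issues

print(heading_issues([1, 2, 3, 3, 2]))  # []
print(heading_issues([1, 3]))           # ['skipped level: H1 -> H3']
print(heading_issues([2, 3]))           # ['missing H1']
```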

Content Quality Signals

Search engines are increasingly sophisticated at evaluating content quality. Thin content, duplicate content, and missing content are all red flags.

Thin Content

Pages with very little text content (under 200-300 words) often struggle to rank because they don’t provide enough information to satisfy user intent. Thin content is especially problematic for:

  • Category pages with only product listings and no descriptive text
  • Tag or archive pages that are auto-generated
  • Placeholder pages that were published before content was written
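Flagging thin pages starts with a visible-text word count. A minimal sketch with Python’s stdlib parser, skipping script and style content (the sample page is hypothetical):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Counts visible words, ignoring script and style blocks."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.words = 0
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words += len(data.split())

def word_count(html):
    extractor = TextExtractor()
    extractor.feed(html)
    return extractor.words

html = "<html><body><h1>Placeholder</h1><p>Coming soon.</p></body></html>"
print(word_count(html))  # 3 -> well under the 200-word threshold
```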

Duplicate Content

Duplicate content doesn’t trigger a penalty per se, but it dilutes ranking signals. When multiple pages have identical or near-identical content, search engines must choose which one to index, and they may choose the wrong one.

Common duplicate content sources:

  • HTTP vs. HTTPS versions of the same page
  • www vs. non-www variations
  • URL parameters creating multiple versions of the same content
  • Printer-friendly or mobile-specific duplicate pages
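The first three sources on that list are all URL variants of the same page, so a normalization pass can reveal how many distinct documents a crawl actually found. A sketch (the normalization policy here, HTTPS, no www, no query string, is an assumption; pick the form that matches your canonical setup):

```python
from urllib.parse import urlsplit, urlunsplit

def normalize(url):
    """Collapse common duplicate-content variants into one form:
    HTTPS scheme, no www prefix, no query string."""
    parts = urlsplit(url)
    host = parts.netloc.lower().removeprefix("www.")
    return urlunsplit(("https", host, parts.path or "/", "", ""))

variants = [
    "http://example.com/page",
    "https://www.example.com/page",
    "https://example.com/page?ref=email",
]
print({normalize(u) for u in variants})  # {'https://example.com/page'}
```

Three crawled URLs, one real page; the other two should 301 or canonicalize to it.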

Missing Content

Pages that return a 200 status but contain no meaningful body content are “soft 404s.” Search engines may eventually figure this out, but in the meantime, these pages waste crawl budget and confuse your site’s topical relevance.

URL Structure

Clean, descriptive URLs are both a ranking signal and a usability factor.

URL Length

Excessively long URLs (over 100-120 characters) can be harder to share, may get truncated in some tools, and correlate with lower click-through rates. Keep URLs concise and descriptive.

Clean URLs

URLs should be human-readable and free of unnecessary parameters, session IDs, or tracking fragments. Compare:

  • Good: /blog/technical-seo-audit-guide/
  • Bad: /blog/index.php?id=4827&session=abc123&ref=email
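Cleaning the bad example above is mostly a matter of dropping parameters that don’t change the content. A sketch; the TRACKING set is an illustrative assumption you would extend for your own stack:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Parameters that create duplicate URLs without changing content.
# Illustrative list -- extend for your own analytics and session setup.
TRACKING = {"session", "ref", "utm_source", "utm_medium", "utm_campaign"}

def clean_url(url):
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING]
    return urlunsplit(parts._replace(query=urlencode(kept)))

print(clean_url("/blog/index.php?id=4827&session=abc123&ref=email"))
# -> /blog/index.php?id=4827
```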

Pagination

For paginated content (category pages, blog archives), proper handling via rel="next" and rel="prev" tags helps search engines understand the relationship between pages in a series. While Google has said they no longer use these tags as an indexing signal, they still help other search engines and can influence crawl patterns.

Site Speed and Core Web Vitals

Page speed has been a confirmed Google ranking factor since 2010, and Core Web Vitals became a ranking signal in 2021. Slow pages hurt both rankings and user experience.

Largest Contentful Paint (LCP)

LCP measures how long it takes for the largest visible element (usually a hero image or heading) to render. Google considers an LCP of 2.5 seconds or less to be “good.”

Common LCP issues:

  • Unoptimized hero images (no compression, no modern formats like WebP/AVIF)
  • Render-blocking CSS or JavaScript
  • Slow server response times (TTFB)
  • Client-side rendering delays

Cumulative Layout Shift (CLS)

CLS measures how much the page layout shifts during loading. A CLS score of 0.1 or less is “good.” Layout shifts are frustrating for users and signal poor page quality.

Common CLS issues:

  • Images and iframes without explicit width/height attributes
  • Dynamically injected content (ads, banners, cookie notices)
  • Web fonts causing text reflow (FOIT/FOUT)

Interaction to Next Paint (INP)

INP replaced First Input Delay (FID) as a Core Web Vital in 2024. It measures the responsiveness of a page to all user interactions, not just the first one. An INP of 200 milliseconds or less is “good.”

Common INP issues:

  • Heavy JavaScript execution blocking the main thread
  • Long tasks that delay event handling
  • Third-party scripts (analytics, chat widgets, ads)

Image Optimization

Images are often the largest assets on a page. During your audit, check for:

  • Missing alt attributes (accessibility and SEO issue)
  • Images served without compression
  • Missing lazy loading on below-the-fold images (loading="lazy")
  • Images not served in modern formats (WebP, AVIF)
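The alt-text and lazy-loading checks can be scripted over any page’s HTML. A sketch using the stdlib parser (the sample markup is hypothetical, and whether an image should lazy-load depends on whether it is above the fold, so treat the second list as candidates for review):

```python
from html.parser import HTMLParser

class ImgAuditor(HTMLParser):
    """Collects <img> tags lacking alt text or a lazy-loading hint."""
    def __init__(self):
        super().__init__()
        self.missing_alt = []
        self.not_lazy = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        src = attrs.get("src", "")
        if not attrs.get("alt"):
            self.missing_alt.append(src)
        if attrs.get("loading") != "lazy":
            self.not_lazy.append(src)

html = ('<img src="/hero.webp" alt="Audit dashboard">'
        '<img src="/footer.png" loading="lazy">')
auditor = ImgAuditor()
auditor.feed(html)
print(auditor.missing_alt)  # ['/footer.png']
print(auditor.not_lazy)     # ['/hero.webp']
```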

Structured Data and Rich Results

Structured data (Schema.org markup, typically in JSON-LD format) helps search engines understand the content and context of your pages. It can also unlock rich results in SERPs: star ratings, FAQ dropdowns, recipe cards, event dates, and more.

Common Schema Types

  • Article - Blog posts and news articles
  • Product - E-commerce product pages with price, availability, and reviews
  • FAQ - Frequently asked questions with expandable answers in SERPs
  • BreadcrumbList - Breadcrumb navigation shown in search results
  • Organization - Company information, logo, social profiles
  • LocalBusiness - Physical business details for local SEO

Structured Data Audit Checklist

  • Validate all structured data with Google’s Rich Results Test
  • Ensure required properties are present for each schema type
  • Check that structured data matches the visible page content (no cloaking)
  • Verify JSON-LD syntax is error-free
  • Look for deprecated schema types or properties
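The required-properties check lends itself to a simple lint over each page’s JSON-LD. A sketch; the REQUIRED map below is an illustrative subset, not Google’s actual requirements, so consult the structured data documentation for the authoritative list per type:

```python
import json

# Minimum properties per schema type. Illustrative subset only --
# see Google's structured data docs for the real requirements.
REQUIRED = {
    "Article": {"headline", "author", "datePublished"},
    "Product": {"name"},
}

def missing_properties(json_ld):
    data = json.loads(json_ld)
    required = REQUIRED.get(data.get("@type"), set())
    return sorted(required - data.keys())

snippet = """{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Technical SEO Audit Guide",
  "author": {"@type": "Person", "name": "Mehmet Kose"}
}"""
print(missing_properties(snippet))  # ['datePublished']
```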

HTTP Status Codes

HTTP status codes tell both browsers and search engines what happened when they requested a page. Incorrect status codes can cause indexation problems, wasted crawl budget, and broken user experiences.

Redirects (301 and 302)

  • 301 (Moved Permanently) - Passes full link equity to the destination. Use for permanent URL changes.
  • 302 (Found) - A temporary redirect signaling the original URL may return. Search engines may not transfer ranking signals.

Audit for redirect chains (A → B → C → D), which slow down crawling and dilute link equity. Also check for redirect loops, where pages redirect back to each other infinitely.

404 Errors

Broken links that lead to 404 pages waste crawl budget and create a poor user experience. Internal broken links are especially damaging because they’re entirely within your control. During your audit:

  • Identify all internal links pointing to 404 pages
  • Check for external backlinks pointing to pages that no longer exist
  • Implement 301 redirects for high-value 404 pages
  • Create a custom 404 page that helps users navigate back to useful content

5xx Server Errors

Server errors (500, 502, 503) indicate backend problems. If Googlebot encounters frequent 5xx errors, it may reduce your crawl rate or even deindex affected pages. Monitor your server logs and error tracking to catch these issues before they impact SEO.

Redirect Chains and Loops

A redirect chain occurs when a URL redirects through multiple intermediate URLs before reaching its final destination. Each hop adds latency and can lose a small percentage of link equity. Best practice is to ensure all redirects point directly to the final destination URL.

A redirect loop occurs when URL A redirects to URL B, which redirects back to URL A (or through a longer cycle). These result in browser errors and make the affected pages completely inaccessible.
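Given a map of source-to-destination redirects (built from crawl results or server logs), chains and loops can be detected without any network requests. A sketch over hypothetical data:

```python
def trace_redirects(start, redirects, max_hops=10):
    """Follow a URL through a redirect map; report chains and loops.
    `redirects` maps source URL -> destination URL."""
    path = [start]
    seen = {start}
    while path[-1] in redirects and len(path) <= max_hops:
        nxt = redirects[path[-1]]
        if nxt in seen:
            return path + [nxt], "loop"
        path.append(nxt)
        seen.add(nxt)
    status = "chain" if len(path) > 2 else "ok"
    return path, status

redirects = {"/a": "/b", "/b": "/c", "/x": "/y", "/y": "/x"}
print(trace_redirects("/a", redirects))  # (['/a', '/b', '/c'], 'chain')
print(trace_redirects("/x", redirects))  # (['/x', '/y', '/x'], 'loop')
```

Fixing a chain means updating the first redirect to point straight at the final URL in `path`.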

Automating Your Audit with crawler.sh

Running a technical SEO audit manually is tedious and error-prone. Even experienced SEOs miss things when reviewing hundreds or thousands of pages by hand. That’s why automation matters.

crawler.sh is a fast, lightweight SEO crawler that runs from your terminal. It performs a full site crawl and then runs 23 automated checks across every page, flagging the exact issues covered in this guide. The entire workflow is three commands.

Step 1: Crawl Your Site

crawler crawl https://example.com

This performs a breadth-first crawl of your entire site, respecting robots.txt, following internal links, and recording every page’s status code, response time, title, meta description, headings, and more. Results are saved to a .crawl file in NDJSON format.

You can control concurrency, depth, and page limits with flags. See the CLI reference for the full list of options:

crawler crawl https://example.com --max-pages 5000 --concurrency 10 --max-depth 5

Step 2: Run the SEO Audit

crawler seo example.com.crawl

This analyzes the crawl data and runs all 23 checks. You’ll get a categorized report covering:

Title Tag Checks

  • Missing title tags
  • Short titles (under 30 characters)
  • Long titles (over 60 characters)
  • Duplicate titles across pages

Meta Description Checks

  • Missing meta descriptions
  • Short descriptions (under 70 characters)
  • Long descriptions (over 160 characters)
  • Duplicate descriptions across pages

Content Checks

  • Missing H1 tags
  • Thin content pages (low word count)
  • Missing content / empty pages
  • Pages without meta viewport tag

Technical Checks

  • Broken internal links (4xx/5xx status)
  • Redirect chains
  • Slow pages (high response time)
  • Missing alt attributes on images

Together, these checks cover every major issue category from this guide, automated and applied to every page in a single pass.

Step 3: Export the Results

crawler seo example.com.crawl --export csv

This exports the audit findings to a CSV file that you can open in any spreadsheet application, share with your team, or import into a project management tool. Each row is a specific issue on a specific page, making it easy to prioritize and assign fixes.

You can also export to TXT format for a plain-text summary:

crawler seo example.com.crawl --export txt

Exporting and Sharing Results

Beyond SEO audit exports, crawler.sh supports multiple output formats for different use cases.

JSON Export

For programmatic processing or integration with other tools, export your crawl data as JSON:

crawler export example.com.crawl --format json

This produces a structured JSON file containing every crawled page with its full metadata, perfect for feeding into custom dashboards, data pipelines, or other SEO analysis tools.

Sitemap Generation

Need a fresh XML sitemap based on what’s actually live on your site? Generate one directly from your crawl data:

crawler export example.com.crawl --format sitemap

This sitemap generation feature creates a valid XML sitemap containing only the pages that returned a 200 status code during the crawl, ensuring your sitemap is always in sync with reality.

Content Extraction

crawler.sh also supports content extraction for pages in your crawl, allowing you to pull the main body content in clean Markdown format. This is useful for content audits, migration projects, and feeding content into other systems.

Putting It All Together: Your SEO Audit Checklist

Here’s a summary checklist you can follow for every technical SEO audit:

Crawlability and Indexation

  • Review robots.txt for unintended blocks
  • Check for stray noindex tags on important pages
  • Validate canonical tag implementation
  • Ensure XML sitemap is current and referenced in robots.txt
  • Assess crawl budget efficiency

On-Page SEO

  • Fix missing, short, long, and duplicate title tags
  • Fix missing, short, long, and duplicate meta descriptions
  • Ensure every page has a single H1 tag
  • Verify logical heading hierarchy

Content Quality

  • Identify and expand thin content pages
  • Resolve duplicate content with canonicals or redirects
  • Remove or redirect empty / soft-404 pages

URL Structure

  • Keep URLs under 120 characters
  • Ensure URLs are clean and descriptive
  • Implement pagination markup where needed

Site Speed

  • Achieve LCP under 2.5 seconds
  • Keep CLS under 0.1
  • Keep INP under 200 milliseconds
  • Optimize and lazy-load images

Structured Data

  • Implement relevant Schema.org markup
  • Validate with Rich Results Test
  • Keep structured data in sync with visible content

HTTP Status Codes

  • Fix all internal broken links (4xx errors)
  • Resolve redirect chains to single-hop 301s
  • Monitor and fix 5xx server errors
  • Eliminate redirect loops

Conclusion

A technical SEO audit isn’t a one-time project. Sites change constantly - new pages are published, old pages are removed, redesigns shift URL structures, and third-party scripts come and go. The sites that rank well are the ones that audit regularly and fix issues before they compound.

The difference between a manual audit and an automated one is the difference between auditing once a quarter and auditing every week. With crawler.sh, you can run a full technical SEO audit in under a minute, get actionable results across all 23 checks, and export everything your team needs to start fixing issues immediately.

Ready to find every technical SEO issue on your site? Download crawler.sh and run your first audit - install in one command, results in seconds.

curl -fsSL https://install.crawler.sh | sh

Wrap-up

An audit shouldn't slow you down. Crawler is built to fit into your workflow, whether you're checking a staging build, reviewing a content migration, or shipping fixes at 2am.

If that sounds like the kind of tooling you want to use, try Crawler.
