Definitions of common SEO and web crawling terms used across our guides.
Broken links are outgoing links, including image references, that return a 404 error or fail to connect; they hurt user trust and waste crawl budget.
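A canonical URL is the URL of the preferred version of a page, the one site owners want search engines to index when the same or very similar content is available at several addresses. It is usually declared with a link rel="canonical" element in the page's HTML head.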
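As a rough illustration, the Python sketch below reads the declared canonical URL out of a page's HTML using only the standard-library html.parser module. The names CanonicalFinder and find_canonical are hypothetical, and the sketch only inspects link elements; a canonical sent in an HTTP Link header would be missed.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens, e.g. "canonical alternate".
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr_map.get("href"):
            self.canonical = attr_map["href"]


def find_canonical(html: str) -> Optional[str]:
    """Return the declared canonical URL, or None if the page declares none."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical


page = '<html><head><link rel="canonical" href="https://example.com/shoes"></head></html>'
print(find_canonical(page))  # https://example.com/shoes
```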
Chunking is the process of splitting long documents into smaller pieces that each fit within a language model's context window.
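A minimal chunking sketch, assuming whitespace-separated words as a stand-in for model tokens (a real pipeline would count tokens with the target model's own tokenizer). The function name chunk_words and the default sizes are illustrative. Consecutive chunks share a few overlapping words so that text near a boundary keeps some surrounding context in both chunks.

```python
def chunk_words(text: str, max_tokens: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of at most max_tokens words, overlapping by `overlap` words.

    Words approximate tokens here; this is an assumption, not a real tokenizer.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    words = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


doc = " ".join(f"w{i}" for i in range(10))
print(chunk_words(doc, max_tokens=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```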
Click-through rate (CTR) is the percentage of users who click on a search result after seeing it, measuring how compelling your listing is.
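Tools such as Google Search Console report CTR as clicks divided by impressions. A small Python version of that arithmetic, with made-up numbers (45 clicks from 1,500 impressions gives a CTR of 3%):

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """Return CTR as a percentage: clicks divided by impressions, times 100."""
    if impressions == 0:
        return 0.0  # no impressions means no opportunity to click
    return 100 * clicks / impressions


print(click_through_rate(45, 1500))  # 3.0
```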
Search engines and AI answer engines use the schema.org dateModified and datePublished properties to gauge how recent a page is. Pages that omit these dates or show stale ones can lose visibility.
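The sketch below shows one way a crawler might read these dates from a page's JSON-LD structured data and flag pages whose latest date is older than a cutoff. The one-year threshold, the function names, and the decision to treat dates without a timezone as UTC are assumptions for illustration; the regex-based extraction is deliberately simple and does not look inside @graph arrays.

```python
import json
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Captures the body of each <script type="application/ld+json"> block (simplified).
JSONLD_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.IGNORECASE | re.DOTALL,
)


def parse_date(value: str) -> Optional[datetime]:
    """Parse an ISO 8601 string; values without a timezone are assumed to be UTC."""
    try:
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def latest_content_date(html: str) -> Optional[datetime]:
    """Return the most recent dateModified or datePublished found in JSON-LD."""
    dates = []
    for block in JSONLD_RE.findall(html):
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed structured data
        for item in data if isinstance(data, list) else [data]:
            if not isinstance(item, dict):
                continue
            for key in ("dateModified", "datePublished"):
                value = item.get(key)
                parsed = parse_date(value) if isinstance(value, str) else None
                if parsed is not None:
                    dates.append(parsed)
    return max(dates) if dates else None


def is_stale(html: str, now: datetime, max_age_days: int = 365) -> bool:
    """True if the page has no usable date or its latest date is older than the cutoff."""
    latest = latest_content_date(html)
    return latest is None or now - latest > timedelta(days=max_age_days)


page = (
    '<script type="application/ld+json">'
    '{"@type": "Article", "datePublished": "2023-01-10", '
    '"dateModified": "2024-02-01T09:00:00Z"}</script>'
)
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(latest_content_date(page))  # 2024-02-01 09:00:00+00:00
print(is_stale(page, now))        # False
```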
A context window is the maximum amount of text a language model can process in a single request, measured in tokens.
Crawl budget is the number of pages a search engine will crawl on your site within a given timeframe.
Crawlability is the ability of search engine bots to access and navigate pages on a website.
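One concrete crawlability check is whether robots.txt allows a bot to fetch a URL. Python's standard urllib.robotparser module can evaluate this offline from the file's text; the rules, bot name, and URLs below are hypothetical. This parser uses simple prefix matching, so it may not reproduce every search engine's handling of wildcard patterns.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for example.com.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /search
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if the robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for path in ("/blog/seo-basics", "/admin/settings", "/search?q=shoes"):
    print(path, is_crawlable(ROBOTS_TXT, "https://example.com" + path))
# /blog/seo-basics True
# /admin/settings False
# /search?q=shoes False
```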
Data cleaning is the process of removing noise, duplicates, and low-quality content from datasets before training language models.
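A minimal cleaning pass, assuming a corpus of plain-text strings: normalize whitespace and case, drop documents below a word-count threshold, and remove exact duplicates by hashing the normalized text. The threshold and function names are hypothetical, and production pipelines typically add near-duplicate detection (for example MinHash) and quality filtering that this sketch leaves out.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def clean_corpus(docs: list[str], min_words: int = 5) -> list[str]:
    """Drop very short documents and exact duplicates (after normalization)."""
    seen: set[str] = set()
    kept = []
    for doc in docs:
        norm = normalize(doc)
        if len(norm.split()) < min_words:
            continue  # too short to be useful training text
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # duplicate of a document already kept
        seen.add(digest)
        kept.append(doc)
    return kept


docs = [
    "Crawl budget limits how many pages get crawled.",
    "crawl budget  limits how many pages get crawled.",
    "Click here",
    "Embeddings map text to dense vectors.",
]
print(clean_corpus(docs))
# ['Crawl budget limits how many pages get crawled.', 'Embeddings map text to dense vectors.']
```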
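Duplicate meta descriptions occur when several pages share the same meta description, so those pages compete with each other for clicks in search results.
Duplicate H1s occur when several pages share the same H1 text, which makes the pages harder to tell apart and can weaken their rankings.
Duplicate titles occur when several pages share the same title tag, so those pages compete with each other in search results.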
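The three duplicate checks above share one pattern: normalize the field, group URLs by its value, and report any value used by more than one URL. A sketch in Python, assuming the crawler has already extracted the audited field (title, meta description, or H1) for each URL; the input shape and function name are hypothetical.

```python
from collections import defaultdict


def find_duplicates(pages: dict[str, str]) -> dict[str, list[str]]:
    """Group URLs by normalized field value; keep only values shared by 2+ URLs.

    `pages` maps URL -> the field being audited.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for url, value in pages.items():
        key = " ".join(value.split()).lower()  # collapse whitespace, ignore case
        if key:  # empty values are a different issue (missing or empty field)
            groups[key].append(url)
    return {key: sorted(urls) for key, urls in groups.items() if len(urls) > 1}


titles = {
    "/shoes/red": "Running Shoes | Example Store",
    "/shoes/blue": "Running  shoes | Example Store",
    "/about": "About Us | Example Store",
}
print(find_duplicates(titles))
# {'running shoes | example store': ['/shoes/blue', '/shoes/red']}
```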
E-E-A-T stands for Experience, Expertise, Authoritativeness, and Trustworthiness, the quality criteria Google uses to evaluate web content and its creators.
Embeddings are dense vectors that capture semantic meaning, enabling machines to compare, search, and cluster content by meaning.
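Comparing content by meaning usually comes down to a similarity score between vectors, most commonly cosine similarity. The sketch below computes it in plain Python; the three-dimensional vectors and document names are toy values invented for illustration, whereas real embedding models produce vectors with hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings"; the values are invented for illustration.
query = [0.9, 0.1, 0.0]
docs = {
    "crawl-budget-guide": [0.8, 0.2, 0.1],
    "cooking-recipes": [0.0, 0.1, 0.9],
}
ranked = sorted(docs, key=lambda name: cosine_similarity(query, docs[name]), reverse=True)
print(ranked)  # ['crawl-budget-guide', 'cooking-recipes']
```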
An empty H1 is an H1 tag that contains no text, which wastes the chance to state the page's main topic in its most prominent heading.
An H1 tag is the primary heading element on a web page, signaling the main topic to search engines and users.
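A sketch of how an audit could collect every H1 on a page and detect the empty case, using the standard-library html.parser. The class and function names are hypothetical; in this sketch, text inside nested elements such as span counts toward the heading, while image alt text does not.

```python
from html.parser import HTMLParser


class H1Collector(HTMLParser):
    """Collects the text content of every <h1> element on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.h1_texts: list[str] = []
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
            self.h1_texts.append("")  # start a new heading, possibly empty

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self.h1_texts[-1] += data  # includes text inside nested tags like <span>


def h1_headings(html: str) -> list[str]:
    """Return each H1's text with whitespace collapsed; '' marks an empty H1."""
    collector = H1Collector()
    collector.feed(html)
    collector.close()
    return [" ".join(text.split()) for text in collector.h1_texts]


def has_empty_h1(html: str) -> bool:
    return any(text == "" for text in h1_headings(html))


print(h1_headings("<h1>  Crawl Budget Explained </h1>"))  # ['Crawl Budget Explained']
print(has_empty_h1('<h1><img src="logo.png" alt="Acme"></h1>'))  # True
```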
Hallucination is when a language model generates false, unverifiable, or fabricated information presented as factual.
A headless browser is a web browser without a graphical user interface, used for automated testing and data extraction.
HTTP status codes are three-digit numbers returned by a server to indicate the result of a browser or crawler request.
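Status codes fall into five classes defined by the HTTP specification (RFC 9110), determined by the first digit. The sketch below maps a code to its class and uses Python's http.HTTPStatus for the standard reason phrase. Treating every 4xx and 5xx response as a broken link is one simple rule a link checker might use, not a universal definition; the function names are hypothetical.

```python
from http import HTTPStatus


def status_class(code: int) -> str:
    """Return the RFC 9110 class of an HTTP status code, based on its first digit."""
    classes = {
        1: "informational",
        2: "successful",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return classes[code // 100]


def is_broken_link(code: int) -> bool:
    """One simple link-checker rule: treat any 4xx or 5xx response as broken."""
    return status_class(code) in ("client error", "server error")


for code in (200, 301, 404, 503):
    print(code, HTTPStatus(code).phrase, status_class(code), is_broken_link(code))
# 200 OK successful False
# 301 Moved Permanently redirection False
# 404 Not Found client error True
# 503 Service Unavailable server error True
```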
Indexability refers to whether a search engine can add a page to its index and display it in search results.
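Indexability usually depends on several signals at once. The sketch below checks three common ones for a page a crawler has already fetched: the HTTP status, noindex directives in the robots meta tag or X-Robots-Tag header, and a canonical pointing at a different URL. The PageSignals structure and function name are hypothetical, and the sketch ignores user-agent-specific directives such as "googlebot: noindex".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageSignals:
    """Hypothetical record of what a crawler collected for one URL."""
    url: str
    status_code: int
    meta_robots: str = ""      # content of <meta name="robots">, if present
    x_robots_tag: str = ""     # value of the X-Robots-Tag response header, if present
    canonical: Optional[str] = None


def indexability_issues(page: PageSignals) -> list[str]:
    """List reasons the page is unlikely to be indexed under its own URL."""
    issues = []
    if page.status_code != 200:
        # Only a plain 200 response is treated as indexable in this sketch.
        issues.append(f"HTTP status {page.status_code}")
    directives = {
        token.strip().lower()
        for token in f"{page.meta_robots},{page.x_robots_tag}".split(",")
    }
    if "noindex" in directives or "none" in directives:  # "none" implies noindex
        issues.append("noindex directive")
    if page.canonical and page.canonical != page.url:
        issues.append(f"canonical points to {page.canonical}")
    return issues


print(indexability_issues(PageSignals("https://example.com/a", 200)))  # []
print(indexability_issues(
    PageSignals("https://example.com/b?ref=mail", 200, meta_robots="noindex, follow",
                canonical="https://example.com/b")
))
# ['noindex directive', 'canonical points to https://example.com/b']
```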