Scoring sites by user behaviour

The patent caps long visits. Not because Google distrusts users who linger. It caps them because someone may have walked away from the screen. A page that has been open for an hour, the patent notes, should not count as an hour of satisfaction. So a ceiling is set.

The detail is small. It hides under a stack of formulas in a 2015 patent titled “Scoring Site Quality” (US 9,195,944 B1). It is the first clue that what is being measured is not time. Time is a proxy. The real object, which Google has been trying to estimate for a long time, is whether a user got what they came for.

That object is hard to pin down, yet the patent names it openly. Once it is named, the rest of the patent, and most of what Google has published since, bends toward a single problem. Usefulness is not directly observable. Every signal a ranker uses is therefore a proxy for it. And proxies are not all equally cheap to fake.

Signals a publisher controls (keywords, headings, length, schema markup, internal links) can be pushed by one person in an afternoon. Signals produced by users at scale cannot. A million strangers’ clicks, returns, and time-to-back-button do not coordinate; the cost of faking them rises sharply with the scale at which the engine measures them. So the direction of travel is set before anyone chooses it. A search engine that wants to keep improving has to lean further on the signals its adversaries find most expensive to fake. Behaviour at scale is one of the densest sources of such signals available.

The 2015 patent

The 2015 patent is the first public glimpse of Google moving that way.

The mechanism itself is plain. A user clicks a search result, lands on a page, and at some point returns to the search results to keep looking. The interval between click and return is recorded. If the interval is very short, the visit is discarded as too brief to mean anything. If it is very long, the ceiling I just mentioned applies. Different content types receive different baselines: a person can judge an image at a glance, an article in a minute, a video only after watching for a while.
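A minimal Python sketch of that per-visit step, assuming the click-to-return interval arrives in seconds. The floor, baseline and ceiling values, the `THRESHOLDS` table and the `visit_signal` name are all hypothetical, and dividing by the baseline is only one plausible way to make an image glance and a video view comparable; the patent's own formulas are not reproduced here.

```python
from typing import Optional

# Hypothetical per-content-type settings, in seconds. The patent describes a
# floor, a ceiling and type-specific baselines; these numbers are invented.
THRESHOLDS = {
    # content_type: (minimum, baseline, ceiling)
    "image": (1.0, 5.0, 60.0),
    "article": (5.0, 60.0, 900.0),
    "video": (10.0, 180.0, 1800.0),
}


def visit_signal(dwell_seconds: float, content_type: str) -> Optional[float]:
    """Turn one click-to-return interval into a normalised visit signal.

    Returns None when the visit is too short to count; otherwise the
    capped interval divided by the content type's baseline.
    """
    minimum, baseline, ceiling = THRESHOLDS[content_type]
    if dwell_seconds < minimum:
        return None  # too brief to mean anything: discard the visit
    capped = min(dwell_seconds, ceiling)  # an hour open is not an hour of satisfaction
    return capped / baseline
```

With these numbers, an article left open for an hour scores exactly the same as one read for fifteen minutes: the ceiling absorbs the difference.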

These per-visit measurements are then aggregated and turned into a site quality score. Not a page score. A site score.
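The move from page to site can be sketched as a change of grouping key. In this hypothetical version, already-normalised visit signals are averaged per host name; both the host-as-site rule (`site_key`) and the plain mean in `site_scores` are assumptions for illustration, not the patent's aggregation.

```python
from collections import defaultdict
from statistics import mean
from urllib.parse import urlsplit


def site_key(url: str) -> str:
    """Hypothetical grouping rule: the host name stands in for the 'site'."""
    return urlsplit(url).netloc


def site_scores(visits: list[tuple[str, float]]) -> dict[str, float]:
    """Average per-visit signals by site rather than by page.

    `visits` holds (landing_url, visit_signal) pairs that have already
    been filtered and normalised.
    """
    by_site: dict[str, list[float]] = defaultdict(list)
    for url, signal in visits:
        by_site[site_key(url)].append(signal)
    return {site: mean(signals) for site, signals in by_site.items()}
```

Given visits of 1.0 to `https://example.com/a`, 0.2 to `https://example.com/b` and 0.8 to `https://other.org/x`, the result is roughly 0.6 for `example.com` and 0.8 for `other.org`: the strong page and the weak page end up sharing one number.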

Three details

Three details from the patent are worth slowing down for. Each is what the cost-to-fake constraint looks like once it is built into a ranking system.

The first is the unit of evaluation. A “site” in the patent isn’t always a domain. It can be a subdomain, a directory, or a cluster of resources on a server — whatever grouping the system finds useful. A low-quality page doesn’t only hurt the URL it appears on. It can pull down the directory it sits in. It can pull down the subdomain. It can pull down the whole collection. The unit moves up because per-page behaviour is sparse: most pages don’t have enough visits to score reliably, and aggregating to a collection raises the signal-to-noise ratio. The cost is collateral damage to neighbours of the bad page. The system takes the trade.
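One way to picture the upward move is a back-off: score a page on its own visits if it has enough, otherwise fall back to the smallest enclosing collection that does. The sketch assumes directories and the host name are the candidate collections and uses an invented `MIN_VISITS` threshold; `ancestors`, `collection_stats` and `score_for` are hypothetical helpers, not taken from the patent.

```python
from collections import defaultdict
from typing import Optional
from urllib.parse import urlsplit

MIN_VISITS = 50  # hypothetical: fewer visits than this is too sparse to score


def ancestors(url: str) -> list[str]:
    """Collections a URL belongs to, most specific first:
    the page itself, each enclosing directory, then the host."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    keys = [parts.netloc + parts.path]
    for i in range(len(segments) - 1, 0, -1):
        keys.append(parts.netloc + "/" + "/".join(segments[:i]) + "/")
    keys.append(parts.netloc)
    return list(dict.fromkeys(keys))  # drop duplicates, keep order


def collection_stats(visits: list[tuple[str, float]]) -> dict[str, tuple[int, float]]:
    """Credit every visit to each collection above its landing page.
    Returns {collection: (visit_count, mean_signal)}."""
    totals = defaultdict(lambda: [0, 0.0])  # collection -> [count, signal sum]
    for url, signal in visits:
        for key in ancestors(url):
            totals[key][0] += 1
            totals[key][1] += signal
    return {key: (n, s / n) for key, (n, s) in totals.items()}


def score_for(url: str, stats: dict[str, tuple[int, float]]) -> Optional[float]:
    """Use the most specific collection with enough visits to score."""
    for key in ancestors(url):
        count, score = stats.get(key, (0, 0.0))
        if count >= MIN_VISITS:
            return score
    return None
```

With 60 weak visits (0.1) to `example.com/blog/bad` and a single strong visit (2.0) to `example.com/blog/good`, the good page is too sparse to score alone and inherits the `/blog/` directory's average of about 0.13, which is the collateral damage the paragraph describes. Grouping at the registered-domain level is left out because it needs the Public Suffix List.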

The second is what the score affects. The natural assumption is ranking. The patent lists more: whether to crawl a resource at all, whether to refresh it in the index, whether to add it to the index in the first place. The score governs Google’s resource allocation toward the site, not just where the site sits on a results page. A bad score doesn’t only bury a page. It can make the crawler less interested in coming back. If a collection isn’t worth ranking, it isn’t worth looking at.
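To make the resource-allocation point concrete, here is a hypothetical mapping from a collection's score to crawl decisions. The patent lists the kinds of decision (crawl at all, refresh, add to the index), but the `CrawlPlan` type, the tiers, the thresholds and the refresh intervals below are invented for illustration.

```python
from typing import NamedTuple, Optional


class CrawlPlan(NamedTuple):
    crawl: bool                  # fetch resources from this collection at all?
    add_new_to_index: bool       # admit newly discovered resources to the index?
    refresh_days: Optional[int]  # re-fetch interval for indexed resources; None = never


# Hypothetical tiers, highest threshold first: (minimum score, plan).
TIERS = [
    (1.0, CrawlPlan(crawl=True, add_new_to_index=True, refresh_days=1)),
    (0.5, CrawlPlan(crawl=True, add_new_to_index=True, refresh_days=7)),
    (0.2, CrawlPlan(crawl=True, add_new_to_index=False, refresh_days=30)),
]
NO_CRAWL = CrawlPlan(crawl=False, add_new_to_index=False, refresh_days=None)


def crawl_plan(site_score: float) -> CrawlPlan:
    """Pick the first tier whose threshold the score meets."""
    for threshold, plan in TIERS:
        if site_score >= threshold:
            return plan
    return NO_CRAWL
```

Under these tiers, a score of 0.3 is still crawled monthly but no longer admits new pages; below 0.2 the crawler stops coming back at all.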

The third is the filtering. The patent is explicit that anomalous behaviour — odd click distributions, suspicious cookies, irregular user-agent patterns, manipulated queries — is discarded before aggregation. This is the forge-resistance argument made operational: the system throws out the cheapest-to-fake fraction of its own input to protect the trustworthiness of the rest.
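A sketch of filtering before aggregation, assuming each visit carries a cookie ID and a user-agent string. The two rules here (drop cookies with more than five times the median click count; keep only user agents on a small allow-list) are hypothetical stand-ins for the patent's categories of anomaly, chosen because they are simple to state. `Visit`, `KNOWN_AGENT_PREFIXES`, `MAX_CLICK_RATIO` and `filter_visits` are illustrative names.

```python
from collections import Counter
from statistics import median
from typing import NamedTuple


class Visit(NamedTuple):
    cookie: str
    user_agent: str
    signal: float


# Hypothetical allow-list of user-agent prefixes treated as ordinary browsers.
KNOWN_AGENT_PREFIXES = ("Mozilla/",)
MAX_CLICK_RATIO = 5.0  # hypothetical: a cookie may click at most 5x the median


def filter_visits(visits: list[Visit]) -> list[Visit]:
    """Discard visits that look anomalous before they reach aggregation."""
    if not visits:
        return []
    clicks = Counter(v.cookie for v in visits)  # clicks per cookie
    typical = median(clicks.values())           # robust to a few heavy cookies
    return [
        v for v in visits
        if clicks[v.cookie] <= MAX_CLICK_RATIO * typical
        and v.user_agent.startswith(KNOWN_AGENT_PREFIXES)
    ]
```

Because the median barely moves when a handful of cookies click heavily, one cookie with 50 clicks among a dozen single-click cookies is discarded without dragging the threshold up with it.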

Read together, these details point somewhere. The evaluation unit has moved up. Not from page to site exactly — pages are still scored — but the frame now spans a collection. And the input has moved sideways: from what a page contains to what people do after they land on it.

This makes a familiar form of SEO harder. Keyword density, link graphs, structured markup: these are pushed by the publisher. Behaviour is pulled from the user. The publisher can shape the page; they cannot directly shape what users do next. They can influence it only indirectly, by genuinely answering the query, which is the long way around.

The shift is from optimising signals to optimising outcomes. The two used to be close enough that gaming the former approximated the latter. The patent is one of several reasons to suspect that distance is widening.

The 2022 update

For seven years the trajectory was visible mostly in patents and quiet ranking shifts. In August 2022, Google said it out loud.

The post that launched the “helpful content update” introduced what it called, in plain language, a site-wide signal. Pages on sites with too much unhelpful content would underperform — all of them, not just the unhelpful ones. The wording reads almost like a paraphrase of the patent:

Any content — not just unhelpful content — on sites determined to have relatively high amounts of unhelpful content overall is less likely to perform well in Search.

The neighbourhood, named.

The same post introduced an opposition: people-first content versus search-engine-first content. Read against the proxy frame, this is the publisher-pushed / user-pulled distinction in policy language. The self-assessment list that follows asks publishers to consider whether a reader leaves satisfied, whether they would return, and whether they would need to search again for better information elsewhere. These are not content questions. They are behavioural predictions. The publisher is being asked to anticipate the signal the user will generate.

The 2024 policies

In March 2024, the helpful content system was folded into the core ranking system. The post is explicit: “There’s no longer one signal or system used to do this.” What was a named update in 2022 is now spread across many systems. The trajectory has passed the point of being a feature you can point at.

The same post introduced three new spam policies:

Expired domain abuse — buying a lapsed domain to inherit its past reputation, then hosting low-value content under it.
Scaled content abuse — producing pages at volume to manipulate ranking, “no matter whether content is produced through automation, human efforts, or some combination.”
Site reputation abuse — third-party pages hosted on a strong domain with little oversight, riding on the host’s ranking signals.

All three are the same shape. They are not page-level spam. They are attempts to borrow reputation: past reputation, fabricated reputation, parasitic reputation. They are what attackers do once the unit of evaluation has risen above the page. When the score lives at the collection, the gaming moves up to the collection. These policies are not a change of direction. They are the patches the trajectory required.

Three points on one line

The patent, the update, the policies — three points on one line. A system pushed to lean further on the signals its adversaries can’t cheaply forge. Drawn upward toward the collection because per-page behaviour is too sparse to score on its own. And forced, once the collection becomes the unit, to defend the collection’s reputation as an attack surface in its own right.

A page is something one person can shape. A site’s behavioural footprint, at scale, is not. So the system asks the publisher fewer questions and the user more. Not because users are infallible — but because their answers, at the scale ranking operates, are the most expensive part of the system to forge.

References