
Brave has open-sourced a new tool named Cookiecrumbler, which leverages open-source large language models (LLMs) to detect cookie consent notices across the web and suggest non-breaking blocking solutions.
Alongside this, the company is now publishing the results of Cookiecrumbler’s website crawls on GitHub, inviting community contributions to improve detection accuracy and coverage.
According to the announcement, the motivation behind Cookiecrumbler is to scale and refine Brave’s cookie notice blocking without the collateral damage that generic filter rules often cause, such as broken layouts, missing site functionality, and rendering issues.
Brave has long blocked cookie consent banners by default, considering them both privacy-invasive and largely redundant within its ecosystem, which already blocks third-party tracking scripts and pixels. However, implementing banner-blocking without breaking websites remains a technical challenge due to the wide variety of consent banner implementations across sites and regions.
Brave Software Inc., known for its privacy-focused Chromium-based browser, has carved out a niche by integrating tracker blocking, HTTPS upgrades, and anti-fingerprinting protections directly into the browser. Cookie consent banners have been a consistent source of user frustration, and Brave has positioned their removal as a logical extension of its privacy philosophy. But unlike conventional approaches that rely on generalized adblock filter rules, Cookiecrumbler allows Brave to detect and handle cookie banners in a site-specific, non-invasive manner.
Cookiecrumbler automates the detection of cookie notices using a lightweight LLM deployed on Brave’s backend infrastructure. The tool identifies likely cookie banners by analyzing HTML elements from pages rendered live with Puppeteer, a browser automation library that drives a headless browser. Crawls are routed through various regional proxies to simulate different geolocations. This enables detection of language- and region-specific notices, including non-English implementations, which are often missed by traditional methods.
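To make the detection step more concrete, here is a minimal Python sketch of the general idea: pull candidate elements out of rendered HTML, then hand each one to a classifier. It is not Brave’s implementation. The names CandidateCollector, extract_candidates, keyword_classifier and detect_notices are hypothetical, and the keyword check is only a stand-in for the LLM call that Cookiecrumbler makes on Brave’s backend.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not tracked on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Text inside these elements is never shown to the user.
SKIPPED_TEXT_TAGS = {"script", "style"}
# Assumption: a tiny English keyword list standing in for the LLM classifier.
KEYWORDS = ("cookie", "consent")


class CandidateCollector(HTMLParser):
    """Collects every element that has an id or class, with its own text."""

    def __init__(self):
        super().__init__()
        self.stack = []       # currently open elements, innermost last
        self.candidates = []  # closed elements that carry an id or class

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attributes = dict(attrs)
        parts = (attributes.get("id"), attributes.get("class"))
        label = " ".join(part for part in parts if part)
        self.stack.append({"tag": tag, "label": label, "text": []})

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self.stack and self.stack[-1]["tag"] in SKIPPED_TEXT_TAGS:
            return
        # Attribute the text to the nearest enclosing labeled element only,
        # so page-level wrappers do not inherit the banner's wording.
        for element in reversed(self.stack):
            if element["label"]:
                element["text"].append(text)
                break

    def handle_endtag(self, tag):
        # Close the innermost matching element and anything left open inside it.
        for index in range(len(self.stack) - 1, -1, -1):
            if self.stack[index]["tag"] == tag:
                for element in self.stack[index:]:
                    if element["label"]:
                        self.candidates.append({
                            "tag": element["tag"],
                            "label": element["label"],
                            "text": " ".join(element["text"]),
                        })
                del self.stack[index:]
                return


def extract_candidates(html):
    """Return labeled elements found in an HTML string."""
    collector = CandidateCollector()
    collector.feed(html)
    collector.close()
    return collector.candidates


def keyword_classifier(candidate):
    """Stand-in for the LLM: flag candidates that mention cookie keywords."""
    haystack = (candidate["label"] + " " + candidate["text"]).lower()
    return any(keyword in haystack for keyword in KEYWORDS)


def detect_notices(html, classify=keyword_classifier):
    """Run the classifier over every candidate and keep the positives."""
    return [c for c in extract_candidates(html) if classify(c)]
```

Calling detect_notices on a page’s HTML returns the elements the classifier flags. Because the classifier is passed in as a parameter, the keyword check could be swapped for a model call, which is also what detecting non-English notices would require.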
Each website scan begins with a region-tailored version of the Tranco site popularity list. Crawling scripts then visit each site and invoke the Cookiecrumbler API. The tool evaluates the page, identifies potential cookie banners, classifies them using the LLM, and optionally recommends mitigation strategies. These results are then reviewed by human filter list maintainers to prevent overblocking and ensure accuracy before deployment.
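The flow described above (regional site lists, a detection call per site, and human review before deployment) can be sketched roughly as follows. This is an illustration under stated assumptions, not Brave’s code: Detection, crawl and deployable_rules are hypothetical names, the detect callable stands in for the Cookiecrumbler API, and the domain##selector output uses common adblock cosmetic filter syntax purely as an example of what a reviewed rule might look like.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    site: str
    region: str
    selector: str           # CSS selector of the suspected cookie notice
    approved: bool = False  # set by a human reviewer, never by the crawler


def crawl(sites_by_region, detect):
    """Visit each region's site list and queue every detection for review.

    `detect(site, region)` is an assumed callable that returns the CSS
    selectors of suspected cookie notices on that site for that region.
    """
    review_queue = []
    for region, sites in sites_by_region.items():
        for site in sites:
            for selector in detect(site, region):
                review_queue.append(Detection(site, region, selector))
    return review_queue


def deployable_rules(review_queue):
    """Emit site-specific hiding rules only for reviewer-approved detections."""
    return [f"{d.site}##{d.selector}" for d in review_queue if d.approved]
```

Keeping the approved flag off by default mirrors the article’s point that nothing ships until a filter list maintainer has checked it.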
Importantly, Brave is publishing the results of these crawls — including detected cookie notices and relevant metadata — as GitHub issues. This approach aims to crowdsource validation and correction from the broader ad blocking and privacy community. Brave also reports a measurable drop in breakage complaints and increased user retention following early internal use of Cookiecrumbler.
While Cookiecrumbler currently operates entirely on Brave’s servers, the team is exploring the possibility of embedding its capabilities into the browser itself, pending privacy reviews. This would bring intelligent cookie banner detection directly to users’ devices, potentially allowing real-time, privacy-preserving blocking without relying on third-party filter lists.