How to Spell Check a Website at Scale

Checking a single document for spelling and grammar errors can be difficult enough, but imagine checking a massive website with thousands of pages.

To be sure, spelling and grammar tools are ubiquitous for individual documents. However, the availability of tools is no guarantee of perfection.

Mistakes Made

I subjected this article to the Google Docs built-in spelling and grammar check (Command-Option-X on a Mac) and Grammarly. Yet the editor will undoubtedly have plenty of opportunities for corrections and amendments. Likely suspects include articles (a, an, the), word endings, and typos.

Screenshot of Grammarly interface for "Screamingfrog."

Grammarly finds a spelling error in the proper name Screaming Frog.

Now imagine the same task at scale.

Here is a scenario. You’ve just purchased a blog with 17,000 articles describing do-it-yourself products. The idea was to use the blog to drive traffic to your online craft supply shop. But you’ve noticed that the previous owners left numerous grammar and spelling errors.

You don’t relish the idea of checking 17,000 articles individually. So what do you do?

Here are a few options.

Not Too Technical

If your technical chops amount to using software, several tools will spell check an entire site, including the 17,000-article DIY blog described above.

Screaming Frog. Screaming Frog SEO Spider is a widely used website crawler for search engine optimization audits. It will also spell check an entire website.

The company has a detailed tutorial on setting up spelling and grammar crawls. Turn on the spell and grammar check, and like magic, SEO Spider will identify and report errors. You could also export a list of pages to update. SEO Spider supports several languages, too.

It’s a premium feature requiring the licensed version, which, at the time of writing, was £149.00 per year (about $195.95).

Screenshot of Screaming Frog spell check page

Screaming Frog makes it easy to add spell and grammar check to crawls.

SortSite. PowerMapper’s SortSite is a favorite tool for broken link monitoring and website accessibility testing. The tool also spell checks, finding misspelled words and placeholder text such as “lorem ipsum.” And, when configured, it can recognize unusual words or names.

A perpetual license for the desktop version of SortSite was $149 at the time of writing.

Screenshot of SortSite home page

SortSite is a powerful tool that includes a good spell checker, too.

Various online tools. A quick Google search produces many free online spell checkers. For example, Internet Marketing Ninjas offers a free spelling checker for up to 1,000 pages. But the tool has a limited dictionary. It doesn’t recognize “podcast,” for example.

Technical

There are more options for full-site spelling and grammar checkers via an application programming interface or command-line software. Both require more work to set up than SEO Spider or SortSite, but they may offer a more robust review.

Moreover, the extra setup could be worth your time for a 17,000-article blog.

In each case, you would pass the text of each page to the API (or to Aspell, below). This text might come from a database connection, an export, or a web crawler. The API would then return a list of spelling and grammar errors.
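As a rough sketch of that first step, here is one way to reduce a page's HTML to plain text using only Python's standard library. The names TextExtractor and page_text are made up for this illustration, and the approach is deliberately simple: it skips script and style contents and joins everything else with spaces, so a word split by inline markup would come out as separate fragments.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def page_text(html):
    """Return the visible text of an HTML page as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

The resulting string is what you would hand to whichever checker you choose. A crawler or database export would supply the html argument.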

Bing Spell Check API. Search engines such as Microsoft Bing need to understand searchers’ spelling and grammar.

The Bing Spell Check API is driven by machine learning and goes beyond matching words in a dictionary. It’s one of the best choices in terms of the quality of results.
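For illustration, a single request might be put together roughly as follows. This is a sketch under assumptions, not a reference implementation: the endpoint URL, the Ocp-Apim-Subscription-Key header, the mode and mkt parameters, and the flaggedTokens response field reflect Microsoft's v7 documentation as I understand it, so check the current documentation before relying on them. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint, based on Microsoft's v7 documentation; verify before use.
ENDPOINT = "https://api.bing.microsoft.com/v7.0/spellcheck"


def build_request(text, key, market="en-US"):
    """Build a POST request asking for a "proof" mode check of `text`."""
    query = urllib.parse.urlencode({"mode": "proof", "mkt": market})
    body = urllib.parse.urlencode({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{ENDPOINT}?{query}",
        data=body,
        headers={
            "Ocp-Apim-Subscription-Key": key,  # assumed header name
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def parse_flagged(payload):
    """Turn a JSON response body into (offset, token, first suggestion) tuples."""
    flagged = []
    for item in json.loads(payload).get("flaggedTokens", []):
        suggestions = item.get("suggestions", [])
        best = suggestions[0]["suggestion"] if suggestions else None
        flagged.append((item["offset"], item["token"], best))
    return flagged


def check_text(text, key):
    """Send one transaction and return the flagged tokens (needs network access)."""
    with urllib.request.urlopen(build_request(text, key), timeout=30) as response:
        return parse_flagged(response.read().decode("utf-8"))
```

Keeping the request building and response parsing separate from the network call makes it possible to test both without spending transactions.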

But it does have limitations. In “proof” mode, the API accepts only text strings of 4,096 characters or fewer, which works out to roughly 700 to 800 words of English. Longer articles would need to be split up and sent in several “transactions.”
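Splitting an article to fit under that limit could look something like the sketch below. The function name split_for_proof is hypothetical, and breaking at whitespace is only one option. Breaking at sentence or paragraph boundaries would likely give a grammar checker better context, at the cost of a little more code.

```python
def split_for_proof(text, limit=4096):
    """Split text into chunks of at most `limit` characters, breaking at whitespace."""
    chunks = []
    current = ""
    for word in text.split():
        # A single word longer than the limit is hard-split as a last resort.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            # The word does not fit, so close this chunk and start a new one.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be sent as its own transaction. Note that this version collapses runs of whitespace, so any character offsets the API reports are relative to the chunk, not to the original article.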

Pricing is tier-based. In March 2022, one might expect to pay $7 for every 25,000 monthly transactions.

WProofreader SDK. Using WebSpellChecker’s software development kit is analogous to deploying a jackhammer to insert a nail, but it will certainly do the job.

The SDK has components for adding spelling and grammar checks to apps. More useful in this context, it also has a standalone HTTP API. Rates vary with use.

Other APIs. Other API options beyond Bing and WebSpellChecker include GrammarBot, TextGears, and PerfectTense.

GNU Aspell. This command-line spell checker is free and is often preinstalled, or easy to install, on Linux systems (which run most web servers). It checks spelling only, not grammar.

Using Aspell still takes some coding, but less than the other technical solutions above. Get the web pages into a text format, and then write a script to call Aspell for each file.
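Such a script might look something like this minimal sketch. It assumes the aspell command is on the system path with an English dictionary installed, and that each page has already been saved as a .txt file in one folder. The function names, the folder layout, and the allowed set of site-specific words are all hypothetical.

```python
import subprocess
from collections import Counter
from pathlib import Path


def misspellings(text, lang="en", run=subprocess.run):
    """Return a Counter of the words that `aspell list` reports as misspelled."""
    result = run(
        ["aspell", f"--lang={lang}", "list"],  # `list` prints one bad word per line
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return Counter(result.stdout.split())


def check_directory(folder, allowed=frozenset(), run=subprocess.run):
    """Map each .txt file in `folder` to its misspellings, minus allowed words."""
    report = {}
    for path in sorted(Path(folder).glob("*.txt")):
        found = misspellings(path.read_text(encoding="utf-8"), run=run)
        kept = {w: n for w, n in found.items() if w.lower() not in allowed}
        if kept:
            report[path.name] = kept
    return report
```

Calling check_directory("pages", allowed={"podcast"}) would return a dictionary of file names and the misspelled words found in each, skipping words you have chosen to accept. The run parameter exists only so the functions can be tested without Aspell installed.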
