Three Strikes & You’re A Splog!

Because Sentinel ignores all punctuation and most extremely short words when parsing RSS feeds, it can easily see through most simple text manipulations, such as restructuring sentences and introducing false paragraph breaks. However, Blogwerx took things a step further and built a thesaurus into Sentinel’s algorithm, making it capable of detecting copies that have been rewritten in minor ways and, potentially, even articles that have been “spun” by synonymizing software.
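To make the idea concrete, here is a minimal sketch of that kind of matching. It is not Blogwerx's actual algorithm, which has not been published. The tiny synonym table, the three-letter word cutoff, and the bag-of-words Jaccard score are all assumptions chosen for illustration.

```python
import re

# Hypothetical toy thesaurus: maps each synonym to one canonical word.
# A real system would use a far larger table.
SYNONYMS = {
    "quick": "fast",
    "rapid": "fast",
    "speedy": "fast",
    "automobile": "car",
    "vehicle": "car",
}

# Assumed cutoff for "extremely short" words; the real value is unknown.
MIN_WORD_LENGTH = 3


def normalize(text):
    """Lowercase, strip punctuation, drop short words, canonicalize synonyms."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [SYNONYMS.get(w, w) for w in words if len(w) >= MIN_WORD_LENGTH]


def similarity(a, b):
    """Jaccard overlap of the normalized word sets (ignores word order)."""
    set_a, set_b = set(normalize(a)), set(normalize(b))
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


original = "The quick car sped down the road."
spun = "The rapid automobile... sped down\n\nthe road!"
print(similarity(original, spun))
```

Because the comparison works on sets of normalized words, reordered sentences, extra paragraph breaks, and swapped synonyms all leave the score unchanged, while an exact-match search would miss the copy entirely.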

If this works as planned, it will put Sentinel a generation ahead of other plagiarism searching techniques, most of which require the use of a “dumb” search engine that only detects exact matches.

One drawback of the synonym-checking feature is that it will most likely not be available to users of the free product. Blogwerx’s programmers have been able to add the feature without hurting speed (the latest version of the software can process up to ten million feeds per day, with the potential for many more as the service expands), but the added burden still prevents it from being freely available at this time.

However, if early signs from Blogwerx are any indication, the paid versions of its service will begin at approximately five dollars a month, making it comparable to Feedburner and the most basic versions of Copysentry, the paid version of Copyscape.

One of the more interesting side features of Sentinel is the ability for users to mark infringing blogs as spam blogs (splogs). After three strikes of confirmed plagiarism, the blog is officially listed as a splog and moved into a database that will be publicly available via an API.
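The three-strikes rule can be sketched in a few lines. This is only an illustration of the bookkeeping described above. The class name, method names, and in-memory storage are hypothetical and do not reflect Blogwerx's actual database or API.

```python
from collections import defaultdict

STRIKE_LIMIT = 3  # three confirmed strikes, per the rule described above


class SplogRegistry:
    """Hypothetical in-memory stand-in for the splog database."""

    def __init__(self):
        self._strikes = defaultdict(int)
        self._splogs = set()

    def record_confirmed_strike(self, blog_url):
        """Count one confirmed plagiarism report; return True once listed."""
        self._strikes[blog_url] += 1
        if self._strikes[blog_url] >= STRIKE_LIMIT:
            self._splogs.add(blog_url)
        return blog_url in self._splogs

    def is_splog(self, blog_url):
        return blog_url in self._splogs

    def list_splogs(self):
        """What a public lookup might return: all listed blogs, sorted."""
        return sorted(self._splogs)


registry = SplogRegistry()
for _ in range(3):
    listed = registry.record_confirmed_strike("http://scraper.example.com")
print(listed, registry.list_splogs())
```

Only confirmed reports are counted here, which matches the description: a single user flag marks a blog as suspect, but the listing happens after the third confirmation.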

This could, potentially, be used to create applications that work to prevent scraping or aid search engines in blacklisting useless blogs. It can also make an excellent addition to other splog databases, such as SplogSpot, that work to catch all junk blogs but may not spot outright plagiarized blogs.
