Back in December I wrote an article on how site scrapers were gaming Google’s search algorithm to make advertising money from articles that they scrape from RSS feeds. Just after I wrote this piece, Google got religion about spam sites and revamped their search engine results to penalise content farms. Yet the problem with scrapers remains.
Here’s how it works. Someone who doesn’t have original content picks a few newsworthy topics to follow. Typically we are talking about politics and business. The scraper then sets up a website oriented around those topics and makes sure to check all the boxes that define a normal high-content, multi-author site: Twitter and Facebook accounts linked to the site, an ‘about this site’ page, an ‘about our team’ page, and a terms of service page. The scraper then finds out which high-quality, high-ranking sites publish full RSS feeds, imports the content from those feeds and duplicates it on the scraper site. After the scraper has stolen enough content and optimized the site with the keywords that the search engines deem most relevant to the niche, the scraper submits the site for inclusion in Google News. Once Google News includes the site, validating it as a reputable news site to Google users, the scraper can make money from advertising because it has a guaranteed stream of visitors via Google search and Google News.
Two weeks ago, Yves Smith wrote a post at Naked Capitalism alerting us that evil site scrapers are back! She mentioned Zmarter, one flagrant violator I flagged in December. I have since shaken them off by filing a DMCA notice (a notice of copyright infringement under the Digital Millennium Copyright Act). But apparently, they are still at it, scraping other sites’ content. There are many sites of this ilk out there, but Yves mentioned another scraper site, favstocks.com, that is now getting a lot of traffic from Google News as well. When I looked up favstocks.com, I saw a lot of content from Credit Writedowns on their site. All of the links in the posts had been stripped out of the content in order to prevent ‘link value leakage’ (typical search engine optimization nonsense). There was a lot of content from other leading finance blogs and sites as well: Zacks Research, Mike Konczal, Naked Capitalism, Pragmatic Capitalism, Mike Shedlock and Econbrowser.
I wrote to Google News and posted a note on Google News’ support forum but have received no response. I know of at least two other bloggers who are upset about this. James Hamilton of Econbrowser told me that favstocks was outranking him on Google Search even though his site is well-respected and is linked to by all of the top bloggers and financial news sites. He wrote in response to my note:
FavStocks is a rogue site which routinely reproduces material from www.econbrowser.com despite having been repeatedly instructed that they are doing so without permission. The site FavStocks unquestionably should be banned from Google News.
So, here you have a well-respected PhD Economist, Chairman of the Economics Department at UCSD, a major American university, blogger since 2005, being outranked for the content he actually wrote by a bunch of yahoos stealing his content and re-posting it. Do you see the problem here? This is exactly why Vivek Wadhwa says Google Search Still Needs ‘A Lot More Work’.
Let me give you a feel for how this has played out. Around the time Yves was complaining about favstocks, I went to their site and left a comment in their comment section, which is hosted by Disqus, asking that they remove content from Credit Writedowns. I also contacted them through the contact form on their site. No response. So I submitted a DMCA notice against them to Google AdSense, their main advertiser at the time. That got this gibberish e-mail response:
FavStocks is a registered service provider with the US Copyright Office. The DMCA provides service providers a safe harbor in a case of copyright infringement.
The DMCA notice you sent to Google is improper and they can not do anything to help you. The proper notice should have been sent to our registered designated agent as listed on our site and also listed on the US Copyright Office website. Please see the link below for our DMCA compliant information.
Even though your notice was improper and sent to the wrong person we did however identify and locate the content from the complaint and disabled the infringing content. It is our policy to disable accounts of repeat infringers.
Filing the notice got me banned from their Disqus comment section AND caused them to switch away from AdSense to other advertisers. It did have the desired effect though. My content is no longer scraped. But I have switched to summary feeds because I am sick of this game.
Clearly, if you go to the site you will see that FavStocks’ entire business model is about scraping content. So, they are just trying to cover their bases in order not to get penalised by the search engines. In the end, I see this as a sign of flaws in Google’s business model, which relies far too much on a lack of human intervention. It makes customer service atrocious. That’s why I have yet to receive a response from Google News. That’s why FavStocks is still a Google News provider despite having almost no original content and scraping the majority of its content.
Google’s business model is dependent on scalability, which gives them tremendous operating leverage. That means that they can scale their individual business lines without a large amount of additional cost. If they have high growth, then the additional revenue is supposed to fall to the bottom line. I think Google is starting to reach the point where that operating leverage has dissipated. Their costs are growing exactly because of these kinds of situations. You need human intervention because computers just do not have enough sophistication to make the kind of judgments needed to always discern original content from copied content. I anticipate this new lack of scalability will be a challenge Google will continue to face as it looks to grow its business.
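To make the point about discerning original from copied content concrete, here is a minimal sketch of one common textbook approach: compare overlapping word sequences ("shingles") from two articles and measure their overlap. This is purely illustrative and is not a description of how Google actually works. The function names, the shingle length of 5 and the 0.5 threshold are all my own assumptions.

```python
import re


def shingles(text: str, k: int = 5) -> set:
    """Return the set of overlapping k-word sequences found in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Text too short for a full shingle: treat it as one short shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def looks_copied(original: str, candidate: str,
                 k: int = 5, threshold: float = 0.5) -> bool:
    """Flag candidate as a likely copy when shingle overlap is high.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    return jaccard(shingles(original, k), shingles(candidate, k)) >= threshold


if __name__ == "__main__":
    post = ("Google got religion about spam sites and revamped its "
            "search results to penalise content farms")
    scraped = post + " reposted by a scraper"
    unrelated = ("The Federal Reserve left interest rates unchanged "
                 "at its meeting this week")
    print(looks_copied(post, scraped))
    print(looks_copied(post, unrelated))
```

Note what a check like this cannot do: it tells you that two pages share text, not which one published first or who holds the rights. Settling that takes timestamps, crawl history and, in disputed cases, a person looking at the evidence, which is exactly the kind of judgment that does not scale.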