May 22 2012

Matt Cutts: Here’s How To Expose Your Competitors’ Black Hat SEO Practices

Google’s Matt Cutts has put out a new Webmaster Help video discussing how to alert Google when your competitors are engaging in webspam and black hat SEO techniques. The video was in response to the following user-submitted question: White hat search marketers read and follow Google Guidelines. What should they tell clients whose competitors use black hat techniques (such as using doorway pages) and whom continue to rank as a result of those techniques? “So first and foremost, I would say do a spam report, because if you’re violating Google’s guidelines in terms of cloaking or sneaky JavaScript redirects, buying links, doorway pages, keyword stuffing, all those kinds of things, we do want to know about it,” he says. “So you can do a spam report. That’s private. You can also stop by Google’s Webmaster forum, and that’s more public, but you can do a spam report there. You can sort of say, hey, I saw this content. It seems like it’s ranking higher than it should be ranking. Here’s a real business, and it’s being outranked by this spammer…those kinds of things.” He notes that there are both Google employees and “super users” who keep an eye on the forum and can alert Google about issues. “The other thing that I would say is if you look at the history of which businesses have done well over time, you’ll find the sorts of sites and the sorts of businesses that are built to stand the test of time,” says Cutts. “If someone is using a technique that is a gimmick or something that’s like the SEO fad of the day, that’s a little less likely to really work well a few years from now. So a lot of the times, you’ll see people just chasing after, ‘OK, I’m going to use guest books’, or ‘I’m going to use link wheels’ or whatever. And then they find, ‘Oh, that stopped working as well.’ And sometimes it’s because of broad algorithmic changes like Panda. Sometimes it’s because of specific web spam targeted algorithms.” I’m sure you’ve heard of Penguin.
He references the JC Penney and Overstock.com incidents, in which Google took manual action. For some reason, he didn’t bring up the Google Chrome incident. This is actually a pretty timely video from Cutts, as another big paid linking controversy was uncovered by Josh Davis today (which Cutts acknowledged on Twitter). “So my short answer is go ahead and do a spam report,” Cutts continues. “You can also report it in the forums. But it’s definitely the case that if you’re taking those higher risks, that can come back and bite you. And that can have a material impact.” He’s not joking about that. Overstock blamed Google for “an ugly year” when its revenue plummeted. Even Google’s own Chrome penalty led to some questions about the browser’s market share. Cutts notes that Google is also happy to get feedback at conferences, on Twitter, online, blogs, forums, “if you’re seeing sites that are prospering and are using black hat techniques.” “Now, it’s possible that they have some low-quality links, and there are some links that people aren’t aware of that we see that are actually high quality,” Cutts notes. “But we’re happy to get spam reports. We’re happy to dig into them. And then we’ll try to find either new algorithms to try to rank the things more appropriately in the future. Or we’re certainly willing to take manual action on spam if it’s egregious or if it violates our guidelines. We have a manual web spam team that is willing to respond to those spam reports.”

May 21 2012

Google Penguin, Panda, Matt Cutts & Amit Singhal In Lego Art Form

Aaron Wall at SEOBook commissioned an art project, which features a number of Lego art-style pictures of various Google employees and SEO celebrities, as well as some Google update-specific pieces. We already looked at Google executive chairman Eric Schmidt (who was portrayed again as an evil ice cream man). Here are the artist’s renditions of Matt Cutts and Amit Singhal, the two most recognizable faces behind the Penguin and Panda updates: Matt Cutts graphic by SEOBook.com Amit Singhal graphic by SEOBook.com Cutts and Singhal have both called the Penguin update a success. Here are the Penguin and Panda pics: Penguin Update graphic by SEOBook.com Panda Update graphic by SEOBook.com Even the old school Florida update made an appearance in the project: Florida Update graphic by SEOBook.com Then there’s the “Google Got Caught Pushing Illegal Drugs Update”: Google AdWords Drugs Update graphic by SEOBook.com I don’t think that was Google’s official name, but it refers to when Google had to forfeit $500 million as the result of a settlement with the Justice Department in relation to ads for Canadian pharmacies. The investigation behind this had authorities tracking a fugitive to Mexico, and he had advertised the unlawful sale of drugs using AdWords.

May 17 2012

Matt Cutts On The Hardware & Software That Power Googlebot

Google uploaded a new Webmaster Help video from Matt Cutts, which addresses a question about the hardware/server-side software that powers a typical Googlebot server. “So one of the secrets of Google is that rather than employing these mainframe machines, this heavy iron, big iron kind of stuff, if you were to go into a Google data center and look at an example rack, it would look a lot like a PC,” says Cutts. “So there’s commodity PC parts. It’s the sort of thing where you’d recognize a lot of the stuff from having opened up your own computer, and what’s interesting is rather than have like special Googlebot web crawling servers, we tend to say, OK, build a whole bunch of different servers that can be used interchangeably for things like Googlebot, or web serving, or indexing. And then we have this fleet, this armada of machines, and you can deploy it on different types of tasks and different types of processing.” “So hardware wise, they’re not exactly the same, but they look a lot like regular commodity PCs,” he adds. “And there’s no difference between Googlebot servers versus regular servers at Google. You might have differences in RAM or hard disk, but in general, it’s the same sorts of stuff.” On the software side, Google of course builds everything itself, so as not to have to rely on third parties. Cutts says there’s a running joke at Google along the lines of “we don’t just build the cars ourselves, and we don’t just build the tires ourselves. We actually vulcanize the rubber on the tires ourselves.” “We tend to look at everything all the way down to the metal,” Cutts explains. “I mean, if you think about it, there’s data center efficiency. There’s power efficiency on the motherboards. And so if you can sort of keep an eye on everything all the way down, you can make your stuff a lot more efficient, a lot more powerful.
You’re not wasting things because you use some outside vendor and it’s black box.” A couple of months ago, Google put out a blog post discussing its data center efficiency, indicating that they are getting even more efficient. “In the same way that you might examine your electricity bill and then tweak the thermostat, we constantly track our energy consumption and use that data to make improvements to our infrastructure. As a result, our data centers use 50 percent less energy than the typical data center,” wrote Joe Kava, Senior Director, data center construction and operations at Google. Cutts says Google uses a lot of Linux-based machines and Linux-based servers. “We’ve got a lot of Linux kernel hackers,” he says. “And we tend to have software that we’ve built pretty much from the ground up to do all the different specialized tasks. So even to the point of our web servers. We don’t use Apache. We don’t use IIS. We use something called GWS, which stands for the Google Web Server.” “So by having our own binaries that we’ve built from our own stuff and building that stack all the way up, it really unlocks a lot of efficiency,” he adds. “It makes sure that there’s nothing that you can’t go in and tweak to get performance gains or to fix if you find bugs.” If you’re interested in how Google really works, you should watch this video too: Google says the average search query travels as much as 1,500 miles.

May 14 2012

Should The Google Penguin Update Hit Sites Like WPMU.org?

We recently told you about WPMU.org apparently getting hit by Google’s Penguin update. The site went from 8,580 visits (pretty standard for the site, having looked through the Analytics myself) to 1,527 a week later. It’s been hovering around similar numbers ever since, with a pretty clear dip right around Penguin time. We spoke with James Farmer, Founder and CEO of Incsub, which runs the site. Farmer maintains that WPMU.org engages in no keyword stuffing or link schemes, and has no quality issues. In fact, the site has actually done well throughout Google’s series of Panda updates. Farmer tells WebProNews, “We did great after Panda, it was like that update recognized we were decent folk… you can’t win them all huh?” “Apart from not being able to guess what Google was going to do in April, 3 years ago, we haven’t done anything wrong,” he says. Last week, Farmer received some second-hand info from Google’s Matt Cutts, who reportedly spoke with the Sydney Morning Herald about WPMU.org. According to Farmer, Cutts provided three problem links pointing to the site. These included a site pirating their software and two links from one spam blog using an old version of one of their WordPress themes with a link in the footer. Farmer reported that Cutts “said that we should consider the fact that we were possibly damaged by the removal of credit from links such as these.” It’s pretty interesting that such links, if they were the problem, could have such a tremendous impact. It’s no wonder there have been so many discussions about negative SEO (competitors attacking each other with these kinds of tactics) since Penguin launched. The site has 10,400+ Facebook likes, 15,600+ Twitter followers, 2,537 +1s and 4,276 FeedBurner subscribers, according to Farmer. Apparently that is not enough to outweigh some questionable links from third parties. “How could a bunch of incredibly low quality, spammy, rubbish (I mean a .info site… please!)
footer links have made that much of a difference to a site of our size, content and reputation, unless Google has been absolutely, utterly inept for the last 4 years (and I doubt that that’s the case),” Farmer wrote in his article on the matter. When asked how many links he has out there just from footers for WordPress themes, he tells WebProNews, “Given that we stopped adding links years ago, actually not that many at all.” “However, the challenge is that given that we provided themes to a lot of multisite installs, which have since become overrun with splogs, there’s an enormous amount of links from not that many actual root domains,” he adds. “I’d guesstimate 1-2K, 99% of clearly low quality sites.” We asked if he’s heard from other WordPress theme creators having similar issues. “Actually no, although that doesn’t surprise me that much,” he says. “Not many folk are as open as us, and in this field they probably have good reason to be. WordPress terms are very, very competitive so I wouldn’t be surprised if 9/10 competitors had something to hide!” Like many webmasters, Farmer just doesn’t know what to expect from Google, in terms of whether or not Google will consider the site to be one of the innocent casualties of Penguin. “I have no idea, I would love it if they did. I guess the thing I’m begging for is some sort of qualitative mechanism (NOT the manual webspam web, faster approach) that allows quality operators, like us, to survive and carry on providing Google users exactly the kind of helpful content they need!” Google does have a form users can submit if they think they’ve been wrongfully hit by the Penguin update. Google’s Matt Cutts recently told Danny Sullivan that Google considers the Penguin update a success, despite the large number of complaints from those commenting on blogs and in forums. Of course, the Penguin update, much like the Panda update, should be periodically coming back around, giving sites a chance to make fixes and recover.
That also means, however, that sites will have more chances to get hit. We asked Farmer if he thinks Penguin has helped or hurt search results in general, outside of his site’s issues. “Especially in the WP field they have gone wild,” he emphasizes. “For example our flagship site WPMU DEV – if you go to search for that now a competitor writing something ridiculous about us and copyright appears above our massively popular Facebook page. It even looks like our YouTube channel has been demoted. Crazy stuff.” We’ve certainly seen some other questionable search results following the update, and others have complained aplenty. Do you think the search results have improved since Penguin? Should WPMU have been hit by Penguin?

May 11 2012

Penguin Update Will Come Back (Like Panda), According To Report

Danny Sullivan put out a new article with some fresh quotes from Matt Cutts. From this, we know that he has deemed the Penguin update a success. In terms of false positives, he says it hasn’t had the same impact as the Panda or Florida updates, though Google has seen “a few cases where we might want to investigate more.” Sullivan confirmed what many of us had assumed was the case: Penguin will continue into the future, much like the Panda update. Cutts is even quoted in the article: “It is possible to clean things up…the bottom line is, try to resolve what you can.” The Good News Depending on your outlook, this could either be taken as good or bad news. On the good side of things, it means you can come back. Even if your site was destroyed by Penguin, you still have a shot to get back in Google’s good graces – even without having to submit a reconsideration request. Google’s algorithm, assuming that it does what it is supposed to, will detect that you are no longer in violation of Google’s guidelines, and treat your site accordingly. The Bad News The bad news is that there is always the chance it won’t work like it’s supposed to. As I’m sure you’re aware, there are many, many complaints about the Penguin update already. Here’s an interesting one. Many feel like it’s not exactly done what it is supposed to. Another perhaps not so positive element of the news is that sites will have to remain on their toes, wondering if something they’ve done will trigger future iterations of the Penguin update. Remember when Demand Media’s eHow was not hit by the Panda update when it first launched, but was then later hit by another iteration of it, and had to delete hundreds of thousands of articles, and undergo a huge change in design, and to some extent, business model? But on the other hand, eHow content is the better for it, despite a plethora of angry writers who no longer get to contribute content.
There’s always the chance that some sites have managed to escape Penguin so far, but just haven’t been hit yet. Of course, Danny makes a great point in that “for any site that ‘lost’ in the rankings, someone gained.” It will be interesting to see how often the Penguin update gets a refresh. There were two Panda refreshes in April alone (bookending the Penguin update). It might be even more interesting to see how many complaints there are when the refreshes come back, and how often they’re noticed. Even the last Panda update went unconfirmed for about a week. Either way, be prepared for Penguin news to come peppered throughout the years to come. Just like Panda. We’ll certainly continue to cover both.

May 11 2012

Google Penguin Update Recovery: Matt Cutts Says Watch These 2 Videos

Danny Sullivan at Search Engine Land put up a great Penguin article with some new quotes from Matt Cutts. We’ve referenced some of the points made in other articles, but one important thing to note from the whole thing is that Cutts pointed to two very specific videos that people should watch if they want to clean up their sites and recover from the Penguin update. We often share Google’s Webmaster Help videos, which feature Cutts giving advice based on user-submitted questions (or sometimes his own questions). I’m sure we’ve run these in the past, but according to Sullivan, Cutts pointed to these: Guess what: in both videos, he talks about Google’s quality guidelines. That is your recovery manual, as far as Google is concerned. Here are some articles we’ve posted recently specifically on different aspects of the guidelines:
Google Penguin Update: Don’t Forget About Duplicate Content
Google Penguin Update: A Lesson In Cloaking
Google Penguin Update Recovery: Hidden Text And Links
Recover From Google Penguin Update: Get Better At Links
Google Penguin Update: 12 Tips Directly From Google
Google Penguin Update Recovery: Getting Better At Keywords
Google Penguin Update: Seriously, Avoid Doorway Pages
Google Penguin Update And Affiliate Programs
So, in your recovery plan, take all of this into account, and these tips that Cutts lent his seal of approval to. And when all else fails, according to Cutts, you might want to just start over with a new site.

May 8 2012

Google’s Matt Cutts Talks Search Result Popularity Vs. Accuracy

Google’s head of webspam, Matt Cutts, posted a new Webmaster Help video today, discussing accuracy vs. popularity in search results. This video was his response to a user-submitted question: Does Google feel a responsibility to the public to return results that are truly based on a page’s quality (assuming quality is determined by the accuracy of info on a page) as opposed to popularity? “Popularity is different than accuracy,” says Cutts. “And in fact, PageRank is different than popularity. I did a video that talked about porn a while ago that basically said a lot of people visit porn sites, but very few people link to porn sites. So the Iowa Real Estate Board is more likely to have higher PageRank than a lot of porn sites, just because people link to the official governmental sites, even if they sometimes visit the porn sites a little bit more often.” Here’s that video, by the way: “So I do think that reputation is different than popularity, and PageRank encodes that reputation pretty well,” Cutts continues. “At the same time, I go to bed at night sleeping relatively well, knowing that I’m trying to change the world. And I think a lot of people at Google feel that way. They’re like trying to find the best way to return the best content. So we feel good about that. And at the same time, we do feel the weight, the responsibility of what we’re doing, because are we coming up with the best signals? Are we finding the best ways to slice and dice data and measure the quality of pages or the quality of sites? And so people brainstorm a lot. And I think that they do feel the weight, the responsibility of being a leading search engine and trying to find the very best quality content.” “Even somebody who has done a medical search, the difference between stage four brain cancer versus the query grade four brain cancer, it turns out that very specific medical terminology can determine which kinds of results you get. 
And if you just happen not to know the right word, then you might not get the best results. And so we try to think about how can we help the user out if they don’t necessarily know the specific vocabulary?” Interesting example. We’ve pointed to the example of “level 4 brain cancer” a handful of times in our Panda and pre-Panda coverage of content farms’ effects on search results. The top result for that query, by the way, is better than it once was, though the eHow result (written by a freelance writer claiming specialities in military employment, mental health and gardens – who has also written a fair amount about toilets), which was ranking before, is still number two. It’s worth noting that Google’s most recent list of algorithm updates includes some tweaks to surface more authoritative results. “So I would say that at least in search quality in the knowledge group, we do feel a lot of responsibility,” says Cutts. “We do feel like we know a lot of people around the world are counting on Google to return good quality search results. And we do the best we can, or at least we try really hard to think of the best ways we can think of to return high-quality search results.” “That’s part of what makes it a fun job,” he says. “But it definitely is one where you understand that you are impacting people’s lives. And so you do try to make sure that you act appropriately. And you do try to make sure that you can find the best content and the best quality stuff that you can. But it’s a really fun job, and it’s a really rewarding job for just that same reason.” Cutts then gets into some points that the antitrust lawyers will surely enjoy. “What makes me feel better is that there are a lot of different search engines that have different philosophies,” he says. “And so if Google isn’t doing a good job, I do think that Bing, or Blekko, or DuckDuckGo, or other search engines in the space will explore and find other ways to return things.
And not just other general search engines, but people who want to do travel might go specifically to other websites. So I think that there’s a lot of opportunities on the web.” “I think Google has done well because we return relatively good search results. But we understand that if we don’t do a good job at that, our users will complain,” he says. “They’ll go other places. And so we don’t just try to return good search results because it’s good for business. It’s also because we’re Google searchers as well. And we want to return the best search results so that they work for everybody and for us included.” Well, users do complain all the time, and certainly some of them talk about using other services, but the monthly search market reports don’t appear to suggest that Google has run too many people off, so they must be doing something right.
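To make the distinction between PageRank and popularity concrete, here is a minimal sketch of the textbook PageRank iteration on a tiny, made-up link graph. It is not Google's production ranking code; the page names, the visit counts, and the `pagerank` helper are all hypothetical and exist only to show that a heavily visited page nobody links to can still end up with a lower score than a lightly visited page that attracts links.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank over a dict of page -> outlinks."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: three sites link to the real estate board,
# nobody links to the heavily visited site.
links = {
    "blog_a": ["real_estate_board"],
    "blog_b": ["real_estate_board"],
    "news_site": ["real_estate_board", "blog_a"],
    "real_estate_board": [],
    "popular_site": [],
}

# Hypothetical monthly visits; PageRank never looks at these numbers.
visits = {"popular_site": 100_000, "real_estate_board": 500}

ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{page}: {ranks[page]:.3f}")
```

In this toy graph the visit counts play no role in the score, which mirrors the point Cutts makes: links, rather than traffic, are what the calculation sees.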

May 7 2012

How Google Handles Font Replacement

Google’s Matt Cutts put up a new Webmaster Help video, discussing how Google handles font replacement. The video was created in response to a user-submitted question: How does Google view font replacement (ie. Cufan, SIFR, FLIR)? Are some methods better than others, are all good, all bad? “So we have mentioned some specific stuff like SIFR that we’re OK with. But again, think about this,” says Cutts. “You want to basically show the same content to users that you do to Googlebot. And so, as much as possible, you want to show the same actual content. So we’ve said that having fonts using methods like SIFR is OK, but ideally, you might concentrate on some of the newer stuff that has been happening in that space.” “So if you search for web fonts, I think Google, for example, has a web font directory of over 100 different web fonts,” Cutts says. “So now we’re starting to get the point where, if you use one of these types of commonly available fonts, you don’t even have to do font replacement using the traditional techniques. It’s actual letters that are selectable and copy and pastable in your browser. So it’s not the case that we tend to see a lot of deception and a lot of abuse.” “If you were to have a logo here and then underneath the logo have text that’s hidden that says buy cheap Viagra, debt consolidation, mortgages online, that sort of stuff, then that could be viewed as deceptive,” he adds. In fact, that’s exactly the kind of thing that can get you in trouble with Google’s Penguin update, even if Google doesn’t get you with a manual penalty. To avoid this, here’s more advice from Google, regarding hidden text. “But if the text that’s in the font replacement technique is the same as what is in the logo, then you should be in pretty good shape,” Cutts wraps up the video. “However, I would encourage people to check out some of this newer stuff, because the newer stuff doesn’t actually have to do some of these techniques.
Rather, it’s the actual letters, and it’s just using different ways of marking that up, so that the browser, it looks really good. And yet, at the same time, the real text is there. And so search engines are able to index it and process it, just like they would normal text.”
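As a rough illustration of the “same content to users and to Googlebot” idea, the sketch below compares the text found in two HTML snippets: the markup a visitor gets and the markup a crawler gets. This is only a self-check a site owner might run, not a description of how Google detects hidden text; the function names and sample markup are hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, ignoring the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def extract_text(html):
    """Return the markup's text, lowercased, with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split()).lower()


def same_text(user_html, crawler_html):
    """True when both versions of the page carry the same text."""
    return extract_text(user_html) == extract_text(crawler_html)


# Hypothetical markup: a font-replaced heading versus the plain heading.
replaced = '<h1 class="sifr-replaced">Acme   Widgets</h1>'
plain = "<h1>Acme Widgets</h1>"

# Hypothetical markup with extra text tucked into a hidden element.
stuffed = plain + '<div style="display:none">buy cheap pills</div>'

print(same_text(replaced, plain))
print(same_text(replaced, stuffed))
```

The comparison is deliberately strict: any text present in one version but not the other, including text in hidden elements, makes the two versions differ, which is the situation Cutts describes as potentially deceptive.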

May 4 2012

Google Algorithm Changes For April: Big List Released

As expected, Google has finally released its big list of algorithm changes for the month of April. It’s been an interesting month, to say the least, with not only the Penguin update, but a couple of Panda updates sprinkled in. There’s not a whole lot about either of those on this list, however, which is really a testament to just how many things Google is always doing to change its algorithm – signals (some of them, at least) which could help or hurt you in other ways besides the hugely publicized updates. We’ll certainly be digging a bit more into some of these in forthcoming articles. At a quick glance, I noticed a few more freshness-related tweaks. Google has also expanded its index base by 15%, which is interesting. As far as Penguin goes, Google does mention: “Keyword stuffing classifier improvement. [project codename “Spam”] We have classifiers designed to detect when a website is keyword stuffing. This change made the keyword stuffing classifier better.” Keyword stuffing is against Google’s quality guidelines, and was one of the specific things Matt Cutts mentioned in his announcement of the update. Interestingly, unlike previous lists, there is no mention of Panda whatsoever on this list, though there were 2 known Panda data refreshes during April. Here’s the list in its entirety:
Categorize paginated documents. [launch codename “Xirtam3”, project codename “CategorizePaginatedDocuments”] Sometimes, search results can be dominated by documents from a paginated series. This change helps surface more diverse results in such cases.
More language-relevant navigational results. [launch codename “Raquel”] For navigational searches when the user types in a web address, such as [bol.com], we generally try to rank that web address at the top. However, this isn’t always the best answer. For example, bol.com is a Dutch page, but many users are actually searching in Portuguese and are looking for the Brazilian email service, http://www.bol.uol.com.br/. This change takes into account language to help return the most relevant navigational results.
Country identification for webpages. [launch codename “sudoku”] Location is an important signal we use to surface content more relevant to a particular country. For a while we’ve had systems designed to detect when a website, subdomain, or directory is relevant to a set of countries. This change extends the granularity of those systems to the page level for sites that host user generated content, meaning that some pages on a particular site can be considered relevant to France, while others might be considered relevant to Spain.
Anchors bug fix. [launch codename “Organochloride”, project codename “Anchors”] This change fixed a bug related to our handling of anchors.
More domain diversity. [launch codename “Horde”, project codename “Domain Crowding”] Sometimes search returns too many results from the same domain. This change helps surface content from a more diverse set of domains.
More local sites from organizations. [project codename “ImpOrgMap2”] This change makes it more likely you’ll find an organization website from your country (e.g. mexico.cnn.com for Mexico rather than cnn.com).
Improvements to local navigational searches. [launch codename “onebar-l”] For searches that include location terms, e.g. [dunston mint seattle] or [Vaso Azzurro Restaurant 94043], we are more likely to rank the local navigational homepages in the top position, even in cases where the navigational page does not mention the location.
Improvements to how search terms are scored in ranking. [launch codename “Bi02sw41”] One of the most fundamental signals used in search is whether and how your search terms appear on the pages you’re searching. This change improves the way those terms are scored.
Disable salience in snippets. [launch codename “DSS”, project codename “Snippets”] This change updates our system for generating snippets to keep it consistent with other infrastructure improvements. It also simplifies and increases consistency in the snippet generation process.
More text from the beginning of the page in snippets. [launch codename “solar”, project codename “Snippets”] This change makes it more likely we’ll show text from the beginning of a page in snippets when that text is particularly relevant.
Smoother ranking changes for fresh results. [launch codename “sep”, project codename “Freshness”] We want to help you find the freshest results, particularly for searches with important new web content, such as breaking news topics. We try to promote content that appears to be fresh. This change applies a more granular classifier, leading to more nuanced changes in ranking based on freshness.
Improvement in a freshness signal. [launch codename “citron”, project codename “Freshness”] This change is a minor improvement to one of the freshness signals which helps to better identify fresh documents.
No freshness boost for low-quality content. [launch codename “NoRot”, project codename “Freshness”] We have modified a classifier we use to promote fresh content to exclude fresh content identified as particularly low-quality.
Tweak to trigger behavior for Instant Previews. This change narrows the trigger area for Instant Previews so that you won’t see a preview until you hover and pause over the icon to the right of each search result. In the past the feature would trigger if you moused into a larger button area.
Sunrise and sunset search feature internationalization. [project codename “sunrise-i18n”] We’ve internationalized the sunrise and sunset search feature to 33 new languages, so now you can more easily plan an evening jog before dusk or set your alarm clock to watch the sunrise with a friend.
Improvements to currency conversion search feature in Turkish. [launch codename “kur”, project codename “kur”] We launched improvements to the currency conversion search feature in Turkish. Try searching for [dolar kuru], [euro ne kadar], or [avro kaç para].
Improvements to news clustering for Serbian. [launch codename “serbian-5”] For news results, we generally try to cluster articles about the same story into groups. This change improves clustering in Serbian by better grouping articles written in Cyrillic and Latin. We also improved our use of “stemming”, a technique that relies on the “stem” or root of a word.
Better query interpretation. This launch helps us better interpret the likely intention of your search query as suggested by your last few searches.
News universal results serving improvements. [launch codename “inhale”] This change streamlines the serving of news results on Google by shifting to a more unified system architecture.
UI improvements for breaking news topics. [launch codename “Smoothie”, project codename “Smoothie”] We’ve improved the user interface for news results when you’re searching for a breaking news topic. You’ll often see a large image thumbnail alongside two fresh news results.
More comprehensive predictions for local queries. [project codename “Autocomplete”] This change improves the comprehensiveness of autocomplete predictions by expanding coverage for long-tail U.S. local search queries such as addresses or small businesses.
Improvements to triggering of public data search feature. [launch codename “Plunge_Local”, project codename “DIVE”] This launch improves triggering for the public data search feature, broadening the range of queries that will return helpful population and unemployment data.
Adding Japanese and Korean to error page classifier. [launch codename “maniac4jars”, project codename “Soft404”] We have signals designed to detect crypto 404 pages (also known as “soft 404s”), pages that return valid text to a browser, but the text only contains error messages, such as “Page not found.” It’s rare that a user will be looking for such a page, so it’s important we be able to detect them. This change extends a particular classifier to Japanese and Korean.
More efficient generation of alternative titles. [launch codename “HalfMarathon”] We use a variety of signals to generate titles in search results. This change makes the process more efficient, saving tremendous CPU resources without degrading quality.
More concise and/or informative titles. [launch codename “kebmo”] We look at a number of factors when deciding what to show for the title of a search result. This change means you’ll find more informative titles and/or more concise titles with the same information.
Fewer bad spell corrections internationally. [launch codename “Potage”, project codename “Spelling”] When you search for [mango tea], we don’t want to show spelling predictions like “Did you mean ‘mint tea’?” We have algorithms designed to prevent these “bad spell corrections” and this change internationalizes one of those algorithms.
More spelling corrections globally and in more languages. [launch codename “pita”, project codename “Autocomplete”] Sometimes autocomplete will correct your spelling before you’ve finished typing. We’ve been offering advanced spelling corrections in English, and recently we extended the comprehensiveness of this feature to cover more than 60 languages.
More spell corrections for long queries. [launch codename “caterpillar_new”, project codename “Spelling”] We rolled out a change making it more likely that your query will get a spell correction even if it’s longer than ten terms. You can watch uncut footage of when we decided to launch this from our past blog post.
More comprehensive triggering of “showing results for” goes international. [launch codename “ifprdym”, project codename “Spelling”] In some cases when you’ve misspelled a search, say [pnumatic], the results you find will actually be results for the corrected query, “pneumatic.” In the past, we haven’t always provided the explicit user interface to say, “Showing results for pneumatic” and the option to “Search instead for pnumatic.” We recently started showing the explicit “Showing results for” interface more often in these cases in English, and now we’re expanding that to new languages.
“Did you mean” suppression goes international. [launch codename “idymsup”, project codename “Spelling”] Sometimes the “Did you mean?” spelling feature predicts spelling corrections that are accurate, but wouldn’t actually be helpful if clicked. For example, the results for the predicted correction of your search may be nearly identical to the results for your original search. In these cases, inviting you to refine your search isn’t helpful. This change first checks a spell prediction to see if it’s useful before presenting it to the user. This algorithm was already rolled out in English, but now we’ve expanded to new languages.
Spelling model refresh and quality improvements. We’ve refreshed spelling models and launched quality improvements in 27 languages.
Fewer autocomplete predictions leading to low-quality results. [launch codename “Queens5”, project codename “Autocomplete”] We’ve rolled out a change designed to show fewer autocomplete predictions leading to low-quality results.
Improvements to SafeSearch for videos and images. [project codename “SafeSearch”] We’ve made improvements to our SafeSearch signals in videos and images mode, making it less likely you’ll see adult content when you aren’t looking for it.
Improved SafeSearch models.
[launch codename “Squeezie”, project codename “SafeSearch”] This change improves our classifier used to categorize pages for SafeSearch in 40+ languages. Improvements to SafeSearch signals in Russian.  [project codename “SafeSearch”] This change makes it less likely that you’ll see adult content in Russian when you aren’t looking for it. Increase base index size by 15%.  [project codename “Indexing”] The base search index is our main index for serving search results and every query that comes into Google is matched against this index. This change increases the number of documents served by that index by 15%. *Note: We’re constantly tuning the size of our different indexes and changes may not always appear in these blog posts. New index tier.  [launch codename “cantina”, project codename “Indexing”] We keep our index in “tiers” where different documents are indexed at different rates depending on how relevant they are likely to be to users. This month we introduced an additional indexing tier to support continued comprehensiveness in search results. Backend improvements in serving.  [launch codename “Hedges”, project codename “Benson”]   We’ve rolled out some improvements to our serving systems making them less computationally expensive and massively simplifying code. “Sub-sitelinks” in expanded sitelinks.  [launch codename “thanksgiving”] This improvement digs deeper  into  megasitelinks  by showing sub-sitelinks instead of the normal snippet. Better ranking of expanded sitelinks.  [project codename “Megasitelinks”] This change improves the ranking of megasitelinks by providing a minimum score for the sitelink based on a score for the same URL used in general ranking. Sitelinks data refresh.  [launch codename “Saralee-76”] Sitelinks (the links that appear beneath some search results and link deeper into the site) are generated in part by an offline process that analyzes site structure and other data to determine the most relevant links to show users. 
We’ve recently updated the data through our offline process. These updates happen frequently (on the order of weeks). Less snippet duplication in expanded sitelinks.  [project codename “Megasitelinks”] We’ve adopted a new technique to reduce duplication in the snippets of expanded sitelinks. Movie showtimes search feature for mobile in China, Korea and Japan.  We’ve expanded our movie showtimes feature for mobile to China, Korea and Japan. No freshness boost for low quality sites.  [launch codename “NoRot”, project codename “Freshness”] We’ve modified a classifier we use to promote fresh content to exclude sites identified as particularly low-quality. MLB search feature.  [launch codename “BallFour”, project codename “Live Results”] As the MLB season began, we rolled out a new MLB search feature. Try searching for [ sf giants score ] or [ mlb scores ]. Spanish football (La Liga) search feature.  This feature provides scores and information about teams playing in La Liga. Try searching for [ barcelona fc ] or [ la liga ]. Formula 1 racing search feature.  [launch codename “CheckeredFlag”] This month we introduced a new search feature to help you find Formula 1 leaderboards and results. Try searching [ formula 1 ] or [ mark webber ]. Tweaks to NHL search feature.  We’ve improved the NHL search feature so it’s more likely to appear when relevant. Try searching for [ nhl scores ] or [ capitals score ]. Keyword stuffing classifier improvement.  [project codename “Spam”] We have classifiers designed to detect when a website is  keyword stuffing . This change made the keyword stuffing classifier better. More authoritative results.  We’ve tweaked a signal we use to surface more authoritative content. Better HTML5 resource caching for mobile.  We’ve improved caching of different components of the search results page, dramatically reducing latency in a number of cases. More to come…

May 3 2012

Matt Cutts: Excessive Blog Updates To Twitter Not Doorways, But Possibly Annoying

Google’s head of webspam, Matt Cutts, took on an interesting question from a user in a new Webmaster Help video:

“Some websites use their Twitter account as an RSS like service for every article they post. Is that ok or would it be considered a doorway?”

I know he shoots these videos in advance, but the timing of the video’s release is interesting, considering that it’s asking about doorways. Google’s Penguin Update was unleashed on the web last week, seeking out violators of Google’s quality guidelines, and dealing with them algorithmically. One of Google’s guidelines is:

“Avoid ‘doorway’ pages created just for search engines, or other ‘cookie cutter’ approaches such as affiliate programs with little or no original content.”

There is no shortage of questions from webmasters wondering exactly what Google is going after with the update, and more will likely come with future iterations, not unlike the Panda update. For more on some things to avoid, browse our Penguin coverage.

Using your Twitter feed like an RSS feed, however, should not put you in harm’s way.

“Well, I wouldn’t consider it a doorway because a doorway is typically when you make a whole bunch of different pages, each page is targeting one specific phrase,” he says. “And then when you land there, usually it’s like, click here to enter. And then it takes you somewhere, and monetizes you, or something along those lines. So I wouldn’t consider it a doorway.”

Cutts does suggest that such a practice can be annoying to users, however.

“Could it be annoying?” he continues. “Yes, it could be annoying, especially if you’re writing articles like every three minutes or if those articles are auto-generated somehow. But for example, in FeedBurner, I use a particular service where, when I do a post on my blog, it will automatically tweet to my Twitter stream, and it will say New Blog Post, colon, and whatever the title of the blog post is. And that’s perfectly fine.”

“That’s a good way to alert your users that something’s going on,” he adds. “So there’s nothing wrong with saying, when you do a blog post, automatically do a tweet. It might be really annoying if you have so many blog posts, that you get so many tweets, that people start to ignore you or unfollow you. But it wouldn’t be considered a doorway.”

OK, so you’re safe from having to worry about that being considered a doorway in Google’s eyes. I’m not sure I entirely agree with Cutts’ point about it being annoying, however. Yes, I suppose it can be annoying. That really depends on the user, and how they use Twitter. I’m guessing that it is, in fact, annoying to Cutts.

Just as some sites treat their Twitter feed like an RSS feed, however, there are plenty of Twitter users who use it as such. A lot of people don’t use RSS, and would simply prefer to get their news via Twitter feed. Some users in this category (I consider myself among them) follow sites on Twitter because they want to follow the content they’re putting out.

It’s really about user preference. Not everybody uses Twitter the same way, so you have to determine how you want to approach it. Cutts is definitely right in that some may unfollow you, but there could be just as many who will follow you because they want the latest.

Either way, it doesn’t appear to be an issue as far as Google rankings are concerned.