Jun 25 2012

These Are The 10 Videos Webmasters Need To Watch, According To Google

Google has updated its Webmaster Academy site to feature more videos. Some of the videos on the site are old, but some are brand new. None of them are incredibly long, so if you have a few minutes to spare, I recommend watching all of them. Some webmasters are pretty used to videos from Google’s Matt Cutts, and he does appear in some of these, but there are also some other faces in the mix. Of course the site itself has complementary information to go along with the videos, but watching the videos themselves is a good start. Here they are:

1. Google’s Matt Cutts explains how search works.
2. Jen Lee of Google’s search quality team explains how to find your site on Google.
3. Cutts talks about snippets.
4. Alexi Douvas from Google’s search quality team talks about creating content that performs well in Google search results. It’s worth noting that this one was uploaded just today (post Panda and Penguin).
5. Michael Wyszomierski from Google’s search quality team talks about webspam content violations.
6. Betty Huang from Google’s search quality team talks about how malicious parties can spam your site.
7. A hairless Cutts (for more on that story, see here) discusses how a site that focuses on video or images can improve its rankings.
8. Lee talks about using sitemaps to help Google find content hosted on your site.
9. An introduction to Google+.
10. Cutts and colleague Othar Hansson discuss authorship markup.

Jun 25 2012

Here’s What Google’s Matt Cutts Says About Affiliate Links And Nofollow

Google’s Matt Cutts participated in a keynote discussion at SMX Advanced earlier this month. Among various other topics, Cutts talked briefly about affiliate links with moderator Danny Sullivan. SMX just uploaded the relevant clip of the discussion to its YouTube channel today, and to reiterate the point Cutts made, fellow Googler John Mueller posted the video to Google+, writing, "Regarding affiliate links and 'nofollow' – here's what Matt had to say:"

"We handle the vast majority of affiliate stuff correctly because if it is a large enough affiliate network, we know about it and we handle it on our side. Even though we handle, I believe, the vast majority of affiliate links appropriately, if you are at all worried about it, I would go ahead and just add the nofollow, because you might be earning money from that."

In Google’s quality guidelines (the basis for the Penguin update), affiliate programs come up more than once. “Avoid ‘doorway’ pages created just for search engines, or other ‘cookie cutter’ approaches such as affiliate programs with little or no original content,” Google says. “If your site participates in an affiliate program, make sure that your site adds value. Provide unique and relevant content that gives users a reason to visit your site first.”

Google has a page about what it means by “little or no original content,” which talks about “thin affiliate sites.” There, Google says, “These sites collect pay-per-click (PPC) revenue by sending visitors to the sites of affiliate programs, while providing little or no value-added content or service to the user. These sites usually have no original content and may be cookie-cutter sites or templates with no unique content.”
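If you want to follow Cutts’ “just add the nofollow” advice across a lot of pages, here’s a minimal sketch of one way to do it. This is my own illustration, not anything Google or SMX provides: the affiliate domain is hypothetical, and the use of Python with BeautifulSoup is simply an assumption for the example.

```python
# A rough sketch of applying "just add the nofollow" in bulk.
# The affiliate domain below is hypothetical; swap in whatever networks you use.
from bs4 import BeautifulSoup

AFFILIATE_DOMAINS = ("affiliate-network.example.com",)  # hypothetical network

def nofollow_affiliate_links(html):
    """Return the HTML with rel="nofollow" added to links pointing at affiliate domains."""
    soup = BeautifulSoup(html, "html.parser")
    for a in soup.find_all("a", href=True):
        if any(domain in a["href"] for domain in AFFILIATE_DOMAINS):
            a["rel"] = "nofollow"
    return str(soup)

page = '<p><a href="https://affiliate-network.example.com/offer?id=1">Buy now</a></p>'
print(nofollow_affiliate_links(page))  # the affiliate link comes back with rel="nofollow"
```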

Jun 20 2012

Matt Cutts: Google’s Updates Are Car Parts, Data Refreshes Are Gas

Google frequently updates its algorithm, and sometimes these updates have huge effects on numerous sites. Panda and Penguin are two of the most well-known these days. Google also launches regular data refreshes for these updates. While even these data refreshes are enough to keep webmasters on their toes, they are much smaller than the updates themselves.

Google’s Matt Cutts has talked about the difference between an algorithm update and a data refresh in the past. He put out a blog post all the way back in 2006 on the topic. Given that this was years before Panda and Penguin, it seems worth highlighting now, as businesses continue to struggle with these updates (tip of the hat to Search Engine Journal for linking to this post). Here are the straightforward definitions Cutts gave:

Algorithm update: Typically yields changes in the search results on the larger end of the spectrum. Algorithms can change at any time, but noticeable changes tend to be less frequent.

Data refresh: When data is refreshed within an existing algorithm. Changes are typically toward the less-impactful end of the spectrum, and are often so small that people don’t even notice.

In that post, Cutts also pointed to a video of himself talking about the differences. Algorithm updates involve specific signals being tweaked. For instance, PageRank could matter more, or less, Cutts explains in the video. With a data refresh, the input to that algorithm is being changed; the data that the algorithm works on is being changed. He uses a car metaphor, saying that an algorithm update is like changing a part in the car, such as the engine. A data refresh, he says, is more like changing the gas. Data refreshes happen all the time, he says. PageRank, for example, gets refreshed constantly.

In the end, I’m not sure how much any of this matters to the average webmaster. If your site was hit by an update, or by a data refresh, you probably don’t care what the technical name for it is, as long as you can identify the update it’s based on, and make the necessary adjustments to gain back your Google traffic.
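Still, if the car metaphor helps, here’s a toy way to picture it in code. This is purely illustrative: the signals, weights, and documents are made up, and it bears no resemblance to Google’s actual system. The scoring function plays the role of the “car part,” and the documents it scores are the “gas.”

```python
# Purely illustrative: the scoring function is the "car part" and the
# documents it scores are the "gas". Signals and weights are invented.

def score_v1(doc):
    # Hypothetical original algorithm: links weighted heavily.
    return 2.0 * doc["links"] + 1.0 * doc["words"] / 100

def score_v2(doc):
    # "Algorithm update": the weights on the signals themselves change.
    return 0.5 * doc["links"] + 10.0 * doc["words"] / 100

# "Data refresh": same algorithm, fresher input data.
docs_january = [
    {"url": "a.example", "links": 10, "words": 500},
    {"url": "b.example", "links": 4, "words": 300},
]
docs_february = [
    {"url": "a.example", "links": 12, "words": 500},
    {"url": "b.example", "links": 30, "words": 300},  # b.example picked up links
]

def top(docs, scorer):
    return max(docs, key=scorer)["url"]

print(top(docs_january, score_v1))   # a.example: old data, old algorithm
print(top(docs_february, score_v1))  # b.example: a data refresh alone shifted the ranking
print(top(docs_february, score_v2))  # a.example: an algorithm update changed the scoring itself
```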

Jun 13 2012

Matt Cutts: Here’s What You Should Read To Learn About Search Engines

Google’s Matt Cutts posted an interesting video today, responding to a user-submitted question: “What resources (textbooks, online PDFs etc) would you recommend to people interested in learning more about LSI, search engine algorithms, etc?”

Cutts first suggests checking out the original PageRank papers. “So there’s a whole bunch of different stuff about the anatomy of a large-scale hypertext search engine and then also a bunch of papers about PageRank,” he says. Here’s The Anatomy of a Large-Scale Hypertextual Web Search Engine by Google co-founders Larry Page and Sergey Brin, and here’s “The PageRank Citation Ranking: Bringing Order to the Web” (PDF).

Cutts also recommends some textbooks. “One is Modern Information Retrieval,” he says. “That’s got a lot of good stuff about the scoring and the science and thinking about that. And then there’s also one called Managing Gigabytes. I think Ian Witten wrote that one. And that one is just a little bit more about the logistics and being able to horse around that much data and thinking about some of the machine’s issues and how does a large scale engine work.” Here are some links:

Modern Information Retrieval
Managing Gigabytes

“So those three together, and then of course, you can always do searches,” says Cutts. “Google Research actually has a ton of different papers that we’ve published. So you might want to look into that a little bit as well. But basically PageRank, the early Google papers, can give you an idea of how to write a very simple search engine that can scale to 100 million documents or so, Managing Gigabytes, and Modern Information Retrieval, and that will give you a pretty good view of the sort of different parts of the space.”

Google Research’s site also lists all the areas of focus it has papers on.
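For a taste of what those PageRank papers describe, here’s a minimal power-iteration sketch over a tiny, made-up link graph. It’s just the textbook idea from the Page and Brin papers, not Google’s implementation.

```python
# Toy PageRank via power iteration over a tiny, invented link graph.

def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly across all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank

graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],  # nothing links to "d", so it ends up with the lowest score
}
for page, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```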

Jun 12 2012

Google Talks Showing Multiple Results From The Same Site

Google’s head of webspam, Matt Cutts, put out a new Webmaster Help video, responding to the user-submitted question: Under which circumstances will Google decide to display multiple results from the same website?

“The answer has changed over the years,” he says. “But the high level answer is, when we think it’s useful and it doesn’t hurt diversity too much.”

Cutts talks about a strategy Google used for years, called host crowding, where Google would group results from the same site together, but says people would get around this, and game the system by using different subdomains. He also talks about some other limitations of host crowding.

Discussing how things are these days, Cutts says, “You want to show as many results as you think is useful, and that’s the tricky bit. What the user is looking for can vary depending on what they’re searching for. For example, if they type in something like HP or IBM, probably a lot of pages or a lot of results from HP is a good answer. So several people have noted that it’s possible to get more than two, more than four, lots of results from Hewlett Packard if you search for HP. But that’s OK. The user has indicated that’s their interest by doing that query.”

He continues, “But in general, what we try to balance is this trade-off between a good diversity of results, because you don’t know exactly what the user was looking for, so you want to give them a little bit of a sampling to say, ‘OK, here’s a bunch of different possible interpretations. Here’s what you might be looking for.’ And then we also want to absolutely give the results that we think match the query well, and sometimes that can be from multiple pages within the same site.”

“So there’s always a tension,” says Cutts. “There’s always a trade-off in trying to figure out what is the best set of search results to return. There’s no objectively true or perfect way to do it. We’ve varied our scoring. We’ve varied our user interfaces. And if there’s one thing you can count on, it will be that Google will continue to test out ideas. Google will continue to evolve how often we think it’s appropriate to show how many results from how many sites in the search results.”

Google, as you may know, makes changes to its algorithm every day. Each month, Google puts out a big list of recent changes. Here are the changes Google made in May. Those are just the actual changes. Google also runs 20,000 search experiments a year.
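To make the host crowding idea a little more concrete, here’s a toy sketch of capping how many results one host can contribute to a page. The cap, the URLs, and the code are all invented for illustration; this is nothing like Google’s actual system. Note how a different subdomain slips past the cap, which is exactly the loophole Cutts says people used to game host crowding.

```python
# Toy "host crowding": keep results in ranked order, but allow at most
# max_per_host results from any single host. Purely illustrative.
from urllib.parse import urlparse

def crowd_hosts(ranked_urls, max_per_host=2):
    counts = {}
    kept = []
    for url in ranked_urls:
        host = urlparse(url).netloc
        if counts.get(host, 0) < max_per_host:
            kept.append(url)
            counts[host] = counts.get(host, 0) + 1
    return kept

results = [
    "https://www.hp.com/products",
    "https://www.hp.com/support",
    "https://www.hp.com/drivers",      # third www.hp.com result gets crowded out
    "https://example.com/hp-review",
    "https://shop.hp.com/store",       # a different subdomain slips past the cap
]
print(crowd_hosts(results))
```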

Jun 1 2012

Matt Cutts Addresses Duplicate Content Issue In New Video

This week, Google posted a new Webmaster Help video featuring Matt Cutts talking about a potential duplicate content issue. This time, he even broke out the whiteboard to illustrate his points. Specifically, Cutts addressed the user-submitted question: Many sites have a press release section, or a news section that re-posts relevant articles. Since it’s all duplicate content, would they be better off removing these sections even with plenty of other unique content?

“The answer is probably yes, but let me give you a little bit of color about the reasoning for that,” Cutts says in the video. “So a lot of the times at Google, we’re thinking about a continuum of content, and the quality of that content, and what defines the value add for a user. So let’s draw a little bit of an axis here and think a little bit about what’s the difference between high quality guys versus low quality guys? Take somebody like The New York Times. Right? They write their own original content. They think very hard about how to produce high quality stuff. They don’t just reprint press releases. You can’t just automatically get into The New York Times. It’s relatively hard. Right?”

“At the other end of this spectrum is the sort of thing that you’re talking about, where you might have a regular site, but then one part of that site, one entire section of that site, is entirely defined by maybe just doing a news search, maybe just searching for keywords in press releases,” he continues. “Whatever it is, it sounds like it’s pretty auto-generated. Maybe it’s taking RSS feeds and just slapping that up on the site. So what’s the difference between these?”

“Well, The New York Times is exercising discretion,” Cutts explains. “It’s exercising curation in terms of what it selects, even when it partners with other people, and whenever it puts other content up on its site. And most of its content tends to be original. Most of the time it’s thinking about, OK, how do we have the high quality stuff, as opposed to this notion – even if you’ve got high quality stuff on the rest of your site, what is the value add of having automatically generated, say, RSS feeds or press releases, where all you do is you say, OK, I’m going to do a keyword search for Red Widgets and see everything that matches. And I’m just going to put that up on the page.”

“So on one hand, you’ve got content that’s yours, original content – there’s a lot of curation. On the other hand, you’ve got something that’s automated, something that’s more towards the press release side of things, and it’s not even your content. So if that’s the case, if you’re just looking for content to be indexed, I wouldn’t go about doing it that way.”

For many in the SEO realm, there aren’t any new revelations here, but duplicate content is an issue that continues to be a problem many worry about, even after so many years. It’s still part of Google’s quality guidelines, and as you probably know, the Penguin update is designed to algorithmically enforce those, so that on its own is a good reason to exercise caution in this area.