How Search Works website


Transcript

Hey everybody. This is Matt Cutts. And today I wanted to talk about a website that you might have visited or you might not. It's called How Search Works. And if you haven't seen it, I highly encourage you to check it out. I think if you search for How Search Works on your favorite search engine, you'll probably be able to find it. But it's just a small site that talks about advances we've had in crawling, our algorithms, how we fight spam. And we've even made public a lot of our removal policies. Some really concrete, nitty gritty stuff.
Even if you have seen the site before, I wanted to sort of walk you through some of the things that you might not have noticed and which are actually quite nice. So if you go to the main page of How Search Works and you sort of scroll through a little bit, you'll notice that it's almost like an infographic. But it's actually interactive. So you can click around and find all kinds of fun Easter eggs.
One of the things that I really enjoy is if you get down to the very bottom of the page, it'll tell you, you've been on this page for 150 seconds, or something like that. In that time, Google has actually handled, you know, 5.7 million searches, or something along those lines. It's kind of interesting. I think we've said before that Google handles over two billion searches a day or something along those lines. But it's pretty neat to just figure out the amount of math. And you could extrapolate and say, OK, how many searches a day does Google have. Now that's, I think, a static counter based on when the site launched. But it's kind of a fun little interesting thing to play with.
As you look through the site, you'll also find that we talk a lot about how we do evaluation. So we've got some videos about how we evaluate search quality. Just to remind people of a misconception a lot of people have, we do evaluate new algorithms and then send them out to what we call quality raters. And they get one set of search results on the left and one set of search results on the right. And they have to decide which one looks better to them. And they don't know what algorithm's being evaluated. And whenever they vote, we take that data and we say, OK, which search results got better. And which ones got worse? But we don't take those votes, those ratings from the quality raters, and directly apply them in our ranking algorithm.
Now what's kind of fun is, we actually show the funnel for the things in a recent year, I think it was 2012, where we went through 118,000 ideas, where we just played around with a new way of generating search results. And, using the ratings that we'd already gotten from quality raters, we were able to say, oh, in general, this looks like a promising experiment, for example. From there, we did 10,000 what we call side-by-sides, where again, you get these side-by-side sets of search results. And it's like a blind taste test. And you ask people, which one do you like better. Based on that, we did 7,000 of what we call live traffic experiments, where we actually take an experiment and we put it out on our main website. And we look at how often people click on various search results to try to determine whether we're actually making the search results better.
And so the net result was that we were able to launch 665 algorithmic changes, things that changed on our search results page in 2012, which is kind of interesting. To put that into context, that's roughly two changes to how we generate the search results page every single day, for the entire year. So it's kind of funny when people come and ask, well OK, what happened on such and such date. Because there's usually a lot of stuff happening, things rolling out, new data being deployed. And those are actual changes, not just data being refreshed, that we're talking about.
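As a rough check on the funnel described above, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the stage labels and the helper name `stage_ratios` are made up for this example, and the counts are the rounded figures quoted in the talk for 2012.

```python
# Rounded figures as quoted in the talk for 2012; the labels are illustrative.
funnel = [
    ("ideas", 118_000),
    ("side-by-side experiments", 10_000),
    ("live traffic experiments", 7_000),
    ("launched changes", 665),
]


def stage_ratios(stages):
    """Return (label, count, fraction of the previous stage) for every stage after the first."""
    return [
        (label, count, count / prev_count)
        for (_, prev_count), (label, count) in zip(stages, stages[1:])
    ]


# 2012 was a leap year, so divide the launches by 366 days.
launches_per_day = funnel[-1][1] / 366

for label, count, ratio in stage_ratios(funnel):
    print(f"{label}: {count} ({ratio:.1%} of the previous stage)")
print(f"launches per day: {launches_per_day:.2f}")
```

The last line comes out a little under two, which matches the "roughly two changes every single day" framing.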
So that gives you a little bit of a feel for the scale of how many different changes we're exploring at any given point. Now the part of How Search Works that I enjoy the most is the spam section. And there's a lot of nitty gritty detail there. We went into all kinds of information that you might not have seen before. So for example, there's a spam carousel. And that is updated periodically. So you actually get to see spam right after we've removed it. So we'll show you a screenshot so that you don't run into danger of getting infected by malware or something. But it's literally like you can watch over our shoulder as we're removing spam. And so you get a chance to see the sorts of stuff that we have to deal with every single day.
Right below the spam carousel, you'll see that we have different types of spam. So we talk about the categories of spam. I think that's pretty helpful to know because that lets you know the sorts of stuff that we have to deal with. So the major categories are cloaking or sneaky redirects, hacked sites, hidden text or keyword stuffing, parked domains, pure spam, which is just another name for black hat, when it's like, any savvy user would hopefully be able to recognize it as absolute spam. Things like spamming free hosts, or dynamic DNS providers, thin content with very little added value, unnatural links from a site, unnatural links to a site, and then user-generated spam, where you might have good content up front, but maybe so many spam comments that it's actually causing bad search results or a bad user experience.
So there are more specific, more granular, more detailed things, within each one of those. So unnatural links from a site might involve someone who was selling links that pass PageRank, for example. But that gives you an idea of the overall categories that we look at whenever we're actually fighting spam.
The other thing that's kind of interesting if you surf down the page and look a little bit, is we give you several different graphs. We actually tell you month by month the actions that we've taken on spam, so what types of actions and how many actions we took. And if you look, you'll see that the vast majority of what we tackle is what we classify as pure spam or black hat spam. So that just means that it's stuff that, you know, it's gibberish, it's something that anybody would be able to recognize if they're sufficiently savvy. It might be machine-generated, auto-generated sort of spam, hopefully the sort of thing that anybody would look at and be like, wow, I hope I don't see that in my search results.
Something that you might not notice is the next biggest category within recent years has been hacked sites. And it's kind of funny, because back in 2010, there was some SEO who wrote something like, what's the web spam team been doing. I haven't seen a lot of action from them recently. And we were actually engaged in a pitched battle, hand-to-hand combat on hacked sites, which, if you're just a regular SEO, or even a regular black hat SEO, back then you might not have noticed as much. So it's not the case that we were taking a break or taking things easy. We were working very hard on spam. It was just a type of spam that most people hadn't encountered yet. And we're going to keep working on all those kinds of things. So you can get those kinds of insights when you look through these graphs and see, OK, this is the history of the sort of stuff that Google has had to tackle in terms of spam.
What's also interesting is we've started to do more and more messaging over time. Now we could probably do better and think about other ways to get more concrete, more actionable messages to webmasters. And we're going to keep exploring that. But when you look at the milestones in terms of what we've done in terms of communication, it actually is pretty exciting.
And you can see the volume spike up as we've started to give more and more information. At this point, for pretty much any direct action that the manual webspam team takes that affects your ranking, the webmaster will get a message about that. And that's really helpful, because at least you know that there's an issue. And you can start to deal and dig into it and start to investigate a little bit. So it's kind of interesting. You know, I'm looking at one graph that says, in January of 2013, we sent over 431,000 messages as a result of actions that we took on the webspam team.
And so the other thing that you should think about is the scale at which we're operating. Now remember, that's manual webspam actions, which then generated some sort of message to the webmaster. The idea that we could have a one-on-one conversation with 431,000 different owners of websites sort of shows you the scale that we're operating at and why it's hard, and why, so far, we haven't figured out a way to have a one-on-one conversation with every single webmaster who wants to rank number one, or rank highly, or has questions about potential webspam action.
But what you can see below that is a graph that shows the reconsideration requests that have been submitted. And so for a random week in 2013, there were roughly 5,000 reconsideration requests. And bear in mind, this is interesting. So over a month, 430,000 messages go out. And then in a week, we get 5,000 reconsideration request messages. So if you take that week-long baseline and turn it into a month, call it about 20,000 reconsideration request processing messages that we handle during a month. Now what's interesting about that is, if you do the math, that basically means, of all the people we alert of manual webspam action, right now at least, less than 5% of those people request reconsideration. So that actually means that most of the time we're killing spam and the spammers are not saying, hey this is not right. I want to contest this. They're actually saying, OK you caught me. I'm going to move on to try to do it on a different URL where you won't catch me next time.
So it's kind of neat to take some of these numbers and compare them out and play a little bit with realizing what insights can we get from these kinds of graphs. And it shows you the scale of the problem. If you have 20,000 people a month who want to talk to you about why they think their website should rank highly when we think that it has at least violated the guidelines, you see the sort of difficulties we have in trying to talk to everybody.
We'll keep trying to do better. We'll keep trying to be more transparent. But I think it's fantastic that we've got this How Search Works website. We've got some dashboard where you can see how things are going. And you can even see live examples of spam as they get thrown out. So we'll keep looking at ways to make things even better. But I think you'd really enjoy the website. If you get a chance to check it out, dig in and just absorb some of the information that's available on the website. Thanks very much.
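The reconsideration-request math from the talk can be reproduced with a short Python sketch. The variable names are invented for this example, the counts are the rounded figures quoted in the talk, and treating a month as four weeks is an assumption that matches the "call it about 20,000" rounding.

```python
# Rounded figures as quoted in the talk; the variable names are illustrative.
messages_per_month = 431_000      # manual webspam action messages, January 2013
recon_requests_per_week = 5_000   # reconsideration requests in "a random week in 2013"
weeks_per_month = 4               # assumption: a month is rounded to four weeks

recon_requests_per_month = recon_requests_per_week * weeks_per_month
recon_share = recon_requests_per_month / messages_per_month

print(f"reconsideration requests per month: {recon_requests_per_month}")
print(f"share of notified webmasters who request reconsideration: {recon_share:.1%}")
```

With these inputs the share stays under 5%, in line with the figure given in the talk.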

