Google TrustRank is a measure of how trustworthy Google considers a website to be.
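A site's TrustRank is computed by combining human judgment with automated link analysis. Experts at Google or other search organizations first assign TR (TrustRank) values to a small set of sites. Machine analysis of the link structure then derives TrustRank values for the other sites on the web, and a site's TR value serves as one important factor in ranking its pages. As with PageRank (PR), a site that receives links from high-TR sites will itself receive a higher TR value. Google TrustRank is most likely computed per site rather than per page.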
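To make the seed-then-propagate idea concrete, here is a minimal Python sketch of trust propagation as a PageRank variant whose random jump returns only to the expert-approved seed sites (the biased-PageRank formulation of the original TrustRank paper). The site names, the damping factor `alpha = 0.85` and the fixed iteration count are illustrative assumptions, not values Google is known to use; nodes are sites rather than pages, following the per-site assumption above.

```python
def trust_rank(links, seeds, alpha=0.85, iterations=20):
    """Propagate trust from expert-approved seed sites along outgoing links.

    links: dict mapping each site to the list of sites it links to.
    seeds: non-empty set of sites a human expert judged trustworthy.
    alpha: share of a site's trust passed on through its links each step;
           the remaining (1 - alpha) is reinjected at the seeds only.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    # Static distribution: uniform over the good seeds, zero everywhere else.
    d = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    trust = dict(d)
    for _ in range(iterations):
        nxt = {n: (1 - alpha) * d[n] for n in nodes}
        for src, targets in links.items():
            if targets:
                # Each site splits its trust evenly among the sites it links to.
                share = alpha * trust[src] / len(targets)
                for dst in targets:
                    nxt[dst] += share
        trust = nxt
    return trust


# Hypothetical link graph: one trusted seed and an isolated spam ring.
links = {
    "seed.org": ["a.com", "b.com"],
    "a.com": ["c.com"],
    "b.com": [],
    "c.com": [],
    "spam1.biz": ["spam2.biz"],
    "spam2.biz": ["spam1.biz"],
}
scores = trust_rank(links, seeds={"seed.org"})
for site, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{site:10s} {score:.4f}")
```

Trust never reaches the `.biz` pair because no trusted site links to it, and it decays with each hop (`c.com` ends up with less than `a.com`). Trust arriving at a site with no outgoing links is simply dropped in this sketch, so the scores need not sum to 1.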
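Search engines rely heavily on links when ranking pages, and link quality matters more and more, so the quality of the site a link comes from has to be judged. More importantly, ranking based on links and relevance has been undermined by all kinds of manipulation, and rampant spam forced Google to find a new anti-spam mechanism that ensures high-quality sites are the ones the search engine favors. This is the context in which the Sandbox and TrustRank were proposed: to give good sites better search visibility and to tighten the vetting of sites. The original TrustRank paper, written by researchers at Stanford University and Yahoo! rather than Google, raises the same points.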
Abstract of the original TrustRank paper ("Combating Web Spam with TrustRank"):
Web spam pages use various techniques to achieve higher-than-deserved rankings in a search engine’s results. While human experts can identify spam, it is too expensive to manually evaluate a large number of pages. Instead, we propose techniques to semi-automatically separate reputable, good pages from spam. We first select a small set of seed pages to be evaluated by an expert. Once we manually identify the reputable seed pages, we use the link structure of the web to discover other pages that are likely to be good. In this paper we discuss possible ways to implement the seed selection and the discovery of good pages. We present results of experiments run on the World Wide Web indexed by AltaVista and evaluate the performance of our techniques. Our results show that we can effectively filter out spam from a significant fraction of the web, based on a good seed set of less than 200 sites.
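The abstract leaves seed selection open. One heuristic the paper examines, as we understand it, is inverse PageRank: run ordinary PageRank on the link graph with every edge reversed and hand the top-scoring sites to the expert, since trust placed on a site that links out widely can spread far. The Python sketch below follows that idea under simplifying assumptions; `select_seeds`, the `is_good` function standing in for the human reviewer, the site names and the budget are all hypothetical.

```python
def inverse_pagerank(links, alpha=0.85, iterations=20):
    """Plain PageRank computed on the graph with every link reversed.

    A site scores highly when it links out to many sites that themselves
    score highly, so trust seeded there could reach much of the graph.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    # Reverse every edge src -> dst into dst -> src.
    reversed_links = {n: [] for n in nodes}
    for src, targets in links.items():
        for dst in targets:
            reversed_links[dst].append(src)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        nxt = {v: (1 - alpha) / n for v in nodes}
        for src, targets in reversed_links.items():
            if targets:
                share = alpha * rank[src] / len(targets)
                for dst in targets:
                    nxt[dst] += share
        rank = nxt
    return rank


def select_seeds(links, oracle, budget):
    """Show the top `budget` sites by inverse PageRank to a human oracle
    and keep only those it judges good (hypothetical helper)."""
    scores = inverse_pagerank(links)
    candidates = sorted(scores, key=lambda v: (-scores[v], v))[:budget]
    return {v for v in candidates if oracle(v)}


def is_good(site):
    # Hypothetical stand-in for the expert's manual judgment.
    return not site.endswith(".biz")


links = {
    "seed.org": ["a.com", "b.com"],
    "a.com": ["c.com"],
    "b.com": [],
    "c.com": [],
    "spam1.biz": ["spam2.biz"],
    "spam2.biz": ["spam1.biz"],
}
print(select_seeds(links, is_good, budget=3))
```

In this toy graph the spam ring tops the inverse PageRank list, which is why the manual check matters: with a budget of 2 no seed survives, and with a budget of 3 only `seed.org` does.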
Spam pages frequently use all kinds of cheating techniques to obtain a good search
