We recently came across an article by Rand Fishkin, a recognized and widely cited search marketing expert with 15 years of experience in the field, about the leak of Google’s internal documents.
While this documentation doesn’t fully explain how Google ranks sites in its search results, it does provide interesting details about the data that Google collects. Many of the findings contradict Google’s public claims. For example, Google has stated that subdomains are not counted separately and that domain age is neither collected nor factored into rankings, yet the leaked data suggests otherwise.
Like Rand, we believe that everyone involved in SEO has the right to study these documents, understand their implications, and draw their own conclusions. Many of our own partners, in fact, are travel bloggers with websites earning through the Travelpayouts brands.
At a time when Google is rolling out frequent updates that critically affect travel blogs’ positions in the search results, as well as their traffic and affiliate earnings, this information is especially valuable for travel bloggers and affiliates looking to adjust their SEO and blogging strategies.
We won’t rehash Rand’s article in full, as you can read all the first-hand information on the SparkToro blog. Instead, we enlisted the support of our SEO expert, Anton Ivlichev, and asked him to highlight the most interesting data in the leaked documents and explain how travel bloggers can use it.
Should You Trust the Leaked Google API Documents and This Article?
- To verify the authenticity of the Google API Content Warehouse documents, Rand consulted three ex-Googlers. One declined to comment, while the other two confirmed that the documents seemed legitimate based on their experience.
- Additionally, Rand received confirmation from Mike King, the founder of iPullRank, who affirmed that the documents seemed authentic and contained significant information about Google’s inner workings.
- The most recent date found in the API documents is August 2023, suggesting the documentation was current as of summer 2023 and possibly as late as the March 2024 leak. The documentation references deprecated features and includes notes indicating that some features should no longer be used, which implies that features without such notes were still active at the time of the leak.
- Google’s leaked documentation does not definitively prove which elements are used in its ranking systems. Therefore, the conclusions drawn in this Travelpayouts article and in Rand’s article cannot be considered definitive evidence that Google uses certain elements or gives them specific weight when ranking sites in its search results.
What Myths Does the Google Documents Leak Dispel? An Explanation by Anton Ivlichev, Travelpayouts’ SEO Expert
Anton Ivlichev, SEO expert at Travelpayouts
The leaked documentation describes 2,596 API modules containing 14,014 attributes (features). These modules cover various aspects, including YouTube, Google Assistant, Google Books, video search, links, web documents, data collection infrastructure, internal calendar systems, and the People API.
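To give a feel for what these modules look like, here is a loose, simplified Python sketch. The module and attribute names below are ones discussed in this article; the grouping, types, and structure are our own illustration, not the actual format of the leaked reference pages.

```python
# A rough, hypothetical picture of a leaked module: a named module
# grouping typed attributes with short descriptions. This structure
# is our simplification for illustration only.
example_module = {
    "module": "PerDocData",  # a real module name from the leak
    "attributes": {
        # Attribute names appear in the leak; descriptions are paraphrased.
        "hostAge": {
            "type": "integer",
            "description": "Used to sandbox fresh spam at serving time.",
        },
        "urlHistory": {
            "type": "list",
            "description": "Stored versions of the page (up to 20).",
        },
    },
}

print(len(example_module["attributes"]))  # 2 of the 14,014 attributes overall
```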
There’s a sense that Google’s representatives may be intentionally misleading users and professionals about the ways in which their systems function. This suspicion arises because certain elements, previously claimed to hold no weight, are evidently taken into account by Google’s algorithm.
1. Domain Authority
Despite Google’s repeated assertions that they don’t rely on domain authority, their code contains an attribute known as “siteAuthority”.
What does that mean?
To secure your website’s placement on the first few pages of Google search results, it’s essential to cultivate a network of high-quality links.
2. Clickstream Factors
According to the documentation, Google analyzes user clicks on search results. Specifically, the platform focuses on the following elements:
- badClicks
- goodClicks
- lastLongestClicks
- unsquashedClicks
- unsquashedImpressions
- unsquashedLastLongestClicks
What does that mean?
You need to improve your “last click” rate. This metric is crucial, as it signifies the moment when a user clicked on your site and didn’t return to the search results (indicating that they found the answer to their query).
In the documentation, there are references to “unsquashed clicks”. “Squashing”, as per Google’s patent, is a feature that prevents one strong signal from overpowering the others: click data is smoothed rather than taken at face value. This implies that artificially generating a few “correct” click-throughs won’t significantly move these indicators.
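To picture what squashing might do, here is a minimal sketch. Google’s actual squashing function is not public, so the logarithmic curve below is an assumption used purely to show how a dampened click signal grows far more slowly than the raw count:

```python
import math

def squash(raw_clicks: int) -> float:
    """Hypothetical dampening with diminishing returns on raw clicks.

    Google's actual squashing function is not public; log1p is used
    here only to illustrate the shape of such a curve.
    """
    return math.log1p(raw_clicks)

for clicks in (1, 10, 100, 1000):
    print(clicks, round(squash(clicks), 2))
# 1 -> 0.69, 10 -> 2.4, 100 -> 4.62, 1000 -> 6.91: a 1000x jump in
# raw clicks yields only a ~10x jump in the squashed signal.
```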
3. Sandbox
The sandbox effect occurs when a newly created site struggles to achieve a high rank in the search results. Although Google publicly denies its existence, the documentation for the “PerDocData” module includes a “hostAge” attribute. This attribute is utilized to sandbox new spam sites, thus safeguarding the search engine from low-quality content.
What does that mean?
In other words, if you have a new site on a new domain, don’t anticipate rapid growth. If you need swift expansion, consider building your site on a drop domain (a previously registered domain with an established history that has expired or is about to become available again).
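Conceptually, a hostAge-based sandbox could work like the hedged sketch below. The threshold and the gradual ramp-up are invented for illustration; the leak does not reveal how the attribute is actually applied:

```python
from datetime import date

SANDBOX_DAYS = 180  # invented threshold; the real value, if any, is unknown

def sandbox_multiplier(first_seen: date, today: date) -> float:
    """Hypothetical: dampen ranking signals for very young hosts."""
    host_age_days = (today - first_seen).days
    if host_age_days < SANDBOX_DAYS:
        # Scale the site's signals up gradually instead of a hard cutoff.
        return host_age_days / SANDBOX_DAYS
    return 1.0

print(sandbox_multiplier(date(2024, 1, 1), date(2024, 3, 1)))  # ~0.33
```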
4. Using Google Chrome’s Data
Google’s representatives have asserted that the company doesn’t incorporate data from Chrome into its organic search rankings. However, one of the modules concerning page quality indicators utilizes statistics from Chrome visits. Furthermore, a Google Chrome attribute appears in the module related to links. This enables Google to gauge site-level views from Chrome.
What does that mean?
You can leverage this knowledge by making sure your website performs well for Google Chrome users, thereby avoiding the accumulation of negative user-behavior statistics.
Looking beyond traditional SEO, one can deduce that Google Chrome usage data can potentially influence how favorably your site’s behavioral factors are assessed.
5. The History of a Page’s Content
The “urlHistory” attribute stores a page’s change history, retaining its last 20 versions.
What does that mean?
This suggests that you would need to change a specific webpage 20 times before Google fully disregards its old content, as earlier versions would then be pushed out of the stored history.
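A structure that retains only the last 20 versions is easy to picture. The 20-version limit comes from the leak; everything else in this minimal sketch is illustrative:

```python
from collections import deque

MAX_VERSIONS = 20  # the leak indicates the last 20 versions are retained

url_history = deque(maxlen=MAX_VERSIONS)

# Each significant content change appends a snapshot; once 20 versions
# are stored, the oldest one is silently dropped.
for revision in range(1, 26):
    url_history.append(f"content v{revision}")

print(url_history[0])    # "content v6" -- versions 1-5 have aged out
print(len(url_history))  # 20
```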
5 Tips on How Travel Bloggers Can Use This New Information About Google
- Focus on Building a Quality Link Profile: Google still highly values links, and there’s no evidence in these leaked documents suggesting otherwise.
- Assess Link Value: Pay attention to the “sourceType”, which indicates the storage location of a page in Google’s index as well as its value. Google’s index is categorized into tiers based on storage types:
- Flash Storage: holds the most important and frequently updated content.
- Solid-State Drives: contains less critical content.
- Hard Disks: stores content that isn’t regularly updated.
Higher-tier pages and news resources carry more weight, making links from these sources more valuable.
- Optimize Content Placement: Position the most crucial content at the beginning of the page. Google’s system has a limit on how much of a page it analyzes, so content beyond that limit may be ignored.
- Craft Relevant Headlines: Ensure your headlines align with user queries. The “titlematchScore” attribute reflects Google’s evaluation of how well a page title matches the user’s search. Start your titles with targeted keywords for better visibility (see the illustrative sketch after this list).
- Aim for Authority: Strive to present your site as a large resource in Google’s eyes. Google flags certain sites as “small personal sites”, which suggests a preference for larger sites and aggregators.
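The formula behind “titlematchScore” is not in the leak. The sketch below is a naive, purely illustrative heuristic that rewards titles sharing the query’s keywords and starting with the first of them, which is the behavior the headline tip above optimizes for:

```python
def naive_title_match(title: str, query: str) -> float:
    """Purely illustrative: NOT Google's titlematchScore formula.

    Scores word overlap between title and query, with a small bonus
    when the title starts with the query's first keyword.
    """
    title_words = title.lower().split()
    query_words = query.lower().split()
    if not title_words or not query_words:
        return 0.0
    overlap = len(set(title_words) & set(query_words)) / len(query_words)
    prefix_bonus = 0.2 if title_words[0] == query_words[0] else 0.0
    return min(overlap + prefix_bonus, 1.0)

print(naive_title_match("Bali Travel Guide for 2024", "bali travel guide"))        # 1.0
print(naive_title_match("Our 2024 Guide to Traveling in Bali", "bali travel guide"))  # ~0.67
```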
Share in the comments: was this breakdown of the Google documents leak helpful to you? Did you find the tips from our expert practical and beneficial?
SEO experts, we’re eager to hear from you: what interesting points have you noticed in the Google API?