The Xindexer 2 Loop

The entire purpose of building Xindexer 2 was to automate every aspect of the indexing process, including checking for dead links and checking with Google on a regular basis to see whether your link has been indexed. The result of over 12 months of development and testing is what we call the Xindexer 2 Loop. This is a huge evolution over Xindexer 1 and all of our competitors, where URLs are only worked for one day and then forgotten about. To build all of this into a system that can handle millions of URLs, we had to create a few tools, specifically a link checker and an index checker, plus a very detailed description of what is happening.

Link Checker

The link checker is a tool that verifies that each link you have in the system is a good link. We all know that a lot of the links that SEnukeXCr creates end up being deleted by system admins, blocked by registration pages or just plain bad in the first place. There are a lot of places where the process can break, and blindly submitting all of the URLs that SEnukeXCr creates to an indexing service is a mistake (unless you submit them to Xindexer 2, that is). These links need to be screened, and this is where the link checker comes into play.

Every link that Xindexer 2 processes under a subscription plan or by using credits is checked for its status BEFORE it is added to your queue. In other words, if we find a dead link in the list that is submitted to us, we filter it out BEFORE you get charged for it. This saves you a lot of money and resources as we only process URLs that are actually going to do you some good.

We take this one step further and break down how a link fails into five categories: Dead, Error, Soft Match, Timeout and No Follow.

  • Dead links are links whose page does not contain your Money Site links: the page is working (i.e. it's not returning a 404) but it is blank.
  • Error links are links that we tried to load but the server returned an error code, e.g. we tried to load http://profilepages.com and the server returned a 500 Internal Server Error. If our bots are getting an error code, then so will the Google bots.
  • Soft Match links are links where the admin of a site has code in place to modify the outbound link. This is done to prevent us from getting credit for putting a link on their site, e.g. your link should be http://moneysite.com but the admin has changed it to http://profilepages.com?outlink=http://moneysite.com. When Google bots see this link, they see a link to profilepages.com rather than to your site, so your site doesn't get credit for it. If a user clicks on it, the code automatically redirects the user to the proper site. This is purely a defensive measure that system admins put in place (it is quite effective).
  • Timeout links are links that time out when we try to load them. This is a common occurrence on profile link pages, as they typically run on underpowered servers and take a long time to load. Google has placed a premium on page load times; if it takes 30 seconds to load a page, the link on that page is not helping very much.
  • No Follow links are links to which system admins have added a rel="nofollow" attribute. This is also a defensive maneuver; however, it is not nearly as effective, since Google still follows nofollow links. Because of this (and the fact that almost all of the wiki modules have nofollow tags) we have turned this checker off by default. If you want to filter out nofollow links, you will need to log into the advanced dashboard, go to the settings page and turn this feature on.
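
The five categories above can be sketched as a single classification function. This is only an illustration of the rules as described, not the actual Xindexer 2 code: the PageFetch record, the classify_link name, the regex-based anchor parsing and the substring test for a wrapped link are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Naive anchor matcher (assumption): captures the whole <a ...> tag and its href value.
ANCHOR_RE = re.compile(r'<a\s[^>]*?href="([^"]*)"[^>]*>', re.IGNORECASE)
NOFOLLOW_RE = re.compile(r'rel="[^"]*nofollow', re.IGNORECASE)


@dataclass
class PageFetch:
    """Hypothetical result of loading one page that should carry a backlink."""
    status_code: Optional[int]  # None stands for "the request timed out"
    html: str = ""


def classify_link(fetch: PageFetch, money_url: str, check_nofollow: bool = False) -> str:
    """Return "OK" or one of: Timeout, Error, Soft Match, No Follow, Dead."""
    if fetch.status_code is None:
        return "Timeout"
    if fetch.status_code >= 400:
        return "Error"
    target = money_url.rstrip("/")
    exact_tags = []   # anchors that point straight at the money site
    wrapped = False   # money site URL found inside a modified outbound link
    for match in ANCHOR_RE.finditer(fetch.html):
        href = match.group(1)
        if href.rstrip("/") == target:
            exact_tags.append(match.group(0))
        elif target in href:
            wrapped = True
    if exact_tags:
        # The nofollow filter is off by default, mirroring the setting described above.
        if check_nofollow and all(NOFOLLOW_RE.search(tag) for tag in exact_tags):
            return "No Follow"
        return "OK"
    # The page loaded, but no direct link to the money site was found.
    return "Soft Match" if wrapped else "Dead"
```

For example, classify_link(PageFetch(500), "http://moneysite.com") returns "Error", while a page whose only anchor points at http://profilepages.com?outlink=http://moneysite.com is reported as "Soft Match".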

Index Checker

The objective of what we are doing is to get your links indexed in Google. Once a link gets indexed, we no longer need to work it because our objective has been met. We also run an index check on your links BEFORE we add them to the queue; again, this saves you money, as there is no need to enter a link into the loop that is already indexed.
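
The pre-queue screening (link check first, then index check) can be pictured roughly as follows. This is a sketch under stated assumptions: screen_submission, link_status and is_indexed are hypothetical names, and the two checkers are passed in as plain functions rather than being the real tools.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def screen_submission(
    urls: Iterable[str],
    link_status: Callable[[str], str],   # hypothetical link checker: "OK", "Dead", "Error", ...
    is_indexed: Callable[[str], bool],   # hypothetical index checker
) -> Tuple[List[str], Dict[str, str]]:
    """Split submitted URLs into (queued, screened_out_with_reason)."""
    queued: List[str] = []
    screened_out: Dict[str, str] = {}
    for url in urls:
        status = link_status(url)
        if status != "OK":
            screened_out[url] = status      # failed the link check, never queued
        elif is_indexed(url):
            screened_out[url] = "Indexed"   # already indexed, nothing left to do
        else:
            queued.append(url)              # only these URLs enter the loop
    return queued, screened_out
```

Only the URLs in the first returned list would be added to the queue; everything in the second mapping is filtered out before any charge applies.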

Running this checker is challenging simply because Google has a "No Scraping" policy, which is quite ironic since their entire business plan is built around scraping everybody else. In order to check the index status of your links, we have to employ a large proxy network so that we can throttle the queries to Google (Google is very quick to ban IP addresses that violate this no scraping policy). Index checking is a pretty straightforward process as long as you have enough proxies: we simply run a query to Google and check the results to see if your link is there or not. If it is, then it is marked as "indexed" and your indexing rate goes up.
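
The throttling idea, spreading queries over a pool so that no single address is used too often, can be sketched as a small scheduler. Everything here is an assumption for illustration (the ProxyThrottle class, the min_interval parameter, the injectable clock); it performs no network requests and says nothing about how the real proxy network is run.

```python
import time
from typing import Callable, Dict, List, Tuple


class ProxyThrottle:
    """Hand out proxies so each one is reused no sooner than min_interval seconds."""

    def __init__(self, proxies: List[str], min_interval: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._min_interval = min_interval
        self._clock = clock
        # Earliest time each proxy may be used again; all are free at the start.
        self._ready_at: Dict[str, float] = {p: float("-inf") for p in proxies}

    def acquire(self) -> Tuple[str, float]:
        """Return (proxy, wait_seconds): the proxy that frees up soonest and
        how long the caller should sleep before using it."""
        now = self._clock()
        proxy = min(self._ready_at, key=self._ready_at.get)
        wait = max(0.0, self._ready_at[proxy] - now)
        # Reserve the proxy: it becomes free again min_interval after its use.
        self._ready_at[proxy] = now + wait + self._min_interval
        return proxy, wait
```

With two proxies and a 10 second interval, the first two calls to acquire() return a wait of 0.0, and the third returns a wait of 10.0 if no time has passed. The more proxies in the pool, the less often any caller has to wait.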

The Loop

We have broken the process down into a 7-day cycle; each step below represents one day:

  1. Post
  2. Wait
  3. Wait
  4. Wait
  5. Wait
  6. Link Check
  7. Index Check

This 7-day process is repeated 4 times for URLs that need it, for a total of 28 days of processing. Remember that as URLs are filtered out (dead, error, indexed, etc.) they are removed from the process. If a link is still alive and not indexed after the 28th day, then it is marked as archived. Once a link is marked as archived, it will lower your indexing rate (i.e. we failed to get that link indexed).
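
The schedule arithmetic is simple enough to write down. The sketch below assumes days are counted from zero (so the 28 days run from 0 to 27); the names CYCLE and step_for_day are made up for the example.

```python
# One 7-day cycle, in order; each entry is one day.
CYCLE = ["Post", "Wait", "Wait", "Wait", "Wait", "Link Check", "Index Check"]
REPEATS = 4
TOTAL_DAYS = len(CYCLE) * REPEATS  # 7 * 4 = 28 days of processing


def step_for_day(system_day: int) -> str:
    """Return the loop step for a zero-based system day (0 through 27)."""
    if not 0 <= system_day < TOTAL_DAYS:
        raise ValueError(f"system_day must be between 0 and {TOTAL_DAYS - 1}")
    return CYCLE[system_day % len(CYCLE)]
```

So step_for_day(0) is "Post", step_for_day(5) is "Link Check", and step_for_day(14) is "Post" again, the start of the third pass.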

Status

We needed a way to track all of this, and what we have come up with is a status system. The list is quite long, as there are a lot of steps in what we are doing. The post, link check and index check steps are broken down further to represent the difference between a link that has been processed and one that is still waiting to be processed. Links marked as "In" have not been processed yet and links marked as "Out" have been processed. The number attached to a status is the system day. This is how we track where the links are in the process; day zero is the first day of posting in our network. So Index Out 13 is a URL that has just gone through the index checker and is still alive and not indexed. When the system resets at midnight, this link will be updated to Post In 14 and it will start the 7-day process over again.
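
The midnight reset can be sketched as a function from one status string to the next. Only the Index Out 13 to Post In 14 step and the archiving after the last day come from the description above; the other transitions are inferred from the order of the status list, and the rollover name, the string format and the exact moment of archiving are assumptions.

```python
def rollover(status: str) -> str:
    """Advance a processed loop status by one system day, as at the midnight reset."""
    name, _, day_text = status.rpartition(" ")
    if not day_text.isdigit():
        return status  # statuses without a day, e.g. "Indexed" or "Archived"
    next_day = int(day_text) + 1
    if name == "Post Out":
        return f"Idle {next_day}"
    if name == "Idle":
        # Days 1 to 4 of each cycle are idle; day 5 is the link check.
        return f"Idle {next_day}" if next_day % 7 <= 4 else f"Link In {next_day}"
    if name == "Link Out":
        return f"Index In {next_day}"
    if name == "Index Out":
        # After the 28th day (system day 27) the link leaves the loop.
        return "Archived" if next_day >= 28 else f"Post In {next_day}"
    return status  # "In" statuses are still waiting to be processed
```

Here rollover("Index Out 13") gives "Post In 14", matching the example in the text, and rollover("Index Out 27") gives "Archived".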

Here's the entire list of statuses:

  1. Dead
  2. Error
  3. Soft Match
  4. No Follow
  5. Timeout
  6. (not used)
  7. Free Queue
  8. Free Post
  9. Free Archive
  10. Archived
  11. Indexed
  12. Custom
  13. Queued
  14. Post In 0
  15. Post Out 0
  16. Idle 1
  17. Idle 2
  18. Idle 3
  19. Idle 4
  20. Link In 5
  21. Link Out 5
  22. Index In 6
  23. Index Out 6
  24. Post In 7
  25. Post Out 7
  26. Idle 8
  27. Idle 9
  28. Idle 10
  29. Idle 11
  30. Link In 12
  31. Link Out 12
  32. Index In 13
  33. Index Out 13
  34. Post In 14
  35. Post Out 14
  36. Idle 15
  37. Idle 16
  38. Idle 17
  39. Idle 18
  40. Link In 19
  41. Link Out 19
  42. Index In 20
  43. Index Out 20
  44. Post In 21
  45. Post Out 21
  46. Idle 22
  47. Idle 23
  48. Idle 24
  49. Idle 25
  50. Link In 26
  51. Link Out 26
  52. Index In 27
  53. Index Out 27
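
The loop portion of this list (entries 14 through 53) follows a regular pattern: ten statuses per 7-day cycle, four cycles. The sketch below rebuilds the whole list from that pattern. It assumes the list position doubles as a numeric status ID, which the text does not state outright, and the names FIXED_STATUSES and build_status_table are made up for the example.

```python
from typing import Dict

# Entries 1 to 13, copied from the list above.
FIXED_STATUSES = [
    "Dead", "Error", "Soft Match", "No Follow", "Timeout", "(not used)",
    "Free Queue", "Free Post", "Free Archive", "Archived", "Indexed",
    "Custom", "Queued",
]


def build_status_table(cycles: int = 4) -> Dict[int, str]:
    """Map list position (assumed to be the status ID) to status name."""
    table = {i: name for i, name in enumerate(FIXED_STATUSES, start=1)}
    next_id = len(FIXED_STATUSES) + 1  # loop statuses start at 14
    for cycle in range(cycles):
        base = cycle * 7  # system day on which this cycle's post happens
        names = [f"Post In {base}", f"Post Out {base}"]
        names += [f"Idle {base + d}" for d in range(1, 5)]
        names += [f"Link In {base + 5}", f"Link Out {base + 5}",
                  f"Index In {base + 6}", f"Index Out {base + 6}"]
        for name in names:
            table[next_id] = name
            next_id += 1
    return table
```

Calling build_status_table() yields 53 entries, with 14 mapped to "Post In 0", 33 to "Index Out 13" and 53 to "Index Out 27".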