This June probably set the record for large-scale failures on the Internet. First, Facebook experienced an outage that lasted for hours from May 31st to June 1st. On June 14, several users complained on Twitter that Google services were down – luckily it lasted only about 10-15 minutes and received no press coverage. On the same day, Amazon Web Services experienced yet another outage in their East Coast datacenter, impacting many services including Heroku, a cloud application provider. Lastly, Twitter went down for a little over two hours on June 21st.
These failures impacted not only the users of the sites in question, but also end users of websites that relied on widgets, ads, or infrastructure delivered from these companies.
Failures Are Bound to Happen
It was not the first time these companies experienced problems, and it will not be the last time they fail. No matter how big a company or its infrastructure is, failures are bound to happen. It could be a complete failure impacting everyone, or a partial failure impacting a lucky few.
Third Party Performance Impact
Webpages, whether for desktop or mobile users, are a complex mesh of markup language, HTTP requests, stylesheets, and JavaScript code that come together to present a beautiful visual output to the end-user.
Not all requests have the same impact on the loading of the webpage. Some requests, like inline JavaScript, have the biggest impact in that they can delay the loading of the content entirely. Steve Souders, the father of web performance, rightly named this type of request a Single Point of Failure, or SPOF.
Unfortunately for websites and their end users, the great majority of third-party providers – adservers, trackers, and widgets – rely on inline JavaScript that is blocking. As a result, when these vendors have performance problems or downtime, the end users of their clients' sites experience problems as well.
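To make the SPOF mechanics concrete, here is a minimal sketch (the URL `vendor.example/tag.js` is a placeholder, not a real tag): a script included with a plain `script src` halts the HTML parser until it downloads and executes, while a tag injected from JavaScript and marked `async` lets the page keep parsing and rendering.

```javascript
// Blocking form -- the parser stops until the script downloads and runs:
//   <script src="http://vendor.example/tag.js"></script>
//
// Non-blocking form: create the script element from JavaScript and mark
// it async, so the rest of the page keeps loading even if the vendor is
// slow or down.
function loadTagAsync(doc, src) {
  var script = doc.createElement('script');
  script.src = src;
  script.async = true; // do not block the parser on this request
  // Insert next to the first script on the page, which always exists.
  var first = doc.getElementsByTagName('script')[0];
  first.parentNode.insertBefore(script, first);
  return script;
}

// Only touch the real DOM when one is available (i.e., in a browser).
if (typeof document !== 'undefined') {
  loadTagAsync(document, 'http://vendor.example/tag.js');
}
```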
Mitigating Risks
All four Catchpoint founders worked at DoubleClick, the largest third-party provider for online advertising, between 1997 and 2008, and were deeply involved with managing, building, and monitoring the infrastructure and the tags that were placed on thousands of websites. My very first encounter with Souders was not a pleasant one: I got criticized for slowing down the Yahoo homepage with DoubleClick's inline JavaScript tag – which of course was blocking (guilty as charged, to this day).
During our tenure at DoubleClick we learned the hard way that failures will happen and that their impact is HUGE. Therefore we incorporated managing failure into our plans, processes, and infrastructure. We were one of the first adserving companies to introduce SLAs, back in the '90s, and we monitored our system carefully to ensure we met those SLAs. If we did not, we had to give money back to our customers.
Performance Is Not the Only Risk
While SLAs on speed and availability are key – they ensure the provider has skin in the game and give the website some insurance – they might not be sufficient to mitigate all risks. A third-party vendor can impact a publisher in many ways besides slowing down the page. Its code executes in the browser, where it can conflict with code on the page, break the user experience in certain browsers, or deliver inappropriate content or, worse, malware. Its system could also store data about the user without the user or the website knowing – inadvertently or, worse, on purpose.
These are all risks that websites and their providers must overcome, and the only way that can be achieved is if providers clearly follow certain rules. Based on our previous and current experience in this area, we wrote up a set of rules that we believe third-party providers should aim to follow in order to mitigate risks and be successful.
The Golden Rules
1. Provide and enforce meaningful SLAs. As a service provider to other companies, you must have SLAs in place. Make sure they are properly monitored and enforced, so that everyone in your organization is aware of the consequences of "failure".
2. Don't block content, rely on "async tags". The one-line JavaScript code you provide might be very easy to implement, but it will most likely block the loading of the page when your system is down. Make sure your tags load async and do not block the loading of other requests on the page – or key events like "DOMContentLoaded" and "onload". Recommending that the tag be placed before the closing "body" tag will not be sufficient if it is not async – users will still see the hourglass, and the webpage might not function properly.
3. Rely on a global infrastructure that covers your clients' needs (self-built or CDN). Your clients place your tags on their pages and have visitors from various geographical locations, so ensure that your service is reachable from, and close to, those locations. If your servers are on the US West Coast and your clients have mainly European users, you might want to rethink the location of your infrastructure and rely on CDNs for static libraries and a distributed DNS system.
4. Write safe client-side code. If your JavaScript executes on the window document (as opposed to your own dedicated iframe), ensure that your variables, methods, objects, and events do not conflict with or override what is already in the webpage. You should not be overriding, hiding, or blocking any of the content or pre-existing behavior, as it could lead to a bad user experience (unless that is specifically part of the service – such as overlaying ads).
5. Ensure the security of the content you deliver. Ensure that your system/service does not deliver malware, phishing, or viruses. In some cases you should also ensure that the content is appropriate for the site, for example not delivering alcohol-related ads on kids' websites.
6. Don't hide in the shadows. Ensure your tag-serving domain has an index page with your company information, a contact, and a link to your privacy policy. Also, don't anonymize the domain's Whois information. Nothing is worse than a savvy end-user noticing a domain holding up page load – and not being able to find out who the domain belongs to.
7. Respect end-user privacy and your clients' privacy. Make sure you have a clear privacy policy which your clients agree to and make available to their end users. Do not track any information that breaks your policy or those of your clients.
8. Work properly on all browsers and devices. Handle all browsers and versions gracefully (your tag might not have full functionality everywhere, but it should not break the page or cause JavaScript errors). Stay up to date with new technology.
9. Don't be a resource hog. Ensure your content does not become a hog of bandwidth, CPU, memory, or other resources. Remember, you are not the only tag on the page, and you are not the primary content the end-user is visiting the page for. So be minimalist and smart about end-user resource utilization.
10. Have a test plan for new releases. Test your tags and new releases on actual web pages of your clients and make sure they do not break. Test on different browsers and devices to ensure everything works as planned. If it is a major change, notify clients of the change and involve them in the testing process. Don't just push a new release to your clients' pages without them knowing of the change – you might break something for their end users without the site being aware of it.
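As a sketch of what rules 2, 4, and 8 can look like in code, here is a hypothetical tag skeleton (the `acmeWidget` namespace and the `acme-widget` class name are made-up illustrations, not a real product): it claims a single prefixed global instead of scattering variables, guards against double inclusion, feature-detects the DOM calls it needs, and wraps its work in try/catch so a failure inside the tag never breaks the host page.

```javascript
// Hypothetical third-party tag skeleton ("acmeWidget" is a placeholder name).
(function (global) {
  // Rule 4: claim exactly one namespaced global; never clobber an
  // existing one if the page (or a second copy of the tag) defined it.
  var ns = global.acmeWidget = global.acmeWidget || {};
  if (ns.loaded) { return; } // guard against double inclusion
  ns.loaded = true;

  ns.render = function (doc) {
    try {
      // Rule 8: feature-detect instead of assuming a capable browser.
      if (!doc || typeof doc.createElement !== 'function') { return null; }
      var el = doc.createElement('div');
      el.className = 'acme-widget'; // prefixed class, no style collisions
      return el;
    } catch (e) {
      // Rules 4/8: swallow our own errors so the host page keeps working.
      return null;
    }
  };
})(typeof window !== 'undefined' ? window : globalThis);
```

The IIFE keeps every helper variable out of the page's scope, and returning `null` on any failure means a client page can always call the tag without wrapping it in its own error handling.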
Whether you are a vendor or a website, we hope you will find these rules helpful in ensuring a great user experience. If you have any suggestions or feedback, please feel free to share them with us.
From Velocity 2012 - Mehdi - Catchpoint
Related Articles:
- Facebook Outage: Wake Up Call For Websites! (6/2012)
- The vendor who flunked the web performance test! (5/2012)
- Webpages turning into Airports without Traffic Controller! (10/2011)
- 3rd Party Monitoring – A process (12/2010)
- True Availability of a Webpage (7/2010)
- Monitoring the Performance of 3rd Party Providers (7/2010)