Best Practices for Speeding Up Your Web Site - Yahoo Developer Network


Mix in ads and a double listing, and sometimes only one website is listed above the fold. I've even seen Bing search results where organic results carry a "Web" label, which is conveniently larger than the label on the ads.

That is in addition to other tricks. On mobile devices, organic search results can be so hard to find that people ask questions like "Are there any search engines where you don't have to literally scroll to see a result that isn't an advertisement?" But other than that, it is slim pickings. In an online ecosystem where virtually every innovation is copied or deemed spam, sustainable publishing only works if your business model is different from those of the central network operators.

Not only is there the aggressive horizontal ad layer for anything with a hint of commercial intent, but now the scrape layer which was first applied to travel is being spread across other categories like ecommerce. And alarms are going off at Amazon now. Yes, Prime is killer, but organic search traffic is going to tank. Simply look at the market caps of the big tech monopolies vs companies in adjacent markets. The aggregate trend is expressed in the stock price.

And it is further expressed in the inability for the unicorn media companies to go public.

Indeed, you will rarely see advertising around news cycles in Google Search either. Sure, it is not the ad revenue they are stealing. Rather, it is the content. Some publishers have tried to offset this by putting more ads on their own sites while also getting further distribution by adopting the proprietary AMP format. Those who realized AMP was garbage for monetization viewed it as a way to offer teasers that drive users to their websites.

The partial-story approach is getting killed, though. Either you give Google everything, or they want nothing. That is, after all, how monopolies negotiate: with ultimatums. Those who don't give Google their full content will receive manual action penalty notifications. If they don't comply, Google will direct users to the non-AMP URLs, and those URLs won't appear in the Top Stories carousel. Meanwhile, every month Google announces new ad features.

Economics drive everything in publishing. But you have to see how one market position enables another.

Nobody rolls out their own fab and builds up from raw silicon; we all reuse some component or another, even if it's a language runtime, a web framework like Django or Rails, a protocol like Paxos, a fast database, or a library, say for numerical analysis or even natural language processing.

Even CS theoreticians are, in a sense, reusing techniques from math. Everybody stands on the shoulders of the giants that came before them, and all that. But it's critical to keep tabs on the ratio known as "glue versus thought." The former is eminently mundane, replaceable, and outsourceable. The latter is typically what gives a company its edge, what is generally regarded as a competitive advantage.

So, what is Yahoo signaling to the world? Let's get some perspective here: Summly wasn't reading Ulysses by James Joyce and extracting the fact that the three-masted ship Leopold Bloom sees on the horizon is a metaphor for the Holy Trinity and therefore represents the Catholic Church. It wasn't reading a 12-page article in Harper's and extracting the cleverest puns and pop culture send-ups lovingly embedded by a writer who is good at his craft and earning below his potential.

And it wasn't taking my blog posts and somehow conveying the nuanced ennui I harbor for bolt-on engineering. It was summarizing news: articles that are already written with a TL;DR in the first paragraph. Very rarely, the first paragraph contains what journalists call a "hook," and the infamous 5-W's are embedded in the second paragraph. And if Yahoo were to look at the work of anyone who is active in NLP (e.g., Claire Cardie or Lillian Lee), it'd immediately discover that this is a deep field full of exciting developments at its core.


Gluing an NLP engine up to news surely adds some value, but pales in comparison to what cutting edge NLP algorithms can accomplish. So if Yahoo is to be a technology company, it needs to do core technology acquisitions that give it a competitive advantage.

Glue is not that kind of advantage. I wrote about this before, when my post on MongoDB's fault tolerance was drawing really dumb responses from people who seemed to have difficulty reading. I lived through 8 years of a non-reading president along with everyone else.

Minimize HTTP Requests

Reducing the number of components in a page reduces the number of HTTP requests required to render it. This is the key to faster pages. One way to reduce the number of components in the page is to simplify the page's design.

But is there a way to build pages with richer content while also achieving fast response times?

Here are some techniques for reducing the number of HTTP requests, while still supporting rich page designs. Combined files are a way to reduce the number of HTTP requests by combining all scripts into a single script, and similarly combining all CSS into a single stylesheet.

Combining files is more challenging when the scripts and stylesheets vary from page to page, but making this part of your release process improves response times.
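A release-time combining step can be as simple as concatenating files in a fixed order. The following is a minimal sketch for stylesheets; the function names `combine_sources` and `combine_files` are hypothetical, not part of any particular build tool. Order matters, because later CSS rules override earlier ones of equal specificity.

```python
from pathlib import Path


def combine_sources(named_sources: list[tuple[str, str]]) -> str:
    """Join (name, text) pairs into a single bundle, preserving their order."""
    parts = []
    for name, text in named_sources:
        # A banner comment records which original file each chunk came from.
        parts.append(f"/* {name} */\n{text.rstrip()}\n")
    return "\n".join(parts)


def combine_files(paths: list[str], output: str) -> None:
    """Read each stylesheet, combine them, and write one bundle file."""
    sources = [(Path(p).name, Path(p).read_text(encoding="utf-8")) for p in paths]
    Path(output).write_text(combine_sources(sources), encoding="utf-8")
```

For example, `combine_files(["reset.css", "layout.css"], "site.css")` would let every page reference a single `site.css` instead of two stylesheets (the filenames are illustrative).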

CSS Sprites are the preferred method for reducing the number of image requests.

Combine your background images into a single image and use the CSS background-image and background-position properties to display the desired image segment.

Image maps combine multiple images into a single image. The overall size is about the same, but reducing the number of HTTP requests speeds up the page.
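For the CSS Sprites approach, the background-position offsets can be generated rather than written by hand. This is a minimal sketch, assuming the icons were stacked top to bottom, with no gaps, into one sprite image; the function name `sprite_rules`, the sprite filename, and the class names are hypothetical.

```python
def sprite_rules(sprite_url: str, icons: list[tuple[str, int, int]]) -> str:
    """Generate one CSS rule per icon in a vertically stacked sprite.

    icons: (class_name, width, height), in the top-to-bottom order
    the images were placed in the combined sprite (assumed layout).
    """
    rules = []
    y = 0  # vertical offset of the current icon inside the sprite
    for name, width, height in icons:
        # Shift the shared background up so only this icon's slice shows.
        rules.append(
            f".{name} {{ background-image: url({sprite_url}); "
            f"background-position: 0 {-y}px; width: {width}px; height: {height}px; }}"
        )
        y += height
    return "\n".join(rules)
```

Every element using these classes shares one image, so the browser makes a single request no matter how many icons appear on the page.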

Image maps only work if the images are contiguous in the page, such as a navigation bar. Defining the coordinates of image maps can be tedious and error prone.

Using image maps for navigation is also not accessible, so it's not recommended.

Inline images use the data: URL scheme to embed the image data in the actual page.

This can increase the size of your HTML document. Combining inline images into your cached stylesheets is a way to reduce HTTP requests and avoid increasing the size of your pages.
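Building a data: URL only requires base64-encoding the image bytes, which makes the payload roughly a third larger than the raw image. The sketch below uses hypothetical helper names, `data_url` and `css_background`, and produces a rule you could place in a cached stylesheet rather than in the HTML.

```python
import base64


def data_url(payload: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data: URL."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def css_background(selector: str, payload: bytes, mime_type: str) -> str:
    """Embed an image in a stylesheet rule instead of the HTML document."""
    return f"{selector} {{ background-image: url({data_url(payload, mime_type)}); }}"
```

In a real build, `payload` would be the bytes of an image file, for example `Path("logo.png").read_bytes()` with `mime_type="image/png"`.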

Older browsers, such as Internet Explorer 7 and earlier, do not support inline images.

Reducing the number of HTTP requests in your page is the place to start. This is the most important guideline for improving performance for first-time visitors. Making your page fast for these first-time visitors is key to a better user experience.

Use a Content Delivery Network

Deploying your content across multiple, geographically dispersed servers will make your pages load faster from the user's perspective.

But where should you start? As a first step to implementing geographically dispersed content, don't attempt to redesign your web application to work in a distributed architecture. Depending on the application, changing the architecture could include daunting tasks such as synchronizing session state and replicating database transactions across server locations.

Attempts to reduce the distance between users and your content could be delayed by, or never pass, this application architecture step. Remember that 80-90% of the end-user response time is spent downloading all the components in the page: images, stylesheets, scripts, Flash, and so on. This is the Performance Golden Rule. Rather than starting with the difficult task of redesigning your application architecture, it's better to first disperse your static content.

This not only achieves a bigger reduction in response times, but it's easier thanks to content delivery networks. A content delivery network (CDN) is a collection of web servers distributed across multiple locations to deliver content more efficiently to users.


The server selected for delivering content to a specific user is typically based on a measure of network proximity. For example, the server with the fewest network hops or the server with the quickest response time is chosen. For start-up companies and private web sites, the cost of a CDN service can be prohibitive, but as your target audience grows larger and becomes more global, a CDN is necessary to achieve fast response times.
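As a toy illustration of proximity-based selection, the sketch below picks the server with the quickest measured response time and uses the number of network hops as a tie-breaker. Real CDNs typically make this choice through DNS resolution or anycast routing rather than in application code; `EdgeServer`, `select_server`, and the measurements are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EdgeServer:
    name: str
    rtt_ms: float  # measured round-trip time to this user
    hops: int      # network hops between this server and the user


def select_server(candidates: list[EdgeServer]) -> EdgeServer:
    """Pick the 'closest' server: quickest response first, fewest hops second."""
    if not candidates:
        raise ValueError("no candidate servers")
    return min(candidates, key=lambda s: (s.rtt_ms, s.hops))
```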

Switching to a CDN is a relatively easy code change that will dramatically improve the speed of your web site.

Add an Expires or a Cache-Control Header

A first-time visitor to your page may have to make several HTTP requests, but by using the Expires header you make those components cacheable.


This avoids unnecessary HTTP requests on subsequent page views. Expires headers are most often used with images, but they should be used on all components including scripts, stylesheets, and Flash components. Browsers and proxies use a cache to reduce the number and size of HTTP requests, making web pages load faster. A web server uses the Expires header in the HTTP response to tell the client how long a component can be cached.
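Here is a minimal sketch of computing these headers in Python, using the standard library's `email.utils.formatdate` to produce an HTTP date. The function name is hypothetical, and treating a year as 365 days is an assumption of this sketch. It also emits the equivalent Cache-Control max-age, which HTTP/1.1 caches honor in preference to Expires when both are present.

```python
import time
from email.utils import formatdate
from typing import Optional


def far_future_headers(now: Optional[float] = None, years: int = 10) -> dict[str, str]:
    """Build caching headers for a static component.

    Years are approximated as 365 days each (an assumption of this sketch).
    """
    if now is None:
        now = time.time()
    max_age = years * 365 * 24 * 60 * 60  # lifetime in seconds
    return {
        # HTTP dates are always expressed in GMT.
        "Expires": formatdate(now + max_age, usegmt=True),
        # The same lifetime, expressed relative to the response time.
        "Cache-Control": f"public, max-age={max_age}",
    }
```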

Here is an example of a far future Expires header:

    Expires: Thu, 15 Apr 2010 20:00:00 GMT

This header tells the browser that the response won't be stale until April 15, 2010. If your server is Apache, use the ExpiresDefault directive to set an expiration date relative to the current date. This example of the ExpiresDefault directive sets the Expires date 10 years out from the time of the request:

    ExpiresDefault "access plus 10 years"

Keep in mind, if you use a far future Expires header you have to change the component's filename whenever the component changes.
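One common way to change a filename whenever the component changes is to embed a hash of the file's content in its name. The sketch below assumes SHA-256 truncated to eight hex characters; `fingerprinted_name` is a hypothetical helper, and a real build would also need to rewrite the page's references to point at the new name.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(filename: str, content: bytes, length: int = 8) -> str:
    """Insert a short content hash before the extension: app.js -> app.<hash>.js."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))
```

Because the name changes only when the bytes change, unchanged components keep their cached copies, while an updated component gets a new URL that no cache has seen yet.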

Using a far future Expires header affects page views only after a user has already visited your site. It has no effect on the number of HTTP requests when a user visits your site for the first time and the browser's cache is empty. Therefore the impact of this performance improvement depends on how often users hit your pages with a primed cache. A "primed cache" already contains all of the components in the page. We measured this at Yahoo!

By using a far future Expires header, you increase the number of components that are cached by the browser and reused on subsequent page views without sending a single byte over the user's Internet connection.

Gzip Components

It's true that the end-user's bandwidth speed, Internet service provider, proximity to peering exchange points, etc. are beyond the control of the development team. But there are other variables that affect response times. Compression reduces response times by reducing the size of the HTTP response.

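Web clients indicate support for compression with the Accept-Encoding header in the HTTP request, for example Accept-Encoding: gzip, deflate. If the web server sees this header in the request, it may compress the response using one of the methods listed by the client. The web server notifies the web client of this via the Content-Encoding header in the response, for example Content-Encoding: gzip. Gzip is the most widely used of these methods. The only other compression format you're likely to see is deflate, but it's less effective and less popular. If you use Apache, the module configuring gzip depends on your version: Apache 1.3 uses mod_gzip, while Apache 2.x uses mod_deflate.

There are known issues with browsers and proxies that may cause a mismatch in what the browser expects and what it receives with regard to compressed content.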

Fortunately, these edge cases are dwindling as the use of older browsers drops off. The Apache modules help out by adding appropriate Vary response headers automatically.
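The Accept-Encoding / Content-Encoding negotiation can be sketched with Python's standard `gzip` module. This is a simplified illustration: the helper names are hypothetical, and it only recognizes an explicit `gzip` token (treating `q=0` as a refusal), ignoring wildcards and `x-gzip`. It always sets `Vary: Accept-Encoding` so that shared caches keep compressed and uncompressed copies apart.

```python
import gzip


def accepts_gzip(accept_encoding: str) -> bool:
    """Return True if the request header lists gzip with a nonzero q-value."""
    for token in accept_encoding.split(","):
        name, _, params = token.partition(";")
        if name.strip().lower() != "gzip":
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in params.split(";"):
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        return q > 0
    return False


def maybe_gzip(body: bytes, accept_encoding: str) -> tuple[bytes, dict[str, str]]:
    """Compress the body only when the client advertised gzip support."""
    headers = {"Vary": "Accept-Encoding"}  # the response varies by this request header
    if accepts_gzip(accept_encoding):
        headers["Content-Encoding"] = "gzip"
        return gzip.compress(body), headers
    return body, headers
```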