The cost of small modules
About a year ago I was refactoring a large JavaScript codebase into smaller modules, when I discovered a depressing fact about Browserify and Webpack:
“The more I modularize my code, the bigger it gets.” – Nolan Lawson
Later on, Sam Saccone published some excellent research on Tumblr and Imgur’s page load performance, in which he noted:
“Over 400ms is being spent simply walking the Browserify tree.” – Sam Saccone
In this post, I’d like to demonstrate that small modules can have a surprisingly high performance cost depending on your choice of bundler and module system. Furthermore, I’ll explain why this applies not only to the modules in your own codebase, but also to the modules within dependencies, which is a rarely-discussed aspect of the cost of third-party code.
Web perf 101
The more JavaScript included on a page, the slower that page tends to be. Large JavaScript bundles cause the browser to spend more time downloading, parsing, and executing the script, all of which lead to slower load times.
Even when breaking up the code into multiple bundles – Webpack code splitting, Browserify factor bundles, etc. – the cost is merely delayed until later in the page lifecycle. Sooner or later, the JavaScript piper must be paid.
Furthermore, because JavaScript is a dynamic language, and because the prevailing CommonJS module system is also dynamic, it’s fiendishly difficult to extract unused code from the final payload that gets shipped to users. You might only need jQuery’s $.ajax, but by including jQuery, you pay the cost of the entire library.
The JavaScript community has responded to this problem by advocating the use of small modules. Small modules have a lot of aesthetic and practical benefits – easier to maintain, easier to comprehend, easier to plug together – but they also solve the jQuery problem by promoting the inclusion of small bits of functionality rather than big “kitchen sink” libraries.
So in the “small modules” world, instead of doing:
var _ = require('lodash')
_.uniq([1,2,2,3])
You might do:
var uniq = require('lodash.uniq')
uniq([1,2,2,3])
Packages vs modules
It’s important to note that, when I say “modules,” I’m not talking about “packages” in the npm sense. When you install a package from npm, it might only expose a single module in its public API, but under the hood it could actually be a conglomeration of many modules.
For instance, consider a package like is-array. It has no dependencies and only contains one JavaScript file, so it has one module. Simple enough.
Now consider a slightly more complex package like once, which has exactly one dependency: wrappy. Both packages contain one module, so the total module count is 2. So far, so good.
Now let’s consider a more deceptive example: qs. Since it has zero dependencies, you might assume it only has one module. But in fact, it has four!
You can confirm this by using a tool I wrote called browserify-count-modules, which simply counts the total number of modules in a Browserify bundle:
$ npm install qs
$ browserify node_modules/qs | browserify-count-modules
4
This means that a given package can actually contain one or more modules. These modules can also depend on other packages, which might bring in their own packages and modules. The only thing you can be sure of is that each package contains at least one module.
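To make the package/module distinction concrete, here is a minimal sketch that counts the modules reachable from an entry point. The dependency graph and its file names are hypothetical, invented for illustration; they are not the real layout of qs, once, or any other package, and this is not how browserify-count-modules is implemented.

```javascript
// Hypothetical module graph: each key is a module file, each value lists
// the modules it require()s. The file names are made up for illustration.
const graph = {
  'app/index.js': ['pkg-a/index.js', 'pkg-b/index.js'],
  'pkg-a/index.js': ['pkg-a/lib/parse.js', 'pkg-a/lib/format.js'],
  'pkg-a/lib/parse.js': ['pkg-a/lib/utils.js'],
  'pkg-a/lib/format.js': ['pkg-a/lib/utils.js'],
  'pkg-a/lib/utils.js': [],
  'pkg-b/index.js': []
}

// Walk the graph from the entry point and count each module once,
// no matter how many times it is required.
function countModules (graph, entry) {
  const seen = new Set()
  const stack = [entry]
  while (stack.length > 0) {
    const id = stack.pop()
    if (seen.has(id)) continue
    seen.add(id)
    for (const dep of graph[id] || []) stack.push(dep)
  }
  return seen.size
}

console.log(countModules(graph, 'app/index.js')) // 6
console.log(countModules(graph, 'pkg-a/index.js')) // 4
```

In this made-up graph, the application depends on only two packages, yet the bundle ends up with six modules, four of which come from a single package.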
Module bloat
How many modules are in a typical web application? Well, I ran browserify-count-modules on a few popular Browserify-using sites, and came up with these numbers:
- requirebin.com: 91 modules
- keybase.io: 365 modules
- m.reddit.com: 1050 modules
- Apple.com: 1060 modules (Added. Thanks, Max!)
For the record, my own Pokedex.org (the largest open-source site I’ve built) contains 311 modules across four bundle files.
Ignoring for a moment the raw size of those JavaScript bundles, I think it’s interesting to explore the cost of the number of modules themselves. Sam Saccone has already blown this story wide open in “The cost of transpiling es2015 in 2016”, but I don’t think his findings have gotten nearly enough press, so let’s dig a little deeper.
Benchmark time!
I put together a small benchmark that constructs a JavaScript module importing 100, 1000, and 5000 other modules, each of which merely exports a number. The parent module just sums the numbers together and logs the result:
// index.js
var total = 0
total += require('./module_0')
total += require('./module_1')
total += require('./module_2')
// etc.
console.log(total)
// module_1.js
module.exports = 1
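A benchmark of this shape is easy to generate with a script. The following is a rough sketch of one way to do it, not the script actually used for these results; the function name generateBenchmark is an assumption made for illustration.

```javascript
// Build the source text for a benchmark with `count` leaf modules.
// Returns a plain object mapping file name -> file contents, so the
// caller can decide where (and whether) to write the files.
function generateBenchmark (count) {
  const files = {}
  const lines = ['var total = 0']
  for (let i = 0; i < count; i++) {
    files['module_' + i + '.js'] = 'module.exports = ' + i + '\n'
    lines.push("total += require('./module_" + i + "')")
  }
  lines.push('console.log(total)')
  files['index.js'] = lines.join('\n') + '\n'
  return files
}

const files = generateBenchmark(100)
console.log(Object.keys(files).length) // 101: index.js plus 100 leaf modules
console.log(files['module_1.js']) // module.exports = 1
```

Each entry could then be written to disk (for example with Node's fs.writeFileSync) before handing index.js to the bundler under test.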
I tested five bundling methods: Browserify, Browserify with the bundle-collapser plugin, Webpack, Rollup, and Closure Compiler. For Rollup and Closure Compiler I used ES6 modules, whereas for Browserify and Webpack I used CommonJS, so as not to unfairly disadvantage them (since they would need a transpiler like Babel, which adds its own overhead).
In order to best simulate a production environment, I used Uglify with the --mangle and --compress settings for all bundles, and served them gzipped over HTTPS using GitHub Pages. For each bundle, I downloaded and executed it 15 times and took the median, noting the (uncached) load time and execution time using performance.now().
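The "run it 15 times and take the median" step can be sketched as follows. This is an illustrative helper rather than the actual benchmark harness: it only times synchronous execution, whereas the uncached load time has to be measured in a real browser. The names median and medianTime are assumptions.

```javascript
// Median of a list of numbers (average of the two middle values
// when the list has an even length).
function median (values) {
  const sorted = values.slice().sort(function (a, b) { return a - b })
  const mid = Math.floor(sorted.length / 2)
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2
}

// Run `fn` `runs` times and return the median duration in milliseconds.
// `fn` stands in for "execute the bundle"; it is a placeholder here.
function medianTime (fn, runs) {
  // Browsers expose `performance` globally; older Node versions need perf_hooks.
  const clock = typeof performance !== 'undefined'
    ? performance
    : require('perf_hooks').performance
  const durations = []
  for (let i = 0; i < runs; i++) {
    const start = clock.now()
    fn()
    durations.push(clock.now() - start)
  }
  return median(durations)
}

console.log(median([30, 10, 20])) // 20
console.log(median([4, 1, 3, 2])) // 2.5
```

The median is used rather than the mean so that one unusually slow run (a garbage-collection pause, say) doesn't skew the result.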
Bundle sizes
Before we get into the benchmark results, it’s worth taking a look at the bundle files themselves. Here are the byte sizes (minified but ungzipped) for each bundle (chart view):
Bundler | 100 modules | 1000 modules | 5000 modules
---|---|---|---
browserify | 7982 | 79987 | 419985
browserify-collapsed | 5786 | 57991 | 309982
webpack | 3954 | 39055 | 203052
rollup | 671 | 6971 | 38968
closure | 758 | 7958 | 43955
And the minified and gzipped sizes, also in bytes:
Bundler | 100 modules | 1000 modules | 5000 modules
---|---|---|---
browserify | 1649 | 13800 | 64513
browserify-collapsed | 1464 | 11903 | 56335
webpack | 693 | 5027 | 26363
rollup | 300 | 2145 | 11510
closure | 302 | 2140 | 11789
The way Browserify and Webpack work is by isolating each module into its own function scope, and then declaring a top-level runtime loader that locates the proper module whenever require() is called. Here’s what our Browserify bundle looks like:
(function e(t,n,r){function s(o,u){if(!n[o]){if(!t[o]){var a=typeof require=="function"&&require;if(!u&&a)return a(o,!0);if(i)return i(o,!0);var f=new Error("Cannot find module '"+o+"'");throw f.code="MODULE_NOT_FOUND",f}var l=n[o]={exports:{}};t[o][0].call(l.exports,function(e){var n=t[o][1][e];return s(n?n:e)},l,l.exports,e,t,n,r)}return n[o].exports}var i=typeof require=="function"&&require;for(var o=0;o
Whereas the Rollup and Closure bundles look more like what you might hand-author if you were just writing one big module. Here’s Rollup:
(function () {
'use strict';
var total = 0
total += 0
total += 1
total += 2
// etc.
If you understand the inherent cost of functions-within-functions in JavaScript, and of looking up a value in an associative array, then you’ll be in a good position to understand the following benchmark results.
Results
I ran this benchmark on a Nexus 5 with Android 5.1.1 and Chrome 52 (to represent a low- to mid-range device) as well as an iPod Touch 6th generation running iOS 9 (to represent a high-end device).
Here are the results for the Nexus 5 (tabular results):
And here are the results for the iPod Touch (tabular results):
At 100 modules, the variance between all the bundlers is pretty negligible, but once we get up to 1000 or 5000 modules, the difference becomes severe. The iPod Touch is hurt the least by the choice of bundler, but the Nexus 5, being an aging Android phone, suffers a lot under Browserify and Webpack.
I also find it interesting that both Rollup and Closure’s execution cost is essentially free for the iPod, regardless of the number of modules. And in the case of the Nexus 5, the runtime costs aren’t free, but they’re still much cheaper for Rollup/Closure than for Browserify/Webpack, the latter of which chew up the main thread for several frames if not hundreds of milliseconds, meaning that the UI is frozen just waiting for the module loader to finish running.
Note that both of these tests were run on a fast Gigabit connection, so in terms of network costs, it’s really a best-case scenario. Using the Chrome Dev Tools, we can manually throttle that Nexus 5 down to 3G and see the impact (tabular results):
Once we take slow networks into account, the difference between Browserify/Webpack and Rollup/Closure is even more stark. In the case of 1000 modules (which is close to Reddit’s count of 1050), Browserify takes about 400 milliseconds longer than Rollup. And that 400ms is no small potatoes, since Google and Bing have both noted that sub-second delays have an appreciable impact on user engagement.
One thing to note is that this benchmark doesn’t measure the precise execution cost of 100, 1000, or 5000 modules per se, since that will depend on your usage of require(). Inside of these bundles, I’m calling require() once per module, but if you are calling require() multiple times per module (which is the norm in most codebases) or if you are calling require() multiple times on-the-fly (i.e. require() within a sub-function), then you could see severe performance degradations.
Reddit’s mobile site is a good example of this. Even though they have 1050 modules, I clocked their real-world Browserify execution time as much worse than the “1000 modules” benchmark. When profiling on that same Nexus 5 running Chrome, I measured 2.14 seconds for Reddit’s Browserify require() function, and 197 milliseconds for the equivalent function in the “1000 modules” script. (In desktop Chrome on an i7 Surface Book, I also measured it at 559ms vs 37ms, which is pretty astonishing given we’re talking desktop.)
This suggests that it may be worthwhile to run the benchmark again with multiple require()s per module, although in my opinion it wouldn’t be a fair fight for Browserify/Webpack, since Rollup/Closure both resolve duplicate ES6 imports into a single hoisted variable declaration, and it’s also impossible to import from anywhere but the top-level scope. So in essence, the cost of a single import for Rollup/Closure is the same as the cost of n imports, whereas for Browserify/Webpack, the execution cost will increase linearly with n require()s.
For the purposes of this analysis, though, I think it’s best to just assume that the number of modules is only a lower bound for the performance hit you might feel. In reality, the “5000 modules” benchmark may be a better yardstick for “5000 require() calls.”
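The difference between on-the-fly require() calls and a hoisted binding can be illustrated with a toy loader that counts how often its registry is consulted. Everything here (toyRequire, the registry, the counter) is hypothetical and exists only to make the linear growth visible; it does not measure real bundler overhead.

```javascript
// Toy loader that counts how many times the registry is consulted.
let lookups = 0
const registry = { answer: { exports: 42 } }
function toyRequire (id) {
  lookups++ // every call pays for a function call plus a property lookup
  return registry[id].exports
}

// CommonJS style: require() called on the fly inside a function.
function onTheFly () {
  return toyRequire('answer') + 1
}

// Hoisted style: the binding is resolved once, then reused.
const answer = toyRequire('answer')
function hoisted () {
  return answer + 1
}

lookups = 0
for (let i = 0; i < 1000; i++) onTheFly()
const onTheFlyLookups = lookups

lookups = 0
for (let i = 0; i < 1000; i++) hoisted()
const hoistedLookups = lookups

console.log(onTheFlyLookups) // 1000
console.log(hoistedLookups) // 0
```

The on-the-fly version hits the loader once per call, so its overhead scales with the number of require() calls rather than with the number of modules.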
Conclusions
First off, the bundle-collapser plugin seems to be a valuable addition to Browserify. If you’re not using it in production, then your bundle will be a bit larger and slower than it would be otherwise (although I must admit the difference is slight). Alternatively, you could switch to Webpack and get an even faster bundle without any extra configuration. (Note that it pains me to say this, since I’m a diehard Browserify fanboy.)
However, these results clearly show that Webpack and Browserify both underperform compared to Rollup and Closure Compiler, and that the gap widens the more modules you add. Unfortunately I’m not sure Webpack 2 will solve any of these problems, because although they’ll be borrowing some ideas from Rollup, they seem to be more focused on the tree-shaking aspects and not the scope-hoisting aspects. (Update: a better name is “inlining,” and the Webpack team is working on it.)
Given these results, I’m surprised Closure Compiler and Rollup aren’t getting much traction in the JavaScript community. I’m guessing it’s due to the fact that (in the case of the former) it has a Java dependency, and (in the case of the latter) it’s still fairly immature and doesn’t quite work out-of-the-box yet (see Calvin Metcalf’s comments for a good summary).
Even without the average JavaScript developer jumping on the Rollup/Closure bandwagon, though, I think npm package authors are already in a good position to help solve this problem. If you npm install lodash, you’ll notice that the main export is one giant JavaScript module, rather than what you might expect given Lodash’s hyper-modular nature (require('lodash/uniq'), require('lodash.uniq'), etc.). For PouchDB, we made a similar decision to use Rollup as a prepublish step, which produces the smallest possible bundle in a way that’s invisible to users.
I also created rollupify to try to make this pattern a bit easier to just drop-in to existing Browserify projects. The basic idea is to use imports and exports within your own project (cjs-to-es6 can help migrate), and then use require() for third-party packages. That way, you still have all the benefits of modularity within your own codebase, while exposing more-or-less one big module to your users. Unfortunately, you still pay the costs for third-party modules, but I’ve found that this is a good compromise given the current state of the npm ecosystem.
So there you have it: one horse-sized JavaScript duck is faster than a hundred duck-sized JavaScript horses. Despite this fact, though, I hope that our community will eventually realize the pickle we’re in – advocating for a “small modules” philosophy that’s good for developers but bad for users – and improve our tools, so that we can have the best of both worlds.
Bonus round! Three desktop browsers
Normally I like to run performance tests on mobile devices, since that’s where you see the clearest differences. But out of curiosity, I also ran this benchmark on Chrome 52, Edge 14, and Firefox 48 on an i7 Surface Book using Windows 10 RS1. Here are the results:
Chrome 52 (tabular results)
Edge 14 (tabular results)
Firefox 48 (tabular results)
The only interesting tidbits I’ll call out in these results are:
- bundle-collapser is definitely not a slam-dunk in all cases.
- The ratio of network-to-execution time is always extremely high for Rollup and Closure; their runtime costs are basically zilch. ChakraCore and SpiderMonkey eat them up for breakfast, and V8 is not far behind.
This latter point could be extremely important if your JavaScript is largely lazy-loaded, because if you can afford to wait on the network, then using Rollup and Closure will have the additional benefit of not clogging up the UI thread, i.e. they’ll introduce less jank than Browserify or Webpack.
Update: in response to this post, JDD has opened an issue on Webpack. There’s also one on Browserify.
Update 2: Ryan Fitzer has generously added RequireJS and RequireJS with Almond to the benchmark, both of which use AMD instead of CommonJS or ES6.
Testing shows that RequireJS has the largest bundle sizes but surprisingly its runtime costs are very close to Rollup and Closure. Here are the results for a Nexus 5 running Chrome 52 throttled to 3G: