A few days ago I came across the claim that putting comments in a function slows it down. I decided to put this to the test: I couldn't believe it unless I saw it.
The method I used was a statistical hypothesis test. I'm not here to teach maths (I wouldn't be very good at it anyway), but the process involves gathering raw data, setting up hypotheses about what distribution those data come from, and calculating how likely data like ours would be if a hypothesis held. The more likely the observed data are under the hypothesis that nothing is happening, the weaker the case against it.
Here, I take two JavaScript source files, differing only in the amount of non-code text present, and run them through the JavaScript console that comes with Mozilla's SpiderMonkey.
perf1.js:
function f1(n) {
    /**
     * lolololololololololololololololololololololololololo
     * lolololololololololololololololololololololololololo
     * lolololololololololololololololololololololololololo
     * lolololololololololololololololololololololololololo
     * lolololololololololololololololololololololololololo
     * lolololololololololololololololololololololololololo
     * lolololololololololololololololololololololololololo
     * lolololololololololololololololololololololololololo
     * lolololololololololololololololololololololololololo
     * lolololololololololololololololololololololololololo
     * lolololololololololololololololololololololololololo
     * lolololololololololololololololololololololololololo
     * lolololololololololololololololololololololololololo
     * lolololololololololololololololololololololololololo
     * lolololololololololololololololololololololololololo
     * lolololololololololololololololololololololololololo
     * lolololololololololololololololololololololololololo
     * lolololololololololololololololololololololololololo
     * lolololololololololololololololololololololololololo
     * lolololololololololololololololololololololololololo
     * lolololololololololololololololololololololololololo
     */
    var i = n + 1;
    return i;
}
for (var i = 0; i < 10000000; ++i) f1(42);
perf2.js:
function f2(n){var i=n+1;return i}
for (var i=0;i<10000000;++i)f2(42);
I'm running SpiderMonkey 1.7:
[tung@eee ~/Code/JavaScript]$ js -v
JavaScript-C 1.7.0 2007-10-03
Each sample was taken with the Unix time utility:
[tung@eee ~/Code/JavaScript]$ time js perf1.js
real 0m10.239s
user 0m10.143s
sys  0m0.010s
real is wall-clock time, user is the CPU time the process spent running in user space, and sys is the CPU time the kernel spent on its behalf (mostly system call overhead). We use the user time, since that is where the JavaScript interpreter does its work. real also counts time spent waiting while other processes are scheduled, and sys mostly covers things like loading the file, so we don't use those numbers.
I sampled 10 running times for perf1.js and then 10 for perf2.js:
[tung@eee ~/Code/JavaScript]$ time js perf1.js
real 0m10.239s
user 0m10.143s
sys  0m0.010s
[tung@eee ~/Code/JavaScript]$ time js perf1.js
real 0m10.456s
user 0m10.159s
sys  0m0.010s
[tung@eee ~/Code/JavaScript]$ time js perf1.js
real 0m10.323s
user 0m10.289s
sys  0m0.007s
[tung@eee ~/Code/JavaScript]$ time js perf1.js
real 0m10.461s
user 0m10.383s
sys  0m0.023s
[tung@eee ~/Code/JavaScript]$ time js perf1.js
real 0m10.481s
user 0m10.089s
sys  0m0.017s
[tung@eee ~/Code/JavaScript]$ time js perf1.js
real 0m10.294s
user 0m10.289s
sys  0m0.003s
[tung@eee ~/Code/JavaScript]$ time js perf1.js
real 0m10.272s
user 0m10.193s
sys  0m0.017s
[tung@eee ~/Code/JavaScript]$ time js perf1.js
real 0m10.307s
user 0m10.283s
sys  0m0.003s
[tung@eee ~/Code/JavaScript]$ time js perf1.js
real 0m10.518s
user 0m10.199s
sys  0m0.023s
[tung@eee ~/Code/JavaScript]$ time js perf1.js
real 0m10.519s
user 0m10.129s
sys  0m0.003s
[tung@eee ~/Code/JavaScript]$ time js perf2.js
real 0m10.407s
user 0m10.319s
sys  0m0.013s
[tung@eee ~/Code/JavaScript]$ time js perf2.js
real 0m10.477s
user 0m10.216s
sys  0m0.003s
[tung@eee ~/Code/JavaScript]$ time js perf2.js
real 0m10.398s
user 0m10.123s
sys  0m0.010s
[tung@eee ~/Code/JavaScript]$ time js perf2.js
real 0m10.619s
user 0m10.209s
sys  0m0.010s
[tung@eee ~/Code/JavaScript]$ time js perf2.js
real 0m10.250s
user 0m10.153s
sys  0m0.010s
[tung@eee ~/Code/JavaScript]$ time js perf2.js
real 0m10.572s
user 0m10.236s
sys  0m0.007s
[tung@eee ~/Code/JavaScript]$ time js perf2.js
real 0m10.303s
user 0m10.296s
sys  0m0.003s
[tung@eee ~/Code/JavaScript]$ time js perf2.js
real 0m10.276s
user 0m10.176s
sys  0m0.003s
[tung@eee ~/Code/JavaScript]$ time js perf2.js
real 0m10.476s
user 0m10.279s
sys  0m0.003s
[tung@eee ~/Code/JavaScript]$ time js perf2.js
real 0m10.678s
user 0m10.169s
sys  0m0.007s
perf1.js runtime data without the cruft lines:
user 0m10.143s
user 0m10.159s
user 0m10.289s
user 0m10.383s
user 0m10.089s
user 0m10.289s
user 0m10.193s
user 0m10.283s
user 0m10.199s
user 0m10.129s
perf2.js runtime data without the cruft lines:
user 0m10.319s
user 0m10.216s
user 0m10.123s
user 0m10.209s
user 0m10.153s
user 0m10.236s
user 0m10.296s
user 0m10.176s
user 0m10.279s
user 0m10.169s
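Stripping the cruft by hand works for twenty runs, but it is easy to script. A minimal sketch, assuming Node.js (or any modern JavaScript engine) and transcripts in the real/user/sys layout shown above; parseDuration and userTimes are hypothetical helper names, not part of any tool:

```javascript
// Convert a duration in the shell `time` format, e.g. "0m10.143s", to seconds.
function parseDuration(text) {
  const match = /^(\d+)m(\d+(?:\.\d+)?)s$/.exec(text);
  if (match === null) throw new Error("unrecognised duration: " + text);
  return Number(match[1]) * 60 + Number(match[2]);
}

// Collect every "user" duration from a transcript of one or more runs.
// Fields may be separated by spaces, tabs or newlines.
function userTimes(transcript) {
  const pattern = /\buser\s+(\d+m\d+(?:\.\d+)?s)/g;
  const times = [];
  let match;
  while ((match = pattern.exec(transcript)) !== null) {
    times.push(parseDuration(match[1]));
  }
  return times;
}

const transcript = `
real 0m10.239s
user 0m10.143s
sys  0m0.010s
real 0m10.456s
user 0m10.159s
sys  0m0.010s`;

console.log(userTimes(transcript)); // logs the array [10.143, 10.159]
```

The regular expression only looks for "user" followed by a duration, so it copes with both one-run-per-line logs and the flattened form.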
The null hypothesis is the opposite of the idea being tested. If the results would be very unlikely were the null hypothesis true, that is evidence for the alternative hypothesis. Conversely, if the results are quite likely under the null hypothesis, we have no grounds to reject it.
Also, it's hard to type subscripts here, so I just use an underscore to indicate them.
H_0: Comments in a function do not affect runtime performance.
mean(x) = mean(y)
x consists of the samples of perf1's runs, and y consists of the samples of perf2's runs.
H_1: Comments in a function affect runtime performance.
mean(x) != mean(y)
Under H_0, the test statistic below follows the t_(n_x + n_y - 2) probability distribution (this is the pooled two-sample variant of the t-test):
tau = (mean(X) - mean(Y)) / (S_p * sqrt(1/n_x + 1/n_y))
tau: test statistic
X and Y: random variables representing the data sets
S_p: random variable representing the pooled (common) standard deviation, combined from the standard deviations of each data set (more on how to calculate this below)
n_x and n_y: the number of samples in each data set
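We use this because there are two random, independent samples and we want to see how (un)likely it is that they share a common mean. Pooling the standard deviations also assumes that both samples come from populations with the same variance.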
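The formula for tau translates directly into code. A minimal sketch in modern JavaScript (Node.js assumed); mean, sampleVariance and pooledTStatistic are hypothetical helper names, and like the text it assumes two independent samples with a common variance:

```javascript
// Arithmetic mean of an array of numbers.
function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Sample variance with the n - 1 denominator, as used for S_x^2 and S_y^2.
function sampleVariance(values) {
  const m = mean(values);
  const squares = values.reduce((sum, v) => sum + (v - m) * (v - m), 0);
  return squares / (values.length - 1);
}

// Pooled two-sample t statistic. Returns tau, its degrees of freedom
// (n_x + n_y - 2) and the pooled variance S_p^2.
function pooledTStatistic(x, y) {
  const nx = x.length;
  const ny = y.length;
  const pooledVariance =
    ((nx - 1) * sampleVariance(x) + (ny - 1) * sampleVariance(y)) / (nx + ny - 2);
  const sp = Math.sqrt(pooledVariance);
  const tau = (mean(x) - mean(y)) / (sp * Math.sqrt(1 / nx + 1 / ny));
  return { tau, df: nx + ny - 2, pooledVariance };
}

// User times in seconds for perf1.js (x) and perf2.js (y), from the runs above.
const perf1 = [10.143, 10.159, 10.289, 10.383, 10.089, 10.289, 10.193, 10.283, 10.199, 10.129];
const perf2 = [10.319, 10.216, 10.123, 10.209, 10.153, 10.236, 10.296, 10.176, 10.279, 10.169];

console.log(pooledTStatistic(perf1, perf2));
```

Run on the user times above, it should reproduce a tau of about -0.0562 with 18 degrees of freedom, matching the hand calculation.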
How to get S_p?
S_p^2 = ((n_x - 1) * S_x^2 + (n_y - 1) * S_y^2) / (n_x + n_y - 2)
S_p is the pooled standard deviation, while S_p^2 is the pooled variance.
n_x and n_y are as above.
S_x^2 and S_y^2 are the variances of each data set. Consult your stats course textbook for how to calculate that from the data.
As above, it's t_(n_x + n_y - 2), i.e. the so-called "Student's t distribution" with n_x + n_y - 2 degrees of freedom.
A large observed |tau|, in either direction, argues against the null hypothesis.
First we need the variances S_x^2 and S_y^2:
S_x^2 = 1 / (n - 1) * (sum(x^2 for each x) - 1 / n * (sum(all x))^2)
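This shortcut formula avoids computing the mean first, but it subtracts two nearly equal numbers (1043.660962 and 1043.5848336 for perf1), which can cost precision in floating-point arithmetic when the values are large relative to their spread. A sketch comparing it with the two-pass definition (subtract the mean, then square), assuming Node.js; shortcutVariance and twoPassVariance are hypothetical names:

```javascript
// Shortcut form used in the text:
// S^2 = 1 / (n - 1) * (sum(x^2) - 1 / n * (sum(x))^2)
function shortcutVariance(values) {
  const n = values.length;
  let sum = 0;
  let sumSquares = 0;
  for (const v of values) {
    sum += v;
    sumSquares += v * v;
  }
  return (sumSquares - (sum * sum) / n) / (n - 1);
}

// Two-pass form: compute the mean, then sum squared deviations from it.
function twoPassVariance(values) {
  const n = values.length;
  const m = values.reduce((acc, v) => acc + v, 0) / n;
  return values.reduce((acc, v) => acc + (v - m) ** 2, 0) / (n - 1);
}

const perf1 = [10.143, 10.159, 10.289, 10.383, 10.089, 10.289, 10.193, 10.283, 10.199, 10.129];
const perf2 = [10.319, 10.216, 10.123, 10.209, 10.153, 10.236, 10.296, 10.176, 10.279, 10.169];

console.log(shortcutVariance(perf1), twoPassVariance(perf1));
console.log(shortcutVariance(perf2), twoPassVariance(perf2));
```

For these ten-sample data sets both forms should agree with the hand-computed 0.008458711 and 0.004203156 well beyond the digits shown; the difference only starts to matter with many samples or much larger values.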
x^2 for each x (all values are in seconds):
10.143 -> 102.880449
10.159 -> 103.205281
10.289 -> 105.863521
10.383 -> 107.806689
10.089 -> 101.787921
10.289 -> 105.863521
10.193 -> 103.897249
10.283 -> 105.740089
10.199 -> 104.019601
10.129 -> 102.596641
sum = 1043.660962
sum of all x = 102.156
S_x^2 = 1 / (10 - 1) * (1043.660962 - 1 / 10 * 102.156^2)
= 1 / 9 * (1043.660962 - 1 / 10 * 102.156^2)
= 0.008458711
S_y^2 = 1 / (n - 1) * (sum(y^2 for each y) - 1 / n * (sum(all y))^2)
y^2 for each y:
10.319 -> 106.481761
10.216 -> 104.366656
10.123 -> 102.475129
10.209 -> 104.223681
10.153 -> 103.083409
10.236 -> 104.775696
10.296 -> 106.007616
10.176 -> 103.550976
10.279 -> 105.657841
10.169 -> 103.408561
sum = 1044.031326
sum of all y = 102.176
S_y^2 = 1 / (10 - 1) * (1044.031326 - 1 / 10 * 102.176^2)
= 1 / 9 * (1044.031326 - 1 / 10 * 102.176^2)
= 0.004203156
With S_x^2 and S_y^2 we can get the pooled variance S_p^2 and thus the pooled standard deviation S_p:
S_p^2 = ((n_x - 1) * S_x^2 + (n_y - 1) * S_y^2) / (n_x + n_y - 2)
= ((10 - 1) * 0.008458711 + (10 - 1) * 0.004203156) / (10 + 10 - 2)
= (9 * 0.008458711 + 9 * 0.004203156) / 18
= 9 * (0.008458711 + 0.004203156) / 18
= (0.008458711 + 0.004203156) / 2
= 0.006330934
S_p = sqrt(S_p^2)
= sqrt(0.006330934)
≈ 0.0795672
Finally, we can get our observed test statistic tau:
tau = (mean(X) - mean(Y)) / (S_p * sqrt(1/n_x + 1/n_y))
= (10.2156 - 10.2176) / (sqrt(0.006330934) * sqrt(1/10 + 1/10))
= -0.002 / (sqrt(0.006330934) * sqrt(1/5))
= -0.056205796
We look up the absolute value of tau in a t distribution table for 18 degrees of freedom. Consult your favourite source of stats tables for one, e.g. a stats textbook.
The table gives one-sided probabilities, but since this test is two-sided, we need to double whatever we get from it.
p = 2 * P(T_18 > |tau|)
> 2 * 0.25
= 0.50
That is, the chance of getting results at least this extreme, given that H_0 holds (comments do not affect runtime performance), is very high: over 50%.
Conversely, these results give no reason to believe that comments in a JavaScript function affect runtime performance: with p over 0.50, we are nowhere near rejecting H_0 at any usual significance level such as 0.05. How far above 0.50 p lies is unknown, since the table in my textbook has a lower bound for tau lookups of 0.688 (for a one-sided p of 0.25), and our |tau| of about 0.056 is far below that.
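Without a finer table, the two-sided p-value can be approximated numerically. This is a swapped-in technique, not what I did above: a minimal sketch, assuming Node.js and an integer number of degrees of freedom, that integrates Student's t density with Simpson's rule. gammaRatio, tDensity and twoSidedPValue are hypothetical names:

```javascript
// Gamma((nu + 1) / 2) / Gamma(nu / 2) for a positive integer nu, using
// r(1) = 1 / sqrt(pi), r(2) = sqrt(pi) / 2 and r(nu + 2) = r(nu) * (nu + 1) / nu.
function gammaRatio(nu) {
  const odd = nu % 2 === 1;
  let r = odd ? 1 / Math.sqrt(Math.PI) : Math.sqrt(Math.PI) / 2;
  for (let k = odd ? 1 : 2; k < nu; k += 2) r *= (k + 1) / k;
  return r;
}

// Probability density of Student's t distribution with nu degrees of freedom.
function tDensity(t, nu) {
  const c = gammaRatio(nu) / Math.sqrt(nu * Math.PI);
  return c * Math.pow(1 + (t * t) / nu, -(nu + 1) / 2);
}

// Two-sided p-value P(|T| >= |tau|). Integrates the density from 0 to |tau|
// with Simpson's rule (steps must be even), then uses symmetry:
// p = 1 - 2 * P(0 <= T <= |tau|).
function twoSidedPValue(tau, nu, steps = 1000) {
  const a = Math.abs(tau);
  const h = a / steps;
  let sum = tDensity(0, nu) + tDensity(a, nu);
  for (let i = 1; i < steps; i++) {
    sum += (i % 2 === 1 ? 4 : 2) * tDensity(i * h, nu);
  }
  return 1 - 2 * (h / 3) * sum;
}

console.log(twoSidedPValue(-0.056205796, 18)); // the observed tau
console.log(twoSidedPValue(0.688, 18)); // the table's bound: about 0.50
```

By this sketch's numerics, the observed tau gives a two-sided p of roughly 0.96, consistent with the "over 50%" read from the table. Feeding in 2.101, the usual 5% two-sided critical value for 18 degrees of freedom, should return about 0.05, which is a handy sanity check.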
Correlation is not causation, and this test doesn't "prove" anything. Statistical methods only find likelihoods. To prove whether comments have an effect on the runtime performance of functions in JavaScript, one could:
I should have used a spreadsheet for some of the longer calculations. You live and learn.
I could have scattered the comments through the source just to be sure, but I doubt the outcome would be much different.
This would all look a thousand times better in LaTeX. LaTeX is awesome and gets awesome results, but I wanted to keep this accessible.
I don't normally like maths, and I hadn't touched statistics in ages, but, dare I say it, I actually enjoyed doing this. Maybe this is what modern mathematics curriculums are missing: a reason for doing it!
Math majors may notice that the hypothesis test is a bit loose: shouldn't I have tested whether removing comments was faster, not merely different? I modelled it after a question in my stats textbook so I wouldn't screw it up. The numbers may have been different, but the outcome would have been the same, since both sets of hypotheses share their null hypothesis: that nothing is happening.
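With a one-sided alternative, "comments make the function slower", i.e. mean(x) > mean(y), only a large positive tau counts against H_0, and the p-value becomes P(T_18 >= tau). A minimal sketch of that calculation, assuming Node.js and integer degrees of freedom; it integrates Student's t density with Simpson's rule, and upperTailPValue and its helpers are hypothetical names:

```javascript
// Gamma((nu + 1) / 2) / Gamma(nu / 2) for a positive integer nu.
function gammaRatio(nu) {
  const odd = nu % 2 === 1;
  let r = odd ? 1 / Math.sqrt(Math.PI) : Math.sqrt(Math.PI) / 2;
  for (let k = odd ? 1 : 2; k < nu; k += 2) r *= (k + 1) / k;
  return r;
}

// Probability density of Student's t distribution with nu degrees of freedom.
function tDensity(t, nu) {
  const c = gammaRatio(nu) / Math.sqrt(nu * Math.PI);
  return c * Math.pow(1 + (t * t) / nu, -(nu + 1) / 2);
}

// One-sided p-value P(T >= tau) for H_1: mean(x) > mean(y).
// Simpson's rule gives area = P(0 <= T <= |tau|); symmetry does the rest.
function upperTailPValue(tau, nu, steps = 1000) {
  const a = Math.abs(tau);
  const h = a / steps;
  let sum = tDensity(0, nu) + tDensity(a, nu);
  for (let i = 1; i < steps; i++) {
    sum += (i % 2 === 1 ? 4 : 2) * tDensity(i * h, nu);
  }
  const area = (h / 3) * sum;
  return tau >= 0 ? 0.5 - area : 0.5 + area;
}

console.log(upperTailPValue(-0.056205796, 18)); // the observed tau
```

Because the observed tau is slightly negative (perf1.js was marginally faster on average), this sketch puts the one-sided p a little above 0.5, roughly 0.52, so the conclusion would indeed be the same.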
And there is quite a high probability that nothing is happening.