[translated] Joy of Programming--Fail Fast

Frank Zhang 2014-07-18 09:35:58 +08:00
parent f0e5653c24
commit 478db403a1
2 changed files with 58 additions and 59 deletions

@ -1,59 +0,0 @@
zpl1025
Joy of Programming: Fail Fast!
================================================================================
![](http://www.opensourceforu.com/wp-content/uploads/2011/12/fail-350x262.jpg)
> When a problem occurs in the software, it should fail immediately, in an easily noticeable way. This “fail fast” behaviour is desirable, and we'll discuss this important concept in this column.
At first, “fail fast” might appear to be a bad practice that hurts reliability: why should a system crash (or fail) when it can continue execution? To answer this, we need to understand that fail fast is most relevant in the context of Heisenbugs.
Consider Bohrbugs, which always show up for a given input, for example, a crash on a null-pointer access. These bugs are easier to test, reproduce and fix. Most experienced programmers, however, have also faced situations where the bug that caused a crash simply disappears when the software is restarted. No matter how much time and effort is spent trying to reproduce the problem, the bug eludes us. These bugs are known as Heisenbugs.
The effort required to find, fix and test Heisenbugs is an order of magnitude more than the effort required for Bohrbugs. One strategy to avoid Heisenbugs is to turn them into Bohrbugs. How? By anticipating the possible cases in which Heisenbugs can arise, and trying to make them Bohrbugs. Yes, it is not easy, and it is also not always possible, but let us look at a specific example where it is useful.
Concurrent programming is one paradigm where Heisenbugs are common. Our example is a concurrency-related issue in Java. While iterating over a Java collection, we are supposed to modify the collection only through the Iterator's own methods, such as `remove()`. During iteration, if another thread modifies the underlying collection (because of a programming mistake), the collection can get corrupted (i.e., end up in an incorrect state).
Such an incorrect state can lead to an eventual failure. Or, if we are fortunate (actually, unfortunate!), the program continues execution without crashing but gives wrong results. These bugs are difficult to reproduce and fix, because whether and how they show up depends on thread timing and is non-deterministic. In other words, this is a Heisenbug.
Fortunately, the Java Iterators try to detect such concurrent modifications and, when they find one, throw a `ConcurrentModificationException` instead of failing late, and silently at that. In other words, the Java Iterators follow the “fail fast” approach.
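The behaviour described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class and method names (`FailFastIteratorDemo`, `removeEvensSafely`, `modificationDetected`) are hypothetical, and the sketch deliberately uses a single thread so that the failure is reproducible, whereas the article talks about a second thread. The check that fires is the same one, and the Javadoc notes that this detection is done on a best-effort basis.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

public class FailFastIteratorDemo {

    // Removes even numbers through the Iterator itself, which is the supported way.
    public static List<Integer> removeEvensSafely(List<Integer> source) {
        List<Integer> numbers = new ArrayList<>(source);
        Iterator<Integer> it = numbers.iterator();
        while (it.hasNext()) {
            if (it.next() % 2 == 0) {
                it.remove(); // keeps the iterator and the list in sync
            }
        }
        return numbers;
    }

    // Modifies the list behind the iterator's back and reports
    // whether the iterator noticed the change.
    public static boolean modificationDetected(List<Integer> source) {
        List<Integer> numbers = new ArrayList<>(source);
        try {
            for (Integer n : numbers) {
                if (n % 2 == 0) {
                    numbers.remove(n); // structural change not made via the iterator
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true; // the iterator failed fast on its next step
        }
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(removeEvensSafely(input));    // prints [1, 3, 5]
        System.out.println(modificationDetected(input)); // prints true
    }
}
```

The second method fails on the very next iteration step after the stray `remove` call, which is the point of the column: the mistake surfaces next to its cause rather than as a wrong result much later.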
What if a `ConcurrentModificationException` is observed in production software? As the Javadoc for this exception observes, it “…should be used only to detect bugs.” In other words, a `ConcurrentModificationException` is supposed to be found and fixed during software development, and should not leak into production code.
Well, if production software does get this exception, it is certainly a bug in the software, and should be reported to the developer and fixed. At least we know that there was an attempted concurrent modification of the underlying data structure, and that's why the software failed (instead of producing wrong results, or failing later with some other symptoms for which it is not feasible to trace the root cause).
The “fail fast” approach is meant for developing robust code. A very good example of writing fail-fast code is using assertions. Unfortunately, there is a lot of unnecessary controversy surrounding the use of asserts. The main criticism is this: the checks are enabled in the development version, and disabled in release versions.
However, this criticism is wrong: asserts were never meant to replace the defensive checks that should be in place in the release version of the software. For example, asserts should not be used to check whether an argument passed to a function is null. Instead, an if condition should check whether the argument is valid and, if it is not, throw an exception or return early, as appropriate to the context. Asserts can, however, be used for additional checks on assumptions made in the code, which are supposed to hold true. An example is a condition that checks that the stack is not empty after a push operation is performed on it (i.e., checking for “invariants”).
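The split between defensive checks and assertions can be sketched as follows. This is a minimal illustration, not code from the column: the `CheckedStack` class and its methods are hypothetical names. It assumes the standard Java behaviour that `assert` statements are evaluated only when the JVM is started with assertions enabled (for example with `-ea`).

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class CheckedStack {
    private final Deque<String> items = new ArrayDeque<>();

    public void push(String item) {
        // Defensive check on the argument: always active, also in release runs.
        if (item == null) {
            throw new IllegalArgumentException("item must not be null");
        }
        items.push(item);
        // Invariant check: evaluated only when assertions are enabled (java -ea).
        assert !items.isEmpty() : "stack must not be empty after push";
    }

    public String pop() {
        // Defensive check on the object's state, again always active.
        if (items.isEmpty()) {
            throw new IllegalStateException("pop on an empty stack");
        }
        return items.pop();
    }

    public int size() {
        return items.size();
    }
}
```

A caller that passes `null` gets an `IllegalArgumentException` whether or not assertions are enabled, while the `assert` line documents and checks an assumption that should never be false if `push` itself is correct.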
So, fail fast, be assertive, and you're on the way to developing more robust code.
--------------------------------------------------------------------------------
via:http://www.opensourceforu.com/2011/12/joy-of-programming-fail-fast/
Translator: [译者ID](https://github.com/译者ID) Proofreader: [校对者ID](https://github.com/校对者ID)
This article was originally translated by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](http://linux.cn/)

@ -0,0 +1,58 @@
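Joy of Programming: Fail Fast!
================================================================================
![](http://www.opensourceforu.com/wp-content/uploads/2011/12/fail-350x262.jpg)
> When software runs into a problem, it should stop immediately, in a way that gets noticed. This “fail fast” approach is worth adopting, and this column discusses this important concept.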
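At first, “fail fast” looks like a bad practice that hurts reliability: why should a system crash (or stop) while it can still keep running? To answer that, we need to understand that fail fast is closely tied to Heisenbugs (a name for bugs that cannot be reproduced).
Consider Bohrbugs (a name for bugs that can be reproduced), which always appear for a given input, for example when accessing a null pointer. Such problems are easy to test, reproduce and fix. Every experienced programmer, though, has probably faced this situation: the bug that caused a crash no longer appears after the software is restarted. No matter how much time or effort goes into reproducing the problem, the bug keeps playing hide-and-seek with us. Such bugs are called Heisenbugs.
The effort spent finding, fixing and testing Heisenbugs is an order of magnitude higher than for Bohrbugs. One strategy for avoiding Heisenbugs is to turn them into Bohrbugs. How? Anticipate the situations that may lead to Heisenbugs, then try to turn them into Bohrbugs. Yes, this is not easy, and it does not always succeed, but let us look at a specific example where it works.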
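Concurrent programming is a classic area where Heisenbugs often appear. Our example is a concurrency-related problem in Java. While iterating over a Java collection, the collection should be modified only through the Iterator's methods, such as the remove() method. If another thread tries to modify the underlying collection during iteration (because of a programming mistake), the underlying collection may be corrupted (that is, left in an incorrect state).
Such an incorrect state can lead to a later failure. Or, if we are lucky (which is actually unlucky), the program keeps running without crashing but produces wrong results. Bugs like this are hard to reproduce and fix, because this kind of programming error behaves non-deterministically. In other words, it is a Heisenbug.
Fortunately, Java Iterators try to detect such concurrent modification and, on finding it, throw a `ConcurrentModificationException` instead of failing late, and silently at that. In other words, Java Iterators also follow the “fail fast” approach.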
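What if a `ConcurrentModificationException` occurs in production software? According to the Javadoc for this exception, it “should be used only to detect bugs”. In other words, a `ConcurrentModificationException` should be found and fixed during development, and should not leak into production code.
Well, if production software does hit this exception, it is certainly a bug in the software and should be reported to the developers and fixed. At least we know that a concurrent modification of the underlying data structure took place and that this is why the software failed (rather than the software producing wrong results, or failing later with other symptoms, which would make the root cause hard to trace).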
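The “fail fast” approach is meant for developing robust code. A very good example of writing fail-fast code is using assertions. Unfortunately, there is a lot of unnecessary controversy about the use of assertions. The main criticism is this: the checks are enabled in development builds but disabled in release builds.
However, this criticism is wrong: assertions were never meant to replace the defensive checks that belong in the release version of the software. For example, an assertion should not be used to check whether an argument passed to a function is null. Instead, an if statement should check whether the argument is valid and otherwise throw an exception or return early, as fits the context. Assertions are, however, suited to additional checks on assumptions made in the code, which are supposed to hold. An example is a statement that checks that the stack is not empty after a push operation (that is, a check on an “invariant”).
So fail fast, be assertive, and you are on the way to developing more robust code.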
--------------------------------------------------------------------------------
via:http://www.opensourceforu.com/2011/12/joy-of-programming-fail-fast/
Translator: [zpl1025](https://github.com/zpl1025) Proofreader: [校对者ID](https://github.com/校对者ID)
This article was originally translated by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](http://linux.cn/)