mirror of
https://github.com/LCTT/TranslateProject.git
synced 2025-01-13 22:30:37 +08:00
Merge pull request #14501 from lujun9972/add-MjAxOTA2MzAgQ29tcHV0YXRpb25hbCBQaG90b2dyYXBoeS5tZAo=
Three selected articles
commit
f3ef43ebf1
100
sources/talk/20190630 Data Still Dominates.md
Normal file
@ -0,0 +1,100 @@
|
||||
[#]: collector: (lujun9972)
|
||||
[#]: translator: ( )
|
||||
[#]: reviewer: ( )
|
||||
[#]: publisher: ( )
|
||||
[#]: url: ( )
|
||||
[#]: subject: (Data Still Dominates)
|
||||
[#]: via: (https://theartofmachinery.com/2019/06/30/data_still_dominates.html)
|
||||
[#]: author: (Simon Arneaud https://theartofmachinery.com)
|
||||
|
||||
Data Still Dominates
|
||||
======
|
||||
|
||||
Here’s [a quote from Linus Torvalds in 2006][1]:
|
||||
|
||||
> I’m a huge proponent of designing your code around the data, rather than the other way around, and I think it’s one of the reasons git has been fairly successful… I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important. Bad programmers worry about the code. Good programmers worry about data structures and their relationships.
|
||||
|
||||
Which sounds a lot like [Eric Raymond’s “Rule of Representation” from 2003][2]:
|
||||
|
||||
> Fold knowledge into data, so program logic can be stupid and robust.
|
||||
|
||||
Which was just his summary of ideas like [this one from Rob Pike in 1989][3]:
|
||||
|
||||
> Data dominates. If you’ve chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.
|
||||
|
||||
Which cites [Fred Brooks from 1975][4]:
|
||||
|
||||
> ### Representation is the Essence of Programming
|
||||
>
|
||||
> Beyond craftsmanship lies invention, and it is here that lean, spare, fast programs are born. Almost always these are the result of strategic breakthrough rather than tactical cleverness. Sometimes the strategic breakthrough will be a new algorithm, such as the Cooley-Tukey Fast Fourier Transform or the substitution of an n log n sort for an n² set of comparisons.
|
||||
>
|
||||
> Much more often, strategic breakthrough will come from redoing the representation of the data or tables. This is where the heart of your program lies. Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious.
|
||||
|
||||
So, smart people have been saying this again and again for nearly half a century: focus on the data first. But sometimes it feels like the most famous piece of smart programming advice that everyone forgets.
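
As a small illustration of the "fold knowledge into data" idea, here is a minimal sketch. The shipping zones and rates are made up for the example and do not come from any of the quoted sources. The knowledge lives in a table, so the function that uses it stays trivial:

```javascript
// Hypothetical shipping rates: changing prices or adding a zone means
// editing this table, not the logic below.
const SHIPPING_RATES = new Map([
  ["domestic", { base: 5, perKg: 1 }],
  ["europe", { base: 12, perKg: 3 }],
  ["worldwide", { base: 20, perKg: 5 }],
]);

function shippingCost(zone, weightKg) {
  const rate = SHIPPING_RATES.get(zone);
  if (rate === undefined) {
    throw new Error(`Unknown shipping zone: ${zone}`);
  }
  return rate.base + rate.perKg * weightKg;
}

console.log(shippingCost("europe", 2)); // 12 + 3 * 2 = 18
```

The equivalent `if`/`else` chain would bury the same numbers inside control flow, which is harder to review and to extend.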
|
||||
|
||||
Let me give some real examples.
|
||||
|
||||
### The Highly Scalable System that Couldn’t
|
||||
|
||||
This system was designed from the start to handle CPU-intensive loads with incredible scalability. Nothing was synchronous. Everything was done with callbacks, task queues and worker pools.
|
||||
|
||||
But there were two problems: The first was that the “CPU-intensive load” turned out not to be that CPU-intensive after all — a single task took a few milliseconds at worst. So most of the architecture was doing more harm than good. The second problem was that although it sounded like a highly scalable distributed system, it wasn’t one — it only ran on one machine. Why? Because all communication between asynchronous components was done using files on the local filesystem, which was now the bottleneck for any scaling. The original design didn’t say much about data at all, except to advocate local files in the name of “simplicity”. Most of the document was about all the extra architecture that was “obviously” needed to handle the “CPU-intensiveness” of the load.
|
||||
|
||||
### The Service-Oriented Architecture that was Still Data-Oriented
|
||||
|
||||
This system followed a microservices design, made up of single-purpose apps with REST-style APIs. One component was a database that stored documents (basically responses to standard forms, and other electronic paperwork). Naturally it exposed an API for basic storage and retrieval, but pretty quickly there was a need for more complex search functionality. The designers felt that adding this search functionality to the existing document API would have gone against the principles of microservices design. They could talk about “search” as being a different kind of service from “get/put”, so their architecture shouldn’t couple them together. Besides, the tool they were planning to use for search indexing was separate from the database itself, so creating a new service made sense for implementation, too.
|
||||
|
||||
In the end, a search API was created containing a search index that was essentially a duplicate of the data in the main database. This data was being updated dynamically, so any component that mutated document data through the main database API had to also update the search API. It’s impossible to do this with REST APIs without race conditions, so the two sets of data kept going out of sync every now and then, anyway.
|
||||
|
||||
Despite what the architecture diagram promised, the two APIs were tightly coupled through their data dependencies. Later on it was recognised that the search index should be an implementation detail of a unified document service, and this made the system much more maintainable. “Do one thing” works at the data level, not the verb level.
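
A minimal sketch of what "the search index as an implementation detail" could look like. This is an in-memory toy, not the system described above: the class name, the word-based index and the two `Map`s standing in for the database and the search tool are all assumptions made for illustration.

```javascript
// Split text into a set of lowercase words (a deliberately naive tokenizer).
function wordsOf(text) {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

// The index is private to the service, so no caller can update the
// documents without the index being updated in the same operation.
class DocumentService {
  #documents = new Map(); // id -> document text
  #index = new Map(); // word -> Set of document ids

  put(id, text) {
    this.delete(id); // drop stale index entries before re-indexing
    this.#documents.set(id, text);
    for (const word of wordsOf(text)) {
      if (!this.#index.has(word)) this.#index.set(word, new Set());
      this.#index.get(word).add(id);
    }
  }

  get(id) {
    return this.#documents.get(id);
  }

  delete(id) {
    const oldText = this.#documents.get(id);
    if (oldText === undefined) return;
    for (const word of wordsOf(oldText)) {
      const ids = this.#index.get(word);
      ids.delete(id);
      if (ids.size === 0) this.#index.delete(word);
    }
    this.#documents.delete(id);
  }

  search(word) {
    return [...(this.#index.get(word.toLowerCase()) ?? [])].sort();
  }
}

const service = new DocumentService();
service.put("form-1", "Leave request for July");
service.put("form-2", "Expense request");
service.put("form-1", "Leave approved"); // the update re-indexes form-1
console.log(service.search("request")); // only form-2 still matches
```

In a real deployment the two writes would still need a transaction or a derived, rebuildable index, but the point stands: one owner for one set of data.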
|
||||
|
||||
### The Fantastically Modular and Configurable Ball of Mud
|
||||
|
||||
This system was a kind of automated deployment pipeline. The original designers wanted to make a tool that was flexible enough to solve deployment problems across the company. It was written as a set of pluggable components, with a configuration file system that not only configured the components, but acted as a [DSL][5] for programming how the components fitted into the pipeline.
|
||||
|
||||
Fast forward a few years and it had turned into “that program”. There was a long list of known bugs that no one was ever fixing. No one wanted to touch the code out of fear of breaking things. No one used any of the flexibility of the DSL. Everyone who used the program copy-pasted the same known-working configuration that everyone else used.
|
||||
|
||||
What had gone wrong? Although the original design document used words like “modular”, “decoupled”, “extensible” and “configurable” a lot, it never said anything about data. So, data dependencies between components ended up being handled in an ad-hoc way using a globally shared blob of JSON. Over time, components made more and more undocumented assumptions about what was in or not in the JSON blob. Sure, the DSL allowed rearranging components into any order, but most configurations didn’t work.
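
One way to make such data dependencies explicit, sketched under assumptions: the component names and the `needs`/`provides` fields below are hypothetical and are not taken from the pipeline described above. Each component declares which keys it reads and writes, so a configuration can be checked before anything runs:

```javascript
// Hypothetical components: each one states which keys it reads and writes.
const COMPONENTS = new Map([
  ["checkout", { needs: [], provides: ["sourceDir"] }],
  ["build", { needs: ["sourceDir"], provides: ["artifact"] }],
  ["deploy", { needs: ["artifact"], provides: ["releaseUrl"] }],
]);

// Returns a list of problems; an empty list means the order is valid.
function validatePipeline(order) {
  const available = new Set();
  const problems = [];
  for (const name of order) {
    const component = COMPONENTS.get(name);
    if (component === undefined) {
      problems.push(`unknown component "${name}"`);
      continue;
    }
    for (const key of component.needs) {
      if (!available.has(key)) {
        problems.push(`"${name}" needs "${key}", which nothing before it provides`);
      }
    }
    for (const key of component.provides) available.add(key);
  }
  return problems;
}

console.log(validatePipeline(["checkout", "build", "deploy"])); // no problems
console.log(validatePipeline(["build", "checkout", "deploy"])); // build runs too early
```

With declarations like these, "most configurations don't work" becomes an error message at load time instead of an undocumented assumption about a shared JSON blob.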
|
||||
|
||||
### Lessons
|
||||
|
||||
I chose these three examples because they’re easy to explain, not to pick on others. I once tried to build a website, and failed because I instead tried to build some cringe-worthy XML database that didn’t even solve the data problems I had. Then there’s the project that turned into a broken mockery of half the functionality of `make`, again because I didn’t think about what I really needed. I’ve previously written a post about the time I wrote [a castle-in-the-sky OOP class hierarchy that should have been encoded in data instead][6].
|
||||
|
||||
Update:
|
||||
|
||||
Apparently many people still thought I wrote this to make fun of others. People who’ve actually worked with me will know I’m much more interested in the things I’m fixing than in blaming the people who did most of the work building them, but, okay, here’s what I think of the engineers involved.
|
||||
|
||||
Honestly, the first example obviously happened because the designer was more interested in bringing a science project to work than in solving the problem at hand. Most of us have done that (mea culpa), but it’s really annoying to our colleagues who’ll probably have to help maintain these projects once we’re bored of them. If this sounds like you, please don’t get offended; please just stop. (I’d still rather work on the single-node distributed system than anything built around my “XML database”.)
|
||||
|
||||
There’s nothing personal in the second example. Sometimes it feels like everyone is talking about how wonderful it is to split up services, but no one is talking about exactly when not to. People are learning the hard way all the time.
|
||||
|
||||
The third example was actually from some of the smartest people I’ve ever had the chance to work with.
|
||||
|
||||
(End update.)
|
||||
|
||||
“Does this talk about the problems created by data?” turns out to be a pretty useful litmus test for good systems design. It’s also pretty handy for detecting false expert advice. The hard, messy systems design problems are data problems, so false experts love to ignore them. They’ll show you a wonderfully beautiful architecture, but without talking about what kind of data it’s appropriate for, and (crucially) what kind of data it isn’t.
|
||||
|
||||
For example, a false expert might tell you that you should use a pub/sub system because pub/sub systems are loosely coupled, and loosely coupled components are more maintainable. That sounds nice and results in pretty diagrams, but it’s backwards thinking. Pub/sub doesn’t _make_ your components loosely coupled; pub/sub _is_ loosely coupled, which may or may not match your data needs.
|
||||
|
||||
On the flip side, a well-designed data-oriented architecture goes a long way. Functional programming, service meshes, RPCs, design patterns, event loops, whatever, all have their merits, but personally I’ve seen tools like [boring old databases][7] be responsible for a lot more successfully shipped software.
|
||||
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
via: https://theartofmachinery.com/2019/06/30/data_still_dominates.html
|
||||
|
||||
Author: [Simon Arneaud][a]
Topic selection: [lujun9972][b]
Translator: [translator ID](https://github.com/译者ID)
Proofreader: [proofreader ID](https://github.com/校对者ID)

This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux China](https://linux.cn/)
|
||||
|
||||
[a]: https://theartofmachinery.com
|
||||
[b]: https://github.com/lujun9972
|
||||
[1]: https://lwn.net/Articles/193245/
|
||||
[2]: http://www.catb.org/~esr/writings/taoup/html/ch01s06.html
|
||||
[3]: http://doc.cat-v.org/bell_labs/pikestyle
|
||||
[4]: https://archive.org/stream/mythicalmanmonth00fred/mythicalmanmonth00fred_djvu.txt
|
||||
[5]: https://martinfowler.com/books/dsl.html
|
||||
[6]: https://theartofmachinery.com/2016/06/21/code_vs_data.html
|
||||
[7]: https://theartofmachinery.com/2017/10/28/rdbs_considered_useful.html
|
178
sources/tech/20190625 The cost of JavaScript in 2019 - V8.md
Normal file
@ -0,0 +1,178 @@
|
||||
[#]: collector: (lujun9972)
|
||||
[#]: translator: ( )
|
||||
[#]: reviewer: ( )
|
||||
[#]: publisher: ( )
|
||||
[#]: url: ( )
|
||||
[#]: subject: (The cost of JavaScript in 2019 · V8)
|
||||
[#]: via: (https://v8.dev/blog/cost-of-javascript-2019)
|
||||
[#]: author: (Addy Osmani https://twitter.com/addyosmani)
|
||||
|
||||
The cost of JavaScript in 2019 · V8
|
||||
======
|
||||
**Note:** If you prefer watching a presentation over reading articles, then enjoy the video below! If not, skip the video and read on.
|
||||
|
||||
[“The cost of JavaScript”][1] as presented by Addy Osmani at #PerfMatters Conference 2019.
|
||||
|
||||
One large change to [the cost of JavaScript][2] over the last few years has been an improvement in how fast browsers can parse and compile script. **In 2019, the dominant costs of processing scripts are now download and CPU execution time.**
|
||||
|
||||
User interaction can be delayed if the browser’s main thread is busy executing JavaScript, so optimizing bottlenecks with script execution time and network can be impactful.
|
||||
|
||||
### Actionable high-level guidance
|
||||
|
||||
What does this mean for web developers? Parsing and compiling are **no longer as slow** as we once thought. The three things to focus on for JavaScript bundles are:
|
||||
|
||||
* **Improve download time**
  * Keep your JavaScript bundles small, especially for mobile devices. Small bundles improve download speeds, lower memory usage, and reduce CPU costs.
  * Avoid having just a single large bundle; if a bundle exceeds ~50–100 kB, split it up into separate smaller bundles. (With HTTP/2 multiplexing, multiple request and response messages can be in flight at the same time, reducing the overhead of additional requests.)
  * On mobile you’ll want to ship much less especially because of network speeds, but also to keep plain memory usage low.
* **Improve execution time**
  * Avoid [Long Tasks][3] that can keep the main thread busy and can push out how soon pages are interactive. Post-download, script execution time is now a dominant cost.
* **Avoid large inline scripts** (as they’re still parsed and compiled on the main thread). A good rule of thumb is: if the script is over 1 kB, avoid inlining it (also because 1 kB is when [code caching][4] kicks in for external scripts).
### Why do download and execution time matter?
|
||||
|
||||
Why is it important to optimize download and execution times? Download times are critical for low-end networks. Despite the growth in 4G (and even 5G) across the world, our [effective connection types][5] remain inconsistent with many of us running into speeds that feel like 3G (or worse) when we’re on the go.
|
||||
|
||||
JavaScript execution time is important for phones with slow CPUs. Due to differences in CPU, GPU, and thermal throttling, there are huge disparities between the performance of high-end and low-end phones. This matters for the performance of JavaScript, as execution is CPU-bound.
|
||||
|
||||
In fact, of the total time a page spends loading in a browser like Chrome, anywhere up to 30% of that time can be spent in JavaScript execution. Below is a page load from a site with a pretty typical workload (Reddit.com) on a high-end desktop machine:

![][6]

JavaScript processing represents 10–30% of time spent in V8 during page load.
|
||||
|
||||
On mobile, it takes 3–4× longer for a median phone (Moto G4) to execute Reddit’s JavaScript compared to a high-end device (Pixel 3), and over 6× as long on a low-end device (the <$100 Alcatel 1X):

![][7]

The cost of Reddit’s JavaScript across a few different device classes (low-end, average, and high-end)
|
||||
|
||||
**Note:** Reddit has different experiences for desktop and mobile web, and so the MacBook Pro results cannot be compared to the other results.
|
||||
|
||||
When you’re trying to optimize JavaScript execution time, keep an eye out for [Long Tasks][8] that might be monopolizing the UI thread for long periods of time. These can block critical tasks from executing even if the page looks visually ready. Break these up into smaller tasks. By splitting up your code and prioritizing the order in which it is loaded, you can get pages interactive faster and hopefully have lower input latency.

![][9]

Long tasks monopolize the main thread. You should break them up.
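
One common way to break up a long task is to process work in slices and yield to the event loop between them. The sketch below is only an illustration of that idea: `processInChunks` and its `budgetMs` parameter are hypothetical names, and the 10 ms default is an assumption chosen to stay well under the 50 ms threshold that the Long Tasks specification uses.

```javascript
// Process `items` in slices, yielding to the event loop between slices so
// that input handlers and rendering get a chance to run in between.
function processInChunks(items, handleItem, budgetMs = 10) {
  return new Promise((resolve, reject) => {
    let index = 0;
    function runSlice() {
      const sliceStart = Date.now();
      try {
        while (index < items.length) {
          handleItem(items[index]);
          index += 1;
          // Always handle at least one item, then stop once the budget is used.
          if (Date.now() - sliceStart >= budgetMs) break;
        }
      } catch (error) {
        reject(error);
        return;
      }
      if (index < items.length) {
        setTimeout(runSlice, 0); // yield, then continue with the next slice
      } else {
        resolve();
      }
    }
    runSlice();
  });
}

const squares = [];
processInChunks([1, 2, 3, 4], (n) => squares.push(n * n)).then(() => {
  console.log(squares);
});
```

The first slice runs synchronously, so small inputs finish without any delay; only work that exceeds the budget is spread over several tasks.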
|
||||
|
||||
### What has V8 done to improve parse/compile?
|
||||
|
||||
Raw JavaScript parsing speed in V8 has increased 2× since Chrome 60. At the same time, raw parse (and compile) cost has become less visible/important due to other optimization work in Chrome that parallelizes it.
|
||||
|
||||
V8 has reduced the amount of parsing and compilation work on the main thread by an average of 40% (e.g. 46% on Facebook, 62% on Pinterest) with the highest improvement being 81% (YouTube), by parsing and compiling on a worker thread. This is in addition to the existing off-main-thread streaming parse/compile.

![][10]

V8 parse times across different versions
|
||||
|
||||
We can also visualize the CPU time impact of these changes across different versions of V8 across Chrome releases. In the same amount of time it took Chrome 61 to parse Facebook’s JS, Chrome 75 can now parse both Facebook’s JS AND 6 times Twitter’s JS.

![][11]

In the time it took Chrome 61 to parse Facebook’s JS, Chrome 75 can now parse both Facebook’s JS and 6 times Twitter’s JS.
|
||||
|
||||
Let’s dive into how these changes were unlocked. In short, script resources can be streaming-parsed and -compiled on a worker thread, meaning:
|
||||
|
||||
* V8 can parse+compile JavaScript without blocking the main thread.
|
||||
* Streaming starts once the full HTML parser encounters a `<script>` tag. For parser-blocking scripts, the HTML parser yields, while for async scripts it continues.
|
||||
* For most real-world connection speeds, V8 parses faster than download, so V8 is done parsing+compiling a few milliseconds after the last script bytes are downloaded.
|
||||
|
||||
|
||||
|
||||
The not-so-short explanation is… Much older versions of Chrome would download a script in full before beginning to parse it, which is a straightforward approach but it doesn’t fully utilize the CPU. Between versions 41 and 68, Chrome started parsing async and deferred scripts on a separate thread as soon as the download begins.

![][12]

Scripts arrive in multiple chunks. V8 starts streaming once it’s seen at least 30 kB.
|
||||
|
||||
In Chrome 71, we moved to a task-based setup where the scheduler could parse multiple async/deferred scripts at once. The impact of this change was a ~20% reduction in main thread parse time, yielding an overall ~2% improvement in TTI/FID as measured on real-world websites.

![][13]

Chrome 71 moved to a task-based setup where the scheduler could parse multiple async/deferred scripts at once.
|
||||
|
||||
In Chrome 72, we switched to using streaming as the main way to parse: now also regular synchronous scripts are parsed that way (not inline scripts though). We also stopped canceling task-based parsing if the main thread needs it, since that just unnecessarily duplicates any work already done.
|
||||
|
||||
[Previous versions of Chrome][14] supported streaming parsing and compilation where the script source data coming in from the network had to make its way to Chrome’s main thread before it would be forwarded to the streamer.
|
||||
|
||||
This often resulted in the streaming parser waiting for data that arrived from the network already, but had not yet been forwarded to the streaming task as it was blocked by other work on the main thread (like HTML parsing, layout, or JavaScript execution).
|
||||
|
||||
We are now experimenting with starting parsing on preload, and the main-thread-bounce was a blocker for this beforehand.
|
||||
|
||||
Leszek Swirski’s BlinkOn presentation goes into more detail:
|
||||
|
||||
[“Parsing JavaScript in zero* time”][15] as presented by Leszek Swirski at BlinkOn 10.
|
||||
|
||||
In addition to the above, there was [an issue in DevTools][16] that rendered the entire parser task in a way that hints that it’s using CPU (full block). However, the parser blocks whenever it’s starved for data (that needs to go over the main thread). Since we moved from a single streamer thread to streaming tasks, this became really obvious. Here’s what you used to see in Chrome 69:

![][17]

The DevTools issue that rendered the entire parser task in a way that hints that it’s using CPU (full block)
|
||||
|
||||
The “parse script” task is shown to take 1.08 seconds. However, parsing JavaScript isn’t really that slow! Most of that time is spent doing nothing except waiting for data to go over the main thread.
|
||||
|
||||
Chrome 76 paints a different picture:

![][18]

In Chrome 76, parsing is broken up into multiple smaller streaming tasks.
|
||||
|
||||
In general, the DevTools performance pane is great for getting a high-level overview of what’s happening on your page. For detailed V8-specific metrics such as JavaScript parse and compile times, we recommend [using Chrome Tracing with Runtime Call Stats (RCS)][19]. In RCS results, `Parse-Background` and `Compile-Background` tell you how much time was spent parsing and compiling JavaScript off the main thread, whereas `Parse` and `Compile` capture the main thread metrics.

![][20]
|
||||
|
||||
### What is the real-world impact of these changes?
|
||||
|
||||
Let’s look at some examples of real-world sites and how script streaming applies.

![][21]

Main thread vs. worker thread time spent parsing and compiling Reddit’s JS on a MacBook Pro
|
||||
|
||||
Reddit.com has several 100 kB+ bundles which are wrapped in outer functions causing lots of [lazy compilation][22] on the main thread. In the above chart, the main thread time is all that really matters because keeping the main thread busy can delay interactivity. Reddit spends most of its time on the main thread with minimum usage of the Worker/Background thread.
|
||||
|
||||
They’d benefit from splitting up some of their larger bundles into smaller ones (e.g. 50 kB each) without the wrapping to maximize parallelization, so that each bundle could be streaming-parsed + compiled separately and reduce main thread parse/compile during start-up.

![][23]

Main thread vs. worker thread time spent parsing and compiling Facebook’s JS on a MacBook Pro
|
||||
|
||||
We can also look at a site like Facebook.com. Facebook loads ~6MB of compressed JS across ~292 requests, some of it async, some preloaded, and some fetched with a lower priority. A lot of their scripts are very small and granular — this can help with overall parallelization on the Background/Worker thread as these smaller scripts can be streaming-parsed/compiled at the same time.
|
||||
|
||||
Note, you’re probably not Facebook and likely don’t have a long-lived app like Facebook or Gmail where this much script may be justifiable on desktop. However, in general, keep your bundles coarse and only load what you need.
|
||||
|
||||
Although most JavaScript parsing and compilation work can happen in a streaming fashion on a background thread, some work still has to happen on the main thread. When the main thread is busy, the page can’t respond to user input. Do keep an eye on the impact both downloading and executing code has on your UX.
|
||||
|
||||
**Note:** Currently, not all JavaScript engines and browsers implement script streaming as a loading optimization. We still believe the overall guidance here leads to good user experiences across the board.
|
||||
|
||||
### The cost of parsing JSON
|
||||
|
||||
Because the JSON grammar is much simpler than JavaScript’s grammar, JSON can be parsed more efficiently than JavaScript. This knowledge can be applied to improve start-up performance for web apps that ship large JSON-like configuration object literals (such as inline Redux stores). Instead of inlining the data as a JavaScript object literal, like so:
|
||||
|
||||
```
const data = { foo: 42, bar: 1337 };
```
|
||||
|
||||
…it can be represented in JSON-stringified form, and then JSON-parsed at runtime:
|
||||
|
||||
```
const data = JSON.parse('{"foo":42,"bar":1337}');
```
|
||||
|
||||
As long as the JSON string is only evaluated once, the `JSON.parse` approach is much faster compared to the JavaScript object literal, especially for cold loads. A good rule of thumb is to apply this technique for objects of 10 kB or larger — but as always with performance advice, measure the actual impact before making any changes.
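
Writing the escaped string by hand is error-prone, so this conversion is usually done by a build step. The helper below is a hypothetical sketch (`toJsonParseSource` is not part of any tool mentioned in the article). It assumes the data is plain JSON-compatible data (no functions, `undefined` or `Date` objects) and that the target engine accepts ES2019 string literals.

```javascript
// Build-time helper: turn plain data into JavaScript source text that
// parses it with JSON.parse at runtime.
function toJsonParseSource(data) {
  const json = JSON.stringify(data);
  // Stringifying the JSON text a second time produces a correctly escaped,
  // double-quoted JavaScript string literal.
  return `JSON.parse(${JSON.stringify(json)})`;
}

const source = toJsonParseSource({ foo: 42, bar: 1337 });
console.log(source); // JSON.parse("{\"foo\":42,\"bar\":1337}")
```

The generated text can then be emitted into the bundle in place of the original object literal.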
|
||||
|
||||
There’s an additional risk when using plain object literals for large amounts of data: they could be parsed twice!
|
||||
|
||||
1. The first pass happens when the literal gets preparsed.
|
||||
2. The second pass happens when the literal gets lazy-parsed.
|
||||
|
||||
|
||||
|
||||
The first pass can’t be avoided. Luckily, the second pass can be avoided by placing the object literal at the top-level, or within a [PIFE][24].
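
For illustration, here is the same literal in a lazily compiled function and in a PIFE, which the linked preparser article describes as a parenthesized function expression that V8 treats as likely to be called right away. The function and variable names are made up; the two forms produce the same value and differ only in how the engine may parse them.

```javascript
// Ordinary function: the body is preparsed first, and the literal is fully
// parsed only when the function is eventually called (the second pass).
function loadConfigLazily() {
  return { theme: "dark", retries: 3 };
}

// PIFE: the opening parenthesis before `function` is the hint that lets
// the engine parse and compile the body eagerly.
const config = (function () {
  return { theme: "dark", retries: 3 };
})();

console.log(config.retries === loadConfigLazily().retries); // same data either way
```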
|
||||
|
||||
### What about parse/compile on repeat visits?
|
||||
|
||||
V8’s (byte)code-caching optimization can help. When a script is first requested, Chrome downloads it and gives it to V8 to compile. It also stores the file in the browser’s on-disk cache. When the JS file is requested a second time, Chrome takes the file from the browser cache and once again gives it to V8 to compile. This time, however, the compiled code is serialized, and is attached to the cached script file as metadata.

![][25]

Visualization of how code caching works in V8
|
||||
|
||||
The third time, Chrome takes both the file and the file’s metadata from the cache, and hands both to V8. V8 deserializes the metadata and can skip compilation. Code caching kicks in if the first two visits happen within 72 hours. Chrome also has eager code caching if a service worker is used to cache scripts. You can read more about code caching in [code caching for web developers][4].
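
The three visits can be summarized as a toy state machine. This is only a model of the description above, not Chrome or V8 code: the cache entry shape, the function name and the choice to restart the 72-hour window after a late second visit are all assumptions of the sketch.

```javascript
const CODE_CACHE_WINDOW_MS = 72 * 60 * 60 * 1000; // 72 hours

// `cache` maps a URL to { firstSeen, codeCache }; `now` is a timestamp in ms.
function loadScript(cache, url, now) {
  const entry = cache.get(url);
  if (entry === undefined) {
    // First visit: download, compile, and store the file in the cache.
    cache.set(url, { firstSeen: now, codeCache: null });
    return "cold";
  }
  if (entry.codeCache !== null) {
    // Third visit onwards: deserialize the cached code, skip compilation.
    return "hot";
  }
  if (now - entry.firstSeen <= CODE_CACHE_WINDOW_MS) {
    // Second visit within the window: compile again, then attach the
    // serialized code to the cached file as metadata.
    entry.codeCache = `serialized code for ${url}`;
    return "warm";
  }
  // Second visit came too late: compile without caching (assumed behavior).
  entry.firstSeen = now;
  return "warm-uncached";
}

const HOUR = 60 * 60 * 1000;
const cache = new Map();
console.log(loadScript(cache, "/app.js", 0)); // cold
console.log(loadScript(cache, "/app.js", 1 * HOUR)); // warm
console.log(loadScript(cache, "/app.js", 2 * HOUR)); // hot
```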
|
||||
|
||||
Download and execution time are the primary bottlenecks for loading scripts in 2019. Aim for a small bundle of synchronous (inline) scripts for your above-the-fold content with one or more deferred scripts for the rest of the page. Break down your large bundles so you focus on only shipping code the user needs when they need it. This maximizes parallelization in V8.
|
||||
|
||||
On mobile, you’ll want to ship a lot less script because of network, memory consumption and execution time for slower CPUs. Balance latency with cacheability to maximize the amount of parsing and compilation work that can happen off the main thread.
|
||||
|
||||
### Further reading
|
||||
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
via: https://v8.dev/blog/cost-of-javascript-2019
|
||||
|
||||
Author: [Addy Osmani][a]
Topic selection: [lujun9972][b]
Translator: [translator ID](https://github.com/译者ID)
Proofreader: [proofreader ID](https://github.com/校对者ID)

This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux China](https://linux.cn/)
|
||||
|
||||
[a]: https://twitter.com/addyosmani
|
||||
[b]: https://github.com/lujun9972
|
||||
[1]: https://www.youtube.com/watch?v=X9eRLElSW1c
|
||||
[2]: https://medium.com/@addyosmani/the-cost-of-javascript-in-2018-7d8950fbb5d4
|
||||
[3]: https://w3c.github.io/longtasks/
|
||||
[4]: https://v8.dev/blog/code-caching-for-devs
|
||||
[5]: https://developer.mozilla.org/en-US/docs/Web/API/NetworkInformation/effectiveType
|
||||
[6]: https://v8.dev/_img/cost-of-javascript-2019/reddit-js-processing.svg
|
||||
[7]: https://v8.dev/_img/cost-of-javascript-2019/reddit-js-processing-devices.svg
|
||||
[8]: https://web.dev/long-tasks-devtools/
|
||||
[9]: https://v8.dev/_img/cost-of-javascript-2019/long-tasks.png
|
||||
[10]: https://v8.dev/_img/cost-of-javascript-2019/chrome-js-parse-times.svg
|
||||
[11]: https://v8.dev/_img/cost-of-javascript-2019/js-parse-times-websites.svg
|
||||
[12]: https://v8.dev/_img/cost-of-javascript-2019/script-streaming-1.svg
|
||||
[13]: https://v8.dev/_img/cost-of-javascript-2019/script-streaming-2.svg
|
||||
[14]: https://v8.dev/blog/v8-release-75#script-streaming-directly-from-network
|
||||
[15]: https://www.youtube.com/watch?v=D1UJgiG4_NI
|
||||
[16]: https://bugs.chromium.org/p/chromium/issues/detail?id=939275
|
||||
[17]: https://v8.dev/_img/cost-of-javascript-2019/devtools-69.png
|
||||
[18]: https://v8.dev/_img/cost-of-javascript-2019/devtools-76.png
|
||||
[19]: https://v8.dev/docs/rcs
|
||||
[20]: https://v8.dev/_img/cost-of-javascript-2019/rcs.png
|
||||
[21]: https://v8.dev/_img/cost-of-javascript-2019/reddit-main-thread.svg
|
||||
[22]: https://v8.dev/blog/preparser
|
||||
[23]: https://v8.dev/_img/cost-of-javascript-2019/facebook-main-thread.svg
|
||||
[24]: https://v8.dev/blog/preparser#pife
|
||||
[25]: https://v8.dev/_img/cost-of-javascript-2019/code-caching.png
|
539
sources/tech/20190630 Computational Photography.md
Normal file
@ -0,0 +1,539 @@
|
||||
[#]: collector: (lujun9972)
|
||||
[#]: translator: ( )
|
||||
[#]: reviewer: ( )
|
||||
[#]: publisher: ( )
|
||||
[#]: url: ( )
|
||||
[#]: subject: (Computational Photography)
|
||||
[#]: via: (https://vas3k.com/blog/computational_photography/)
|
||||
[#]: author: (vas3k https://vas3k.com/)
|
||||
|
||||
Computational Photography
|
||||
======
|
||||
![](https://i.vas3k.ru/full/853.png)
|
||||
|
||||
It's impossible to imagine a smartphone presentation today without dancing around its camera. Google makes Pixel shoot in the dark, Huawei zooms like a telescope, Samsung puts lidars inside, and Apple presents the new world's roundest corners. An illegal level of innovation is happening here.
|
||||
|
||||
DSLRs, on the other hand, seem half dead. Sony showers everybody with a new sensor-megapixel rain every year, while manufacturers lazily update the minor version number and keep lying on piles of cash from movie makers. I have a $3000 Nikon on my desk, but I take an iPhone on my travels. Why?
|
||||
|
||||
I went online with this question. There, I saw a lot of debates about "algorithms" and "neural networks", though no one could explain how exactly they affect a photo. Journalists are loudly reading the number of megapixels from press releases, bloggers are shitting down the Internet with more unboxings, and the camera-nerds are overflowing it with "sensual perception of the sensor color palette". Ah, Internet. You gave us access to all the information. Love you.
|
||||
|
||||
Thus, I spent half of my life trying to understand the whole thing on my own. I'll try to explain everything I found in this article, otherwise I'll forget it in a month.
|
||||
|
||||
[📔 Download pdf, epub, mobi (convenient for reading offline)][1] [❤️ Support me][2]
|
||||
|
||||
This article in other languages: [Russian][3]
|
||||
|
||||
### What is Computational Photography?
|
||||
|
||||
Everywhere, including [Wikipedia][4], you get a definition like this: computational photography is digital image capture and processing techniques that use digital computation instead of optical processes. Everything is fine with it except that it's bullshit. It even includes autofocus, but not plenoptic imaging, which has already brought a lot of good stuff to us. The fuzziness of the official definitions kinda indicates that we still have no idea what we are doing.
|
||||
|
||||
Stanford Professor and pioneer of computational photography Marc Levoy (he's also in charge of Google Pixel's camera now) [gives][5] another definition - computational imaging techniques that enhance or extend the capabilities of digital photography in which the output is an ordinary photograph, but one that could not have been taken by a traditional camera. I like it more, and in the article, I will follow this definition.
|
||||
|
||||
So, the smartphones were to blame for everything.
|
||||
|
||||
> Smartphones had no choice but to give life to a new kind of photography — computational
|
||||
|
||||
They had little noisy sensors and tiny slow lenses. According to all the laws of physics, they could only bring us pain and suffering. And they did. Until some devs figured out how to use their strengths to overcome the weaknesses: fast electronic shutters, powerful processors, and software.
|
||||
|
||||
<https://i.vas3k.ru/88h.jpg>
|
||||
|
||||
Most of the significant research in the computational photography field was done in 2005-2015, which counts as yesterday in science. That means that right now, just in front of our eyes and inside our pockets, a new field of knowledge and technology is rising that never existed before.
|
||||
|
||||
<https://i.vas3k.ru/87c.jpg>
|
||||
|
||||
Computational photography isn't just about the bokeh on selfies. A recent photograph of a black hole could not have been taken without computational photography methods. To take such a picture with a standard telescope, we would have to make it the size of the Earth. However, by combining the data of eight radio telescopes at different locations of our Earth-ball and writing [some cool Python scripts][6], we got the world's first picture of the event horizon.
|
||||
|
||||
It's still good for selfies though, don't worry.
|
||||
|
||||
📝 [Computational Photography: Principles and Practice][7]
|
||||
📝 [Marc Levoy: New Techniques in Computational photography][8]
|
||||
|
||||
I'm going to insert links like these in the course of the story. They lead to the rare brilliant articles 📝 or videos 🎥 that I found, and allow you to dive deeper into a topic if you suddenly become interested. Because I physically can't tell you everything in one blog post.
|
||||
|
||||
### The Beginning: Digital Processing
|
||||
|
||||
Let's get back to 2010. Justin Bieber released his first album and the Burj Khalifa had just opened in Dubai, but we couldn't even capture these two great universe events, because our photos were noisy 2-megapixel JPEGs. We got the first irresistible desire to hide the worthlessness of mobile cameras by using "vintage" presets. Instagram came out.
|
||||
|
||||
<https://i.vas3k.ru/88i.jpg>
|
||||
|
||||
### Math and Instagram
|
||||
|
||||
With the release of Instagram, everyone got obsessed with filters. As the man who reverse engineered the X-Pro II, Lo-Fi, and Valencia for, of course, research (hehe) purposes, I still remember that they comprised three components:
|
||||
|
||||
<https://i.vas3k.ru/85k.jpg>
|
||||
|
||||
  * Color settings (Hue, Saturation, Lightness, Contrast, Levels, etc.) are simple coefficients, just like in any presets that photographers have used since ancient times.
|
||||
|
||||
|
||||
|
||||
<https://i.vas3k.ru/85i.jpg>
|
||||
|
||||
  * Tone Mapping is a vector of values, each of which tells us that "red with a hue of 128 should be turned into a hue of 240". It is often represented as a single-pixel picture, like [this one][9]. This is an example for the X-Pro II filter.
|
||||
|
||||
|
||||
|
||||
<https://i.vas3k.ru/85t.jpg>
|
||||
|
||||
* Overlay — translucent picture with dust, grain, vignette, and everything else that can be applied from above to get (not at all, yeah) the banal effect of the old film. Used rarely.
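The three components above can be sketched in a few lines of Python. This is only an illustration, not Instagram's actual code: the function names, the contrast coefficient, the toy lookup table, and the overlay opacity are all made up for the example.

```python
def adjust_contrast(value, coefficient):
    """Color setting: scale the distance from mid-gray by a simple coefficient."""
    return min(255, max(0, round(128 + (value - 128) * coefficient)))


def apply_lut(value, lut):
    """Tone mapping: look the channel value up in a 256-entry table."""
    return lut[value]


def blend_overlay(value, overlay_value, opacity):
    """Overlay: mix in a translucent texture pixel (dust, grain, vignette)."""
    return round(value * (1 - opacity) + overlay_value * opacity)


# Hypothetical "vintage" table: lift the shadows, compress the highlights.
vintage_lut = [round(40 + v * 180 / 255) for v in range(256)]


def vintage_filter(pixel, overlay_pixel):
    """Run one RGB pixel through all three stages."""
    out = []
    for value, overlay_value in zip(pixel, overlay_pixel):
        value = adjust_contrast(value, 1.2)
        value = apply_lut(value, vintage_lut)
        value = blend_overlay(value, overlay_value, 0.1)
        out.append(value)
    return tuple(out)


print(vintage_filter((0, 128, 255), (30, 30, 30)))
```

Pure black no longer stays black and pure white no longer stays white, which is most of what makes a preset look "faded".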
|
||||
|
||||
|
||||
|
||||
Modern filters have not gone far from these three, but have become a little more complicated from the math perspective. With the advent of hardware shaders and [OpenCL][10] on smartphones, they were quickly rewritten for the GPU, and it was considered insanely cool. For 2012, of course. Today any kid can do the same thing [in CSS][11], but he still won't invite a girl to prom.
|
||||
|
||||
However, progress in the area of filters has not stopped there. The guys from [Dehancer][12], for example, are getting very hands-on with non-linear filters. Instead of poor man's tone mapping, they use more posh and complex non-linear transformations, which open up many more opportunities, according to them.
|
||||
|
||||
You can do a lot of things with non-linear transformations, but they are incredibly complex, and we humans are incredibly stupid. As soon as it comes to non-linear transformations, we prefer to go with numerical methods or run neural networks to do our job. The same thing happens here.
|
||||
|
||||
### Automation and Dreams of a "Masterpiece" Button
|
||||
|
||||
When everybody got used to filters, we started to integrate them right into our cameras. History hides which manufacturer was first, but to understand how long ago it was, consider that iOS 5.0, released in 2011, already had a public API for [Auto Enhancing Images][13]. Only Steve Jobs knows how long it was in use before it opened to the public.
|
||||
|
||||
The automation was doing the same thing that any of us does by opening the photo editor — it fixed the lights and shadows, increased the brightness, took away the red eyes, and fixed the face color. Users didn't even know that "dramatically improved camera" was just the merit of a couple of new lines of code.
|
||||
|
||||
<https://i.vas3k.ru/865.jpg>ML Enhance in Pixelmator
|
||||
|
||||
Today, the battles for the Masterpiece button have moved to the machine learning field. Tired of playing with tone mapping, everyone jumped on the [CNN and GAN][14] hype train and started forcing computers to move the sliders for us. In other words, to use an input image to determine a set of optimal parameters that will bring the given image closer to a particular subjective understanding of "good photography". Check out how it's implemented in [Pixelmator Pro][15] and other editors that lure you with the fancy "ML" features on their landing pages. It doesn't always work well, as you can guess. But you can always take the datasets and train your own network to beat these guys, using the links below. Or not.
|
||||
|
||||
📝 [Image Enhancement Papers][16]
|
||||
📝 [DSLR-Quality Photos on Mobile Devices with Deep Convolutional Networks][17]
|
||||
|
||||
### [Stacking 90% success of mobile cameras](#scroll50)
|
||||
|
||||
True computational photography began with stacking — a method of combining several photos on top of each other. It's not a big deal for a smartphone to shoot a dozen pics in half a second. There're no slow mechanical parts in their cameras: the aperture is fixed, and there is an electronic shutter instead of the "moving curtain". The processor simply tells the sensor how many microseconds it should catch the wild photons, and reads the result.
|
||||
|
||||
Technically, the phone can shoot photos at video speed, and it can shoot video at photo resolution, but all of that is throttled by the speed of the bus and the processor. That's why there is always a software limit.
|
||||
|
||||
Stacking has been with us for a while. Even the founding fathers used plugins for Photoshop 7.0 to gather some crazy-sharpened HDR photos or to make a panorama of 18000x600 pixels, and… no one figured out what to do with them next. Good wild times.
|
||||
|
||||
Now, as grown-ups, we call it "[epsilon photography][18]", which means changing one of the camera parameters (exposure, focus, or position) and putting images together to get something that couldn't be captured in one shot. Although, in practice, we call it stacking. Nowadays, 90% of all mobile camera innovations are based on it.
|
||||
|
||||
<https://i.vas3k.ru/85d.jpeg>
|
||||
|
||||
There's a thing many people don't care about, but it's crucial for understanding mobile photography as a whole: **a modern smartphone camera starts taking photos as soon as you open it**. Which is logical, since it has to show the image on the screen somehow. But in addition to that, it saves high-resolution frames to its cyclic buffer and stores them for a couple more seconds. No, not only for the NSA.
|
||||
|
||||
> When you tap the "take a photo" button, the photo has actually already been taken, and the camera is just using the last picture from the buffer
|
||||
|
||||
That's how any mobile camera works today. At least the top ones. Buffering allows implementing not only zero [shutter lag][19], which photographers begged for so long, but even a negative one. When you press the button, the smartphone looks into the past, unloads the last 5-10 photos from the buffer and starts to analyze and combine them furiously. There's no longer any need to wait until the phone snaps shots for HDR or a night mode: it simply picks them up from the buffer, and the user won't even notice.
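The cyclic buffer is easy to picture in code. Here is a minimal sketch, assuming a hypothetical `CameraBuffer` class with made-up callback names; real camera pipelines are obviously far more involved.

```python
from collections import deque


class CameraBuffer:
    """Keeps only the most recent frames, like a cyclic buffer."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest frame when full.
        self.frames = deque(maxlen=capacity)

    def on_new_frame(self, frame):
        """Called for every frame the sensor produces while the app is open."""
        self.frames.append(frame)

    def on_shutter_pressed(self, count):
        """Return the last `count` frames captured *before* the tap."""
        return list(self.frames)[-count:]


buffer = CameraBuffer(capacity=10)
for timestamp in range(25):          # the sensor has been streaming for a while
    buffer.on_new_frame(f"frame@{timestamp}")

burst = buffer.on_shutter_pressed(count=5)
print(burst)
```

By the time the button is tapped, the burst is already sitting in memory, which is where the "negative" shutter lag comes from.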
|
||||
|
||||
In fact, that's how Live Photo is implemented in iPhones, and HTC had it back in 2013 under the strange name [Zoe][20].
|
||||
|
||||
#### [Exposure Stacking HDR and brightness control](#scroll60)
|
||||
|
||||
<https://i.vas3k.ru/85x.jpg>
|
||||
|
||||
The old and hot topic is whether the camera sensors [can capture the entire brightness range available to our eyes][21]. Some people say no, as the eye can see up to 25 [f-stops][22] and even the top full-frame sensor can be stretched out to a maximum of 14. Others call the comparison incorrect, since our eyes are assisted by the brain, which automatically adjusts your pupils and completes the image with its neural networks. So the instantaneous dynamic range of the eye is actually no more than 10-14 f-stops. Too hard. Let's leave these disputes to scientists.
|
||||
|
||||
The fact remains — taking pictures of friends against a bright sky, without HDR, with any mobile camera, you get either a natural sky and dark faces of friends, or natural faces, but completely burned sky.
|
||||
|
||||
The solution was found a long time ago: expand the brightness range using the HDR (high dynamic range) process. When we can't get a wide range of brightness right away, we can do it in three steps (or more). We shoot several pictures with different exposures: a "normal" one, a brighter one, and a darker one. Then we fill in the shady spots using the bright photo, and restore the overexposed spots from the dark one.
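A toy version of that merge can be written in a few lines. The sketch below weights each exposure by how close a pixel is to mid-gray, which is just one simple heuristic chosen for illustration and not necessarily what any phone does; the function names and sample values are invented.

```python
def well_exposedness(value):
    """Weight is highest for mid-gray and falls to near zero at 0 and 255."""
    return 1.0 - abs(value - 127.5) / 127.5 + 1e-6


def fuse_exposures(frames):
    """Merge same-sized grayscale frames (lists of 0..255 values) pixel by pixel."""
    fused = []
    for pixels in zip(*frames):
        weights = [well_exposedness(p) for p in pixels]
        total = sum(weights)
        fused.append(round(sum(w * p for w, p in zip(weights, pixels)) / total))
    return fused


# One scanline: [shadow, midtone, bright sky], shot three times.
dark = [2, 60, 170]      # underexposed: sky detail survives
normal = [20, 128, 255]  # sky is clipped to pure white
bright = [90, 220, 255]  # overexposed: shadow detail survives

print(fuse_exposures([dark, normal, bright]))
```

The clipped sky pixels get almost no weight, so the sky comes from the dark frame, while the shadow is pulled up mostly by the bright frame.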
|
||||
|
||||
The last thing that needs to be done here is solving the problem of automatic bracketing: how far do we shift the exposure of each photo so as not to overdo it? However, any second-year tech student can do it today using some Python libraries.
|
||||
|
||||
<https://i.vas3k.ru/86t.jpg>
|
||||
|
||||
The latest iPhone, Pixel and Galaxy turn on HDR mode automatically when a simple algorithm inside their cameras detects you're shooting on a sunny day. You can even see how the phone switches to buffer mode to save shifted images: the fps drops, and the picture on the screen becomes juicier. That moment of switching is clearly visible every time on my iPhone X. Take a closer look at your smartphone next time.
|
||||
|
||||
<https://i.vas3k.ru/87u.png>
|
||||
|
||||
The main disadvantage of HDR with exposure bracketing is its incredible uselessness in poor lighting. Even in the light of a home lamp, the images come out so dark that even the machine cannot align and stack them together. To solve the problem, Google announced a different approach to HDR in a Nexus smartphone back in 2013. It used time stacking.
|
||||
|
||||
#### [Time Stacking Long exposure and time lapse](#scroll70)
|
||||
|
||||
<https://i.vas3k.ru/85v.jpg>
|
||||
|
||||
Time stacking allows you to get a long-exposure look with a series of short shots. This approach was pioneered by the guys who liked to take pictures of star trails in the night sky. Even with a tripod, it was impossible to shoot such pictures by opening the shutter once for two hours. You had to calculate all the settings beforehand, and the slightest shake would spoil the whole shot. So they decided to divide the process into intervals of a few minutes and stack the pictures together later in Photoshop.
|
||||
|
||||
<https://i.vas3k.ru/86u.jpg>These star patterns are always glued together from a series of photos. That makes it easier to control exposure
|
||||
|
||||
Thus, the camera never actually shot with a long exposure; we simulated the effect by combining several consecutive shots. Smartphone apps have been using this trick for a long time, but now almost every manufacturer has added it to the standard camera tools.
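Combining the shots can mean different things depending on the look you want. Below is a small sketch with two blend modes that are my own choice for the example: averaging for a smooth long-exposure look, and a per-pixel maximum (the "lighten" blend from photo editors) for star trails.

```python
def stack_frames(frames, mode="mean"):
    """Combine same-sized grayscale frames pixel by pixel."""
    if mode == "mean":      # smooth long-exposure look (water, crowds)
        return [sum(pixels) / len(pixels) for pixels in zip(*frames)]
    if mode == "max":       # "lighten" blend, keeps every bright trail
        return [max(pixels) for pixels in zip(*frames)]
    raise ValueError(f"unknown mode: {mode}")


# A star (value 200) drifting one pixel per frame across a dark sky (value 10).
frames = [
    [200, 10, 10, 10],
    [10, 200, 10, 10],
    [10, 10, 200, 10],
]

trail = stack_frames(frames, mode="max")
smooth = stack_frames(frames, mode="mean")
print(trail)
print(smooth)
```

With the maximum, the star leaves a full-brightness trail; with the mean, the same motion turns into a faint smear, the way moving water or crowds dissolve in a real long exposure.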
|
||||
|
||||
<https://i.vas3k.ru/86f.jpg>A long exposure made of iPhone's Live Photo in 3 clicks
|
||||
|
||||
Let's get back to Google and its night-time HDR. It turned out that using time bracketing you can create a decent HDR in the dark. This technology appeared in Nexus 5 for the first time and was called HDR+. The technology is still so popular that [it is even praised][23] in the latest Pixel presentation.
|
||||
|
||||
HDR+ works quite simply: once the camera detects that you're shooting in the dark, it takes the last 8-15 RAW photos out of the buffer and stacks them on top of each other. This way, the algorithm collects more information about the dark areas of the shot to minimize the noise, that is, the pixels where for some reason the camera screwed up and failed to catch enough photons in a particular frame.
|
||||
|
||||
Imagine that you have no idea what a [capybara][24] looks like, so you decide to ask five people about it. Their stories would be roughly the same, but each would mention some unique detail, so you'd gather more information than by asking only one person. The same happens with the pixels of a photo. More information means more clarity and less noise.
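The capybara effect is plain statistics: averaging N independent noisy readings shrinks the random noise by roughly the square root of N. A quick simulation shows it; the brightness and noise values are invented, and real sensor noise is not perfectly Gaussian.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

TRUE_BRIGHTNESS = 20.0   # a dim pixel in a dark scene (made-up value)
NOISE_SIGMA = 8.0        # per-frame sensor noise (made-up value)


def read_pixel():
    """One noisy readout of the same pixel."""
    return random.gauss(TRUE_BRIGHTNESS, NOISE_SIGMA)


def stacked_pixel(frame_count):
    """Average the same pixel over a burst of frames."""
    return sum(read_pixel() for _ in range(frame_count)) / frame_count


def noise_level(frame_count, trials=2000):
    """Spread of the result when the whole experiment is repeated many times."""
    return statistics.stdev([stacked_pixel(frame_count) for _ in range(trials)])


single = noise_level(1)
burst = noise_level(9)
print(f"1 frame: {single:.2f}, 9 frames: {burst:.2f}")
```

Nine stacked frames should come out about three times cleaner than one, which is why a burst of 8-15 frames makes such a visible difference in the dark.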
|
||||
|
||||
📝 [HDR+: Low Light and High Dynamic Range photography in the Google Camera App][25]
|
||||
|
||||
Combining images captured from the same point gives the same fake long-exposure effect as in the example with the stars above. The exposure of dozens of pictures is summed up, and errors in one picture are averaged out by the others. Imagine how many times you would have to slam the shutter of your DSLR to achieve this.
|
||||
|
||||
<https://i.vas3k.ru/86g.jpg>Pixel ad that glorifies HDR+ and Night Sight
|
||||
|
||||
Only one thing is left: automatic correction of the color cast. Shots taken in the dark usually have a broken color balance (yellowish or greenish), which would otherwise need to be fixed by hand. In earlier versions of HDR+, the issue was resolved by a simple auto-toning fix, à la Instagram filters. Later, they brought a neural network to the rescue.
|
||||
|
||||
That's how [Night Sight][26] was born, the "night photography" technology in Pixel 2, 3, and later. The description says "machine learning techniques built on top of HDR+ that make Night Sight work". In fact, it's just a fancy name for a neural network and all the HDR+ post-processing steps. The machine was trained on a dataset of "before" and "after" photos to make one beautiful image out of a set of dark and dirty ones.
|
||||
|
||||
<https://i.vas3k.ru/88k.jpg>
|
||||
|
||||
This dataset, by the way, was made public. Maybe Apple guys will take it and finally teach their "world-best cameras" to shoot in the dark?
|
||||
|
||||
Also, Night Sight calculates the [motion vector][27] of the objects in the shot to compensate for the blur that is sure to appear with a long exposure. Thus, the smartphone can take the sharp parts from other shots and stack them.
|
||||
|
||||
📝 [Night Sight: Seeing in the Dark on Pixel Phones][28]
|
||||
📝 [Introducing the HDR+ Burst Photography Dataset][29]
|
||||
|
||||
#### [Focus Stacking DoF and refocus in post-production](#scroll90)
|
||||
|
||||
<https://i.vas3k.ru/85y.jpg>
|
||||
|
||||
The method came from macro photography, where the depth of field has always been a problem. To keep the entire object in focus, you had to take several shots, moving the focus back and forth, and combine them later into one sharp shot in Photoshop. The same method is often used by landscape photographers to make the foreground and background sharp as a shark.
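The "combine them later" step boils down to picking, for every pixel, the frame that is sharpest at that spot. Here is a one-dimensional sketch; the sharpness measure (an absolute second difference) and the sample data are assumptions made for the example, and real tools also align the frames and blend the seams.

```python
def local_sharpness(row, i):
    """Absolute second difference: large where fine detail is in focus."""
    left = row[max(i - 1, 0)]
    right = row[min(i + 1, len(row) - 1)]
    return abs(left - 2 * row[i] + right)


def focus_stack(frames):
    """For each pixel, keep the value from the frame that is sharpest there."""
    width = len(frames[0])
    result = []
    for i in range(width):
        best = max(frames, key=lambda row: local_sharpness(row, i))
        result.append(best[i])
    return result


# The same striped subject, focused on the left half and then on the right half.
near_focus = [0, 100, 0, 100, 50, 50, 50, 50]   # right half blurred to gray
far_focus = [50, 50, 50, 50, 0, 100, 0, 100]    # left half blurred to gray

print(focus_stack([near_focus, far_focus]))
```

The result has crisp stripes across the whole row, even though neither input frame does.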
|
||||
|
||||
<https://i.vas3k.ru/86c.jpg>Focus stacking in macro. DoF is too small and you can't shoot it in one go
|
||||
|
||||
Of course, it all migrated to smartphones. With no hype, though. Nokia released Lumia 1020 with "Refocus App" in 2013, and Samsung Galaxy S5 did the same in 2014 with "[Selective Focus][30]". Both used the same approach — they quickly took 3 photos: focused one, focus shifted forth and shifted back. The camera then aligned the images and allowed you to choose one of them, which was introduced as a "real" focus control in the post-production.
|
||||
|
||||
There was no further processing, as even this simple hack was enough to hammer another nail into the coffin of Lytro and its peers, which offered honest refocusing. Let's talk about them, by the way (topic change master 80 lvl).
|
||||
|
||||
### [Computational Sensor Plenoptic and Light Fields](#scroll100)
|
||||
|
||||
Well, our sensors are shit. We simply got used to it and are trying to do our best with them. Their design hasn't changed much since the beginning of time. The manufacturing process was the only thing that improved: we reduced the distance between pixels, fought noise, and added dedicated pixels for the [phase-detection autofocus system][31]. But even if we take the most expensive camera and try to photograph a running cat in indoor light, the cat will win.
|
||||
|
||||
<https://i.vas3k.ru/88p.jpg>
|
||||
|
||||
🎥 [The Science of Camera Sensors][32]
|
||||
|
||||
<https://i.vas3k.ru/881.jpg>
|
||||
|
||||
We've been trying to invent a better sensor for a long time. You can google a lot of research in this field with the queries "computational sensor" or "non-Bayer sensor". Even the Pixel Shifting example can be seen as an attempt to improve sensors with calculations.
|
||||
|
||||
The most promising stories of the last twenty years, though, come to us from plenoptic cameras.
|
||||
|
||||
To calm your sense of impending boring math, I'll throw in an insider's note: the latest Google Pixel camera is a little bit plenoptic. With only two pixels in one, there's still enough to calculate a fair optical depth map without having a second camera like everyone else.
|
||||
|
||||
Plenoptics is a powerful weapon that hasn't fired yet.
|
||||
|
||||
#### [Light Field More than a photo, less than VR](#scroll190)
|
||||
|
||||
Usually, the explanation of plenoptics starts with light fields. And yes, from the science perspective, a plenoptic camera captures the light field, not just the photo. The name comes from the Latin "plenus", meaning "full", i.e., collecting all the information about the rays of light. Just like a Parliament plenary session.
|
||||
|
||||
Let's get to the bottom of this to understand what a light field is and why we need it.
|
||||
|
||||
A traditional photo is two-dimensional. Wherever a ray hits the sensor, there will be a pixel in the photo. The camera doesn't give a shit where the ray came from, whether it accidentally fell in from the side or was reflected by a lovely lady's ass. The photo captures only the point where the ray intersects the surface of the sensor. So it's kinda 2D.
|
||||
|
||||
A light field image is the same, but with a new component: the origin of the ray. That means it captures the ray vector in 3D space. It's like calculating the lighting of a video game, but the other way around: we're trying to capture the scene, not create it. The light field is the set of all the light rays in our scene, both coming from the light sources and reflected.
|
||||
|
||||
<https://i.vas3k.ru/86h.png>There are a lot of mathematical models of light fields. Here's one of the most representative
|
||||
|
||||
The light field is essentially a visual model of the space around it. We can easily compute any photo within this space mathematically. Point of view, depth of field, aperture — all these are also computable.
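To make "computable" less abstract, here is a toy refocus on a tiny one-dimensional light field using the shift-and-add idea: shift each viewpoint in proportion to its offset, then average. The dictionary layout, the `refocus` function, and the data are all invented for this sketch; real light fields are 4D and need interpolation.

```python
def refocus(views, shift_per_view):
    """Shift-and-add: views is {lens_offset: row}, rows are equal-length lists."""
    width = len(next(iter(views.values())))
    image = []
    for x in range(width):
        samples = []
        for offset, row in views.items():
            source = x + offset * shift_per_view
            if 0 <= source < width:
                samples.append(row[source])
        image.append(sum(samples) / len(samples))
    return image


# Toy light field: one bright point seen from three viewpoints.
# Because it is close to the camera, it moves 2 pixels between neighbouring views.
views = {
    -1: [0, 90, 0, 0, 0, 0, 0],
    0: [0, 0, 0, 90, 0, 0, 0],
    1: [0, 0, 0, 0, 0, 90, 0],
}

focused_on_point = refocus(views, shift_per_view=2)
focused_far_away = refocus(views, shift_per_view=0)
print(focused_on_point)
print(focused_far_away)
```

With the matching shift the point snaps into one sharp pixel; with no shift it smears into three dim copies, which is exactly what an out-of-focus object looks like. Choosing the shift after the fact is the "refocus in post-production" trick.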
|
||||
|
||||
I love to draw an analogy with a city here. Photography is like your favourite path from your home to the bar you always remember, while the light field is a map of the whole town. Using the map, you can calculate any route from point A to B. In the same way, knowing the light field, we can calculate any photo.
|
||||
|
||||
For an ordinary photo it's overkill, I agree. But here comes VR, where light fields are one of the most promising areas.
|
||||
|
||||
Having a light field model of an object or a room allows you to see this object or a room from any point in space as if everything around is virtual reality. It's no longer necessary to build a 3D-model of the room if we want to walk through it. We can "simply" capture all the rays inside it and calculate a picture of the room. Simply, yeah. That's what we're fighting over.
|
||||
|
||||
📝 [Google AR and VR: Experimenting with Light Fields][33]
|
||||
|
||||
[![][34]][35]
|
||||
|
||||
By optics, the [guys from Stanford][36] and I mean not only lenses but everything between the object and the sensor. Even the aperture and shutter. Sorry, photography snobs. I feel your pain.
|
||||
|
||||
#### Multi-camera
|
||||
|
||||
<https://i.vas3k.ru/851.jpg>
|
||||
|
||||
In 2014, the HTC One (M8) was released and became the first smartphone with two cameras and amusing computational photography [features][37] such as replacing the background with rain or sparkles.
|
||||
|
||||
The race has begun. Everybody started putting two, three, five lenses into their smartphones, trying to argue whether telephoto or wide-angle lens is better. Eventually, we got the [Light L16][38] camera. 16-lensed, as you can guess.
|
||||
|
||||
<https://i.vas3k.ru/859.jpg>Light L16
|
||||
|
||||
The L16 was no longer a smartphone, but rather a new kind of pocket camera. It promised to reach the quality of top DSLRs with a large-aperture lens and full-frame sensor while still fitting into your pocket. The power of computational photography algorithms was the main selling point.
|
||||
|
||||
<https://i.vas3k.ru/854.jpg>Telephoto-periscope, P30 Pro
|
||||
|
||||
It had 16 lenses: 5 x 28mm wide-angle and 5 x 70mm and 6 x 150mm telephoto. Each telephoto was periscope-style, meaning that the light did not flow directly through the lens to the sensor, but was reflected by a mirror inside the body. This configuration made it possible to fit a sufficiently long telephoto into a flat body, rather than stick out a "pipe" from it. Huawei recently did the same thing in the P30 Pro.
|
||||
|
||||
Each L16 photo was shot simultaneously on 10 or more lenses, and then the camera combined them to get a 52-megapixel image. According to the creators' idea, simultaneous shooting with several lenses made it possible to catch the same amount of light as with the large digital camera lens, artfully bypassing all the laws of optics.
|
||||
|
||||
Talking of software features, the first version had a depth of field and focus control in post-production. Minimal set. Having photos from different perspectives made it possible to compute the depth of the image and apply a decent software blur. Everything seemed nice on paper, so before the release, everybody even had hope for a bright computing future.
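Computing depth from two viewpoints rests on the textbook stereo relation: the closer an object is, the more it shifts between the two pictures. A minimal sketch under the pinhole-camera assumption, with made-up numbers rather than the L16's real specs:

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Classic pinhole stereo relation: depth = focal length * baseline / disparity."""
    if disparity_px <= 0:
        return float("inf")   # no measurable shift: treat as infinitely far
    return focal_length_px * baseline_m / disparity_px


# Made-up numbers: lenses 2 cm apart, focal length of 1500 pixels.
for disparity in (60, 15, 3):
    depth = depth_from_disparity(1500, 0.02, disparity)
    print(f"shift of {disparity:>2} px -> about {depth:.1f} m away")
```

Once every pixel has a distance, the software blur is just a matter of blurring more where the depth differs more from the chosen focus plane.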
|
||||
|
||||
<https://i.vas3k.ru/88y.jpg>
|
||||
|
||||
In March 2018, the Light L16 hit the market and… [miserably failed][39]. Yes, technologically it was in the future. However, at a price of $2000 it had no optical stabilization, so the photos were always blurred (no wonder with 70-150 mm lenses), the autofocus was tediously slow, the algorithms for combining several pictures gave strange sharpness fluctuations, and the camera was useless in the dark, as it had no algorithms such as Google's HDR+ or Night Sight. Modern $500 point-and-shoot cameras with RAW support were able to do it from the start, so sales were discontinued after the first batch.
|
||||
|
||||
However, Light did not shut down at this point (hehe, pun). It raised cash and continues to work on a new version with redoubled force. For instance, its technology is used in the recent [Nokia 9][40], which is a trypophobe's nightmare. The idea is encouraging, so we are waiting for further innovations.
|
||||
|
||||
🎥 [Light L16 Review: Optical Insanity][41]
|
||||
|
||||
#### [Coded Aperture Deblur + Depth Map](#scroll220)
|
||||
|
||||
We're entering the area of telescopes, X-rays, and other fog of war. We won't go deep, but it's safer to fasten your seatbelts. The story of the coded aperture began where it was physically impossible to focus the rays: for gamma and X-ray radiation. Ask your physics teacher; they will explain why.
|
||||
|
||||
The essence of the coded aperture is to replace the standard petal diaphragm with a pattern. The position of the holes should ensure that the overall shape is maximally varied depending on the defocus — the more diverse, the better. Astronomers invented the whole range of [such patterns][42] for their telescopes. I'll cite the very classical one here.
|
||||
|
||||
<https://i.vas3k.ru/88z.jpg>
|
||||
|
||||
How does this work?
|
||||
|
||||
When we focus on the object, everything beyond our depth of field is blurred. Physically, blur is when a lens projects one ray onto several pixels of the sensor due to defocus. So a street lamp turns into a bokeh pancake.
|
||||
|
||||
Mathematicians use the terms convolution and deconvolution to refer to these operations. Let's remember these words cause they sound cool!
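For the curious, convolution is only a few lines of code. This one-dimensional sketch spreads every input sample according to a kernel; the flat three-tap kernel stands in for a bokeh disc and is an assumption of the example.

```python
def convolve(signal, kernel):
    """Full 1D convolution: every input sample is spread out by the kernel."""
    output = [0.0] * (len(signal) + len(kernel) - 1)
    for i, sample in enumerate(signal):
        for j, weight in enumerate(kernel):
            output[i + j] += sample * weight
    return output


street_lamp = [0, 0, 0, 90, 0, 0, 0]   # a single bright point
bokeh_kernel = [1 / 3, 1 / 3, 1 / 3]   # a flat disc, seen in one dimension

print(convolve(street_lamp, bokeh_kernel))
```

The single bright sample turns into three dimmer ones: the pancake. Note that the total amount of light is unchanged, it is only redistributed.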
|
||||
|
||||
<https://i.vas3k.ru/890.jpg>
|
||||
|
||||
Technically, we can turn any convolution back if we know the kernel. That's what mathematicians say. In reality, we have a limited sensor range and non-ideal lens, so all of our bokeh is far from the mathematical ideal and cannot be fully restored.
|
||||
|
||||
📝 [High-quality Motion Deblurring from a Single Image][43]
|
||||
|
||||
We can still try if we know the kernel of the convolution. Not gonna keep you waiting — the kernel is actually the shape of the aperture. In other words, the aperture makes a mathematical convolution using pure optics.
|
||||
|
||||
The problem is that the standard round aperture remains round at any level of blurring. Our kernel is always about the same; it's stable, but not very useful. With a coded aperture, rays with different degrees of defocus are encoded with different kernels. Readers with IQ > 150 have already guessed what will happen next.
|
||||
|
||||
The only remaining issue is to understand which kernel is encoded in each area of the image. You can try it manually, by testing different kernels and checking where the deconvolution turns out more accurate, but this is not our way. A long time ago, people invented the Fourier transform for this. I don't want to torture you with calculus, so I'll add a link to my favorite explanation for those who are interested.
|
||||
|
||||
🎥 [But what is the Fourier Transform? A visual introduction][44]
|
||||
|
||||
All you need to know is that the Fourier transform allows you to find out which waves are dominant in the pile of overlapped ones. In the case of music, the Fourier will show the frequency of the notes in the complex chord. In the case of photography, it is the main pattern of overlapping light rays, which is the kernel of the convolution.
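The music half of that analogy fits in a short script. The sketch below builds a two-note "chord" from sine waves and uses a naive discrete Fourier transform to find which frequencies dominate; the signal and the helper function are made up for illustration, and real code would use an FFT library.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitude of each frequency bin of a naive discrete Fourier transform."""
    n = len(samples)
    return [
        abs(sum(s * cmath.exp(-2j * math.pi * k * t / n) for t, s in enumerate(samples)))
        for k in range(n // 2)
    ]


# A "chord": a strong wave with 3 cycles per window plus a weaker one with 7.
N = 64
chord = [
    1.0 * math.sin(2 * math.pi * 3 * t / N) + 0.4 * math.sin(2 * math.pi * 7 * t / N)
    for t in range(N)
]

magnitudes = dft_magnitudes(chord)
strongest = sorted(range(len(magnitudes)), key=lambda k: magnitudes[k], reverse=True)[:2]
print(strongest)
```

The two largest bins are exactly the two frequencies that went in, with the louder one first.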
|
||||
|
||||
Since the pattern left by the coded aperture always differs depending on the distance to the object, we can calculate that distance mathematically using only one simple image shot with a regular sensor!
|
||||
|
||||
Applying the inverse convolution with that kernel, we can restore the blurred areas of the image. Bring back all the scattered pixels.
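Here is what the inverse convolution looks like in the friendliest possible setting: a one-dimensional signal, a kernel that is known exactly, and no noise. The kernel values, the circular blur, and the small `eps` guard are assumptions of this sketch; with real photos, noise and unknown kernels make things much harder.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive discrete Fourier transform (fine for tiny examples)."""
    n = len(values)
    sign = 1 if inverse else -1
    result = [
        sum(v * cmath.exp(sign * 2j * math.pi * k * t / n) for t, v in enumerate(values))
        for k in range(n)
    ]
    return [r / n for r in result] if inverse else result


def blur(signal, kernel):
    """Circular convolution of the signal with the kernel."""
    n = len(signal)
    return [
        sum(kernel[j] * signal[(i - j) % n] for j in range(len(kernel)))
        for i in range(n)
    ]


def deblur(blurred, kernel, eps=1e-9):
    """Divide by the kernel's spectrum; eps guards against division by ~zero."""
    n = len(blurred)
    padded = list(kernel) + [0.0] * (n - len(kernel))
    spectrum_b = dft(blurred)
    spectrum_k = dft(padded)
    restored = [
        b * k.conjugate() / (abs(k) ** 2 + eps)
        for b, k in zip(spectrum_b, spectrum_k)
    ]
    return [value.real for value in dft(restored, inverse=True)]


sharp = [0, 0, 90, 0, 0, 40, 0, 0]
kernel = [0.5, 0.3, 0.2]            # assumed to be known exactly

blurred = blur(sharp, kernel)
restored = deblur(blurred, kernel)
print([round(v, 1) for v in blurred])
print([round(v, 1) for v in restored])
```

Convolution becomes a plain multiplication in the Fourier domain, so undoing it is a division. That division is also the weak spot: wherever the kernel's spectrum is close to zero, the information is gone and only noise gets amplified.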
|
||||
|
||||
<https://i.vas3k.ru/872.jpg>The convolution kernel is at the top right
|
||||
|
||||
That's how most deblur tools work. It works even with an average round aperture, yet the result is less accurate.
|
||||
|
||||
The downside of the coded aperture is the noise and light loss, which we can't ignore. Lidars and fairly accurate ToF-cameras have wholly negated all the ideas of using a coded aperture in consumer gadgets. If you've seen it somewhere, write in comments.
|
||||
|
||||
📝 [Image and Depth from a Conventional Camera with a Coded Aperture][45]
|
||||
📝 [Coded Aperture. Computational Photography WS 07/08][46]
|
||||
🎥 [Coded aperture projection (SIGGRAPH 2008 Talks)][47]
|
||||
|
||||
#### Phase Coding (Wavefront Coding)
|
||||
|
||||
According to the latest news, light is half wave. By coding the aperture, we control the transparency of the lens, which means we control the wave amplitude. Besides the amplitude, there is the phase, which can also be coded.
|
||||
|
||||
And yes. It can be done with an additional lens, which reverses the phase of light passing through it. Like on the Pink Floyd cover.
|
||||
|
||||
<https://i.vas3k.ru/892.jpg>
|
||||
|
||||
Then everything works like any other optical encoding. Different areas of the image are encoded in different ways, and we can algorithmically recognize and fix them somehow. To shift the focus, for example.
|
||||
|
||||
What is good about phase coding is that we don't lose brightness. All photons reach the sensor, unlike with the coded aperture, where they bump into its impenetrable parts (after all, the other half of the news says that light is a particle).
|
||||
|
||||
The bad part is that we will always lose sharpness, as even perfectly focused objects will be smoothly blurred on the sensor, and we will have to call Fourier to gather them together for us. I'll attach a link with a more detailed description and photo examples below.
|
||||
|
||||
📝 [Computational Optics by Jongmin Baek, 2012][36]
|
||||
|
||||
#### [Flutter Shutter Fighting the motion blur](#scroll240)
|
||||
|
||||
The last thing we can code along the path of light to the sensor is the shutter. Instead of the usual "open, wait, close" cycle, we move the shutter several times per shot to add up to the desired shutter speed. Sort of like a multi-exposure, where one frame is exposed several times.
|
||||
|
||||
Let's imagine we decided to take pictures of a fast-moving car at night to see its license plate afterward. We don't have a flash, and we can't use a slow shutter speed, or we'll blur everything. We need a faster shutter speed, but then we get a completely black image and won't recognize the car. What to do?
|
||||
|
||||
It is also possible to take this shot with flutter shutter movements, so that the car smears not evenly, but like a "ladder" with known intervals. Thus, we encode the blur with a random-looking but known open-close sequence of the shutter, and we can try to decode it with the same inverse convolution. It turns out this works much better than trying to recover pixels evenly blurred by a long shutter speed.
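Why a fluttered shutter decodes better can be seen in the spectrum of the blur kernel. An ordinary exposure is a box, and a box has frequencies where its spectrum is exactly zero, so that detail is lost for good. The sketch below compares a box with a toy four-chip code that I invented for the example; real codes are much longer and carefully chosen.

```python
import cmath
import math


def spectrum_magnitudes(shutter_pattern, window=8):
    """Magnitude of the DFT of a shutter pattern, zero-padded to `window` samples."""
    padded = list(shutter_pattern) + [0] * (window - len(shutter_pattern))
    return [
        abs(sum(v * cmath.exp(-2j * math.pi * k * t / window) for t, v in enumerate(padded)))
        for k in range(window)
    ]


always_open = [1, 1, 1, 1]   # ordinary exposure: a box-shaped blur kernel
fluttered = [1, 1, 0, 1]     # toy open/closed code, invented for this example

print(min(spectrum_magnitudes(always_open)))
print(min(spectrum_magnitudes(fluttered)))
```

The box kernel's weakest frequency is practically zero, while the coded pattern keeps every frequency at a usable level. Since the inverse convolution divides by these values, the coded blur can be undone far more reliably, at the price of losing some light while the shutter is closed.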
|
||||
|
||||
<https://i.vas3k.ru/893.jpg>
|
||||
|
||||
There are several algorithms for that. For the hardcore details, I'll again include links to some smart Indian guys' work.
|
||||
|
||||
📝 [Coded exposure photography: motion deblurring using fluttered shutter][48]
|
||||
🎥 [Flutter Shutter Coded Filter][49]
|
||||
|
||||
Soon we'll go so goddamn crazy that we'll want to control the lighting after the photo was taken, too. To change cloudy weather to sunny, or to change the lights on a model's face after shooting. Now it seems a bit wild, but let's talk again in ten years.
|
||||
|
||||
We've already invented a dumb device to control the light: the flash. Flashes have come a long way, from the large lamp boxes that helped work around the technical limitations of early cameras to the modern LED flashes that spoil our pictures, so we mainly use them as flashlights.
|
||||
|
||||
[![][50]][51]
|
||||
|
||||
#### Programmable Flash
|
||||
|
||||
It's been a long time since all smartphones switched to Dual LED flashes — a combination of orange and blue LEDs with brightness being adjusted to the color temperature of the shot. In the iPhone, for example, it's called True Tone and controlled by a small piece of code with a hacky formula. Even developers are not allowed to control it.
|
||||
|
||||
📝 [Demystifying iPhone’s Amber Flashlight][52]
|
||||
|
||||
<https://i.vas3k.ru/87k.jpg>
|
||||
|
||||
Then we started to think about the problem of all flashes: overexposed faces and foregrounds. Everyone solved it in their own way. The iPhone got [Slow Sync Flash][53], which made the camera artificially lengthen the exposure in the dark. Google Pixel and other Android smartphones started using their depth sensors to combine images with and without flash, shot quickly one after another. The foreground was taken from the photo without the flash, while the background remained illuminated.
|
||||
|
||||
<https://i.vas3k.ru/86r.jpg>
|
||||
|
||||
The further use of a programmable multi-flash is vague. The only interesting application was found in computer vision, where it was once used for assembly diagrams (like the ones for Ikea bookshelves) to detect the borders of objects more accurately. See the article below.
|
||||
|
||||
📝 [Non-photorealistic Camera: Depth Edge Detection and Stylized Rendering using Multi-Flash Imaging][54]
|
||||
|
||||
#### Lightstage
|
||||
|
||||
Light is fast. That has always made light coding an easy thing to do. We can change the lighting a hundred times per shot and still not get close to its speed. That's how Lightstage was created back in 2005.
|
||||
|
||||
<https://i.vas3k.ru/86d.jpg>
|
||||
|
||||
🎥 [Lightstage demo video][55]
|
||||
|
||||
The essence of the method is to light the object from all possible angles within each frame of a real 24 fps movie. To get this done, we use 150+ lamps and a high-speed camera that captures hundreds of shots with different lighting conditions per movie frame.
|
||||
|
||||
A similar approach is now used when shooting mixed CGI graphics in movies. It allows you to fully control the lighting of the object in the post-production, placing it in scenes with absolutely random lighting. We just grab the shots illuminated from the required angle, tint them a little, done.
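
Relighting works because light is additive: a photo lit by several lamps equals the sum of the photos lit by each lamp alone. A minimal sketch of "grab the shots and tint them", assuming one grayscale basis image per lamp (nested lists) and a hypothetical RGB tint per lamp; a real pipeline would use far more lamps and proper image buffers.

```python
def relight(basis_images, light_colors):
    """Combine one-lamp-at-a-time images into a relit RGB image.

    basis_images: list of grayscale images (lists of rows of floats),
                  one per lamp, all the same size.
    light_colors: list of (r, g, b) tints, one per lamp; (0, 0, 0)
                  switches a lamp off.
    """
    if len(basis_images) != len(light_colors):
        raise ValueError("need exactly one color per basis image")
    height, width = len(basis_images[0]), len(basis_images[0][0])
    out = [[[0.0, 0.0, 0.0] for _ in range(width)] for _ in range(height)]
    for image, (r, g, b) in zip(basis_images, light_colors):
        for y in range(height):
            for x in range(width):
                v = image[y][x]
                # Superposition: each lamp adds its own tinted contribution.
                out[y][x][0] += v * r
                out[y][x][1] += v * g
                out[y][x][2] += v * b
    return out


if __name__ == "__main__":
    lit_from_left = [[1.0, 0.2]]
    lit_from_right = [[0.2, 1.0]]
    # Warm light from the left, dim blue light from the right.
    print(relight([lit_from_left, lit_from_right],
                  [(1.0, 0.6, 0.3), (0.1, 0.1, 0.4)]))
```

Choosing the tints is the creative part: sample them from the environment you want to drop the actor into and the same captured frames produce a new lighting setup.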

<https://i.vas3k.ru/86s.jpg>

<https://i.vas3k.ru/86e.jpg>

Unfortunately, this is hard to do on mobile devices, but someone will probably like the idea. I've seen an app from some guys who captured a 3D face model by illuminating it with the phone flashlight from different sides.

#### Lidar and Time-of-Flight Camera

Lidar is a device that measures the distance to an object. Thanks to the recent hype around self-driving cars, you can now find a cheap lidar in any dumpster. You've probably seen those rotating thingies on their roofs? Those are lidars.

We still can't fit a laser lidar into a smartphone, but we can go with its younger brother, the [time-of-flight camera][56]. The idea is ridiculously simple: a separate special camera with an LED flash above it. The camera measures how long the light takes to reach the objects and bounce back, and builds a depth map of the image from that.
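
The arithmetic behind a time-of-flight measurement is short: the light travels to the object and back, so the distance is half the round-trip time multiplied by the speed of light. The sketch below assumes an idealized sensor that reports a round-trip time in seconds for every pixel; real sensors typically infer that time indirectly, and the function names here are made up for illustration.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0


def tof_distance_m(round_trip_s: float) -> float:
    """Distance to the object given the light's round-trip time."""
    # Divide by two because the light covers the distance twice.
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2.0


def depth_map(round_trip_times_s):
    """Convert a grid of per-pixel round-trip times into distances in meters."""
    return [[tof_distance_m(t) for t in row] for row in round_trip_times_s]


if __name__ == "__main__":
    times = [[6.67e-9, 13.3e-9], [20.0e-9, 33.4e-9]]  # seconds
    for row in depth_map(times):
        print([f"{d:.2f} m" for d in row])
    # Round-trip time difference that corresponds to 1 cm of depth:
    print(f"{2 * 0.01 / SPEED_OF_LIGHT_M_S * 1e12:.1f} ps")
```

Resolving one centimeter of depth means telling apart round-trip times that differ by roughly 67 picoseconds, which is why the sensor has to be a dedicated piece of hardware rather than a regular camera.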

<https://i.vas3k.ru/868.jpg>

Modern ToF cameras are accurate to about a centimeter. The latest Samsung and Huawei flagships use them to build a bokeh map and to improve autofocus in the dark. The latter, by the way, works quite well. I wish every phone had one.

Knowing the exact depth of the scene will be useful in the coming era of augmented reality. Shooting a lidar at the surfaces to build the initial 3D map will be much more accurate and effortless than analyzing camera images.

#### Projector Illumination

To finally get serious about computational lighting, we have to switch from regular LED flashes to projectors: devices that can project a 2D picture onto a surface. Even a simple monochrome grid would be a good start for smartphones.

The first benefit of a projector is that it can illuminate only the part of the scene that needs light. No more burnt faces in the foreground. Objects can be recognized and skipped, just like the laser headlights of modern cars don't blind oncoming drivers but still illuminate pedestrians. Even with a minimal projector resolution, such as 100x100 dots, the possibilities are exciting.

<https://i.vas3k.ru/86i.jpg> Today, a car with controllable headlights won't surprise even a kid

The second and more realistic use of a projector is to project an invisible grid onto the scene to recover its depth map. With a grid like this, you can safely throw away all your neural networks and lidars. All the distances to the objects in the image can now be calculated with the simplest computer vision algorithms. This was done back in the days of Microsoft Kinect (rest in peace), and it was great.
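
The "simplest computer vision algorithm" in question is triangulation: a projected dot appears shifted in the camera image, and the closer the surface, the bigger the shift. A minimal sketch, assuming an idealized, rectified projector and camera pair separated by a known baseline; the focal length and baseline in the usage example are made-up numbers, not the specs of any real device.

```python
import math


def depth_from_dot_shift(shift_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth of a surface from how far a projected dot moved in the image.

    shift_px:   horizontal shift of the dot, in pixels, relative to where
                it would appear for a surface infinitely far away.
    focal_px:   camera focal length expressed in pixels (assumed known).
    baseline_m: distance between projector and camera (assumed known).
    """
    if shift_px < 0:
        raise ValueError("shift must be non-negative in this simplified model")
    if shift_px == 0:
        return math.inf  # no shift means the surface is infinitely far away
    # Similar triangles: depth / baseline == focal length / shift.
    return focal_px * baseline_m / shift_px


if __name__ == "__main__":
    for shift in (58.0, 29.0, 14.5):
        z = depth_from_dot_shift(shift, focal_px=580.0, baseline_m=0.075)
        print(f"shift {shift:5.1f} px -> depth {z:.2f} m")
```

Because depth is inversely proportional to the shift, precision drops quickly with distance, which is one reason this approach works best at room scale.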

Of course, the Dot Projector for Face ID on the iPhone X and later is worth mentioning here. It's our first small step toward projector technology, but quite a noticeable one.

<https://i.vas3k.ru/86j.jpg> Dot Projector in iPhone X

<https://j.gifs.com/wVrzx8.gif>

It's time to reflect a bit. Watching what the major technology companies are doing, it becomes clear that our next 10 years will be tightly tied to augmented reality. Today AR still looks like a toy: to play [with 3D wifey][57], to [try on sneakers][58], to see [how the makeup looks][59], or to train [the U.S. Army][60]. Tomorrow we won't even notice we're using it every day. Dense flows of cash into this area can already be felt from the Google and Nvidia offices.

For photography, AR means the ability to control the 3D scene: scan the area, as smartphones with [Tango][61] do, add new objects, as in [HoloLens][62], and so on. Don't worry about the poor graphics of today's AR apps. As soon as game dev companies invade the area with their battle royales, everything will look much better than on a PS4.

<https://i.vas3k.ru/87f.jpg> <https://i.vas3k.ru/87h.jpg>

By [Defected Pixel][63]

Remember that epic [fake Moon Mode][64] presented by Huawei? If you missed it: when a Huawei camera detects that you're about to take a photo of the moon, it puts a pre-prepared high-resolution moon picture on top of your photo. Because it looks cooler, indeed! True Chinese cyberpunk.

<https://i.vas3k.ru/869.jpg> Life goal: be able to bend the truth like Huawei

Once all the jokes had been made on Twitter, I thought about the situation: Huawei gave people exactly what it promised. The moon was real, and the camera lets you shoot it looking THIS awesome. No deception. Tomorrow, if you give people the opportunity to replace the sky in their photos with beautiful sunsets, half the planet will be amazed.

> In the future, machines will be "finishing up" and re-painting our photos for us

Pixel, Galaxy, and other Android phones have some stupid AR mode today. Some let you add cartoon characters and take photos with them, others scatter emojis all over the room or put a mask on your face, just like Snapchat.

These are just our first naive steps. Today, the Google camera has Google Lens, which finds information about any object you point your camera at. Samsung does the same with Bixby. For now, these tricks exist only to humiliate iPhone users, but it's easy to imagine that the next time you take a pic with the Eiffel Tower, your phone will say: you know, your selfie is shit. I'll put a nice sharp picture of the tower in the background, fix your hair, and remove the pimple above your lip. If you plan to post it to Instagram, the VSCO L4 filter will work best for it. You're welcome, leather bastard.

After a while, the camera will start replacing the grass with greener grass, your friends with better ones, and boobs with bigger ones. Or something like that. A brave new world.

<https://i.vas3k.ru/894.jpg>

In the beginning it's gonna look ridiculous. Probably even terrible. The photo aesthetes will be enraged, and the fighters for natural beauty will launch a campaign to ban the use of neural networks, but the mass audience will be delighted.

Because photography has always been just a way to express and share emotions. Every time a tool appears that lets people express themselves more vividly and effectively, everyone starts using it: emoji, filters, stickers, masks, audio messages. Some will already find this list disgusting, but it could easily go on.

Photos of "objective reality" will seem as boring as pictures of your great-grandmother sitting on a chair. They won't die out, but they'll become something like paper books or vinyl records: a passion of enthusiasts who see a special deep meaning in them. "Who cares about setting up the lighting and composition when my phone can do the same?" That's our future. Sorry.

The mass audience doesn't give a shit about objectivity. It needs algorithms that make its faces younger and its vacations cooler than a coworker's or a neighbor's. Augmented reality will redraw reality for them, with an even higher level of detail than it really has. It may sound funny, but we'll start improving the graphics of the real world.

And yes, as always, it all starts with teenagers and their "strange, stupid hobbies for idiots". It happens every time. When you stop understanding something, this IS the future.

It's hard to compare modern smartphone cameras fairly because, with the huge competition in the market, everyone implements new features almost simultaneously. There's no way to be objective in a world where Google announces a new Night Mode and Samsung and Xiaomi copy it in new firmware a month later. So I'm not even gonna try to be objective here.

In the pictures below, I briefly describe the main features that I found interesting (in the context of this article), ignoring the most obvious things like dual-LED flashes, automatic white balance, or panorama mode. In the next section, you can share your insights about your favorite smartphone. Crowdsourcing!

#### A place to brag about your smartphone

For this comparison, I took only the four phones that I tested myself. Of course, there are thousands more in the world. If you have or had an interesting phone, please say a few words about its camera and your experience in the comments below.

Throughout history, every human technology has become more advanced as soon as it stopped copying living organisms. Today, it's hard to imagine a car with joints and muscles instead of wheels. Planes with fixed wings fly at 800+ km/h; birds don't even try. And nature has no analog of the computer processor at all.

The most exciting part of this list is what's not on it: camera sensors. We still haven't figured out anything better than imitating the structure of the eye. The same crystalline lens and the same set of RGGB cones as the retina has.

Computational photography has added a "brain" to this process: a processor that handles visual information not only by reading pixels through the optic nerve but also by completing the picture based on its experience. Yes, it opens up a lot of possibilities for us today, but I have a hunch that we're still flapping hand-made wings instead of inventing a plane. One that will leave all these shutters, apertures, and Bayer filters behind.

The beauty of the situation is that we can't even imagine today what it's going to be like.

Most of us will die without ever knowing.

And it's wonderful.

--------------------------------------------------------------------------------

via: https://vas3k.com/blog/computational_photography/

Author: [vas3k][a]
Topic selection: [lujun9972][b]
Translator: [译者ID](https://github.com/译者ID)
Proofreader: [校对者ID](https://github.com/校对者ID)

This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux China](https://linux.cn/)

[a]: https://vas3k.com/
[b]: https://github.com/lujun9972
[1]: https://www.dropbox.com/sh/pqw8x5vepavffq5/AADEbPpQr71JUr31A6g96zHxa?dl=0
[2]: https://vas3k.com/donate/
[3]: https://vas3k.ru/blog/computational_photography/
[4]: https://en.wikipedia.org/wiki/Computational_photography
[5]: https://medium.com/hd-pro/computational-photography-will-revolutionize-digital-imaging-a25d34f37b11
[6]: https://achael.github.io/_pages/imaging/
[7]: http://alumni.media.mit.edu/~jaewonk/Publications/Comp_LectureNote_JaewonKim.pdf
[8]: https://graphics.stanford.edu/talks/compphot-publictalk-may08.pdf
[9]: https://github.com/danielgindi/Instagram-Filters/blob/master/InstaFilters/Resources_for_IF_Filters/xproMap.png
[10]: https://en.wikipedia.org/wiki/OpenCL
[11]: https://una.im/CSSgram/
[12]: http://blog.dehancer.com/category/examples/
[13]: https://developer.apple.com/library/archive/documentation/GraphicsImaging/Conceptual/CoreImaging/ci_autoadjustment/ci_autoadjustmentSAVE.html
[14]: http://vas3k.com/blog/machine_learning/
[15]: https://www.pixelmator.com/pro/machine-learning/
[16]: https://paperswithcode.com/task/image-enhancement
[17]: http://people.ee.ethz.ch/~ihnatova/#dataset
[18]: https://en.wikipedia.org/wiki/Epsilon_photography
[19]: https://en.wikipedia.org/wiki/Shutter_lag
[20]: https://www.youtube.com/watch?v=FmB1LztzEVM
[21]: https://www.cambridgeincolour.com/tutorials/cameras-vs-human-eye.htm
[22]: https://en.wikipedia.org/wiki/F-number
[23]: https://www.youtube.com/watch?v=iLtWyLVjDg0&t=0
[24]: https://en.wikipedia.org/wiki/Capybara
[25]: https://ai.googleblog.com/2014/10/hdr-low-light-and-high-dynamic-range.html
[26]: https://www.blog.google/products/pixel/see-light-night-sight/
[27]: https://en.wikipedia.org/wiki/Optical_flow
[28]: https://ai.googleblog.com/2018/11/night-sight-seeing-in-dark-on-pixel.html
[29]: https://ai.googleblog.com/2018/02/introducing-hdr-burst-photography.html
[30]: https://recombu.com/mobile/article/focus-shifting-explained_m20454-html
[31]: https://www.imaging-resource.com/news/2015/09/15/sony-mirrorless-cameras-will-soon-focus-as-fast-as-dslrs-if-this-patent-bec
[32]: https://www.youtube.com/watch?v=MytCfECfqWc
[33]: https://www.blog.google/products/google-ar-vr/experimenting-light-fields/
[34]: https://i.vas3k.ru/full/871.gif
[35]: https://i.vas3k.ru/full/full/871.gif
[36]: http://graphics.stanford.edu/courses/cs478/lectures/02292012_computational_optics.pdf
[37]: https://www.computerworld.com/article/2476104/in-pictures--here-s-what-the-htc-one-s-dual-cameras-can-do.html
[38]: https://light.co/camera
[39]: https://petapixel.com/2017/12/08/review-light-l16-brilliant-braindead/
[40]: https://www.nokia.com/phones/en_int/nokia-9-pureview/
[41]: https://www.youtube.com/watch?v=W3pBp12r-m0
[42]: http://ipl.uv.es/?q=es/content/page/ibis-coded-mask
[43]: http://jiaya.me/papers/deblur_siggraph08.pdf
[44]: https://www.youtube.com/watch?v=spUNpyF58BY
[45]: https://graphics.stanford.edu/courses/cs448a-08-spring/levin-coded-aperture-sig07.pdf
[46]: https://www.eecs.tu-berlin.de/fileadmin/fg144/Courses/07WS/compPhoto/Coded_Aperture.pdf
[47]: https://www.youtube.com/watch?v=4kh71S446FM
[48]: http://www.cs.cmu.edu/~ILIM/projects/IM/aagrawal/sig06/CodedExposureLowres.pdf
[49]: https://www.youtube.com/watch?v=gGvvqj-lF5o
[50]: https://i.vas3k.ru/full/87b.gif
[51]: https://i.vas3k.ru/full/full/87b.gif
[52]: https://medium.com/@thatchaponunprasert/demystifying-iphones-amber-flashlight-519352db10bd
[53]: https://www.reddit.com/r/iphone/comments/71myyp/a_feature_from_the_new_new_iphone_a_few_talk_about_is/
[54]: https://www.eecis.udel.edu/~jye/lab_research/SIG04/SIG_YU_RASKAR.pdf
[55]: https://www.youtube.com/watch?v=wT2uFlP0MlU
[56]: https://en.m.wikipedia.org/wiki/Time-of-flight_camera
[57]: https://youtu.be/p9oDlvOV3qs?t=161
[58]: https://www.youtube.com/watch?v=UmJriqzDUTo
[59]: https://www.youtube.com/watch?v=dpSP6ZM5XGo
[60]: https://www.youtube.com/watch?time_continue=87&v=x8p19j8C6VI
[61]: https://en.wikipedia.org/wiki/Tango_%28platform%29
[62]: https://youtu.be/e-n90xrVXh8?t=314
[63]: https://vk.com/pxirl
[64]: https://www.androidauthority.com/huawei-p30-pro-moon-mode-controversy-978486/