diff --git a/.gitmodules b/.gitmodules deleted file mode 100644 index 64ae20885b..0000000000 --- a/.gitmodules +++ /dev/null @@ -1,3 +0,0 @@ -[submodule "comic"] - path = comic - url = https://wxy@github.com/LCTT/comic.git diff --git a/.travis.yml b/.travis.yml index 26ca1955e2..0b25cff718 100644 --- a/.travis.yml +++ b/.travis.yml @@ -2,7 +2,13 @@ language: c script: - sh ./scripts/check.sh - ./scripts/badge.sh - +branches: + only: + - master + except: + - gh-pages +git: + submodules: false deploy: provider: pages skip_cleanup: true diff --git a/README.md b/README.md index b3bbaf0ee6..19005502c6 100644 --- a/README.md +++ b/README.md @@ -6,9 +6,9 @@ ![待校正](https://lctt.github.io/TranslateProject/badge/translated.svg) ![已发布](https://lctt.github.io/TranslateProject/badge/published.svg) -LCTT 是“Linux中国”([https://linux.cn/](https://linux.cn/))的翻译组,负责从国外优秀媒体翻译 Linux 相关的技术、资讯、杂文等内容。 +[LCTT](https://linux.cn/lctt/) 是“Linux中国”([https://linux.cn/](https://linux.cn/))的翻译组,负责从国外优秀媒体翻译 Linux 相关的技术、资讯、杂文等内容。 -LCTT 已经拥有几百名活跃成员,并欢迎更多的Linux志愿者加入我们的团队。 +LCTT 已经拥有几百名活跃成员,并欢迎更多的 Linux 志愿者加入我们的团队。 ![logo](https://linux.cn/static/image/common/lctt_logo.png) @@ -70,6 +70,10 @@ LCTT 的组成 * 2018/01/11 提升 lujun9972 成为核心成员,并加入选题组。 * 2018/02/20 遭遇 DMCA 仓库被封。 * 2018/05/15 提升 MjSeven 为核心成员。 +* 2018/08/01 [发布 Linux 中国通证:LCCN](https://linux.cn/article-9886-1.html)。 +* 2018/08/17 提升 pityonline 为核心成员,担任校对,并接受他的建议采用 PR 审核模式。 +* 2018/09/10 [LCTT 五周年](https://linux.cn/article-9999-1.html)。 +* 2018/10/25 重构了 CI,感谢 vizv、lujun9972、bestony。 核心成员 ------------------------------- @@ -78,13 +82,16 @@ LCTT 的组成 - 组长 @wxy, - 选题 @oska874, +- 选题 @lujun9972, +- 技术 @bestony, - 校对 @jasminepeng, +- 校对 @pityonline, - 钻石译者 @geekpi, +- 钻石译者 @qhwdw, - 钻石译者 @GOLinux, -- 钻石译者 @ictlyh, -- 技术组长 @bestony, -- 漫画组长 @GHLandy, -- LFS 组长 @martin2011qi, +- 核心成员 @GHLandy, +- 核心成员 @martin2011qi, +- 核心成员 @ictlyh, - 核心成员 @strugglingyouth, - 核心成员 @FSSlc, - 核心成员 @zpl1025, @@ -96,8 +103,6 @@ LCTT 的组成 - 核心成员 @Locez, - 核心成员 @ucasFL, - 核心成员 @rusking, -- 核心成员 @qhwdw, -- 核心成员 @lujun9972 - 核心成员 @MjSeven - 前任选题 @DeadFire, - 前任校对 @reinoir222, diff --git a/comic b/comic deleted file mode 160000 index e5db5b880d..0000000000 --- a/comic +++ /dev/null @@ -1 +0,0 @@ -Subproject commit e5db5b880dac1302ee0571ecaaa1f8ea7cf61901 diff --git a/sources/talk/20171007 The Most Important Database You-ve Never Heard of.md b/sources/talk/20171007 The Most Important Database You-ve Never Heard of.md new file mode 100644 index 0000000000..f429aba373 --- /dev/null +++ b/sources/talk/20171007 The Most Important Database You-ve Never Heard of.md @@ -0,0 +1,50 @@ +The Most Important Database You've Never Heard of +====== +In 1962, JFK challenged Americans to send a man to the moon by the end of the decade, inspiring a heroic engineering effort that culminated in Neil Armstrong’s first steps on the lunar surface. Many of the fruits of this engineering effort were highly visible and sexy—there were new spacecraft, new spacesuits, and moon buggies. But the Apollo Program was so staggeringly complex that new technologies had to be invented even to do the mundane things. One of these technologies was IBM’s Information Management System (IMS). + +IMS is a database management system. NASA needed one in order to keep track of all the parts that went into building a Saturn V rocket, which—because there were two million of them—was expected to be a challenge. 
Databases were a new idea in the 1960s and there weren’t any already available for NASA to use, so, in 1965, NASA asked IBM to work with North American Aviation and Caterpillar Tractor to create one. By 1968, IBM had installed a working version of IMS at NASA, though at the time it was called ICS/DL/I for “Informational Control System and Data Language/Interface.” (IBM seems to have gone through a brief, unfortunate infatuation with the slash; see [PL/I][1].) Two years later, IBM rebranded ICS/DL/I as “IMS” and began selling it to other customers. It was one of the first commercially available database management systems. + +The incredible thing about IMS is that it is still in use today. And not just on a small scale: Banks, insurance companies, hospitals, and government agencies still use IMS for all sorts of critical tasks. Over 95% of Fortune 1000 companies use IMS in some capacity, as do all of the top five US banks. Whenever you withdraw cash from an ATM, the odds are exceedingly good that you are interacting with IMS at some point in the course of your transaction. In a world where the relational database is an old workhorse increasingly in competition with trendy new NoSQL databases, IMS is a freaking dinosaur. It is a relic from an era before the relational database was even invented, which didn’t happen until 1970. And yet it seems to be the database system in charge of all the important stuff. + +I think this makes IMS pretty interesting. Depending on how you feel about relational databases, it either offers insight into how the relational model improved on its predecessors or else exemplifies an alternative model better suited to certain problems. + +IMS works according to a hierarchical model, meaning that, instead of thinking about data as tables that can be brought together using JOIN operations, IMS thinks about data as trees. Each kind of record you store can have other kinds of records as children; these child record types represent additional information that you might be interested in given a record of the parent type. + +To take an example, say that you want to store information about bank customers. You might have one type of record to represent customers and another type of record to represent accounts. Like in a relational database, where each table has columns, these records will have different fields; we might want to have a first name field, a last name field, and a city field for each customer. We must then decide whether we are likely to first lookup a customer and then information about that customer’s account, or whether we are likely to first lookup an account and then information about that account’s owner. Assuming we decide that we will access customers first, then we will make our account record type a child of our customer record type. Diagrammed, our database model would look something like this: + +![][2] + +And an actual database might look like: + +![][3] + +By modeling our data this way, we are hewing close to the reality of how our data is stored. Each parent record includes pointers to its children, meaning that moving down our tree from the root node is efficient. (Actually, each parent basically stores just one pointer to the first of its children. The children in turn contain pointers to their siblings. This ensures that the size of a record does not vary with the number of children it has.) 
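+As a rough sketch of the first-child/next-sibling pointer scheme just described (using invented field names for the account records, and making no claim to resemble IMS’s actual on-disk format), the idea might look something like this in Python:
+
+```
+class Record:
+    """A stored record: its data fields plus exactly two pointers.
+
+    Because a parent stores only a pointer to its first child, and each
+    child points to its next sibling, a record's size does not depend on
+    how many children it has.
+    """
+    def __init__(self, fields):
+        self.fields = fields          # e.g. {"first_name": "Joe", ...}
+        self.first_child = None       # pointer to the first child record
+        self.next_sibling = None      # pointer to the next record at this level
+
+    def add_child(self, child):
+        if self.first_child is None:
+            self.first_child = child
+            return
+        last = self.first_child
+        while last.next_sibling is not None:
+            last = last.next_sibling
+        last.next_sibling = child
+
+    def children(self):
+        child = self.first_child
+        while child is not None:
+            yield child
+            child = child.next_sibling
+
+
+# A customer record with two account records as its children.
+customer = Record({"first_name": "Joe", "last_name": "Smith", "city": "Chicago"})
+customer.add_child(Record({"account_no": "1001", "balance": 250}))
+customer.add_child(Record({"account_no": "1002", "balance": 1000}))
+
+# Access starts at the root (the customer) and walks downward.
+for account in customer.children():
+    print(account.fields["account_no"], account.fields["balance"])
+```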
This efficiency can make data accesses very fast, provided that we are accessing our data in ways that we anticipated when we first structured our database. According to IBM, an IMS instance can process over 100,000 transactions a second, which is probably a large part of why IMS is still used, particularly at banks. But the downside is that we have lost a lot of flexibility. If we want to access our data in ways we did not anticipate, we will have a hard time. + +To illustrate this, consider what might happen if we decide that we would like to access accounts before customers. Perhaps customers are calling in to update their addresses, and we would like them to uniquely identify themselves using their account numbers. So we want to use an account number to find an account, and then from there find the account’s owner. But since all accesses start at the root of our tree, there’s no way for us to get to an account efficiently without first deciding on a customer. To fix this problem, we could introduce a second tree or hierarchy starting with account records; these account records would then have customer records as children. This would let us access accounts and then customers efficiently. But it would involve duplicating information that we already have stored in our database—we would have two trees storing the same information in different orders. Another option would be to establish an index of accounts that could point us to the right account record given an account number. That would work too, but it would entail extra work during insert and update operations in the future. + +It was precisely this inflexibility and the problem of duplicated information that pushed E. F. Codd to propose the relational model. In his 1970 paper, A Relational Model of Data for Large Shared Data Banks, he states at the outset that he intends to present a model for data storage that can protect users from having to know anything about how their data is stored. Looked at one way, the hierarchical model is entirely an artifact of how the designers of IMS chose to store data. It is a bottom-up model, the implication of a physical reality. The relational model, on the other hand, is an abstract model based on relational algebra, and is top-down in that the data storage scheme can be anything provided it accommodates the model. The relational model’s great advantage is that, just because you’ve made decisions that have caused the database to store your data in a particular way, you won’t find yourself effectively unable to make certain queries. + +All that said, the relational model is an abstraction, and we all know abstractions aren’t free. Banks and large institutions have stuck with IMS partly because of the performance benefits, though it’s hard to say if those benefits would be enough to keep them from switching to a modern database if they weren’t also trying to avoid rewriting mission-critical legacy code. However, today’s popular NoSQL databases demonstrate that there are people willing to drop the conveniences of the relational model in return for better performance. Something like MongoDB, which encourages its users to store data in a denormalized form, isn’t all that different from IMS. If you choose to store some entity inside of another JSON record, then in effect you have created something like the IMS hierarchy, and you have constrained your ability to query for that data in the future. But perhaps that’s a tradeoff you’re willing to make. So, even if IMS hadn’t predated E. F. 
Codd’s relational model by several years, there are still reasons why IMS’ creators might not have adopted the relational model wholesale.
+
+Unfortunately, IMS isn’t something that you can download and take for a spin on your own computer. First of all, IMS is not free, so you would have to buy it from IBM. But the bigger problem is that IMS only runs on IBM mainframes like the IBM z13. That’s a shame, because it would be a joy to play around with IMS and get a sense for exactly how it differs from something like MySQL. But even without that opportunity, it’s interesting to think about software systems that work in ways we don’t expect or aren’t used to. And it’s especially interesting when those systems, alien as they are, turn out to undergird your local hospital, the entire financial sector, and even the federal government.
+
+If you enjoyed this post, more like it come out every two weeks! Follow [@TwoBitHistory][4] on Twitter or subscribe to the [RSS feed][5] to make sure you know when a new post is out.
+
+--------------------------------------------------------------------------------
+
+via: https://twobithistory.org/2017/10/07/the-most-important-database.html
+
+作者:[Two-Bit History][a]
+选题:[lujun9972][b]
+译者:[译者ID](https://github.com/译者ID)
+校对:[校对者ID](https://github.com/校对者ID)
+
+本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
+
+[a]: https://twobithistory.org
+[b]: https://github.com/lujun9972
+[1]: https://en.wikipedia.org/wiki/PL/I
+[2]: https://twobithistory.org/images/hierarchical-model.png
+[3]: https://twobithistory.org/images/hierarchical-db.png
+[4]: https://twitter.com/TwoBitHistory
+[5]: https://twobithistory.org/feed.xml
diff --git a/sources/talk/20171119 The Ruby Story.md b/sources/talk/20171119 The Ruby Story.md
new file mode 100644
index 0000000000..90d5f41790
--- /dev/null
+++ b/sources/talk/20171119 The Ruby Story.md
@@ -0,0 +1,84 @@
+The Ruby Story
+======
+Ruby has always been one of my favorite languages, though I’ve sometimes found it hard to express why that is. The best I’ve been able to do is this musical analogy: Whereas Python feels to me like punk rock—it’s simple, predictable, but rigid—Ruby feels like jazz. Ruby gives programmers a radical freedom to express themselves, though that comes at the cost of added complexity and can lead to programmers writing programs that don’t make immediate sense to other people.
+
+I’ve always been aware that freedom of expression is a core value of the Ruby community. But what I didn’t appreciate is how deeply important it was to the development and popularization of Ruby in the first place. One might create a programming language in pursuit of better performance, or perhaps timesaving abstractions—the Ruby story is interesting because instead the goal was, from the very beginning, nothing more or less than the happiness of the programmer.
+
+### Yukihiro Matsumoto
+
+Yukihiro Matsumoto, also known as “Matz,” graduated from the University of Tsukuba in 1990. Tsukuba is a small town just northeast of Tokyo, known as a center for scientific research and technological development. The University of Tsukuba is particularly well-regarded for its STEM programs. Matsumoto studied Information Science, with a focus on programming languages. For a time he worked in a programming language lab run by Ikuo Nakata.
+
+Matsumoto started working on Ruby in 1993, only a few years after graduating.
He began working on Ruby because he was looking for a scripting language with features that no existing scripting language could provide. He was using Perl at the time, but felt that it was too much of a “toy language.” Python also fell short; in his own words:
+
+> I knew Python then. But I didn’t like it, because I didn’t think it was a true object-oriented language—OO features appeared to be an add-on to the language. As a language maniac and OO fan for 15 years, I really wanted a genuine object-oriented, easy-to-use scripting language. I looked for one, but couldn’t find one.
+
+So one way of understanding Matsumoto’s motivations in creating Ruby is that he was trying to create a better, object-oriented version of Perl.
+
+But at other times, Matsumoto has said that his primary motivation in creating Ruby was simply to make himself and others happier. Toward the end of a Google tech talk that Matsumoto gave in 2008, he showed the following slide:
+
+![][1]
+
+He told his audience,
+
+> I hope to see Ruby help every programmer in the world to be productive, and to enjoy programming, and to be happy. That is the primary purpose of the Ruby language.
+
+Matsumoto goes on to joke that he created Ruby for selfish reasons, because he was so underwhelmed by other languages that he just wanted to create something that would make him happy.
+
+The slide epitomizes Matsumoto’s humble style. Matsumoto, it turns out, is a practicing Mormon, and I’ve wondered whether his religious commitments have any bearing on his legendary kindness. In any case, this kindness is so well known that the Ruby community has a principle known as MINASWAN, or “Matz Is Nice And So We Are Nice.” The slide must have struck the audience at Google as an unusual one—I imagine that any random slide drawn from a Google tech talk is dense with code samples and metrics showing how one engineering solution is faster or more efficient than another. Few, I suspect, come close to stating nobler goals more simply.
+
+Ruby was influenced primarily by Perl. Perl was created by Larry Wall in the late 1980s as a means of processing and transforming text-based reports. It became well-known for its text processing and regular expression capabilities. A Perl program contains many syntactic elements that would be familiar to a Ruby programmer—there are `$` signs, `@` signs, and even `elsif`s, which I’d always thought were one of Ruby’s less felicitous idiosyncrasies. On a deeper level, Ruby borrows much of Perl’s regular expression handling and standard library.
+
+But Perl was by no means the only influence on Ruby. Prior to beginning work on Ruby, Matsumoto worked on a mail client written entirely in Emacs Lisp. The experience taught him a lot about the inner workings of Emacs and the Lisp language, which Matsumoto has said influenced the underlying object model of Ruby. On top of that he added a Smalltalk-style message passing system which forms the basis for any behavior relying on Ruby’s `#method_missing`. Matsumoto has also claimed Ada and Eiffel as influences on Ruby.
+
+When it came time to decide on a name for Ruby, Matsumoto and a colleague, Keiju Ishitsuka, considered several alternatives. They were looking for something that suggested Ruby’s relationship to Perl and also to shell scripting.
In an [instant message exchange][2] that is well-worth reading, Ishitsuka and Matsumoto probably spend too much time thinking about the relationship between shells, clams, oysters, and pearls and get close to calling the Ruby language “Coral” or “Bisque” instead. Thankfully, they decided to go with “Ruby”, the idea being that it was, like “pearl”, the name of a valuable jewel. It also turns out that the birthstone for June is a pearl while the birthstone for July is a ruby, meaning that the name “Ruby” is another tongue-in-cheek “incremental improvement” name like C++ or C#. + +### Ruby Goes West + +Ruby grew popular in Japan very quickly. Soon after its initial release in 1995, Matz was hired by a Japanese software consulting group called Netlab (also known as Network Applied Communication Laboratory) to work on Ruby full-time. By 2000, only five years after it was initially released, Ruby was more popular in Japan than Python. But it was only just beginning to make its way to English-speaking countries. There had been a Japanese-language mailing list for Ruby discussion since almost the very beginning of Ruby’s existence, but the English-language mailing list wasn’t started until 1998. Initially, the English-language mailing list was used by Japanese Rubyists writing in English, but this gradually changed as awareness of Ruby grew. + +In 2000, Dave Thomas published Programming Ruby, the first English-language book to cover Ruby. The book became known as the “pickaxe” book for the pickaxe it featured on its cover. It introduced Ruby to many programmers in the West for the first time. Like it had in Japan, Ruby spread quickly, and by 2002 the English-language Ruby mailing list had more traffic than the original Japanese-language mailing list. + +By 2005, Ruby had become more popular, but it was still not a mainstream programming language. That changed with the release of Ruby on Rails. Ruby on Rails was the “killer app” for Ruby, and it did more than any other project to popularize Ruby. After the release of Ruby on Rails, interest in Ruby shot up across the board, as measured by the TIOBE language index: + +![][3] + +It’s sometimes joked that the only programs anybody writes in Ruby are Ruby-on-Rails web applications. That makes it sound as if Ruby on Rails completely took over the Ruby community, which is only partly true. While Ruby has certainly come to be known as that language people write Rails apps in, Rails owes as much to Ruby as Ruby owes to Rails. + +The Ruby philosophy heavily informed the design and implementation of Rails. David Heinemeier Hansson, who created Rails, often talks about how his first contact with Ruby was an almost religious experience. He has said that the encounter was so transformative that it “imbued him with a calling to do missionary work in service of Matz’s creation.” For Hansson, Ruby’s no-shackles approach was a politically courageous rebellion against the top-down impositions made by languages like Python and Java. He appreciated that the language trusted him and empowered him to make his own judgements about how best to express his programs. + +Like Matsumoto, Hansson claims that he created Rails out of a frustration with the status quo and a desire to make things better for himself. He, like Matsumoto, prioritized programmer happiness above all else, evaluating additions to Rails by what he calls “The Principle of The Bigger Smile.” Whatever made Hansson smile more was what made it into the Rails codebase. 
As a result, Rails would come to include unorthodox features like the “Inflector” class (which tries to map singular class names to plural database table names automatically) and Rails’ `Time` extensions (allowing programmers to write cute expressions like `2.days.ago`). To some, these features were truly weird, but the success of Rails is testament to the number of people who found it made their lives much easier.
+
+And so, while it might seem that Rails was an incidental application of Ruby that happened to become extremely popular, Rails in fact embodies many of Ruby’s core principles. Furthermore, it’s hard to see how Rails could have been built in any other language, given its dependence on Ruby’s macro-like class method calls to implement things like model associations. Some people might take the fact that so much of Ruby development revolves around Ruby on Rails as a sign of an unhealthy ecosystem, but there are good reasons that Ruby and Ruby on Rails are so intertwined.
+
+### The Future of Ruby
+
+People seem to have an inordinate amount of interest in whether or not Ruby (and Ruby on Rails) are dying. Since as early as 2011, it seems that Stack Overflow and Quora have been full of programmers asking whether or not they should bother learning Ruby if it will no longer be around in the next few years. These concerns are not unjustified; according to the TIOBE index and to Stack Overflow trends, Ruby and Ruby on Rails have been shrinking in popularity. Though Ruby on Rails was once the hot new thing, it has since been eclipsed by hotter and newer frameworks.
+
+One theory for why this has happened is that programmers are abandoning dynamically typed languages for statically typed ones. Analysts at the TIOBE index figure that a rise in quality requirements has made runtime exceptions increasingly unacceptable. They cite TypeScript as an example of this trend—a whole new version of JavaScript was created just to ensure that client-side code could be written with the benefit of compile-time safety guarantees.
+
+A more likely answer, I think, is just that Ruby on Rails now has many more competitors than it once did. When Rails was first introduced in 2005, there weren’t that many ways to create web applications—the main alternative was Java. Today, you can create web applications using great frameworks built for Go, JavaScript, or Python, to name only the most popular options. The web world also seems to be moving toward a more distributed architecture for applications, meaning that, rather than having one codebase responsible for everything from database access to view rendering, responsibilities are split between different components that focus on doing one thing well. Rails feels overbroad and bloated for something as focused as a JSON API that talks to a JavaScript frontend.
+
+All that said, there are reasons to be optimistic about Ruby’s future. Both Rails and Ruby continue to be actively developed. Matsumoto and others are working hard on Ruby’s third major release, which they aim to make three times faster than the existing version of Ruby, possibly alleviating the performance concerns that have always dogged Ruby. And even if the world of web frameworks has become more diverse since 2005, that doesn’t mean that there won’t always be room for Ruby on Rails. It is now a mature tool with an enormous amount of built-in power that will always be a good choice for certain kinds of applications.
+ +But even if Ruby and Rails go the way of the dinosaurs, one thing that seems certain to survive is the Ruby ethos of programmer happiness. Ruby has had a profound influence on the design of many new programming languages, which have adopted many of its best ideas. Other new lanuages have tried to be “more modern” interpretations of Ruby: Elixir, for example, is a version of Ruby that emphasizes the functional programming paradigm, while Crystal, which is still in development, aims to be a statically typed version of Ruby. Many programmers around the world have fallen in love with Ruby and its syntax, so we can count on its influence persisting for a long while to come. + +If you enjoyed this post, more like it come out every two weeks! Follow [@TwoBitHistory][4] on Twitter or subscribe to the [RSS feed][5] to make sure you know when a new post is out. + +-------------------------------------------------------------------------------- + +via: https://twobithistory.org/2017/11/19/the-ruby-story.html + +作者:[Two-Bit History][a] +选题:[lujun9972][b] +译者:[译者ID](https://github.com/译者ID) +校对:[校对者ID](https://github.com/校对者ID) + +本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出 + +[a]: https://twobithistory.org +[b]: https://github.com/lujun9972 +[1]: https://twobithistory.org/images/matz.png +[2]: http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/88819 +[3]: https://twobithistory.org/images/tiobe_ruby.png +[4]: https://twitter.com/TwoBitHistory +[5]: https://twobithistory.org/feed.xml diff --git a/sources/talk/20171229 Important Papers- Codd and the Relational Model.md b/sources/talk/20171229 Important Papers- Codd and the Relational Model.md new file mode 100644 index 0000000000..8fc8b1f701 --- /dev/null +++ b/sources/talk/20171229 Important Papers- Codd and the Relational Model.md @@ -0,0 +1,44 @@ +Important Papers: Codd and the Relational Model +====== +It’s hard to believe today, but the relational database was once the cool new kid on the block. In 2017, the relational model competes with all sorts of cutting-edge NoSQL technologies that make relational database systems seem old-fashioned and boring. Yet, 50 years ago, none of the dominant database systems were relational. Nobody had thought to structure their data that way. When the relational model did come along, it was a radical new idea that revolutionized the database world and spawned a multi-billion dollar industry. + +The relational model was introduced in 1970. Edgar F. Codd, a researcher at IBM, published a [paper][1] called “A Relational Model of Data for Large Shared Data Banks.” The paper was a rewrite of a paper he had circulated internally at IBM a year earlier. The paper is unassuming; Codd does not announce in his abstract that he has discovered a brilliant new approach to storing data. He only claims to have employed a novel tool (the mathematical notion of a “relation”) to address some of the inadequacies of the prevailing database models. + +In 1970, there were two schools of thought about how to structure a database: the hierarchical model and the network model. The hierarchical model was used by IBM’s Information Management System (IMS), the dominant database system at the time. The network model had been specified by a standards committee called CODASYL (which also—random tidbit—specified COBOL) and implemented by several other database system vendors. The two models were not really that different; both could be called “navigational” models. 
They persisted tree or graph data structures to disk using pointers to preserve the links between the data. Retrieving a record stored toward the bottom of the tree would involve first navigating through all of its ancestor records. These databases were fast (IMS is still used by many financial institutions partly for this reason, see [this excellent blog post][2]) but inflexible. Woe unto those database administrators who suddenly found themselves needing to query records from the bottom of the tree without having an obvious place to start at the top. + +Codd saw this inflexibility as a symptom of a larger problem. Programs using a hierarchical or network database had to know about how the stored data was structured. Programs had to know this because they were responsible for navigating down this structure to find the information they needed. This was so true that when Charles Bachman, a major pioneer of the network model, received a Turing Award for his work in 1973, he gave a speech titled “[The Programmer as Navigator][3].” Of course, if programs were saddled with this responsibility, then they would immediately break if the structure of the database ever changed. In the introduction to his 1970 paper, Codd motivates the search for a better model by arguing that we need “data independence,” which he defines as “the independence of application programs and terminal activities from growth in data types and changes in data representation.” The relational model, he argues, “appears to be superior in several respects to the graph or network model presently in vogue,” partly because, among other benefits, the relational model “provides a means of describing data with its natural structure only.” By this he meant that programs could safely ignore any artificial structures (like trees) imposed upon the data for storage and retrieval purposes only. + +To further illustrate the problem with the navigational models, Codd devotes the first section of his paper to an example data set involving machine parts and assembly projects. This dataset, he says, could be represented in existing systems in at least five different ways. Any program that is developed assuming one of five structures will fail when run against at least three of the other structures. The program could instead try to figure out ahead of time which of the structures it might be dealing with, but it would be difficult to do so in this specific case and practically impossible in the general case. So, as long as the program needs to know about how the data is structured, we cannot switch to an alternative structure without breaking the program. This is a real bummer because (and this is from the abstract) “changes in data representation will often be needed as a result of changes in query, update, and report traffic and natural growth in the types of stored information.” + +Codd then introduces his relational model. This model would be refined and expanded in subsequent papers: In 1971, Codd wrote about ALPHA, a SQL-like query language he created; in another 1971 paper, he introduced the first three normal forms we know and love today; and in 1972, he further developed relational algebra and relational calculus, the mathematically rigorous underpinnings of the relational model. But Codd’s 1970 paper contains the kernel of the relational idea: + +> The term relation is used here in its accepted mathematical sense. 
Given sets *S1, S2, …, Sn* (not necessarily distinct), *R* is a relation on these *n* sets if it is a set of *n*-tuples each of which has its first element from *S1*, its second element from *S2*, and so on. We shall refer to *Sj* as the *j*th domain of *R*. As defined above, *R* is said to have degree *n*. Relations of degree 1 are often called unary, degree 2 binary, degree 3 ternary, and degree *n* n-ary.
+
+Today, we call a relation a table, and a domain an attribute or a column. The word “table” actually appears nowhere in the paper, though Codd’s visual representations of relations (which he calls “arrays”) do resemble tables. Codd defines several more terms, some of which we continue to use and others we have replaced. He explains primary and foreign keys, as well as what he calls the “active domain,” which is the set of all distinct values that actually appear in a given domain or column. He then spends some time distinguishing between a “simple” and a “nonsimple” domain. A simple domain contains “atomic” or “nondecomposable” values, like integers. A nonsimple domain has relations as elements. The example Codd gives here is that of an employee with a salary history. The salary history is not one salary but a collection of salaries each associated with a date. So a salary history cannot be represented by a single number or string.
+
+It’s not obvious how one could store a nonsimple domain in a multi-dimensional array, AKA a table. The temptation might be to denote the nonsimple relationship using some kind of pointer, but then we would be repeating the mistakes of the navigational models. Instead, Codd introduces normalization, which at least in the 1970 paper involves nothing more than turning nonsimple domains into simple ones. This is done by expanding the child relation so that it includes the primary key of the parent. Each tuple of the child relation references its parent using simple domains, eliminating the need for a nonsimple domain in the parent. Normalization means no pointers, sidestepping all the problems they cause in the navigational models.
+
+At this point, anyone reading Codd’s paper would have several questions, such as “Okay, how would I actually query such a system?” Codd mentions the possibility of creating a universal sublanguage for querying relational databases from other programs, but declines to define such a language in this particular paper. He does explain, in mathematical terms, many of the fundamental operations such a language would have to support, like joins, “projection” (`SELECT` in SQL), and “restriction” (`WHERE`). The amazing thing about Codd’s 1970 paper is that, really, all the ideas are there—we’ve been writing `SELECT` statements and joins for almost half a century now.
+
+Codd wraps up the paper by discussing ways in which a normalized relational database, on top of its other benefits, can reduce redundancy and improve consistency in data storage. Altogether, the paper is only 11 pages long and not that difficult of a read. I encourage you to look through it yourself. It would be another ten years before Codd’s ideas were properly implemented in a functioning system, but, when they finally were, those systems were so obviously better than previous systems that they took the world by storm.
+
+If you enjoyed this post, more like it come out every two weeks! Follow [@TwoBitHistory][4] on Twitter or subscribe to the [RSS feed][5] to make sure you know when a new post is out.
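+To make the last few ideas concrete, here is a small, admittedly anachronistic sketch using Python’s built-in `sqlite3` module: the “nonsimple” salary-history domain is normalized into its own table that carries the parent’s primary key, and a single query then combines projection (`SELECT`), restriction (`WHERE`), and a join. The table and column names are invented for illustration and are not taken from Codd’s paper.
+
+```
+import sqlite3
+
+db = sqlite3.connect(":memory:")
+
+# One relation per entity. The "nonsimple" salary-history domain becomes its
+# own relation holding the parent's primary key rather than a pointer.
+db.executescript("""
+    CREATE TABLE employee (
+        employee_id INTEGER PRIMARY KEY,
+        name        TEXT
+    );
+    CREATE TABLE salary_history (
+        employee_id    INTEGER REFERENCES employee(employee_id),
+        effective_date TEXT,
+        salary         INTEGER
+    );
+""")
+
+db.executemany("INSERT INTO employee VALUES (?, ?)",
+               [(1, "Jones"), (2, "Garcia")])
+db.executemany("INSERT INTO salary_history VALUES (?, ?, ?)",
+               [(1, "1968-01-01", 9000),
+                (1, "1970-01-01", 11000),
+                (2, "1970-06-01", 12000)])
+
+# Projection (the SELECT list), restriction (WHERE), and a join: the query
+# names the data it wants, not a path through any particular storage structure.
+rows = db.execute("""
+    SELECT employee.name, salary_history.effective_date, salary_history.salary
+    FROM employee
+    JOIN salary_history ON salary_history.employee_id = employee.employee_id
+    WHERE salary_history.salary > 10000
+""").fetchall()
+
+for name, date, salary in rows:
+    print(name, date, salary)
+```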
+ +-------------------------------------------------------------------------------- + +via: https://twobithistory.org/2017/12/29/codd-relational-model.html + +作者:[Two-Bit History][a] +选题:[lujun9972][b] +译者:[译者ID](https://github.com/译者ID) +校对:[校对者ID](https://github.com/校对者ID) + +本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出 + +[a]: https://twobithistory.org +[b]: https://github.com/lujun9972 +[1]: https://cs.uwaterloo.ca/~david/cs848s14/codd-relational.pdf +[2]: https://twobithistory.org/2017/10/07/the-most-important-database.html +[3]: https://pdfs.semanticscholar.org/f371/d196bf0e7b43df6dcbbc44de461925a21709.pdf +[4]: https://twitter.com/TwoBitHistory +[5]: https://twobithistory.org/feed.xml diff --git a/sources/talk/20180209 How writing can change your career for the better, even if you don-t identify as a writer.md b/sources/talk/20180209 How writing can change your career for the better, even if you don-t identify as a writer.md index 98d57bcca3..55618326c6 100644 --- a/sources/talk/20180209 How writing can change your career for the better, even if you don-t identify as a writer.md +++ b/sources/talk/20180209 How writing can change your career for the better, even if you don-t identify as a writer.md @@ -1,4 +1,4 @@ -How writing can change your career for the better, even if you don't identify as a writer Translating by FelixYFZ +How writing can change your career for the better, even if you don't identify as a writer ====== Have you read Marie Kondo's book [The Life-Changing Magic of Tidying Up][1]? Or did you, like me, buy it and read a little bit and then add it to the pile of clutter next to your bed? diff --git a/sources/talk/20180527 Whatever Happened to the Semantic Web.md b/sources/talk/20180527 Whatever Happened to the Semantic Web.md new file mode 100644 index 0000000000..22d48c150a --- /dev/null +++ b/sources/talk/20180527 Whatever Happened to the Semantic Web.md @@ -0,0 +1,106 @@ +Whatever Happened to the Semantic Web? +====== +In 2001, Tim Berners-Lee, inventor of the World Wide Web, published an article in Scientific American. Berners-Lee, along with two other researchers, Ora Lassila and James Hendler, wanted to give the world a preview of the revolutionary new changes they saw coming to the web. Since its introduction only a decade before, the web had fast become the world’s best means for sharing documents with other people. Now, the authors promised, the web would evolve to encompass not just documents but every kind of data one could imagine. + +They called this new web the Semantic Web. The great promise of the Semantic Web was that it would be readable not just by humans but also by machines. Pages on the web would be meaningful to software programs—they would have semantics—allowing programs to interact with the web the same way that people do. Programs could exchange data across the Semantic Web without having to be explicitly engineered to talk to each other. According to Berners-Lee, Lassila, and Hendler, a typical day living with the myriad conveniences of the Semantic Web might look something like this: + +> The entertainment system was belting out the Beatles’ “We Can Work It Out” when the phone rang. When Pete answered, his phone turned the sound down by sending a message to all the other local devices that had a volume control. His sister, Lucy, was on the line from the doctor’s office: “Mom needs to see a specialist and then has to have a series of physical therapy sessions. Biweekly or something. 
I’m going to have my agent set up the appointments.” Pete immediately agreed to share the chauffeuring. At the doctor’s office, Lucy instructed her Semantic Web agent through her handheld Web browser. The agent promptly retrieved the information about Mom’s prescribed treatment within a 20-mile radius of her home and with a rating of excellent or very good on trusted rating services. It then began trying to find a match between available appointment times (supplied by the agents of individual providers through their Web sites) and Pete’s and Lucy’s busy schedules. + +The vision was that the Semantic Web would become a playground for intelligent “agents.” These agents would automate much of the work that the world had only just learned to do on the web. + +![][1] + +For a while, this vision enticed a lot of people. After new technologies such as AJAX led to the rise of what Silicon Valley called Web 2.0, Berners-Lee began referring to the Semantic Web as Web 3.0. Many thought that the Semantic Web was indeed the inevitable next step. A New York Times article published in 2006 quotes a speech Berners-Lee gave at a conference in which he said that the extant web would, twenty years in the future, be seen as only the “embryonic” form of something far greater. A venture capitalist, also quoted in the article, claimed that the Semantic Web would be “profound,” and ultimately “as obvious as the web seems obvious to us today.” + +Of course, the Semantic Web we were promised has yet to be delivered. In 2018, we have “agents” like Siri that can do certain tasks for us. But Siri can only do what it can because engineers at Apple have manually hooked it up to a medley of web services each capable of answering only a narrow category of questions. An important consequence is that, without being large and important enough for Apple to care, you cannot advertise your services directly to Siri from your own website. Unlike the physical therapists that Berners-Lee and his co-authors imagined would be able to hang out their shingles on the web, today we are stuck with giant, centralized repositories of information. Today’s physical therapists must enter information about their practice into Google or Yelp, because those are the only services that the smartphone agents know how to use and the only ones human beings will bother to check. The key difference between our current reality and the promised Semantic future is best captured by this throwaway aside in the excerpt above: “…appointment times (supplied by the agents of individual providers through **their** Web sites)…” + +In fact, over the last decade, the web has not only failed to become the Semantic Web but also threatened to recede as an idea altogether. We now hardly ever talk about “the web” and instead talk about “the internet,” which as of 2016 has become such a common term that newspapers no longer capitalize it. (To be fair, they stopped capitalizing “web” too.) Some might still protest that the web and the internet are two different things, but the distinction gets less clear all the time. The web we have today is slowly becoming a glorified app store, just the easiest way among many to download software that communicates with distant servers using closed protocols and schemas, making it functionally identical to the software ecosystem that existed before the web. How did we get here? If the effort to build a Semantic Web had succeeded, would the web have looked different today? 
Or have there been so many forces working against a decentralized web for so long that the Semantic Web was always going to be stillborn?
+
+### Semweb Hucksters and Their Metacrap
+
+To some more practically minded engineers, the Semantic Web was, from the outset, a utopian dream.
+
+The basic idea behind the Semantic Web was that everyone would use a new set of standards to annotate their webpages with little bits of XML. These little bits of XML would have no effect on the presentation of the webpage, but they could be read by software programs to divine meaning that otherwise would only be available to humans.
+
+The bits of XML were a way of expressing metadata about the webpage. We are all familiar with metadata in the context of a file system: When we look at a file on our computers, we can see when it was created, when it was last updated, and whom it was originally created by. Likewise, webpages on the Semantic Web would be able to tell your browser who authored the page and perhaps even where that person went to school, or where that person is currently employed. In theory, this information would allow Semantic Web browsers to answer queries across a large collection of webpages. In their article for Scientific American, Berners-Lee and his co-authors explain that you could, for example, use the Semantic Web to look up a person you met at a conference whose name you only partially remember.
+
+Cory Doctorow, a blogger and digital rights activist, published an influential essay in 2001 that pointed out the many problems with depending on voluntarily supplied metadata. A world of “exhaustive, reliable” metadata would be wonderful, he argued, but such a world was “a pipe-dream, founded on self-delusion, nerd hubris, and hysterically inflated market opportunities.” Doctorow had found himself in a series of debates over the Semantic Web at tech conferences and wanted to catalog the serious issues that the Semantic Web enthusiasts (Doctorow calls them “semweb hucksters”) were overlooking. The essay, titled “Metacrap,” identifies seven problems, among them the obvious fact that most web users were likely to provide either no metadata at all or else lots of misleading metadata meant to draw clicks. Even if users were universally diligent and well-intentioned, in order for the metadata to be robust and reliable, users would all have to agree on a single representation for each important concept. Doctorow argued that in some cases a single representation might not be appropriate, desirable, or fair to all users.
+
+Indeed, the web had already seen people abusing the HTML `<meta>` tag (introduced at least as early as HTML 4) in an attempt to improve the visibility of their webpages in search results. In a 2004 paper, Ben Munat, then an academic at Evergreen State College, explains how search engines once experimented with using keywords supplied via the `<meta>` tag to index results, but soon discovered that unscrupulous webpage authors were including tags unrelated to the actual content of their webpage. As a result, search engines came to ignore the `<meta>` tag in favor of using complex algorithms to analyze the actual content of a webpage. Munat concludes that a general-purpose Semantic Web is unworkable, and that the focus should be on specific domains within medicine and science.
+
+Others have also seen the Semantic Web project as tragically flawed, though they have located the flaw elsewhere.
Aaron Swartz, the famous programmer and another digital rights activist, wrote in an unfinished book about the Semantic Web published after his death that Doctorow was “attacking a strawman.” Nobody expected that metadata on the web would be thoroughly accurate and reliable, but the Semantic Web, or at least a more realistically scoped version of it, remained possible. The problem, in Swartz’ view, was the “formalizing mindset of mathematics and the institutional structure of academics” that the “semantic Webheads” brought to bear on the challenge. In forums like the World Wide Web Consortium (W3C), a huge amount of effort and discussion went into creating standards before there were any applications out there to standardize. And the standards that emerged from these “Talmudic debates” were so abstract that few of them ever saw widespread adoption. The few that did, like XML, were “uniformly scourges on the planet, offenses against hardworking programmers that have pushed out sensible formats (like JSON) in favor of overly-complicated hairballs with no basis in reality.” The Semantic Web might have thrived if, like the original web, its standards were eagerly adopted by everyone. But that never happened because—as [has been discussed][2] on this blog before—the putative benefits of something like XML are not easy to sell to a programmer when the alternatives are both entirely sufficient and much easier to understand.
+
+### Building the Semantic Web
+
+If the Semantic Web was not an outright impossibility, it was always going to require the contributions of lots of clever people working in concert.
+
+The long effort to build the Semantic Web has been said to consist of four phases. The first phase, which lasted from 2001 to 2005, was the golden age of Semantic Web activity. Between 2001 and 2005, the W3C issued a slew of new standards laying out the foundational technologies of the Semantic future.
+
+The most important of these was the Resource Description Framework (RDF). The W3C issued the first version of the RDF standard in 2004, but RDF had been floating around since 1997, when a W3C working group introduced it in a draft specification. RDF was originally conceived of as a tool for modeling metadata and was partly based on earlier attempts by Ramanathan Guha, an Apple engineer, to develop a metadata system for files stored on Apple computers. The Semantic Web working groups at W3C repurposed RDF to represent arbitrary kinds of general knowledge.
+
+RDF would be the grammar in which Semantic webpages expressed information. The grammar is a simple one: Facts about the world are expressed in RDF as triplets of subject, predicate, and object. Tim Bray, who worked with Ramanathan Guha on an early version of RDF, gives the following example, describing TV shows and movies:
+
+```
+@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
+
+@prefix ex: <http://www.example.org/> .
+
+
+ex:vincent_donofrio ex:starred_in ex:law_and_order_ci .
+
+ex:law_and_order_ci rdf:type ex:tv_show .
+
+ex:the_thirteenth_floor ex:similar_plot_as ex:the_matrix .
+```
+
+The syntax is not important, especially since RDF can be represented in a number of formats, including XML and JSON. This example is in a format called Turtle, which expresses RDF triplets as straightforward sentences terminated by periods. The three essential sentences, which appear above after the `@prefix` preamble, state three facts: Vincent Donofrio starred in Law and Order, Law and Order is a type of TV Show, and the movie The Thirteenth Floor has a similar plot as The Matrix.
(If you don’t know who Vincent Donofrio is and have never seen The Thirteenth Floor, I, too, was watching Nickelodeon and sipping Capri Suns in 1999.) + +Other specifications finalized and drafted during this first era of Semantic Web development describe all the ways in which RDF can be used. RDF in Attributes (RDFa) defines how RDF can be embedded in HTML so that browsers, search engines, and other programs can glean meaning from a webpage. RDF Schema and another standard called OWL allows RDF authors to demarcate the boundary between valid and invalid RDF statements in their RDF documents. RDF Schema and OWL, in other words, are tools for creating what are known as ontologies, explicit specifications of what can and cannot be said within a specific domain. An ontology might include a rule, for example, expressing that no person can be the mother of another person without also being a parent of that person. The hope was that these ontologies would be widely used not only to check the accuracy of RDF found in the wild but also to make inferences about omitted information. + +In 2006, Tim Berners-Lee posted a short article in which he argued that the existing work on Semantic Web standards needed to be supplemented by a concerted effort to make semantic data available on the web. Furthermore, once on the web, it was important that semantic data link to other kinds of semantic data, ensuring the rise of a data-based web as interconnected as the existing web. Berners-Lee used the term “linked data” to describe this ideal scenario. Though “linked data” was in one sense just a recapitulation of the original vision for the Semantic Web, it became a term that people could rally around and thus amounted to a rebranding of the Semantic Web project. + +Berners-Lee’s article launched the second phase of the Semantic Web’s development, where the focus shifted from setting standards and building toy examples to creating and popularizing large RDF datasets. Perhaps the most successful of these datasets was [DBpedia][3], a giant repository of RDF triplets extracted from Wikipedia articles. DBpedia, which made heavy use of the Semantic Web standards that had been developed in the first half of the 2000s, was a standout example of what could be accomplished using the W3C’s new formats. Today DBpedia describes 4.58 million entities and is used by organizations like the NY Times, BBC, and IBM, which employed DBpedia as a knowledge source for IBM Watson, the Jeopardy-winning artificial intelligence system. + +![][4] + +The third phase of the Semantic Web’s development involved adapting the W3C’s standards to fit the actual practices and preferences of web developers. By 2008, JSON had begun its meteoric rise to popularity. Whereas XML came packaged with a bunch of associated technologies of indeterminate purpose (XLST, XPath, XQuery, XLink), JSON was just JSON. It was less verbose and more readable. Manu Sporny, an entrepreneur and member of the W3C, had already started using JSON at his company and wanted to find an easy way for RDFa and JSON to work together. The result would be JSON-LD, which in essence was RDF reimagined for a world that had chosen JSON over XML. Sporny, together with his CTO, Dave Longley, issued a draft specification of JSON-LD in 2010. For the next few years, JSON-LD and an updated RDF specification would be the primary focus of Semantic Web work at the W3C. JSON-LD could be used on its own or it could be embedded within a `