From be1f37c733fc34c944151a3c0042b34157d3444b Mon Sep 17 00:00:00 2001 From: Vonng Date: Tue, 13 Feb 2018 17:56:51 +0800 Subject: [PATCH] fix format --- ddia/ch1.md | 26 +++++--- ddia/ch2.md | 169 ++++++++++++++-------------------------------------- ddia/ch3.md | 68 ++++++++------------- 3 files changed, 88 insertions(+), 175 deletions(-) diff --git a/ddia/ch1.md b/ddia/ch1.md index 25efda2..68b43d8 100644 --- a/ddia/ch1.md +++ b/ddia/ch1.md @@ -14,17 +14,27 @@ 数据密集型应用通常由标准组件构建而成,标准组件提供了很多通用的功能:例如,许多应用程序需要: -***数据库(database)***:存储数据,以便自己或其他应用程序之后能再次找到 +***数据库(database)*** -***缓存(cache)***:记住开销昂贵操作的结果,加快读取速度 +​ 存储数据,以便自己或其他应用程序之后能再次找到 -***搜索索引(search indexes)***:允许用户按关键字搜索数据,或以各种方式对数据进行过滤 +***缓存(cache)*** -***流处理(stream processing)***:向其他进程发送消息,进行异步处理 +​ 记住开销昂贵操作的结果,加快读取速度 -***批处理(batch processing)***: 定期压缩累积的大批量数据 +***搜索索引(search indexes)*** -如果这些功能听上去平淡无奇,那真让人心酸。因为这些**数据系统(data system)**是如此成功的抽象,我们一直用着它们,却没有想太多。绝大多数工程师不会想从零开始编写存储引擎,开发应用时,数据库已经是足够完美工具了。 +​ 允许用户按关键字搜索数据,或以各种方式对数据进行过滤 + +***流处理(stream processing)*** + +​ 向其他进程发送消息,进行异步处理 + +***批处理(batch processing)*** + +​ 定期压缩累积的大批量数据 + +如果这些功能听上去平淡无奇,那真让人心酸。因为这些**数据系统(data system)**是如此成功的抽象,我们一直用着它们,却没有想太多。绝大多数工程师不会想从零开始编写存储引擎,开发应用时,数据库已经是足够完美的工具了。 但事实并没有这么简单。不同的应用有不同的需求,所以数据库系统也是百花齐放,有着各式各样的特性。有很多不同的手段可以实现缓存,也有好几种方法可以搞定搜索索引,诸如此类。所以开发应用时仍然有必要弄清楚什么样的工具和方法最适合手头的工作。而且,当单个工具解决不了你的问题时,你会发现组合使用这些工具还是挺有难度的。 @@ -247,11 +257,11 @@ > #### 实践中的百分位点 > -> 在多重调用的后端服务里,高百分位数变得特别重要。即使并行调用,最终用户请求仍然需要等待最慢的并行呼叫完成。如图1-5所示,只需要一个缓慢的呼叫就可以使整个最终用户请求变慢。即使只有一小部分后端呼叫速度较慢,如果最终用户请求需要多个后端调用,则获得较慢调用的机会也会增加,因此较高比例的最终用户请求速度会变慢(效果称为尾部延迟放大【24】)。 +> 在多重调用的后端服务里,高百分位数变得特别重要。即使并行调用,最终用户请求仍然需要等待最慢的并行呼叫完成。如[图1-5](img/fig1-5.png)所示,只需要一个缓慢的呼叫就可以使整个最终用户请求变慢。即使只有一小部分后端呼叫速度较慢,如果最终用户请求需要多个后端调用,则获得较慢调用的机会也会增加,因此较高比例的最终用户请求速度会变慢(效果称为尾部延迟放大【24】)。 > > 如果您想将响应时间百分点添加到您的服务的监视仪表板,则需要持续有效地计算它们。例如,您可能希望在最近10分钟内保持请求响应时间的滚动窗口。每一分钟,您都会计算出该窗口中的中值和各种百分数,并将这些度量值绘制在图上。 > -> 
简单的实现是在时间窗口内保存所有请求的响应时间列表,并且每分钟对列表进行排序。如果对你来说效率太低,那么有一些算法能够以最小的CPU和内存成本(如正向衰减【25】,t-digest【26】或HdrHistogram 【27】)来计算百分位数的近似值。请注意,平均百分比(例如,减少时间分辨率或合并来自多台机器的数据)在数学上没有意义 - 聚合响应时间数据的正确方法是添加直方图【28】。 +> 简单的实现是在时间窗口内保存所有请求的响应时间列表,并且每分钟对列表进行排序。如果对你来说效率太低,那么有一些算法能够以最小的CPU和内存成本(如前向衰减【25】,t-digest【26】或HdrHistogram 【27】)来计算百分位数的近似值。请注意,平均百分比(例如,减少时间分辨率或合并来自多台机器的数据)在数学上没有意义 - 聚合响应时间数据的正确方法是添加直方图【28】。 ![](img/fig1-5.png) diff --git a/ddia/ch2.md b/ddia/ch2.md index b4868ae..541e0d2 100644 --- a/ddia/ch2.md +++ b/ddia/ch2.md @@ -149,13 +149,13 @@ JSON表示比图2-1中的多表模式具有更好的局部性。如果要在关 而且,即使应用程序的初始版本适合无连接的文档模型,随着功能添加到应用程序中,数据也会变得更加互联。例如,考虑一下我们可以对简历例子进行的一些修改: -* 组织和学校作为实体 +***组织和学校作为实体*** - 在前面的描述中,组织(用户工作的公司)和school_name(他们学习的地方)只是字符串。也许他们应该是对实体的引用呢?然后,每个组织,学校或大学都可以拥有自己的网页(标识,新闻提要等)。每个简历可以链接到它所提到的组织和学校,并且包括他们的标识和其他信息(参见图2-3,来自LinkedIn的一个例子)。 +在前面的描述中,组织(用户工作的公司)和school_name(他们学习的地方)只是字符串。也许他们应该是对实体的引用呢?然后,每个组织,学校或大学都可以拥有自己的网页(标识,新闻提要等)。每个简历可以链接到它所提到的组织和学校,并且包括他们的标识和其他信息(参见图2-3,来自LinkedIn的一个例子)。 -* 推荐 +***推荐*** - 假设你想添加一个新的功能:一个用户可以为另一个用户写一个推荐。推荐在用户的简历上显示,并附上推荐用户的姓名和照片。如果推荐人更新他们的照片,他们写的任何建议都需要反映新的照片。因此,推荐应该引用作者的个人资料。 +假设你想添加一个新的功能:一个用户可以为另一个用户写一个推荐。推荐在用户的简历上显示,并附上推荐用户的姓名和照片。如果推荐人更新他们的照片,他们写的任何建议都需要反映新的照片。因此,推荐应该引用作者的个人资料。 ![](img/fig2-3.png) @@ -367,8 +367,6 @@ li.selected > p { 在这里,CSS选择器`li.selected> p`声明了我们想要应用蓝色样式的元素的模式:即直接父元素是一个CSS元素的`
<li>`元素的所有`<p>`元素。 示例中的元素`<p>Sharks</p>`匹配此模式,但`<p>Whales</p>`不匹配,因为其`<li>`父类缺少`class =“selected”`。 - - 如果你使用XSL而不是CSS,你可以做类似的事情: ```xml @@ -908,29 +906,19 @@ Datalog方法需要对本章讨论的其他查询语言采取不同的思维方 ## 参考文献 -1. Edgar F. Codd: - “[A Relational Model of Data for Large Shared Data Banks](https://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf),” *Communications of the ACM*, volume 13, number - 6, pages 377–387, June 1970. - [doi:10.1145/362384.362685](http://dx.doi.org/10.1145/362384.362685) +1. Edgar F. Codd: “[A Relational Model of Data for Large Shared Data Banks](https://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf),” *Communications of the ACM*, volume 13, number 6, pages 377–387, June 1970. [doi:10.1145/362384.362685](http://dx.doi.org/10.1145/362384.362685) -1. Michael Stonebraker and Joseph M. Hellerstein: - “[What Goes Around Comes Around](http://mitpress2.mit.edu/books/chapters/0262693143chapm1.pdf),” - in *Readings in Database Systems*, 4th edition, MIT Press, pages 2–41, 2005. - ISBN: 978-0-262-69314-1 +1. Michael Stonebraker and Joseph M. Hellerstein: “[What Goes Around Comes Around](http://mitpress2.mit.edu/books/chapters/0262693143chapm1.pdf),” + in *Readings in Database Systems*, 4th edition, MIT Press, pages 2–41, 2005. ISBN: 978-0-262-69314-1 -1. Pramod J. Sadalage and - Martin Fowler: *NoSQL Distilled*. Addison-Wesley, August 2012. ISBN: +1. Pramod J. Sadalage and Martin Fowler: *NoSQL Distilled*. Addison-Wesley, August 2012. ISBN: 978-0-321-82662-6 -1. Eric Evans: - “[NoSQL: What's in a Name?](http://blog.sym-link.com/2009/10/30/nosql_whats_in_a_name.html),” *blog.sym-link.com*, October 30, 2009. +1. Eric Evans: “[NoSQL: What's in a Name?](http://blog.sym-link.com/2009/10/30/nosql_whats_in_a_name.html),” *blog.sym-link.com*, October 30, 2009. -1. James Phillips: - “[Surprises in Our NoSQL Adoption Survey](http://blog.couchbase.com/nosql-adoption-survey-surprises),” *blog.couchbase.com*, February 8, 2012. +1. 
James Phillips: “[Surprises in Our NoSQL Adoption Survey](http://blog.couchbase.com/nosql-adoption-survey-surprises),” *blog.couchbase.com*, February 8, 2012. -1. Michael Wagner: - *SQL/XML:2006 – Evaluierung der Standardkonformität ausgewählter Datenbanksysteme*. - Diplomica Verlag, Hamburg, 2010. ISBN: 978-3-836-64609-3 +1. Michael Wagner: *SQL/XML:2006 – Evaluierung der Standardkonformität ausgewählter Datenbanksysteme*. Diplomica Verlag, Hamburg, 2010. ISBN: 978-3-836-64609-3 1. “[XML Data in SQL Server](http://technet.microsoft.com/en-us/library/bb522446.aspx),” SQL Server 2012 documentation, *technet.microsoft.com*, 2013. @@ -942,147 +930,80 @@ Datalog方法需要对本章讨论的其他查询语言采取不同的思维方 1. “[Apache CouchDB 1.6 Documentation](http://docs.couchdb.org/en/latest/),” *docs.couchdb.org*, 2014. -1. Lin Qiao, Kapil Surlaker, Shirshanka Das, et al.: - “[On Brewing Fresh Espresso: LinkedIn’s Distributed Data Serving Platform](http://www.slideshare.net/amywtang/espresso-20952131),” at *ACM International Conference on Management - of Data* (SIGMOD), June 2013. +1. Lin Qiao, Kapil Surlaker, Shirshanka Das, et al.: “[On Brewing Fresh Espresso: LinkedIn’s Distributed Data Serving Platform](http://www.slideshare.net/amywtang/espresso-20952131),” at *ACM International Conference on Management of Data* (SIGMOD), June 2013. -1. Rick Long, Mark Harrington, Robert Hain, and Geoff Nicholls: - *IMS Primer*. - IBM Redbook SG24-5352-00, IBM International Technical Support Organization, January 2000. +1. Rick Long, Mark Harrington, Robert Hain, and Geoff Nicholls: *IMS Primer*. IBM Redbook SG24-5352-00, IBM International Technical Support Organization, January 2000. -1. Stephen D. Bartlett: - “[IBM’s IMS—Myths, Realities, and Opportunities](ftp://public.dhe.ibm.com/software/data/ims/pdf/TCG2013015LI.pdf),” The Clipper Group Navigator, TCG2013015LI, July 2013. +1. Stephen D. 
Bartlett: “[IBM’s IMS—Myths, Realities, and Opportunities](ftp://public.dhe.ibm.com/software/data/ims/pdf/TCG2013015LI.pdf),” The Clipper Group Navigator, TCG2013015LI, July 2013. -1. Sarah Mei: - “[Why You Should Never Use MongoDB](http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/),” - *sarahmei.com*, November 11, 2013. +1. Sarah Mei: “[Why You Should Never Use MongoDB](http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/),” *sarahmei.com*, November 11, 2013. -1. J. S. Knowles and D. M. R. Bell: - “The CODASYL Model,” in *Databases—Role and Structure: An Advanced Course*, edited by P. M. - Stocker, P. M. D. Gray, and M. P. Atkinson, pages 19–56, Cambridge University Press, 1984. ISBN: - 978-0-521-25430-4 +1. J. S. Knowles and D. M. R. Bell: “The CODASYL Model,” in *Databases—Role and Structure: An Advanced Course*, edited by P. M. Stocker, P. M. D. Gray, and M. P. Atkinson, pages 19–56, Cambridge University Press, 1984. ISBN: 978-0-521-25430-4 -1. Charles W. Bachman: - “[The Programmer as Navigator](http://dl.acm.org/citation.cfm?id=362534),” - *Communications of the ACM*, volume 16, number 11, pages 653–658, November 1973. - [doi:10.1145/355611.362534](http://dx.doi.org/10.1145/355611.362534) +1. Charles W. Bachman: “[The Programmer as Navigator](http://dl.acm.org/citation.cfm?id=362534),” *Communications of the ACM*, volume 16, number 11, pages 653–658, November 1973. [doi:10.1145/355611.362534](http://dx.doi.org/10.1145/355611.362534) -1. Joseph M. Hellerstein, Michael Stonebraker, and James Hamilton: - “[Architecture of a Database System](http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf),” - *Foundations and Trends in Databases*, volume 1, number 2, pages 141–259, November 2007. - [doi:10.1561/1900000002](http://dx.doi.org/10.1561/1900000002) +1. Joseph M. 
Hellerstein, Michael Stonebraker, and James Hamilton: “[Architecture of a Database System](http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf),” + *Foundations and Trends in Databases*, volume 1, number 2, pages 141–259, November 2007. [doi:10.1561/1900000002](http://dx.doi.org/10.1561/1900000002) -1. Sandeep Parikh and Kelly Stirman: - “[Schema Design for Time Series Data in MongoDB](http://blog.mongodb.org/post/65517193370/schema-design-for-time-series-data-in-mongodb),” *blog.mongodb.org*, October 30, 2013. +1. Sandeep Parikh and Kelly Stirman: “[Schema Design for Time Series Data in MongoDB](http://blog.mongodb.org/post/65517193370/schema-design-for-time-series-data-in-mongodb),” *blog.mongodb.org*, October 30, 2013. -1. Martin Fowler: - “[Schemaless Data Structures](http://martinfowler.com/articles/schemaless/),” - *martinfowler.com*, January 7, 2013. +1. Martin Fowler: “[Schemaless Data Structures](http://martinfowler.com/articles/schemaless/),” *martinfowler.com*, January 7, 2013. -1. Amr Awadallah: - “[Schema-on-Read vs. Schema-on-Write](http://www.slideshare.net/awadallah/schemaonread-vs-schemaonwrite),” at *Berkeley EECS RAD Lab Retreat*, Santa Cruz, CA, May 2009. +1. Amr Awadallah: “[Schema-on-Read vs. Schema-on-Write](http://www.slideshare.net/awadallah/schemaonread-vs-schemaonwrite),” at *Berkeley EECS RAD Lab Retreat*, Santa Cruz, CA, May 2009. -1. Martin Odersky: - “[The Trouble with Types](http://www.infoq.com/presentations/data-types-issues),” - at *Strange Loop*, September 2013. +1. Martin Odersky: “[The Trouble with Types](http://www.infoq.com/presentations/data-types-issues),” at *Strange Loop*, September 2013. -1. Conrad Irwin: - “[MongoDB—Confessions of a PostgreSQL Lover](https://speakerdeck.com/conradirwin/mongodb-confessions-of-a-postgresql-lover),” at *HTML5DevConf*, October 2013. +1. 
Conrad Irwin: “[MongoDB—Confessions of a PostgreSQL Lover](https://speakerdeck.com/conradirwin/mongodb-confessions-of-a-postgresql-lover),” at *HTML5DevConf*, October 2013. 1. “[Percona Toolkit Documentation: pt-online-schema-change](http://www.percona.com/doc/percona-toolkit/2.2/pt-online-schema-change.html),” Percona Ireland Ltd., 2013. -1. Rany Keddo, Tobias Bielohlawek, and Tobias Schmidt: - “[Large Hadron Migrator](https://github.com/soundcloud/lhm),” SoundCloud, 2013. +1. Rany Keddo, Tobias Bielohlawek, and Tobias Schmidt: “[Large Hadron Migrator](https://github.com/soundcloud/lhm),” SoundCloud, 2013. Shlomi Noach: -1. Shlomi Noach: “[gh-ost: GitHub's Online Schema Migration Tool for MySQL](http://githubengineering.com/gh-ost-github-s-online-migration-tool-for-mysql/),” *githubengineering.com*, August 1, 2016. -1. James C. Corbett, Jeffrey Dean, Michael Epstein, et al.: - “[Spanner: Google’s Globally-Distributed Database](http://research.google.com/archive/spanner.html),” - at *10th USENIX Symposium on Operating System Design and Implementation* (OSDI), +1. James C. Corbett, Jeffrey Dean, Michael Epstein, et al.: “[Spanner: Google’s Globally-Distributed Database](http://research.google.com/archive/spanner.html),” at *10th USENIX Symposium on Operating System Design and Implementation* (OSDI), October 2012. -1. Donald K. Burleson: - “[Reduce I/O with Oracle Cluster Tables](http://www.dba-oracle.com/oracle_tip_hash_index_cluster_table.htm),” *dba-oracle.com*. +1. Donald K. Burleson: “[Reduce I/O with Oracle Cluster Tables](http://www.dba-oracle.com/oracle_tip_hash_index_cluster_table.htm),” *dba-oracle.com*. -1. Fay Chang, Jeffrey Dean, Sanjay Ghemawat, et al.: - “[Bigtable: A Distributed Storage System for Structured Data](http://research.google.com/archive/bigtable.html),” at *7th USENIX Symposium on Operating System Design and - Implementation* (OSDI), November 2006. +1. 
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, et al.: “[Bigtable: A Distributed Storage System for Structured Data](http://research.google.com/archive/bigtable.html),” at *7th USENIX Symposium on Operating System Design and Implementation* (OSDI), November 2006. -1. Bobbie J. Cochrane and Kathy A. McKnight: - “[DB2 JSON Capabilities, Part 1: Introduction to DB2 JSON](http://www.ibm.com/developerworks/data/library/techarticle/dm-1306nosqlforjson1/),” IBM developerWorks, June 20, 2013. +1. Bobbie J. Cochrane and Kathy A. McKnight: “[DB2 JSON Capabilities, Part 1: Introduction to DB2 JSON](http://www.ibm.com/developerworks/data/library/techarticle/dm-1306nosqlforjson1/),” IBM developerWorks, June 20, 2013. -1. Herb Sutter: - “[The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software](http://www.gotw.ca/publications/concurrency-ddj.htm),” *Dr. Dobb's Journal*, - volume 30, number 3, pages 202-210, March 2005. +1. Herb Sutter: “[The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software](http://www.gotw.ca/publications/concurrency-ddj.htm),” *Dr. Dobb's Journal*, volume 30, number 3, pages 202-210, March 2005. -1. Joseph M. Hellerstein: - “[The Declarative Imperative: Experiences and Conjectures in Distributed Logic](http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-90.pdf),” Electrical Engineering and - Computer Sciences, University of California at Berkeley, Tech report UCB/EECS-2010-90, June - 2010. +1. Joseph M. Hellerstein: “[The Declarative Imperative: Experiences and Conjectures in Distributed Logic](http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-90.pdf),” Electrical Engineering and Computer Sciences, University of California at Berkeley, Tech report UCB/EECS-2010-90, June 2010. -1. 
Jeffrey Dean and Sanjay Ghemawat: - “[MapReduce: Simplified Data Processing on Large Clusters](http://research.google.com/archive/mapreduce.html),” at *6th USENIX Symposium on Operating System Design and - Implementation* (OSDI), December 2004. +1. Jeffrey Dean and Sanjay Ghemawat: “[MapReduce: Simplified Data Processing on Large Clusters](http://research.google.com/archive/mapreduce.html),” at *6th USENIX Symposium on Operating System Design and Implementation* (OSDI), December 2004. -1. Craig Kerstiens: - “[JavaScript in Your Postgres](https://blog.heroku.com/javascript_in_your_postgres),” - *blog.heroku.com*, June 5, 2013. +1. Craig Kerstiens: “[JavaScript in Your Postgres](https://blog.heroku.com/javascript_in_your_postgres),” *blog.heroku.com*, June 5, 2013. -1. Nathan Bronson, Zach Amsden, George Cabrera, et al.: - “[TAO: Facebook’s Distributed Data Store for the Social Graph](https://www.usenix.org/conference/atc13/technical-sessions/presentation/bronson),” at - *USENIX Annual Technical Conference* (USENIX ATC), June 2013. +1. Nathan Bronson, Zach Amsden, George Cabrera, et al.: “[TAO: Facebook’s Distributed Data Store for the Social Graph](https://www.usenix.org/conference/atc13/technical-sessions/presentation/bronson),” at *USENIX Annual Technical Conference* (USENIX ATC), June 2013. 1. “[Apache TinkerPop3.2.3 Documentation](http://tinkerpop.apache.org/docs/3.2.3/reference/),” *tinkerpop.apache.org*, October 2016. -1. “[The Neo4j Manual v2.0.0](http://docs.neo4j.org/chunked/2.0.0/index.html),” - Neo Technology, 2013. +1. “[The Neo4j Manual v2.0.0](http://docs.neo4j.org/chunked/2.0.0/index.html),” Neo Technology, 2013. Emil Eifrem: [Twitter correspondence](https://twitter.com/emileifrem/status/419107961512804352), January 3, 2014. -1. Emil Eifrem: - [Twitter correspondence](https://twitter.com/emileifrem/status/419107961512804352), January 3, 2014. +1. 
David Beckett and Tim Berners-Lee: “[Turtle – Terse RDF Triple Language](http://www.w3.org/TeamSubmission/turtle/),” W3C Team Submission, March 28, 2011. -1. David Beckett and Tim Berners-Lee: - “[Turtle – Terse RDF Triple Language](http://www.w3.org/TeamSubmission/turtle/),” - W3C Team Submission, March 28, 2011. +1. “[Datomic Development Resources](http://docs.datomic.com/),” Metadata Partners, LLC, 2013. W3C RDF Working Group: “[Resource Description Framework (RDF)](http://www.w3.org/RDF/),” *w3.org*, 10 February 2004. -1. “[Datomic Development Resources](http://docs.datomic.com/),” Metadata Partners, LLC, 2013. +1. “[Apache Jena](http://jena.apache.org/),” Apache Software Foundation. -1. W3C RDF Working Group: - “[Resource Description Framework (RDF)](http://www.w3.org/RDF/),” - *w3.org*, 10 February 2004. - -1. “[Apache Jena](http://jena.apache.org/),” - Apache Software Foundation. - -1. Steve Harris, Andy Seaborne, and Eric - Prud'hommeaux: “[SPARQL 1.1 Query Language](http://www.w3.org/TR/sparql11-query/),” +1. Steve Harris, Andy Seaborne, and Eric Prud'hommeaux: “[SPARQL 1.1 Query Language](http://www.w3.org/TR/sparql11-query/),” W3C Recommendation, March 2013. -1. Todd J. Green, Shan Shan Huang, Boon Thau Loo, and Wenchao Zhou: - “[Datalog and Recursive Query Processing](http://blogs.evergreen.edu/sosw/files/2014/04/Green-Vol5-DBS-017.pdf),” *Foundations and Trends in Databases*, - volume 5, number 2, pages 105–195, November 2013. - [doi:10.1561/1900000017](http://dx.doi.org/10.1561/1900000017) +1. Todd J. Green, Shan Shan Huang, Boon Thau Loo, and Wenchao Zhou: “[Datalog and Recursive Query Processing](http://blogs.evergreen.edu/sosw/files/2014/04/Green-Vol5-DBS-017.pdf),” *Foundations and Trends in Databases*, volume 5, number 2, pages 105–195, November 2013. [doi:10.1561/1900000017](http://dx.doi.org/10.1561/1900000017) -1. 
Stefano Ceri, Georg Gottlob, and Letizia Tanca: - “[What You Always Wanted to Know About Datalog (And Never Dared to Ask)](https://www.researchgate.net/profile/Letizia_Tanca/publication/3296132_What_you_always_wanted_to_know_about_Datalog_and_never_dared_to_ask/links/0fcfd50ca2d20473ca000000.pdf),” *IEEE - Transactions on Knowledge and Data Engineering*, volume 1, number 1, pages 146–166, March 1989. - [doi:10.1109/69.43410](http://dx.doi.org/10.1109/69.43410) +1. Stefano Ceri, Georg Gottlob, and Letizia Tanca: “[What You Always Wanted to Know About Datalog (And Never Dared to Ask)](https://www.researchgate.net/profile/Letizia_Tanca/publication/3296132_What_you_always_wanted_to_know_about_Datalog_and_never_dared_to_ask/links/0fcfd50ca2d20473ca000000.pdf),” *IEEE Transactions on Knowledge and Data Engineering*, volume 1, number 1, pages 146–166, March 1989. [doi:10.1109/69.43410](http://dx.doi.org/10.1109/69.43410) -1. Serge Abiteboul, Richard Hull, and Victor Vianu: - *Foundations of Databases*. Addison-Wesley, 1995. - ISBN: 978-0-201-53771-0, available online at *webdam.inria.fr/Alice* +1. Serge Abiteboul, Richard Hull, and Victor Vianu: *Foundations of Databases*. Addison-Wesley, 1995. ISBN: 978-0-201-53771-0, available online at *webdam.inria.fr/Alice* -1. Nathan Marz: - “[Cascalog](http://cascalog.org/)," *cascalog.org*. +1. Nathan Marz: “[Cascalog](http://cascalog.org/)," *cascalog.org*. Dennis A. Benson, Ilene Karsch-Mizrachi, David J. Lipman, et al.: -1. Dennis A. Benson, - Ilene Karsch-Mizrachi, David J. Lipman, et al.: - “[GenBank](http://nar.oxfordjournals.org/content/36/suppl_1/D25.full-text-lowres.pdf),” - *Nucleic Acids Research*, volume 36, Database issue, pages D25–D30, December 2007. - [doi:10.1093/nar/gkm929](http://dx.doi.org/10.1093/nar/gkm929) + “[GenBank](http://nar.oxfordjournals.org/content/36/suppl_1/D25.full-text-lowres.pdf),” *Nucleic Acids Research*, volume 36, Database issue, pages D25–D30, December 2007. 
[doi:10.1093/nar/gkm929](http://dx.doi.org/10.1093/nar/gkm929) -1. Fons Rademakers: - “[ROOT for Big Data Analysis](http://indico.cern.ch/getFile.py/access?contribId=13&resId=0&materialId=slides&confId=246453),” at *Workshop on the Future of Big Data Management*, +1. Fons Rademakers: “[ROOT for Big Data Analysis](http://indico.cern.ch/getFile.py/access?contribId=13&resId=0&materialId=slides&confId=246453),” at *Workshop on the Future of Big Data Management*, London, UK, June 2013. diff --git a/ddia/ch3.md b/ddia/ch3.md index 3bf7367..3f941d0 100644 --- a/ddia/ch3.md +++ b/ddia/ch3.md @@ -104,25 +104,25 @@ $ cat database 每个段现在都有自己的内存散列表,将键映射到文件偏移量。为了找到一个键的值,我们首先检查最近段的哈希映射;如果键不存在,我们检查第二个最近的段,依此类推。合并过程保持细分的数量,所以查找不需要检查许多哈希映射。 大量的细节进入实践这个简单的想法工作。简而言之,一些真正实施中重要的问题是: -* 文件格式 +***文件格式*** - CSV不是日志的最佳格式。使用二进制格式更快,更简单,首先以字节为单位对字符串的长度进行编码,然后使用原始字符串(不需要转义)。 +​ CSV不是日志的最佳格式。使用二进制格式更快,更简单,首先以字节为单位对字符串的长度进行编码,然后使用原始字符串(不需要转义)。 -* 删除记录 +***删除记录*** - 如果要删除一个键及其关联的值,则必须在数据文件(有时称为逻辑删除)中附加一个特殊的删除记录。当日志段被合并时,逻辑删除告诉合并过程放弃删除键的任何以前的值。 +如果要删除一个键及其关联的值,则必须在数据文件(有时称为逻辑删除)中附加一个特殊的删除记录。当日志段被合并时,逻辑删除告诉合并过程放弃删除键的任何以前的值。 -* 崩溃恢复 +***崩溃恢复*** - 如果数据库重新启动,则内存散列映射将丢失。原则上,您可以通过从头到尾读取整个段文件并在每次按键时注意每个键的最近值的偏移量来恢复每个段的哈希映射。但是,如果段文件很大,这可能需要很长时间,这将使服务器重新启动痛苦。 Bitcask通过存储加速恢复磁盘上每个段的哈希映射的快照,可以更快地加载到内存中。 +如果数据库重新启动,则内存散列映射将丢失。原则上,您可以通过从头到尾读取整个段文件并在每次按键时注意每个键的最近值的偏移量来恢复每个段的哈希映射。但是,如果段文件很大,这可能需要很长时间,这将使服务器重新启动痛苦。 Bitcask通过存储加速恢复磁盘上每个段的哈希映射的快照,可以更快地加载到内存中。 -* 部分书面记录 +***部分写入记录*** - 数据库可能随时崩溃,包括将记录附加到日志中途。 Bitcask文件包含校验和,允许检测和忽略日志的这些损坏部分。 +数据库可能随时崩溃,包括将记录附加到日志中途。 Bitcask文件包含校验和,允许检测和忽略日志的这些损坏部分。 -* 并发控制 +***并发控制*** - 由于写操作是以严格顺序的顺序附加到日志中的,所以常见的实现选择是只有一个写入器线程。数据文件段是附加的,否则是不可变的,所以它们可以被多个线程同时读取。 +由于写操作是以严格顺序的顺序附加到日志中的,所以常见的实现选择是只有一个写入器线程。数据文件段是附加的,否则是不可变的,所以它们可以被多个线程同时读取。 乍一看,只有追加日志看起来很浪费:为什么不更新文件,用新值覆盖旧值?但是只能追加设计的原因有几个: @@ -160,11 +160,12 @@ $ cat database 2. 
为了在文件中找到一个特定的键,你不再需要保存内存中所有键的索引。以[图3-5]()为例:假设你正在内存中寻找键`handiwork`,但是你不知道段文件中该关键字的确切偏移量。然而,你知道`handbag`和`handsome`的偏移,而且由于排序特性,你知道`handiwork`必须出现在这两者之间。这意味着您可以跳到`handbag`的偏移位置并从那里扫描,直到您找到`handiwork`(或没找到,如果该文件中没有该键)。 -![](img/fig3-5.png) + ![](img/fig3-5.png) -**图3-5 具有内存索引的SSTable** + **图3-5 具有内存索引的SSTable** + + 您仍然需要一个内存中索引来告诉您一些键的偏移量,但它可能很稀疏:每几千字节的段文件就有一个键就足够了,因为几千字节可以很快被扫描。 -您仍然需要一个内存中索引来告诉您一些键的偏移量,但它可能很稀疏:每几千字节的段文件就有一个键就足够了,因为几千字节可以很快被扫描。 3. 由于读取请求无论如何都需要扫描所请求范围内的多个键值对,因此可以将这些记录分组到块中,并在将其写入磁盘之前对其进行压缩(如图3-5中的阴影区域所示) 。稀疏内存中索引的每个条目都指向压缩块的开始处。除了节省磁盘空间之外,压缩还可以减少IO带宽的使用。 @@ -602,46 +603,27 @@ WHERE product_sk = 31 AND store_sk = 3 ## 参考文献 -1. Alfred V. Aho, John E. Hopcroft, and Jeffrey D. Ullman: - *Data Structures and Algorithms*. Addison-Wesley, 1983. ISBN: 978-0-201-00023-8 +1. Alfred V. Aho, John E. Hopcroft, and Jeffrey D. Ullman: *Data Structures and Algorithms*. Addison-Wesley, 1983. ISBN: 978-0-201-00023-8 -1. Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and - Clifford Stein: *Introduction to Algorithms*, 3rd edition. MIT Press, 2009. - ISBN: 978-0-262-53305-8 +1. Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein: *Introduction to Algorithms*, 3rd edition. MIT Press, 2009. ISBN: 978-0-262-53305-8 -1. Justin Sheehy and David Smith: - “[Bitcask: A Log-Structured Hash Table for Fast Key/Value Data](http://basho.com/wp-content/uploads/2015/05/bitcask-intro.pdf),” Basho Technologies, April 2010. +1. Justin Sheehy and David Smith: “[Bitcask: A Log-Structured Hash Table for Fast Key/Value Data](http://basho.com/wp-content/uploads/2015/05/bitcask-intro.pdf),” Basho Technologies, April 2010. -1. Yinan Li, Bingsheng He, Robin Jun Yang, et al.: - “[Tree Indexing on Solid State Drives](http://www.vldb.org/pvldb/vldb2010/papers/R106.pdf),” - *Proceedings of the VLDB Endowment*, volume 3, number 1, pages 1195–1206, - September 2010. +1. 
Yinan Li, Bingsheng He, Robin Jun Yang, et al.: “[Tree Indexing on Solid State Drives](http://www.vldb.org/pvldb/vldb2010/papers/R106.pdf),” *Proceedings of the VLDB Endowment*, volume 3, number 1, pages 1195–1206, September 2010. -1. Goetz Graefe: - “[Modern B-Tree Techniques](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.219.7269&rep=rep1&type=pdf),” - *Foundations and Trends in Databases*, volume 3, number 4, pages 203–402, August 2011. - [doi:10.1561/1900000028](http://dx.doi.org/10.1561/1900000028) +1. Goetz Graefe: “[Modern B-Tree Techniques](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.219.7269&rep=rep1&type=pdf),” *Foundations and Trends in Databases*, volume 3, number 4, pages 203–402, August 2011. [doi:10.1561/1900000028](http://dx.doi.org/10.1561/1900000028) -1. Jeffrey Dean and Sanjay Ghemawat: - “[LevelDB Implementation Notes](https://github.com/google/leveldb/blob/master/doc/impl.html),” - *leveldb.googlecode.com*. +1. Jeffrey Dean and Sanjay Ghemawat: “[LevelDB Implementation Notes](https://github.com/google/leveldb/blob/master/doc/impl.html),” *leveldb.googlecode.com*. -1. Dhruba Borthakur: - “[The History of RocksDB](http://rocksdb.blogspot.com/),” - *rocksdb.blogspot.com*, November 24, 2013. +1. Dhruba Borthakur: “[The History of RocksDB](http://rocksdb.blogspot.com/),” *rocksdb.blogspot.com*, November 24, 2013. -1. Matteo Bertozzi: - “[Apache HBase I/O – HFile](http://blog.cloudera.com/blog/2012/06/hbase-io-hfile-input-output/),” *blog.cloudera.com*, June, 29 2012. +1. Matteo Bertozzi: “[Apache HBase I/O – HFile](http://blog.cloudera.com/blog/2012/06/hbase-io-hfile-input-output/),” *blog.cloudera.com*, June, 29 2012. -1. Fay Chang, Jeffrey Dean, Sanjay Ghemawat, et al.: - “[Bigtable: A Distributed Storage System for Structured Data](http://research.google.com/archive/bigtable.html),” at *7th USENIX Symposium on Operating System Design and +1. 
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, et al.: “[Bigtable: A Distributed Storage System for Structured Data](http://research.google.com/archive/bigtable.html),” at *7th USENIX Symposium on Operating System Design and Implementation* (OSDI), November 2006. -1. Patrick - O'Neil, Edward Cheng, Dieter Gawlick, and Elizabeth O'Neil: - “[The Log-Structured Merge-Tree (LSM-Tree)](http://www.cs.umb.edu/~poneil/lsmtree.pdf),” - *Acta Informatica*, volume 33, number 4, pages 351–385, June 1996. - [doi:10.1007/s002360050048](http://dx.doi.org/10.1007/s002360050048) +1. Patrick O'Neil, Edward Cheng, Dieter Gawlick, and Elizabeth O'Neil: + “[The Log-Structured Merge-Tree (LSM-Tree)](http://www.cs.umb.edu/~poneil/lsmtree.pdf),” *Acta Informatica*, volume 33, number 4, pages 351–385, June 1996. [doi:10.1007/s002360050048](http://dx.doi.org/10.1007/s002360050048) 1. Mendel Rosenblum and John K. Ousterhout: “[The Design and Implementation of a Log-Structured File System](http://research.cs.wisc.edu/areas/os/Qual/papers/lfs.pdf),”