Mirror of https://github.com/LCTT/TranslateProject.git (synced 2025-01-31 23:30:11 +08:00)
Commit 62280a227f
@@ -1,129 +0,0 @@
|
||||
Eriwoon is translating this article
|
||||
The infrastructure behind Twitter: efficiency and optimization
|
||||
===========
|
||||
|
||||
|
||||
In the past, we've published details about Finagle, Manhattan, and a summary of how we re-architected the site to handle events like Castle in the Sky, the Super Bowl, the 2014 World Cup, and the global New Year's Eve celebration, among others. In this infrastructure series, we're focusing on the core infrastructure and components that run Twitter. We're also going to focus each post on efforts surrounding scalability, reliability, and efficiency in a way that highlights the history of our infrastructure, the challenges we've faced, lessons learned, upgrades made, and where we're heading.
|
||||
|
||||
### Data center efficiency
|
||||
|
||||
#### History
|
||||
|
||||
Twitter hardware and data centers operate at a scale few technology companies ever reach. However, this was not accomplished without a few missteps along the way. Our uptime has matured through a combination of physical improvements and software-based changes.
|
||||
|
||||
During the period when the fail whale was prevalent, outages occurred due to software limitations as well as physical failures at the hardware or infrastructure level. Failure domains existed under various definitions, and these had to be aggregated to determine the risk and the redundancy required for services. As the business scaled in customers, services, media content, and global presence, the strategy evolved to support the service efficiently and resiliently.
|
||||
|
||||
#### Challenges
|
||||
|
||||
Software dependencies on bare metal were further dependent on our data centers' ability to operate and maintain uptime of power, fiber connectivity, and environment. These discrete physical failure domains had to be reviewed against the services distributed on the hardware to provide for fault tolerance.
|
||||
|
||||
The initial decision of which data center service provider to scale with was made when specialization in site selection, operation, and design was in its infancy. We began with a hosted provider, then migrated to a colocation facility as we scaled. Early service interruptions occurred as a result of equipment failures, data center design issues, maintenance issues, and human error. As a result, we continually iterated on the physical layer designs to increase the resiliency of the hardware and of data center operations.
|
||||
|
||||
The physical causes of service interruptions included hardware failures at the server component level, at top-of-rack switches, and at core switches. For example, during the initial evaluation of our customized servers, the hardware team determined that the cost of a second power supply was not warranted given the low failure rate of server power supplies, so it was removed from the design. However, the data center power topology provides redundancy through separate physical whips to the racks and requires that second power supply. Its removal eliminated the redundant power path, leaving the hardware vulnerable during distribution faults in the power system. To mitigate the impact of the single power supply, ATS units had to be added at the rack level to provide a secondary path for power.
|
||||
|
||||
The layering of systems with diverse fiber paths, power sources, and physical domains continued to insulate services from relatively small-scale interruptions, improving resiliency.
|
||||
|
||||
#### Lessons learned and major technology upgrades, migrations, and adoptions
|
||||
|
||||
We learned to model the dependencies between the physical failure domains (i.e., building power and cooling, hardware, fiber) and the services distributed across them, to better predict fault tolerance and drive improvements.
|
||||
|
||||
We added data centers to provide regional diversity, mitigating the risk from natural disasters and giving us the ability to fail over between regions when needed during major upgrades, deploys, or incidents. The active-active operation of our data centers allowed for staged code deployments, reducing the overall impact of rollouts.
|
||||
|
||||
Data center power efficiency has improved as we expanded the operating ranges of the environmental envelope and designed the hardware for resiliency at higher operating temperatures.
|
||||
|
||||
#### Future work
|
||||
|
||||
Our data centers continue to evolve in strategy and operation, providing for live changes to the operating network and hardware without interruption to users. Our strategy will continue to focus on scaling within the existing power and physical footprints, optimizing and maintaining flexibility while driving efficiency in the coming years.
|
||||
|
||||
### Hardware efficiency
|
||||
|
||||
#### History and challenges
|
||||
|
||||
Our hardware engineering team was started to qualify and validate the performance of off-the-shelf purchased hardware, and evolved into customizing hardware for cost and performance optimization.
|
||||
|
||||
Procuring and consuming hardware at Twitter's scale comes with a unique set of challenges. In order to meet the demands of our internal customers, we initially started a program to qualify and ensure the quality of purchased hardware. The team was primarily focused on performance and reliability testing, ensuring that systems could meet the demands. We ran systematic tests to validate that behavior was predictable, and very few bugs were introduced.
|
||||
|
||||
As we scaled our major workloads (Mesos, Hadoop, Manhattan, and MySQL) it became apparent that the available market offerings didn't quite meet our needs. Off-the-shelf servers come with enterprise features, like RAID controllers and hot-swap power supplies. These components improve reliability at small scale, but often decrease performance and increase cost; for example, some RAID controllers interfered with SSD performance and could account for a third of the cost of the system.
|
||||
|
||||
At the time, we were a large user of MySQL databases. Issues arose from both the supply and the performance of SAS media. The majority of deployments were 1U servers, and the number of drives plus a write-back cache largely determined a system's performance, which was often limited to a sustained 2,000 sequential IOPS. In order to continue scaling this workload, we were stranding CPU cores and disk capacity to meet IOPS requirements. We were unable to find cost-effective solutions at the time.
|
||||
|
||||
As our volume of hardware reached a critical mass, it made sense to invest in a hardware engineering team for customized white box solutions, with a focus on reducing capital expenses and increasing performance.
|
||||
|
||||
#### Major technology changes and adoption
|
||||
|
||||
We've made many transitions in our hardware technology stack. Below is a timeline of our adoption of new technologies and internally developed platforms.
|
||||
|
||||
- 2012 - SSDs become the primary storage media for our MySQL and key/value databases.
|
||||
- 2013 - Our first custom solution for Hadoop workloads is developed, and becomes our primary bulk storage solution.
|
||||
- 2013 - Our custom solution is developed for Mesos, TFE, and cache workloads.
|
||||
- 2014 - Our custom SSD key/value server completes development.
|
||||
- 2015 - Our custom database solution is developed.
|
||||
- 2016 - We developed GPU systems for inference and training of machine learning models.
|
||||
|
||||
#### Lessons learned
|
||||
|
||||
The objective of our Hardware Engineering team is to significantly reduce capital expenditure and operating expenditure by making small tradeoffs that improve our TCO. Two general approaches can reduce the cost of a server:

1. Removing unused components

2. Improving utilization
|
||||
|
||||
Twitter's workload is divided into four main verticals: storage, compute, database, and GPU. Twitter defines requirements on a per-vertical basis, allowing Hardware Engineering to produce a focused feature set for each. This approach allows us to optimize component selection where equipment would otherwise go unused or underutilized. For example, our storage configuration has been designed specifically for Hadoop workloads and was delivered at a TCO reduction of 20% over the original OEM solution. At the same time, the design improved both the performance and reliability of the hardware. Similarly, for our compute vertical, the Hardware Engineering team has improved the efficiency of these systems by removing unnecessary features.
|
||||
|
||||
There is a minimum overhead required to operate a server, and we quickly reached a point where we could no longer remove components to reduce cost. In the compute vertical specifically, we decided the best approach was to look at solutions that replaced multiple nodes with a single node, and rely on Aurora/Mesos to manage the capacity. We settled on a design that replaced two of our previous-generation compute nodes with a single node.
|
||||
|
||||
Our design verification began with a series of rough benchmarks, and then progressed to a series of production load tests confirming a scaling factor of 2. Most of this improvement came from simply increasing the thread count of the CPU, but our testing confirmed a 20-50% improvement in per-thread performance. Additionally, we saw a 25% increase in per-thread power efficiency, due to sharing the overhead of the server across more threads.
|
||||
|
||||
For the initial deployment, our monitoring showed a 1.5 replacement factor, which was well below the design goal. An examination of the performance data revealed a flawed assumption about the workload characteristics, which needed to be identified and corrected.
|
||||
|
||||
Our Hardware Engineering team's initial action was to develop a model to predict the packing efficiency of the current Aurora job set on various hardware configurations. This model correctly predicted the scaling factor we were observing in the fleet, and suggested we were stranding cores due to unforeseen storage requirements. Additionally, the model predicted that we would see a further improved scaling factor by changing the memory configuration as well.
|
||||
|
||||
Hardware configuration changes take time to implement, so Hardware Engineering identified a few large jobs and worked with our SRE teams to adjust the scheduling requirements to reduce the storage needs. These changes were quick to deploy, and resulted in an immediate improvement to a 1.85 scaling factor.
|
||||
|
||||
In order to address the situation permanently, we needed to adjust the configuration of the server. Simply expanding the installed memory and disk capacity resulted in a 20% improvement in CPU core utilization, at a minimal cost increase. Hardware Engineering worked with our manufacturing partners to adjust the bill of materials for the initial shipments of these servers. Follow-up observations confirmed a 2.4 scaling factor, exceeding the design target.
|
||||
|
||||
### Migration from bare metal to Mesos
|
||||
|
||||
Until 2012, running a service inside Twitter required a hardware requisition. Service owners had to find out and request the particular model or class of server, worry about their rack diversity, maintain scripts to deploy code, and manage dead hardware. There was essentially no "service discovery." When a web service needed to talk to the user service, it typically loaded a YAML file containing all of the host IPs and ports of the user service, and used that list (port reservations were tracked in a wiki page). As hardware died or was added, managing the list meant editing and committing changes to the YAML file that would go out with the next deploy. Making changes in the caching tier meant many deploys over hours and days, adding a few hosts at a time and deploying in stages. Dealing with cache inconsistencies during a deploy was a common occurrence, since some hosts would be using the new list and some the old. It was also possible to have a host running old code (because the box was temporarily down during the deploy), resulting in flaky behavior on the site.
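For illustration, such a hand-maintained host list might have looked something like the sketch below; the service name, addresses, and ports are hypothetical, not taken from Twitter's actual configuration:

```
# hosts.yml - edited by hand and shipped with every deploy (illustrative only)
user_service:
  port: 9090          # reservation tracked on a wiki page
  hosts:
    - 10.0.41.12
    - 10.0.42.37
    - 10.0.45.8
```

Every addition or removal of a host meant another edit to a file like this, followed by a redeploy of every consumer.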
|
||||
|
||||
In 2012/2013, two things started to be adopted at Twitter: service discovery (via a ZooKeeper cluster and a library in the core module of Finagle) and Mesos (including our own scheduler framework on top of Mesos called Aurora, now an Apache project).
|
||||
|
||||
Service discovery removed the need for static YAML host lists. A service either self-registered on startup or was automatically registered under Mesos into a "serverset" (which is just a path to a list of znodes in ZooKeeper based on the role, environment, and service name). Any service that needed to talk to that service would just watch that path and get a live view of which servers were out there.
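As a concrete sketch (the path layout and names here are hypothetical, not Twitter's actual ZooKeeper layout), a serverset might look like this:

```
/role/environment/service            <- the serverset path a client watches
    member_0000000041                <- one ephemeral znode per live instance,
    member_0000000042                   each holding that instance's host and port
```

When an instance dies, its ephemeral znode disappears and every watcher sees the updated membership almost immediately, with no file edits or redeploys.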
|
||||
|
||||
With Mesos/Aurora, instead of having a script (we were heavy users of Capistrano) that took a list of hosts, pushed binaries around, and orchestrated a rolling restart, a service owner pushed the package into a service called "packer" (a service backed by HDFS), uploaded an Aurora configuration describing the service (how many CPUs it needed, how much memory, how many instances were needed, and the command lines of all the tasks each instance should run), and Aurora would complete the deploy. It schedules instances on available hosts, downloads the artifact from packer, registers it in service discovery, and launches it. If there are any failures (hardware dies, network fails, etc.), Mesos/Aurora automatically reschedules the instance on another host.
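A minimal sketch of such a configuration, written in the style of Apache Aurora's documented DSL, is shown below; the job name, resource numbers, and command line are purely illustrative and are not an actual Twitter service definition:

```
# hello_service.aurora - illustrative only; describes what the service needs,
# not which machines it runs on
hello_process = Process(name = 'hello_service', cmdline = 'java -jar hello-service.jar')

hello_task = Task(
  processes = [hello_process],
  resources = Resources(cpu = 2.0, ram = 4096 * MB, disk = 8192 * MB))

jobs = [Job(
  cluster = 'cluster1',
  role = 'www-data',
  environment = 'prod',
  name = 'hello_service',
  instances = 20,
  task = hello_task)]
```

Adding capacity then becomes a change to the instances count rather than a new hardware requisition.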
|
||||
|
||||
#### Twitter's Private PaaS
|
||||
|
||||
Mesos/Aurora and service discovery in combination were revolutionary. There were many bugs and growing pains over the next few years, and many hard lessons learned about distributed systems, but the fundamental design was sound. In the old world, teams were constantly dealing with and thinking about hardware and its management. In the new world, engineers only have to think about how best to configure their services and how much capacity to deploy. We were also able to radically improve the CPU utilization of Twitter's fleet over time, since each service that got its own bare metal hardware generally didn't fully utilize its resources and did a poor job of managing capacity. Mesos allows us to pack multiple services into a box without having to think about it, and adding capacity to a service is just a matter of requesting quota, changing one line of a config, and doing a deploy.
|
||||
|
||||
Within two years, most "stateless" services moved into Mesos. Some of the most important and largest services (including our user service and our ads serving system) were among the first to move. Being the largest, they saw the biggest reduction in their operational burden.
|
||||
|
||||
We are continuously looking for ways to improve the efficiency and optimization of the infrastructure. As part of this, we regularly benchmark against public cloud providers and offerings to validate our TCO and performance expectations of the infrastructure. We also have a presence in the public cloud, and will continue to utilize it when it's the best available option. The next post in this series will focus on the scale of our infrastructure.
|
||||
|
||||
Special thanks to Jennifer Fraser, David Barr, Geoff Papilion, Matt Singer, and Lam Dong for all their contributions to this blog post.
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
via: https://blog.twitter.com/2016/the-infrastructure-behind-twitter-efficiency-and-optimization?utm_source=webopsweekly&utm_medium=email
|
||||
|
||||
作者:[mazdakh][a]
|
||||
译者:[译者ID](https://github.com/译者ID)
|
||||
校对:[校对者ID](https://github.com/校对者ID)
|
||||
|
||||
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
|
||||
|
||||
[a]: https://twitter.com/intent/user?screen_name=mazdakh
|
||||
[1]: https://twitter.com/jenniferfraser
|
||||
[2]: https://twitter.com/davebarr
|
||||
[3]: https://twitter.com/gpapilion
|
||||
[4]: https://twitter.com/lamdong
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
@@ -1,161 +0,0 @@
|
||||
chunyang-wen is translating this article
|
||||
Part 13 - How to Write Scripts Using Awk Programming Language
|
||||
====
|
||||
|
||||
From the beginning of the Awk series up to Part 12, we have been writing small Awk commands and programs on the command line and in shell scripts respectively.
|
||||
|
||||
However, Awk, just like the shell, is also an interpreted language; therefore, with all that we have walked through since the start of this series, you can now write executable Awk scripts.
|
||||
|
||||
Similar to how we write a shell script, Awk scripts start with the line:
|
||||
|
||||
```
|
||||
#! /path/to/awk/utility -f
|
||||
```
|
||||
|
||||
For example, on my system the Awk utility is located at /usr/bin/awk, therefore I would start an Awk script as follows:
|
||||
|
||||
```
|
||||
#! /usr/bin/awk -f
|
||||
```
|
||||
|
||||
Explaining the line above:
|
||||
|
||||
```
|
||||
#! – referred to as Shebang, which specifies an interpreter for the instructions in a script
|
||||
/usr/bin/awk – is the interpreter
|
||||
-f – interpreter option, used to read a program file
|
||||
```
|
||||
|
||||
That said, let us now dive into some examples of executable Awk scripts, starting with the simple script below. Use your favorite editor to open a new file as follows:
|
||||
|
||||
```
|
||||
$ vi script.awk
|
||||
```
|
||||
|
||||
And paste the code below in the file:
|
||||
|
||||
```
|
||||
#!/usr/bin/awk -f
|
||||
BEGIN { printf "%s\n","Writing my first Awk executable script!" }
|
||||
```
|
||||
|
||||
Save the file and exit, then make the script executable by issuing the command below:
|
||||
|
||||
```
|
||||
$ chmod +x script.awk
|
||||
```
|
||||
|
||||
Thereafter, run it:
|
||||
|
||||
```
|
||||
$ ./script.awk
|
||||
```
|
||||
|
||||
Sample Output
|
||||
|
||||
```
|
||||
Writing my first Awk executable script!
|
||||
```
|
||||
|
||||
A critical programmer out there must be asking, "Where are the comments?" Yes, you can also include comments in your Awk scripts. Writing comments in your code is always a good programming practice.
|
||||
|
||||
It helps other programmers looking through your code to understand what you are trying to achieve in each section of a script or program file.
|
||||
|
||||
Therefore, you can include comments in the script above as follows.
|
||||
|
||||
```
|
||||
#!/usr/bin/awk -f
|
||||
#This is how to write a comment in Awk
|
||||
#using the BEGIN special pattern to print a sentence
|
||||
BEGIN { printf "%s\n","Writing my first Awk executable script!" }
|
||||
```
|
||||
|
||||
Next, we shall look at an example where we read input from a file. We want to search for a system user named aaronkilik in the account file, /etc/passwd, then print the username, user ID and user GID as follows:
|
||||
|
||||
Below is the content of our script called second.awk.
|
||||
|
||||
```
|
||||
#! /usr/bin/awk -f
|
||||
#use the BEGIN special pattern to set the FS built-in variable
|
||||
BEGIN { FS=":" }
|
||||
#search for username: aaronkilik and print account details
|
||||
/aaronkilik/ { print "Username :",$1,"User ID :",$3,"User GID :",$4 }
|
||||
```
|
||||
|
||||
Save the file and exit, make the script executable and execute it as below:
|
||||
|
||||
```
|
||||
$ chmod +x second.awk
|
||||
$ ./second.awk /etc/passwd
|
||||
```
|
||||
|
||||
Sample Output
|
||||
|
||||
```
|
||||
Username : aaronkilik User ID : 1000 User GID : 1000
|
||||
```
|
||||
|
||||
In the last example below, we shall use the do-while statement to print out the numbers 0 to 10:
|
||||
|
||||
Below is the content of our script called do.awk.
|
||||
|
||||
```
|
||||
#! /usr/bin/awk -f
|
||||
#printing from 0-10 using a do while statement
|
||||
#do while statement
|
||||
BEGIN {
|
||||
#initialize a counter
|
||||
x=0
|
||||
do {
|
||||
print x;
|
||||
x+=1;
|
||||
}
|
||||
while(x<=10)
|
||||
}
|
||||
```
|
||||
|
||||
After saving the file, make the script executable as we have done before. Afterwards, run it:
|
||||
|
||||
```
|
||||
$ chmod +x do.awk
|
||||
$ ./do.awk
|
||||
```
|
||||
|
||||
Sample Output
|
||||
|
||||
```
|
||||
0
|
||||
1
|
||||
2
|
||||
3
|
||||
4
|
||||
5
|
||||
6
|
||||
7
|
||||
8
|
||||
9
|
||||
10
|
||||
```
|
||||
|
||||
### Summary
|
||||
|
||||
We have come to the end of this interesting Awk series. I hope you have learned a lot from all 13 parts as an introduction to the Awk programming language.
|
||||
|
||||
As I mentioned from the beginning, Awk is a complete text processing language, so you can learn many other aspects of the Awk programming language, such as environment variables, arrays, functions (built-in and user-defined), and beyond.
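As a small, hypothetical taste of one of those topics, here is a user-defined function in an executable Awk script; the function name and values are purely illustrative:

```
#!/usr/bin/awk -f
#define a function that squares its argument, then call it from the BEGIN block
function square(n) {
    return n * n
}
BEGIN { print "5 squared is", square(5) }
```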
|
||||
|
||||
There are additional parts of Awk programming to learn and master, so below I have provided links to some important online resources that you can use to expand your Awk programming skills. These are not necessarily all you need; you can also look out for useful Awk programming books.
|
||||
|
||||
|
||||
For any thoughts you wish to share or questions, use the comment form below. Remember to always stay connected to Tecmint for more exciting series.
|
||||
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
via: http://www.tecmint.com/write-shell-scripts-in-awk-programming/
|
||||
|
||||
作者:[Aaron Kili][a]
|
||||
译者:[译者ID](https://github.com/译者ID)
|
||||
校对:[校对者ID](https://github.com/校对者ID)
|
||||
|
||||
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
|
||||
|
||||
[a]: http://www.tecmint.com/author/aaronkili/
|
@@ -1,247 +0,0 @@
|
||||
chunyang-wen is translating this article
|
||||
How to Use Flow Control Statements in Awk - Part 12
|
||||
====
|
||||
|
||||
When you review all the Awk examples we have covered so far, right from the start of the Awk series, you will notice that the commands in the various examples are executed sequentially, that is, one after the other. But in certain situations we may want to run some text filtering operations based on certain conditions, and that is where flow control statements come in.
|
||||
|
||||
![](http://www.tecmint.com/wp-content/uploads/2016/08/Use-Flow-Control-Statements-in-Awk.png)
|
||||
|
||||
There are various flow control statements in Awk programming and these include:
|
||||
|
||||
- if-else statement
|
||||
- for statement
|
||||
- while statement
|
||||
- do-while statement
|
||||
- break statement
|
||||
- continue statement
|
||||
- next statement
|
||||
- nextfile statement
|
||||
- exit statement
|
||||
|
||||
However, for the scope of this series, we shall expound on the if-else, for, while, and do-while statements. Remember that we already walked through how to use the next statement in Part 6 of this Awk series; a quick refresher follows below.
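As that refresher (the file name and pattern here are made up purely for illustration), the one-liner below skips any line containing "skipme" and prints every other line unchanged:

```
$ awk '/skipme/ { next } { print }' file.txt
```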
|
||||
|
||||
### 1. The if-else Statement
|
||||
|
||||
The expected syntax of the if statement is similar to that of the shell if statement:
|
||||
|
||||
```
|
||||
if (condition1) {
|
||||
actions1
|
||||
}
|
||||
else {
|
||||
actions2
|
||||
}
|
||||
```
|
||||
|
||||
In the above syntax, condition1 is an Awk expression, and actions1 and actions2 are Awk commands executed depending on whether the condition is satisfied.

When condition1 is satisfied, meaning it is true, then actions1 is executed and the if statement exits; otherwise actions2 is executed.
|
||||
|
||||
The if statement can also be expanded to an if-else_if-else statement as below:
|
||||
|
||||
```
|
||||
if (condition1){
|
||||
actions1
|
||||
}
|
||||
else if (conditions2){
|
||||
actions2
|
||||
}
|
||||
else{
|
||||
actions3
|
||||
}
|
||||
```
|
||||
|
||||
For the form above, if condition1 is true, then actions1 is executed and the if statement exits; otherwise condition2 is evaluated, and if it is true, then actions2 is executed and the if statement exits. However, when condition2 is also false, actions3 is executed and the if statement exits. A short one-liner illustrating this form is shown below.
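As a short, self-contained illustration of this form (the variable and thresholds are arbitrary), the one-liner below classifies a single hard-coded age into one of three bands:

```
$ awk 'BEGIN{ age=30; if (age < 18) print "minor"; else if (age <= 25) print "18-25"; else print "over 25" }'
```

Running it prints "over 25".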
|
||||
|
||||
Here is a case in point for using if statements: we have a list of users and their ages stored in the file users.txt.

We want to print a statement indicating a user's name and whether the user's age is less than or more than 25 years old.
|
||||
|
||||
```
|
||||
aaronkilik@tecMint ~ $ cat users.txt
|
||||
Sarah L 35 F
|
||||
Aaron Kili 40 M
|
||||
John Doo 20 M
|
||||
Kili Seth 49 M
|
||||
```
|
||||
|
||||
We can write a short shell script to carry out the job above; here is the content of the script:
|
||||
|
||||
```
|
||||
#!/bin/bash
|
||||
awk ' {
|
||||
if ( $3 <= 25 ){
|
||||
print "User",$1,$2,"is less than 25 years old." ;
|
||||
}
|
||||
else {
|
||||
print "User",$1,$2,"is more than 25 years old" ;
|
||||
}
|
||||
}' ~/users.txt
|
||||
```
|
||||
|
||||
Then save the file and exit, make the script executable and run it as follows:
|
||||
|
||||
```
|
||||
$ chmod +x test.sh
|
||||
$ ./test.sh
|
||||
```
|
||||
|
||||
Sample Output
|
||||
|
||||
```
|
||||
User Sarah L is more than 25 years old
|
||||
User Aaron Kili is more than 25 years old
|
||||
User John Doo is less than 25 years old.
|
||||
User Kili Seth is more than 25 years old
|
||||
```
|
||||
|
||||
### 2. The for Statement
|
||||
|
||||
If you want to execute some Awk commands in a loop, the for statement offers a suitable way to do that, with the syntax below:
|
||||
|
||||
Here, the approach is simply defined by the use of a counter to control the loop execution: first you need to initialize the counter, then test it against a condition; if the condition is true, the actions are executed and the counter is incremented. The loop terminates when the counter no longer satisfies the condition.
|
||||
|
||||
```
|
||||
for ( counter-initialization; test-condition; counter-increment ){
|
||||
actions
|
||||
}
|
||||
```
|
||||
|
||||
The following Awk command shows how the for statement works, where we want to print the numbers 0-10:
|
||||
|
||||
```
|
||||
$ awk 'BEGIN{ for(counter=0;counter<=10;counter++){ print counter} }'
|
||||
```
|
||||
|
||||
Sample Output
|
||||
|
||||
```
|
||||
0
|
||||
1
|
||||
2
|
||||
3
|
||||
4
|
||||
5
|
||||
6
|
||||
7
|
||||
8
|
||||
9
|
||||
10
|
||||
```
|
||||
|
||||
### 3. The while Statement
|
||||
|
||||
The conventional syntax of the while statement is as follows:
|
||||
|
||||
```
|
||||
while ( condition ) {
|
||||
actions
|
||||
}
|
||||
```
|
||||
|
||||
The condition is an Awk expression and actions are lines of Awk commands executed when the condition is true.
|
||||
|
||||
Below is a script to illustrate the use of the while statement to print the numbers 0-10:
|
||||
|
||||
```
|
||||
#!/bin/bash
|
||||
awk ' BEGIN{ counter=0 ;
|
||||
while(counter<=10){
|
||||
print counter;
|
||||
counter+=1 ;
|
||||
}
|
||||
}
'
|
||||
```
|
||||
|
||||
Save the file and make the script executable, then run it:
|
||||
|
||||
```
|
||||
$ chmod +x test.sh
|
||||
$ ./test.sh
|
||||
```
|
||||
|
||||
Sample Output
|
||||
|
||||
```
|
||||
0
|
||||
1
|
||||
2
|
||||
3
|
||||
4
|
||||
5
|
||||
6
|
||||
7
|
||||
8
|
||||
9
|
||||
10
|
||||
```
|
||||
|
||||
### 4. The do while Statement
|
||||
|
||||
It is a modification of the while statement above, with the following underlying syntax:
|
||||
|
||||
```
|
||||
do {
|
||||
actions
|
||||
}
|
||||
while (condition)
|
||||
```
|
||||
|
||||
The slight difference is that, under do-while, the Awk commands are executed before the condition is evaluated. Using the same example as under the while statement above, we can illustrate the use of do-while by altering the Awk command in the test.sh script as follows:
|
||||
|
||||
```
|
||||
#!/bin/bash
|
||||
awk ' BEGIN{ counter=0 ;
|
||||
do{
|
||||
print counter;
|
||||
counter+=1 ;
|
||||
}
|
||||
while (counter<=10)
|
||||
}
|
||||
'
|
||||
```
|
||||
|
||||
After modifying the script, save the file and exit. Then make the script executable and execute it as follows:
|
||||
|
||||
```
|
||||
$ chmod +x test.sh
|
||||
$ ./test.sh
|
||||
```
|
||||
|
||||
Sample Output
|
||||
|
||||
```
|
||||
0
|
||||
1
|
||||
2
|
||||
3
|
||||
4
|
||||
5
|
||||
6
|
||||
7
|
||||
8
|
||||
9
|
||||
10
|
||||
```
|
||||
|
||||
### Conclusion
|
||||
|
||||
This is not a comprehensive guide to Awk flow control statements; as I mentioned earlier, there are several other flow control statements in Awk.
|
||||
|
||||
Nonetheless, this part of the Awk series should give you a clear fundamental idea of how execution of Awk commands can be controlled based on certain conditions.
|
||||
|
||||
You can explore the rest of the flow control statements as well, to gain a deeper understanding of the subject. Finally, in the next part of the Awk series, we shall move on to writing Awk scripts.
|
||||
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
via: http://www.tecmint.com/use-flow-control-statements-with-awk-command/
|
||||
|
||||
作者:[Aaron Kili][a]
|
||||
译者:[译者ID](https://github.com/译者ID)
|
||||
校对:[校对者ID](https://github.com/校对者ID)
|
||||
|
||||
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
|
||||
|
||||
[a]: http://www.tecmint.com/author/aaronkili/
|
||||
|
||||
|
@@ -0,0 +1,129 @@
|
||||
Twitter背后的基础设施:效率与优化
|
||||
===========
|
||||
|
||||
过去我们曾经发布过一些关于 [Finagle](https://twitter.github.io/finagle/) , [Manhattan](https://blog.twitter.com/2014/manhattan-our-real-time-multi-tenant-distributed-database-for-twitter-scale) 这些项目的文章,还写过一些针对大型事件活动的架构优化的文章,例如天空之城,超级碗, 2014 世界杯,全球新年夜庆祝活动等。在这篇基础设施系列文章中,我主要聚焦于 Twitter 的一些关键设施和组件。我也会写一些我们在系统的扩展性,可靠性,效率性方面的做过的改进,例如我们基础设施的历史,遇到过的挑战,学到的教训,做过的升级,以及我们现在前进的方向等等。
|
||||
|
||||
> 天空之城:2013年8月2日,宫崎骏的《天空之城》在NTV迎来其第14次电视重播,剧情发展到高潮之时,Twitter的TPS(Tweets Per Second)也被推上了新的高度——143,199 TPS,是平均值的25倍,这个记录保持至今 -- 译者注。
|
||||
|
||||
### 数据中心的效率优化
|
||||
|
||||
#### 历史
|
||||
|
||||
当前Twitter硬件和数据中心的规模已经超过大多数公司。但达到这样的规模不是一蹴而就的,系统是随着软硬件的升级优化一步步成熟起来的,过程中我们也曾经犯过很多错误。
|
||||
|
||||
有个一时期我们的系统故障不断。软件问题,硬件问题,甚至底层设备问题不断爆发,常常导致系统运营中断。随着 Twitter 在客户、服务、媒体上的影响力不断扩大,构建一个高效、可靠的系统来提供服务成为我们的战略诉求。
|
||||
|
||||
> Twitter系统故障的界面被称为失败鲸(Fail Whale),如下图 -- 译者注
|
||||
![Fail Whale](https://upload.wikimedia.org/wikipedia/en/d/de/Failwhale.png)
|
||||
|
||||
#### 挑战
|
||||
|
||||
一开始,我们的软件是直接安装在服务器,这意味着软件可靠性依赖硬件,电源、网络以及其他的环境因素都是威胁。这种情况下,如果要增加容错能力,就需要统筹考虑物理设备和在上面运行的服务。
|
||||
|
||||
最早采购数据中心方案的时候,我们都还是菜鸟,对于站点选择、运营和设计都非常不专业。我们先直接租用主机,业务增长后我们改用主机托管。早期遇到的问题主要是因为设备故障、数据中心设计问题、维护问题以及人为操作失误。我们也在持续迭代我们的硬件设计,从而增强硬件和数据中心的容错性。
|
||||
|
||||
服务中断的原因有很多,其中硬件故障常发生在服务器、机架交换机、核心交换机这地方。举一个我们曾经犯过的错误,硬件团队最初在设计服务器的时候,认为双路电源对减少供电问题的意义不大 -- 他们真的就移除了一块电源。然而数据中心一般给机架提供两路供电来提高冗余性,防止电网故障传导到服务器,而这需要两块电源。最终我们不得不在机架上增加了一个 ATS 单元(AC transfer switch 交流切换开关)来接入第二路供电。
|
||||
|
||||
提高系统的可靠性靠的就是这样的改进,给网络、供电甚至机房增加冗余,从而将影响控制到最小范围。
|
||||
|
||||
#### 我们学到的教训以及技术的升级、迁移和选型
|
||||
|
||||
我们学到的第一个教训就是要先建模,将可能出故障的地方(例如建筑的供电和冷却系统、硬件、光线网络等)和运行在上面的服务之间的依赖关系弄清楚,这样才能更好地分析,从而优化设计提升容错能力。
|
||||
|
||||
我们增加了更多的数据中心提升地理容灾能力,减少自然灾害的影响。而且这种站点隔离也降低了软件的风险,减少了例如软件部署升级和系统故障的风险。这种多活的数据中心架构提供了代码灰度发布的能力,减少代码首次上线时候的影响。
|
||||
|
||||
我们设计新硬件使之能够在更高温度下正常运行,数据中心的能源效率因此有所提升。
|
||||
|
||||
#### 下一步工作
|
||||
|
||||
随着公司的战略发展和运营增长,我们在不影响我们的最终用户的前提下,持续不断改进我们的数据中心。下一步工作主要是在当前能耗和硬件的基础上,通过维护和优化来提升效率。
|
||||
|
||||
### 硬件的效率优化
|
||||
|
||||
#### 历史和挑战
|
||||
|
||||
我们的硬件工程师团队刚成立的时候只能测试市面上现有硬件,而现在我们能自己定制硬件以节省成本并提升效率。
|
||||
|
||||
Twitter 是一个很大的公司,它对硬件的要求对任何团队来说都是一个不小的挑战。为了满足整个公司的需求,我们的首要工作是能检测并保证购买的硬件的品质。团队重点关注的是性能和可靠性这两部分。对于硬件我们会做系统性的测试来保证其性能可预测,保证尽量不引入新的问题。
|
||||
|
||||
随着我们一些关键组件的负荷越来越大(如 Mesos , Hadoop , Manhattan , MySQL 等),市面上的产品已经无法满足我们的需求。同时供应商提供的一些高级服务器功能,例如 Raid 管理或者电源热切换等,可靠性提升很小,反而会拖累系统性能而且价格高昂,例如一些 Raid 控制器价格高达系统总报价的三分之一,还拖累了 SSD 的性能。
|
||||
|
||||
那时,我们也是 MySQL 数据库的一个大型用户。SAS(Serial Attached SCSI,串行连接 SCSI )设备的供应和性能都有很大的问题。我们大量使用 1 u 的服务器,它的驱动器和回写缓存一起也只能支撑每秒 2000 次顺序 IO。为了获得更好的效果,我们只得不断增加 CPU 核心数并加强磁盘能力。我们那时候找不到更节省成本的方案。
|
||||
|
||||
后来随着我们对硬件需求越来越大,我们可以成立了一个硬件团队,从而自己来设计更便宜更高效的硬件。
|
||||
|
||||
#### 关键技术变更与选择
|
||||
|
||||
我们不断的优化硬件相关的技术,下面是我们采用的新技术和自研平台的时间轴。
|
||||
|
||||
- 2012 - 采用 SSD 作为我们 MySQL 和 Key-Value 数据库的主要存储。
|
||||
- 2013 - 我们开发了第一个定制版 Hadoop 工作站,它现在是我们主要的大容量存储方案。
|
||||
- 2013 - 我们定制的解决方案应用在 Mesos 、 TFE( Twitter Front-End )以及缓存设备上。
|
||||
- 2014 - 我们定制的 SSD Key-Value 服务器完成开发。
|
||||
- 2015 - 我们定制的数据库解决方案完成开发。
|
||||
- 2016 - 我们开发了一个 GPU 系统来做模糊推理和训练机器学习。
|
||||
|
||||
#### 学到的教训
|
||||
|
||||
硬件团队的工作本质是通过做取舍来优化TCO(总体拥有成本),最终达到达到降低 CAPEX(资本支出)和 OPEX(运营支出)的目的。概括来说,服务器降成本就是:
|
||||
|
||||
1. 删除无用的功能和组件
|
||||
2. 提升利用率
|
||||
|
||||
Twitter 的设备总体来说有这四大类:存储设备、计算设备、数据库和 GPU 。 Twitter 对每一类都定义了详细的需求,让硬件工程师更针对性地设计产品,从而优化掉那些用不到或者极少用的冗余部分。例如,我们的存储设备就专门为 Hadoop 优化,设备的购买和运营成本相比于 OEM 产品降低了 20% 。同时,这样做减法还提高了设备的性能和可靠性。同样的,对于计算设备,硬件工程师们也通过移除无用的特性获得了效率提升。
|
||||
|
||||
一个服务器可以移除的组件总是有限的,我们很快就把能移除的都扔掉了。于是我们想出了其他办法,例如在存储设备里,我们认为降低成本最好的办法是用一个节点替换多个节点,并通过 Aurora/Mesos 来管理任务负载。这就是我们现在正在做的东西。
|
||||
|
||||
对于这个我们自己新设计的服务器,首先要通过一系列的标准测试,然后会再做一系列负载测试,我们的目标是一台新设备至少能替换两台旧设备。大多数的提升都比较简单,例如增加 CPU 的进程数,同时我们的测试也比较出新 CPU 的 单线程能力提高了 20~50% ,对应能耗降低了 25% ,这都是我们测试环节需要做的工作。
|
||||
|
||||
这个新设备首次部署的时候,监控发现新设备只能替换 1.5 台旧设备,这比我们的目标低了很多。对性能数据检查后发现,我们之前新硬件的部分指标是错的,而这正是我们在做性能测试需要发现的问题。
|
||||
|
||||
对此我们硬件团队开发了一个模型,用来预测在不同的硬件配置下当前 Aurora 任务的打包效率。这个模型正确的预测了新旧硬件的性能比例。模型还指出了我们一开始没有考虑到的存储需求,并因此建议我们增加 CPU 核心数。另外,它还预测,如果我们修改内存的配置,那系统的性能还会有较大提高。
|
||||
|
||||
硬件配置的改变都需要花时间去操作,所以我们的硬件工程师们就首先找出几个关键痛点。例如我们和站点工程团队一起调整任务顺序来降低存储需求,这种修改很简单也很有效,新设备可以代替 1.85 个旧设备了。
|
||||
|
||||
为了更好的优化效率,我们对新硬件的配置做了修改,扩大了内存和磁盘容量就将 CPU 利用率提高了20% ,而这只增加了非常小的成本。同时我们的硬件工程师也和生产的伙伴一起优化发货顺序来降低货运成本。后续的观察发现我们的自己的新设备实际上可以代替 2.4 台旧设备,这个超出了预定的目标。
|
||||
|
||||
### 从裸设备迁移到 mesos 集群
|
||||
|
||||
直到2012年为止,软件团队在 Twitter 开通一个新服务还需要自己操心硬件:配置硬件的规格需求,研究机架尺寸,开发部署脚本以及处理硬件故障。同时,系统中没有所谓的“服务发现”机制,当一个服务需要调用一个另一个服务时候,需要读取一个 YAML 配置文件,这个配置文件中有目标服务对应的主机 IP 和端口信息(端口信息是由一个公共 wiki 页面维护的)。随着硬件的替换和更新,YAML 配置文件里的内容也会不断的编辑更新。每次更新都需要花几个小时甚至几天来重启在各个服务,从而将新配置刷新到所有服务的缓存里,所以我们只能尽量一次增加多个配置并且按次序分别重启。我们经常遇到重启过程中 cache 不一致导致的问题,因为有的主机在使用旧的配置有的主机在用新的。有时候一台主机的异常(例如它正在重启)会导致整个站点都无法正常工作。
|
||||
|
||||
在 2012/2013 年的时候,Twitter 开始尝试两个新事物:服务发现(来自 ZooKeeper 集群和 Finagle 核心模块中的一个库)和 Mesos(包括基于 Mesos 的一个自研的计划任务框架 Aurora ,它现在也是 Apache 基金会的一个项目)。
|
||||
|
||||
服务发现功能意味着不需要再维护一个静态 YAML 主机列表了。服务或者在启动后主动注册,或者自动被 mesos 接入到一个“服务集”(就是一个 ZooKeeper 中的 znode 列表,包含角色、环境和服务名信息)中。任何想要访问这个服务的组件都只需要监控这个路径就可以实时获取到一个正在工作的服务列表。
|
||||
|
||||
现在我们通过 Mesos/Aurora ,而不是使用脚本(我们曾经是 Capistrano 的重度用户)来获取一个主机列表、分发代码并规划重启任务。现在软件团队如果想部署一个新服务,只需要将软件包上传到一个叫 Packer 的工具上(它是一个基于 HDFS 的服务),再在 Aurora 配置上描述文件(需要多少 CPU ,多少内存,多少个实例,启动的命令行代码),然后 Aurora 就会自动完成整个部署过程。 Aurora 先找到可用的主机,从 Packer 下载代码,注册到“服务发现”,最后启动这个服务。如果整个过程中遇到失败(硬件故障、网络中断等等), Mesos/Aurora 会自动重选一个新主机并将服务部署上去。
|
||||
|
||||
#### Twitter 的私有 PaaS 云平台
|
||||
|
||||
Mesos/Aurora 和服务发现这两个功能给我们带了革命性的变化。虽然在接下来几年里,我们碰到了无数 bug ,伤透了无数脑筋,学到了分布式系统里的无数教训,但是这套架还是非常赞的。以前大家一直忙于处理硬件搭配和管理,而现在,大家只需要考虑如何优化业务以及需要多少系统能力就可以了。同时,我们也从根本上解决了 CPU 利用率低的问题,以前服务直接安装在服务器上,这种方式无法充分利用服务器资源,任务协调能力也很差。现在 Mesos 允许我们把多个服务打包成一个服务包,增加一个新服务只需要修改硬件配额,再改一行配置就可以了。
|
||||
|
||||
在两年时间里,多数“无状态”服务迁移到了 Mesos 平台。一些大型且重要的服务(包括我们的用户服务和广告服务)是最先迁移上去的。因为它们的体量巨大,所以他们从这些服务里获得的好处也最多。
|
||||
|
||||
我们一直在不断追求效率提升和架构优化的最佳实践。我们会定期去测试公有云的产品,和我们自己产品的 TCO 以及性能做对比。我们也拥抱公有云的服务,事实上我们现在正在使用公有云产品。最后,这个系列的下一篇将会主要聚焦于我们基础设施的体量方面。
|
||||
|
||||
特别感谢 Jennifer Fraser, David Barr, Geoff Papilion, Matt Singer, Lam Dong 对这篇文章的贡献。
|
||||
|
||||
|
||||
|
||||
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
via: https://blog.twitter.com/2016/the-infrastructure-behind-twitter-efficiency-and-optimization?utm_source=webopsweekly&utm_medium=email
|
||||
|
||||
作者:[mazdakh][a]
|
||||
译者:[译者ID](https://github.com/译者ID)
|
||||
校对:[校对者ID](https://github.com/校对者ID)
|
||||
|
||||
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
|
||||
|
||||
[a]: https://twitter.com/intent/user?screen_name=mazdakh
|
||||
[1]: https://twitter.com/jenniferfraser
|
||||
[2]: https://twitter.com/davebarr
|
||||
[3]: https://twitter.com/gpapilion
|
||||
[4]: https://twitter.com/lamdong
|
||||
|
||||
|
||||
|
||||
|
||||
|
@@ -0,0 +1,159 @@
|
||||
如何使用 Awk 语言写脚本 - Part 13
|
||||
====
|
||||
|
||||
从 Awk 系列开始直到第 12 部分,我们都是在命令行或者脚本文件写一些简短的 Awk 命令和程序。
|
||||
|
||||
然而 Awk 和 Shell 一样也是一个解释语言。通过从开始到现在的一系列的学习,你现在能写可以执行的 Awk 脚本了。
|
||||
|
||||
和写 shell 脚本差不多,Awk 脚本以下面这一行开头:
|
||||
|
||||
```
|
||||
#! /path/to/awk/utility -f
|
||||
```
|
||||
|
||||
例如在我的系统上,Awk 工具安装在 /usr/bin/awk 目录,所以我的 Awk 脚本以如下内容作为开头:
|
||||
|
||||
```
|
||||
#! /usr/bin/awk -f
|
||||
```
|
||||
|
||||
上面一行的解释如下:
|
||||
|
||||
```
|
||||
#! – 称为 Shebang,指明使用那个解释器来执行脚本中的命令
|
||||
/usr/bin/awk –解释器
|
||||
-f – 解释器选项,用来指定读取的程序文件
|
||||
```
|
||||
|
||||
说是这么说,现在从下面的简单例子开始,让我们深入研究一些可执行的 Awk 脚本。使用你最喜欢的编辑器创建一个新文件,像下面这样:
|
||||
|
||||
```
|
||||
$ vi script.awk
|
||||
```
|
||||
|
||||
然后把下面代码粘贴到文件中:
|
||||
|
||||
```
|
||||
#!/usr/bin/awk -f
|
||||
BEGIN { printf "%s\n","Writing my first Awk executable script!" }
|
||||
```
|
||||
|
||||
保存文件后退出,然后执行下面命令,使得脚本可执行:
|
||||
|
||||
```
|
||||
$ chmod +x script.awk
|
||||
```
|
||||
|
||||
然后,执行它:
|
||||
|
||||
```
|
||||
$ ./script.awk
|
||||
```
|
||||
|
||||
输出样例:
|
||||
|
||||
```
|
||||
Writing my first Awk executable script!
|
||||
```
|
||||
|
||||
一个严格的程序员一定会问:“注释呢?”。是的,你可以在 Awk 脚本中包含注释。在代码中写注释是一种良好的编程习惯。
|
||||
|
||||
它有利于其它程序员阅读你的代码,理解程序文件或者脚本中每一部分的功能。
|
||||
|
||||
所以,你可以像下面这样在脚本中增加注释:
|
||||
|
||||
```
|
||||
#!/usr/bin/awk -f
|
||||
#This is how to write a comment in Awk
|
||||
#using the BEGIN special pattern to print a sentence
|
||||
BEGIN { printf "%s\n","Writing my first Awk executable script!" }
|
||||
```
|
||||
|
||||
接下来我们看一个读文件的例子。我们想从帐号文件 /etc/passwd 中查找一个叫 aaronkilik 的用户,然后像下面这样打印用户名,用户的 ID,用户的 GID (译者注:组 ID):
|
||||
|
||||
下面是我们脚本文件的内容,文件名为 second.awk。
|
||||
|
||||
```
|
||||
#! /usr/bin/awk -f
|
||||
#use the BEGIN special pattern to set the FS built-in variable
|
||||
BEGIN { FS=":" }
|
||||
#search for username: aaronkilik and print account details
|
||||
/aaronkilik/ { print "Username :",$1,"User ID :",$3,"User GID :",$4 }
|
||||
```
|
||||
|
||||
保存文件后退出,使得脚本可执行,然后像下面这样执行它:
|
||||
|
||||
```
|
||||
$ chmod +x second.awk
|
||||
$ ./second.awk /etc/passwd
|
||||
```
|
||||
|
||||
输出样例
|
||||
|
||||
```
|
||||
Username : aaronkilik User ID : 1000 User GID : 1000
|
||||
```
|
||||
|
||||
在下面最后一个例子中,我们将使用 do while 语句来打印数字 0-10:
|
||||
|
||||
下面是我们脚本文件的内容,文件名为 do.awk。
|
||||
|
||||
```
|
||||
#! /usr/bin/awk -f
|
||||
#printing from 0-10 using a do while statement
|
||||
#do while statement
|
||||
BEGIN {
|
||||
#initialize a counter
|
||||
x=0
|
||||
do {
|
||||
print x;
|
||||
x+=1;
|
||||
}
|
||||
while(x<=10)
|
||||
}
|
||||
```
|
||||
|
||||
保存文件后,像之前操作一样使得脚本可执行。然后,运行它:
|
||||
|
||||
```
|
||||
$ chmod +x do.awk
|
||||
$ ./do.awk
|
||||
```
|
||||
|
||||
输出样例
|
||||
|
||||
```
|
||||
0
|
||||
1
|
||||
2
|
||||
3
|
||||
4
|
||||
5
|
||||
6
|
||||
7
|
||||
8
|
||||
9
|
||||
10
|
||||
```
|
||||
|
||||
### 总结
|
||||
|
||||
我们已经到达这个精彩的 Awk 系列的最后,我希望你从整个 13 部分中学到了很多知识,把这些当作你 Awk 编程语言的入门指导。
|
||||
|
||||
我一开始就提到过,Awk 是一个完整的文本处理语言,所以你可以学习很多 Awk 编程语言的其它方面,例如环境变量,数组,函数(内置的或者用户自定义的),等等。
|
||||
|
||||
Awk 编程还有其它内容需要学习和掌握,所以在文末我提供了一些重要的在线资源的链接,你可以利用他们拓展你的 Awk 编程技能。但这不是必须的,你也可以阅读一些关于 Awk 的书籍。
|
||||
|
||||
如果你任何想要分享的想法或者问题,在下面留言。记得保持关注 Tecmint,会有更多的精彩内容。
|
||||
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
via: http://www.tecmint.com/write-shell-scripts-in-awk-programming/
|
||||
|
||||
作者:[Aaron Kili][a]
|
||||
译者:[chunyang-wen](https://github.com/chunyang-wen)
|
||||
校对:[校对者ID](https://github.com/校对者ID)
|
||||
|
||||
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
|
||||
|
||||
[a]: http://www.tecmint.com/author/aaronkili/
|
@@ -0,0 +1,247 @@
|
||||
如何使用 Awk 中的流控制语句 - part12
|
||||
====
|
||||
|
||||
回顾从 Awk 系列最开始到现在我们所讲的所有关于 Awk 的例子,你会发现不同例子中的所有命令都是顺序执行的,也就是一个接一个的执行。但是在某些场景下,我们可能希望根据一些条件来执行一些文本过滤,这个时候流控制语句就派上用场了。
|
||||
|
||||
![](http://www.tecmint.com/wp-content/uploads/2016/08/Use-Flow-Control-Statements-in-Awk.png)
|
||||
|
||||
Awk 包含很多的流控制语句,包括:
|
||||
|
||||
- if-else 语句
|
||||
- for 语句
|
||||
- while 语句
|
||||
- do-while 语句
|
||||
- break 语句
|
||||
- continue 语句
|
||||
- next 语句
|
||||
- nextfile 语句
|
||||
- exit 语句
|
||||
|
||||
但是在这个系列中,我们将详细解释:if-else,for,while,do-while 语句。关于如何使用 next 语句,如果你们记得的话,我们已经在 Awk 系列的第6部分介绍过了。
|
||||
|
||||
### 1. if-else 语句
|
||||
|
||||
if 语句的语法和 shell 里面的 if 语句类似:
|
||||
|
||||
```
|
||||
if (condition1) {
|
||||
actions1
|
||||
}
|
||||
else {
|
||||
actions2
|
||||
}
|
||||
```
|
||||
|
||||
上面的语法中,condition1 和 condition2 是 Awk 的表达式,actions1 和 actions2 是当相应的条件满足时执行的 Awk 命令。
|
||||
|
||||
当 condition1 满足时,意味着它的值是 true,此时会执行 actions1,if 语句退出,否则(译注:condition1 为 false)执行 actions2。
|
||||
|
||||
if 语句可以扩展成如下的 if-else_if-else:
|
||||
|
||||
```
|
||||
if (condition1){
|
||||
actions1
|
||||
}
|
||||
else if (conditions2){
|
||||
actions2
|
||||
}
|
||||
else{
|
||||
actions3
|
||||
}
|
||||
```
|
||||
|
||||
上面例子中,如果 condition1 为 true,执行 actions1,if 语句退出;否则对 condition2 求值,如果值为 true,那么执行 actions2,if 语句退出。然而如果 condition2 是 false,那么会执行 actions3 退出 if语句。
|
||||
|
||||
下面是一个使用 if 语句的例子,我们有一个存储用户和他们年龄列表的文件,users.txt。
|
||||
|
||||
我们想要打印用户的名字以及他们的年龄是大于 25 还是小于 25。
|
||||
|
||||
```
|
||||
aaronkilik@tecMint ~ $ cat users.txt
|
||||
Sarah L 35 F
|
||||
Aaron Kili 40 M
|
||||
John Doo 20 M
|
||||
Kili Seth 49 M
|
||||
```
|
||||
|
||||
我们可以写一个简短的 shell 脚本来执行我们上面的任务,下面是脚本的内容:
|
||||
|
||||
```
|
||||
#!/bin/bash
|
||||
awk ' {
|
||||
if ( $3 <= 25 ){
|
||||
print "User",$1,$2,"is less than 25 years old." ;
|
||||
}
|
||||
else {
|
||||
print "User",$1,$2,"is more than 25 years old" ;
|
||||
}
|
||||
}' ~/users.txt
|
||||
```
|
||||
|
||||
保存文件后退出,执行下面命令让脚本可执行,然后执行:
|
||||
|
||||
```
|
||||
$ chmod +x test.sh
|
||||
$ ./test.sh
|
||||
```
|
||||
|
||||
输出样例
|
||||
|
||||
```
|
||||
User Sarah L is more than 25 years old
|
||||
User Aaron Kili is more than 25 years old
|
||||
User John Doo is less than 25 years old.
|
||||
User Kili Seth is more than 25 years old
|
||||
```
|
||||
|
||||
### 2. for 语句
|
||||
|
||||
如果你想循环执行一些 Awk 命令,那么 for 语句十分合适,它的语法如下:
|
||||
|
||||
这里只是简单的定义一个计数器来控制循环的执行。首先你要初始化那个计数器 (counter),然后根据某个条件判断是否执行,如果该条件为 true 则执行,最后增加计数器。当计数器不满足条件时则终止循环。
|
||||
|
||||
```
|
||||
for ( counter-initialization; test-condition; counter-increment ){
|
||||
actions
|
||||
}
|
||||
```
|
||||
|
||||
下面的 Awk 命令利用打印数字 0-10 来说明 for 语句是怎么工作的。
|
||||
|
||||
```
|
||||
$ awk 'BEGIN{ for(counter=0;counter<=10;counter++){ print counter} }'
|
||||
```
|
||||
|
||||
输出样例
|
||||
|
||||
```
|
||||
0
|
||||
1
|
||||
2
|
||||
3
|
||||
4
|
||||
5
|
||||
6
|
||||
7
|
||||
8
|
||||
9
|
||||
10
|
||||
```
|
||||
|
||||
### 3. while 语句
|
||||
|
||||
传统的 while 语句语法如下:
|
||||
|
||||
```
|
||||
while ( condition ) {
|
||||
actions
|
||||
}
|
||||
```
|
||||
|
||||
上面的 condition 是 Awk 表达式,actions 是当 condition 为 true 时执行的 Awk命令。
|
||||
|
||||
下面是仍然用打印数字 0-10 来解释 while 语句的用法:
|
||||
|
||||
```
|
||||
#!/bin/bash
|
||||
awk ' BEGIN{ counter=0 ;
|
||||
while(counter<=10){
|
||||
print counter;
|
||||
counter+=1 ;
|
||||
}
|
||||
}
'
|
||||
```
|
||||
|
||||
保存文件,让文件可执行,然后执行:
|
||||
|
||||
```
|
||||
$ chmod +x test.sh
|
||||
$ ./test.sh
|
||||
```
|
||||
|
||||
输出样例
|
||||
|
||||
|
||||
```
|
||||
0
|
||||
1
|
||||
2
|
||||
3
|
||||
4
|
||||
5
|
||||
6
|
||||
7
|
||||
8
|
||||
9
|
||||
10
|
||||
```
|
||||
|
||||
### 4. do-while 语句
|
||||
|
||||
这个是上面的 while 语句语法的一个变化,其语法如下:
|
||||
|
||||
```
|
||||
do {
|
||||
actions
|
||||
}
|
||||
while (condition)
|
||||
```
|
||||
|
||||
二者的区别是,在 do-while 中,Awk 的命令在条件求值前先执行。我们使用 while 语句中同样的例子来解释 do-while 的使用,将 test.sh 脚本中的 Awk 命令做如下更改:
|
||||
|
||||
```
|
||||
#!/bin/bash
|
||||
awk ' BEGIN{ counter=0 ;
|
||||
do{
|
||||
print counter;
|
||||
counter+=1 ;
|
||||
}
|
||||
while (counter<=10)
|
||||
}
|
||||
'
|
||||
```
|
||||
|
||||
修改脚本后,保存退出。让脚本可执行,然后按如下方式执行:
|
||||
|
||||
```
|
||||
$ chmod +x test.sh
|
||||
$ ./test.sh
|
||||
```
|
||||
|
||||
输出样例
|
||||
|
||||
```
|
||||
0
|
||||
1
|
||||
2
|
||||
3
|
||||
4
|
||||
5
|
||||
6
|
||||
7
|
||||
8
|
||||
9
|
||||
10
|
||||
```
|
||||
|
||||
### 结论
|
||||
|
||||
前面指出,这并不是一个 Awk 流控制的完整介绍。在 Awk 中还有其它几个流控制语句。
|
||||
|
||||
不管怎样,Awk 系列的此部分给你一个如何基于某些条件来控制 Awk 命令执行的基本概念。
|
||||
|
||||
你可以接着通过仔细看看其余的流控制语句来获得关于这个主题的更多知识。最后,Awk 系列的下一部分,我们将会介绍如何写 Awk 脚本。
|
||||
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
via: http://www.tecmint.com/use-flow-control-statements-with-awk-command/
|
||||
|
||||
作者:[Aaron Kili][a]
|
||||
译者:[chunyang-wen](https://github.com/chunyang-wen)
|
||||
校对:[校对者ID](https://github.com/校对者ID)
|
||||
|
||||
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
|
||||
|
||||
[a]: http://www.tecmint.com/author/aaronkili/
|
||||
|
||||
|