Education of a Programmer
============================================================
_When I left Microsoft in October 2016 after almost 21 years there and almost 35 years in the industry, I took some time to reflect on what I had learned over all those years. This is a lightly edited version of that post. Pardon the length!_
There are an amazing number of things you need to know to be a proficient programmer — details of languages, APIs, algorithms, data structures, systems and tools. These things change all the time — new languages and programming environments spring up and there always seems to be some hot new tool or language that “everyone” is using. It is important to stay current and proficient. A carpenter needs to know how to pick the right hammer and nail for the job and needs to be competent at driving the nail straight and true.
At the same time, I’ve found that there are some concepts and strategies that are applicable over a wide range of scenarios and across decades. We have seen multiple orders of magnitude change in the performance and capability of our underlying devices and yet certain ways of thinking about the design of systems still stay relevant. These are more fundamental than any specific implementation. Understanding these recurring themes is hugely helpful in both the analysis and design of the complex systems we build.
Humility and Ego
----------------
This is not limited to programming, but in an area like computing which exhibits so much constant change, one needs a healthy balance of humility and ego. There is always more to learn and there is always someone who can help you learn it — if you are willing and open to that learning. One needs both the humility to recognize and acknowledge what you don’t know and the ego that gives you confidence to master a new area and apply what you already know. The biggest challenges I have seen come when someone works in a single deep area for a long time and “forgets” how good they are at learning new things. The best learning comes from actually getting your hands dirty and building something, even if it is just a prototype or hack. The best programmers I know have both a broad understanding of technology and the deep expertise that comes from taking the time to master some particular area. The deepest learning happens when you struggle with truly hard problems.
End to End Argument
-------------------
Back in 1981, Jerry Saltzer, Dave Reed and Dave Clark were doing early work on the Internet and distributed systems and wrote up their [classic description][4] of the end to end argument. There is much misinformation out there on the Internet so it can be useful to go back and read the original paper. They were humble in not claiming invention — from their perspective this was a common engineering strategy that applies in many areas, not just in communications. They were simply writing it down and gathering examples. A minor paraphrasing is:
> When implementing some function in a system, it can be implemented correctly and completely only with the knowledge and participation of the endpoints of the system. In some cases, a partial implementation in some internal component of the system may be important for performance reasons.
The SRC paper calls this an “argument”, although it has been elevated to a “principle” on Wikipedia and in other places. In fact, it is better to think of it as an argument — as they detail, one of the hardest problems for a system designer is to determine how to divide responsibilities between components of a system. This ends up being a discussion that involves weighing the pros and cons as you divide up functionality, isolate complexity and try to design a reliable, performant system that will be flexible to evolving requirements. There is no simple set of rules to follow.
Much of the discussion on the Internet focuses on communications systems, but the end-to-end argument applies in a much wider set of circumstances. One example in distributed systems is the idea of “eventual consistency”. An eventually consistent system can optimize and simplify by letting elements of the system get into a temporarily inconsistent state, knowing that there is a larger end-to-end process that can resolve these inconsistencies. I like the example of a scaled-out ordering system (e.g. as used by Amazon) that doesn’t require every request go through a central inventory control choke point. This lack of a central control point might allow two endpoints to sell the same last book copy, but the overall system needs some type of resolution system in any case, e.g. by notifying the customer that the book has been backordered. That last book might end up getting run over by a forklift in the warehouse before the order is fulfilled anyway. Once you realize an end-to-end resolution system is required and is in place, the internal design of the system can be optimized to take advantage of it.
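
That ordering example can be sketched as a toy (the names and structure are invented, not any real system's design): replicas accept orders optimistically against stale local views, and an end-to-end reconciliation step resolves the oversell.

```python
# Toy sketch of eventual consistency with end-to-end resolution.
# Names and structure are illustrative, not any real system's design.

def take_order(replica, order_id):
    """Each replica optimistically accepts orders against its local view."""
    if replica["stock_view"] > 0:
        replica["stock_view"] -= 1
        replica["accepted"].append(order_id)
        return True
    return False

def reconcile(actual_stock, replicas):
    """End-to-end resolution: fulfill what we can, backorder the rest."""
    accepted = [o for r in replicas for o in r["accepted"]]
    fulfilled = accepted[:actual_stock]
    backordered = accepted[actual_stock:]  # e.g. notify the customer
    return fulfilled, backordered

# One copy left, but both replicas start with a (stale) view of 1.
replicas = [{"stock_view": 1, "accepted": []} for _ in range(2)]
take_order(replicas[0], "order-A")
take_order(replicas[1], "order-B")  # oversell: both accepted the last copy
fulfilled, backordered = reconcile(actual_stock=1, replicas=replicas)
print(fulfilled, backordered)  # one order fulfilled, one backordered
```

The point of the sketch is that the reconciliation step has to exist anyway (forklifts happen), so the internal components are free to stay temporarily inconsistent.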
In fact, it is this design flexibility in the service of either ongoing performance optimization or delivering other system features that makes this end-to-end approach so powerful. End-to-end thinking often allows internal performance flexibility which makes the overall system more robust and adaptable to changes in the characteristics of each of the components. This makes an end-to-end approach “anti-fragile” and resilient to change over time.
An implication of the end-to-end approach is that you want to be extremely careful about adding layers and functionality that eliminates overall performance flexibility. (Or other flexibility, but performance, especially latency, tends to be special.) If you expose the raw performance of the layers you are built on, end-to-end approaches can take advantage of that performance to optimize for their specific requirements. If you chew up that performance, even in the service of providing significant value-add functionality, you eliminate design flexibility.
The end-to-end argument intersects with organizational design when you have a system that is large and complex enough to assign whole teams to internal components. The natural tendency of those teams is to extend the functionality of those components, often in ways that start to eliminate design flexibility for applications trying to deliver end-to-end functionality built on top of them.
One of the challenges in applying the end-to-end approach is determining where the end is. “Little fleas have lesser fleas… and so on ad infinitum.”
Concentrating Complexity
------------------------
Coding is an incredibly precise art, with each line of execution required for correct operation of the program. But this is misleading. Programs are not uniform in the overall complexity of their components or the complexity of how those components interact. The most robust programs isolate complexity in a way that lets significant parts of the system appear simple and straightforward and interact in simple ways with other components in the system. Complexity hiding can be isomorphic with other design approaches like information hiding and data abstraction but I find there is a different design sensibility if you really focus on identifying where the complexity lies and how you are isolating it.
The example I’ve returned to over and over again in my [writing][5] is the screen repaint algorithm that was used by early character video terminal editors like VI and EMACS. The early video terminals implemented control sequences for the core action of painting characters as well as additional display functions to optimize redisplay like scrolling the current lines up or down or inserting new lines or moving characters within a line. Each of those commands had different costs and those costs varied across different manufacturers’ devices. (See [TERMCAP][6] for links to code and a fuller history.) A full-screen application like a text editor wanted to update the screen as quickly as possible and therefore needed to optimize its use of these control sequences to transition the screen from one state to another.
These applications were designed so this underlying complexity was hidden. The parts of the system that modify the text buffer (where most innovation in functionality happens) completely ignore how these changes are converted into screen update commands. This is possible because the performance cost of computing the optimal set of updates for _any_ change in the content is swamped by the performance cost of actually executing the update commands on the terminal itself. It is a common pattern in systems design that performance analysis plays a key part in determining how and where to hide complexity. The screen update process can be asynchronous to the changes in the underlying text buffer and can be independent of the actual historical sequence of changes to the buffer. It is not important _how_ the buffer changed, but only _what_ changed. This combination of asynchronous coupling, elimination of the combinatorics of historical path dependence in the interaction between components, and a natural way for interactions to efficiently batch together are common characteristics used to hide coupling complexity.
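
A minimal sketch of that separation (function names invented): redisplay compares only the old and new screen states, so it does not matter how the buffer got there, and many edits batch naturally into one repaint.

```python
# Sketch: redisplay depends only on (old state, new state), not on the
# history of edits. Function names are illustrative.

def compute_updates(old_screen, new_screen):
    """Emit update commands only for lines that actually differ."""
    commands = []
    for row, (old, new) in enumerate(zip(old_screen, new_screen)):
        if old != new:
            commands.append(("redraw_line", row, new))
    return commands

shown = ["hello world", "second line", ""]
# Many buffer edits can happen before the next repaint...
buffer = ["hello there", "second line", "third line"]
# ...but the repaint only sees the net difference.
print(compute_updates(shown, buffer))
```

A real editor would additionally weigh scroll and insert-line commands by their per-terminal costs (the termcap data); this sketch only captures the "what changed, not how" idea.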
Success in hiding complexity is determined not by the component doing the hiding but by the consumers of that component. This is one reason why it is often so critical for a component provider to actually be responsible for at least some piece of the end-to-end use of that component. They need to have clear optics into how the rest of the system interacts with their component and how (and whether) complexity leaks out. This often shows up as feedback like “this component is hard to use” — which typically means that it is not effectively hiding the internal complexity or did not pick a functional boundary that was amenable to hiding that complexity.
Layering and Componentization
-----------------------------
It is the fundamental role of a system designer to determine how to break down a system into components and layers; to make decisions about what to build and what to pick up from elsewhere. Open Source may keep money from changing hands in this “build vs. buy” decision but the dynamics are the same. An important element in large scale engineering is understanding how these decisions will play out over time. Change fundamentally underlies everything we do as programmers, so these design choices are not only evaluated in the moment, but are evaluated in the years to come as the product continues to evolve.
Here are a few things about system decomposition that end up having a large element of time in them and therefore tend to take longer to learn and appreciate.
* Layers are leaky. Layers (or abstractions) are [fundamentally leaky][1]. These leaks have consequences immediately but also have consequences over time, in two ways. One consequence is that the characteristics of the layer leak through and permeate more of the system than you realize. These might be assumptions about specific performance characteristics or behavior ordering that is not an explicit part of the layer contract. This means that you generally are more _vulnerable_ to changes in the internal behavior of the component than you understood. A second consequence is it also means you are more _dependent_ on that internal behavior than is obvious, so if you consider changing that layer the consequences and challenges are probably larger than you thought.
* Layers are too functional. It is almost a truism that a component you adopt will have more functionality than you actually require. In some cases, the decision to use it is based on leveraging that functionality for future uses. You adopt specifically because you want to “get on the train” and leverage the ongoing work that will go into that component. There are a few consequences of building on this highly functional layer. 1) The component will often make trade-offs that are biased by functionality that you do not actually require. 2) The component will embed complexity and constraints because of functionality you do not require and those constraints will impede future evolution of that component. 3) There will be more surface area to leak into your application. Some of that leakage will be due to true “leaky abstractions” and some will be explicit (but generally poorly controlled) increased dependence on the full capabilities of the component. Office is big enough that we found that for any layer we built on, we eventually fully explored its functionality in some part of the system. While that might appear to be positive (we are more completely leveraging the component), not all uses are equally valuable. So we end up having a massive cost to move from one layer to another based on this long tail of often lower-value and poorly recognized use cases. 4) The additional functionality creates complexity and opportunities for misuse. An XML validation API we used would optionally dynamically download the schema definition if it was specified as part of the XML tree. This was mistakenly turned on in our basic file parsing code which resulted in both a massive performance degradation as well as an (unintentional) distributed denial of service attack on a w3c.org web server. (These are colloquially known as “land mine” APIs.)
* Layers get replaced. Requirements evolve, systems evolve, components are abandoned. You eventually need to replace that layer or component. This is true for external component dependencies as well as internal ones. This means that the issues above will end up becoming important.
* Your build vs. buy decision will change. This is partly a corollary of above. This does not mean the decision to build or buy was wrong at the time. Often there was no appropriate component when you started and it only becomes available later. Or alternatively, you use a component but eventually find that it does not match your evolving requirements and your requirements are narrow enough, well-understood or so core to your value proposition that it makes sense to own it yourself. It does mean that you need to be just as concerned about leaky layers permeating more of the system for layers you build as well as for layers you adopt.
* Layers get thick. As soon as you have defined a layer, it starts to accrete functionality. The layer is the natural throttle point to optimize for your usage patterns. The difficulty with a thick layer is that it tends to reduce your ability to leverage ongoing innovation in underlying layers. In some sense this is why OS companies hate thick layers built on top of their core evolving functionality — the pace at which innovation can be adopted is inherently slowed. One disciplined approach to avoid this is to disallow any additional state storage in an adaptor layer. Microsoft Foundation Classes took this general approach in building on top of Win32. It is inevitably cheaper in the short term to just accrete functionality on to an existing layer (leading to all the eventual problems above) rather than refactoring and recomponentizing. A system designer who understands this looks for opportunities to break apart and simplify components rather than accrete more and more functionality within them.
Einsteinian Universe
--------------------
I had been designing asynchronous distributed systems for decades but was struck by this quote from Pat Helland, a SQL architect, at an internal Microsoft talk. “We live in an Einsteinian universe — there is no such thing as simultaneity.” When building distributed systems — and virtually everything we build is a distributed system — you cannot hide the distributed nature of the system. It’s just physics. This is one of the reasons I’ve always felt Remote Procedure Call, and especially “transparent” RPC that explicitly tries to hide the distributed nature of the interaction, is fundamentally wrong-headed. You need to embrace the distributed nature of the system since the implications almost always need to be plumbed completely through the system design and into the user experience.
Embracing the distributed nature of the system leads to a number of things:
* You think through the implications to the user experience from the start rather than trying to patch on error handling, cancellation and status reporting as an afterthought.
* You use asynchronous techniques to couple components. Synchronous coupling is _impossible._ If something appears synchronous, it’s because some internal layer has tried to hide the asynchrony and in doing so has obscured (but definitely not hidden) a fundamental characteristic of the runtime behavior of the system.
* You recognize and explicitly design for interacting state machines and that these states represent robust long-lived internal system states (rather than ad-hoc, ephemeral and undiscoverable state encoded by the value of variables in a deep call stack).
* You recognize that failure is expected. The only guaranteed way to detect failure in a distributed system is to simply decide you have waited “too long”. This naturally means that [cancellation is first-class][2]. Some layer of the system (perhaps plumbed through to the user) will need to decide it has waited too long and cancel the interaction. Cancelling is only about reestablishing local state and reclaiming local resources — there is no way to reliably propagate that cancellation through the system. It can sometimes be useful to have a low-cost, unreliable way to attempt to propagate cancellation as a performance optimization.
* You recognize that cancellation is not rollback since it is just reclaiming local resources and state. If rollback is necessary, it needs to be an end-to-end feature.
* You accept that you can never really know the state of a distributed component. As soon as you discover the state, it may have changed. When you send an operation, it may be lost in transit, it might be processed but the response is lost, or it may take some significant amount of time to process so the remote state ultimately transitions at some arbitrary time in the future. This leads to approaches like idempotent operations and the ability to robustly and efficiently rediscover remote state rather than expecting that distributed components can reliably track state in parallel. The concept of “[eventual consistency][3]” succinctly captures many of these ideas.
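
The "waited too long" point in the list above can be sketched concretely. In this minimal Python sketch (the worker and the 50 ms budget are invented), the caller decides it has waited too long, and cancellation only reclaims local state; the remote work may still complete on its own.

```python
import concurrent.futures
import time

def slow_remote_call():
    # Stands in for a remote operation whose fate we cannot observe directly.
    time.sleep(1.0)
    return "done"

executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
future = executor.submit(slow_remote_call)
try:
    result = future.result(timeout=0.05)  # we decide 50 ms is "too long"
except concurrent.futures.TimeoutError:
    # Cancellation is purely local: we stop waiting and reclaim our own
    # bookkeeping. The "remote" operation may still run to completion,
    # and nothing here reliably propagates the cancellation to it.
    future.cancel()
    result = "cancelled-locally"
executor.shutdown(wait=False)
print(result)
```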
I like to say you should “revel in the asynchrony”. Rather than trying to hide it, you accept it and design for it. When you see a technique like idempotency or immutability, you recognize them as ways of embracing the fundamental nature of the universe, not just one more design tool in your toolbox.
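
As one concrete instance, an idempotent operation can be as simple as keying each request by a client-chosen id so that a retried (possibly duplicated) delivery converges to the same state. A minimal sketch, with invented names:

```python
# Minimal idempotent-operation sketch: retries are safe because the
# operation is keyed by a client-chosen request id. Names are illustrative.

class Account:
    def __init__(self):
        self.balance = 0
        self.applied = {}  # request_id -> result of the first application

    def deposit(self, request_id, amount):
        # A duplicate delivery (lost response, client retry) is a no-op:
        # we replay the recorded result instead of applying the change again.
        if request_id in self.applied:
            return self.applied[request_id]
        self.balance += amount
        self.applied[request_id] = self.balance
        return self.balance

acct = Account()
acct.deposit("req-1", 100)
acct.deposit("req-1", 100)  # retry after a lost response
print(acct.balance)  # still 100, not 200
```

Because the client can now safely resend until it sees an answer, it never needs to know whether the first request was lost in transit or processed with a lost response.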
Performance
-----------
I am sure Don Knuth is horrified by how misunderstood his partial quote “Premature optimization is the root of all evil” has been. In fact, performance, and the incredible exponential improvements in performance that have continued for over 6 decades (or more than 10 decades depending on how willing you are to project these trends through discrete transistors, vacuum tubes and electromechanical relays), underlie all of the amazing innovation we have seen in our industry and all the change rippling through the economy as “software eats the world”.
A key thing to recognize about this exponential change is that while all components of the system are experiencing exponential change, these exponentials are divergent. So the rate of increase in capacity of a hard disk changes at a different rate from the capacity of memory or the speed of the CPU or the latency between memory and CPU. Even when trends are driven by the same underlying technology, exponentials diverge. [Latency improvements fundamentally trail bandwidth improvements][7]. Exponential change tends to look linear when you are close to it or over short periods but the effects over time can be overwhelming. This overwhelming change in the relationship between the performance of components of the system forces reevaluation of design decisions on a regular basis.
A consequence of this is that design decisions that made sense at one point no longer make sense after a few years. Or in some cases an approach that made sense two decades ago starts to look like a good trade-off again. Modern memory mapping has characteristics that look more like process swapping of the early time-sharing days than it does like demand paging. (This does sometimes result in old codgers like myself claiming that “that’s just the same approach we used back in ‘75” — ignoring the fact that it didn’t make sense for 40 years and now does again because some balance between two components — maybe flash and NAND rather than disk and core memory — has come to resemble a previous relationship).
Important transitions happen when these exponentials cross human constraints. So you move from a limit of two to the sixteenth characters (which a single user can type in a few hours) to two to the thirty-second (which is beyond what a single person can type). So you can capture a digital image with higher resolution than the human eye can perceive. Or you can store an entire music collection on a hard disk small enough to fit in your pocket. Or you can store a digitized video recording on a hard disk. And then later the ability to stream that recording in real time makes it possible to “record” it by storing it once centrally rather than repeatedly on thousands of local hard disks.
The things that stay as a fundamental constraint are three dimensions and the speed of light. We’re back to that Einsteinian universe. We will always have memory hierarchies — they are fundamental to the laws of physics. You will always have stable storage and IO, memory, computation and communications. The relative capacity, latency and bandwidth of these elements will change, but the system is always about how these elements fit together and the balance and tradeoffs between them. Jim Gray was the master of this analysis.
Another consequence of the fundamentals of 3D and the speed of light is that much of performance analysis is about three things: locality, locality, locality. Whether it is packing data on disk, managing processor cache hierarchies, or coalescing data into a communications packet, how data is packed together, the patterns for how you touch that data with locality over time and the patterns of how you transfer that data between components is fundamental to performance. Focusing on less code operating on less data with more locality over space and time is a good way to cut through the noise.
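
The effect of locality can be made concrete with a toy cache model. This sketch (all sizes invented) simulates a small direct-mapped cache over a row-major array and counts misses for row-order versus column-order traversal:

```python
# Toy direct-mapped cache model showing why access order (locality)
# matters. All sizes are invented for illustration.

LINE = 8        # elements per cache line
NUM_LINES = 16  # cache capacity in lines: far too small for a whole column walk
N = 128         # square array, stored row-major

def misses(addresses):
    cache = [None] * NUM_LINES          # one tag per direct-mapped slot
    count = 0
    for addr in addresses:
        line_tag = addr // LINE
        slot = line_tag % NUM_LINES
        if cache[slot] != line_tag:     # miss: fetch the line
            cache[slot] = line_tag
            count += 1
    return count

row_order = [i * N + j for i in range(N) for j in range(N)]
col_order = [i * N + j for j in range(N) for i in range(N)]
print(misses(row_order), misses(col_order))  # row order misses far less
```

Same data, same work, radically different miss counts: the row-order walk touches each fetched line eight times before moving on, while the column-order walk evicts every line before reusing it.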
Jon Devaan used to say “design the data, not the code”. This also generally means when looking at the structure of a system, I’m less interested in seeing how the code interacts — I want to see how the data interacts and flows. If someone tries to explain a system by describing the code structure and does not understand the rate and volume of data flow, they do not understand the system.
A memory hierarchy also implies we will always have caches — even if some system layer is trying to hide it. Caches are fundamental but also dangerous. Caches are trying to leverage the runtime behavior of the code to change the pattern of interaction between different components in the system. They inherently need to model that behavior, even if that model is implicit in how they fill and invalidate the cache and test for a cache hit. If the model is poor _or becomes_ poor as the behavior changes, the cache will not operate as expected. A simple guideline is that caches _must_ be instrumented — their behavior will degrade over time because of changing behavior of the application and the changing nature and balance of the performance characteristics of the components you are modeling. Every long-time programmer has cache horror stories.
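
One way to read "caches must be instrumented" is to build the hit/miss counters into the cache itself, so drift in the behavior it models is observable rather than silent. A minimal LRU sketch (capacity and workload invented):

```python
from collections import OrderedDict

class InstrumentedLRU:
    """LRU cache that counts hits and misses so degradation is visible."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key, compute):
        if key in self.data:
            self.hits += 1
            self.data.move_to_end(key)      # mark as most recently used
            return self.data[key]
        self.misses += 1
        value = compute(key)
        self.data[key] = value
        if len(self.data) > self.capacity:  # evict least recently used
            self.data.popitem(last=False)
        return value

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

cache = InstrumentedLRU(capacity=2)
for key in ["a", "b", "a", "c", "a"]:
    cache.get(key, compute=lambda k: k.upper())
print(cache.hits, cache.misses, cache.hit_rate())  # 2 hits, 3 misses
```

In a real system those counters would feed a dashboard or log, so a hit rate sliding over months shows up as data rather than as a horror story.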
I was lucky that my early career was spent at BBN, one of the birthplaces of the Internet. It was very natural to think about communications between asynchronous components as the natural way systems connect. Flow control and queueing theory are fundamental to communications systems and more generally the way that any asynchronous system operates. Flow control is inherently resource management (managing the capacity of a channel) but resource management is the more fundamental concern. Flow control also is inherently an end-to-end responsibility, so thinking about asynchronous systems in an end-to-end way comes very naturally. The story of [buffer bloat][8] is well worth understanding in this context because it demonstrates how a failure to understand the dynamics of end-to-end behavior, coupled with technology “improvements” (larger buffers in routers), resulted in very long-running problems in the overall network infrastructure.
The concept of “light speed” is one that I’ve found useful in analyzing any system. A light speed analysis doesn’t start with the current performance, it asks “what is the best theoretical performance I could achieve with this design?” What is the real information content being transferred and at what rate of change? What is the underlying latency and bandwidth between components? A light speed analysis forces a designer to have a deeper appreciation for whether their approach could ever achieve the performance goals or whether they need to rethink their basic approach. It also forces a deeper understanding of where performance is being consumed and whether this is inherent or potentially due to some misbehavior. From a constructive point of view, it forces a system designer to understand what are the true performance characteristics of their building blocks rather than focusing on the other functional characteristics.
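
A light speed analysis is often just arithmetic. As an invented example, the best any design could possibly do moving 1 GB across a 10 Gb/s link with 1 ms one-way latency:

```python
# Back-of-envelope "light speed" bound; all numbers are invented inputs.
payload_bytes = 1 * 10**9    # 1 GB of real information content
bandwidth_bits = 10 * 10**9  # 10 Gb/s link
one_way_latency = 0.001      # 1 ms

transfer_time = (payload_bytes * 8) / bandwidth_bits  # 0.8 s on the wire
best_case = one_way_latency + transfer_time           # no design can beat this
print(best_case)  # ~0.801 s: if the goal is 0.5 s, rethink the approach
```

If the observed system takes 8 s instead, the analysis tells you there is an order of magnitude being consumed somewhere that is not inherent to the problem.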
I spent much of my career building graphical applications. A user sitting at one end of the system defines a key constant and constraint in any such system. The human visual and nervous system is not experiencing exponential change. The system is inherently constrained, which means a system designer can leverage (_must_ leverage) those constraints, e.g. by virtualization (limiting how much of the underlying data model needs to be mapped into view data structures) or by limiting the rate of screen update to the perception limits of the human visual system.
The Nature of Complexity
------------------------
I have struggled with complexity my entire career. Why do systems and apps get complex? Why doesn’t development within an application domain get easier over time as the infrastructure gets more powerful rather than getting harder and more constrained? In fact, one of our key approaches for managing complexity is to “walk away” and start fresh. Often new tools or languages force us to start from scratch which means that developers end up conflating the benefits of the tool with the benefits of the clean start. The clean start is what is fundamental. This is not to say that some new tool, platform or language might not be a great thing, but I can guarantee it will not solve the problem of complexity growth. The simplest way of controlling complexity growth is to build a smaller system with fewer developers.
Of course, in many cases “walking away” is not an alternative — the Office business is built on hugely valuable and complex assets. With OneNote, Office “walked away” from the complexity of Word in order to innovate along a different dimension. Sway is another example where Office decided that we needed to free ourselves from constraints in order to really leverage key environmental changes and the opportunity to take fundamentally different design approaches. With the Word, Excel and PowerPoint web apps, we decided that the linkage with our immensely valuable data formats was too fundamental to walk away from and that has served as a significant and ongoing constraint on development.
I was influenced by Fred Brooks’s “[No Silver Bullet][9]” essay about accident and essence in software development. There is much irreducible complexity embedded in the essence of what the software is trying to model. I just recently re-read that essay and found it surprising which two trends he imbued with the most power to impact future developer productivity. One was an increasing emphasis on “buy” in the “build vs. buy” decision — foreshadowing the change that open source and cloud infrastructure have had. The other was the move to more “organic” or “biological” incremental approaches over more purely constructivist approaches. A modern reader sees that as the shift to agile and continuous development processes. This in 1986!
I have been much taken with the work of Stuart Kauffman on the fundamental nature of complexity. Kauffman builds up from a simple model of Boolean networks (“[NK models][10]”) and then explores the application of this fundamentally mathematical construct to things like systems of interacting molecules, genetic networks, ecosystems, economic systems and (in a limited way) computer systems to understand the mathematical underpinning to emergent ordered behavior and its relationship to chaotic behavior. In a highly connected system, you inherently have a system of conflicting constraints that makes it (mathematically) hard to evolve that system forward (viewed as an optimization problem over a rugged landscape). A fundamental way of controlling this complexity is to break the system into independent elements and limit the interconnections between elements (essentially reducing both “N” and “K” in the NK model). Of course this feels natural to a system designer applying techniques of complexity hiding, information hiding and data abstraction and using loose asynchronous coupling to limit interactions between components.
A challenge we always face is that many of the ways we want to evolve our systems cut across all dimensions. Real-time co-authoring has been a very concrete (and complex) recent example for the Office apps.
Complexity in our data models often equates with “power”. An inherent challenge in designing user experiences is that we need to map a limited set of gestures into a transition in the underlying data model state space. Increasing the dimensions of the state space inevitably creates ambiguity in the user gesture. This is “[just math][11]”, which means that often the most fundamental way to ensure that a system stays “easy to use” is to constrain the underlying data model.
Management
----------
I started taking leadership roles in high school (student council president!) and always found it natural to take on larger responsibilities. At the same time, I was always proud that I continued to be a full-time programmer through every management stage. VP of development for Office finally pushed me over the edge and away from day-to-day programming. I’ve enjoyed returning to programming as I stepped away from that job over the last year — it is an incredibly creative and fulfilling activity (and maybe a little frustrating at times as you chase down that “last” bug).
Despite having been a “manager” for over a decade by the time I arrived at Microsoft, I really learned about management after my arrival in 1996. Microsoft reinforced that “engineering leadership is technical leadership”. This aligned with my perspective and helped me both accept and grow into larger management responsibilities.
The thing that most resonated with me on my arrival was the fundamental culture of transparency in Office. The manager’s job was to design and use transparent processes to drive the project. Transparency is not simple, automatic, or a matter of good intentions — it needs to be designed into the system. The best transparency comes by being able to track progress as the granular output of individual engineers in their day-to-day activity (work items completed, bugs opened and fixed, scenarios complete). Beware subjective red/green/yellow, thumbs-up/thumbs-down dashboards!
I used to say my job was to design feedback loops. Transparent processes provide a way for every participant in the process, from individual engineer to manager to exec, to use the data being tracked to drive the process and result and to understand the role they are playing in the overall project goals. Ultimately, transparency ends up being a great tool for empowerment: the manager can invest more and more local control in those closest to the problem because they are confident they have visibility into the progress being made. Coordination emerges naturally.
Key to this is that the goal has actually been properly framed (including key resource constraints like ship schedule). Decision-making that needs to constantly flow up and down the management chain usually reflects poor framing of goals and constraints by management.
I was at Beyond Software when I really internalized the importance of having a singular leader over a project. The engineering manager departed (later to hire me away for FrontPage) and all four of the leads were hesitant to step into the role — not least because we did not know how long we were going to stick around. We were all very technically sharp and got along well so we decided to work as peers to lead the project. It was a mess. The one obvious problem is that we had no strategy for allocating resources between the pre-existing groups — one of the top responsibilities of management! The deep accountability one feels when you know you are personally in charge was missing. We had no leader really accountable for unifying goals and defining constraints.
I have a visceral memory of the first time I fully appreciated the importance of _listening_ for a leader. I had just taken on the role of Group Development Manager for Word, OneNote, Publisher and Text Services. There was a significant controversy about how we were organizing the text services team and I went around to each of the key participants, heard what they had to say and then integrated and wrote up all I had heard. When I showed the write-up to one of the key participants, his reaction was “wow, you really heard what I had to say”! All of the largest issues I drove as a manager (e.g. cross-platform and the shift to continuous engineering) involved carefully listening to all the players. Listening is an active process that involves trying to understand the perspectives and then writing up what I learned and testing it to validate my understanding. When a key hard decision needed to happen, by the time the call was made everyone knew they had been heard and understood (whether they agreed with the decision or not).
It was the previous job, as FrontPage development manager, where I internalized the “operational dilemma” inherent in decision making with partial information. The longer you wait, the more information you will have to make a decision. But the longer you wait, the less flexibility you will have to actually implement it. At some point you just need to make a call.
Designing an organization involves a similar tension. You want to increase the resource domain so that a consistent prioritization framework can be applied across a larger set of resources. But the larger the resource domain, the harder it is to actually have all the information you need to make good decisions. An organizational design is about balancing these two factors. Software complicates this because characteristics of the software can cut across the design in an arbitrary dimensionality. Office has used [shared teams][12] to address both these issues (prioritization and resources) by having cross-cutting teams that can share work (add resources) with the teams they are building for.
One dirty little secret you learn as you move up the management ladder is that you and your new peers aren’t suddenly smarter because you now have more responsibility. This reinforces that the organization as a whole better be smarter than the leader at the top. Empowering every level to own their decisions within a consistent framing is the key approach to making this true. Listening and making yourself accountable to the organization for articulating and explaining the reasoning behind your decisions is another key strategy. Surprisingly, fear of making a dumb decision can be a useful motivator for ensuring you articulate your reasoning clearly and make sure you listen to all inputs.
Conclusion
At the end of my interview round for my first job out of college, the recruiter asked if I was more interested in working on “systems” or “apps”. I didn’t really understand the question. Hard, interesting problems arise at every level of the software stack and I’ve had fun plumbing all of them. Keep learning.
--------------------------------------------------------------------------------

via: https://hackernoon.com/education-of-a-programmer-aaecf2d35312

Author: [Terry Crowley][a]

Translator: [译者ID](https://github.com/译者ID)

Proofreader: [校对者ID](https://github.com/校对者ID)

This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and proudly presented by [Linux中国](https://linux.cn/).

[a]:https://hackernoon.com/@terrycrowley
[1]:https://medium.com/@terrycrowley/leaky-by-design-7b423142ece0#.x67udeg0a
[2]:https://medium.com/@terrycrowley/how-to-think-about-cancellation-3516fc342ae#.3pfjc5b54
[3]:http://queue.acm.org/detail.cfm?id=2462076
[4]:http://web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf
[5]:https://medium.com/@terrycrowley/model-view-controller-and-loose-coupling-6370f76e9cde#.o4gnupqzq
[6]:https://en.wikipedia.org/wiki/Termcap
[7]:http://www.ll.mit.edu/HPEC/agendas/proc04/invited/patterson_keynote.pdf
[8]:https://en.wikipedia.org/wiki/Bufferbloat
[9]:http://worrydream.com/refs/Brooks-NoSilverBullet.pdf
[10]:https://en.wikipedia.org/wiki/NK_model
[11]:https://medium.com/@terrycrowley/the-math-of-easy-to-use-14645f819201#.untmk9eq7
[12]:https://medium.com/@terrycrowley/breaking-conways-law-a0fdf8500413#.gqaqf1c5k
Translating by qhwdw

Network automation with Ansible
================
### Network Automation
As the IT industry transforms with technologies from server virtualization to public and private clouds with self-service capabilities, containerized applications, and Platform as a Service (PaaS) offerings, one of the areas that continues to lag behind is the network.
Over the past 5+ years, the network industry has seen many new trends emerge, many of which are categorized as software-defined networking (SDN).
###### Note
SDN is a new approach to building, managing, operating, and deploying networks. The original definition for SDN was that there needed to be a physical separation of the control plane from the data (packet forwarding) plane, and the decoupled control plane must control several devices.
Nowadays, many more technologies get put under the _SDN umbrella_, including controller-based networks, APIs on network devices, network automation, whitebox switches, policy networking, Network Functions Virtualization (NFV), and the list goes on.
For purposes of this report, we refer to SDN solutions as solutions that include a network controller as part of the solution, and improve manageability of the network but don’t necessarily decouple the control plane from the data plane.
One of these trends is the emergence of application programming interfaces (APIs) on network devices as a way to manage and operate these devices and truly offer machine-to-machine communication. APIs simplify the development process when it comes to automation and building network applications, providing more structure on how data is modeled. For example, when API-enabled devices return data in JSON/XML, it is structured and easier to work with as compared to CLI-only devices that return raw text that then needs to be manually parsed.
Prior to APIs, the two primary mechanisms used to configure and manage network devices were the command-line interface (CLI) and Simple Network Management Protocol (SNMP). If we look at each of those, the CLI was meant as a human interface to the device, and SNMP wasn’t built to be a real-time programmatic interface for network devices.
Luckily, as many vendors scramble to add APIs to devices, sometimes _just because_ it’s a check in the box on an RFP, there is actually a great byproduct—enabling network automation. Once a true API is exposed, the process for accessing data within the device, as well as managing the configuration, is greatly simplified, but as we’ll review in this report, automation is also possible using more traditional methods, such as CLI/SNMP.
###### Note
As network refreshes happen in the months and years to come, vendor APIs should no doubt be tested and used as key decision-making criteria for purchasing network equipment (virtual and physical). Users should want to know how data is modeled by the equipment, what type of transport is used by the API, if the vendor offers any libraries or integrations to automation tools, and if open standards/protocols are being used.
Generally speaking, network automation, like most types of automation, equates to doing things faster. While doing more faster is nice, reducing the time for deployments and configuration changes isn’t always a problem that needs solving for many IT organizations.
Beyond speed, we’ll now take a look at a few of the reasons that IT organizations of all shapes and sizes should look at gradually adopting network automation. You should note that the same principles apply to other types of automation as well.
### Simplified Architectures
Today, every network is a unique snowflake, and network engineers take pride in solving transport and application issues with one-off network changes that ultimately make the network not only harder to maintain and manage, but also harder to automate.
Instead of thinking about network automation and management as a secondary or tertiary project, it needs to be included from the beginning as new architectures and designs are deployed. Which features work across vendors? Which extensions work across platforms? What type of API or automation tooling works when using particular network device platforms? When these questions get answered earlier on in the design process, the resulting architecture becomes simpler, repeatable, and easier to maintain _and_ automate, all with fewer vendor proprietary extensions enabled throughout the network.
### Deterministic Outcomes
In an enterprise organization, change review meetings take place to review upcoming changes on the network, the impact they have on external systems, and rollback plans. In a world where a human is touching the CLI to make those _upcoming changes_, the impact of typing the wrong command is catastrophic. Imagine a team with three, four, five, or 50 engineers. Every engineer may have his own way of making that particular _upcoming change_. And the ability to use a CLI or a GUI does not eliminate or reduce the chance of error during the control window for the change.
Using proven and tested network automation helps achieve more predictable behavior and gives the executive team a better chance at achieving deterministic outcomes, moving one step closer to having the assurance that the task is going to get done right the first time without human error.
### Business Agility
It goes without saying that network automation offers speed and agility not only for deploying changes, but also for retrieving data from network devices as fast as the business demands. Since the advent of server virtualization, server and virtualization admins have had the ability to deploy new applications almost instantaneously. And the faster applications are deployed, the more questions are raised as to why it takes so long to configure a VLAN, route, FW ACL, or load-balancing policy.
By understanding the most common workflows within an organization and _why_ network changes are really required, the process to deploy modern automation tooling such as Ansible becomes much simpler.
This chapter introduced some of the high-level points on why you should consider network automation. In the next section, we take a look at what Ansible is and continue to dive into different types of network automation that are relevant to IT organizations of all sizes.
### What Is Ansible?
Ansible is one of the newer IT automation and configuration management platforms that exists in the open source world. It’s often compared to other tools such as Puppet, Chef, and SaltStack. Ansible emerged on the scene in 2012 as an open source project created by Michael DeHaan, who also created Cobbler and cocreated Func, both of which are very popular in the open source community. Less than 18 months after the Ansible open source project started, Ansible Inc. was formed and received $6 million in Series A funding. It became and is still the number one contributor to and supporter of the Ansible open source project. In October 2015, Red Hat acquired Ansible Inc.
But, what exactly is Ansible?
_Ansible is a super-simple automation platform that is agentless and extensible._
Let’s dive into this statement in a bit more detail and look at the attributes of Ansible that have helped it gain a significant amount of traction within the industry.
### Simple
One of the most attractive attributes of Ansible is that you _DO NOT_ need any special coding skills in order to get started. All instructions, or tasks to be automated, are documented in a standard, human-readable data format that anyone can understand. It is not uncommon to have Ansible installed and automating tasks in under 30 minutes!
For example, the following task from an Ansible playbook is used to ensure a VLAN exists on a Cisco Nexus switch:
```
- nxos_vlan: vlan_id=100 name=web_vlan
```
You can tell by looking at this almost exactly what it’s going to do without understanding or writing any code!
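For context, here is a minimal sketch of how such a task might sit inside a complete playbook. The play name and the `cisco-nexus` inventory group are assumptions for illustration, not taken from this report:

```
---
# Hypothetical playbook wrapping the single task shown above.
# "cisco-nexus" is an assumed inventory group name.
- name: Ensure VLAN 100 exists
  hosts: cisco-nexus
  tasks:
    - nxos_vlan: vlan_id=100 name=web_vlan
```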
###### Note
The second half of this report covers the Ansible terminology (playbooks, plays, tasks, modules, etc.) in great detail. However, we have included a few brief examples in the meantime to convey key concepts when using Ansible for network automation.
### Agentless
If you look at other tools on the market, such as Puppet and Chef, you’ll learn that, by default, they require that each device you are automating have specialized software installed. This is _NOT_ the case with Ansible, and this is the major reason why Ansible is a great choice for networking automation.
It’s well understood that IT automation tools, including Puppet, Chef, CFEngine, SaltStack, and Ansible, were initially built to manage and automate the configuration of Linux hosts to increase the pace at which applications are deployed. Because Linux systems were being automated, getting agents installed was never a technical hurdle to overcome. If anything, it just delayed the setup, since now _N_ number of hosts (the hosts you want to automate) needed to have software deployed on them.
On top of that, when agents are used, there is additional complexity required for DNS and NTP configuration. These are services that most environments do have already, but when you need to get something up fairly quickly or simply want to see what it can do from a test perspective, it could significantly delay the overall setup and installation process.
Since this report is meant to cover Ansible for network automation, it’s worth pointing out that having Ansible as an agentless platform is even more compelling to network admins than to sysadmins. Why is this?
It’s more compelling for network admins because as mentioned, Linux operating systems are open, and anything can be installed on them. For networking, this is definitely not the case, although it is gradually changing. If we take the most widely deployed network operating system, Cisco IOS, as just one example and ask the question, _"Can third-party software be installed on IOS based platforms?"_ it shouldn’t come as a surprise that the answer is _NO_.
For the last 20+ years, nearly all network operating systems have been closed and vertically integrated with the underlying network hardware. Because it’s not so easy to load an agent on a network device (router, switch, load balancer, firewall, etc.) without vendor support, having an automation platform like Ansible that was built from the ground up to be agentless and extensible is just what the doctor ordered for the network industry. We can finally start eliminating manual interactions with the network with ease!
### Extensible
Ansible is also extremely extensible. As open source and code start to play a larger role in the network industry, having platforms that are extensible is a must. This means that if the vendor or community doesn’t provide a particular feature or function, the open source community, end user, customer, consultant, or anyone else can _extend_ Ansible to enable a given set of functionality. In the past, the network vendor or tool vendor was on the hook to provide the new plug-ins and integrations. Imagine using an automation platform like Ansible, and your network vendor of choice releases a new feature that you _really_ need automated. While the network vendor or Ansible could in theory release the new plug-in to automate that particular feature, the great thing is, anyone from your internal engineers to your value-added reseller (VARs) or consultant could now provide these integrations.
It is a fact that Ansible is extremely extensible because as stated, Ansible was initially built to automate applications and systems. It is because of Ansible’s extensibility that Ansible integrations have been written for network vendors, including but not limited to Cisco, Arista, Juniper, F5, HP, A10, Cumulus, and Palo Alto Networks.
### Why Ansible for Network Automation?
We’ve taken a brief look at what Ansible is and also some of the benefits of network automation, but why should Ansible be used for network automation?
In full transparency, many of the reasons already stated are what make Ansible such a great platform for automating application deployments. However, we’ll take this a step further now, getting even more focused on networking, and continue to outline a few other key points to be aware of.
### Agentless
The importance of an agentless architecture cannot be stressed enough when it comes to network automation, especially as it pertains to automating existing devices. If we take a look at all devices currently installed at various parts of the network, from the DMZ and campus, to the branch and data center, the lion’s share of devices do _NOT_ have a modern device API. While having an API makes things so much simpler from an automation perspective, an agentless platform like Ansible makes it possible to automate and manage those _legacy_ _(traditional)_ devices, for example, _CLI-based devices_, making it a tool that can be used in any network environment.
###### Note
If CLI-only devices are integrated with Ansible, the mechanisms as to how the devices are accessed for read-only and read-write operations occur through protocols such as telnet, SSH, and SNMP.
As standalone network devices like routers, switches, and firewalls continue to add support for APIs, SDN solutions are also emerging. The one common theme with SDN solutions is that they all offer a single point of integration and policy management, usually in the form of an SDN controller. This is true for solutions such as Cisco ACI, VMware NSX, Big Switch Big Cloud Fabric, and Juniper Contrail, as well as many of the other SDN offerings from companies such as Nuage, Plexxi, Plumgrid, Midokura, and Viptela. This even includes open source controllers such as OpenDaylight.
These solutions all simplify the management of networks, as they allow an administrator to start to migrate from box-by-box management to network-wide, single-system management. While this is a great step in the right direction, these solutions still don’t eliminate the risks for human error during change windows. For example, rather than configure _N_ switches, you may need to configure a single GUI that could take just as long in order to make the required configuration change—it may even be more complex, because after all, who prefers a GUI _over_ a CLI! Additionally, you may possibly have different types of SDN solutions deployed per application, network, region, or data center.
The need to automate networks, for configuration management, monitoring, and data collection, does not go away as the industry begins migrating to controller-based network architectures.
As most software-defined networks are deployed with a controller, nearly all controllers expose a modern REST API. And because Ansible has an agentless architecture, it makes it extremely simple to automate not only legacy devices that may not have an API, but also software-defined networking solutions via REST APIs, all without requiring any additional software (agents) on the endpoints. The net result is being able to automate any type of device using Ansible with or without an API.
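As a rough sketch of what this looks like in practice, the following play queries a controller’s REST API with Ansible’s built-in `uri` module; the URL, credentials, and endpoint are assumptions made for illustration:

```
---
# Sketch: agentless query of a hypothetical SDN controller REST API.
# controller.example.com and /api/v1/health are assumed, not real endpoints.
- name: Collect fabric health from an SDN controller
  hosts: localhost
  tasks:
    - uri:
        url: https://controller.example.com/api/v1/health
        method: GET
        user: admin
        password: "{{ controller_password }}"
        return_content: yes
      register: health
```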
### Free and Open Source Software (FOSS)
Being that Ansible is open source with all code publicly accessible on GitHub, it is absolutely free to get started using Ansible. It can literally be installed and start providing value to network engineers in minutes. Neither the Ansible open source project nor Ansible, Inc. requires any meetings with sales reps before handing over the software. That is stating the obvious, since it’s true for all open source projects, but being that the use of open source, community-driven software within the network industry is fairly new and gradually increasing, we wanted to explicitly make this point.
It is also worth stating that Ansible, Inc. is indeed a company and needs to make money somehow, right? While Ansible is open source, it also has an enterprise product called Ansible Tower that adds features such as role-based access control (RBAC), reporting, web UI, REST APIs, multi-tenancy, and much more, which is usually a nice fit for enterprises looking to deploy Ansible. And the best part is that even Ansible Tower is _FREE_ for up to 10 devices—so, at least you can get a taste of Tower to see if it can benefit your organization without spending a dime and sitting in countless sales meetings.
### Extensible
We stated earlier that Ansible was primarily built as an automation platform for deploying Linux applications, although it has expanded to Windows since the early days. The point is that the Ansible open source project did not have the goal of automating network infrastructure. The truth is that the more the Ansible community understood how flexible and extensible the underlying Ansible architecture was, the easier it became to _extend_ Ansible for their automation needs, which included networking. Over the past two years, there have been a number of Ansible integrations developed, many by industry independents such as Matt Oswalt, Jason Edelman, Kirk Byers, Elisa Jasinska, David Barroso, Michael Ben-Ami, Patrick Ogenstad, and Gabriele Gerbino, as well as by leading network vendors such as Arista, Juniper, Cumulus, Cisco, F5, and Palo Alto Networks.
### Integrating into Existing DevOps Workflows
Ansible is used for application deployments within IT organizations. It’s used by operations teams that need to manage the deployment, monitoring, and management of various types of applications. By integrating Ansible with the network infrastructure, it expands what is possible when new applications are turned up or migrated. Rather than have to wait for a new top of rack (TOR) switch to be turned up, a VLAN to be added, or interface speed/duplex to be checked, all of these network-centric tasks can be automated and integrated into existing workflows that already exist within the IT organization.
### Idempotency
The term _idempotency_ (pronounced item-potency) is used often in the world of software development, especially when working with REST APIs, as well as in the world of _DevOps_ automation and configuration management frameworks, including Ansible. One of Ansible’s beliefs is that all Ansible modules (integrations) should be idempotent. Okay, so what does it mean for a module to be idempotent? After all, this is a new term for most network engineers.
The answer is simple. Being idempotent allows the defined task to run one time or a thousand times without having an adverse effect on the target system, only ever making the change once. In other words, if a change is required to get the system into its desired state, the change is made; and if the device is already in its desired state, no change is made. This is unlike most traditional custom scripts and the copy and pasting of CLI commands into a terminal window. When the same command or script is executed repeatedly on the same system, errors are (sometimes) raised. Ever paste a command set into a router and get some type of error that invalidates the rest of your configuration? Was that fun?
Another example is if you have a text file or a script that configures 10 VLANs, the same commands are then entered 10 times _EVERY_ time the script is run. If an idempotent Ansible module is used, the existing configuration is gathered first from the network device, and each new VLAN being configured is checked against the current configuration. Only if the new VLAN needs to be added (or changed—VLAN name, as an example) is a change or command actually pushed to the device.
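A sketch of what that might look like as an idempotent task follows; the VLAN list and the `leaf-switches` group name are illustrative, not from this report:

```
---
# Sketch: declaring desired VLANs once. An idempotent module only pushes
# commands for VLANs that are missing or differ from the declared state.
- name: Ensure VLANs exist
  hosts: leaf-switches
  tasks:
    - nxos_vlan: vlan_id={{ item.id }} name={{ item.name }}
      with_items:
        - { id: 10, name: web }
        - { id: 20, name: app }
        - { id: 30, name: db }
```

Running this play a second time against devices already in the desired state should report no changes, which is exactly the property the text describes.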
As the technologies become more complex, the value of idempotency only increases because with idempotency, you shouldn’t care about the _existing_ state of the network device being modified, only the _desired_ state that you are trying to achieve from a network configuration and policy perspective.
### Network-Wide and Ad Hoc Changes
One of the problems solved with configuration management tools is configuration drift (when a device’s desired configuration gradually drifts, or changes, over time due to manual change and/or having multiple disparate tools being used in an environment)—in fact, this is where tools like Puppet and Chef got started. Agents _phone home_ to the head-end server, validate their configuration, and if a change is required, the change is made. The approach is simple enough. What if an outage occurs and you need to troubleshoot though? You usually bypass the management system, go directly to a device, find the fix, and quickly leave for the day, right? Sure enough, at the next time interval when the agent phones back home, the change made to fix the problem is overwritten (based on how the _master/head-end server_ is configured). One-off changes should always be limited in highly automated environments, but tools that still allow for them are greatly valuable. As you guessed, one of these tools is Ansible.
Because Ansible is agentless, there is not a default push or pull to prevent configuration drift. The tasks to automate are defined in what is called an Ansible playbook. When using Ansible, it is up to the user to run the playbook. If the playbook is to be executed at a given time interval and you’re not using Ansible Tower, you will definitely know how often the tasks are run; if you are just using the native Ansible command line from a terminal prompt, the playbook is run once and only once.
Running a playbook once by default is attractive for network engineers. It is added peace of mind that changes made manually on the device are not going to be automatically overwritten. Additionally, the scope of devices that a playbook is executed against is easily changed when needed such that even if a single change needs to automate only a single device, Ansible can still be used. The _scope_ of devices is determined by what is called an Ansible inventory file; the inventory could have one device or a thousand devices.
The following shows a sample inventory file with two groups defined and a total of six network devices:
```
[core-switches]
dc-core-1
dc-core-2

[leaf-switches]
leaf1
leaf2
leaf3
leaf4
```
To automate all hosts, a snippet from your play definition in a playbook looks like this:
```
hosts: all
```
And to automate just one leaf switch, it looks like this:
```
hosts: leaf1
```
And just the core switches:
```
hosts: core-switches
```
###### Note
As stated previously, playbooks, plays, and inventories are covered in more detail later on this report.
Being able to easily automate one device or _N_ devices makes Ansible a great choice for making those one-off changes when they are required. It’s also great for those changes that are network-wide: possibly for shutting down all interfaces of a given type, configuring interface descriptions, or adding VLANs to wiring closets across an enterprise campus network.
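A network-wide change such as standardizing interface descriptions might be sketched as follows; the module arguments and interface name are illustrative and would need to match the platform in use:

```
---
# Sketch: one play applied to every device in the inventory ("hosts: all").
# The interface and description values here are assumed examples.
- name: Standardize uplink descriptions across all devices
  hosts: all
  tasks:
    - nxos_interface: interface=Ethernet1/1 description=uplink_to_core
```

Changing `hosts: all` to a single hostname scopes the same play down to a one-off change on one device.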
### Network Task Automation with Ansible
This report is gradually getting more technical in two areas. The first area is around the details and architecture of Ansible, and the second area is about exactly what types of tasks can be automated from a network perspective with Ansible. The latter is what we’ll take a look at in this chapter.
Automation is commonly equated with speed, and considering that some network tasks don’t require speed, it’s easy to see why some IT teams don’t see the value in automation. VLAN configuration is a great example because you may be thinking, “How _fast_ does a VLAN really need to get created? Just how many VLANs are being added on a daily basis? Do _I_ really need automation?”
In this section, we are going to focus on several other tasks where automation makes sense such as device provisioning, data collection, reporting, and compliance. But remember, as we stated earlier, automation is much more than speed and agility as it’s offering you, your team, and your business more predictable and more deterministic outcomes.
|
||||
|
||||
### Device Provisioning

One of the easiest and fastest ways to get started using Ansible for network automation is creating device configuration files that are used for initial device provisioning and pushing them to network devices.

If we take this process and break it down into two steps, the first step is creating the configuration file, and the second is pushing the configuration onto the device.

First, we need to decouple the _inputs_ from the underlying vendor-proprietary syntax (CLI) of the config file. This means we’ll have separate files with values for configuration parameters such as VLANs, domain information, interfaces, and routing, and then, of course, one or more configuration template files. For this example, a single standard golden template is used for all devices being deployed. Ansible bridges the gap between the inputs and values and the configuration template. In less than a few seconds, Ansible can generate hundreds of configuration files predictably and reliably.

Let’s take a quick look at an example of taking a current configuration and decomposing it into a template and a separate variables (inputs) file.

Here is an example of a configuration file snippet:
```
hostname leaf1
ip domain-name ntc.com
!
vlan 10
name web
!
vlan 20
name app
!
vlan 30
name db
!
vlan 40
name test
!
vlan 50
name misc
```
If we extract the input values, this file is transformed into a template.

###### Note

Ansible uses the Python-based Jinja2 templating language; thus, the template, called _leaf.j2_, is a Jinja2 template.

Note that in the following example the _double curly braces_ denote a variable.

The resulting template looks like this and is given the filename _leaf.j2_:
```
!
hostname {{ inventory_hostname }}
ip domain-name {{ domain_name }}
!
!
{% for vlan in vlans %}
vlan {{ vlan.id }}
name {{ vlan.name }}
{% endfor %}
!
```
Since the double curly braces denote variables, and those values are not in the template, they need to be stored somewhere. They are stored in a variables file. A matching variables file for the previously shown template looks like this:

```
---
hostname: leaf1
domain_name: ntc.com
vlans:
  - { id: 10, name: web }
  - { id: 20, name: app }
  - { id: 30, name: db }
  - { id: 40, name: test }
  - { id: 50, name: misc }
```

This means if the team that controls VLANs wants to add a VLAN to the network devices, no problem. Have them change it in the variables file and regenerate a new config file using the Ansible module called `template`. This whole process is idempotent too; only if there is a change to the template or the values being entered will a new configuration file be generated.
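To make the rendering step concrete, here is a minimal plain-Python sketch of what the template plus variables produce. It deliberately mimics the _leaf.j2_ loop with ordinary string formatting rather than invoking Jinja2 itself, so `render_leaf_config` is a hypothetical stand-in for the `template` module, not Ansible code:

```python
# Hypothetical sketch: mimic what leaf.j2 renders from the variables file,
# using plain string formatting instead of the Jinja2 engine (assumption).
def render_leaf_config(hostname, domain_name, vlans):
    lines = ["!",
             f"hostname {hostname}",
             f"ip domain-name {domain_name}",
             "!", "!"]
    for vlan in vlans:                       # mirrors {% for vlan in vlans %}
        lines.append(f"vlan {vlan['id']}")
        lines.append(f"name {vlan['name']}")
    lines.append("!")
    return "\n".join(lines)

# Inputs taken from the variables file shown above (first two VLANs).
config = render_leaf_config(
    "leaf1",
    "ntc.com",
    [{"id": 10, "name": "web"}, {"id": 20, "name": "app"}],
)
```

The point of the exercise is the separation of concerns: the function body plays the role of the template, and the arguments play the role of the variables file, so either can change independently.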
Once the configuration is generated, it needs to be _pushed_ to the network device. One such method to push configuration files to network devices is using the open source Ansible module called `napalm_install_config`.

The next example is a sample playbook to _build and push_ a configuration to network devices. Again, this playbook uses the `template` module to build the configuration files and the `napalm_install_config` module to push them and activate them as the new running configurations on the devices.

Even though we don’t review every line in the example, you can still make out what is actually happening.

###### Note

The following playbook introduces new concepts such as the built-in variable `inventory_hostname`. These concepts are covered in [Ansible Terminology and Getting Started][1].
```
---

- name: BUILD AND PUSH NETWORK CONFIGURATION FILES
  hosts: leaves
  connection: local
  gather_facts: no

  tasks:
    - name: BUILD CONFIGS
      template:
        src=templates/leaf.j2
        dest=configs/{{ inventory_hostname }}.conf

    - name: PUSH CONFIGS
      napalm_install_config:
        hostname={{ inventory_hostname }}
        username={{ un }}
        password={{ pwd }}
        dev_os={{ os }}
        config_file=configs/{{ inventory_hostname }}.conf
        commit_changes=1
        replace_config=0
```
This two-step process is the simplest way to get started with network automation using Ansible. You simply template your configs, build config files, and push them to the network device—otherwise known as the _BUILD and PUSH_ method.

###### Note

Another example like this is reviewed in much more detail in [Ansible Network Integrations][2].
### Data Collection and Monitoring

Monitoring tools typically use SNMP—these tools poll certain management information bases (MIBs) and return data to the monitoring tool. Based on the data being returned, it may be more or less than you actually need. What if interface stats are being polled? You are likely getting back every counter that is displayed in a _show interface_ command. What if you only need _interface resets_ and wish to see these resets correlated to the interfaces that have CDP/LLDP neighbors on them? Of course, this is possible with current technology, but it could mean running multiple show commands and parsing the output manually, or using an SNMP-based tool and clicking between tabs in the GUI trying to find the data you actually need. How does Ansible help with this?

Because Ansible is totally open and extensible, it’s possible to collect and monitor the exact counters or values needed. This may require some up-front custom work but is totally worth it in the end, because the data being gathered is what you need, not what the vendor is providing you. Ansible also provides intuitive ways to perform certain tasks conditionally, which means that, based on data being returned, you can perform subsequent tasks, which may be to collect more data or to make a configuration change.

Network devices have _a lot_ of static and ephemeral data buried inside, and Ansible helps extract the bits you need.

You can even use Ansible modules that use SNMP behind the scenes, such as a module called `snmp_device_version`. This is another open source module that exists within the community:

```
- name: GET SNMP DATA
  snmp_device_version:
    host=spine
    community=public
    version=2c
```

Running the preceding task returns great information about a device and adds some level of discovery capability to Ansible. For example, that task returns the following data:

```
{"ansible_facts": {"ansible_device_os": "nxos", "ansible_device_vendor": "cisco", "ansible_device_version": "7.0(3)I2(1)"}, "changed": false}
```

You can now determine what type of device something is without knowing up front. All you need to know is the read-only community string of the device.
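Because every module returns structured JSON, the discovery data is easy to consume programmatically. A small sketch (plain Python, outside of Ansible) of branching on the exact facts shown above; the `platform` string it builds is just an illustration:

```python
import json

# The exact JSON returned by the snmp_device_version task shown above.
raw = ('{"ansible_facts": {"ansible_device_os": "nxos", '
       '"ansible_device_vendor": "cisco", '
       '"ansible_device_version": "7.0(3)I2(1)"}, "changed": false}')

result = json.loads(raw)
facts = result["ansible_facts"]

# Branch on the discovered OS -- e.g., to decide which modules to run next.
if facts["ansible_device_os"] == "nxos":
    platform = "Cisco NX-OS " + facts["ansible_device_version"]
else:
    platform = "unknown"
```

Inside a playbook, the same decision would typically be made with a `when:` condition on `ansible_device_os` rather than in standalone Python.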

### Migrations

Migrating from one platform to the next is never an easy task. This may be from the same vendor or from different vendors. Vendors may offer a script or a tool to help with migrations, but Ansible can be used to build out configuration templates for all types of network devices and operating systems in such a way that you could generate a configuration file for all vendors given a defined and common set of inputs (a common data model). Of course, if there are vendor-proprietary extensions, they’ll need to be accounted for, too. Having this type of flexibility helps with not only migrations, but also disaster recovery (DR), as it’s very common to have different switch models in the production and DR data centers, maybe even different vendors.
### Configuration Management

As stated, configuration management is the most common type of automation. What Ansible allows you to do fairly easily is create _roles_ to streamline the consumption of task-based automation. From a high level, a role is a logical grouping of reusable tasks that are automated against a particular group of devices. Another way to think about roles is to think about workflows. First and foremost, workflows and processes need to be understood before automation starts adding value. It’s always important to start small and expand from there.

For example, a set of tasks that automates the configuration of routers and switches is very common and is a great place to start. But where do the IP addresses come from that are configured on network devices? Maybe an IP address management solution? Once the IP addresses are allocated for a given function and deployed, does DNS need to be updated too? Do DHCP scopes need to be created?

Can you see how the workflow can start small and gradually expand across different IT systems? As the workflow continues to expand, so would the role.
### Compliance

As with many forms of automation, making configuration changes with any type of automation tool is seen as a risk. While making manual changes could arguably be riskier, as you’ve read and may have experienced firsthand, Ansible has capabilities to automate data collection, monitoring, and configuration building, which are all "read-only" and "low risk" actions. One _low risk_ use case for the data being gathered is configuration compliance checks and configuration validation. Does the deployed configuration meet security requirements? Are the required networks configured? Is protocol XYZ disabled? Since each module, or integration, with Ansible returns data, it is quite simple to _assert_ that something is _TRUE_ or _FALSE_. And again, based on _it_ being _TRUE_ or _FALSE_, it’s up to you to determine what happens next—maybe it just gets logged, or maybe a complex operation is performed.
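As a sketch of that assert idea in plain Python (the data shapes here are hypothetical, standing in for what a VLAN-collection module might return): a compliance check reduces to a boolean assertion over collected state against policy:

```python
# Hypothetical data collected from a device by an Ansible task:
existing_vlans = [{"id": 10, "name": "web"}, {"id": 20, "name": "app"}]

# Policy: these VLANs must exist on the device.
required_vlan_ids = {10, 20, 30}

configured_ids = {v["id"] for v in existing_vlans}

# The compliance assertion: TRUE only if every required VLAN is configured.
compliant = required_vlan_ids <= configured_ids
missing = sorted(required_vlan_ids - configured_ids)
```

Whether `compliant` being `False` merely gets logged or triggers a remediation task is, as the text says, up to you.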

### Reporting

We now understand that Ansible can also be used to collect data and perform compliance checks. The data returned and collected from the device by way of Ansible is up for grabs in terms of what you want to do with it. Maybe the data returned becomes inputs to other tasks, or maybe you just want to create reports. Because reports are generated from templates combined with the actual important data to be inserted into the template, the process to create and use reporting templates is the same process used to create configuration templates.

From a reporting perspective, these templates may be flat text files, Markdown files that are viewed on GitHub, HTML files that get dynamically placed on a web server, and the list goes on. The user has the power to create the exact type of report she wishes, inserting the exact data she needs to be part of that report.

It is powerful to create reports not only for executive management, but also for the ops engineers, since there are usually different metrics both teams need.
### How Ansible Works

After looking at what Ansible can offer from a network automation perspective, we’ll now take a look at how Ansible works. You will learn about the overall communication flow from an Ansible control host to the nodes that are being automated. First, we review how Ansible works _out of the box_, and we then take a look at how Ansible, and more specifically Ansible _modules_, work when network devices are being automated.
### Out of the Box

By now, you should understand that Ansible is an automation platform. In fact, it is a lightweight automation platform that is installed on a single server or on every administrator’s laptop within an organization. You decide. Ansible is easily installed using utilities such as pip, apt, and yum on Linux-based machines.

###### Note

The machine that Ansible is installed on is referred to as the _control host_ through the remainder of this report.

The control host will perform all automation tasks that are defined in an Ansible playbook (don’t worry; we’ll cover playbooks and other Ansible terms soon enough). The important piece for now is to understand that a playbook is simply a set of automation tasks and instructions that are executed on a given number of hosts.

When a playbook is created, you also need to define which hosts you want to automate. The mapping between the playbook and the hosts to automate happens by using what is known as an Ansible inventory file. This was already shown in an earlier example, but here is another sample inventory file showing two groups: `cisco` and `arista`:
```
[cisco]
nyc1.acme.com
nyc2.acme.com

[arista]
sfo1.acme.com
sfo2.acme.com
```
###### Note

You can also use IP addresses within the inventory file instead of hostnames. For these examples, the hostnames were resolvable via DNS.

As you can see, the Ansible inventory file is a text file that lists hosts and groups of hosts. You then reference a specific host or a group from within the playbook, thus dictating which hosts get automated for a given play and playbook. This is shown in the following two examples.

The first example shows what it looks like if you wanted to automate all hosts within the `cisco` group, and the second example shows how to automate just the _nyc1.acme.com_ host:
```
---

- name: TEST PLAYBOOK
  hosts: cisco

  tasks:
    - TASKS YOU WANT TO AUTOMATE
```

```
---

- name: TEST PLAYBOOK
  hosts: nyc1.acme.com

  tasks:
    - TASKS YOU WANT TO AUTOMATE
```
Now that the basics of inventory files are understood, we can take a look at how Ansible (the control host) communicates with devices _out of the box_ and how tasks are automated on Linux endpoints. This is an important concept to understand, as this is usually different when network devices are being automated.

There are two main requirements for Ansible to work out of the box to automate Linux-based systems: SSH and Python.

First, the endpoints must support SSH for transport, since Ansible uses SSH to connect to each target node. Because Ansible supports a pluggable connection architecture, there are also various plug-ins available for different types of SSH implementations.

The second requirement is how Ansible gets around the need for an _agent_ to preexist on the target node. While Ansible does not require a software agent, it does require an onboard Python execution engine. This execution engine is used to execute Python code that is transmitted from the Ansible control host to the target node being automated.

If we elaborate on this out-of-the-box workflow, it breaks down as follows:
1. When an Ansible play is executed, the control host connects to the Linux-based target node using SSH.

2. For each task, that is, each Ansible module being executed within the play, Python code is transmitted over SSH and executed directly on the remote system.

3. Upon execution on the remote system, each Ansible module returns JSON data to the control host. This data includes information such as whether the configuration changed, whether the task passed or failed, and other module-specific data.

4. The JSON data returned to Ansible can then be used to generate reports using templates or as inputs to subsequent modules.

5. Steps 2 through 4 repeat for each task that exists within the play.

6. Steps 1 through 5 repeat for each play within the playbook.

Shouldn’t this mean that network devices should work out of the box with Ansible, since they also support SSH? It is true that network devices do support SSH, but it is the first requirement combined with the second one that limits the functionality possible for network devices.

To start, most network devices do not support Python, so using the default Ansible connection mechanism is a non-starter. That said, over the past few years, vendors have added Python support on several different device platforms. However, most of these platforms still lack the integration needed to allow Ansible to get direct access to a Linux shell over SSH with the proper permissions to copy over the required code, create temp directories and files, and execute the code on box. While all the parts are there for Ansible to work natively with SSH/Python _and_ Linux-based network devices, it still requires network vendors to open their systems more than they already have.
###### Note

It is worth noting that Arista does offer native integration because it is able to drop SSH users directly into a Linux shell with access to a Python execution engine, which in turn allows Ansible to use its default connection mechanism. Because we called out Arista, we need to also highlight Cumulus as working with Ansible’s default connection mechanism, too. This is because Cumulus Linux is native Linux, and there isn’t a need to use a vendor API for the automation of the Cumulus Linux OS.
### Ansible Network Integrations

The previous section covered the way Ansible works by default. We looked at how Ansible sets up a connection to a device at the beginning of a _play_, executes tasks by copying Python code to the devices, executes the code, and then returns results to the Ansible control host.

In this section, we’ll take a look at what this process looks like when automating network devices with Ansible. As already covered, Ansible has a pluggable connection architecture. For _most_ network integrations, the `connection` parameter is set to `local`. The most common place to set the connection type to local is within the playbook, as shown in the following example:
```
---

- name: TEST PLAYBOOK
  hosts: cisco
  connection: local

  tasks:
    - TASKS YOU WANT TO AUTOMATE
```
Notice how within the play definition, this example added the `connection` parameter as compared to the examples in the previous section.

This tells Ansible not to connect to the target device via SSH and to just connect to the local machine running the playbook. Basically, this delegates the connection responsibility to the actual Ansible modules being used within the _tasks_ section of the playbook. Delegating connection responsibility to each type of module allows the modules to connect to the device in whatever fashion necessary; this could be NETCONF for Juniper and HP Comware7, eAPI for Arista, NX-API for Cisco Nexus, or even SNMP for traditional/legacy-based systems that don’t have a programmatic API.

###### Note

Network integrations in Ansible come in the form of Ansible modules. While we continue to whet your appetite using terminology such as playbooks, plays, tasks, and modules to convey key concepts, each of these terms is finally covered in greater detail in [Ansible Terminology and Getting Started][3] and [Hands-on Look at Using Ansible for Network Automation][4].
Let’s take a look at another sample playbook:

```
---

- name: TEST PLAYBOOK
  hosts: cisco
  connection: local

  tasks:
    - nxos_vlan: vlan_id=10 name=WEB_VLAN
```
Notice that this playbook now includes a task, and this task uses the `nxos_vlan` module. The `nxos_vlan` module is just a Python file, and it is in this file where the connection to the Cisco NX-OS device is made using NX-API. However, the connection could have been set up using any other device API, and this is how vendors and users like us are able to build our own integrations. Integrations (modules) are typically done on a per-feature basis, although as you’ve already seen with modules like `napalm_install_config`, they can be used to _push_ a full configuration file, too.

One of the major differences from the default connection mechanism, where Ansible launches an SSH connection to the device that persists for a given play, is that when the connection setup and teardown occurs within the module, as with many network modules that use `connection=local`, Ansible is logging in/out of the device on _every_ task versus this happening at the play level.

And in traditional Ansible fashion, each network module returns JSON data. The only difference is that the massaging of this data happens locally on the Ansible control host rather than on the target node. The data returned to the playbook varies per vendor and type of module, but as an example, many of the Cisco NX-OS modules return the existing state, proposed state, and end state, as well as the commands (if any) that are being sent to the device.

As you get started using Ansible for network automation, it is important to remember that setting the connection parameter to `local` takes Ansible out of the connection setup/teardown process, leaving that up to the module. This is why modules supported for different types of vendor platforms will have different ways of communicating with the devices.
### Ansible Terminology and Getting Started

This chapter walks through many of the terms and key concepts that have been gradually introduced already in this report, such as _inventory file_, _playbook_, _play_, _tasks_, and _modules_. We also review a few other concepts that are helpful to be aware of when getting started with Ansible for network automation.

Please reference the following sample inventory file and playbook throughout this section, as they are used continuously in the examples that follow to convey what each Ansible term means.

_Sample inventory_:
```
# sample inventory file
# filename inventory

[all:vars]
user=admin
pwd=admin

[tor]
rack1-tor1 vendor=nxos
rack1-tor2 vendor=nxos
rack2-tor1 vendor=arista
rack2-tor2 vendor=arista

[core]
core1
core2
```
_Sample playbook_:

```
---
# sample playbook
# filename site.yml

- name: PLAY 1 - Top of Rack (TOR) Switches
  hosts: tor
  connection: local

  tasks:
    - name: ENSURE VLAN 10 EXISTS ON CISCO TOR SWITCHES
      nxos_vlan: vlan_id=10 name=WEB_VLAN host={{ inventory_hostname }} username=admin password=admin
      when: vendor == "nxos"

    - name: ENSURE VLAN 10 EXISTS ON ARISTA TOR SWITCHES
      eos_vlan: vlanid=10 name=WEB_VLAN host={{ inventory_hostname }} username={{ user }} password={{ pwd }}
      when: vendor == "arista"

- name: PLAY 2 - Core Switches
  hosts: core
  connection: local

  tasks:
    - name: ENSURE VLANS EXIST IN CORE
      nxos_vlan:
        vlan_id={{ item }}
        host={{ inventory_hostname }}
        username={{ user }}
        password={{ pwd }}
      with_items:
        - 10
        - 20
        - 30
        - 40
        - 50
```
### Inventory File

Using an inventory file, such as the preceding one, enables us to automate tasks for specific hosts and groups of hosts by referencing the proper host/group using the `hosts` parameter in the top section of each play.

It is also possible to store variables within an inventory file. This is shown in the example. If the variable is on the same line as a host, it is a host-specific variable. If the variables are defined within brackets such as `[all:vars]`, the variables are in scope for the group `all`, which is a default group that includes _all_ hosts in the inventory file.

###### Note

Inventory files are the quickest way to get started with Ansible, but should you already have a source of truth for network devices, such as a network management tool or CMDB, it is possible to create and use a dynamic inventory script rather than a static inventory file.
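As a sketch of the dynamic inventory idea: Ansible accepts any executable that prints JSON in its inventory format, so a script can pull hosts from a CMDB instead of reading a static file. A minimal hypothetical example, with hosts hardcoded where the CMDB query would normally go:

```python
#!/usr/bin/env python
# Minimal dynamic inventory sketch. A real script would query a CMDB or
# network management tool here; the hosts below are hardcoded assumptions.
import json


def build_inventory():
    return {
        "cisco": {"hosts": ["nyc1.acme.com", "nyc2.acme.com"]},
        "arista": {"hosts": ["sfo1.acme.com", "sfo2.acme.com"]},
        # Returning _meta/hostvars up front spares Ansible one
        # script invocation per host for host-specific variables.
        "_meta": {"hostvars": {"nyc1.acme.com": {"vendor": "nxos"}}},
    }


if __name__ == "__main__":
    print(json.dumps(build_inventory()))
```

Pointing `ansible-playbook` at this executable instead of a static file would keep the playbook itself unchanged; only the source of hosts moves.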

### Playbook

The playbook is the top-level object that is executed to automate network devices. In our example, this is the file _site.yml_, as depicted in the preceding example. A playbook uses YAML to define the set of tasks to automate, and each playbook is comprised of one or more plays. This is analogous to a football playbook. Like in football, teams have playbooks made up of plays, and Ansible playbooks are made up of plays, too.

###### Note

YAML is a human-readable data format with parsers available for most programming languages. YAML is itself a superset of JSON, and it’s quite easy to recognize YAML files, as they conventionally start with three dashes (hyphens), `---`.
### Play

One or more plays can exist within an Ansible playbook. In the preceding example, there are two plays within the playbook. Each starts with a _header_ section where play-specific parameters are defined.

The two plays from that example have the following parameters defined:

`name`

The text `PLAY 1 - Top of Rack (TOR) Switches` is arbitrary and is displayed when the playbook runs to improve readability during playbook execution and reporting. This is an optional parameter.

`hosts`

As covered previously, this is the host or group of hosts that are automated in this particular play. This is a required parameter.

`connection`

As covered previously, this is the type of connection mechanism used for the play. This is an optional parameter, but it is commonly set to `local` for network automation plays.

Each play is comprised of one or more tasks.
### Tasks

Tasks represent what is automated in a declarative manner without worrying about the underlying syntax or "how" the operation is performed.

In our example, the first play has two tasks. Each task ensures VLAN 10 exists. The first task does this for Cisco Nexus devices, and the second task does this for Arista devices:

```
tasks:
  - name: ENSURE VLAN 10 EXISTS ON CISCO TOR SWITCHES
    nxos_vlan: vlan_id=10 name=WEB_VLAN host={{ inventory_hostname }} username=admin password=admin
    when: vendor == "nxos"
```

Tasks can also use the `name` parameter just like plays can. As with plays, the text is arbitrary and is displayed when the playbook runs to improve readability during playbook execution and reporting. It is an optional parameter for each task.

The next line in the example task starts with `nxos_vlan`. This tells us that this task will execute the Ansible module called `nxos_vlan`.

We’ll now dig deeper into modules.
### Modules

It is critical to understand modules within Ansible. While any programming language can be used to write Ansible modules as long as they return JSON key-value pairs, they are almost always written in Python. In our example, we see two modules being executed: `nxos_vlan` and `eos_vlan`. Both modules are Python files; in fact, while you can’t tell from looking at the playbook, the real filenames are _nxos_vlan.py_ and _eos_vlan.py_, respectively.

Let’s look at the first task in the first play from the preceding example:
```
- name: ENSURE VLAN 10 EXISTS ON CISCO TOR SWITCHES
  nxos_vlan: vlan_id=10 name=WEB_VLAN host={{ inventory_hostname }} username=admin password=admin
  when: vendor == "nxos"
```
This task executes `nxos_vlan`, which is a module that automates VLAN configuration. In order to use modules, including this one, you need to specify the desired state or configuration policy you want the device to have. This example states: VLAN 10 should be configured with the name `WEB_VLAN`, and it should exist on each switch being automated. We can see this easily with the `vlan_id` and `name` parameters. There are three other parameters being passed into the module as well: `host`, `username`, and `password`.

`host`

This is the hostname (or IP address) of the device being automated. Since the hosts we want to automate are already defined in the inventory file, we can use the built-in Ansible variable `inventory_hostname`. This variable is equal to what is in the inventory file. For example, on the first iteration, the host in the inventory file is `rack1-tor1`, and on the second iteration, it is `rack1-tor2`. These names are passed into the module, and within the module, a DNS lookup occurs on each name to resolve it to an IP address. Then communication with the device begins.

`username`

Username used to log in to the switch.

`password`

Password used to log in to the switch.
The last piece to cover here is the use of the `when` statement. This is how Ansible performs conditional tasks within a play. As we know, there are multiple devices and types of devices within the `tor` group for this play. Using `when` offers an option to be more selective based on any criteria. Here we are only automating Cisco devices, because we are using the `nxos_vlan` module in this task, while in the next task, we are automating only the Arista devices, because the `eos_vlan` module is used.

###### Note

This isn’t the only way to differentiate between devices. It is being shown to illustrate the use of `when` and that variables can be defined within the inventory file.

Defining variables in an inventory file is great for getting started, but as you continue to use Ansible, you’ll want to use YAML-based variables files to help with scale, versioning, and minimizing change to a given file. This will also simplify and improve readability for the inventory file and each variables file used. An example of a variables file was given earlier when the build/push method of device provisioning was covered.
Here are a few other points to understand about the tasks in the last example:
|
||||
|
||||
* Play 1 task 1 shows the `username` and `password` hardcoded as parameters being passed into the specific module (`nxos_vlan`).
|
||||
|
||||
* Play 1 task 1 and play 2 passed variables into the module instead of hardcoding them. This masks the `username` and `password`parameters, but it’s worth noting that these variables are being pulled from the inventory file (for this example).
|
||||
|
||||
* Play 1 uses a _horizontal_ key=value syntax for the parameters being passed into the modules, while play 2 uses the vertical key=value syntax. Both work just fine. You can also use vertical YAML syntax with "key: value" syntax.
|
||||
|
||||
* The last task also introduces how to use a _loop_ within Ansible. This is done with `with_items` and is analogous to a for loop. That particular task loops through five VLANs to ensure they all exist on the switch. Note: it's also possible to store these VLANs in an external YAML variables file. Also note that the alternative to using `with_items` would be to have one task per VLAN—and that just wouldn't scale!
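The last two bullets can be sketched together in a single task. The VLAN IDs and names below are illustrative, and `hostip`, `un`, and `pwd` mirror the inventory variables used elsewhere in this report:

```
- name: ENSURE VLANS EXIST ON CISCO NEXUS SWITCHES
  nxos_vlan:
    vlan_id={{ item }}
    name=vlan_{{ item }}
    host={{ hostip }}
    username={{ un }}
    password={{ pwd }}
  with_items:
    - 10
    - 20
    - 30
    - 40
    - 50
```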
|
||||
|
||||
|
||||
### Hands-on Look at Using Ansible for Network Automation
|
||||
|
||||
In the previous chapter, a general overview of Ansible terminology was provided, covering many of the specific Ansible terms, such as playbooks, plays, tasks, modules, and inventory files. This section continues with working examples of using Ansible for network automation, going into more detail on working with modules to automate a few different types of devices. Examples will include automating devices from multiple vendors, including Cisco, Arista, Cumulus, and Juniper.
|
||||
|
||||
The examples in this section assume the following:
|
||||
|
||||
* Ansible is installed.
|
||||
|
||||
* The proper APIs are enabled on the devices (NX-API, eAPI, NETCONF).
|
||||
|
||||
* Users exist with the proper permissions on the system to make changes via the API.
|
||||
|
||||
* All Ansible modules exist on the system and are in the library path.
|
||||
|
||||
###### Note
|
||||
|
||||
Setting the module and library path can be done within the _ansible.cfg_ file. You can also use the `-M` flag from the command line to change it when executing a playbook.
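For example (the paths shown are illustrative), the library path can be set persistently in _ansible.cfg_ or per run with the `-M` flag:

```
# ansible.cfg
[defaults]
library = /home/ntc/ansible/library

# or, per run:
# ansible-playbook -M /home/ntc/ansible/library demo.yml
```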
|
||||
|
||||
The inventory used for the examples in this section is shown in the following section (with passwords removed and IP addresses changed). In this example, some hostnames are not FQDNs as they were in the previous examples.
|
||||
|
||||
|
||||
### Inventory File
|
||||
|
||||
```
|
||||
[cumulus]
|
||||
cvx ansible_ssh_host=1.2.3.4 ansible_ssh_pass=PASSWORD
|
||||
|
||||
[arista]
|
||||
veos1
|
||||
|
||||
[cisco]
|
||||
nx1 hostip=5.6.7.8 un=USERNAME pwd=PASSWORD
|
||||
|
||||
[juniper]
|
||||
vsrx hostip=9.10.11.12 un=USERNAME pwd=PASSWORD
|
||||
```
|
||||
|
||||
###### Note
|
||||
|
||||
Just in case you’re wondering at this point, Ansible does support functionality that allows you to store passwords in encrypted files. If you want to learn more about this feature, check out [Ansible Vault][5] in the docs on the Ansible website.
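As a quick sketch of that workflow (the file name is illustrative), you encrypt a variables file once and then supply the vault password at run time:

```
# Encrypt an existing variables file (prompts for a vault password)
ansible-vault encrypt group_vars/all.yml

# Supply the vault password when running the playbook
ansible-playbook -i inventory demo.yml --ask-vault-pass
```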
|
||||
|
||||
This inventory file has four groups defined with a single host in each group. Let’s review each section in a little more detail:
|
||||
|
||||
Cumulus
|
||||
|
||||
The host `cvx` is a Cumulus Linux (CL) switch, and it is the only device in the `cumulus` group. Remember that CL is native Linux, so this means the default connection mechanism (SSH) is used to connect to and automate the CL switch. Because `cvx` is not defined in DNS or _/etc/hosts_, we’ll let Ansible know not to use the hostname defined in the inventory file, but rather the name/IP defined for `ansible_ssh_host`. The username to log in to the CL switch is defined in the playbook, but you can see that the password is being defined in the inventory file using the `ansible_ssh_pass` variable.
|
||||
|
||||
Arista
|
||||
|
||||
The host called `veos1` is an Arista switch running EOS. It is the only host within the `arista` group. As you can see, no other parameters are defined for Arista in the inventory file. This is because Arista uses a special configuration file for its devices, called _.eapi.conf_; for our example, it is stored in the home directory. Here is the conf file used in this example:
|
||||
|
||||
```
|
||||
[connection:veos1]
|
||||
host: 2.4.3.4
|
||||
username: unadmin
|
||||
password: pwadmin
|
||||
```
|
||||
|
||||
This file contains all of the information required for Ansible (and the Arista Python library called _pyeapi_) to connect to the device.
|
||||
|
||||
Cisco
|
||||
|
||||
Just like with Cumulus and Arista, there is only one host (`nx1`) that exists within the `cisco` group. This is an NX-OS-based Cisco Nexus switch. Notice how there are three variables defined for `nx1`. They include `un` and `pwd`, which are accessed in the playbook and passed into the Cisco modules in order to connect to the device. In addition, there is a parameter called `hostip`. This is required because `nx1` is also not defined in DNS or configured in the _/etc/hosts_ file.
|
||||
|
||||
|
||||
###### Note
|
||||
|
||||
We could have named this parameter anything. If automating a native Linux device, `ansible_ssh_host` is used just like we saw with the Cumulus example (if the name as defined in the inventory is not resolvable). In this example, we could have still used `ansible_ssh_host`, but it is not a requirement, since we’ll be passing this variable as a parameter into Cisco modules, whereas `ansible_ssh_host` is automatically checked when using the default SSH connection mechanism.
|
||||
|
||||
Juniper
|
||||
|
||||
As with the previous three groups, there is a single host, `vsrx`, located within the `juniper` group. Its setup within the inventory file is identical to that of the `cisco` group, as both are used in exactly the same way within the playbook.
|
||||
|
||||
|
||||
### Playbook
|
||||
|
||||
The next playbook has four different plays. Each play is built to automate a specific group of devices based on vendor type. Note that this is only one way to perform these tasks within a single playbook; we could instead have used conditionals (the `when` statement) or created Ansible roles (not covered in this report).
|
||||
|
||||
Here is the example playbook:
|
||||
|
||||
```
|
||||
---
|
||||
|
||||
- name: PLAY 1 - CISCO NXOS
|
||||
hosts: cisco
|
||||
connection: local
|
||||
|
||||
tasks:
|
||||
- name: ENSURE VLAN 100 exists on Cisco Nexus switches
|
||||
nxos_vlan:
|
||||
vlan_id=100
|
||||
name=web_vlan
|
||||
host={{ hostip }}
|
||||
username={{ un }}
|
||||
password={{ pwd }}
|
||||
|
||||
- name: PLAY 2 - ARISTA EOS
|
||||
hosts: arista
|
||||
connection: local
|
||||
|
||||
tasks:
|
||||
- name: ENSURE VLAN 100 exists on Arista switches
|
||||
eos_vlan:
|
||||
vlanid=100
|
||||
name=web_vlan
|
||||
connection={{ inventory_hostname }}
|
||||
|
||||
- name: PLAY 3 - CUMULUS
|
||||
remote_user: cumulus
|
||||
sudo: true
|
||||
hosts: cumulus
|
||||
|
||||
tasks:
|
||||
- name: ENSURE 100.10.10.1 is configured on swp1
|
||||
cl_interface: name=swp1 ipv4=100.10.10.1/24
|
||||
|
||||
- name: restart networking without disruption
|
||||
shell: ifreload -a
|
||||
|
||||
- name: PLAY 4 - JUNIPER SRX changes
|
||||
hosts: juniper
|
||||
connection: local
|
||||
|
||||
tasks:
|
||||
- name: INSTALL JUNOS CONFIG
|
||||
junos_install_config:
|
||||
host={{ hostip }}
|
||||
file=srx_demo.conf
|
||||
user={{ un }}
|
||||
passwd={{ pwd }}
|
||||
logfile=deploysite.log
|
||||
overwrite=yes
|
||||
diffs_file=junpr.diff
|
||||
```
|
||||
|
||||
You will notice the first two plays are very similar to what we already covered in the original Cisco and Arista example. The only difference is that each group being automated (`cisco` and `arista`) is defined in its own play, in contrast to the `when` conditional used earlier.
|
||||
|
||||
There is no right way or wrong way to do this. It all depends on what information is known up front and what fits your environment and use cases best, but our intent is to show a few ways to do the same thing.
|
||||
|
||||
The third play automates the configuration of interface `swp1` that exists on the Cumulus Linux switch. The first task within this play ensures that `swp1` is a Layer 3 interface and is configured with the IP address 100.10.10.1. Because Cumulus Linux is native Linux, the networking service needs to be restarted for the changes to take effect. This could have also been done using Ansible handlers (out of the scope of this report). There is also an Ansible core module called `service` that could have been used, but that would disrupt networking on the switch; using `ifreload` restarts networking non-disruptively.
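Handlers are out of scope for this report, but as a minimal sketch, the third play's restart could be expressed with one, so that the `ifreload` runs only when the interface task actually reports a change:

```
  tasks:
    - name: ENSURE 100.10.10.1 is configured on swp1
      cl_interface: name=swp1 ipv4=100.10.10.1/24
      notify: reload networking

  handlers:
    - name: reload networking
      shell: ifreload -a
```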
|
||||
|
||||
Up until now in this section, we looked at Ansible modules focused on specific tasks such as configuring interfaces and VLANs. The fourth play uses another option. We’ll look at a module that _pushes_ a full configuration file and immediately activates it as the new running configuration. This is what we showed previously using `napalm_install_config`, but this example uses a Juniper-specific module called `junos_install_config`.
|
||||
|
||||
This module `junos_install_config` accepts several parameters, as seen in the example. By now, you should understand what `user`, `passwd`, and `host` are used for. The other parameters are defined as follows:
|
||||
|
||||
`file`
|
||||
|
||||
This is the config file that is copied from the Ansible control host to the Juniper device.
|
||||
|
||||
`logfile`
|
||||
|
||||
This is optional, but if specified, it is used to store messages generated while executing the module.
|
||||
|
||||
`overwrite`
|
||||
|
||||
When set to yes/true, the complete configuration is replaced with the file being sent (default is false).
|
||||
|
||||
`diffs_file`
|
||||
|
||||
This is optional, but if specified, it will store the diffs generated when applying the configuration. Here is an example of the diff generated when changing just the hostname but still sending a complete config file:
|
||||
|
||||
```
|
||||
# filename: junpr.diff
|
||||
[edit system]
|
||||
- host-name vsrx;
|
||||
+ host-name vsrx-demo;
|
||||
```
|
||||
|
||||
|
||||
That covers the detailed overview of the playbook. Let’s take a look at what happens when the playbook is executed:
|
||||
|
||||
###### Note
|
||||
|
||||
The `-i` flag is used to specify the inventory file to use. The `ANSIBLE_HOSTS` environment variable can also be set rather than passing the flag each time a playbook is executed.
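For example (the path is illustrative):

```
export ANSIBLE_HOSTS=/home/ntc/ansible/multivendor/inventory
ansible-playbook demo.yml
```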
|
||||
|
||||
```
|
||||
ntc@ntc:~/ansible/multivendor$ ansible-playbook -i inventory demo.yml
|
||||
|
||||
PLAY [PLAY 1 - CISCO NXOS] *************************************************
|
||||
|
||||
TASK: [ENSURE VLAN 100 exists on Cisco Nexus switches] *********************
|
||||
changed: [nx1]
|
||||
|
||||
PLAY [PLAY 2 - ARISTA EOS] *************************************************
|
||||
|
||||
TASK: [ENSURE VLAN 100 exists on Arista switches] **************************
|
||||
changed: [veos1]
|
||||
|
||||
PLAY [PLAY 3 - CUMULUS] ****************************************************
|
||||
|
||||
GATHERING FACTS ************************************************************
|
||||
ok: [cvx]
|
||||
|
||||
TASK: [ENSURE 100.10.10.1 is configured on swp1] ***************************
|
||||
changed: [cvx]
|
||||
|
||||
TASK: [restart networking without disruption] ******************************
|
||||
changed: [cvx]
|
||||
|
||||
PLAY [PLAY 4 - JUNIPER SRX changes] ****************************************
|
||||
|
||||
TASK: [INSTALL JUNOS CONFIG] ***********************************************
|
||||
changed: [vsrx]
|
||||
|
||||
PLAY RECAP ***************************************************************
|
||||
to retry, use: --limit @/home/ansible/demo.retry
|
||||
|
||||
cvx : ok=3 changed=2 unreachable=0 failed=0
|
||||
nx1 : ok=1 changed=1 unreachable=0 failed=0
|
||||
veos1 : ok=1 changed=1 unreachable=0 failed=0
|
||||
vsrx : ok=1 changed=1 unreachable=0 failed=0
|
||||
```
|
||||
|
||||
You can see that each task completes successfully, and if you are at the terminal, you'll see each changed task displayed in amber.
|
||||
|
||||
Let’s run this playbook again. This time we can verify that all of the modules are _idempotent_: no changes are made to the devices and everything is green:
|
||||
|
||||
```
|
||||
PLAY [PLAY 1 - CISCO NXOS] ***************************************************
|
||||
|
||||
TASK: [ENSURE VLAN 100 exists on Cisco Nexus switches] ***********************
|
||||
ok: [nx1]
|
||||
|
||||
PLAY [PLAY 2 - ARISTA EOS] ***************************************************
|
||||
|
||||
TASK: [ENSURE VLAN 100 exists on Arista switches] ****************************
|
||||
ok: [veos1]
|
||||
|
||||
PLAY [PLAY 3 - CUMULUS] ******************************************************
|
||||
|
||||
GATHERING FACTS **************************************************************
|
||||
ok: [cvx]
|
||||
|
||||
TASK: [ENSURE 100.10.10.1 is configured on swp1] *****************************
|
||||
ok: [cvx]
|
||||
|
||||
TASK: [restart networking without disruption] ********************************
|
||||
skipping: [cvx]
|
||||
|
||||
PLAY [PLAY 4 - JUNIPER SRX changes] ******************************************
|
||||
|
||||
TASK: [INSTALL JUNOS CONFIG] *************************************************
|
||||
ok: [vsrx]
|
||||
|
||||
PLAY RECAP ***************************************************************
|
||||
cvx : ok=2 changed=0 unreachable=0 failed=0
|
||||
nx1 : ok=1 changed=0 unreachable=0 failed=0
|
||||
veos1 : ok=1 changed=0 unreachable=0 failed=0
|
||||
vsrx : ok=1 changed=0 unreachable=0 failed=0
|
||||
```
|
||||
|
||||
Notice how there were zero changes, but each task still returned "ok." This verifies, as expected, that each of the modules in this playbook is idempotent.
|
||||
|
||||
|
||||
### Summary
|
||||
|
||||
Ansible is a super-simple automation platform that is agentless and extensible. The network community continues to rally around Ansible as a platform that can be used for network automation tasks that range from configuration management to data collection and reporting. You can push full configuration files with Ansible, configure specific network resources with idempotent modules such as interfaces or VLANs, or simply automate the collection of information such as neighbors, serial numbers, uptime, and interface stats, and customize reports as you need them.
|
||||
|
||||
Because of its architecture, Ansible proves to be a great tool available here and now that helps bridge the gap from _legacy CLI/SNMP_ network device automation to modern _API-driven_ automation.
|
||||
|
||||
Ansible’s ease of use and agentless architecture accounts for the platform’s increasing following within the networking community. Again, this makes it possible to automate devices without APIs (CLI/SNMP); devices that have modern APIs, including standalone switches, routers, and Layer 4-7 service appliances; and even those software-defined networking (SDN) controllers that offer RESTful APIs.
|
||||
|
||||
There is no device left behind when using Ansible for network automation.
|
||||
|
||||
-----------
|
||||
|
||||
About the author:
|
||||
|
||||

|
||||
|
||||
Jason Edelman, CCIE 15394 & VCDX-NV 167, is a born and bred network engineer from the great state of New Jersey. He was the typical “lover of the CLI” or “router jockey.” At some point several years ago, he made the decision to focus more on software, development practices, and how they are converging with network engineering. Jason currently runs a boutique consulting firm, Network to Code, helping vendors and end users take advantage of new tools and technologies to reduce their operational inefficiencies. Jason has a Bachelor’s...
|
||||
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
via: https://www.oreilly.com/learning/network-automation-with-ansible
|
||||
|
||||
Author: [Jason Edelman][a]
|
||||
Translator: [译者ID](https://github.com/译者ID)
|
||||
Proofreader: [校对者ID](https://github.com/校对者ID)
|
||||
|
||||
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](https://linux.cn/)
|
||||
|
||||
[a]:https://www.oreilly.com/people/ee4fd-jason-edelman
|
||||
[1]:https://www.oreilly.com/learning/network-automation-with-ansible#ansible_terminology_and_getting_started
|
||||
[2]:https://www.oreilly.com/learning/network-automation-with-ansible#ansible_network_integrations
|
||||
[3]:https://www.oreilly.com/learning/network-automation-with-ansible#ansible_terminology_and_getting_started
|
||||
[4]:https://www.oreilly.com/learning/network-automation-with-ansible#handson_look_at_using_ansible_for_network_automation
|
||||
[5]:http://docs.ansible.com/ansible/playbooks_vault.html
|
||||
[6]:https://www.oreilly.com/people/ee4fd-jason-edelman
|
||||
[7]:https://www.oreilly.com/people/ee4fd-jason-edelman
|
@ -1,3 +1,5 @@
|
||||
translating by liuxinyu123
|
||||
|
||||
Containing System Services in Red Hat Enterprise Linux – Part 1
|
||||
============================================================
|
||||
|
||||
|
@ -1,3 +1,4 @@
|
||||
Translating by qhwdw
|
||||
How to Install Software from Source Code… and Remove it Afterwards
|
||||
============================================================
|
||||
|
||||
|
@ -1,3 +1,4 @@
|
||||
fuzheng1998 translating
|
||||
A Large-Scale Study of Programming Languages and Code Quality in GitHub
|
||||
============================================================
|
||||
|
||||
|
@ -1,3 +1,5 @@
|
||||
Translating by FelixYFZ
|
||||
|
||||
Linux Networking Hardware for Beginners: Think Software
|
||||
============================================================
|
||||
|
||||
|
@ -1,97 +0,0 @@
|
||||
Image Processing on Linux
|
||||
============================================================
|
||||
|
||||
|
||||
I've covered several scientific packages in this space that generate nice graphical representations of your data and work, but I've not gone in the other direction much. So in this article, I cover a popular image processing package called ImageJ. Specifically, I am looking at [Fiji][4], an instance of ImageJ bundled with a set of plugins that are useful for scientific image processing.
|
||||
|
||||
The name Fiji is a recursive acronym, much like GNU. It stands for "Fiji Is Just ImageJ". ImageJ is a useful tool for analyzing images in scientific research—for example, you may use it for classifying tree types in a landscape from aerial photography. ImageJ can do that type of categorization. It's built with a plugin architecture, and a very extensive collection of plugins is available to increase the available functionality.
|
||||
|
||||
The first step is to install ImageJ (or Fiji). Most distributions will have a package available for ImageJ. If you wish, you can install it that way and then install the individual plugins you need for your research. The other option is to install Fiji and get the most commonly used plugins at the same time. Unfortunately, most Linux distributions will not have a package available within their package repositories for Fiji. Luckily, however, an easy installation file is available from the main website. It's a simple zip file, containing a directory with all of the files required to run Fiji. When you first start it, you get only a small toolbar with a list of menu items (Figure 1).
|
||||
|
||||

|
||||
|
||||
Figure 1. You get a very minimal interface when you first start Fiji.
|
||||
|
||||
If you don't already have some images to use as you are learning to work with ImageJ, the Fiji installation includes several sample images. Click the File→Open Samples menu item for a dropdown list of sample images (Figure 2). These samples cover many of the potential tasks you might be interested in working on.
|
||||
|
||||

|
||||
|
||||
Figure 2. Several sample images are available that you can use as you learn how to work with ImageJ.
|
||||
|
||||
If you installed Fiji, rather than ImageJ alone, a large set of plugins already will be installed. The first one of note is the autoupdater plugin. This plugin checks the internet for updates to ImageJ, as well as the installed plugins, each time ImageJ is started.
|
||||
|
||||
All of the installed plugins are available under the Plugins menu item. Once you have installed a number of plugins, this list can become a bit unwieldy, so you may want to be judicious in your plugin selection. If you want to trigger the updates manually, click the Help→Update Fiji menu item to force the check and get a list of available updates (Figure 3).
|
||||
|
||||

|
||||
|
||||
Figure 3. You can force a manual check of what updates are available.
|
||||
|
||||
Now, what kind of work can you do with Fiji/ImageJ? One example is doing counts of objects within an image. You can load a sample by clicking File→Open Samples→Embryos.
|
||||
|
||||

|
||||
|
||||
Figure 4. With ImageJ, you can count objects within an image.
|
||||
|
||||
The first step is to set a scale to the image so you can tell ImageJ how to identify objects. First, select the line button on the toolbar and draw a line over the length of the scale legend on the image. You then can select Analyze→Set Scale, and it will set the number of pixels that the scale legend occupies (Figure 5). You can set the known distance to be 100 and the units to be "um".
|
||||
|
||||

|
||||
|
||||
Figure 5. For many image analysis tasks, you need to set a scale to the image.
|
||||
|
||||
The next step is to simplify the information within the image. Click Image→Type→8-bit to reduce the information to an 8-bit gray-scale image. To isolate the individual objects, click Process→Binary→Make Binary to threshold the image automatically (Figure 6).
|
||||
|
||||

|
||||
|
||||
Figure 6. There are tools to do automatic tasks like thresholding.
|
||||
|
||||
Before you can count the objects within the image, you need to remove artifacts like the scale legend. You can do that by using the rectangular selection tool to select it and then click Edit→Clear. Now you can analyze the image and see what objects are there.
|
||||
|
||||
Making sure that there are no areas selected in the image, click Analyze→Analyze Particles to pop up a window where you can select the minimum size, what results to display and what to show in the final image (Figure 7).
|
||||
|
||||

|
||||
|
||||
Figure 7. You can generate a reduced image with identified particles.
|
||||
|
||||
Figure 8 shows an overall look at what was discovered in the summary results window. There is also a detailed results window for each individual particle.
|
||||
|
||||

|
||||
|
||||
Figure 8. One of the output results includes a summary list of the particles identified.
|
||||
|
||||
Once you have an analysis worked out for a given image type, you often need to apply the exact same analysis to a series of images. This series may number into the thousands, so it's typically not something you will want to repeat manually for each image. In such cases, you can collect the required steps together into a macro so that they can be reapplied multiple times. Clicking Plugins→Macros→Record pops up a new window where all of your subsequent commands will be recorded. Once all of the steps are finished, you can save them as a macro file and rerun them on other images by clicking Plugins→Macros→Run.
|
||||
|
||||
If you have a very specific set of steps for your workflow, you simply can open the macro file and edit it by hand, as it is a simple text file. There is actually a complete macro language available to you to control the process that is being applied to your images more fully.
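As a rough sketch of what such a macro might contain for the embryo-counting workflow described above (the recorded commands vary by version, and the coordinates, scale distance, and size threshold here are purely illustrative placeholders):

```
// Illustrative ImageJ macro for the counting workflow; values are placeholders
run("Set Scale...", "distance=460 known=100 unit=um");
run("8-bit");
run("Make Binary");
makeRectangle(10, 10, 200, 60);   // region covering the scale legend
run("Clear", "slice");
run("Select None");
run("Analyze Particles...", "size=100-Infinity display summarize");
```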
|
||||
|
||||
If you have a really large set of images that needs to be processed, however, this still might be too tedious for your workflow. In that case, go to Process→Batch→Macro to pop up a new window where you can set up your batch processing workflow (Figure 9).
|
||||
|
||||

|
||||
|
||||
Figure 9. You can run a macro on a batch of input image files with a single command.
|
||||
|
||||
From this window, you can select which macro file to apply, the source directory where the input images are located and the output directory where you want the output images to be written. You also can set the output file format and filter the list of images being used as input based on what the filename contains. Once everything is done, start the batch run by clicking the Process button at the bottom of the window.
|
||||
|
||||
If this is a workflow that will be repeated over time, you can save the batch process to a text file by clicking the Save button at the bottom of the window. You then can reload the same workflow by clicking the Open button, also at the bottom of the window. All of this functionality allows you to automate the most tedious parts of your research so you can focus on the actual science.
|
||||
|
||||
Considering that there are more than 500 plugins and more than 300 macros available from the main ImageJ website alone, it is an understatement that I've been able to touch on only the most basic of topics in this short article. Luckily, many domain-specific tutorials are available, along with the very good documentation for the core of ImageJ from the main project website. If you think this tool could be of use to your research, there is a wealth of information to guide you in your particular area of study.
|
||||
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
About the author:
|
||||
|
||||
Joey Bernard has a background in both physics and computer science. This serves him well in his day job as a computational research consultant at the University of New Brunswick. He also teaches computational physics and parallel programming.
|
||||
|
||||
--------------------------------
|
||||
|
||||
via: https://www.linuxjournal.com/content/image-processing-linux
|
||||
|
||||
Author: [Joey Bernard][a]
|
||||
Translator: [译者ID](https://github.com/译者ID)
|
||||
Proofreader: [校对者ID](https://github.com/校对者ID)
|
||||
|
||||
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](https://linux.cn/)
|
||||
|
||||
[a]:https://www.linuxjournal.com/users/joey-bernard
|
||||
[1]:https://www.linuxjournal.com/tag/science
|
||||
[2]:https://www.linuxjournal.com/tag/statistics
|
||||
[3]:https://www.linuxjournal.com/users/joey-bernard
|
||||
[4]:https://imagej.net/Fiji
|
126
sources/tech/20171117 How to Easily Remember Linux Commands.md
Normal file
@ -0,0 +1,126 @@
|
||||
translating by darksun
|
||||
# How to Easily Remember Linux Commands
|
||||
|
||||

|
||||
|
||||
|
||||
The command line can be daunting for new Linux users. Part of that is remembering the multitude of commands available. After all, in order to use the command line effectively, you need to know the commands.
|
||||
|
||||
Unfortunately, there's no getting around the fact that you need to learn the commands, but there are some tools that can help you out when you're getting started.
|
||||
|
||||
## History
|
||||
|
||||

|
||||
|
||||
The first thing you can use to remember commands that you've already used is your own command line history. Most [Linux shells](https://www.maketecheasier.com/alternative-linux-shells/), including the most common default, Bash, create a history file that lists your past commands. For Bash, you can find it at "/home/<username>/.bash_history."
|
||||
|
||||
It's a plain text file, so you can open it in any text editor and look back through it or even search it.
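You can also search it straight from the shell rather than opening it in an editor. For example, to find every past command that mentions "ssh":

[code]

grep "ssh" ~/.bash_history
[/code]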
|
||||
|
||||
## Apropos
|
||||
|
||||
There's actually a command that helps you find _other_ commands. It's called "apropos," and it helps you find the appropriate command to complete the action you search for. For example, if you need to know the command to list the contents of a directory, you can run the following command:
|
||||
|
||||
[code]
|
||||
|
||||
apropos "list directory"
|
||||
[/code]
|
||||
|
||||

|
||||
|
||||
There's a catch, though. It's very literal. Add an "s" to "directory," and try again.
|
||||
|
||||
[code]
|
||||
|
||||
apropos "list directories"
|
||||
[/code]
|
||||
|
||||
It doesn't work. What `apropos` does is search through a list of commands and the accompanying descriptions. If your search doesn't match the description, it won't pick up the command as a result.
|
||||
|
||||
There is something else you can do. By using the `-a` flag, you can add together search terms in a more flexible way. Try this command:
|
||||
|
||||
[code]
|
||||
|
||||
apropos "match pattern"
|
||||
[/code]
|
||||
|
||||

|
||||
|
||||
You'd think it'd turn up something, like [grep](https://www.maketecheasier.com/what-is-grep-and-uses/)? Instead, you get nothing. Again, apropos is being too literal. Now, try separating the words and using the `-a` flag.
|
||||
|
||||
[code]
|
||||
|
||||
apropos "match" -a "pattern"
|
||||
[/code]
|
||||
|
||||
Suddenly, you have many of the results that you'd expect.
|
||||
|
||||
apropos is a great tool, but you always need to be aware of its quirks.
|
||||
|
||||
## ZSH
|
||||
|
||||

|
||||
|
||||
ZSH isn't really a tool for remembering commands. It's actually an alternative shell. You can substitute [ZSH](https://www.maketecheasier.com/understanding-the-different-shell-in-linux-zsh-shell/) for Bash and use it as your command line shell. ZSH includes an autocorrect feature that catches you if you enter a command wrong or misspell something. If you enable it, it'll ask you if you meant something close. You can continue to use the command line as you normally would with ZSH, but you get an extra safety net and some other really nice features, too. The easiest way to get the most out of ZSH is with [Oh-My-ZSH](https://github.com/robbyrussell/oh-my-zsh).
|
||||
|
||||
## Cheat Sheet
|
||||
|
||||
The last, and probably simplest, option is to use a [cheat sheet](https://www.maketecheasier.com/premium/cheatsheet/linux-command-line/). There are plenty available online, like [this one](https://www.cheatography.com/davechild/cheat-sheets/linux-command-line/), that you can use to look up commands quickly.
|
||||
|
||||

|
||||
|
||||
You can actually even find them in image form and set one as your desktop wallpaper for quick reference.
|
||||
|
||||
This isn't the best solution for actually remembering the commands, but when you're starting out, it can save you from doing a search online every time you don't remember a command.
|
||||
|
||||
Rely on these methods when you're learning, and eventually you'll find yourself referring to them less and less. No one remembers everything, so don't feel bad if you occasionally forget or run into something you haven't seen before. That's what these resources and, of course, the Internet are there for.
|
||||
|
||||
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
via: https://www.maketecheasier.com/remember-linux-commands/
|
||||
|
||||
Author: [Nick Congleton][a]
|
||||
Translator: [译者ID](https://github.com/译者ID)
|
||||
Proofreader: [校对者ID](https://github.com/校对者ID)
|
||||
|
||||
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](https://linux.cn/)
|
||||
|
167
translated/talk/20170320 Education of a Programmer.md
Normal file
@ -0,0 +1,167 @@
|
||||
Education of a Programmer
|
||||
============================================================
|
||||
|
||||
_When I left Microsoft in October 2016 after almost 21 years there and almost 35 years in the industry, I took some time to reflect on what I had learned over all those years. This is a lightly edited version of that post. Pardon the length!_
|
||||
|
||||
There are an amazing number of things you need to know to be a proficient programmer — details of languages, APIs, algorithms, data structures, systems and tools. These things change all the time — new languages and programming environments spring up and there always seems to be some hot new tool or language that “everyone” is using. It is important to stay current and proficient. A carpenter needs to know how to pick the right hammer and nail for the job and needs to be competent at driving the nail straight and true.
|
||||
|
||||
与此同时,我也发现有一些概念和策略有着广泛的应用场景,能够跨越几十年。底层设备的性能和容量在这几十年来增长了几个数量级,但某些关于系统设计的思考方式仍然适用。这些思考方式比任何具体的实现都更根本。理解这些反复出现的主题,对分析与设计我们所构建的复杂系统大有帮助。
|
||||
|
||||
谦卑和自我
|
||||
|
||||
这不仅仅局限于编程,但在编程这个持续发展的领域,一个人需要在谦卑和自我中保持平衡。总有新的东西需要学习,并且总有人能帮助你学习——如果你愿意学习的话。一个人即需要保持谦卑,认识到自己不懂并承认它,也要保持自我,相信自己能掌握一个新的领域,并且能运用你已经掌握的知识。我见过的最大的挑战就是一些人在某个领域深入专研了很长时间,“忘记”了自己擅长学习新的东西。最好的学习来自放手去做,建造一些东西,即便只是一个原型或者 hack。我知道的最好的程序员对技术有广泛的认识,但同时他们对某个技术深入研究,成为了专家。而深入的学习来自努力解决真正困难的问题。
|
||||
|
||||
端到端观点
|
||||
|
||||
1981 年,Jerry Saltzer, Dave Reed 和 Dave Clark 在做因特网和分布式系统的早期工作,他们提出了端到端观点,并作出了[经典的阐述][4]。网络上的文章有许多误传,所以更应该阅读论文本身。论文的作者很谦虚,没有声称这是他们自己的创造——从他们的角度看,这只是一个常见的工程策略,不只在通讯领域中,在其他领域中也有运用。他们只是将其写下来并收集了一些例子。下面是文章的一个小片段:
|
||||
|
||||
> 当我们设计系统的一个功能时,仅依靠端点的知识和端点的参与,就能正确地完整地实现这个功能。在一些情况下,系统的内部模块局部实现这个功能,可能会对性能有重要的提升。
|
||||
|
||||
论文称这是一个“观点”,虽然在维基百科和其他地方它已经被上升成“原则”。实际上,还是把它看作一个观点比较好,正如作者们所说,系统设计者面临的最难的问题之一就是如何在系统组件之间划分责任,这会引发不断的讨论:怎样在划分功能时权衡利弊,怎样隔离复杂性,怎样设计一个灵活的高性能系统来满足不断变化的需求。没有简单的原则可以直接遵循。
|
||||
|
||||
互联网上的大部分讨论集中在通信系统上,但端到端观点的适用范围其实更广泛。分布式系统中的“最终一致性”就是一个例子。一个满足“最终一致性”的系统,可以让系统中的元素暂时进入不一致的状态,从而简化系统,优化性能,因为有一个更大的端到端过程来解决不一致的状态。我喜欢横向拓展的订购系统的例子(例如亚马逊),它不要求每个请求都通过中央库存的控制点。缺少中央控制点可能允许两个终端出售相同的最后一本书,所以系统需要用某种方法来解决这个问题,如通知客户该书会延期交货。不论怎样设计,想购买的最后一本书在订单完成前都有可能被仓库中的叉车运出库(译者注:比如被其他人下单购买)。一旦你意识到你需要一个端到端的解决方案,并实现了这个方案,那系统内部的设计就可以被优化,并利用这个解决方案。
|
||||
|
||||
事实上,这种设计上的灵活性可以优化系统的性能,或者提供其他的系统功能,从而使得端到端的方法变得如此强大。端到端的思考往往允许内部进行灵活的操作,使整个系统更加健壮,并且能适应每个组件特性的变化。这些都让端到端的方法变得健壮,并能适应变化。
|
||||
|
||||
端到端方法意味着,添加会牺牲整体性能灵活性的抽象层和功能时要非常小心(也可能是其他的灵活性,但性能,特别是延迟,往往是特殊的)。如果你展示出底层的原始性能(performance, 也可能指操作),端到端的方法可以根据这个性能(操作)来优化,实现特定的需求。如果你破坏了底层性能(操作),即使你实现了重要的有附加价值的功能,你也牺牲了设计灵活性。
|
||||
|
||||
如果系统足够庞大而且足够复杂,需要把整个开发团队分配给系统内部的组件,那么端到端观点可以和团队组织相结合。这些团队自然要扩展这些组件的功能,他们通常从牺牲设计上的灵活性开始,尝试在组件上实现端到端的功能。
|
||||
|
||||
应用端到端方法面临的挑战之一是确定端点在哪里。俗话说,“大跳蚤身上有小跳蚤,小跳蚤身上还有更小的跳蚤……以至无穷”。
|
||||
|
||||
关注复杂性
|
||||
|
||||
编程是一门精确的艺术,每一行代码都要确保程序的正确执行。但这是带有误导的。编程的复杂性不在于各个部分的整合,也不在于各个部分之间如何相互交互。最健壮的程序将复杂性隔离开,让最重要的部分变的简单直接,通过简单的方式与其他部分交互。虽然隐藏复杂性和信息隐藏、数据抽象等其他设计方法一样,但我仍然觉得,如果你真的要定位出系统的复杂所在,并将其隔离开,那你需要对设计特别敏锐。
|
||||
|
||||
在我的[文章][5]中反复提到的例子是早期的终端编辑器 VI 和 Emacs 中使用的屏幕重绘算法。早期的视频终端实现了控制序列,来控制绘制字符核心操作,也实现了附加的显示功能,来优化重新绘制屏幕,如向上向下滚动当前行,或者插入新行,或在当前行中移动字符。这些命令都具有不同的开销,并且这些开销在不同制造商的设备中也是不同的。(参见[TERMCAP][6]以获取代码链接和更完整的历史记录。)像文本编辑器这样的全屏应用程序希望尽快更新屏幕,因此需要优化使用这些控制序列来转换屏幕从一个状态到另一个状态。
|
||||
|
||||
这些程序在设计上隐藏了底层的复杂性。系统中修改文本缓冲区的部分(功能上大多数创新都在这里)完全忽略了这些改变如何被转换成屏幕更新命令。这是可以接受的,因为针对*任何*内容的改变计算最佳命令所消耗的性能代价,远不及被终端本身实际执行这些更新命令的性能代价。在确定如何隐藏复杂性,以及隐藏哪些复杂性时,性能分析扮演着重要的角色,这一点在系统设计中非常常见。屏幕的更新与底层文本缓冲区的更改是异步的,并且可以独立于缓冲区的实际历史变化顺序。缓冲区*怎样*改变的并不重要,重要的是改变了*什么*。异步耦合,在组件交互时消除组件对历史路径依赖的组合,以及用自然的交互方式以有效地将组件组合在一起是隐藏耦合复杂度的常见特征。
|
||||
|
||||
隐藏复杂性的成功不是由隐藏复杂性的组件决定的,而是由使用该模块的使用者决定的。这就是为什么组件的提供者至少要为组件的某些端到端过程负责。他们需要清晰的知道系统的其他部分如何与组件相互作用,复杂性是如何泄漏出来的(以及是否泄漏出来)。这常常表现为“这个组件很难使用”这样的反馈——这通常意味着它不能有效地隐藏内部复杂性,或者没有选择一个隐藏复杂性的功能边界。
|
||||
|
||||
分层与组件化
|
||||
|
||||
系统设计人员的一个基本工作是确定如何将系统分解成组件和层;决定自己要开发什么,以及从别的地方获取什么。开源项目在决定自己开发组件还是购买服务时,大多会选择自己开发,但组件之间交互的过程是一样的。在大规模工程中,理解这些决策将如何随着时间的推移而发挥作用是非常重要的。从根本上说,变化是程序员所做的一切的基础,所以这些设计决定不仅在当下被评估,还要随着产品的不断发展而在未来几年得到评估。
|
||||
|
||||
以下是关于系统分解的一些事情,它们最终会占用大量的时间,因此往往需要更长的时间来学习和欣赏。
|
||||
|
||||
* 层泄漏。层(或抽象)[基本上是泄漏的][1]。这些泄漏会立即产生后果,也会随着时间的推移而产生两方面的后果。其中一方面就是该抽象层的特性渗透到了系统的其他部分,渗透的程度比你意识到得更深入。这些渗透可能是关于具体的性能特征的假设,以及抽象层的文档中没有明确的指出的行为发生的顺序。这意味着假如内部组件的行为发生变化,你的系统会比想象中更加脆弱。第二方面是你比表面上看起来更依赖组件内部的行为,所以如果你考虑改变这个抽象层,后果和挑战可能超出你的想象。
|
||||
|
||||
* 层具有太多功能了。您所采用的组件具有比实际需要更多的功能,这几乎是一个真理。在某些情况下,你决定采用这个组件是因为你想在将来使用那些尚未用到的功能。有时,你采用组件是想“上快车”,利用组件完成正在进行的工作。在功能强大的抽象层上开发会带来一些后果。1) 组件往往会根据你并不需要的功能作出取舍。 2) 为了实现那些你并不没有用到的功能,组件引入了复杂性和约束,这些约束将阻碍该组件的未来的演变。3) 层泄漏的范围更大。一些泄漏是由于真正的“抽象泄漏”,另一些是由于明显的,逐渐增加的对组件全部功能的依赖(但这些依赖通常都没有处理好)。Office 太大了,我们发现,对于我们建立的任何抽象层,我们最终都在系统的某个部分完全运用了它的功能。虽然这看起来是积极的(我们完全地利用了这个组件),但并不是所用的使用都有同样的价值。所以,我们最终要付出巨大的代价才能从一个抽象层往另一个抽象层迁移,这种“长尾巴”没什么价值,并且对使用场景认识不足。4) 附加的功能会增加复杂性,并增加功能滥用的可能。如果将验证 XML 的 API 指定为 XML 树的一部分,那这个 API 可以选择动态下载 XML 的模式定义。这在我们的基本文件解析代码中被错误地执行,导致 w3c.org 服务器上的大量性能下降以及(无意)分布式拒绝服务攻击。(这些被通俗地称为“地雷”API)。
|
||||
|
||||
* 抽象层被更换。需求发展,系统发展,组件被放弃。您最终需要更换该抽象层或组件。不管是对外部组件的依赖还是对内部组件的依赖都是如此。这意味着上述问题将变得重要起来。
|
||||
|
||||
* 自己构建还是购买的决定将会改变。这是上面几方面的必然结果。这并不意味着自己构建还是购买的决定在当时是错误的。一开始时往往没有合适的组件,一段时间之后才有合适的组件出现。或者,也可能你使用了一个组件,但最终发现它不符合您不断变化的要求,而且你的要求非常窄,能被理解,或着对你的价值体系来说是非常重要的,以至于拥有自己的模块是有意义的。这意味着你像关心自己构造的模块一样,关心购买的模块,关心它们是怎样泄漏并深入你的系统中的。
|
||||
|
||||
* 抽象层会变臃肿。一旦你定义了一个抽象层,它就开始增加功能。层是对使用模式优化的自然分界点。臃肿的层的困难在于,它往往会降低您利用底层的不断创新的能力。从某种意义上说,这就是操作系统公司憎恨构建在其核心功能之上的臃肿的层的原因——采用创新的速度放缓了。避免这种情况的一种比较规矩的方法是禁止在适配器层中进行任何额外的状态存储。微软基础类在 Win32 上采用这个一般方法。在短期内,将功能集成到现有层(最终会导致上述所有问题)而不是重构和重新推导是不可避免的。理解这一点的系统设计人员寻找分解和简化组件的方法,而不是在其中增加越来越多的功能。
|
||||
|
||||
爱因斯坦宇宙
|
||||
|
||||
几十年来,我一直在设计异步分布式系统,但是在微软内部的一次演讲中,SQL 架构师 Pat Helland 的一句话震惊了我。 “我们生活在爱因斯坦的宇宙中,没有同时性。”在构建分布式系统时(基本上我们构建的都是分布式系统),你无法隐藏系统的分布式特性。这是物理的。我一直感到远程过程调用在根本上错误的,这是一个原因,尤其是那些“透明的”远程过程调用,它们就是想隐藏分布式的交互本质。你需要拥抱系统的分布式特性,因为这些意义几乎总是需要通过系统设计和用户体验来完成。
|
||||
|
||||
拥抱分布式系统的本质则要遵循以下几个方面:
|
||||
|
||||
* 一开始就要思考设计对用户体验的影响,而不是试图在处理错误,取消请求和报告状态上打补丁。
|
||||
|
||||
* 使用异步技术来耦合组件。同步耦合是*不可能*的。如果某些行为看起来是同步的,是因为某些内部层尝试隐藏异步,这样做会遮蔽(但绝对不隐藏)系统运行时的基本行为特征。
|
||||
|
||||
* 认识到并且明确设计了交互状态机,这些状态表示长期的可靠的内部系统状态(而不是由深度调用堆栈中的变量值编码的临时,短暂和不可发现的状态)。
|
||||
|
||||
* 认识到失败是在所难免的。要保证能检测出分布式系统中的失败,唯一的办法就是直接看你的等待时间是否“太长”。这自然意味着[取消的等级最高][2]。系统的某一层(可能直接通向用户)需要决定等待时间是否过长,并取消操作。取消只是为了重建局部状态,回收局部的资源——没有办法在系统内广泛使用取消机制。有时用一种低成本,不可靠的方法广泛使用取消机制对优化性能可能有用。
|
||||
|
||||
* 认识到取消不是回滚,因为它只是回收本地资源和状态。如果回滚是必要的,它必须实现成一个端到端的功能。
|
||||
|
||||
* 承认永远不会真正知道分布式组件的状态。只要你发现一个状态,它可能就已经改变了。当你发送一个操作时,请求可能在传输过程中丢失,也可能被处理了但是返回的响应丢失了,或者请求需要一定的时间来处理,这样远程状态最终会在未来的某个任意的时间转换。这需要像幂等操作这样的方法,并且要能够稳健有效地重新发现远程状态,而不是期望可靠地跟踪分布式组件的状态。“[最终一致性][3]”的概念简洁地捕捉了这其中大多数想法。
|
||||
|
||||
我喜欢说你应该“陶醉在异步”。与其试图隐藏异步,不如接受异步,为异步而设计。当你看到像幂等性或不变性这样的技术时,你就认识到它们是拥抱宇宙本质的方法,而不仅仅是工具箱中的一个设计工具。
|
||||
|
||||
性能
|
||||
|
||||
我确信 Don Knuth 会对人们怎样误解他的名言“过早的优化是一切罪恶的根源”而感到震惊。事实上,性能,及性能持续超过60年的指数增长(或超过10年,取决于您是否愿意将晶体管,真空管和机电继电器的发展算入其中),为所有行业内的惊人创新和影响经济的“软件吃遍世界”的变化打下了基础。
|
||||
|
||||
要认识到这种指数变化的一个关键是,虽然系统的所有组件正在经历指数变化,但这些指数是不同的。硬盘容量的增长速度与内存容量的增长速度不同,与 CPU 的增长速度不同,与内存 CPU 之间的延迟的性能改善速度也不用。即使性能发展的趋势是由相同的基础技术驱动的,增长的指数也会有分歧。[延迟的改进从根本上改善了带宽][7]。指数变化在近距离或者短期内看起来是线性的,但随着时间的推移可能是压倒性的。系统不同组件的性能的增长不同,会出现压倒性的变化,并迫使对设计决策定期进行重新评估。
|
||||
|
||||
这样做的结果是,几年后,一度有意义的设计决定就不再有意义了。或者在某些情况下,二十年前有意义的方法又开始变成一个好的决定。现代内存映射的特点看起来更像是早期分时的进程切换,而不像分页那样。 (这样做有时会让我这样的老人说“这就是我们在 1975 年时用的方法”——忽略了这种方法在 40 年都没有意义,但现在又重新成为好的方法,因为两个组件之间的关系——可能是闪存和 NAND 而不是磁盘和核心内存——已经变得像以前一样了)。
|
||||
|
||||
当这些指数超越人自身的限制时,重要的转变就发生了。你能从 2 的 16 次方个字符(一个人可以在几个小时打这么多字)过渡到 2 的 3 次方个字符(远超出了一个人打字的范围)。你可以捕捉比人眼能感知的分辨率更高的数字图像。或者你可以将整个音乐专辑存在小巧的磁盘上,放在口袋里。或者你可以将数字化视频录制存储在硬盘上。再通过实时流式传输的能力,可以在一个地方集中存储一次,不需要在数千个本地硬盘上重复记录。
|
||||
|
||||
但有的东西仍然是根本的限制条件,那就是空间的三维和光速。我们又回到了爱因斯坦的宇宙。内存的分级结构将始终存在——它是物理定律的基础。稳定的存储和 IO,内存,计算和通信也都将一直存在。这些模块的相对容量,延迟和带宽将会改变,但是系统始终要考虑这些元素如何组合在一起,以及它们之间的平衡和折衷。Jim Gary 是这方面的大师。
|
||||
|
||||
空间和光速的根本限制造成的另一个后果是,性能分析主要是关于三件事:局部化 (locality),局部化,局部化。无论是将数据打包在磁盘上,管理处理器缓存的层次结构,还是将数据合并到通信数据包中,数据如何打包在一起,如何在一段时间内从局部获取数据,数据如何在组件之间传输数据是性能的基础。把重点放在减少管理数据的代码上,增加空间和时间上的局部性,是消除噪声的好办法。
|
||||
|
||||
Jon Devaan 曾经说过:“设计数据,而不是设计代码”。这也通常意味着当查看系统结构时,我不太关心代码如何交互——我想看看数据如何交互和流动。如果有人试图通过描述代码结构来解释一个系统,而不理解数据流的速率和数量,他们就不了解这个系统。
|
||||
|
||||
内存的层级结构也意味着缓存将会一直存在——即使某些系统层正在试图隐藏它。缓存是根本的,但也是危险的。缓存试图利用代码的运行时行为,来改变系统中不同组件之间的交互模式。它们需要对运行时行为进行建模,即使这个模型只是隐含在缓存的填充、失效和命中检测的方式之中。如果由于行为的改变,模型变得不再准确,缓存将无法按预期运行。一个简单的指导方针是,缓存必须被监测——由于应用程序行为的改变、事物不断变化的性质和组件之间性能的平衡,缓存的行为将随着时间的推移而退化。每一个老程序员都有缓存变糟的经历。
|
||||
|
||||
我很幸运,我的早期职业生涯是在互联网的发源地之一 BBN 度过的。 我们很自然地将将异步组件之间的通信视为系统连接的自然方式。流量控制和队列理论是通信系统的基础,更是任何异步系统运行的方式。流量控制本质上是资源管理(管理通道的容量),但资源管理是更根本的关注点。流量控制本质上也应该由端到端的应用负责,所以用端到端的方式思考异步系统是自然的。[缓冲区膨胀][8]的故事在这种情况下值得研究,因为它展示了当对端到端行为的动态性以及技术“改进”(路由器中更大的缓冲区)缺乏理解时,在整个网络基础设施中导致的长久的问题。
|
||||
|
||||
我发现“光速”的概念在分析任何系统时都非常有用。光速分析并不是从当前的性能开始分析,而是问“这个设计理论上能达到的最佳性能是多少?”真正传递的信息是什么,以什么样的速度变化?组件之间的底层延迟和带宽是多少?光速分析迫使设计师深入思考他们的方法能否达到性能目标,或者否需要重新考虑设计的基本方法。它也迫使人们更深入地了解性能在哪里损耗,以及损耗是由固有的,还是由于一些不当行为产生的。从构建的角度来看,它迫使系统设计人员了解其构建的模块的真实性能特征,而不是关注其他功能特性。
|
||||
|
||||
我的职业生涯大多花费在构建图形应用程序上。用户坐在系统的一端,定义关键的常量和约束。人类的视觉和神经系统没有经历过指数性的变化。它们固有地受到限制,这意味着系统设计者可以利用(必须利用)这些限制,例如,通过虚拟化(限制底层数据模型需要映射到视图数据结构中的数量),或者通过将屏幕更新的速率限制到人类视觉系统的感知限制。
|
||||
|
||||
复杂性的本质
|
||||
|
||||
我的整个职业生涯都在与复杂性做斗争。为什么系统和应用变得复杂呢?为什么在一个应用领域内进行开发并没有随着时间变得简单,而基础设施却没有变得更复杂,反而变得更强大了?事实上,管理复杂性的一个关键方法就是“走开”然后重新开始。通常新的工具或语言迫使我们从头开始,这意味着开发人员将工具的优点与从新开始的优点结合起来。从新开始是重要的。这并不是说新工具,新平台,或新语言可能不好,但我保证它们不能解决复杂性增长的问题。控制复杂性的最简单的方法就是用更少的程序员,建立一个更小的系统。
|
||||
|
||||
当然,很多情况下“走开”并不是一个选择——Office 建立在有巨大的价值的复杂的资源上。通过 OneNote, Office 从 Word 的复杂性上“走开”,从而在另一个维度上进行创新。Sway 是另一个例子, Office 决定从限制中跳出来,利用关键的环境变化,抓住机会从底层上采取全新的设计方案。我们有 Word,Excel,PowerPoint 这些应用,它们的数据结构非常有价值,我们并不能完全放弃这些数据结构,它们成为了开发中持续的显著的限制条件。
|
||||
|
||||
我受到 Fred Brooks 讨论软件开发中的偶然复杂性和本质复杂性的文章[《没有银弹》][9]的影响,他希望通过两个趋势来尽可能地提升程序员的生产力:一是在选择自己开发还是购买时,更多地关注购买——这预示了开源社区和云架构的改变;二是从单纯的构建方法转型到更“有机”或者“生态”的增量开发方法。现代的读者可以认为这是向敏捷开发和持续开发的转型。但那篇文章可是写于 1986 年!
|
||||
|
||||
我很欣赏 Stuart Kauffman 的在复杂性的基本性上的研究工作。Kauffman 从一个简单的布尔网络模型(“[NK 模型][10]”)开始建立起来,然后探索这个基本的数学结构在相互作用的分子,基因网络,生态系统,经济系统,计算机系统(以有限的方式)等系统中的应用,来理解紧急有序行为的数学基础及其与混沌行为的关系。在一个高度连接的系统中,你固有地有一个相互冲突的约束系统,使得它(在数学上)很难向前发展(这被看作是在崎岖景观上的优化问题)。控制这种复杂性的基本方法是将系统分成独立元素并限制元素之间的相互连接(实质上减少 NK 模型中的“N”和“K”)。当然对那些使用复杂隐藏,信息隐藏和数据抽象,并且使用松散异步耦合来限制组件之间的交互的技术的系统设计者来说,这是很自然的。
|
||||
|
||||
|
||||
我们一直面临的一个挑战是,我们想到的许多拓展系统的方法,都跨越了所有的方面。实时共同编辑是 Office 应用程序最近的一个非常具体的(也是最复杂的)例子。
|
||||
|
||||
我们的数据模型的复杂性往往等同于“能力”。设计用户体验的固有挑战是我们需要将有限的一组手势,映射到底层数据模型状态空间的转换。增加状态空间的维度不可避免地在用户手势中产生模糊性。这是“[纯数学][11]”,这意味着确保系统保持“易于使用”的最基本的方式常常是约束底层的数据模型。
|
||||
|
||||
管理
|
||||
|
||||
我从高中开始着手一些领导角色(学生会主席!),对承担更多的责任感到理所当然。同时,我一直为自己在每个管理阶段都坚持担任全职程序员而感到自豪。但 Office 的开发副总裁最终还是让我从事管理,离开了日常的编程工作。当我在去年离开那份工作时,我很享受重返编程——这是一个出奇地充满创造力的充实的活动(当修完“最后”的 bug 时,也许也会有一点令人沮丧)。
|
||||
|
||||
尽管在我加入微软前已经做了十多年的“主管”,但是到了 1996 年我加入微软才真正了解到管理。微软强调“工程领导是技术领导”。这与我的观点一致,帮助我接受并承担更大的管理责任。
|
||||
|
||||
主管的工作是设计项目并透明地推进项目。透明并不简单,它不是自动的,也不仅仅是有好的意愿就行。透明需要被设计进系统中去。透明工作的最好方式是能够记录每个工程师每天活动的产出,以此来追踪项目进度(完成任务,发现 bug 并修复,完成一个情景)。留意主观上的红/绿/黄,点赞或踩的仪表板。
|
||||
|
||||
我过去说我的工作是设计反馈回路。独立工程师,经理,行政人员,每一个项目的参与者都能通过分析记录的项目数据,推进项目,产出结果,了解自己在整个项目中扮演的角色。最终,透明化最终成为增强能力的一个很好的工具——管理者可以将更多的局部控制权给予那些最接近问题的人,因为他们对所取得的进展有信心。这样的话,合作自然就会出现。
|
||||
|
||||
关键需要确定目标框架(包括关键资源的约束,如发布的时间表)。如果决策需要在管理链上下不断流动,那说明管理层对目标和约束的框架不好。
|
||||
|
||||
当我在 Beyond Software 工作时,我真正理解了一个项目拥有一个唯一领导者的重要性。原来的项目经理离职了(后来从 FrontPage 雇佣了我)。我们四个主管在是否接任这个岗位上都有所犹豫,这不仅仅由于我们都不知道要在这家公司坚持多久。我们都技术高超,并且相处融洽,所以我们决定以同级的身份一起来领导这个项目。然而这糟糕透了。有一个显而易见的问题,我们没有相应的战略用来在原有的组织之间分配资源——这应当是管理者的首要职责之一!当你知道你是唯一的负责人时,你会有很深的责任感,但在这个例子中,这种责任感缺失了。我们没有真正的领导来负责统一目标和界定约束。
|
||||
|
||||
我清晰地记得我第一次充分认识到 *倾听* 对一个领导者的重要性的那一刻。那时我刚刚担任了 Word、OneNote、Publisher 和 Text Services 团队的开发经理。关于我们如何组织文本服务团队,我们有一个很大的争议,我走到每个关键参与者身边,听他们想说的话,然后整合起来,写下了我所听到的一切。当我向其中一位主要参与者展示我写下的东西时,他的反应是“哇,你真的听了我想说的话”!作为一名管理人员,我所经历的所有最大的问题(例如,跨平台和向持续工程的转型)都涉及到仔细倾听所有的参与者。倾听是一个积极的过程,它包括:尝试以别人的角度去理解,然后写出我学到的东西,并对其进行测试,以验证我的理解。当一个关键的艰难决定需要做出的时候,在最终决定前,要确保每个人都知道他们的想法已经被听到并理解(不论他们是否同意最后的决定)。
|
||||
|
||||
在 FrontPage 担任开发经理的工作,让我理解了在只有部分信息的情况下做决定的“操作困境”。你等待的时间越长,你就会有更多的信息做出决定。但是等待的时间越长,实际执行的灵活性就越低。在某个时候,你仅需要做出决定。
|
||||
|
||||
设计一个组织涉及类似的两难情形。您希望增加资源领域,以便可以在更大的一组资源上应用一致的优先级划分框架。但资源领域越大,越难获得作出决定所需要的所有信息。组织设计就是要平衡这两个因素。软件复杂化,因为软件的特点可以在任意维度切入设计。Office 已经使用[共享团队][12]来解决这两个问题(优先次序和资源),让跨领域的团队能与需要产品的团队分享工作(增加资源)。
|
||||
|
||||
随着管理阶梯的提升,你会懂一个小秘密:你和你的新同事不会因为你现在承担更多的责任,就突然变得更聪明。这强调了整个组织比顶层领导者更聪明。赋予每个级别在一致框架下拥有自己的决定是实现这一目标的关键方法。听取并使自己对组织负责,阐明和解释决策背后的原因是另一个关键策略。令人惊讶的是,害怕做出一个愚蠢的决定可能是一个有用的激励因素,以确保你清楚地阐明你的推理,并确保你听取所有的信息。
|
||||
|
||||
结语
|
||||
|
||||
我离开大学寻找第一份工作时,面试官在最后一轮面试时问我对做“系统”和做“应用”哪一个更感兴趣。我当时并没有真正理解这个问题。在软件技术栈的每一个层面都会有趣的难题,我很高兴深入研究这些问题。保持学习。
|
||||
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
via: https://hackernoon.com/education-of-a-programmer-aaecf2d35312
|
||||
|
||||
作者:[ Terry Crowley][a]
|
||||
译者:[explosic4](https://github.com/explosic4)
|
||||
校对:[校对者ID](https://github.com/校对者ID)
|
||||
|
||||
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
|
||||
|
||||
[a]:https://hackernoon.com/@terrycrowley
|
||||
[1]:https://medium.com/@terrycrowley/leaky-by-design-7b423142ece0#.x67udeg0a
|
||||
[2]:https://medium.com/@terrycrowley/how-to-think-about-cancellation-3516fc342ae#.3pfjc5b54
|
||||
[3]:http://queue.acm.org/detail.cfm?id=2462076
|
||||
[4]:http://web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf
|
||||
[5]:https://medium.com/@terrycrowley/model-view-controller-and-loose-coupling-6370f76e9cde#.o4gnupqzq
|
||||
[6]:https://en.wikipedia.org/wiki/Termcap
|
||||
[7]:http://www.ll.mit.edu/HPEC/agendas/proc04/invited/patterson_keynote.pdf
|
||||
[8]:https://en.wikipedia.org/wiki/Bufferbloat
|
||||
[9]:http://worrydream.com/refs/Brooks-NoSilverBullet.pdf
|
||||
[10]:https://en.wikipedia.org/wiki/NK_model
|
||||
[11]:https://medium.com/@terrycrowley/the-math-of-easy-to-use-14645f819201#.untmk9eq7
|
||||
[12]:https://medium.com/@terrycrowley/breaking-conways-law-a0fdf8500413#.gqaqf1c5k
992
translated/tech/20160325 Network automation with Ansible.md
Normal file
@ -0,0 +1,992 @@
|
||||
用 Ansible 实现网络自动化
|
||||
================
|
||||
|
||||
### 网络自动化
|
||||
|
||||
由于 IT 行业的技术变化,从服务器虚拟化到具有自服务能力的公有和私有云、容器化应用、以及提供的平台即服务(Paas),一直以来落后的一个领域是网络。
|
||||
|
||||
在过去的 5 年时间里,网络行业似乎有很多新的趋势出现,它们中的很多被归入到软件定义网络(SDN)的范畴。
|
||||
|
||||
###### 注意
|
||||
|
||||
SDN 是新出现的一种构建、管理、操作、和部署网络的方法。SDN 最初的定义是需要将控制层和数据层(包转发)物理分离,并且,解耦合的控制层必须管理好各自的设备。
|
||||
|
||||
如今,许多技术已经被 _归入到 SDN_ 的名下,包括基于控制器的网络(controller-based networks)、网络设备 API、网络自动化、白盒交换机、策略网络化、网络功能虚拟化(NFV)等等。
|
||||
|
||||
由于这篇报告的目的,我们参考 SDN 的解决方案作为我们的解决方案,其中包括一个网络控制器作为解决方案的一部分,并且提升了网络的可管理性,但并不需要从数据层解耦控制层。
|
||||
|
||||
这些趋势的其中一个是,网络设备上出现的应用程序编辑接口(APIs)作为管理和操作这些设备的一种方法,和真正地提供了机器对机器的通讯。当需要自动化和构建网络应用程序、提供更多数据建模的结构时,APIs 简化了开发过程。例如,当启用 API 的设备在 JSON/XML 中返回数据时,它是结构化的,并且比返回原生文本信息、需要手工去解析的仅命令行的设备更易于使用。
|
||||
|
||||
在 APIs 之前,用于配置和管理网络设备的两个主要机制是命令行接口(CLI)和简单网络管理协议(SNMP)。让我们来了解一下它们,CLI 是一个到设备的人机界面,而 SNMP 并不是为设备提供实时编程的接口。
|
||||
|
||||
幸运的是,因为很多供应商争相为设备增加 APIs,有时候 _正是因为_ 它,才被保留到需求建议书(RFP)中,它有一个非常好的副作用 —— 支持网络自动化。一旦一个真实的 API 被披露,访问设备内数据的过程,以及管理配置,会极大的被简单化,因此,我们将在本报告中对此进行评估。虽然使用许多传统方法也可以实现自动化,比如,CLI/SNMP。
|
||||
|
||||
###### 注意
|
||||
|
||||
随着未来几个月或几年的网络设备更新,供应商的 APIs 无疑应该被测试,并且要做为采购网络设备(虚拟和物理)的关键决策标准。用户应该知道数据是如何通过设备建模的,被 API 使用的传输类型是什么,如果供应商提供一些库或集成到自动化工具中,并且,如果被用于一个开放的标准/协议。
|
||||
|
||||
总而言之,网络自动化,像大多数的自动化类型一样,是为了更快地工作。工作的更快是好事,降低部署和配置改变的时间并不总是许多 IT 组织需要去解决的问题。
|
||||
|
||||
包括速度,我们现在看看这些各种类型的 IT 组织逐渐采用网络自动化的几种原因。你应该注意到,同样的原则也适用于其它类型的自动化。
|
||||
|
||||
|
||||
### 简化架构
|
||||
|
||||
今天,每个网络都是一个独特的“雪花”型,并且,网络工程师们为能够解决传输和网络应用问题而感到自豪,这些问题最终使网络不仅难以维护和管理,而且也很难去实现自动化。
|
||||
|
||||
它需要从一开始就包含到新的架构和设计中去部署,而不是去考虑网络自动化和管理作为一个二级或三级项目。哪个特性可以跨不同的供应商工作?哪个扩展可以跨不同的平台工作?当使用特别的网络设备平台时,API 类型或者自动化工程是什么?当这些问题在设计进程之前得到答案,最终的架构将变成简单的、可重复的、并且易于维护 _和_ 自动化的,在整个网络中将很少启用供应商专用的扩展。
|
||||
|
||||
### 确定的结果
|
||||
|
||||
在一个企业组织中,会召开变更审查会议(change review meeting)去评估即将到来的网络变更、它们对外部系统的影响、以及回滚计划。在这样一个需要人工敲 CLI 去实施 _即将到来的变更_ 的世界里,输入错误的命令造成的影响可能是灾难性的。想像一下,一个有三位、四位、五位、或者 50 位工程师的团队。每位工程师应对 _即将到来的变更_ 都有他们自己独特的方法。并且,在实施这些变更的期间,使用 CLI 或者 GUI 的能力并不会消除或减少出现错误的几率。
|
||||
|
||||
使用经过验证和测试过的网络自动化可以帮助实现更多的可预测行为,并且使执行团队有更好的机会实现确实性结果,在保证任务没有人为错误的情况下首次正确完成的道路上更进一步。
|
||||
|
||||
|
||||
### 业务灵活性
|
||||
|
||||
不用说,网络自动化不仅为部署变化提供速度和灵活性,而且使得根据业务需要去从网络设备中检索数据的速度变得更快。自从服务器虚拟化实现以后,服务器和虚拟化使得管理员有能力在瞬间去部署一个新的应用程序。而且,更多的快速部署应用程序的问题出现在,配置一个 VLAN(虚拟局域网)、路由器、FW ACL(防火墙的访问控制列表)、或者负载均衡策略需要多长时间?
|
||||
|
||||
在一个组织内,一旦了解了最常见的工作流以及 _为什么_ 会有这些网络变更的真实需求,部署像 Ansible 这样的现代自动化工具将使这些工作流变得非常简单。
|
||||
|
||||
这一章将介绍一些关于为什么应该去考虑网络自动化的高级知识点。在下一节,我们将带你去了解 Ansible 是什么,并且继续深入了解各种不同规模的 IT 组织的网络自动化的不同类型。
|
||||
|
||||
|
||||
### 什么是 Ansible?
|
||||
|
||||
Ansible 是存在于开源世界里的一种最新的 IT 自动化和配置管理平台。它经常被拿来与其它工具如 Puppet、Chef 和 SaltStack 去比较。Ansible 作为一个由 Michael DeHaan 创建的开源项目出现于 2012 年,Michael DeHaan 也创建了 Cobbler,并共同创建了 Func,它们在开源社区都非常流行。在 Ansible 开源项目创建之后不足 18 个月的时间里,Ansible 公司成立,并获得了 600 万美元的 A 轮融资,它成为并一直保持着 Ansible 开源项目的第一贡献者和支持者。在 2015 年 10 月,Red Hat 收购了 Ansible 公司。
|
||||
|
||||
但是,Ansible 到底是什么?
|
||||
|
||||
_Ansible 是一个无需代理和可扩展的超级简单的自动化平台。_
|
||||
|
||||
让我们更深入地了解它的细节,并且看一看 Ansible 的属性,它帮助 Ansible 在行业内获得大量的吸引力(traction)。
|
||||
|
||||
|
||||
### 简单
|
||||
|
||||
Ansible 的其中一个吸引人的属性是,去使用它你 _不_ 需要特定的编程技能。所有的指令,或者任务都是自动化的,在一个标准的、任何人都可以理解的人类可读的数据格式的一个文档中。在 30 分钟之内完成安装和自动化任务的情况并不罕见!
|
||||
|
||||
例如,下列的一个 Ansible playbook 任务是用于去确保在一个 Cisco Nexus 交换机中存在一个 VLAN:
|
||||
|
||||
```
|
||||
- nxos_vlan: vlan_id=100 name=web_vlan
|
||||
```
|
||||
|
||||
你无需熟悉或写任何代码就可以明确地看出它将要做什么!
|
||||
|
||||
###### 注意
|
||||
|
||||
这个报告的下半部分涉到 Ansible 术语(playbooks、plays、tasks、modules、等等)的细节。但是,在我们为网络自动化使用 Ansible 时,我们也同时有一些详细的示例去解释这些关键概念。
|
||||
|
||||
### 无代理
|
||||
|
||||
如果你看到市面上的其它工具,比如 Puppet 和 Chef,你学习它们会发现,一般情况下,它们要求每个实现自动化的设备必须安装特定的软件。这种情况在 Ansible 上 _并不_需要,这就是为什么 Ansible 是实现网络自动化的最佳选择的主要原因。
|
||||
|
||||
它很好理解,那些 IT 自动化工具,包括 Puppet、Chef、CFEngine、SaltStack 和 Ansible,它们最初构建是为了管理和自动化配置 Linux 主机,以跟上部署的应用程序增长的步伐。因为 Linux 系统是被配置成自动化的对象,安装代理并不是一个技术难题。真要说有什么影响的话,就是它会耽误安装过程,因为,现在有 _N_ 多个(你希望去实现自动化的)主机需要在它们上面部署软件。
|
||||
|
||||
再加上,当使用代理时,它们所需要的 DNS 和 NTP 配置会更加复杂。这些都是大多数环境中已经配置好的服务,但是,当你希望快速地上手某个东西,或者只是简单地想去测试一下它能做什么的时候,这将极大地耽误整个设置和安装的过程。
|
||||
|
||||
由于本报告只是为介绍利用 Ansible 实现网络自动化,它最有价值的是,Ansible 作为一个无代理平台,对于系统管理员来说,它更具有吸引力,这是为什么呢?
|
||||
|
||||
正如前面所说的那样,对网络管理员来说,它是非常有吸引力的,Linux 操作系统是开源的,并且,任何东西都可以安装在它上面。对于网络来说,虽然它正在逐渐改变,但事实并非如此。如果我们更广泛地部署网络操作系统,如 Cisco IOS,它就是这样的一个例子,并且问一个问题, _“第三方软件能否部署在基于 IOS (译者注:此处的 IOS,指的是思科的网络操作系统 IOS)的平台上吗?”_它并不会给你惊喜,它的回答是 _NO_。
|
||||
|
||||
在过去的二十多年里,几乎所有的网络操作系统都是闭源的,并且,垂直整合到底层的网络硬件中。在一个网络设备中(路由器、交换机、负载均衡、防火墙、等等),不需要供应商的支持,有一个像 Ansible 这样的自动化平台,从头开始去构建一个无代理、可扩展的自动化平台,就像是它专门为网络行业订制的一样。我们最终将开始减少并消除与网络的人工交互。
|
||||
|
||||
### 可扩展
|
||||
|
||||
Ansible 的可扩展性也非常的好。作为一个开源的,并且从代码开始将在网络行业中发挥重要的作用,有一个可扩展的平台是必需的。这意味着如果供应商或社区不提供一个特定的特性或功能,开源社区,终端用户,消费者,顾问者,或者,任何可能去 _扩展_ Ansible 的人,去启用一个给定的功能集。过去,网络供应商或者工具供应商通过一个 hook 去提供插件和集成。想像一下,使用一个像 Ansible 这样的自动化平台,并且,你选择的网络供应商发布了你 _真正_ 需要的自动化的一个新特性。从理论上说,网络供应商或者 Ansible 可以发行一个新的插件去实现自动化这个独特的特性,这是一件非常好的事情,从你的内部工程师到你的增值分销商(VARs)或者你的顾问中的任何人,都可以去提供这种集成。
|
||||
|
||||
正如前面所说的那样,Ansible 实际上是极具扩展性的,Ansible 最初就是为自动化应用程序和系统构建的。这是因为,Ansible 的可扩展性是被网络供应商编写集成的,包括但不限于 Cisco、Arista、Juniper、F5、HP、A10、Cumulus、和 Palo Alto Networks。
|
||||
|
||||
|
||||
### 对于网络自动化,为什么要使用 Ansible?
|
||||
|
||||
我们已经简单了解了 Ansible 是什么,以及一些网络自动化的好处,但是,对于网络自动化,我们为什么要使用 Ansible?
|
||||
|
||||
在一个完全透明的环境下,已经说的很多的理由是 Ansible 可以做什么,比如,作为一个很大的自动化应用程序部署平台,但是,我们现在要深入一些,更多地关注于网络,并且继续总结一些更需要注意的其它关键点。
|
||||
|
||||
|
||||
### 无代理
|
||||
|
||||
在实现网络自动化的时候,无代理架构的重要性并不是重点强调的,特别是当它适用于现有的自动化设备时。如果,我们看一下当前网络中已经安装的各种设备时,从 DMZ 和园区,到分支和数据中心,最大份额的设备 _并不_ 具有最新 API 的设备。从自动化的角度来看,API 可以使做一些事情变得很简单,像 Ansible 这样的无代理平台有可能去自动化和管理那些 _传统_ 的设备。例如,_基于CLI 的设备_,它的工具可以被用于任何网络环境中。
|
||||
|
||||
###### 注意
|
||||
|
||||
如果仅 CLI 的设备已经集成进 Ansible,它的机制就像是,怎么在设备上通过协议如 telnet、SSH、和 SNMP,去进行只读访问和读写操作。
|
||||
|
||||
作为一个独立的网络设备,像路由器、交换机、和防火墙持续去增加 APIs 的支持,SDN 解决方案也正在出现。SDN 解决方案的其中一个主题是,它们都提供一个单点集成和策略管理,通常是以一个 SDN 控制器的形式出现。这是真实的解决方案,比如,Cisco ACI、VMware NSX、Big Switch Big Cloud Fabric、和 Juniper Contrail,同时,其它的 SDN 提供者,比如 Nuage、Plexxi、Plumgrid、Midokura、和 Viptela。甚至包含开源的控制器,比如 OpenDaylight。
|
||||
|
||||
所有的这些解决方案都简化了网络管理,就像他们允许一个管理员去开始从“box-by-box”管理(译者注:指的是单个设备挨个去操作的意思)迁移到网络范围的管理。这是在正确方向上迈出的很大的一步,这些解决方案并不能消除在改变窗口中人类犯错的机率。例如,比起配置 _N_ 个交换机,你可能需要去配置一个单个的 GUI,它需要很长的时间才能实现所需要的配置改变 — 它甚至可能更复杂,毕竟,相对于一个 CLI,他们更喜欢 GUI!另外,你可能有不同类型的 SDN 解决方案部署在每个应用程序、网络、区域、或者数据中心。
|
||||
|
||||
在需要自动化的网络中,对于配置管理、监视、和数据收集,当行业开始向基于控制器的网络架构中迁移时,这些需求并不会消失。
|
||||
|
||||
大量的软件定义网络中都部署有控制器,所有最新的控制器都提供(expose)一个最新的 REST API。并且,因为 Ansible 是一个无代理架构,它实现自动化是非常简单的,而不仅仅是没有 API 的传统设备,但也有通过 REST APIs 的软件定义网络解决方案,在所有的终端上不需要有额外的软件(译者注:指的是代理)。最终的结果是,使用 Ansible,无论有或没有 API,可以使任何类型的设备都能够自动化。
|
||||
|
||||
|
||||
### 免费和开源软件(FOSS)
|
||||
|
||||
Ansible 是一个开源软件,它的全部代码在 GitHub 上都是公开的、可访问的,使用 Ansible 是完全免费的。它可以在几分钟内完成安装并为网络工程师提供有用的价值。Ansible,这个开源项目,或者 Ansible 公司,在它们交付软件之前,你不会遇到任何一个销售代表。那是显而易见的事实,因为它是一个真正的开源项目,但是,开源项目的使用,在网络行业中社区驱动的软件是非常少的,但是,也在逐渐增加,我们想明确指出这一点。
|
||||
|
||||
同样需要指出的一点是,Ansible, Inc. 也是一个公司,它也需要去赚钱,对吗?虽然 Ansible 是开源的,它也有一个叫 Ansible Tower 的企业产品,它增加了一些特性,比如,基于规则的访问控制(RBAC)、报告、 web UI、REST APIs、多租户、等等,(相比 Ansible)它更适合于企业去部署。并且,更重要的是,Ansible Tower 甚至可以最多在 10 台设备上 _免费_ 使用,至少,你可以去体验一下,它是否会为你的组织带来好处,而无需花费一分钱,并且,也不需要与无数的销售代表去打交道。
|
||||
|
||||
|
||||
### 可扩展性
|
||||
|
||||
我们在前面说过,Ansible 主要是为部署 Linux 应用程序而构建的自动化平台,虽然从早期开始已经扩展到 Windows。需要指出的是,Ansible 开源项目并没有自动化网络基础设施的目标。事实上是,Ansible 社区更多地理解了在底层的 Ansible 架构上怎么更具灵活性和可扩展性,对于他们的自动化需要,它变成了 _扩展_ 的 Ansible,它包含了网络。在过去的两年中,部署有许多的 Ansible 集成,许多行业独立人士(industry independents),比如,Matt Oswalt、Jason Edelman、Kirk Byers、Elisa Jasinska、David Barroso、Michael Ben-Ami、Patrick Ogenstad、和 Gabriele Gerbino,以及网络系统供应商的领导者,比如,Arista、Juniper、Cumulus、Cisco、F5、和 Palo Alto Networks。
|
||||
|
||||
|
||||
### 集成到已存在的 DevOps 工作流中
|
||||
|
||||
Ansible 在 IT 组织中被用于应用程序部署。它被用于需要管理部署、监视、和管理各种类型的应用程序的操作团队中。通过将 Ansible 集成到网络基础设施中,当新应用程序到来或迁移后,它扩展了可能的范围。而不是去等待一个新的顶架交换机(TOR,译者注:一种数据中心设备接入的方式)的到来、去添加一个 VLAN、或者去检查接口的速度/双工,所有的这些以网络为中心的任务都可以被自动化,并且可以集成到 IT 组织内已经存在的工作流中。
|
||||
|
||||
|
||||
### 幂等性
|
||||
|
||||
术语 _幂等性(idempotency)_ 经常用于软件开发的领域中,尤其是在使用 REST API 工作的时候,以及在 _DevOps_ 自动化和配置管理框架的领域中,包括 Ansible。Ansible 的其中一个信念是,所有的 Ansible 模块(集成)都应该是幂等的。那么,对于一个模块来说,幂等是什么意思呢?毕竟,对大多数网络工程师来说,这是一个新的术语。
|
||||
|
||||
答案很简单。幂等性的本质是允许定义的任务,运行一次或者上千次都不会在目标系统上产生不利影响,仅仅是一种一次性的改变。换句话说,如果一个请求的改变去使系统进入到它期望的状态,这种改变完成之后,并且,如果这个设备已经达到这种状态,它不会再发生改变。这不像大多数传统的定制脚本,和拷贝(copy),以及过去的那些终端窗口中的 CLI 命令。当相同的命令或者脚本在同一个系统上重复运行,会出现错误(有时候)。以前,粘贴一组命令到一个路由器中,然后得到一些使你的其余的配置失效的错误类型?好玩吧?
|
||||
|
||||
另外一个例子是,如果你有一个配置 10 个 VLAN 的文件或者脚本,那么 _每次_ 运行这个脚本,相同的命令都会被输入 10 次。如果使用一个幂等的 Ansible 模块,它首先会从网络设备中采集已存在的配置,并且,每个新的 VLAN 被配置前会再次检查当前配置。仅当这个新的 VLAN 需要被添加(或者,举个例子,需要改变 VLAN 的名字)时,才会产生一个变更,命令才会真实地推送到设备。
|
||||
|
||||
当一个技术越来越复杂,幂等性的价值就越高,在你修改的时候,你并不能注意到 _已存在_ 的网络设备的状态,仅仅是从一个网络配置和策略角度去尝试达到 _期望的_ 状态。
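上述幂等性的判断逻辑可以用下面的 Python 草图来示意(`vlan_commands` 是演示用的假设函数,并非真实的 `nxos_vlan` 模块实现):只有当期望状态与设备当前状态不一致时,才会生成要下发的命令;重复运行不会产生任何新的变更。

```python
# 演示用草图:一个幂等的 VLAN 模块可能如何决定要推送哪些命令。
# 设备状态与命令格式均为假设示例。

def vlan_commands(desired, current):
    """仅返回从当前状态到期望状态所需的命令。"""
    commands = []
    for vlan_id, name in desired.items():
        if current.get(vlan_id) != name:      # VLAN 不存在,或名字不同
            commands.append(f"vlan {vlan_id}")
            commands.append(f"  name {name}")
    return commands

current = {10: "web", 20: "app"}              # 从设备上采集到的状态
desired = {10: "web", 20: "app", 30: "db"}    # 任务中声明的期望状态

first_run = vlan_commands(desired, current)   # 只推送 vlan 30
second_run = vlan_commands(desired, desired)  # 已达到期望状态,无须改变
print(first_run)   # ['vlan 30', '  name db']
print(second_run)  # []
```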
|
||||
|
||||
|
||||
### 网络范围的和临时(Ad Hoc)的改变
|
||||
|
||||
用配置管理工具解决的其中一个问题是,配置“飘移”(当设备的期望配置逐渐漂移,或者改变,随着时间的推移手动改变和/或在一个环境中使用了多个不同的工具),事实上,这也是像 Puppet 和 Chef 得到使用的地方。代理商 _phone home_ 到前端服务器,验证它的配置,并且,如果需要一个改变,则改变它。这个方法是非常简单的。如果有故障了,需要去排除怎么办?你通常需要通过管理系统,直接连到设备,找到并修复它,然后,马上离开,对不对?果然,在下次当代理的电话打到家里,修复问题的改变被覆盖了(基于主/前端服务器是怎么配置的)。在高度自动化的环境中,一次性的改变应该被限制,但是,仍然允许它们(译者注:指的是一次性改变)使用的工具是非常有价值的。正如你想到的,其中一个这样的工具是 Ansible。
|
||||
|
||||
因为 Ansible 是无代理的,这里并没有一个默认的推送或者拉取去防止配置漂移。自动化任务被定义在 Ansible playbook 中,当使用 Ansible 时,它推送到用户去运行 playbook。如果 playbook 在一个给定的时间间隔内运行,并且你没有用 Ansible Tower,你肯定知道任务的执行频率;如果你正好在终端提示符下使用一个原生的 Ansible 命令行,playbook 运行一次,并且仅运行一次。
|
||||
|
||||
缺省运行的 playbook 对网络工程师是很具有吸引力的,让人欣慰的是,在设备上手动进行的改变不会自动被覆盖。另外,当需要的时候,一个 playbook 运行的设备范围很容易被改变,即使是对一个单个设备进行自动化的单次改变,Ansible 仍然可以用,设备的 _范围_ 由一个被称为 Ansible 清单(inventory)的文件决定;这个清单可以是一台设备或者是一千台设备。
|
||||
|
||||
下面展示的一个清单文件示例,它定义了两组共六台设备:
|
||||
|
||||
```
|
||||
[core-switches]
|
||||
dc-core-1
|
||||
dc-core-2
|
||||
|
||||
[leaf-switches]
|
||||
leaf1
|
||||
leaf2
|
||||
leaf3
|
||||
leaf4
|
||||
```
|
||||
|
||||
为了自动化所有的主机,你的 play 定义的 playbook 的一个片段看起来应该是这样的:
|
||||
|
||||
```
|
||||
hosts: all
|
||||
```
|
||||
|
||||
并且,一个自动化的叶子节点交换机,它看起来应该像这样:
|
||||
|
||||
```
|
||||
hosts: leaf1
|
||||
```
|
||||
|
||||
这是一个核心交换机:
|
||||
|
||||
```
|
||||
hosts: core-switches
|
||||
```
|
||||
|
||||
###### 注意
|
||||
|
||||
正如前面所说的那样,这个报告的后面部分将详细介绍 playbooks、plays、和清单(inventories)。
|
||||
|
||||
因为能够很容易地对一台设备或者 _N_ 台设备进行自动化,所以在需要对这些设备进行一次性改变时,Ansible 成为了最佳的选择。在网络范围内的改变它也做的很好:可以是关闭给定类型的所有接口、配置接口描述、或者是在一个跨企业园区布线的网络中添加 VLANs。
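作为示意,下面这段 Python 草图演示了如何用标准库解析上文那种 INI 风格的清单文件,并把 play 中的 `hosts:` 模式解析成具体的主机列表(`resolve` 函数是演示用的假设代码,并非 Ansible 的内部实现):

```python
# 演示用草图:解析 INI 风格的 Ansible 清单,并解析 hosts: 模式。
import configparser

INVENTORY = """
[core-switches]
dc-core-1
dc-core-2

[leaf-switches]
leaf1
leaf2
leaf3
leaf4
"""

parser = configparser.ConfigParser(allow_no_value=True, delimiters=("=",))
parser.read_string(INVENTORY)
groups = {g: list(parser[g]) for g in parser.sections()}

def resolve(pattern):
    """把 'all'、组名或单个主机名映射为主机列表。"""
    if pattern == "all":
        return [h for hosts in groups.values() for h in hosts]
    if pattern in groups:
        return groups[pattern]
    return [pattern]  # 单个主机,如 leaf1

print(resolve("core-switches"))  # ['dc-core-1', 'dc-core-2']
print(len(resolve("all")))       # 6
print(resolve("leaf1"))          # ['leaf1']
```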
|
||||
|
||||
### 使用 Ansible 实现网络任务自动化
|
||||
|
||||
这个报告从两个方面逐渐深入地讲解一些技术。第一个方面是围绕 Ansible 架构和它的细节,第二个方面是,从一个网络的角度,讲解使用 Ansible 可以完成什么类型的自动化。在这一章中我们将带你去详细了解第二方面的内容。
|
||||
|
||||
自动化一般被认为是速度快,但是,考虑到一些任务并不要求速度,这就是为什么一些 IT 团队没有认识到自动化的价值所在。VLAN 配置是一个非常好的例子,因为,你可能会想,“创建一个 VLAN 到底有多快?一般情况下每天添加多少个 VLANs?我真的需要自动化吗?”
|
||||
|
||||
在这一节中,我们专注于另外几种有意义的自动化任务,比如,设备准备、数据收集、报告、和遵从情况。但是,需要注意的是,正如我们前面所说的,自动化为你、你的团队、以及你的精确的更可预测的结果和更多的确定性,提供了更快的速度和敏捷性。
|
||||
|
||||
### 设备准备
|
||||
|
||||
为网络自动化开始使用 Ansible 的最容易也是最快的方法是,为设备最初投入使用创建设备配置文件,并且将配置文件推送到网络设备中。
|
||||
|
||||
如果我们去完成这个过程,它将分解为两步,第一步是创建一个配置文件,第二步是推送这个配置到设备中。
|
||||
|
||||
首先,我们需要把 _输入_ 从供应商专用的底层配置文件语法(CLI)中解耦出来。这意味着我们需要把配置参数的值——比如 VLAN、域信息、接口、路由等等——分离出来,单独放进一个文件,然后,当然还需要一个配置模板文件。在这个示例中,这里有一个标准模板,它可以用于所有设备的初始部署。Ansible 将帮助完成配置模板所需的输入和值之间的桥接。几秒钟之内,Ansible 就可以生成数百个可靠的、可预测的配置文件。
|
||||
|
||||
让我们快速的看一个示例,它使用当前的配置,并且分解它到一个模板和单独的一个(作为一个输入源的)变量文件中。
|
||||
|
||||
这是一个配置文件片断的示例:
|
||||
|
||||
```
|
||||
hostname leaf1
|
||||
ip domain-name ntc.com
|
||||
!
|
||||
vlan 10
|
||||
name web
|
||||
!
|
||||
vlan 20
|
||||
name app
|
||||
!
|
||||
vlan 30
|
||||
name db
|
||||
!
|
||||
vlan 40
|
||||
name test
|
||||
!
|
||||
vlan 50
|
||||
name misc
|
||||
```
|
||||
|
||||
如果我们提取输入值,这个文件将被转换成一个模板。
|
||||
|
||||
###### 注意
|
||||
|
||||
Ansible 使用基于 Python 的 Jinja2 模板化语言,因此,这个被命名为 _leaf.j2_ 的文件是一个 Jinja2 模板。
|
||||
|
||||
注意,下列的示例中,_双大括号({{)_ 代表一个变量。
|
||||
|
||||
模板看起来像这些,并且给它命名为 _leaf.j2_:
|
||||
|
||||
```
|
||||
!
|
||||
hostname {{ inventory_hostname }}
|
||||
ip domain-name {{ domain_name }}
|
||||
!
|
||||
!
|
||||
{% for vlan in vlans %}
|
||||
vlan {{ vlan.id }}
|
||||
name {{ vlan.name }}
|
||||
{% endfor %}
|
||||
!
|
||||
```
|
||||
|
||||
因为双大括号代表变量,并且,我们看到这些值并不在模板中,所以它们需要将值保存在一个地方。值被保存在一个变量文件中。正如前面所说的,一个相应的变量文件看起来应该是这样的:
|
||||
|
||||
```
|
||||
---
|
||||
hostname: leaf1
|
||||
domain_name: ntc.com
|
||||
vlans:
|
||||
- { id: 10, name: web }
|
||||
- { id: 20, name: app }
|
||||
- { id: 30, name: db }
|
||||
- { id: 40, name: test }
|
||||
- { id: 50, name: misc }
|
||||
```
|
||||
|
||||
这意味着,如果管理 VLANs 的团队希望在网络设备中添加一个 VLAN,很简单,他们只需要在变量文件中改变它,然后,使用 Ansible 中一个叫 `template` 的模块,去重新生成一个新的配置文件。这整个过程也是幂等的;仅仅是在模板或者值发生改变时,它才会去生成一个新的配置文件。
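为了直观起见,下面用一段纯 Python 草图模拟 `template` 模块对 _leaf.j2_ 所做的事情:把变量文件中的值代入模板,生成最终配置。真实实现基于 Jinja2,此处的 `render_leaf_config` 只是演示用的假设代码:

```python
# 演示用草图:模拟把变量代入 leaf.j2 模板、生成设备配置的过程。

host_vars = {
    "inventory_hostname": "leaf1",
    "domain_name": "ntc.com",
    "vlans": [
        {"id": 10, "name": "web"},
        {"id": 20, "name": "app"},
    ],
}

def render_leaf_config(v):
    lines = ["!",
             f"hostname {v['inventory_hostname']}",
             f"ip domain-name {v['domain_name']}",
             "!"]
    for vlan in v["vlans"]:            # 对应模板中的 {% for vlan in vlans %}
        lines.append(f"vlan {vlan['id']}")
        lines.append(f"  name {vlan['name']}")
    lines.append("!")
    return "\n".join(lines)

config = render_leaf_config(host_vars)
print(config)
```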
|
||||
|
||||
一旦配置文件生成,它需要去 _推送_ 到网络设备。推送配置文件到网络设备使用一个叫做 `napalm_install_config`的开源的 Ansible 模块。
|
||||
|
||||
接下来的示例是一个简单的 playbook 去 _构建并推送_ 一个配置文件到网络设备。同样地,playbook 使用一个名叫 `template` 的模块去构建配置文件,然后使用一个名叫 `napalm_install_config` 的模块去推送它们,并且激活它作为设备上运行的新的配置文件。
|
||||
|
||||
虽然没有详细解释示例中的每一行,但是,你仍然可以看明白它们实际上做了什么。
|
||||
|
||||
###### 注意
|
||||
|
||||
下面的 playbook 介绍了新的概念,比如,内置变量 `inventory_hostname`。这些概念包含在 [Ansible 术语和入门][1] 中。
|
||||
|
||||
```
|
||||
---
|
||||
|
||||
- name: BUILD AND PUSH NETWORK CONFIGURATION FILES
|
||||
hosts: leaves
|
||||
connection: local
|
||||
gather_facts: no
|
||||
|
||||
tasks:
|
||||
- name: BUILD CONFIGS
|
||||
template:
|
||||
src=templates/leaf.j2
|
||||
dest=configs/{{inventory_hostname }}.conf
|
||||
|
||||
- name: PUSH CONFIGS
|
||||
napalm_install_config:
|
||||
hostname={{ inventory_hostname }}
|
||||
username={{ un }}
|
||||
password={{ pwd }}
|
||||
dev_os={{ os }}
|
||||
config_file=configs/{{ inventory_hostname }}.conf
|
||||
commit_changes=1
|
||||
replace_config=0
|
||||
```
|
||||
|
||||
这个两步的过程是一个使用 Ansible 进行网络自动化入门的简单方法。通过模板简化了你的配置,构建配置文件,然后,推送它们到网络设备 — 因此,被称为 _BUILD 和 PUSH_ 方法。
|
||||
|
||||
###### 注意
|
||||
|
||||
像这样的更详细的例子,请查看 [Ansible 网络集成][2]。
|
||||
|
||||
### 数据收集和监视
|
||||
|
||||
监视工具一般使用 SNMP——这些工具拉取某些管理信息库(MIB),然后给监视工具返回数据。基于返回的数据,它可能多于也可能少于你真正所需要的数据。如果你正在拉取的是接口统计数据,你得到的可能是 _show interface_ 命令中显示的全部计数器。但如果你仅需要 _interface resets_,并且希望看到与重置相关的 CDP/LLDP 邻居接口,那该怎么做呢?当然,这用当前的技术也可以实现;可以运行多个 show 命令然后手动解析输出信息,或者,使用基于 SNMP 的工具,在 GUI 中切换不同的选项卡(Tab)找到真正你所需要的数据。Ansible 怎么能帮助我们去完成这些工作呢?
|
||||
|
||||
由于 Ansible 是完全开放并且是可扩展的,它可以精确地去收集和监视所需要的计数器或者值。这可能需要一些预先的定制工作,但是,最终这些工作是非常有价值的。因为采集的数据是你所需要的,而不是供应商提供给你的。Ansible 也提供直观的方法去执行某些条件任务,这意味着基于正在返回的数据,你可以执行子任务,它可以收集更多的数据或者产生一个配置改变。
|
||||
|
||||
网络设备有 _许多_ 统计和隐藏在里面的临时数据,而 Ansible 可以帮你提取它们。
|
||||
|
||||
你甚至可以在 Ansible 中使用前面提到的 SNMP 的模块,模块的名字叫 `snmp_device_version`。这是在社区中存在的另一个开源模块:
|
||||
|
||||
```
|
||||
- name: GET SNMP DATA
|
||||
snmp_device_version:
|
||||
host=spine
|
||||
community=public
|
||||
version=2c
|
||||
```
|
||||
|
||||
运行前面的任务返回非常多的关于设备的信息,并且添加一些级别的发现能力到 Ansible中。例如,那个任务返回下列的数据:
|
||||
|
||||
```
|
||||
{"ansible_facts": {"ansible_device_os": "nxos", "ansible_device_vendor": "cisco", "ansible_device_version": "7.0(3)I2(1)"}, "changed": false}
|
||||
```
|
||||
|
||||
你现在可以决定某些事情,而不需要事先知道是什么类型的设备。你所需要知道的仅仅是设备的只读通讯字符串。
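下面的 Python 草图示意了这种基于返回数据的决策方式:解析上面任务返回的 JSON,再根据发现的设备操作系统选择后续的处理方式(`module_by_os` 映射为假设示例):

```python
# 演示:解析 snmp_device_version 返回的 JSON,按发现的设备类型决定后续动作。
import json

returned = ('{"ansible_facts": {"ansible_device_os": "nxos", '
            '"ansible_device_vendor": "cisco", '
            '"ansible_device_version": "7.0(3)I2(1)"}, "changed": false}')

facts = json.loads(returned)["ansible_facts"]

# 在事先不知道设备类型的情况下,按返回的数据选择后续使用的模块(假设的映射)
module_by_os = {"nxos": "nxos_vlan", "eos": "eos_vlan"}
next_module = module_by_os.get(facts["ansible_device_os"], "raw_cli")
print(next_module)  # nxos_vlan
```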
|
||||
|
||||
|
||||
### 迁移
|
||||
|
||||
从一个平台迁移到另外一个平台,可能是从同一个供应商或者是从不同的供应商,迁移从来都不是件容易的事。供应商可能提供一个脚本或者一个工具去帮助你迁移。Ansible 可以被用于去为所有类型的网络设备构建配置模板,然后,操作系统用这个方法去为所有的供应商生成一个配置文件,然后作为一个(通用数据模型的)输入设置。当然,如果有供应商专用的扩展,它也是会被用到的。这种灵活性不仅对迁移有帮助,而且也可以用于灾难恢复(DR),它在生产系统中不同的交换机型号之间和灾备数据中心中是经常使用的,即使是在不同的供应商的设备上。
|
||||
|
||||
|
||||
### 配置管理
|
||||
|
||||
正如前面所说的,配置管理是最常用的自动化类型。Ansible 可以很容易地做到创建 _角色(roles)_ 去简化基于任务的自动化。从更高的层面来看,角色是指针对一个特定设备组的可重用的自动化任务的逻辑分组。关于角色的另一种说法是,认为角色就是相关的工作流(workflows)。首先,在开始自动化添加值之前,需要理解工作流和过程。不论是开始一个小的自动化任务还是扩展它,理解工作流和过程都是非常重要的。
|
||||
|
||||
例如,一组自动化配置路由器和交换机的任务是非常常见的,并且它们也是一个很好的起点。但是,配置在哪台网络设备上?配置的 IP 地址是什么?或许需要一个 IP 地址管理方案?一旦用一个给定的功能分配了 IP 地址并且已经部署,DNS 也更新了吗?DHCP 的范围需要创建吗?
|
||||
|
||||
你可以看到工作流是怎么从一个小的任务开始,然后逐渐扩展到跨不同的 IT 系统?因为工作流持续扩展,所以,角色也一样(持续扩展)。
|
||||
|
||||
|
||||
### 遵从性
|
||||
|
||||
和其它形式的自动化工具一样,用任何形式的自动化工具产生配置改变都视为风险。手工去产生改变可能看上去风险更大,正如你看到的和亲身经历过的那样,Ansible 有能力去做自动数据收集、监视、和配置构建,这些都是“只读的”和“低风险”的动作。其中一个 _低风险_ 使用案例是,使用收集的数据进行配置遵从性检查和配置验证。部署的配置是否满足安全要求?是否配置了所需的网络?协议 XYZ 禁用了吗?因为每个模块、或者用 Ansible 返回数据的整合,它只是非常简单地 _声明_ 那些事是 _TRUE_ 还是 _FALSE_。然后接着基于 _它_ 是 _TRUE_ 或者是 _FALSE_, 接着由你决定应该发生什么 —— 或许它只是被记录下来,或者,也可能执行一个复杂操作。
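这种“只读、低风险”的遵从性检查可以用下面的 Python 草图来示意:对采集到的运行配置做一组断言,每项检查只回答 TRUE 或 FALSE(检查项与配置内容均为假设示例):

```python
# 演示用草图:对采集到的运行配置做简单的遵从性检查。

running_config = """
hostname leaf1
no ip http server
ip domain-name ntc.com
"""

checks = {
    "http_server_disabled": "no ip http server" in running_config,
    "domain_configured": "ip domain-name" in running_config,
    "telnet_disabled": "transport input ssh" in running_config,
}

# 基于每项检查的 TRUE/FALSE,由你决定后续动作——记录下来,或触发修复
failed = [name for name, ok in checks.items() if not ok]
print(failed)  # ['telnet_disabled']
```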
|
||||
|
||||
### 报告
|
||||
|
||||
我们现在知道,Ansible 也可以用于去收集数据和执行遵从性检查。Ansible 可以根据你想要做的事情去从设备中返回和收集数据。或许返回的数据成为其它的任务的输入,或者你想去用它创建一个报告。从模板中生成报告,并将真实的数据插入到模板中,创建和使用报告模板的过程与创建配置模板的过程是相同的。
|
||||
|
||||
从一个报告的角度看,这些模板或许是纯文本文件,就像是在 GitHub 上看到的 markdown 文件、放置在 Web 服务器上的 HTML 文件,等等。用户有权去创建一个她希望的报告类型,插入她所需要的真实数据到报告中。
|
||||
|
||||
创建报告的用处很多,不仅是为行政管理,也为了运营工程师,因为它们通常有双方都需要的不同指标。
|
||||
|
||||
|
||||
### Ansible 怎么工作
|
||||
|
||||
从一个网络自动化的角度理解了 Ansible 能做什么之后,我们现在看一下 Ansible 是怎么工作的。你将学习到从一个 Ansible 管理主机到一个被自动化的节点的全部通讯流。首先,我们回顾一下,Ansible 是怎么 _开箱即用的(out of the box)_,然后,我们看一下 Ansible 怎么去做到的,具体说就是,当网络设备自动化时,Ansible _模块_是怎么去工作的。
|
||||
|
||||
### 开箱即用
|
||||
|
||||
到目前为止,你已经明白了,Ansible 是一个自动化平台。实际上,它是一个安装在一台单个服务器上或者企业中任何一位管理员的笔记本中的轻量级的自动化平台。当然,(安装在哪里?)这是由你来决定的。在基于 Linux 的机器上,使用一些实用程序(比如 pip、apt、和 yum)安装 Ansible 是非常容易的。
|
||||
|
||||
###### 注意
|
||||
|
||||
在本报告的其余部分,安装 Ansible 的机器被称为 _控制主机_。
|
||||
|
||||
控制主机将执行在 Ansible 的 playbook (不用担心,稍后我们将讲到 playbook 和其它的 Ansible 术语)中定义的所有自动化任务。现在,我们只需要知道,一个 playbook 是简单的一组自动化任务和在给定数量的主机上执行的指令。
|
||||
|
||||
当一个 playbook 创建之后,你还需要去定义它要自动化的主机。映射一个 playbook 和要自动化运行的主机,是通过一个被称为 Ansible 清单的文件。这是一个前面展示的示例,但是,这里是同一个清单文件的另外两个组:`cisco` 和 `arista`:
|
||||
|
||||
```
|
||||
[cisco]
|
||||
nyc1.acme.com
|
||||
nyc2.acme.com
|
||||
|
||||
[arista]
|
||||
sfo1.acme.com
|
||||
sfo2.acme.com
|
||||
```
|
||||
|
||||
###### 注意
|
||||
|
||||
你也可以在清单文件中使用 IP 地址,而不是主机名。对于这样的示例,主机名将是通过 DNS 可解析的。
|
||||
|
||||
正如你所看到的,Ansible 清单文件是一个文本文件,它列出了主机和主机组。然后,你可以在 playbook 中引用一个具体的主机或者组,以此去决定对给定的 play 和 playbook 在哪台主机上进行自动化。下面展示了两个示例。
|
||||
|
||||
展示的第一个示例它看上去像是,你想去自动化 `cisco` 组中所有的主机,而展示的第二个示例只对 _nyc1.acme.com_ 主机进行自动化:
|
||||
|
||||
```
|
||||
---
|
||||
|
||||
- name: TEST PLAYBOOK
|
||||
hosts: cisco
|
||||
|
||||
tasks:
|
||||
- TASKS YOU WANT TO AUTOMATE
|
||||
```
|
||||
|
||||
```
|
||||
---
|
||||
|
||||
- name: TEST PLAYBOOK
|
||||
hosts: nyc1.acme.com
|
||||
|
||||
tasks:
|
||||
- TASKS YOU WANT TO AUTOMATE
|
||||
```
|
||||
|
||||
现在,我们已经理解了基本的清单文件,我们可以看一下(在控制主机上的)Ansible 是怎么与 _开箱即用_ 的设备通讯的,和在 Linux 终端上自动化的任务。这里需要明白一个重要的观点就是,需要去自动化的网络设备通常是不一样的。(译者注:指的是设备的类型、品牌、型号等等)
|
||||
|
||||
Ansible 对基于 Linux 的系统实现开箱即用的自动化有两个要求:SSH 和 Python。
|
||||
|
||||
首先,终端必须支持 SSH 传输,因为 Ansible 使用 SSH 去连接到每个目标节点。因为 Ansible 支持一个可拔插的连接架构,也有各种类型的插件去实现不同类型的 SSH。
|
||||
|
||||
第二个要求是目标节点上要有 Python。Ansible 并不要求在目标节点上预先存在一个软件 _代理_,它仅需要一个内置的 Python 执行引擎。这个执行引擎用于去执行从 Ansible 管理主机发送到被自动化的目标节点的 Python 代码。
If we break down this out-of-the-box workflow in detail, it comes down to the following steps:

1. When an Ansible play is executed, the control host connects to the Linux-based target node over SSH.

2. For each task, that is, for each Ansible module to be executed within the play, Python code is sent over SSH and executed directly on the remote system.

3. Each Ansible module run on the remote system returns JSON data to the control host. This data includes information such as whether a configuration changed, whether the task succeeded or failed, and other module-specific data.

4. The JSON data returned to Ansible is then used to generate reports, or serves as input to subsequent modules.

5. Step 3 is repeated for each task within the play.

6. Step 1 is repeated for each play within the playbook.
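The looping structure implied by these steps can be sketched as nested Python loops. Everything here (the play/task data and the `run_task` stub) is made up purely for illustration:

```python
# Illustrative control flow only; playbook structure and run_task()
# are invented stand-ins, not Ansible internals.
playbook = [
    {"name": "PLAY 1", "hosts": ["nyc1", "nyc2"],
     "tasks": ["configure vlan", "save config"]},
    {"name": "PLAY 2", "hosts": ["sfo1"],
     "tasks": ["configure vlan"]},
]

def run_task(host: str, task: str) -> dict:
    # Stand-in for "send Python over SSH, get JSON back" (steps 2-3).
    return {"host": host, "task": task, "changed": True}

def run_playbook(playbook: list) -> list:
    results = []
    for play in playbook:                 # step 6: every play
        for host in play["hosts"]:        # step 1: connect per host
            for task in play["tasks"]:    # steps 2-5: every task
                results.append(run_task(host, task))
    return results

if __name__ == "__main__":
    print(len(run_playbook(playbook)))  # 2*2 + 1*1 = 5 task executions
```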
Does this mean every network device can be automated by Ansible out of the box, since they all support SSH? It is true that network devices do support SSH, but it is the combination of the first requirement with the second that limits what is possible for network devices.

Early on, most network devices did not support Python, so the default Ansible connection mechanism was off the table. That said, over the past few years, vendors have added Python support on several different device platforms. However, most of these platforms still lack the necessary integration that would allow Ansible direct SSH access to a Linux shell, with the proper permissions to copy over the required code, create temp directories and files, and execute the code on the device. While all of those pieces of Ansible can run locally over SSH/Python on Linux-based network devices, it still requires network vendors to open up their systems further than they have.

###### Note

It is worth noting that Arista does offer native integration: an SSH user can be dropped straight into a Linux shell with access to the Python engine, which allows Ansible to use its default connection mechanism. Since we singled out Arista, we should also highlight Cumulus as working with Ansible's default connection mechanism. This is because Cumulus Linux is native Linux, so no vendor API is needed to automate the Cumulus Linux OS.
### Ansible Network Integrations

The previous section described the default way Ansible works. We saw how, once a _play_ is started, Ansible sets up a connection to a device, executes tasks by copying Python code to the device, runs the code, and returns the results to the Ansible control host.

In this section, we take a look at what happens when network devices are being automated with Ansible. As mentioned previously, Ansible has a pluggable connection architecture. For _most_ network integrations, the `connection` parameter is set to `local`. The most common connection type within a playbook looks like the following example:
```
---

- name: TEST PLAYBOOK
  hosts: cisco
  connection: local

  tasks:
    - TASKS YOU WANT TO AUTOMATE
```
Notice how the play is defined: compared to the examples in the previous section, this example simply adds the `connection` parameter.

This tells Ansible not to connect to the target device via SSH, but rather to connect to the local machine running the playbook. Essentially, this delegates the connection responsibility to the actual Ansible modules being used within the _tasks_ section of the playbook. Delegating power for each type of module allows a module to connect to the device in whatever fashion necessary. This could be NETCONF for Juniper and HP Comware7, eAPI for Arista, NX-API for Cisco Nexus, or maybe even SNMP for legacy systems that do not have programmatic APIs.

###### Note

Network integrations in Ansible come in the form of Ansible modules. While we keep whetting your appetite with terminology such as playbooks, plays, tasks, and the key concept here, modules, each of these terms is covered in more detail in [Ansible Terminology and Getting Started][3] and [Hands-on Look at Using Ansible for Network Automation][4].
Let's take a look at another sample playbook:

```
---

- name: TEST PLAYBOOK
  hosts: cisco
  connection: local

  tasks:
    - nxos_vlan: vlan_id=10 name=WEB_VLAN
```
Did you notice that this playbook now includes a task, and that the task uses the `nxos_vlan` module? The `nxos_vlan` module is just a Python file, and it is within this file that the connection to the Cisco NX-OS device is made using NX-API. However, the connection could have been set up using any other device API, and this is how vendors and users like us can build our own integrations. Integrations (modules) are typically built on a per-feature basis, although, as you have already seen with modules such as `napalm_install_config`, they can also be used to _push_ a full configuration file.

One of the major differences from the default mechanism is that there Ansible launches a persistent SSH connection to the device, and that connection persists for the duration of a given play. Here, because connection setup and teardown happen within the module, as is the case for many network modules that use `connection=local`, Ansible logs in/out of the device for _every_ task at the play level.

And in traditional Ansible fashion, each network module returns JSON data. The only difference is that the marshaling of this data happens locally on the Ansible control host rather than on the target node. The data returned to the playbook varies per vendor and per module type, but as an example, many of the Cisco NX-OS modules return the existing state, the proposed state, and the end state, as well as the commands (if any) that were sent to the device.
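The shape of such returned data can be sketched like so. The field names follow the existing/proposed/end-state pattern just described, but are illustrative rather than the exact keys of any particular module:

```python
import json

def build_result(existing: dict, proposed: dict) -> dict:
    """Sketch of a module computing its JSON result from device state."""
    # Only push the keys that actually differ from the device's state.
    delta = {k: v for k, v in proposed.items() if existing.get(k) != v}
    commands = [f"vlan {proposed['vlan_id']}"] if delta else []
    return {
        "changed": bool(delta),
        "existing": existing,
        "proposed": proposed,
        "end_state": {**existing, **delta},
        "commands": commands,
    }

if __name__ == "__main__":
    result = build_result(
        existing={"vlan_id": "10", "name": "old_vlan"},
        proposed={"vlan_id": "10", "name": "WEB_VLAN"},
    )
    print(json.dumps(result, indent=2))
```

Returning all three states plus the commands sent makes every run auditable from the playbook side.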
As you get started using Ansible for network automation, the most important thing to remember is that setting the connection parameter to `local` takes Ansible out of the connection setup/teardown process for the device and leaves it up to the module. This is how modules can support different types of vendor platforms that communicate with devices in different ways.

### Ansible Terminology and Getting Started

This section walks through many of the Ansible terms and key concepts that appeared in earlier parts of the report, such as _inventory file_, _playbook_, _play_, _task_, and _module_. We also review a few other concepts that are helpful to know as you get started using Ansible for network automation.

Throughout this section, we refer back to the following simple inventory file and playbook examples, which are used repeatedly in the sections that follow.

_Inventory example_:
```
# sample inventory file
# filename inventory

[all:vars]
user=admin
pwd=admin

[tor]
rack1-tor1 vendor=nxos
rack1-tor2 vendor=nxos
rack2-tor1 vendor=arista
rack2-tor2 vendor=arista

[core]
core1
core2
```
_Playbook example_:
```
---
# sample playbook
# filename site.yml

- name: PLAY 1 - Top of Rack (TOR) Switches
  hosts: tor
  connection: local

  tasks:
    - name: ENSURE VLAN 10 EXISTS ON CISCO TOR SWITCHES
      nxos_vlan: vlan_id=10 name=WEB_VLAN host={{ inventory_hostname }} username=admin password=admin
      when: vendor == "nxos"

    - name: ENSURE VLAN 10 EXISTS ON ARISTA TOR SWITCHES
      eos_vlan: vlanid=10 name=WEB_VLAN host={{ inventory_hostname }} username={{ user }} password={{ pwd }}
      when: vendor == "arista"

- name: PLAY 2 - Core (TOR) Switches
  hosts: core
  connection: local

  tasks:
    - name: ENSURE VLANS EXIST IN CORE
      nxos_vlan:
        vlan_id={{ item }}
        host={{ inventory_hostname }}
        username={{ user }}
        password={{ pwd }}
      with_items:
        - 10
        - 20
        - 30
        - 40
        - 50
```
### Inventory File

Using an inventory file, such as the preceding one, allows us to specify the hosts to automate, as well as groups of hosts, which are referenced with the `hosts` parameter that exists at the top section of each play.

It is also possible to store variables within an inventory file, as this example shows. If a variable is on the same line as a host, it is a host-specific variable. If variables are defined within brackets (“[ ]”), such as `[all:vars]`, it means the variables are scoped to the group `all`, which is a default group that includes _all_ hosts within the inventory file.
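The two variable scopes can be made concrete with a toy parser for this INI-style layout. This handles only the host-line and `[group:vars]` forms shown above, nothing like Ansible's full inventory plugin system:

```python
def parse_inventory(text: str):
    """Toy parser: returns (groups, group_vars, host_vars)."""
    groups, group_vars, host_vars = {}, {}, {}
    section = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section and section.endswith(":vars"):
            # e.g. [all:vars] -> variables scoped to the group "all"
            key, value = line.split("=", 1)
            group_vars.setdefault(section[:-5], {})[key] = value
        else:
            # host line, optionally followed by host-specific key=value pairs
            host, *pairs = line.split()
            groups.setdefault(section, []).append(host)
            for pair in pairs:
                key, value = pair.split("=", 1)
                host_vars.setdefault(host, {})[key] = value
    return groups, group_vars, host_vars

SAMPLE = """
[all:vars]
user=admin
pwd=admin

[tor]
rack1-tor1 vendor=nxos
rack2-tor1 vendor=arista

[core]
core1
"""

if __name__ == "__main__":
    groups, gvars, hvars = parse_inventory(SAMPLE)
    print(groups["tor"], gvars["all"]["user"], hvars["rack1-tor1"]["vendor"])
```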
###### Note

Inventory files are a quick way to get started with Ansible, but should you already have a source of truth for your network devices, such as a network management tool or CMDB, it is possible to create and use a dynamic inventory script rather than a static inventory file.
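Ansible's documented contract for such a script is that it prints JSON describing groups (and optionally host variables) when invoked with `--list`. A minimal sketch, with made-up hosts standing in for a real CMDB query, looks like this:

```python
import json
import sys

def fetch_from_cmdb() -> dict:
    # Stand-in for querying a real CMDB or network management tool.
    return {
        "cisco": ["nyc1.acme.com", "nyc2.acme.com"],
        "arista": ["sfo1.acme.com"],
    }

def build_inventory() -> dict:
    groups = fetch_from_cmdb()
    inventory = {name: {"hosts": hosts} for name, hosts in groups.items()}
    # "_meta.hostvars" lets Ansible skip a per-host "--host" call.
    inventory["_meta"] = {"hostvars": {}}
    return inventory

if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "--list":
        print(json.dumps(build_inventory()))
```

Pointed at with `-i`, an executable script like this replaces the static text file entirely.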
### Playbook

The playbook is the top-level object that is executed to automate network devices. In our example, it is the file _site.yml_, as shown in the preceding example. A playbook uses YAML to define a set of automation tasks, and each playbook is comprised of one or more plays. This is analogous to a football playbook: just as a football team has a playbook made up of plays, Ansible playbooks are made up of plays, too.

###### Note

YAML is a data format that is supported by all programming languages. YAML is itself a superset of JSON, and YAML files are very easy to recognize, since they always start with three dashes (hyphens): `---`.
### Play

One or more plays can exist within an Ansible playbook. In the preceding example, there are two plays within the playbook. Each play starts with a _header_ section where play-specific parameters are defined.

The two plays in the example both define the following parameters:

`name`

The text `PLAY 1 - Top of Rack (TOR) Switches` is arbitrary, and is displayed when the playbook runs to improve readability during playbook execution and reporting. This is an optional parameter.

`hosts`

As covered previously, this is the host or group of hosts that is automated in this particular play. This is a required parameter.
`connection`

As covered previously, this is the type of connection mechanism used for the play. This is an optional parameter, but it is generally set to `local` for network automation plays.

Each play is comprised of one or more tasks.

### Tasks

Tasks represent what is automated, in a declarative manner, without worrying about the underlying syntax or how the operation is actually performed.
In our example, the first play has two tasks. Each task ensures that VLAN 10 exists. The first task is for Cisco Nexus devices, and the second task is for Arista devices:

```
tasks:
  - name: ENSURE VLAN 10 EXISTS ON CISCO TOR SWITCHES
    nxos_vlan: vlan_id=10 name=WEB_VLAN host={{ inventory_hostname }} username=admin password=admin
    when: vendor == "nxos"
```
Tasks can use the `name` parameter just as plays can. As with plays, the text is arbitrary and is displayed when the playbook runs to improve readability during playbook execution and reporting. It is an optional parameter for each task.

The next line in the example task starts with `nxos_vlan`. It tells us that this task will run an Ansible module called `nxos_vlan`.

We'll now dig deeper into modules.
### Modules

It is critical to understand modules within Ansible. While any programming language can be used to write Ansible modules, as long as they return JSON key-value pairs, nearly all modules are written in Python. In our example, we see two modules being run: `nxos_vlan` and `eos_vlan`. Both of these modules are Python files; in fact, while you can't tell from looking at the playbook, the actual filenames are _nxos_vlan.py_ and _eos_vlan.py_, respectively.

Let's look back at the first task in the first play of the preceding example:
```
- name: ENSURE VLAN 10 EXISTS ON CISCO TOR SWITCHES
  nxos_vlan: vlan_id=10 name=WEB_VLAN host={{ inventory_hostname }} username=admin password=admin
  when: vendor == "nxos"
```
This task runs `nxos_vlan`, which is a module that automates VLAN configuration. In order to use this module (and any module), you need to specify the desired state or configuration policy you want the device to have. This example's state is: VLAN 10 should be configured with the name `WEB_VLAN`, and it should be configured on each switch automatically. We can see that this is easy to do with the `vlan_id` and `name` parameters. There are three other parameters in the module as well: `host`, `username`, and `password`:

`host`

This is the hostname (or IP address) of the device being automated. Since the devices we want to automate are already defined in the inventory file, we can use the built-in Ansible variable `inventory_hostname`. This variable is equal to what is in the inventory file. For example, on the first iteration, the host in the inventory file is `rack1-tor1`, and on the second iteration, it is `rack1-tor2`. These names are passed into the module and, within the module, a DNS lookup is performed to resolve each name to an IP address. Communication to the device then happens over that address.
`username`

Username used to log in to the switch.

`password`

Password used to log in to the switch.

The final piece of the example is the use of a `when` statement. This is how Ansible performs conditional tasks within a play. As we know, there are multiple devices and device types in the `tor` group for this play. Using `when` offers more selectivity, based on any criteria you like. Here we are automating only the Cisco devices, because we are using the `nxos_vlan` module in this task, while in the next task we automate only the Arista devices, since the `eos_vlan` module is used.
###### Note

This isn't the only way to differentiate devices. It is shown here simply to illustrate the use of `when`, and the fact that variables can be defined within an inventory file.
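The same vendor-based branching can be expressed in plain Python as a dispatch table. The module names come from the examples above, but the dispatch itself is only an analogy for how the `when: vendor == "..."` pattern selects tasks, not how Ansible evaluates conditionals:

```python
# Hypothetical dispatch: pick a module name from the host's vendor
# variable, mirroring the `when: vendor == "..."` pattern above.
VLAN_MODULE_BY_VENDOR = {
    "nxos": "nxos_vlan",
    "arista": "eos_vlan",
}

def pick_vlan_module(host_vars: dict) -> str:
    try:
        return VLAN_MODULE_BY_VENDOR[host_vars["vendor"]]
    except KeyError:
        raise ValueError(f"no VLAN module known for {host_vars!r}")

if __name__ == "__main__":
    print(pick_vlan_module({"vendor": "nxos"}))    # nxos_vlan
    print(pick_vlan_module({"vendor": "arista"}))  # eos_vlan
```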
Defining variables in an inventory file is great for getting started, but as you continue to use Ansible, you will want to use YAML-based variable files for scalability, version control, and minimizing the changes made to a given file. This also simplifies and improves the readability of the inventory file and of each variable file used. An example variable file was given earlier, in the build/push method of device provisioning section.

Here are a few more points worth understanding about the tasks in the last example:
* Play 1 task 1 shows the `username` and `password` hardcoded as parameters being passed into the specific module (`nxos_vlan`).

* Play 1 task 2 and play 2 use variables in the modules instead of hardcoding them. This obscures the `username` and `password` parameters, but it is worth noting that these variables are being pulled from the inventory file (in this example).

* Play 1 uses a _horizontal_ key=value syntax for the parameters being passed into the modules, while play 2 uses the vertical key=value syntax. Both work just fine. You can also use the vertical YAML “key: value” syntax.

* The last task also introduces how to use a _loop_ within Ansible. This is done with `with_items`, and it is analogous to a for loop. That particular task loops through five VLANs to ensure they all exist on the switch. Note: it is also possible to store these VLANs in an external YAML variable file. Also note that the alternative to not using `with_items` would be to have one task per VLAN, and that simply does not scale!
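The `with_items` loop in play 2 behaves much like the following plain-Python loop, with `ensure_vlan` as a made-up stand-in for the `nxos_vlan` module:

```python
def ensure_vlan(host: str, vlan_id: int) -> dict:
    # Made-up stand-in for the nxos_vlan module: pretend every VLAN
    # except 10 is missing from the device and needs creating.
    already_present = {10}
    return {"host": host, "vlan_id": vlan_id,
            "changed": vlan_id not in already_present}

if __name__ == "__main__":
    # Equivalent of `with_items: [10, 20, 30, 40, 50]` on one host.
    results = [ensure_vlan("core1", vlan) for vlan in (10, 20, 30, 40, 50)]
    print(sum(r["changed"] for r in results))  # 4 VLANs needed creating
```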
### Hands-on Look at Using Ansible for Network Automation

The previous section offered an overview of Ansible terminology, covering most of the specific Ansible terms, such as playbooks, plays, tasks, modules, and inventory files. This section continues with examples of using Ansible for network automation, and offers more detail on the modules that automate different types of devices. The devices being automated in the examples span multiple vendors, including Cisco, Arista, Cumulus, and Juniper.

The examples in this section assume the following:
* Ansible is installed.

* The proper APIs are enabled on the devices (NX-API, eAPI, NETCONF).

* Users have the proper permissions to make changes on the systems via the API.

* All Ansible modules exist on the system and are in the library path.
###### Note

The module and library paths can be set within the _ansible.cfg_ file. You can also use the `-M` flag from the command line to change it when you run a playbook.

The inventory used for the examples in this section is shown below (passwords have been removed, and the IP addresses have been changed). In this example (as in the previous ones), some hostnames are not fully qualified domain names (FQDNs).
### Inventory File

```
[cumulus]
cvx ansible_ssh_host=1.2.3.4 ansible_ssh_pass=PASSWORD

[arista]
veos1

[cisco]
nx1 hostip=5.6.7.8 un=USERNAME pwd=PASSWORD

[juniper]
vsrx hostip=9.10.11.12 un=USERNAME pwd=PASSWORD
```
###### Note

As you probably know, Ansible supports storing passwords in encrypted files. If you want to learn more about this feature, check out the [Ansible Vault][5] part of the documentation on the Ansible website.

This inventory file has four groups, with a single host defined in each group. Let's review each section in a bit more detail:
Cumulus

The host `cvx` is a Cumulus Linux (CL) switch, and it is the only device in the `cumulus` group. Remember that CL is native Linux, so this means the default connection mechanism (SSH) is used to connect to and automate the CL switch. Because `cvx` is not defined in DNS or in _/etc/hosts_, we let Ansible know not to use the hostname defined in the inventory file, but rather the name/IP address defined with `ansible_ssh_host`. The username used to log in to the CL switch is defined in the playbook, but you can see that the password is defined in the inventory file with the `ansible_ssh_pass` variable.

Arista

The host called `veos1` is an Arista switch running EOS. It is the only host in the `arista` group. As you can see for Arista, no other parameters exist within the inventory file. This is because Arista uses a dedicated configuration file for its devices. In our example, this file is called _.eapi.conf_ and lives in the home directory. Here is an example showing how to use this configuration-file feature properly:
```
[connection:veos1]
host: 2.4.3.4
username: unadmin
password: pwadmin
```
This file contains all of the information that Ansible (and Arista's Python library, called _pyeapi_) needs in order to connect to the device, defined in a configuration file.

Cisco

Just like with Cumulus and Arista, there is only one host (`nx1`) in the `cisco` group. This is an NX-OS-based Cisco Nexus switch. Notice that three variables are defined for `nx1`. They include `un` and `pwd`, which are accessed in the playbook and passed into the Cisco modules in order to connect to the device. In addition, there is a parameter called `hostip`, which is required because `nx1` is not defined in DNS or in the _/etc/hosts_ file.

###### Note

If we were automating a native Linux device, we could have named this parameter anything; `ansible_ssh_host` is used just as we saw in the Cumulus example (when the name defined in the inventory file cannot be resolved). In this example, we could have still used `ansible_ssh_host`, but it is not required, because we pass this variable as a parameter into the Cisco module, whereas `ansible_ssh_host` is only checked automatically when the default SSH connection mechanism is used.
Juniper

As with the previous three groups and hosts, there is a single host, `vsrx`, within the `juniper` group. Its setup within the inventory file is identical to Cisco's, as both are used the same way within the playbook.
### Playbook

The next playbook has four different plays. Each play is built to automate a specific group of devices based on vendor type. Note that this is not the only way to perform these tasks within a single playbook. There are other ways this could be done, by using conditionals (`when` statements) or by creating Ansible roles (which is not covered in this report).

Here is an example playbook:
```
---

- name: PLAY 1 - CISCO NXOS
  hosts: cisco
  connection: local

  tasks:
    - name: ENSURE VLAN 100 exists on Cisco Nexus switches
      nxos_vlan:
        vlan_id=100
        name=web_vlan
        host={{ hostip }}
        username={{ un }}
        password={{ pwd }}

- name: PLAY 2 - ARISTA EOS
  hosts: arista
  connection: local

  tasks:
    - name: ENSURE VLAN 100 exists on Arista switches
      eos_vlan:
        vlanid=100
        name=web_vlan
        connection={{ inventory_hostname }}

- name: PLAY 3 - CUMULUS
  remote_user: cumulus
  sudo: true
  hosts: cumulus

  tasks:
    - name: ENSURE 100.10.10.1 is configured on swp1
      cl_interface: name=swp1 ipv4=100.10.10.1/24

    - name: restart networking without disruption
      shell: ifreload -a

- name: PLAY 4 - JUNIPER SRX changes
  hosts: juniper
  connection: local

  tasks:
    - name: INSTALL JUNOS CONFIG
      junos_install_config:
        host={{ hostip }}
        file=srx_demo.conf
        user={{ un }}
        passwd={{ pwd }}
        logfile=deploysite.log
        overwrite=yes
        diffs_file=junpr.diff
```
You will notice that the first two plays are very similar to what we already covered in the original Cisco and Arista examples. The only difference is that each group being automated (`cisco` and `arista`) is defined in its own play, as compared to earlier, where we used the `when` conditional within a single play.

There is no right or wrong way to do this. It all depends on what information is known up front and what fits your environment and use cases best, but our intent here is to show a few different ways to do the same thing.

The third play automates the configuration of the `swp1` interface on the Cumulus Linux switch. The first task in this play ensures that `swp1` is a Layer 3 interface configured with the IP address 100.10.10.1. Because Cumulus Linux is native Linux, the networking service needs to be restarted for the changes to take effect. This could also have been done with Ansible handlers (which are outside the scope of this report), and there is an Ansible core module called `service` that could be used too, but that would disrupt networking on the switch; restarting with `ifreload` applies the changes without disruption.
So far in this section, we have looked at Ansible modules focused on specific tasks, such as configuring interfaces and VLANs. The fourth play uses another option. We see a module that _pushes_ a full configuration file and immediately activates it as the new running configuration. This is what we showed earlier with `napalm_install_config`, but this example uses a Juniper-specific module.

The `junos_install_config` module accepts several parameters, as seen in the example. By now, you should understand what `user`, `passwd`, and `host` are used for. The other parameters are defined as follows:
`file`

This is the config file that is copied from the Ansible control host to the Juniper device.

`logfile`

This is optional, but if specified, it is used to store messages generated while executing the module.

`overwrite`

When set to yes/true, the complete configuration is replaced with the file being sent (the default is false).

`diffs_file`

This is optional, but if specified, it stores the diffs generated when applying the configuration. An example of the diff generated when just the hostname was changed, but a complete config file was sent, looks like this:
```
# filename: junpr.diff
[edit system]
- host-name vsrx;
+ host-name vsrx-demo;
```
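Apart from Juniper's own diffing, the same kind of line-level diff can be produced from two config snippets with Python's stdlib `difflib`. The before/after config strings here are invented for the illustration:

```python
import difflib

# Invented before/after config snippets for the sake of the sketch.
running = """system {
    host-name vsrx;
}
"""
candidate = """system {
    host-name vsrx-demo;
}
"""

def config_diff(old: str, new: str) -> str:
    """Unified diff of two multi-line configuration strings."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile="running", tofile="candidate",
        )
    )

if __name__ == "__main__":
    print(config_diff(running, candidate))
```

Storing such a diff alongside each change, as `diffs_file` does, gives you a cheap audit trail of what each push actually modified.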
That covers the detailed overview of the playbook. Let's take a look at what happens when the playbook is run:

###### Note

The `-i` flag is used to specify the inventory file to use. The `ANSIBLE_HOSTS` environment variable can also be set rather than using the flag every time a playbook is run.
```
ntc@ntc:~/ansible/multivendor$ ansible-playbook -i inventory demo.yml

PLAY [PLAY 1 - CISCO NXOS] *************************************************

TASK: [ENSURE VLAN 100 exists on Cisco Nexus switches] *********************
changed: [nx1]

PLAY [PLAY 2 - ARISTA EOS] *************************************************

TASK: [ENSURE VLAN 100 exists on Arista switches] **************************
changed: [veos1]

PLAY [PLAY 3 - CUMULUS] ****************************************************

GATHERING FACTS ************************************************************
ok: [cvx]

TASK: [ENSURE 100.10.10.1 is configured on swp1] ***************************
changed: [cvx]

TASK: [restart networking without disruption] ******************************
changed: [cvx]

PLAY [PLAY 4 - JUNIPER SRX changes] ****************************************

TASK: [INSTALL JUNOS CONFIG] ***********************************************
changed: [vsrx]

PLAY RECAP ***************************************************************
           to retry, use: --limit @/home/ansible/demo.retry

cvx   : ok=3 changed=2 unreachable=0 failed=0
nx1   : ok=1 changed=1 unreachable=0 failed=0
veos1 : ok=1 changed=1 unreachable=0 failed=0
vsrx  : ok=1 changed=1 unreachable=0 failed=0
```
You can see that each task completes successfully; and if you are on a terminal, you would see each task that introduced a change displayed in amber.

Let's run the playbook again. By running it again, we can verify that all of the modules are _idempotent_; and when doing this, we see that _no_ changes are made to the devices and everything is green:
```
PLAY [PLAY 1 - CISCO NXOS] ***************************************************

TASK: [ENSURE VLAN 100 exists on Cisco Nexus switches] ***********************
ok: [nx1]

PLAY [PLAY 2 - ARISTA EOS] ***************************************************

TASK: [ENSURE VLAN 100 exists on Arista switches] ****************************
ok: [veos1]

PLAY [PLAY 3 - CUMULUS] ******************************************************

GATHERING FACTS **************************************************************
ok: [cvx]

TASK: [ENSURE 100.10.10.1 is configured on swp1] *****************************
ok: [cvx]

TASK: [restart networking without disruption] ********************************
skipping: [cvx]

PLAY [PLAY 4 - JUNIPER SRX changes] ******************************************

TASK: [INSTALL JUNOS CONFIG] *************************************************
ok: [vsrx]

PLAY RECAP ***************************************************************
cvx   : ok=2 changed=0 unreachable=0 failed=0
nx1   : ok=1 changed=0 unreachable=0 failed=0
veos1 : ok=1 changed=0 unreachable=0 failed=0
vsrx  : ok=1 changed=0 unreachable=0 failed=0
```
Notice how there were 0 changes, but each task still returned “ok”, exactly as expected. This shows that every module in this playbook is idempotent.
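Idempotency boils down to "check state first, change only on drift". The following sketch, with an in-memory dict standing in for a switch's VLAN table, shows the pattern a module like `nxos_vlan` follows:

```python
def ensure_vlan(device_vlans: dict, vlan_id: int, name: str) -> dict:
    """Idempotent ensure: mutate only when actual state differs."""
    changed = device_vlans.get(vlan_id) != name
    if changed:
        device_vlans[vlan_id] = name  # the "push config" step
    return {"changed": changed}

if __name__ == "__main__":
    switch = {}  # in-memory stand-in for a switch's VLAN table
    first = ensure_vlan(switch, 100, "web_vlan")   # creates the VLAN
    second = ensure_vlan(switch, 100, "web_vlan")  # no-op on re-run
    print(first["changed"], second["changed"])  # True False
```

Because the second call finds the desired state already in place, re-running it any number of times reports `changed: False`, which is exactly what the second playbook run above demonstrated.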
### Conclusion

Ansible is a super-simple, agentless, and extensible automation platform. The network community continues to rally around Ansible as a platform that can be used for network automation tasks such as configuration management, data collection, and reporting. You can use Ansible to push full configuration files, configure specific network resources with idempotent modules (such as interfaces or VLANs), or simply automate the collection of information such as neighbors, serial numbers, uptime, and interface stats, and customize reports as you need them.

Because of its architecture, Ansible proves to be a great tool, available here and now, that can help bridge the gap from automating legacy _CLI/SNMP_-based network devices to modern, _API-driven_ devices.

Ansible's ease of use and agentless architecture, which continue to gain mindshare in the network community, make it possible to automate devices without APIs (CLI/SNMP); devices with APIs, including standalone switches, routers, and Layer 4-7 service appliances; and even software-defined networking controllers that offer RESTful APIs.

No device is left behind when using Ansible for network automation.
-----------
About the author:

Jason Edelman, CCIE 15394 & VCDX-NV 167, is a born-and-raised network engineer from New Jersey. He was the typical “CLI junkie” and “router jockey”. A few years ago, he decided to focus more on software, development practices, and how they intersect with network engineering. Jason currently runs a small consulting firm, Network to Code (http://networktocode.com/), helping vendors and end users take advantage of new tools and technologies to reduce their operational inefficiencies...
--------------------------------------------------------------------------------

via: https://www.oreilly.com/learning/network-automation-with-ansible

Author: [Jason Edelman][a]
Translator: [qhwdw](https://github.com/qhwdw)
Proofreader: [校对者ID](https://github.com/校对者ID)

This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux.cn](https://linux.cn/)

[a]:https://www.oreilly.com/people/ee4fd-jason-edelman
[1]:https://www.oreilly.com/learning/network-automation-with-ansible#ansible_terminology_and_getting_started
[2]:https://www.oreilly.com/learning/network-automation-with-ansible#ansible_network_integrations
[3]:https://www.oreilly.com/learning/network-automation-with-ansible#ansible_terminology_and_getting_started
[4]:https://www.oreilly.com/learning/network-automation-with-ansible#handson_look_at_using_ansible_for_network_automation
[5]:http://docs.ansible.com/ansible/playbooks_vault.html
[6]:https://www.oreilly.com/people/ee4fd-jason-edelman
[7]:https://www.oreilly.com/people/ee4fd-jason-edelman
Image Processing on Linux
============================================================

I've previously covered several scientific packages that generate nice graphical representations of your data and work, but I haven't gone in the other direction much. So in this article, I cover a popular image processing package called ImageJ. Specifically, I am looking at [Fiji][4], an instance of ImageJ bundled with a set of plugins that are useful for scientific image processing.

The name Fiji is a recursive acronym, much like GNU. It stands for “Fiji Is Just ImageJ”. ImageJ is a useful tool for image analysis in scientific research; for example, you may use it to classify tree types in a landscape from aerial photography. ImageJ can do that kind of categorization. It is built with a plugin architecture, and a huge selection of plugins is available to increase its flexibility.
The first step is installing ImageJ (or Fiji). Most distributions will have a package available for ImageJ. If you wish, you can install it that way and then install the individual plugins you need for your research. The other option is to install Fiji and get the most commonly used plugins at the same time. Unfortunately, most Linux distributions will not have a package for Fiji available in their software centers. Luckily, a simple installation file is available from the official website, containing the directory of files needed to run Fiji. On first launch, you get only a toolbar with a list of menu items (Figure 1).

Figure 1. You get a very minimal interface when you first open Fiji.

If you don't have images ready to practice with in ImageJ, the Fiji installation includes several sample images. Click the File→Open Samples dropdown menu item (Figure 2). These samples cover many of the tasks you might be interested in doing.

Figure 2. Sample images are available for learning how to use ImageJ.
If you installed Fiji rather than plain ImageJ, a large number of plugins were installed along with it. The first one of note is the auto-updater plugin. Every time ImageJ is opened, this plugin checks the network for updates to ImageJ and the installed plugins. All installed plugins are available under the Plugins menu item. Once you have installed a lot of plugins, this list can become cluttered, so you may want to be selective about your plugins. If you want to update manually, click the Help→Update Fiji menu item to force a check and get a list of available updates (Figure 3).

Figure 3. Forcing a manual check for available updates.

Now, what can you actually do with Fiji/ImageJ? One example is counting the objects within an image. You can load a sample by clicking File→Open Samples→Embryos.
Figure 4. Counting objects in an image with ImageJ.

The first step is to set a scale for the image so you can tell ImageJ how to identify objects. First, select the line tool from the toolbar. Then select Analyze→Set Scale, and set the number of pixels that correspond to a known distance (Figure 5). For this example, you can set the known distance to 100 and the units to “um”.

Figure 5. Many image analysis tasks require setting a scale for the image.

The next step is to simplify the information within the image. Click Image→Type→8-bit to reduce the information to an 8-bit gray-scale image. Then, to isolate the individual objects, click Process→Binary→Make Binary, which thresholds the image automatically (Figure 6).
Figure 6. There are tools that do automated tasks like thresholding an image.

Before counting the objects in the image, you need to remove artifacts such as the scale bar. You can do that by selecting it with the rectangle selection tool and clicking Edit→Clear. Now you can analyze the image and see what objects are there.

Making sure no region is selected in the image, click Analyze→Analyze Particles to pop up a window where you can select the minimum size of objects; this determines what the final image will show (Figure 7).
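Conceptually, the particle-counting step is a connected-components pass over the thresholded (binary) image. A tiny pure-Python version of that idea, run on a made-up 0/1 grid rather than a real micrograph, looks like this:

```python
def count_particles(grid, min_size=1):
    """Count 4-connected blobs of 1s at least min_size pixels big."""
    rows, cols = len(grid), len(grid[0])
    seen = set()

    def blob_size(r, c):
        # Iterative flood fill measuring one blob's size.
        stack, size = [(r, c)], 0
        while stack:
            y, x = stack.pop()
            if (y, x) in seen or not (0 <= y < rows and 0 <= x < cols):
                continue
            if grid[y][x] == 0:
                continue
            seen.add((y, x))
            size += 1
            stack.extend([(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)])
        return size

    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1 and (r, c) not in seen:
                if blob_size(r, c) >= min_size:
                    count += 1
    return count

if __name__ == "__main__":
    image = [
        [1, 1, 0, 0, 1],
        [1, 0, 0, 0, 0],
        [0, 0, 1, 1, 0],
        [0, 0, 1, 1, 0],
    ]
    print(count_particles(image))              # three blobs in total
    print(count_particles(image, min_size=2))  # drops the lone pixel
```

The `min_size` filter plays the same role as the minimum-size setting in the Analyze Particles dialog: it discards specks below the threshold.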
Figure 7. You can generate a reduced image by setting a minimum particle size.

When it finishes, a summary window shows an overview of the objects found. There is also an individual detail window for each object.

Figure 8. The output includes a summary list of the particles identified.
Once you have an analysis worked out for a given image type, you often need to apply the exact same analysis to a whole series of images, possibly numbering in the thousands, and you certainly won't want to repeat the steps manually for each one. In such cases, you can collect the necessary steps together into a macro so they can be applied multiple times. Clicking Plugins→Macros→Record pops up a new window that records all of your subsequent commands. Once all the steps are done, you can save them as a macro file and rerun it on other images by clicking Plugins→Macros→Run.

If you have a very specific set of steps for your workflow, you can simply open the macro file and edit it by hand, since it is a plain text file. There is, in fact, a complete macro language available that gives you fuller control over the image processing workflow.
If you have a really large set of images to process, however, even this would be tedious. In that case, go to Process→Batch→Macro to pop up a new window where you can set up a batch processing job (Figure 9).

Figure 9. You can run a macro on a batch of input images with a single command.

In this window, you can select which macro file to apply, the source directory where the input images are located, and the output directory where you want the processed images written. You can also set the output file format and filter the input images by filename. When everything is set, start the batch job by clicking the Process button at the bottom of the window.

If this is a workflow that will be repeated, you can save the batch settings to a text file by clicking the Save button at the bottom of the window. You can then reload the same job by clicking the Open button, also at the bottom of the window. All of this lets you automate the most tedious parts of your research, so you can focus on the actual science.
Considering that the main ImageJ website alone lists more than 500 plugins and more than 300 macros, I have only been able to touch on the most basic topics in this short article. Luckily, many domain-specific tutorials are available, along with very good documentation for the ImageJ core on the project's main website. If you think this tool could be of use to your research, there is a wealth of information to guide you in your own specialized field.

--------------------------------------------------------------------------------

About the author:

Joey Bernard has a background in both physics and computer science. This serves him well in his day job as a computational research consultant at the University of New Brunswick. He also teaches computational physics and parallel programming.
--------------------------------
via: https://www.linuxjournal.com/content/image-processing-linux

Author: [Joey Bernard][a]
Translator: [XYenChi](https://github.com/XYenChi)
Proofreader: [校对者ID](https://github.com/校对者ID)

This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux.cn](https://linux.cn/)

[a]:https://www.linuxjournal.com/users/joey-bernard
[1]:https://www.linuxjournal.com/tag/science
[2]:https://www.linuxjournal.com/tag/statistics
[3]:https://www.linuxjournal.com/users/joey-bernard
[4]:https://imagej.net/Fiji