Merge remote-tracking branch 'LCTT/master'

This commit is contained in:
Xingyu Wang 2019-08-14 09:35:24 +08:00
commit 1f452d2b8b
9 changed files with 1008 additions and 94 deletions


@ -0,0 +1,103 @@
[#]: collector: (lujun9972)
[#]: translator: ( )
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (WAN Transformation: It's More Than SD-WAN)
[#]: via: (https://www.networkworld.com/article/3430638/wan-transformation-it-s-more-than-sd-wan.html)
[#]: author: (Cato Networks https://www.networkworld.com/author/Matt-Conran/)
WAN Transformation: It's More Than SD-WAN
======
Tomorrow's networking challenges will extend beyond the capabilities of SD-WAN alone. Here's why and how you can prepare your network.
![metamorworks][1]
As an IT leader, you're expected to be the technology vanguard of your organization. It is you who must deflate technology hype and devise the technology plan to keep the organization competitive.
Addressing the WAN is, of course, essential to that plan. The high costs and limited agility of legacy MPLS-based networks are well known. What's less clear is how to transform the enterprise network in a way that will remain agile and efficient for decades to come.
Many mistakenly assume [SD-WAN][2] to be that transformation. After all, SD-WAN brings agility, scalability, and cost efficiencies lacking in telco-managed MPLS services. But while a critical step, SD-WAN alone is insufficient to address the networking challenges you're likely to face today — and tomorrow. Here's why.
### SD-WAN Cannot Fix the Unpredictability of the Global Internet
Enterprise networks are nothing but predictable. Yet to realize their benefits, SD-WANs must rely on the unpredictable public Internet, a crapshoot that meets enterprise requirements one day and is wildly amiss the next. There's simply no way to anticipate [exceptional Internet events][3]. And with global Internet connections, you don't even need to wait for the unusual. At Cato, we routinely see [how our private backbone can halve the latency of similar Internet routes][4], a fact confirmed by numerous third-party sources.
![][5]
### SD-WAN Lacks Security, Yet Security is Required _Everywhere_
Is it any wonder [SD-WAN vendors][6] partner with legacy telcos? But telcos too often come with a last-mile agenda, locking you into specific providers. Cost and support models are also designed for the legacy business, not the digital one.
It's no secret that with Internet access you need the advanced security protection of a next-generation firewall, IPS, and the rest of today's security stack. It's also no secret that SD-WAN lacks advanced security. How, then, will you provide branch locations with [secure Direct Internet Access (DIA)?][7]
Deploying branch security appliances will complicate the network, running counter to your goal of creating a leaner, more agile infrastructure. Appliances, whether physical or virtual ([VNFs in an NFV architecture][8]), must be maintained. New software patches must be tested, staged, and deployed. As traffic loads grow or compute-intensive features are enabled, such as TLS inspection, the security appliance's compute requirements increase, ultimately forcing unplanned hardware upgrades.
Cloud security services can avoid those problems. But too often they only inspect Internet traffic, not site-to-site traffic, forcing IT to maintain and coordinate separate security policies, complicating troubleshooting and deployment.
### SD-WAN Does Not Extend Well to the Cloud, Mobile Users, or the Tools of Tomorrow
Then there's the problem of the new tenants of the enterprise. SD-WAN is an MPLS replacement; it doesn't extend naturally to the cloud, today's destination for most enterprise traffic. And mobile users are completely beyond SD-WAN's scope, requiring separate connectivity and security infrastructure that too often disrupts the mobile experience and fragments visibility, complicating troubleshooting and management.
Just over the horizon are IoT devices, not to mention the developments we can't even foresee. In many cases, installing appliances won't be possible. How will your SD-WAN accommodate these developments without compromising on the operational agility and efficiencies demanded by the digital business?
### It's Time to Evolve the Network Architecture
Continuing to solve network challenges in parts — MPLS service here, remote access VPN there, and a sprinkling of cloud access solutions, routers, firewalls, WAN optimizers, and sensors — only perpetuates the complexity of the enterprise network, ultimately restricting how much cost can be saved or operational efficiency gained. SD-WAN-only solutions are symptomatic of this segmented thinking, solving only a small part of the enterprise's far bigger networking challenge.
What's needed isn't another point appliance or another network. What's needed is **one network** that connects **and** secures **all** company resources worldwide without compromising on cost or performance. This is an architectural issue, one that can't be solved by repackaging multiple appliances as a network service. Such approaches lead to inconsistent services, poor manageability, and high latency — a fact that Gartner notes in its recent [Hype Cycle for Enterprise Networking][9].
### Picture the Network of the Future
What might this architecture look like? At its basis, think of collapsing MPLS, VPN, and all other networks into **one** global, private, managed backbone available from anywhere to anywhere. Such a network would connect all edges — sites, cloud resources, and mobile devices — with far better performance than the Internet at far lower cost than MPLS services.
Such a vision is possible today, in fact, due to two trends — the massive investment in global IP capacity and advancements in high-performance, commercial off-the-shelf (COTS) hardware. 
#### Connect
The Points of Presence (PoPs) comprising such a backbone would interconnect using SLA-backed IP connections across multiple provider networks. By connecting PoPs across multiple networks, the backbone would offer better performance and resiliency than any one underlying network. It would, in effect, bring the power of SD-WAN to the backbone core.
The cloud-native software would execute all major networking and security functions normally running in edge appliances. WAN optimization, dynamic path selection, policy-based routing, and more would move to the cloud. The PoPs would also monitor the real-time conditions of the underlying networks, routing traffic, including cloud traffic, along the optimum path to the PoP closest to the destination. 
With most processing done by the PoP, connecting any type of “edge” — site, cloud resources, mobile devices, IoT devices, and more — would become simple. All that's needed is a small client, primarily to establish an encrypted tunnel across an Internet connection to the nearest PoP. By colocating PoPs and cloud IXPs in the same physical data centers, cloud resources would implicitly become part of your new, optimized corporate network. All without deploying additional software or hardware.
#### Secure
To ensure security, all traffic would only reach an “edge” after security inspection that runs as part of the cloud-native software. Security services would include next-generation firewall, secure web gateway, and advanced threat protection. By running in the PoPs, security services benefit from the scalability and elasticity of the cloud, something that was never available with appliances.
With the provider running the security stack, IT would be freed from its security burden. Security services would always be current without the operational overhead of appliances or the need to invest in specialized security skills. Inspecting all enterprise traffic with one platform means IT needs only one set of security policies to protect all users. Overall, security is made simpler, and mobile users and cloud resources no longer need to remain second-class citizens.
#### Run
Without deploying tons of specialized appliances, the network would be much easier to run and manage. A single pane of glass would give the IT manager end-to-end visibility across networking and security domains without the myriad sensors, agents, normalization tools, and more needed today for that kind of capability.
### One Architecture, Many Benefits
Such an approach addresses the gamut of networking challenges facing today's enterprises. Connectivity costs would be slashed. Latency would rival that of global MPLS but with far better throughput thanks to built-in network optimization, which would be available inside — and outside — sites. Security would be pervasive, easy to maintain, and effective.
This architecture isn't just a pipe dream. Hundreds of companies across the globe realize these benefits every day by relying on such a platform from [Cato Networks][10]. It's a secure, global, managed SD-WAN service powered by the scalability, self-service, and agility of the cloud.
### The Time is Now
WAN transformation is a rare opportunity for IT leaders to profoundly impact the business's ability to do business better tomorrow and for decades to come. SD-WAN is a piece of that vision, but only a piece. Addressing the entire network challenge (not just a part of it) to accommodate the needs you can anticipate — and the ones you can't — will go a long way towards measuring the effectiveness of your IT leadership.
--------------------------------------------------------------------------------
via: https://www.networkworld.com/article/3430638/wan-transformation-it-s-more-than-sd-wan.html
Author: [Cato Networks][a]
Topic selection: [lujun9972][b]
Translator: [译者ID](https://github.com/译者ID)
Proofreader: [校对者ID](https://github.com/校对者ID)
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](https://linux.cn/)
[a]: https://www.networkworld.com/author/Matt-Conran/
[b]: https://github.com/lujun9972
[1]: https://images.idgesg.net/images/article/2019/08/istock-1127447341-100807620-large.jpg
[2]: https://www.catonetworks.com/glossary-use-cases/sd-wan?utm_source=IDG&utm_campaign=IDG
[3]: https://arstechnica.com/information-technology/2019/07/facebook-cloudflare-microsoft-and-twitter-suffer-outages/
[4]: https://www.catonetworks.com/blog/the-internet-is-broken-heres-why?utm_source=IDG&utm_campaign=IDG
[5]: https://images.idgesg.net/images/article/2019/08/capture-100807619-large.jpg
[6]: https://www.topsdwanvendors.com?utm_source=IDG&utm_campaign=IDG
[7]: https://www.catonetworks.com/glossary-use-cases/secure-direct-internet-access?utm_source=IDG&utm_campaign=IDG
[8]: https://www.catonetworks.com/blog/the-pains-and-problems-of-nfv?utm_source=IDG&utm_campaign=IDG
[9]: https://www.gartner.com/en/documents/3947237/hype-cycle-for-enterprise-networking-2019
[10]: https://www.catonetworks.com?utm_source=IDG&utm_campaign=IDG


@ -1,5 +1,5 @@
[#]: collector: (lujun9972)
[#]: translator: ( )
[#]: translator: (geekpi)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )


@ -1,93 +0,0 @@
[#]: collector: (lujun9972)
[#]: translator: (geekpi)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (How to manipulate PDFs on Linux)
[#]: via: (https://www.networkworld.com/article/3430781/how-to-manipulate-pdfs-on-linux.html)
[#]: author: (Sandra Henry-Stocker https://www.networkworld.com/author/Sandra-Henry_Stocker/)
How to manipulate PDFs on Linux
======
The pdftk command provides many options for working with PDFs, including merging pages, encrypting files, applying watermarks, compressing files, and even repairing PDFs -- easily and on the command line.
![Toshiyuki IMAI \(CC BY-SA 2.0\)][1]
While PDFs are generally regarded as fairly stable files, there's a lot you can do with them on both Linux and other systems. This includes merging, splitting, rotating, breaking into single pages, encrypting and decrypting, applying watermarks, compressing and uncompressing, and even repairing. The **pdftk** command does all this and more.
The name “pdftk” stands for “PDF tool kit,” and the command is surprisingly easy to use and does a good job of manipulating PDFs. For example, to pull separate files into a single PDF file, you would use a command like this:
```
$ pdftk pg1.pdf pg2.pdf pg3.pdf pg4.pdf pg5.pdf cat output OneDoc.pdf
```
That OneDoc.pdf file will contain all five of the documents shown and the command will run in a matter of seconds. Note that the **cat** option directs the files to be joined together and the **output** option specifies the name of the new file.
**[ Two-Minute Linux Tips: [Learn how to master a host of Linux commands in these 2-minute video tutorials][2] ]**
You can also pull select pages from a PDF to create a separate PDF file. For example, if you wanted to create a new PDF with only pages 1, 2, 3, and 5 of the document created above, you could do this:
```
$ pdftk OneDoc.pdf cat 1-3 5 output 4pgs.pdf
```
If, on the other hand, you wanted pages 1, 3, 4, and 5, you might use this syntax instead:
```
$ pdftk OneDoc.pdf cat 1 3-end output 4pgs.pdf
```
You have the option of specifying all individual pages or using page ranges as shown in the examples above.
This next command will create a collated document from one that contains the odd pages (1, 3, etc.) and one that contains the even pages (2, 4, etc.):
```
$ pdftk A=odd.pdf B=even.pdf shuffle A B output collated.pdf
```
Notice that the **shuffle** option makes this collation possible and dictates the order in which the documents are used. Note also: While the odd/even pages example might suggest otherwise, you are not restricted to using only two input files.
If you want to create an encrypted PDF that can only be opened by a recipient who knows the password, you could use a command like this one:
```
$ pdftk prep.pdf output report.pdf user_pw AsK4n0thingGeTn0thing
```
The options provide for 40-bit (**encrypt_40bit**) and 128-bit (**encrypt_128bit**) encryption; 128-bit encryption is used by default.
You can also break a PDF file into individual pages using the **burst** option:
```
$ pdftk allpgs.pdf burst
$ ls -ltr *.pdf | tail -5
-rw-rw-r-- 1 shs shs 22933 Aug 8 08:18 pg_0001.pdf
-rw-rw-r-- 1 shs shs 23773 Aug 8 08:18 pg_0002.pdf
-rw-rw-r-- 1 shs shs 23260 Aug 8 08:18 pg_0003.pdf
-rw-rw-r-- 1 shs shs 23435 Aug 8 08:18 pg_0004.pdf
-rw-rw-r-- 1 shs shs 23136 Aug 8 08:18 pg_0005.pdf
```
The **pdftk** command makes pulling together, tearing apart, rebuilding, and encrypting PDF files surprisingly easy. To learn more about its many options, check out the examples page from [PDF Labs][3].
**[ Also see: [Invaluable tips and tricks for troubleshooting Linux][4] ]**
Join the Network World communities on [Facebook][5] and [LinkedIn][6] to comment on topics that are top of mind.
--------------------------------------------------------------------------------
via: https://www.networkworld.com/article/3430781/how-to-manipulate-pdfs-on-linux.html
Author: [Sandra Henry-Stocker][a]
Topic selection: [lujun9972][b]
Translator: [译者ID](https://github.com/译者ID)
Proofreader: [校对者ID](https://github.com/校对者ID)
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](https://linux.cn/)
[a]: https://www.networkworld.com/author/Sandra-Henry_Stocker/
[b]: https://github.com/lujun9972
[1]: https://images.idgesg.net/images/article/2019/08/book-pages-100807709-large.jpg
[2]: https://www.youtube.com/playlist?list=PL7D2RMSmRO9J8OTpjFECi8DJiTQdd4hua
[3]: https://www.pdflabs.com/docs/pdftk-cli-examples/
[4]: https://www.networkworld.com/article/3242170/linux/invaluable-tips-and-tricks-for-troubleshooting-linux.html
[5]: https://www.facebook.com/NetworkWorld/
[6]: https://www.linkedin.com/company/network-world


@ -0,0 +1,143 @@
[#]: collector: (lujun9972)
[#]: translator: ( )
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (A comprehensive guide to agile project management)
[#]: via: (https://opensource.com/article/19/8/guide-agile-project-management)
[#]: author: (Matt Shealy, Daniel Oh, Leigh Griffin https://opensource.com/users/mshealyhttps://opensource.com/users/jkriegerhttps://opensource.com/users/daniel-ohhttps://opensource.com/users/mtakanehttps://opensource.com/users/ahmadnassrihttps://opensource.com/users/agagancarczykhttps://opensource.com/users/lgriffin)
A comprehensive guide to agile project management
======
Agile project management's 12 guiding principles can help your team move
faster together.
![A diagram of a branching process][1]
With a focus on continuous improvements, agile project management upends the traditional linear way of developing products and services. Increasingly, organizations are adopting agile project management because it utilizes a series of shorter development cycles to deliver features and improve continually. This management style allows for rapid development, continuous integration (CI), and continuous delivery (CD).
Agile project management allows cross-functional teams to work on chunks of projects, solving problems and moving projects forward in shorter phases. This enables them to iterate more quickly and deliver more frequent updates.
The agile methodology delivers quality improvements incrementally instead of waiting to distribute finished projects. And agile project management works. For example, PwC reports [agile projects are 28% more successful][2] than traditional project methodologies.
### Adopting agile methodology
When agile methodology was introduced, it met skepticism and resistance. With today's rapid pace of innovation, it has become an accepted standard. The Project Management Institute's annual Pulse of the Profession survey finds that [71% of organizations report using an agile approach][3] in project management, whether it is a fully agile project or hybrid model.
"We cannot afford anymore to have projects taking two to five years to deliver," says Michelin's Phillippe Husser in the Pulse of the Profession report. "During this time, the initial requirements have changed."
### The 12 agile principles
While agile project management is very different from traditional project management, it doesn't have to be daunting to make the switch. Agile project management relies on [12 guiding principles][4] that can help your team move faster together.
#### 1\. Customer-first
One of the first principles for groups using agile management is that the "highest priority is to [satisfy the customer through early and continuous delivery][5]." This means that above all else, the team works to solve problems for the customer, not to build features and tools that are cool but hard to use. This strategy encourages all product decisions to be data-driven from a customer's perspective. It may mean that many team members regularly interact with end users (including with interviews) or have access to data that shows usage.
The agile methodology drastically reduces the time from project initiation to customer feedback. As customers' needs or how they interact with the product change, the team is flexible in responding to these needs to build customer-focused technology. This process creates a feedback loop for continuous improvement.
#### 2\. The only thing constant is change
While it may seem radical to change requirements, the agile methodology allows for changing requirements, even late into development.
This principle is closely tied to the first. If the end goal of the team is to serve the end user best, the team must be flexible and able to make changes based on customers' behaviors and needs. Flexibility also allows an organization to capitalize on an emerging technology or new trends and gain competitive advantage.
#### 3\. Deliver faster
Instead of annual or semi-annual product updates and patches, agile encourages regular updates when a need is identified or to improve operations. Waiting to do significant releases can bloat the technology and create unforeseen issues, no matter how much it has been tested.
Agile encourages the team to deliver working software frequently within a short time frame. Smaller, more frequent releases allow for regular updates to the technology without huge risk. If something goes out and doesn't work, it requires a slight pullback. The agile methodology also encourages automation to help push out updates continuously.
#### 4\. Build cross-functional teams
Agile methodology believes that the most well-thought-out, usable, and sellable technologies require cross-functional teams working towards a shared goal. DevOps (development and operations) and DevSecOps (development, security, and operations) teams work in concert instead of in a linear progression. This allows the business team, the developers, QA, and other essential teams to work together from start to finish.
This change in perspective means all teams have skin in the game and makes it harder to push errors or low-quality tech onto the next team. Rather than making excuses, everyone works together on the same goals.
For cross-functional teams to work, it takes involvement from the top. A third of projects [fail because of a lack of participation from senior management][6].
#### 5\. Encourage independent work
Another tenet of agile management is that individuals can stretch their job and learn new skills while working on projects. Because the teams are cross-functional, individuals are exposed to different abilities, roles, and styles. This exposure creates better-rounded workers who can attack problems from different perspectives.
Agile teams are typically self-directed. It takes the right team with a focused goal.
Agile allows managers to (per the Agile Manifesto) "build projects around motivated individuals. Give them the environment and support they need and trust them to get the job done."
#### 6\. Meet in person
While this principle may seem strange in the era of increased remote workers, agile management does encourage in-person meetings. This is because many managers believe the most efficient and effective method of conveying information is a face-to-face conversation.
For non-remote teams, this can mean having different team members sitting close together or even creating war rooms of different groups to communicate more effectively. Co-location means faster interactions. Instead of waiting for an email or call to be returned, talk to each other.
This goal can still be accomplished for remote teams. By using tools like Slack or Zoom, you can simulate in-person meetings and find the right answers quickly.
#### 7\. Go live
Organizations may have several ways to document the plan and measure success against goals. However, one of the best ways to measure a team's success in agile is via working software. Agile teams don't look at future forecasts to see how they are doing. Instead, live code is the primary measure of progress.
Planning and documentation are great, but without software that does the job, everything else is irrelevant.
#### 8\. Sustainable development
While agile development encourages fast releases, it is still vital that the team makes sustainable and scalable code. Because the first principle is to serve the customer, the team must think about creating technology and tools that can be used for the long haul.
The team should also be managed in a way that supports individuals. While long hours may be required for a short time, maintaining overall work-life balance is essential to avoid burnout.
#### 9\. Technical excellence
The agile methodology also believes that every member of the team is responsible for continuous attention to technical excellence. Even those without technical ability should QA the work and ensure it is being built in a simple and accessible way. While bells and whistles may be nice, the agile methodology holds that good design enhances agility.
Additionally, code should improve with each iteration. Everyone is responsible for providing clear code or instructions throughout the process—not just at the end.
#### 10\. Simplify
Agile teams believe that simplicity is essential. There's a saying in agile circles: "maximize the amount of work not done." Eliminate and automate anything you can, and build tools that are straightforward for the end user.
#### 11\. Let teams self-organize
"The best architectures, requirements, and designs emerge from self-organizing teams," says the Agile Manifesto. While management is needed for oversight, the best agile teams figure out what needs to be done—and how it gets done—themselves.
#### 12\. Take time to reflect
At regular intervals, the best teams reflect on how to become more effective, then adjust accordingly.
Agile teams are introspective and evaluate their efficiency. When they discover a better way, they evolve.
### Agile evolves with automation
Agile management has many benefits for the team and the end user. While the basic principles (like the nature of agile) have been established, the strategy is always evolving. Agile project management is evolving now by leveraging different types of automation.
For example, IT teams are leveraging [IT process automation to manage repetitive tasks][7] that used to take significant human resources. This allows teams to work more efficiently and focus on the bigger picture rather than monitoring, managing, and maintaining the software, hardware, infrastructure, and cloud services.
The more tasks that can be handled efficiently by your process automation rules, the quicker you will be able to iterate, test, and improve.
### Getting started
Overall, agile project management presents many benefits. It provides a faster way for teams to deliver a better product with fewer bugs. It can encourage diverse teams to work together and learn from each other. It fosters better team communication, both in-person and remotely. And it can ultimately create a better experience for the end user.
Agile, however, has some drawbacks. If a team is still exploring what technology or solutions to build or doesn't have a firm grasp of the target customer, agile may not be the best methodology. Agile may also have too many requirements for very small teams and may be too flexible for extremely large teams.
Always complete due diligence to identify which management style is best for your team, and consider combining agile with other methodologies to create the best structure for you.
--------------------------------------------------------------------------------
via: https://opensource.com/article/19/8/guide-agile-project-management
Author: [Matt Shealy, Daniel Oh, Leigh Griffin][a]
Topic selection: [lujun9972][b]
Translator: [译者ID](https://github.com/译者ID)
Proofreader: [校对者ID](https://github.com/校对者ID)
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](https://linux.cn/)
[a]: https://opensource.com/users/mshealyhttps://opensource.com/users/jkriegerhttps://opensource.com/users/daniel-ohhttps://opensource.com/users/mtakanehttps://opensource.com/users/ahmadnassrihttps://opensource.com/users/agagancarczykhttps://opensource.com/users/lgriffin
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/freesoftwareway_law3.png?itok=wyze_0fV (A diagram of a branching process)
[2]: https://www.pwc.com/gx/en/actuarial-insurance-services/assets/agile-project-delivery-confidence.pdf
[3]: https://www.pmi.org/-/media/pmi/documents/public/pdf/learning/thought-leadership/pulse/pulse-of-the-profession-2017.pdf
[4]: http://agilemanifesto.org/principles.html
[5]: https://heleo.com/ericries-5-reasons-continuously-update-product/5110/
[6]: https://www.business2community.com/strategy/project-management-statistics-45-stats-you-cant-ignore-02168819
[7]: https://www.atera.com/blog/it-automation/


@ -0,0 +1,245 @@
[#]: collector: (lujun9972)
[#]: translator: ( )
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (Apache Hive vs. Apache HBase: Which is the query performance champion?)
[#]: via: (https://opensource.com/article/19/8/apache-hive-vs-apache-hbase)
[#]: author: (Alex Bekker https://opensource.com/users/egor14https://opensource.com/users/sachinpb)
Apache Hive vs. Apache HBase: Which is the query performance champion?
======
Let's look closely at Apache Hive and Apache HBase to understand
which one can cope better with query performance.
![computer screen ][1]
It's super easy to get lost in the world of big data technologies. There are so many of them that it seems a day never passes without the advent of a new one. Still, such fast development is only half the trouble. The real problem is that it's difficult to understand the functionality and the intended use of the existing technologies.
To find out what technology suits their needs, IT managers often contrast them. We've also conducted an academic study to make a clear distinction between Apache Hive and Apache HBase—two important technologies that are frequently used in [Hadoop implementation projects][2].
### Data model comparison
#### Apache Hive's data model
To understand Apache Hive's data model, you should get familiar with its three main components: a table, a partition, and a bucket.
Hive's **table** doesn't differ a lot from a relational database table (the main difference is that there are no relations between the tables). Hive's tables can be managed or external. To understand the difference between these two types, let's look at the _load data_ and _drop a table_ operations. When you load data into a **managed table**, you actually move the data from Hadoop Distributed File System's (HDFS) inner data structures into the Hive directory (which is also in HDFS). And when you drop such a table, you delete the data it contains from the directory. In the case of **external tables**, Hive doesn't load the data into the Hive directory but creates a "ghost-table" that indicates where actual data is physically stored in HDFS. So, when you drop an external table, the data is not affected.
Both managed and external tables can be further broken down to **partitions**. A partition represents the rows of the table grouped together based on a **partition key**. Each partition is stored as a separate folder in the Hive directory. For instance, the table below can be partitioned based on a country, and the rows for each country will be stored together. Of course, this example is simplified. In real life, you'll deal with more than three partitions and much more than four rows in each, and partitioning will help you significantly reduce your partition key query execution time.
**Customer ID** | **Country** | **State/Province** | **City** | **Gender** | **Family status** | …
---|---|---|---|---|---|---
00001 | US | Nebraska | Beatrice | F | Single | …
00002 | Canada | Ontario | Toronto | F | Married | …
00003 | Brasil | Para | Belem | M | Married | …
00004 | Canada | Ontario | Toronto | M | Married | …
00005 | US | Nebraska | Aurora | M | Single | …
00006 | US | Arizona | Phoenix | F | Single | …
| | | | | |
00007 | US | Texas | Austin | F | Married |
… | … | | | … | … | …
You can break your data further into **buckets**, which are even easier to manage and enable faster query execution. Let's take the partition with the US data from our previous example and cluster it into buckets based on the Customer ID column. When you specify the number of buckets, Hive applies a hash function to the chosen column, which assigns a hash value to each row in the partition and then "packs" the rows into a certain number of buckets. So, if we have 10 million Customer IDs in the partition and specify the number of buckets as 50, each bucket will contain about 200,000 rows. As a result, if you need to find the data about a particular customer, Hive will directly go to the relevant bucket to find the info.
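To make the bucketing idea concrete, here is a simplified sketch of the concept in Python (an illustration only, not Hive's actual hash function or file layout):
```
NUM_BUCKETS = 50  # hypothetical bucket count chosen for the partition

def bucket_for(customer_id, num_buckets=NUM_BUCKETS):
    # Hash the clustering column and map the result onto a fixed number
    # of buckets, the same general idea Hive uses to "pack" rows.
    return hash(customer_id) % num_buckets

# Every row with the same Customer ID always lands in the same bucket,
# so a lookup for that customer only needs to read one bucket.
print(bucket_for("00001"))
```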
#### Apache HBase's data model
HBase also stores data in **tables**. The cells in an HBase table are organized by **row keys** and **column families**. Each column family has a set of storage properties (for example, row key encryption and data compression rules). In addition, there are **column qualifiers** to ease data management. Neither row keys nor column qualifiers have a data type assigned (they are always treated as bytes).
**Row key (Customer ID)** | **Geography: Country** | **Geography: State** | **Geography: City** | **Demographics: Gender** | **Demographics: Family status**
---|---|---|---|---|---
00001 | US | Texas | Austin | F | Single
00002 | Canada | Ontario | Toronto | F | Married
00003 | Brasil | Para | Belem | M | Married
00004 | Canada | Ontario | Toronto | M | Married
00005 | US | Arizona | Phoenix | M | Single
00006 | US | Nebraska | Aurora | F | Single
00007 | US | Nebraska | Beatrice | F | Married
Every **cell** has a timestamp, or, in other words, bears the mark of when it was created. This info is crucial during read operations, as it allows identifying the most recent (and therefore most up-to-date) data versions. You can specify a timestamp during a write operation; otherwise, HBase assigns the current timestamp to the cell automatically.
Data in a table is **lexicographically sorted based on row keys**, and to store closely related data together, a developer needs to design a good algorithm for row key composition.
As for **partitioning**, HBase does it automatically based on the row keys. Still, you can manage the process by changing the start and end row keys for each partition.
#### Key takeaways on data models
1. Both Hive and HBase are capable of organizing data in a way to enable quick access to the required data and reduce query execution time (though their approach to partitioning is different).
2. Both Hive and HBase act as data management agents. When somebody says that Hive or HBase stores data, it really means the data is stored in a data store (usually in HDFS). This means the success of your Hadoop endeavor goes beyond either/or technology choices and strongly depends on other [important factors][3], such as calculating the required cluster size correctly and integrating all the architectural components seamlessly.
### Query performance
#### Hive as an analytical query engine
Hive is specifically designed to enable data analytics. To successfully perform this task, it uses its dedicated **Hive Query Language** (HiveQL), which is very similar to analytics-tuned SQL.
Initially, Hive converted HiveQL queries into Hadoop MapReduce jobs, simplifying the lives of developers who could bypass more complicated MapReduce code. Running queries in Hive usually took some time, since Hive scanned all the available data sets, if not specified otherwise. It was possible to limit the volume of scanned data by specifying the partitions and buckets that Hive had to address. Anyway, that was batch processing. Nowadays, Apache Hive is also able to convert queries into Apache Tez or Apache Spark jobs.
The earliest versions of Hive did not provide **record-level updates, inserts, and deletes**, which was one of the most serious limitations in Hive. This functionality appeared only in version 0.14.0 (though with some [constraints][4]: for example, your table's file format should be [ORC][5]).
#### HBase as a data manager that supports queries
Being a data manager, HBase alone is not intended for analytical queries. It doesn't have a dedicated query language. To run CRUD (create, read, update, and delete) and search queries, it has a JRuby-based shell, which offers **simple data manipulation possibilities**, such as Get, Put, and Scan. For the first two operations, you should specify the row key, while scans run over a whole range of rows.
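For illustration only (HBase's native interface is the JRuby shell described above; the third-party happybase client, table name, and row keys here are assumptions), the same Get, Put, and Scan operations look roughly like this from Python:
```
import happybase  # third-party Thrift-based HBase client, used purely for illustration

connection = happybase.Connection("localhost")  # assumes a local HBase Thrift server
table = connection.table("customers")           # hypothetical table name

# Put: write one cell, addressed by row key and column family:qualifier
table.put(b"00001", {b"Geography:Country": b"US"})

# Get: fetch a single row by its row key
print(table.row(b"00001"))

# Scan: iterate over a lexicographically sorted range of row keys
for key, data in table.scan(row_start=b"00001", row_stop=b"00004"):
    print(key, data)
```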
HBase's primary purpose is to offer random data input/output for HDFS. At the same time, one can surely say that HBase contributes to fast analytics by enabling consistent reads. This is possible because HBase writes data to only one server, so there is no need to compare multiple data versions from different nodes. Besides, HBase **handles append operations very well.** It also enables **updates and deletes**, but doesn't handle those two quite as well.
#### Indexing
In Hive 3.0.0, indexing was removed. Prior to that, it was possible to create indexes on columns, though the advantage of faster queries had to be weighed against the cost of indexing during write operations and the extra space needed to store the indexes. Anyway, Hive's data model, with its ability to group data into buckets (which can be created for any column, not only the keyed one), offers an approach similar to the one indexing provides.
HBase enables multi-layered indexing. But again, you have to think about the trade-off between faster read query responses on the one hand and slower writes plus the costs of storing indexes on the other.
#### Key takeaways on query performance
1. Running analytical queries is exactly the task for Hive. HBase's initial task is to ingest data as well as run CRUD and search queries.
2. While HBase handles row-level updates, deletes, and inserts well, the Hive community is working to eliminate this stumbling block.
### To sum it up
There are many similarities between Hive and HBase. Both are data management agents, and both are strongly interconnected with HDFS. The main difference between these two is that HBase is tailored to perform CRUD and search queries while Hive does analytical ones. These two technologies complement each other and are frequently used together in Hadoop consulting projects so businesses can make the most of both applications' strengths.
--------------------------------------------------------------------------------
via: https://opensource.com/article/19/8/apache-hive-vs-apache-hbase
Author: [Alex Bekker][a]
Topic selection: [lujun9972][b]
Translator: [译者ID](https://github.com/译者ID)
Proofreader: [校对者ID](https://github.com/校对者ID)
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](https://linux.cn/)
[a]: https://opensource.com/users/egor14https://opensource.com/users/sachinpb
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/features_solutions_command_data.png?itok=4_VQN3RK (computer screen )
[2]: https://www.scnsoft.com/services/big-data/hadoop
[3]: https://www.scnsoft.com/blog/hadoop-implementation-milestones
[4]: http://community.cloudera.com/t5/Batch-SQL-Apache-Hive/Update-and-Delete-are-not-working-in-Hive/td-p/57358/page/3
[5]: https://orc.apache.org/


@ -0,0 +1,238 @@
[#]: collector: (lujun9972)
[#]: translator: ( )
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (Building a non-breaking breakpoint for Python debugging)
[#]: via: (https://opensource.com/article/19/8/debug-python)
[#]: author: (Liran Haimovitch https://opensource.com/users/liranhaimovitch)
Building a non-breaking breakpoint for Python debugging
======
Have you ever wondered how to speed up a debugger? Here are some lessons
learned while building one for Python.
![Real python in the graphic jungle][1]
This is the story of how our team at [Rookout][2] built non-breaking breakpoints for Python and some of the lessons we learned along the way. I'll be presenting all about the nuts and bolts of debugging in Python at [PyBay 2019][3] in San Francisco this month. Let's dig in.
### The heart of Python debugging: sys.settrace
There are many Python debuggers out there. Some of the more popular include:
* **pdb**, part of the Python standard library
* **PyDev**, the debugger behind the Eclipse and PyCharm IDEs
* **ipdb**, the IPython debugger
Despite the range of choices, almost every Python debugger is based on just one function: **sys.settrace**. And let me tell you, **[sys.settrace][4]** might just be the most complex function in the Python standard library.
![set_trace Python 2 docs page][5]
In simpler terms, **settrace** registers a trace function for the interpreter, which may be called in any of the following cases:
* Function call
* Line execution
* Function return
* Exception raised
A simple trace function might look like this:
```
def simple_tracer(frame, event, arg):
    co = frame.f_code
    func_name = co.co_name
    line_no = frame.f_lineno
    print("{e} {f} {l}".format(
        e=event, f=func_name, l=line_no))
    return simple_tracer
```
When looking at this function, the first things that come to mind are its arguments and return values. The trace function arguments are:
* **frame** object, which is the full state of the interpreter at the point of the function's execution
* **event** string, which can be **call**, **line**, **return**, or **exception**
* **arg** object, which is optional and depends on the event type
The trace function returns itself because the interpreter keeps track of two kinds of trace functions:
* **Global trace function (per thread):** This trace function is set for the current thread by **sys.settrace** and is invoked whenever a new **frame** is created by the interpreter (essentially on every function call). While there's no documented way to set the trace function for a different thread, you can call **threading.settrace** to set the trace function for all newly created **threading** module threads.
* **Local trace function (per frame):** This trace function is set by the interpreter to the value returned by the global trace function upon frame creation. There's no documented way to set the local trace function once the frame has been created.
This mechanism is designed to allow the debugger to have more granular control over which frames are traced to reduce performance impact.
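As a minimal sketch of that control (illustrative only, with a made-up file name check, and not Rookout's code), a global trace function can decline local tracing for frames it doesn't care about by returning None:
```
import sys

WATCHED_SUFFIX = "my_app.py"  # hypothetical: the only file we want to trace

def local_tracer(frame, event, arg):
    # Local trace function: invoked for every "line" event in a traced frame.
    if event == "line":
        print("line", frame.f_lineno, "in", frame.f_code.co_name)
    return local_tracer

def global_tracer(frame, event, arg):
    # Invoked on each "call" event. Returning None means the new frame gets
    # no local trace function, so its line events add no extra work.
    if frame.f_code.co_filename.endswith(WATCHED_SUFFIX):
        return local_tracer
    return None

sys.settrace(global_tracer)
```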
### Building our debugger in three easy steps (or so we thought)
With all that background, writing your own debugger using a custom trace function looks like a daunting task. Luckily, **pdb**, the standard Python debugger, is built on top of **Bdb**, a base class for building debuggers.
A naive breakpoints debugger based on **Bdb** might look like this:
```
import bdb
import inspect

class Debugger(bdb.Bdb):
    def __init__(self):
        bdb.Bdb.__init__(self)
        self.breakpoints = dict()
        self.set_trace()

    def set_breakpoint(self, filename, lineno, method):
        self.set_break(filename, lineno)
        try:
            self.breakpoints[(filename, lineno)].append(method)
        except KeyError:
            self.breakpoints[(filename, lineno)] = [method]

    def user_line(self, frame):
        if not self.break_here(frame):
            return
        # Get filename and lineno from frame
        (filename, lineno, _, _, _) = inspect.getframeinfo(frame)
        methods = self.breakpoints[(filename, lineno)]
        for method in methods:
            method(frame)
```
All this does is:
1. Inherits from **Bdb** and writes a simple constructor initializing the base class and tracing.
2. Adds a **set_breakpoint** method that uses **Bdb** to set the breakpoint and keeps track of our breakpoints.
3. Overrides the **user_line** method that is called by **Bdb** on certain user lines. The function makes sure it is being called for a breakpoint, gets the source location, and invokes the registered breakpoint callbacks (a hypothetical usage sketch follows below).
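For context, a hypothetical usage of this class might look like the following (the file path, line number, and callback are made up for illustration):
```
def print_locals(frame):
    # Breakpoint callback: receives the frame that hit the breakpoint.
    print("breakpoint hit:", frame.f_locals)

debugger = Debugger()
debugger.set_breakpoint("/path/to/my_module.py", 42, print_locals)  # hypothetical target
# From here on, reaching line 42 of my_module.py invokes print_locals(frame).
```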
### How well did the simple Bdb debugger work?
Rookout is about bringing a debugger-like user experience to production-grade performance and use cases. So, how well did our naive breakpoint debugger perform?
To test it and measure the global performance overhead, we wrote two simple test methods and executed each of them 16 million times under multiple scenarios. Keep in mind that no breakpoint was executed in any of the cases.
```
def empty_method():
   pass
def simple_method():
   a = 1
   b = 2
   c = 3
   d = 4
   e = 5
   f = 6
   g = 7
   h = 8
   i = 9
   j = 10
```
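As a rough way to reproduce this kind of measurement (a hypothetical harness, not the exact benchmark Rookout ran), you can time a test method with and without a no-op tracer installed and compare:
```
import sys
import timeit

def test_method():
    # Stand-in for the test methods above: a few cheap local assignments.
    a, b, c = 1, 2, 3

def noop_tracer(frame, event, arg):
    # Does nothing, but its presence forces the interpreter to invoke it
    # for every call and line event.
    return noop_tracer

baseline = timeit.timeit(test_method, number=1000000)
sys.settrace(noop_tracer)
traced = timeit.timeit(test_method, number=1000000)
sys.settrace(None)
print("baseline: {:.2f}s, with tracer: {:.2f}s".format(baseline, traced))
```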
Using the debugger takes a shocking amount of time to complete. The bad results make it clear that our naive **Bdb** debugger is not yet production-ready.
![First Bdb debugger results][6]
### Optimizing the debugger
There are three main ways to reduce debugger overhead:
1. **Limit local tracing as much as possible:** Local tracing is very costly compared to global tracing due to the much larger number of events per line of code.
2. **Optimize "call" events and return control to the interpreter faster:** The main work in **call** events is deciding whether or not to trace.
3. **Optimize "line" events and return control to the interpreter faster:** The main work in **line** events is deciding whether or not we hit a breakpoint.
So we forked **Bdb**, reduced the feature set, simplified the code, optimized for hot code paths, and got impressive results. However, we were still not satisfied. So, we took another stab at it, migrated and optimized our code to **.pyx**, and compiled it using [Cython][7]. The final results (as you can see below) were still not good enough. So, we ended up diving into CPython's source code and realizing we could not make tracing fast enough for production use.
![Second Bdb debugger results][8]
### Rejecting Bdb in favor of bytecode manipulation
After our initial disappointment from the trial-and-error cycles of standard debugging methods, we decided to look into a less obvious option: bytecode manipulation.
The Python interpreter works in two main stages:
1. **Compiling Python source code into Python bytecode:** This unreadable (for humans) format is optimized for efficient execution and is often cached in those **.pyc** files we have all come to love.
2. **Iterating through the bytecode in the _interpreter loop_:** This executes one instruction at a time.
This is the pattern we chose: use **bytecode manipulation** to set **non-breaking breakpoints** with no global overhead. This is done by finding the bytecode in memory that represents the source line we are interested in and inserting a function call just before the relevant instruction. This way, the interpreter does not have to do any extra work to support our breakpoints.
This approach is not magic. Here's a quick example.
We start with a very simple function:
```
def multiply(a, b):
   result = a * b
   return result
```
In documentation hidden in the **[inspect][9]** module (which has several useful utilities), we learn we can get the function's bytecode by accessing **multiply.func_code.co_code**:
```
'|\x00\x00|\x01\x00\x14}\x02\x00|\x02\x00S'
```
This unreadable string can be improved using the **[dis][10]** module in the Python standard library. By calling **dis.dis(multiply.func_code.co_code)**, we get:
```
  4          0 LOAD_FAST               0 (a)
             3 LOAD_FAST               1 (b)
             6 BINARY_MULTIPLY    
             7 STORE_FAST              2 (result)
  5         10 LOAD_FAST               2 (result)
            13 RETURN_VALUE      
```
This gets us closer to understanding what happens behind the scenes of debugging but not to a straightforward solution. Unfortunately, Python does not offer a method for changing a function's bytecode from within the interpreter. You can overwrite the function object, but that's not good enough for the majority of real-world debugging scenarios. You have to go about it in a roundabout way using a native extension.
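As a hint of what that lookup involves (a minimal sketch of locating the instruction offset for a source line, not Rookout's actual bytecode rewriting or its native extension), the standard **dis** module can map source lines to bytecode offsets:
```
import dis

def multiply(a, b):
    result = a * b
    return result

code = multiply.__code__  # spelled multiply.func_code in the Python 2 examples above
target_line = code.co_firstlineno + 1  # hypothetical target: the "result = a * b" line

# dis.findlinestarts() yields (bytecode offset, source line number) pairs.
for offset, lineno in dis.findlinestarts(code):
    if lineno == target_line:
        print("line", lineno, "starts at bytecode offset", offset)
```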
### Conclusion
When building a new tool, you invariably end up learning a lot about how stuff works. It also makes you think out of the box and keep your mind open to unexpected solutions.
Working on non-breaking breakpoints for Rookout has taught me a lot about compilers, debuggers, server frameworks, concurrency models, and much much more. If you are interested in learning more about bytecode manipulation, Google's open source **[cloud-debug-python][11]** has tools for editing bytecode.
* * *
_Liran Haimovitch will present "[Understanding Python's Debugging Internals][12]" at [PyBay][3], which will be held August 17-18 in San Francisco. Use code [OpenSource35][13] for a discount when you purchase your ticket to let them know you found out about the event from our community._
--------------------------------------------------------------------------------
via: https://opensource.com/article/19/8/debug-python
Author: [Liran Haimovitch][a]
Topic selection: [lujun9972][b]
Translator: [译者ID](https://github.com/译者ID)
Proofreader: [校对者ID](https://github.com/校对者ID)
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](https://linux.cn/)
[a]: https://opensource.com/users/liranhaimovitch
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/python_jungle_lead.jpeg?itok=pFKKEvT- (Real python in the graphic jungle)
[2]: https://rookout.com/
[3]: https://pybay.com/
[4]: https://docs.python.org/3/library/sys.html#sys.settrace
[5]: https://opensource.com/sites/default/files/uploads/python2docs.png (set_trace Python 2 docs page)
[6]: https://opensource.com/sites/default/files/uploads/debuggerresults1.png (First Bdb debugger results)
[7]: https://cython.org/
[8]: https://opensource.com/sites/default/files/uploads/debuggerresults2.png (Second Bdb debugger results)
[9]: https://docs.python.org/2/library/inspect.html
[10]: https://docs.python.org/2/library/dis.html
[11]: https://github.com/GoogleCloudPlatform/cloud-debug-python
[12]: https://pybay.com/speaker/liran-haimovitch/
[13]: https://ti.to/sf-python/pybay2019/discount/OpenSource35


@ -0,0 +1,86 @@
[#]: collector: (lujun9972)
[#]: translator: ( )
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (To equip tomorrow's cybersecurity experts, we'll need an open approach)
[#]: via: (https://opensource.com/open-organization/19/8/open-cybersecurity-education)
[#]: author: (rjoyce https://opensource.com/users/rjoycehttps://opensource.com/users/hypercoyotehttps://opensource.com/users/debbryant)
To equip tomorrow's cybersecurity experts, we'll need an open approach
======
An open approach to training the next generation of cybersecurity
experts can fully equip them to combat a constantly shifting threat
landscape.
![Locks and a fingerprint][1]
Today's world—marked by an increase of Internet-connected devices, digital assets, and information systems infrastructure—demands more cybersecurity professionals. Cybersecurity is the practice of defending these devices, assets, and systems against malicious cyberattacks from both internal and external entities. Often these cyberattacks are linked to cybercrimes, or crimes committed using a computer to generate profit or to affect the integrity, availability, and confidentiality of the data or system. In 2016, cybercrimes [cost the global economy more than $450 billion][2].
Developing a robust cybersecurity workforce is therefore essential for mitigating the effects of cybercrime on the global economy. The United States Bureau of Labor Statistics has predicted [a shortage of 1.8 million cybersecurity professionals by the year 2022][3]. The United States has already developed a working group, the National Initiative for Cybersecurity Education (NICE), to promote cybersecurity education. Educators play a critical role in promoting cybersecurity as early as possible in academic organizations. And they should take an open approach to doing it.
It's critical for students to not only become acquainted with the advantages of open source software but also to develop strong skills working openly, since open source software is not only common in the IT industry in general, but is specifically necessary in the field of cybersecurity. With this approach, students can learn within the safety and guidance of the classroom while also naturally acquiring research and troubleshooting skills by facing challenges that are presented or arise during exercises.
In this article, we'll explain how experiencing these challenges in the classroom environment is imperative for preparing students for the industry and equipping them to face the unforgiving challenges that await them in the IT industry—especially in the rapidly evolving cybersecurity field.
### Developing an open approach to cybersecurity education
Open source software, open source communities, and open source principles have been pivotal in the adoption of computer automation that is so common today. For instance, most smart devices are running a version of the Linux kernel. In the cybersecurity field, it's common to find Linux at the heart of most operating systems that are running on security appliances. But going beyond the operating system, Ansible has taken the management scene by storm, allowing for simplified automation of management tasks that even professionals without programming or scripting experience can quickly grasp and begin to implement. In addition to the benefits of automation, a variety of open source applications provide seemingly limitless capabilities for computer users—such as the ability to create video, music, games, or graphic designs on par with proprietary software. Open source software has often been the creative spark that has enabled countless individuals to pursue goals that would have otherwise been unobtainable.
Open source has had the same democratizing effect for cybersecurity professionals. Like other open source projects, open source cybersecurity tools receive extensive community support, so they're often some of the most-used security tools in existence today. Such tools include Nmap, OpenVAS, OSSEC, Metasploit Framework, Wireshark, and the Kali Linux distribution, to name a few. These open source tools are an invaluable asset for educators, as they provide an opportunity for students to use the same cybersecurity tools currently being used in industry—but within a safe learning environment, a factor that is critical for student growth in the field.
In Murray State University's Telecommunications Systems Management (TSM) program, we're developing curricula and resources aimed at getting students excited about cybersecurity and motivated to pursue it. But students often enter the program with little or no understanding of open source principles or software, so bringing participants up to speed has been one of our biggest challenges. That's why we've partnered with [Red Hat Academy][4] to supplement our materials and instill fundamental Linux skills and knowledge into our students. This foundation not only prepares students to use the open source security tools that are based on Linux operating systems but also equips them to experiment with a wider variety of Linux-based open source cybersecurity tools, giving them valuable, hands-on experience. And since these tools are freely available, they can continue practicing their skills outside the classroom.
### Equipping students for a collaborative industry
As we've said, open source software's ubiquity and ample community support make it critical to the field of cybersecurity. In the TSM program, our courses incorporate open tools and open practices to simulate the environments students should expect to find if they choose to enter the cybersecurity industry. By creating this type of learning experience in the classroom—a place where instructors can offer immediate guidance and the stakes are low—we're able to help students gain the critical thinking skills needed for the variety of challenges they'll encounter in the field.
Chief among these, for example, are the skills associated with seeking, assessing, and understanding resources from cybersecurity communities. In our courses, we emphasize the process of researching community forums and reading software documentation. Because no one could ever hope to prepare students for every situation they might encounter in the field, we help students _train themselves_ to use the tools at their disposal to resolve different situations that may arise. Because open source cybersecurity tools often give rise to engaged and supportive communities, students have the opportunity to develop troubleshooting skills when they encounter challenges by discovering solutions in conversation with people outside the classroom. Developing the ability to quickly and efficiently research problems and solutions is critical for a cybersecurity student, since technology (and the threat landscape) is always evolving.
### A more authentic operating system experience
Most operating systems courses take a narrow approach focused on proprietary software, which is an injustice to students as it denies them access to the diversity of the operating systems found in the IT industry. For instance, as companies are moving their services to the cloud, they are increasingly running on open source, Linux-based operating systems. Additionally, since open source software enables developers to repackage the software and customize distributions, many are adopting these varying distributions of Linux simply because they are a better fit for a particular application. Still others are moving their servers from proprietary platforms to Linux due to the attraction of the accountability that comes with open source software—especially in light of frustrations that occur when proprietary vendors push updates that cause major issues in their infrastructure.
In the TSM courses, our students gain a strong understanding of foundational Linux concepts. In particular, the curricula from Red Hat Academy give students granular experience with many of the foundational commands and an understanding of a popular open source system design. Linux also has a well-developed community of users, developers, and tinkerers that provides an excellent forum where students can ask other open source users for help. Developing a strong foundation in Linux is critical as students progress through the TSM program. As they work through their courses, they naturally build their knowledge and skills, and this hands-on experience prepares them for a variety of careers, whether as traditional security analysts or in penetration testing with Kali Linux. No matter their path, a strong Linux background is essential.
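To give a sense of what "foundational commands" means in practice, here is a small, illustrative sample of the kinds of exercises such coursework can involve; the user name and paths below are placeholders of our own, not items from the Red Hat Academy curriculum:

```
# inspect local accounts and group membership (user name is a placeholder)
id studentuser
getent passwd studentuser

# review and tighten file permissions on a sample report directory
ls -l /srv/reports
chmod 640 /srv/reports/findings.txt

# list the services currently listening on the network
ss -tulpn
```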
### Embracing community-driven development
One of the major frustrations in the IT field is being forced to use tools that simply do not work or quickly become unusable. Often, software purchased to accomplish some particular task will quickly become obsolete as the vendor offers "upgrades" and "add-ons" to accommodate the changing needs of its customers—at a price. This experience isn't limited to IT experts; end users feel the same frustration. Driving this practice is, naturally, a desire to maintain long-term profits, as companies must continue to sell software to survive or must lock their users into subscription models.
The fact that much of the open source software in use today is provided free of charge is enough to draw industry experts to it. But open source software is more than just freeware. Because the users of these tools have formed such large communities, the tools receive correspondingly strong support. It's not unusual to see small projects grow into full software suites as users submit feedback to community-driven development. This kind of feedback often produces products that are superior to their paid counterparts, which lack such a direct line into the community they seek to serve. This is absolutely true of cybersecurity tools, where the majority of the most popular tools are open source, community-driven projects. In the TSM program, students become well-versed in tools such as these, thanks to the availability and free distribution that open source software affords. Through hands-on use, they gain a firm understanding of how to put these tools to work.
### Future proofing
Staying relevant in the IT industry is a constant battle, especially when dealing with the many products and solutions that are always seeking to gain market share. The battle extends to the "soldiers on the ground" as well, who may find it difficult to maintain a diversified toolset when many solutions are priced out of their reach.
Open source software provides students, who come from a variety of socio-economic backgrounds, with the opportunity to expand their experience without needing to be employed in a particular field, because the software is readily available through open source distribution channels. Similarly, graduates who land jobs in one particular segment of the market can still build their skills in _other_ areas that interest them, thanks to the breadth of open source software commonly used in the IT industry.
As we teach these students to train themselves, expose them to the variety of tools at their disposal, and show them how widely those tools are used, they become not only equipped to enter the workforce but also empowered to stay ahead of the game.
_(This article is part of the_ [Open Organization Guide for Educators][5] _project.)_
Working on cybersecurity and looking for support for your project? The Homeland Open Security...
--------------------------------------------------------------------------------
via: https://opensource.com/open-organization/19/8/open-cybersecurity-education
作者:[rjoyce][a]
选题:[lujun9972][b]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]: https://opensource.com/users/rjoyce
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/security_privacy_lock.png?itok=ZWjrpFzx (Locks and a fingerprint)
[2]: http://www.hiscox.com/cyber-readiness-report.pdf
[3]: https://iamcybersafe.org/wpcontent/uploads/2017/06/Europe-GISWS-Report.pdf
[4]: https://www.redhat.com/en/services/training/red-hat-academy
[5]: https://github.com/open-organization-ambassadors/open-org-educators-guide

View File

@ -0,0 +1,105 @@
[#]: collector: (lujun9972)
[#]: translator: ( )
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (How to Reinstall Ubuntu in Dual Boot or Single Boot Mode)
[#]: via: (https://itsfoss.com/reinstall-ubuntu/)
[#]: author: (Abhishek Prakash https://itsfoss.com/author/abhishek/)
How to Reinstall Ubuntu in Dual Boot or Single Boot Mode
======
So you have messed up your Ubuntu system and, after trying numerous ways to fix it, you have finally given up and decided to take the easy way out: reinstalling Ubuntu.
We have all been in a situation where reinstalling Linux seems a better idea than trying to troubleshoot and fix the issue for good. Troubleshooting a Linux system teaches you a lot, but you cannot always afford to spend more time fixing a broken system.
As far as I know, there is no Windows-like recovery drive system in Ubuntu. So the question arises: how do you reinstall Ubuntu? Let me show you how.
Warning!
Playing with disk partitions is always a risky task. I strongly recommend making a backup of your data on an external disk.
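If your current system still boots, one quick way to make that backup is to rsync your home directory to a mounted external drive. This is only a sketch; the mount point below is an example and will differ on your machine:

```
# assumes the external disk is mounted at /media/$USER/backup -- adjust to your setup
rsync -avh --progress /home/$USER/ /media/$USER/backup/home-backup/
```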
### How to reinstall Ubuntu Linux
![][1]
Here are the steps to follow for reinstalling Ubuntu.
#### Step 1: Create a live USB
First, download Ubuntu from its website. You can download [whichever Ubuntu version][2] you want to use.
[Download Ubuntu][3]
Once you have got the ISO image, it's time to create a live USB from it. If your Ubuntu system is still accessible, you can create a live disk using the startup disk creator tool provided by Ubuntu.
If you cannot access your Ubuntu system, you'll have to use another system. You can refer to this article to learn [how to create a live USB of Ubuntu in Windows][4].
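If the other system you have access to is itself a Linux machine, the `dd` command is another option for writing the ISO to the stick. Double-check the device name first, because `dd` overwrites whatever you point it at; the ISO path and `/dev/sdX` below are placeholders:

```
# confirm which device is the USB stick (check the sizes; e.g. /dev/sdX)
lsblk

# write the ISO to the stick -- this erases the stick completely
sudo dd if=~/Downloads/ubuntu.iso of=/dev/sdX bs=4M status=progress conv=fsync
```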
#### Step 2: Reinstall Ubuntu
Once you have got the live USB of Ubuntu, plug in the USB and reboot your system. At boot time, press the F2/F10/F12 key to get into the BIOS settings and make sure that the “Boot from Removable Devices/USB” option is at the top of the boot order. Save and exit the BIOS. This will allow you to boot into the live USB.
Once you are in the live USB session, choose to install Ubuntu. You'll get the usual options for choosing your language and keyboard layout, along with the option to download updates, etc.
![Go ahead with regular installation option][5]
The important step comes now. You should see an “Installation Type” screen. What you see here depends heavily on how Ubuntu sees the disk partitioning and the operating systems installed on your system.
Be very careful reading the options and their details at this step. Pay attention to what each option says. The options on this screen may look different on different systems.
![Reinstall Ubuntu option in dual boot mode][7]
In my case, the installer finds that I have Ubuntu 18.04.2 and Windows installed on my system, and it gives me a few options.
The first option here is to erase Ubuntu 18.04.2 and reinstall it. It tells me that it will delete my personal data, but it says nothing about deleting the other operating system (i.e., Windows).
If you are super lucky, or in single boot mode, you may see a “Reinstall Ubuntu” option. This option keeps your existing data and even tries to keep the installed software. If you see this option, you should go for it.
Attention for Dual Boot System
If you are dual booting Ubuntu and Windows and, during the reinstall, your Ubuntu system doesn't see Windows, you must go for the “Something else” option and install Ubuntu from there. I have described the [process of reinstalling Linux in dual boot in this tutorial][8].
For me, there was no “reinstall and keep the data” option, so I went for the “Erase Ubuntu and reinstall” option. This installs Ubuntu afresh even if it is in dual boot mode with Windows.
The reinstalling part is why I recommend using separate partitions for root and home. With that, you can keep the data in your home partition safe even if you reinstall Linux. I have already demonstrated this in a video as well.
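If you are not sure whether your current install already uses a separate home partition, you can check from the live session (or from the broken system, if it still boots) before committing to anything; the output will vary from machine to machine:

```
# show partitions, filesystems, and mount points; a separate /home appears as its own entry
lsblk -f

# on a running system, compare the filesystems backing / and /home
df -h / /home
```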
Once you have chosen the reinstall Ubuntu option, the rest of the process is just clicking Next. Select your location and, when asked, create your user account.
![Just go on with the installation options][9]
Once the procedure finishes, you'll have Ubuntu reinstalled afresh.
In this tutorial, I have assumed that you know your way around because you have had Ubuntu installed before. If you need clarification at any step, please feel free to ask in the comment section.
--------------------------------------------------------------------------------
via: https://itsfoss.com/reinstall-ubuntu/
作者:[Abhishek Prakash][a]
选题:[lujun9972][b]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]: https://itsfoss.com/author/abhishek/
[b]: https://github.com/lujun9972
[1]: https://i0.wp.com/itsfoss.com/wp-content/uploads/2019/08/Reinstall-Ubuntu.png?resize=800%2C450&ssl=1
[2]: https://itsfoss.com/which-ubuntu-install/
[3]: https://ubuntu.com/download/desktop
[4]: https://itsfoss.com/create-live-usb-of-ubuntu-in-windows/
[5]: https://i0.wp.com/itsfoss.com/wp-content/uploads/2019/08/reinstall-ubuntu-1.jpg?resize=800%2C473&ssl=1
[6]: https://itsfoss.com/update-ubuntu/
[7]: https://i1.wp.com/itsfoss.com/wp-content/uploads/2019/08/reinstall-ubuntu-dual-boot.jpg?ssl=1
[8]: https://itsfoss.com/replace-linux-from-dual-boot/
[9]: https://i1.wp.com/itsfoss.com/wp-content/uploads/2019/08/reinstall-ubuntu-3.jpg?ssl=1
[10]: https://itsfoss.com/fix-no-wireless-network-ubuntu/

View File

@ -0,0 +1,87 @@
[#]: collector: (lujun9972)
[#]: translator: (geekpi)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (How to manipulate PDFs on Linux)
[#]: via: (https://www.networkworld.com/article/3430781/how-to-manipulate-pdfs-on-linux.html)
[#]: author: (Sandra Henry-Stocker https://www.networkworld.com/author/Sandra-Henry_Stocker/)
How to manipulate PDFs on Linux
======
The pdftk command offers many command-line operations for working with PDFs, including merging pages, encrypting files, applying watermarks, compressing files, and even repairing PDFs.
![Toshiyuki IMAI \(CC BY-SA 2.0\)][1]
While PDFs are generally regarded as fairly stable files, there is a lot you can do with them on Linux and other systems: merging, splitting, rotating, breaking them into single pages, encrypting and decrypting, applying watermarks, compressing and uncompressing, and even repairing. The **pdftk** command can do all of this and more.
"pdftk" stands for "PDF tool kit." The command is very easy to use and does a good job of manipulating PDFs. For example, to pull separate files together into a single file, you can use the following command:
```
$ pdftk pg1.pdf pg2.pdf pg3.pdf pg4.pdf pg5.pdf cat output OneDoc.pdf
```
OneDoc.pdf will contain all five of the documents shown above, and the command will run in a matter of seconds. Note that the **cat** option directs the files to be joined together, and the **output** option specifies the name of the new file.
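If you want to confirm the result, pdftk's dump_data operation reports a file's metadata, including a NumberOfPages line; for the merged file above (assuming each input was a single page), that count should come back as 5:

```
$ pdftk OneDoc.pdf dump_data | grep NumberOfPages
```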
You can also extract selected pages from a PDF to create a separate PDF file. For example, if you wanted to create a new PDF containing only pages 1, 2, 3, and 5 of the document created above, you could do this:
```
$ pdftk OneDoc.pdf cat 1-3 5 output 4pgs.pdf
```
Alternatively, if you wanted pages 1, 3, 4, and 5, you could use this command:
```
$ pdftk OneDoc.pdf cat 1 3-end output 4pgs.pdf
```
You can select individual pages or ranges of pages, as the examples above show.
The next command will create a collated document from one file containing the odd-numbered pages (1, 3, etc.) and another containing the even-numbered pages (2, 4, etc.):
```
$ pdftk A=odd.pdf B=even.pdf shuffle A B output collated.pdf
```
Note that the **shuffle** option makes the collation happen and dictates the order in which the documents are used. Also note that, while the example above suggests odd/even pages, you are not limited to using only two files.
If you want to create an encrypted PDF that can only be opened by recipients who know the password, you can use a command like this:
```
$ pdftk prep.pdf output report.pdf user_pw AsK4n0thingGeTn0thing
```
The options provide a choice of 40-bit (**encrypt_40bit**) or 128-bit (**encrypt_128bit**) encryption. By default, 128-bit encryption is used.
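As a sketch of how these options combine, the command below requests 40-bit encryption, sets an owner password alongside the user password, and still allows printing; the file names and passwords are placeholders:

```
$ pdftk prep.pdf output report.pdf \
      encrypt_40bit owner_pw S3cretOwnerPw user_pw AsK4n0thingGeTn0thing \
      allow printing
```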
You can also break a PDF file into individual pages using the **burst** option:
```
$ pdftk allpgs.pdf burst
$ ls -ltr *.pdf | tail -5
-rw-rw-r-- 1 shs shs 22933 Aug 8 08:18 pg_0001.pdf
-rw-rw-r-- 1 shs shs 23773 Aug 8 08:18 pg_0002.pdf
-rw-rw-r-- 1 shs shs 23260 Aug 8 08:18 pg_0003.pdf
-rw-rw-r-- 1 shs shs 23435 Aug 8 08:18 pg_0004.pdf
-rw-rw-r-- 1 shs shs 23136 Aug 8 08:18 pg_0005.pdf
```
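The rotating, watermarking, compressing, and repairing mentioned at the start are also single commands. The lines below are a sketch based on pdftk's documented operations; the file names are placeholders:

```
$ pdftk doc.pdf cat 1-endeast output rotated.pdf             # rotate every page 90 degrees clockwise
$ pdftk doc.pdf background watermark.pdf output marked.pdf   # apply watermark.pdf as a page background
$ pdftk doc.pdf output smaller.pdf compress                  # restore compression on page streams
$ pdftk damaged.pdf output repaired.pdf                      # rewriting a damaged file often repairs it
```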
The **pdftk** command makes it quite easy to merge, split, rebuild, and encrypt PDF files. To learn more of its options, check out the examples page at [PDF Labs][3].
Join the Network World communities on [Facebook][5] and [LinkedIn][6] to comment on topics that are top of mind.
--------------------------------------------------------------------------------
via: https://www.networkworld.com/article/3430781/how-to-manipulate-pdfs-on-linux.html
作者:[Sandra Henry-Stocker][a]
选题:[lujun9972][b]
译者:[geekpi](https://github.com/geekpi)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]: https://www.networkworld.com/author/Sandra-Henry_Stocker/
[b]: https://github.com/lujun9972
[1]: https://images.idgesg.net/images/article/2019/08/book-pages-100807709-large.jpg
[3]: https://www.pdflabs.com/docs/pdftk-cli-examples/
[5]: https://www.facebook.com/NetworkWorld/
[6]: https://www.linkedin.com/company/network-world