From fd942107e16fd7fade468c39e233c160eb302c31 Mon Sep 17 00:00:00 2001 From: darksun Date: Mon, 25 Mar 2019 12:13:00 +0800 Subject: [PATCH 1/6] =?UTF-8?q?=E9=80=89=E9=A2=98:=2020190322=20How=20to?= =?UTF-8?q?=20Install=20OpenLDAP=20on=20Ubuntu=20Server=2018.04=20sources/?= =?UTF-8?q?tech/20190322=20How=20to=20Install=20OpenLDAP=20on=20Ubuntu=20S?= =?UTF-8?q?erver=2018.04.md?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- ...Install OpenLDAP on Ubuntu Server 18.04.md | 205 ++++++++++++++++++ 1 file changed, 205 insertions(+) create mode 100644 sources/tech/20190322 How to Install OpenLDAP on Ubuntu Server 18.04.md diff --git a/sources/tech/20190322 How to Install OpenLDAP on Ubuntu Server 18.04.md b/sources/tech/20190322 How to Install OpenLDAP on Ubuntu Server 18.04.md new file mode 100644 index 0000000000..a4325fe74b --- /dev/null +++ b/sources/tech/20190322 How to Install OpenLDAP on Ubuntu Server 18.04.md @@ -0,0 +1,205 @@ +[#]: collector: (lujun9972) +[#]: translator: ( ) +[#]: reviewer: ( ) +[#]: publisher: ( ) +[#]: url: ( ) +[#]: subject: (How to Install OpenLDAP on Ubuntu Server 18.04) +[#]: via: (https://www.linux.com/blog/2019/3/how-install-openldap-ubuntu-server-1804) +[#]: author: (Jack Wallen https://www.linux.com/users/jlwallen) + +How to Install OpenLDAP on Ubuntu Server 18.04 +====== + +![OpenLDAP][1] + +In part one of this short tutorial series, Jack Wallen explains how to install OpenLDAP. + +[Creative Commons Zero][2] + +The Lightweight Directory Access Protocol (LDAP) allows for the querying and modification of an X.500-based directory service. In other words, LDAP is used over a Local Area Network (LAN) to manage and access a distributed directory service. LDAP’s primary purpose is to provide a set of records in a hierarchical structure. What can you do with those records? The best use-case is for user validation/authentication against desktops. 
If both server and client are set up properly, you can have all your Linux desktops authenticating against your LDAP server. This makes for a great single point of entry so that you can better manage (and control) user accounts. + +The most popular iteration of LDAP for Linux is [OpenLDAP][3]. OpenLDAP is a free, open-source implementation of the Lightweight Directory Access Protocol, and makes it incredibly easy to get your LDAP server up and running. + +In this three-part series, I’ll be walking you through the steps of: + + 1. Installing OpenLDAP server. + + 2. Installing the web-based LDAP Account Manager. + + 3. Configuring Linux desktops, such that they can communicate with your LDAP server. + + + + +In the end, all of your Linux desktop machines (that have been configured properly) will be able to authenticate against a centralized location, which means you (as the administrator) have much more control over the management of users on your network. + +In this first piece, I’ll be demonstrating the installation and configuration of OpenLDAP on Ubuntu Server 18.04. All you will need to make this work is a running instance of Ubuntu Server 18.04 and a user account with sudo privileges. +Let’s get to work. + +### Update/Upgrade + +The first thing you’ll want to do is update and upgrade your server. Do note, if the kernel gets updated, the server will need to be rebooted (unless you have Live Patch, or a similar service running). Because of this, run the update/upgrade at a time when the server can be rebooted. +To update and upgrade Ubuntu, log into your server and run the following commands: + +``` +sudo apt-get update + +sudo apt-get upgrade -y +``` + +When the upgrade completes, reboot the server (if necessary), and get ready to install and configure OpenLDAP. + +### Installing OpenLDAP + +Since we’ll be using OpenLDAP as our LDAP server software, it can be installed from the standard repository. 
To install the necessary pieces, log into your Ubuntu Server and issue the following command: + +`sudo apt-get install slapd ldap-utils -y` + +During the installation, you’ll first be asked to create an administrator password for the LDAP directory. Type and verify that password (Figure 1). + +![password][4] + +Figure 1: Creating an administrator password for LDAP. + +[Used with permission][5] + +### Configuring LDAP + +With the installation of the components complete, it’s time to configure LDAP. Fortunately, there’s a handy tool we can use to make this happen. From the terminal window, issue the command: + +``` +sudo dpkg-reconfigure slapd +``` + +In the first window, hit Enter to select No and continue on. In the second window of the configuration tool (Figure 2), you must type the DNS domain name for your server. This will serve as the base DN (the point from which a server will search for users) for your LDAP directory. In my example, I’ve used example.com (you’ll want to change this to fit your needs). + +![domain name][6] + +Figure 2: Configuring the domain name for LDAP. + +[Used with permission][5] + +In the next window, type your Organizational name (i.e., the name of your company or department). You will then be prompted to (once again) create an administrator password (you can use the same one as you did during the installation). Once you’ve taken care of that, you’ll be asked the following questions: + + * Database backend to use - Select **MDB**. + + * Do you want the database to be removed when slapd is purged? - Select **No.** + + * Move old database? - Select **Yes.** + + + + +OpenLDAP is now ready for data. + +### Adding Initial Data + +Now that OpenLDAP is installed and running, it’s time to populate the directory with a bit of initial data. In the second piece of this series, we’ll be installing a web-based GUI that makes it much easier to handle this task, but it’s always good to know how to add data the manual way. 
+ +One of the best ways to add data to the LDAP directory is via a text file, which can then be imported with the __ldapadd__ command. Create a new file with the command: + +``` +nano ldap_data.ldif +``` + +In that file, paste the following contents: + +``` +dn: ou=People,dc=EXAMPLE,dc=COM +objectClass: organizationalUnit +ou: People + +dn: ou=Groups,dc=EXAMPLE,dc=COM +objectClass: organizationalUnit +ou: Groups + +dn: cn=DEPARTMENT,ou=Groups,dc=EXAMPLE,dc=COM +objectClass: posixGroup +cn: DEPARTMENT +gidNumber: 5000 + +dn: uid=USER,ou=People,dc=EXAMPLE,dc=COM +objectClass: inetOrgPerson +objectClass: posixAccount +objectClass: shadowAccount +uid: USER +sn: LASTNAME +givenName: FIRSTNAME +cn: FULLNAME +displayName: DISPLAYNAME +uidNumber: 10000 +gidNumber: 5000 +userPassword: PASSWORD +gecos: FULLNAME +loginShell: /bin/bash +homeDirectory: USERDIRECTORY +``` + +In the above file, every entry in all caps needs to be modified to fit your company needs. A blank line separates one entry from the next, so keep each entry’s attributes on consecutive lines, and make sure the cn value of the group matches the cn used in its dn. Once you’ve modified the above file, save and close it with the [Ctrl]+[x] key combination. + +To add the data from the file to the LDAP directory, issue the command: + +``` +ldapadd -x -D cn=admin,dc=EXAMPLE,dc=COM -W -f ldap_data.ldif +``` + +Remember to alter the dc entries (EXAMPLE and COM) in the above command to match your domain name. After running the command, you will be prompted for the LDAP admin password. When you successfully authenticate to the LDAP server, the data will be added. You can then ensure the data is there by running a search like so: + +``` +ldapsearch -x -LLL -b dc=EXAMPLE,dc=COM 'uid=USER' cn gidNumber +``` + +Where EXAMPLE and COM are your domain name and USER is the user to search for. The command should report the entry you searched for (Figure 3). + +![search][7] + +Figure 3: Our search was successful. 
+ +[Used with permission][5] + +Now that you have your first entry into your LDAP directory, you can edit the above file to create even more. Or, you can wait until the next entry into the series (installing LDAP Account Manager) and take care of the process with the web-based GUI. Either way, you’re one step closer to having LDAP authentication on your network. + +-------------------------------------------------------------------------------- + +via: https://www.linux.com/blog/2019/3/how-install-openldap-ubuntu-server-1804 + +作者:[Jack Wallen][a] +选题:[lujun9972][b] +译者:[译者ID](https://github.com/译者ID) +校对:[校对者ID](https://github.com/校对者ID) + +本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出 + +[a]: https://www.linux.com/users/jlwallen +[b]: https://github.com/lujun9972 +[1]: https://www.linux.com/sites/lcom/files/styles/rendered_file/public/ldap.png?itok=r9viT8n6 (OpenLDAP) +[2]: /LICENSES/CATEGORY/CREATIVE-COMMONS-ZERO +[3]: https://www.openldap.org/ +[4]: https://www.linux.com/sites/lcom/files/styles/rendered_file/public/ldap_1.jpg?itok=vbWScztB (password) +[5]: /LICENSES/CATEGORY/USED-PERMISSION +[6]: https://www.linux.com/sites/lcom/files/styles/rendered_file/public/ldap_2.jpg?itok=10CSCm6Z (domain name) +[7]: https://www.linux.com/sites/lcom/files/styles/rendered_file/public/ldap_3.jpg?itok=df2Y65Dv (search) From 0b0a6c24a05a2c3addd6f4a5411220e71443e944 Mon Sep 17 00:00:00 2001 From: darksun Date: Mon, 25 Mar 2019 12:15:20 +0800 Subject: [PATCH 2/6] =?UTF-8?q?=E9=80=89=E9=A2=98:=2020190323=20How=20to?= =?UTF-8?q?=20transition=20into=20a=20Developer=20Relations=20career=20sou?= =?UTF-8?q?rces/talk/20190323=20How=20to=20transition=20into=20a=20Develop?= =?UTF-8?q?er=20Relations=20career.md?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- ...ition into a Developer Relations career.md | 74 +++++++++++++++++++ 1 file changed, 74 insertions(+) create mode 100644 
sources/talk/20190323 How to transition into a Developer Relations career.md diff --git a/sources/talk/20190323 How to transition into a Developer Relations career.md b/sources/talk/20190323 How to transition into a Developer Relations career.md new file mode 100644 index 0000000000..40de0edf21 --- /dev/null +++ b/sources/talk/20190323 How to transition into a Developer Relations career.md @@ -0,0 +1,74 @@ +[#]: collector: (lujun9972) +[#]: translator: ( ) +[#]: reviewer: ( ) +[#]: publisher: ( ) +[#]: url: ( ) +[#]: subject: (How to transition into a Developer Relations career) +[#]: via: (https://opensource.com/article/19/3/developer-relations-career) +[#]: author: (Mary Thengvall https://opensource.com/users/marygrace-0) + +How to transition into a Developer Relations career +====== + +Combine your love for open source software with your love for the community in a way that allows you to invest your time in both. + +![][1] + +Let's say you've found an open source project you really love and you want to do more than just contribute. Or you love coding, but you don't want to spend the rest of your life interacting more with your computer than you do with people. How do you combine your love for open source software with your love for the community in a way that allows you to invest your time in both? + +### Developer Relations: A symbiotic relationship + +Enter community management, or as it's more commonly called in the tech industry, Developer Relations (DevRel for short). The goal of DevRel is, at its core, to empower developers. From writing content and creating documentation to supporting local meetups and bubbling up developer feedback internally, everything that a Developer Relations professional does on a day-to-day basis is for the benefit of the community. That's not to say that it doesn't benefit the company as well! After all, as Developer Relations professionals understand, if the community succeeds, so will the company. 
It's the best kind of symbiotic relationship! + +These hybrid roles have been around since shortly after the open source and free software movements started, but the Developer Relations industry—and the Developer Advocate role, in particular—have exploded over the past few years. So what is Developer Relations exactly? Let's start by defining "community" so that we're all on the same page: + +> **Community:** A group of people who not only share common principles, but also develop and share practices that help individuals in the group thrive. + +This could be a group of people who have gathered around an open source project, a particular topic such as email, or who are all in a similar job function—the DevOps community, for instance. + +As I mentioned, the role of a DevRel team is to empower the community by building up, encouraging, and amplifying the voice of the community members. While this will look slightly different at every company, depending on its goals, priorities, and direction, there are a few themes that are consistent throughout the industry. + + 1. **Listen:** Before making any plans or goals, take the time to listen. + * _Listen to your company stakeholders:_ What do they expect of your team? What do they think you should be responsible for? What metrics are they accustomed to? And what business needs do they care most about? + * _Listen to your customer community:_ What are customers' biggest pain points with your product? Where do they struggle with onboarding? Where does the documentation fail them? + * _Listen to your product's technical audience:_ What problems are they trying to solve? What could be done to make their work life easier? Where do they get their content? What technological advances are they most excited about? + + + 2. **Gather information** +Based on these answers, you can start making your plan. 
Find the overlapping areas where you can make your product a better fit for the larger technical audience and also make it easier for your customers to use. Figure out what content you can provide that not only answers your community's questions but also solves problems for your company's stakeholders. Learn about the areas where your co-workers struggle and see where your strengths can supplement those needs. + + + 3. **Make connections** +Above all, community managers are responsible for making connections within the community as well as between community members and coworkers. These connections, or "DevRel qualified leads," are what ultimately shows the business value of a community manager's work. By making connections between community members Marie and Bob, who are both interested in the latest developments in Python, or between Marie and your coworker Phil, who's responsible for developer-focused content on your website, you're making your community a valuable source of information for everyone around you. + + + +By getting to know your technical community, you become an expert on what customer needs your product can meet. With great power comes great responsibility. As the expert, you are now responsible for advocating internally for those needs, and you have the potential to make a big difference for your community. + +### Getting started + +So now what? If you're still with me, congratulations! You might just be a good fit for a Community Manager or Developer Advocate role. I'd encourage you to take community work for a test drive and see if you like the pace and the work. There's a lot of context switching and moving around between tasks, which can be a bit of an adjustment for some folks. + +Volunteer to write a blog post for your marketing team (or for [Opensource.com][2]) or help out at an upcoming conference. Apply to speak at a local meetup or offer to advise on a few technical support cases. Get to know your community members on a deeper level. 
+ +Above all, Community Managers are 100% driven by a passion for building technical communities and bringing people together. If that resonates with you, it may be time for a career change! + +I love talking to professionals that help others grow through community and Developer Relations practices. Don't hesitate to [reach out to me][3] if you have any questions or send me a [DM on Twitter][4]. + +-------------------------------------------------------------------------------- + +via: https://opensource.com/article/19/3/developer-relations-career + +作者:[Mary Thengvall][a] +选题:[lujun9972][b] +译者:[译者ID](https://github.com/译者ID) +校对:[校对者ID](https://github.com/校对者ID) + +本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出 + +[a]: https://opensource.com/users/marygrace-0 +[b]: https://github.com/lujun9972 +[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/resume_career_document_general.png?itok=JEaFL2XI +[2]: https://opensource.com/how-submit-article +[3]: https://www.marythengvall.com/about +[4]: http://twitter.com/mary_grace From f84e1390cdd0b10e31647519fb3814bbfc07d5d9 Mon Sep 17 00:00:00 2001 From: darksun Date: Mon, 25 Mar 2019 12:16:43 +0800 Subject: [PATCH 3/6] =?UTF-8?q?=E9=80=89=E9=A2=98:=2020190322=20How=20to?= =?UTF-8?q?=20save=20time=20with=20TiDB=20sources/talk/20190322=20How=20to?= =?UTF-8?q?=20save=20time=20with=20TiDB.md?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- .../20190322 How to save time with TiDB.md | 143 ++++++++++++++++++ 1 file changed, 143 insertions(+) create mode 100644 sources/talk/20190322 How to save time with TiDB.md diff --git a/sources/talk/20190322 How to save time with TiDB.md b/sources/talk/20190322 How to save time with TiDB.md new file mode 100644 index 0000000000..534c04de1f --- /dev/null +++ b/sources/talk/20190322 How to save time with TiDB.md @@ -0,0 +1,143 @@ +[#]: collector: (lujun9972) +[#]: 
translator: ( ) +[#]: reviewer: ( ) +[#]: publisher: ( ) +[#]: url: ( ) +[#]: subject: (How to save time with TiDB) +[#]: via: (https://opensource.com/article/19/3/how-save-time-tidb) +[#]: author: (Morgan Tocker https://opensource.com/users/morgo) + +How to save time with TiDB +====== + +TiDB, an open source-compatible, cloud-based database engine, simplifies many of MySQL database administrators' common tasks. + +![Team checklist][1] + +Last November, I wrote about key [differences between MySQL and TiDB][2], an open source-compatible, cloud-based database engine, from the perspective of scaling both solutions in the cloud. In this follow-up article, I'll dive deeper into the ways [TiDB][3] streamlines and simplifies administration. + +If you come from a MySQL background, you may be used to doing a lot of manual tasks that are either not required or much simpler with TiDB. + +The inspiration for TiDB came from the founders managing sharded MySQL at scale at some of China's largest internet companies. Since requirements for operating a large system at scale are a key concern, I'll look at some typical MySQL database administrator (DBA) tasks and how they translate to TiDB. + +[![TiDB architecture][4]][5] + +In [TiDB's architecture][5]: + + * SQL processing is separated from data storage. The SQL processing (TiDB) and storage (TiKV) components independently scale horizontally. + * PD (Placement Driver) acts as the cluster manager and stores metadata. + * All components natively provide high availability, with PD and TiKV using the [Raft consensus algorithm][6]. + * You can access your data via either MySQL (TiDB) or Spark (TiSpark) protocols. + + + +### Adding/fixing replication slaves + +**tl;dr:** It doesn't happen in the same way as in MySQL. + +Replication and redundancy of data are automatically managed by TiKV. You also don't need to worry about creating initial backups to seed replicas, as _both_ the provisioning and replication are handled for you. 
+ +Replication is also quorum-based using the Raft consensus algorithm, so you don't have to worry about the inconsistency problems surrounding failures that you do with asynchronous replication (the default in MySQL and what many users are using). + +TiDB does support its own binary log, so it can be used for asynchronous replication between clusters. + +### Optimizing slow queries + +**tl;dr:** Still happens in TiDB + +There is no real way out of optimizing slow queries that have been introduced by development teams. + +As a mitigating factor though, if you need to add breathing room to your database's capacity while you work on optimization, TiDB's architecture allows you to scale horizontally. + +### Upgrades and maintenance + +**tl;dr:** Still required, but generally easier + +Because the TiDB server is stateless, you can roll through an upgrade and deploy new TiDB servers. Then you can remove the older TiDB servers from the load balancer pool, shutting them down once connections have drained. + +Upgrading PD is also quite straightforward since only the PD leader actively answers requests at a time. You can perform a rolling upgrade and upgrade PD's non-leader peers one at a time, and then change the leader before upgrading the final PD server. + +For TiKV, the upgrade is marginally more complex. If you want to remove a node, I recommend first setting it to be a follower on each of the regions where it is currently a leader. After that, you can bring down the node without impacting your application. If the downtime is brief, TiKV will recover with its regional peers from the Raft log. In a longer downtime, it will need to re-copy data. This can all be managed for you, though, if you choose to deploy using Ansible or Kubernetes. 
+ +### Manual sharding + +**tl;dr:** Not required + +Manual sharding is mainly a pain on the part of the application developers, but as a DBA, you might have to get involved if the sharding is naive or has problems such as hotspots (many workloads do) that require re-balancing. + +In TiDB, re-sharding or re-balancing happens automatically in the background. The PD server observes when data regions (TiKV's term for chunks of data in key-value form) get too small, too big, or too frequently accessed. + +You can also explicitly configure PD to store regions on certain TiKV servers. This works really well when combined with MySQL partitioning. + +### Capacity planning + +**tl;dr:** Much easier + +Capacity planning on a MySQL database can be a little bit hard because you need to plan your physical infrastructure requirements two to three years from now. As data grows (and the working set changes), this can be a difficult task. I wouldn't say it completely goes away in the cloud either, since changing a master server's hardware is always hard. + +TiDB splits data into approximately 100MiB chunks that it distributes among TiKV servers. Because this increment is much smaller than a full server, it's much easier to move around and redistribute data. It's also possible to add new servers in smaller increments, which is easier on planning. + +### Scaling + +**tl;dr:** Much easier + +This is related to capacity planning and sharding. When we talk about scaling, many people think about very large _systems,_ but that is not exclusively how I think of the problem: + + * Scaling is being able to start with something very small, without having to make huge investments upfront on the chance it could become very large. + * Scaling is also a people problem. If a system requires too much internal knowledge to operate, it can become hard to grow as an engineering organization. The barrier to entry for new hires can become very high. 
+ + + +Thus, by providing automatic sharding, TiDB can scale much more easily. + +### Schema changes (DDL) + +**tl;dr:** Mostly better + +The data definition language (DDL) supported in TiDB is all online, which means it doesn't block other reads or writes to the system. It also doesn't block the replication stream. + +That's the good news, but there are a couple of limitations to be aware of: + + * TiDB does not currently support all DDL operations, such as changing the primary key or some "change data type" operations. + * TiDB does not currently allow you to chain multiple DDL changes in the same command, e.g., _ALTER TABLE t1 ADD INDEX (x), ADD INDEX (y)_. You will need to break these queries up into individual DDL queries. + + + +This is an area that we're looking to improve in [TiDB 3.0][7]. + +### Creating one-off data dumps for the reporting team + +**tl;dr:** May not be required + +DBAs loathe manual tasks that create one-off exports of data to be consumed by another team, perhaps in an analytics tool or data warehouse. + +This is often required when the types of queries that are executed on the dataset are analytical. TiDB has hybrid transactional/analytical processing (HTAP) capabilities, so in many cases, these queries should work fine. If your analytics team is using Spark, you can also use the [TiSpark][8] connector to allow them to connect directly to TiKV. + +This is another area we are improving with [TiFlash][7], a column store accelerator. We are also working on a plugin system to support external authentication. This will make it easier to manage access by the reporting team. + +### Conclusion + +In this post, I looked at some common MySQL DBA tasks and how they translate to TiDB. If you would like to learn more, check out our [TiDB Academy course][9] designed for MySQL DBAs (it's free!). 
+ +-------------------------------------------------------------------------------- + +via: https://opensource.com/article/19/3/how-save-time-tidb + +作者:[Morgan Tocker][a] +选题:[lujun9972][b] +译者:[译者ID](https://github.com/译者ID) +校对:[校对者ID](https://github.com/校对者ID) + +本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出 + +[a]: https://opensource.com/users/morgo +[b]: https://github.com/lujun9972 +[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/checklist_todo_clock_time_team.png?itok=1z528Q0y (Team checklist) +[2]: https://opensource.com/article/18/11/key-differences-between-mysql-and-tidb +[3]: https://github.com/pingcap/tidb +[4]: https://opensource.com/sites/default/files/uploads/tidb_architecture.png (TiDB architecture) +[5]: https://pingcap.com/docs/architecture/ +[6]: https://raft.github.io/ +[7]: https://pingcap.com/blog/tidb-3.0-beta-stability-at-scale/ +[8]: https://github.com/pingcap/tispark +[9]: https://pingcap.com/tidb-academy/ From 5725249daa64eb95303b89b6a357152406afe8f0 Mon Sep 17 00:00:00 2001 From: darksun Date: Mon, 25 Mar 2019 12:17:50 +0800 Subject: [PATCH 4/6] =?UTF-8?q?=E9=80=89=E9=A2=98:=2020190322=2012=20open?= =?UTF-8?q?=20source=20tools=20for=20natural=20language=20processing=20sou?= =?UTF-8?q?rces/tech/20190322=2012=20open=20source=20tools=20for=20natural?= =?UTF-8?q?=20language=20processing.md?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- ...e tools for natural language processing.md | 113 ++++++++++++++++++ 1 file changed, 113 insertions(+) create mode 100644 sources/tech/20190322 12 open source tools for natural language processing.md diff --git a/sources/tech/20190322 12 open source tools for natural language processing.md b/sources/tech/20190322 12 open source tools for natural language processing.md new file mode 100644 index 0000000000..99031acf68 --- /dev/null +++ b/sources/tech/20190322 12 open source 
tools for natural language processing.md @@ -0,0 +1,113 @@ +[#]: collector: (lujun9972) +[#]: translator: ( ) +[#]: reviewer: ( ) +[#]: publisher: ( ) +[#]: url: ( ) +[#]: subject: (12 open source tools for natural language processing) +[#]: via: (https://opensource.com/article/19/3/natural-language-processing-tools) +[#]: author: (Dan Barker (Community Moderator) https://opensource.com/users/barkerd427) + +12 open source tools for natural language processing +====== + +Take a look at a dozen options for your next NLP application. + +![Chat bubbles][1] + +Natural language processing (NLP), the technology that powers all the chatbots, voice assistants, predictive text, and other speech/text applications that permeate our lives, has evolved significantly in the last few years. There are a wide variety of open source NLP tools out there, so I decided to survey the landscape to help you plan your next voice- or text-based application. + +For this review, I focused on tools that use languages I'm familiar with, even though I'm not familiar with all the tools. (I didn't find a great selection of tools in the languages I'm not familiar with anyway.) That said, I excluded tools in three languages I am familiar with, for various reasons. + +The most obvious language I didn't include might be R, but most of the libraries I found hadn't been updated in over a year. That doesn't always mean they aren't being maintained well, but I think they should be getting updates more often to compete with other tools in the same space. I also chose languages and tools that are most likely to be used in production scenarios (rather than academia and research), and I have mostly used R as a research and discovery tool. + +I was also surprised to see that the Scala libraries are fairly stagnant. It has been a couple of years since I last used Scala, when it was pretty popular. Most of the libraries haven't been updated since that time—or they've only had a few updates. 
+ +Finally, I excluded C++. This is mostly because it's been many years since I last wrote in C++, and the organizations I've worked in have not used C++ for NLP or any data science work. + +### Python tools + +#### Natural Language Toolkit (NLTK) + +It would be easy to argue that [Natural Language Toolkit (NLTK)][2] is the most full-featured tool of the ones I surveyed. It implements pretty much any component of NLP you would need, like classification, tokenization, stemming, tagging, parsing, and semantic reasoning. And there's often more than one implementation for each, so you can choose the exact algorithm or methodology you'd like to use. It also supports many languages. However, it represents all data in the form of strings, which is fine for simple constructs but makes it hard to use some advanced functionality. The documentation is also quite dense, but there is a lot of it, as well as [a great book][3]. The library is also a bit slow compared to other tools. Overall, this is a great toolkit for experimentation, exploration, and applications that need a particular combination of algorithms. + +#### SpaCy + +[SpaCy][4] is probably the main competitor to NLTK. It is faster in most cases, but it only has a single implementation for each NLP component. Also, it represents everything as an object rather than a string, which simplifies the interface for building applications. This also helps it integrate with many other frameworks and data science tools, so you can do more once you have a better understanding of your text data. However, SpaCy doesn't support as many languages as NLTK. It does have a simple interface with a simplified set of choices and great documentation, as well as multiple neural models for various components of language processing and analysis. Overall, this is a great tool for new applications that need to be performant in production and don't require a specific algorithm. + +#### TextBlob + +[TextBlob][5] is kind of an extension of NLTK. 
You can access many of NLTK's functions in a simplified manner through TextBlob, and TextBlob also includes functionality from the Pattern library. If you're just starting out, this might be a good tool to use while learning, and it can be used in production for applications that don't need to be overly performant. Overall, TextBlob is used all over the place and is great for smaller projects. + +#### Textacy + +This tool may have the best name of any library I've ever used. Say "[Textacy][6]" a few times while emphasizing the "ex" and drawing out the "cy." Not only is it great to say, but it's also a great tool. It uses SpaCy for its core NLP functionality, but it handles a lot of the work before and after the processing. If you were planning to use SpaCy, you might as well use Textacy so you can easily bring in many types of data without having to write extra helper code. + +#### PyTorch-NLP + +[PyTorch-NLP][7] has been out for just a little over a year, but it has already gained a tremendous community. It is a great tool for rapid prototyping. It's also updated often with the latest research, and top companies and researchers have released many other tools to do all sorts of amazing processing, like image transformations. Overall, PyTorch is targeted at researchers, but it can also be used for prototypes and initial production workloads with the most advanced algorithms available. The libraries being created on top of it might also be worth looking into. + +### Node tools + +#### Retext + +[Retext][8] is part of the [unified collective][9]. Unified is an interface that allows multiple tools and plugins to integrate and work together effectively. Retext is one of three syntaxes used by the unified tool; the others are Remark for markdown and Rehype for HTML. This is a very interesting idea, and I'm excited to see this community grow. 
Retext doesn't expose a lot of its underlying techniques, but instead uses plugins to achieve the results you might be aiming for with NLP. It's easy to do things like checking spelling, fixing typography, detecting sentiment, or making sure text is readable with simple plugins. Overall, this is an excellent tool and community if you just need to get something done without having to understand everything in the underlying process. + +#### Compromise + +[Compromise][10] certainly isn't the most sophisticated tool. If you're looking for the most advanced algorithms or the most complete system, this probably isn't the right tool for you. However, if you want a performant tool that has a wide breadth of features and can function on the client side, you should take a look at Compromise. Overall, its name is accurate in that the creators compromised on functionality and accuracy by focusing on a small package with much more specific functionality that benefits from the user understanding more of the context surrounding the usage. + +#### Natural + +[Natural][11] includes most functions you might expect in a general NLP library. It is mostly focused on English, but some other languages have been contributed, and the community is open to additional contributions. It supports tokenizing, stemming, classification, phonetics, term frequency–inverse document frequency, WordNet, string similarity, and some inflections. It might be most comparable to NLTK, in that it tries to include everything in one package, but it is easier to use and isn't necessarily focused around research. Overall, this is a pretty full library, but it is still in active development and may require additional knowledge of underlying implementations to be fully effective. + +#### Nlp.js + +[Nlp.js][12] is built on top of several other NLP libraries, including Franc and Brain.js. 
It provides a nice interface into many components of NLP, like classification, sentiment analysis, stemming, named entity recognition, and natural language generation. It also supports quite a few languages, which is helpful if you plan to work in something other than English. Overall, this is a great general tool with a simplified interface into several other great tools. This will likely take you a long way in your applications before you need something more powerful or more flexible. + +### Java tools + +#### OpenNLP + +[OpenNLP][13] is hosted by the Apache Foundation, so it's easy to integrate it into other Apache projects, like Apache Flink, Apache NiFi, and Apache Spark. It is a general NLP tool that covers all the common processing components of NLP, and it can be used from the command line or within an application as a library. It also has wide support for multiple languages. Overall, OpenNLP is a powerful tool with a lot of features and ready for production workloads if you're using Java. + +#### StanfordNLP + +[Stanford CoreNLP][14] is a set of tools that provides statistical NLP, deep learning NLP, and rule-based NLP functionality. Many other programming language bindings have been created so this tool can be used outside of Java. It is a very powerful tool created by an elite research institution, but it may not be the best thing for production workloads. This tool is dual-licensed with a special license for commercial purposes. Overall, this is a great tool for research and experimentation, but it may incur additional costs in a production system. The Python implementation might also interest many readers more than the Java version. Also, one of the best Machine Learning courses is taught by a Stanford professor on Coursera. [Check it out][15] along with other great resources. + +#### CogCompNLP + +[CogCompNLP][16], developed by the University of Illinois, also has a Python library with similar functionality. 
It can be used to process text, either locally or on remote systems, which can remove a tremendous burden from your local device. It provides processing functions such as tokenization, part-of-speech tagging, chunking, named-entity tagging, lemmatization, dependency and constituency parsing, and semantic role labeling. Overall, this is a great tool for research, and it has a lot of components that you can explore. I'm not sure it's great for production workloads, but it's worth trying if you plan to use Java. + +* * * + +What are your favorite open source tools and libraries for NLP? Please share in the comments—especially if there's one I didn't include. + +-------------------------------------------------------------------------------- + +via: https://opensource.com/article/19/3/natural-language-processing-tools + +作者:[Dan Barker (Community Moderator)][a] +选题:[lujun9972][b] +译者:[译者ID](https://github.com/译者ID) +校对:[校对者ID](https://github.com/校对者ID) + +本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出 + +[a]: https://opensource.com/users/barkerd427 +[b]: https://github.com/lujun9972 +[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/talk_chat_communication_team.png?itok=CYfZ_gE7 (Chat bubbles) +[2]: http://www.nltk.org/ +[3]: http://www.nltk.org/book_1ed/ +[4]: https://spacy.io/ +[5]: https://textblob.readthedocs.io/en/dev/ +[6]: https://readthedocs.org/projects/textacy/ +[7]: https://pytorchnlp.readthedocs.io/en/latest/ +[8]: https://www.npmjs.com/package/retext +[9]: https://unified.js.org/ +[10]: https://www.npmjs.com/package/compromise +[11]: https://www.npmjs.com/package/natural +[12]: https://www.npmjs.com/package/node-nlp +[13]: https://opennlp.apache.org/ +[14]: https://stanfordnlp.github.io/CoreNLP/ +[15]: https://opensource.com/article/19/2/learn-data-science-ai +[16]: https://github.com/CogComp/cogcomp-nlp From bd5dd5d2456c70380b24982b3874316a89a36dac Mon Sep 17 00:00:00 
2001 From: darksun Date: Mon, 25 Mar 2019 12:56:16 +0800 Subject: [PATCH 5/6] =?UTF-8?q?=E9=80=89=E9=A2=98:=2020190322=20Easy=20mea?= =?UTF-8?q?ns=20easy=20to=20debug=20sources/tech/20190322=20Easy=20means?= =?UTF-8?q?=20easy=20to=20debug.md?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- .../tech/20190322 Easy means easy to debug.md | 83 +++++++++++++++++++ 1 file changed, 83 insertions(+) create mode 100644 sources/tech/20190322 Easy means easy to debug.md diff --git a/sources/tech/20190322 Easy means easy to debug.md b/sources/tech/20190322 Easy means easy to debug.md new file mode 100644 index 0000000000..4b0b4d52d2 --- /dev/null +++ b/sources/tech/20190322 Easy means easy to debug.md @@ -0,0 +1,83 @@ +[#]: collector: (lujun9972) +[#]: translator: ( ) +[#]: reviewer: ( ) +[#]: publisher: ( ) +[#]: url: ( ) +[#]: subject: (Easy means easy to debug) +[#]: via: (https://arp242.net/weblog/easy.html) +[#]: author: (Martin Tournoij https://arp242.net/) + + +What does it mean for a framework, library, or tool to be “easy”? There are many possible definitions one could use, but my definition is usually that it’s easy to debug. I often see people advertise a particular program, framework, library, file format, or something else as easy because “look with how little effort I can do task X, this is so easy!” That’s great, but it’s an incomplete picture. + +You only write software once, but will almost always go through several debugging cycles. By “debugging cycle” I don’t mean “there is a bug in the code you need to fix”, but rather “I need to look at this code to fix the bug”. To debug code, you need to understand it, so “easy to debug” by extension means “easy to understand”. + +Abstractions which make something easier to write often come at the cost of making things harder to understand. Sometimes this is a good trade-off, but often it’s not. 
In general I will happily spend a little bit more effort writing something now if that makes things easier to understand and debug later on, as it’s often a net time-saver. + +Simplicity isn’t the only thing that makes programs easier to debug, but it is probably the most important. Good documentation helps too, but unfortunately good documentation is uncommon (note that quality is not measured by word count!). + +This is not exactly a novel insight; from the 1974 The Elements of Programming Style by Brian W. Kernighan and P. J. Plauger: + +> Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it? + +A lot of stuff I see seems to be written “as clever as can be” and is consequently hard to debug. I’ll list a few examples of this pattern below. It’s not my intention to argue that any of these things are bad per se, I just want to highlight the trade-offs in “easy to use” vs. “easy to debug”. + + * When I tried running [Let’s Encrypt][1] a few years ago it required running a daemon as root(!) to automatically rewrite nginx files. I looked at the source a bit to understand how it worked and it was all pretty complex, so I was “let’s not” and opted to just pay €10 to the CA mafia, as not much can go wrong with putting a file in /etc/nginx/, whereas a lot can go wrong with complex Python daemons running as root. + +(I don’t know the current state/options for Let’s Encrypt; at a quick glance there may be better/alternative ACME clients that suck less now.) + + * Some people claim that systemd is easier than SysV init.d scripts because it’s easier to write systemd unit files than it is to write shell scripts. In particular, this is the argument Lennart Poettering used in his [systemd myths][2] post (point 5). + +I think this is completely missing the point. 
I agree with Poettering that shell scripts are hard – [I wrote an entire post about that][3] – but making the interface easier doesn’t mean the entire system becomes easier. Look at [this issue][4] I encountered and [the fix][5] for it. Does that look easy to you? + + * Many JavaScript frameworks I’ve used can be hard to fully understand. Clever state keeping logic is great and all, until that state won’t work as you expect, and then you better hope there’s a Stack Overflow post or GitHub issue to help you out. + + * Docker is great, right up to the point you get: + +``` + ERROR: for elasticsearch Cannot start service elasticsearch: +oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:258: +applying cgroup configuration for process caused \"failed to write 898 to cgroup.procs: write +/sys/fs/cgroup/cpu,cpuacct/docker/b13312efc203e518e3864fc3f9d00b4561168ebd4d9aad590cc56da610b8dd0e/cgroup.procs: +invalid argument\"" +``` + +or + +``` +ERROR: for elasticsearch Cannot start service elasticsearch: EOF +``` + +And … now what? + + * Many testing libraries can make things harder to debug. Ruby’s rspec is a good example where I’ve occasionally used the library wrong by accident and had to spend quite a long time figuring out what exactly went wrong (as the errors it gave me were very confusing!). + +I wrote a bit more about that in my [Testing isn’t everything][6] post. + + * ORM libraries can make database queries a lot easier, at the cost of making things a lot harder to understand once you want to solve a problem. 
+ + + + + +-------------------------------------------------------------------------------- + +via: https://arp242.net/weblog/easy.html + +作者:[Martin Tournoij][a] +选题:[lujun9972][b] +译者:[译者ID](https://github.com/译者ID) +校对:[校对者ID](https://github.com/校对者ID) + +本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出 + +[a]: https://arp242.net/ +[b]: https://github.com/lujun9972 +[1]: https://en.wikipedia.org/wiki/Let%27s_Encrypt +[2]: http://0pointer.de/blog/projects/the-biggest-myths.html +[3]: https://arp242.net/weblog/shell-scripting-trap.html +[4]: https://unix.stackexchange.com/q/185495/33645 +[5]: https://cgit.freedesktop.org/systemd/systemd/commit/?id=6e392c9c45643d106673c6643ac8bf4e65da13c1 +[6]: /weblog/testing.html +[7]: mailto:martin@arp242.net +[8]: https://github.com/Carpetsmoker/arp242.net/issues/new From 3b4d94eccce5920fa7762cc1998636a3caddf658 Mon Sep 17 00:00:00 2001 From: Hansong Zhang Date: Mon, 25 Mar 2019 15:01:59 +0800 Subject: [PATCH 6/6] Translating Translating: How to host your own webfonts --- sources/tech/20190318 How to host your own webfonts.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sources/tech/20190318 How to host your own webfonts.md b/sources/tech/20190318 How to host your own webfonts.md index 78fba8389d..97cc7f0ba8 100644 --- a/sources/tech/20190318 How to host your own webfonts.md +++ b/sources/tech/20190318 How to host your own webfonts.md @@ -1,5 +1,5 @@ [#]: collector: (lujun9972) -[#]: translator: ( ) +[#]: translator: (zhs852) [#]: reviewer: ( ) [#]: publisher: ( ) [#]: url: ( )