15 Top Open Source Artificial Intelligence Tools
Artificial Intelligence (AI) is one of the hottest areas of technology research. Companies like IBM, Google, Microsoft, Facebook and Amazon are investing heavily in their own R&D, as well as buying up startups that have made progress in areas like machine learning, neural networks, natural language and image processing. Given the level of interest, it should come as no surprise that a recent artificial intelligence report from experts at Stanford University concluded that "increasingly useful applications of AI, with potentially profound positive impacts on our society and economy are likely to emerge between now and 2030."
In a recent article, we provided an overview of 45 AI projects that seem particularly promising or interesting. Here, we focus on open source artificial intelligence tools, taking a closer look at fifteen of the best-known open source AI projects.
Open Source Artificial Intelligence
These open source AI applications are on the cutting edge of artificial intelligence research.
1. Caffe
The brainchild of a UC Berkeley PhD candidate, Caffe is a deep learning framework built around an expressive architecture and extensible code. Its claim to fame is its speed, which makes it popular with both researchers and enterprise users. According to its website, it can process more than 60 million images in a single day using just one NVIDIA K40 GPU. It is managed by the Berkeley Vision and Learning Center (BVLC), and companies like NVIDIA and Amazon have made grants to support its development.
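As a rough check on what the 60-million-images-per-day figure implies, assuming the GPU runs around the clock for a full 86,400-second day, the per-image throughput works out as follows:

```latex
\frac{60\,000\,000\ \text{images}}{86\,400\ \text{s}} \approx 694\ \text{images/s}
\qquad\Longrightarrow\qquad
\frac{1\ \text{s}}{694} \approx 1.44\ \text{ms per image}
```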
2. CNTK
Short for Computational Network Toolkit, CNTK is one of Microsoft's open source artificial intelligence tools. Microsoft says it performs well whether it runs on a system with only CPUs, a single GPU, multiple GPUs or multiple machines with multiple GPUs. Microsoft has primarily used it for speech recognition research, but it is also useful for applications like machine translation, image recognition, image captioning, text processing, language understanding and language modeling.
3. Deeplearning4j
Deeplearning4j is an open source deep learning library for the Java Virtual Machine (JVM). It runs in distributed environments and integrates with both Hadoop and Apache Spark. It makes it possible to configure deep neural networks, and it's compatible with Java, Scala and other JVM languages.
The project is managed by a commercial company called Skymind, which offers paid support, training and an enterprise distribution of Deeplearning4j.
4. Distributed Machine Learning Toolkit
Like CNTK, the Distributed Machine Learning Toolkit (DMTK) is one of Microsoft's open source artificial intelligence tools. Designed for use in big data applications, it aims to speed up the training of AI systems. It consists of three key components: the DMTK framework, the LightLDA topic model algorithm, and the Distributed (Multisense) Word Embedding algorithm. As proof of DMTK's speed, Microsoft says that on a cluster of eight machines, it can "train a topic model with 1 million topics and a 10-million-word vocabulary (for a total of 10 trillion parameters), on a document collection with over 100-billion tokens," a scale Microsoft says other tools have not matched.
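The 10-trillion figure follows from the size of a topic model's topic-word matrix, which holds one parameter for every (topic, word) pair. Multiplying the two numbers quoted above:

```latex
\underbrace{10^{6}}_{\text{topics}} \times \underbrace{10^{7}}_{\text{vocabulary words}}
= 10^{13} = 10\ \text{trillion parameters}
```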
5. H2O
Focused more on enterprise uses for AI than on research, H2O has large companies like Capital One, Cisco, Nielsen Catalina, PayPal and Transamerica among its users. It claims to make it possible for anyone to use the power of machine learning and predictive analytics to solve business problems. It can be used for predictive modeling, risk and fraud analysis, insurance analytics, advertising technology, healthcare and customer intelligence.
It comes in two open source versions: standard H2O and Sparkling Water, which is integrated with Apache Spark. Paid enterprise support is also available.
6. Mahout
An Apache Foundation project, Mahout is an open source machine learning framework. According to its website, it offers three major features: a programming environment for building scalable algorithms, premade algorithms for tools like Spark and H2O, and a vector-math experimentation environment called Samsara. Companies using Mahout include Adobe, Accenture, Foursquare, Intel, LinkedIn, Twitter, Yahoo and many others. Professional support is available through third parties listed on the website.
7. MLlib
Known for its speed, Apache Spark has become one of the most popular tools for big data processing. MLlib is Spark's scalable machine learning library. It integrates with Hadoop and interoperates with both NumPy and R. It includes a host of machine learning algorithms for classification, regression, decision trees, recommendation, clustering, topic modeling, feature transformations, model evaluation, ML pipeline construction, ML persistence, survival analysis, frequent itemset and sequential pattern mining, distributed linear algebra and statistics.
8. NuPIC
Managed by a company called Numenta, NuPIC is an open source artificial intelligence project based on a theory called Hierarchical Temporal Memory, or HTM. Essentially, HTM is an attempt to create a computer system modeled after the human neocortex. The goal is to create machines that "approach or exceed human level performance for many cognitive tasks."
In addition to the open source license, Numenta offers NuPIC under a commercial license, and it also licenses the patents that underlie the technology.
9. OpenNN
Designed for researchers and developers with an advanced understanding of artificial intelligence, OpenNN is a C++ programming library for implementing neural networks. Its key features include deep architectures and fast performance. Extensive documentation is available on the website, including an introductory tutorial that explains the basics of neural networks. Paid support for OpenNN is available through Artelnics, a Spain-based firm that specializes in predictive analytics.
10. OpenCyc
Developed by a company called Cycorp, OpenCyc provides access to the Cyc knowledge base and commonsense reasoning engine. It includes more than 239,000 terms, about 2,093,000 triples, and about 69,000 owl:sameAs links to external semantic data namespaces. It is useful for rich domain modeling, semantic data integration, text understanding, domain-specific expert systems and game AIs. The company also offers two other versions of Cyc: one for researchers that is free but not open source and one for enterprise use that requires a fee.
11. Oryx 2
Built on top of Apache Spark and Kafka, Oryx 2 is a specialized application development framework for large-scale machine learning. It is built on a lambda architecture with three tiers: a batch layer, a speed layer and a serving layer. Developers can use Oryx 2 to create new applications, and it also includes some pre-built applications for common big data tasks like collaborative filtering, classification, regression and clustering. The big data tool vendor Cloudera created the original Oryx 1 project and has been heavily involved in continuing development.
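To make the three-tier idea concrete, here is a toy, single-process sketch of the general lambda-architecture pattern in Python. It is not Oryx 2's API: the class names (`BatchLayer`, `SpeedLayer`, `ServingLayer`) and the `ingest` helper are hypothetical, and the "model" is just a per-item event count standing in for something like a collaborative-filtering model. The batch tier periodically rebuilds the model from the full history, the speed tier applies cheap incremental updates in between, and the serving tier answers queries by combining the two.

```python
from collections import Counter


class BatchLayer:
    """Batch tier (hypothetical): rebuilds the model from the full event history."""

    def __init__(self):
        self.history = []  # append-only master dataset

    def store(self, event: str) -> None:
        self.history.append(event)

    def build_model(self) -> Counter:
        # Toy "model": how many times each item has been seen.
        return Counter(self.history)


class SpeedLayer:
    """Speed tier (hypothetical): incremental updates since the last batch run."""

    def __init__(self):
        self.delta = Counter()

    def update(self, event: str) -> None:
        self.delta[event] += 1

    def reset(self) -> None:
        self.delta.clear()


class ServingLayer:
    """Serving tier (hypothetical): answers queries from batch model + recent deltas."""

    def __init__(self, batch: BatchLayer, speed: SpeedLayer):
        self.batch = batch
        self.speed = speed
        self.model = batch.build_model()

    def refresh(self) -> None:
        # A new batch run absorbs every stored event, so the speed deltas are dropped.
        self.model = self.batch.build_model()
        self.speed.reset()

    def query(self, item: str) -> int:
        # Counter returns 0 for unseen items.
        return self.model[item] + self.speed.delta[item]


def ingest(event: str, batch: BatchLayer, speed: SpeedLayer) -> None:
    """Route each new event to both the master dataset and the speed tier."""
    batch.store(event)
    speed.update(event)
```

For example, after ingesting `"a", "b", "a"` without a batch run, `query("a")` returns 2, all of it coming from the speed tier; after `refresh()`, it still returns 2, now served entirely from the rebuilt batch model. The design trade-off this illustrates is that the batch tier is simple and exact but slow to update, while the speed tier keeps answers fresh between batch runs.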
12. PredictionIO
In February 2016, Salesforce bought PredictionIO, and then in July, it contributed the platform and its trademark to the Apache Foundation, which accepted it as an incubator project. So while Salesforce is using PredictionIO technology to advance its own machine learning capabilities, work will also continue on the open source version. It helps users create predictive engines with machine learning capabilities that can be used to deploy Web services that respond to dynamic queries in real time.
13. SystemML
First developed by IBM, SystemML is now an Apache big data project. It offers a highly scalable platform that can implement high-level math and algorithms written in an R-like or Python-like syntax. Enterprises are already using it to track customer service on auto repairs, to direct airport traffic and to link social media data with banking customers. It can run on top of Spark or Hadoop.
14. TensorFlow
TensorFlow is one of Google's open source artificial intelligence tools. It offers a library for numerical computation using data flow graphs. It can run on a wide variety of systems with single or multiple CPUs and GPUs, and it even runs on mobile devices. It boasts a high degree of flexibility, true portability, automatic differentiation and support for Python and C++. The website includes a very extensive list of tutorials and how-tos for developers or researchers interested in using or extending its capabilities.
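The two ideas named above, computation expressed as a data flow graph and automatic differentiation, can be illustrated with a tiny pure-Python sketch. This is not TensorFlow's API; `Node`, `sin` and `backward` are hypothetical names. Each operation records its inputs and its local derivatives as it is built, and `backward` then walks the graph in reverse topological order to accumulate gradients (reverse-mode differentiation, the same general approach used for backpropagation).

```python
import math


class Node:
    """One operation in a data flow graph; edges point to its input nodes."""

    def __init__(self, value, parents=(), local_grads=()):
        self.value = value
        self.parents = parents          # input nodes of this operation
        self.local_grads = local_grads  # d(self)/d(parent), one per parent
        self.grad = 0.0                 # filled in by backward()

    def __add__(self, other):
        return Node(self.value + other.value, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Node(self.value * other.value, (self, other), (other.value, self.value))


def sin(x):
    return Node(math.sin(x.value), (x,), (math.cos(x.value),))


def backward(output):
    """Reverse-mode autodiff: propagate d(output)/d(node) from output back to inputs."""
    # Post-order DFS puts every node after all of its inputs.
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    output.grad = 1.0
    # Reversed, each node is processed only after all nodes that consume it,
    # so its gradient is fully accumulated before being pushed to its inputs.
    for node in reversed(order):
        for parent, local in zip(node.parents, node.local_grads):
            parent.grad += node.grad * local
```

For instance, building `f = x * y + sin(x)` with `x = Node(2.0)` and `y = Node(3.0)` and calling `backward(f)` leaves `y.grad == 2.0` and `x.grad == 3.0 + math.cos(2.0)`, matching the hand-derived partial derivatives. Note that `x` feeds two operations, and its gradient correctly sums both contributions.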
15. Torch
Torch describes itself as "a scientific computing framework with wide support for machine learning algorithms that puts GPUs first." The emphasis here is on flexibility and speed. In addition, it's fairly easy to use, with packages for machine learning, computer vision, signal processing, parallel processing, image, video, audio and networking. It is scripted in Lua and runs on LuaJIT, a just-in-time compiler for Lua, on top of an underlying C/CUDA implementation.
Author: Cynthia Harvey