
How Machines Learn: A Practical Guide

You may have heard about machine learning from interesting applications like spam filtering, optical character recognition, and computer vision.

Getting started with machine learning is a long process that involves going through several resources. There are books for newbies, academic papers, guided exercises, and standalone projects. It’s easy to lose track of what you need to learn among all these options.

So in today’s post, I’ll list seven steps (and 50+ resources) that can help you get started in this exciting field of Computer Science, and ramp up toward becoming a machine learning hero.

Note that this list of resources is not exhaustive and is meant to get you started. There are many more resources around.

1. Get the necessary background knowledge

You might remember from DataCamp’s Learn Data Science infographic that mathematics and statistics are key to starting machine learning (ML). The foundations might seem quite easy because it’s just three topics. But don’t forget that these are in fact three broad topics.

There are two things that are very important to keep in mind here:

First, you’ll definitely want some further guidance on what exactly you need to cover to get started.
Second, these are the foundations of your further learning. Don’t be scared to take your time. Get the knowledge on which you’ll build everything.

The first point is simple: it’s a good idea to cover linear algebra and statistics. These two are the bare minimum that one should understand. But while you’re at it, you should also try to cover topics such as optimization and advanced calculus. They will come in handy when you’re getting deeper into ML.
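To see how these topics fit together, here is a minimal sketch in plain Python. It fits a line y ≈ w * x by gradient descent (optimization and calculus) using a dot product (linear algebra), and compares the result with the closed-form least-squares slope. The data points, the learning rate, and the function names are illustrative assumptions, not taken from any of the resources listed here.

```python
def dot(u, v):
    """Dot product of two equal-length vectors (linear algebra)."""
    return sum(a * b for a, b in zip(u, v))


def fit_slope(xs, ys, learning_rate=0.01, steps=500):
    """Fit y ≈ w * x by gradient descent on the mean squared error."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [w * x - y for x, y in zip(xs, ys)]
        # Derivative of mean((w*x - y)^2) with respect to w:
        # 2/n * sum((w*x - y) * x)
        gradient = 2.0 / n * dot(residuals, xs)
        w -= learning_rate * gradient
    return w


# Made-up measurements that roughly follow y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

w_gd = fit_slope(xs, ys)
w_exact = dot(xs, ys) / dot(xs, xs)  # closed-form least-squares slope
print(round(w_gd, 4), round(w_exact, 4))
```

Both numbers should agree closely, which is a quick sanity check that the derivative in the comment was worked out correctly.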

Here are some pointers on where to get started if you are starting from zero:

Khan Academy is a good resource for beginners. Consider taking the Linear Algebra and Calculus courses.
Go to MIT OpenCourseWare and take the Linear Algebra course.
Take this Coursera course for an introduction to descriptive statistics, probability theory, and inferential statistics.

Statistics is one of the keys to learning ML

If you’re more into books, consider the following:

However, in most cases, you’ll start off already knowing some things about statistics and mathematics. Or maybe you have already gone through all the theory resources listed above.

In these cases, it’s a good idea to recap and assess your knowledge honestly. Are there any areas that you need to revise or are you good for now?

If you’re all set, it’s time to go ahead and apply all that knowledge with R or Python. As a general guideline, it’s a good idea to pick one and get started with that language. Later, you can still add the other programming language to your skill set.

Why is all this programming knowledge necessary?

Well, you’ll see that the courses listed above (or those you have taken in school or university) will provide you with a more theoretical (and not applied) introduction to mathematics and statistics topics. However, ML is very applied and you’ll need to be able to apply all the topics you have learned. So it’s a good idea to go over the materials again, but this time in an applied way.

If you want to master the basics of R and Python, consider the following courses:

DataCamp’s introductory Python or R courses: Intro to Python for Data Science or Introduction to R Programming.
Introductory Python and R courses from Edx: Introduction to Python for Data Science and Introduction to R for Data Science.
There are many other free courses out there. Check out Coursera or Codecademy for more.

When you have nailed down the basics, check out DataCamp’s blog on the 40+ Python Statistics For Data Science Resources. This post offers 40+ resources on the statistics topics you need to know to get started with data science (and by extension also ML).

Also make sure you check out this SciPy tutorial on vectors and arrays and this workshop on Scientific Computing with Python.

To get hands-on with Python and calculus, you can check out the SymPy package.

2. Don’t be scared to invest in the “theory” of ML

A lot of people don’t make the effort to go through some more theoretical material because it’s “dry” or “boring.” But going through the theory and really investing your time in it is essential and invaluable in the long run. You’ll better understand new advancements in machine learning, and you’ll be able to link back to your background knowledge. This will help you stay motivated.

Additionally, the theory doesn’t need to be boring. As you read in the introduction, there are so many materials that will make it easier for you to get into it.

Books are one of the best ways to absorb the theoretical knowledge. They force you to stop and think once in a while. Of course, reading books is a very static thing to do and it might not agree with your learning style. Nonetheless, try out the following books and see if it might be something for you:

Machine Learning, the textbook by Tom Mitchell, might be old but it’s gold. This book goes over the most important topics in machine learning in a well-explained and step-by-step way.
Machine Learning: The Art and Science of Algorithms that Make Sense of Data (you can see the slides of the book here): this book is great for beginners. There are many real-life applications discussed, which you might find lacking in Tom Mitchell’s book.
Machine Learning Yearning: this book by Andrew Ng is not yet complete, but it’s bound to be an excellent reference for those who are learning ML.
Algorithms and Data Structures by Jurg Nievergelt and Klaus Hinrichs
Also check out Data Mining for the Masses by Matthew North. You’ll find that this book guides you through some of the most difficult topics.
Introduction to Machine Learning by Alex Smola and S.V.N. Vishwanathan.

Take your time to read books and to study the material covered in them

Videos / MOOCs are awesome for those who learn by watching and listening. There are a lot of MOOCs and videos out there, but it can also be hard to find your way through all those materials. Below is a list of the most notable ones:

This well-known Machine Learning MOOC, taught by Andrew Ng, introduces you to Machine Learning and the theory. Don’t worry — it’s well-explained and takes things step-by-step, so it’s excellent for beginners.
The playlist of the MIT OpenCourseWare 6.034 course: already a bit more advanced. You’ll definitely need some previous work on ML theory before you start this series, but you won’t regret it.

At this point, it’s important for you to go over the separate techniques and grasp the whole picture. This starts with understanding key concepts: the distinction between supervised and unsupervised learning, classification and regression, and so on. Manual (written) exercises can come in handy. They can help you understand how algorithms work and how you should go about them. You’ll most often find these written exercises in courses from universities. Check out this ML course by Portland State University.
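As a small illustration of the classification/regression distinction, the sketch below uses a k-nearest-neighbors predictor written in plain Python. Both variants are supervised (the training data comes with known targets); the only difference is whether the target is a label or a number. The toy data and function names are assumptions made up for this example.

```python
from collections import Counter
from math import dist
from statistics import mean


def nearest(train, query, k):
    """Return the targets of the k training points closest to query."""
    ranked = sorted(train, key=lambda pair: dist(pair[0], query))
    return [target for _, target in ranked[:k]]


def knn_classify(train, query, k=3):
    """Classification: predict the most common label among the neighbors."""
    return Counter(nearest(train, query, k)).most_common(1)[0][0]


def knn_regress(train, query, k=3):
    """Regression: predict the average target among the neighbors."""
    return mean(nearest(train, query, k))


# Toy training sets: (features, target) pairs.
labeled = [((1.0, 1.0), "small"), ((1.5, 2.0), "small"), ((2.0, 1.5), "small"),
           ((6.0, 6.5), "large"), ((7.0, 6.0), "large"), ((6.5, 7.0), "large")]
priced = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((4.0,), 40.0)]

print(knn_classify(labeled, (1.2, 1.4)))  # a label
print(knn_regress(priced, (2.1,)))        # a number
```

Working through one prediction by hand (sort the distances, pick the neighbors, vote or average) is the kind of written exercise that makes the algorithm stick.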

3. Get hands-on

Knowing the theory and understanding the algorithms by reading and watching is all good. But you also need to move beyond this stage and get started with some exercises. You’ll learn to implement these algorithms and apply the theory that you’ve learned.

First, you have tutorials which introduce you to the basics of machine learning in Python and R. The best way is, of course, to go for interactive tutorials:

In Python Machine Learning: Scikit-Learn Tutorial, you will learn more about well-known algorithms KMeans and Support Vector Machines (SVM) to construct models with Scikit-Learn.
Machine Learning in R for beginners introduces you to ML in R with the class and caret packages.
Keras Tutorial: Deep Learning in Python covers how to build Multi-Layer Perceptrons (MLPs) for classification and regression tasks, step-by-step.
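Before working through those tutorials, it can help to see what a clustering algorithm such as KMeans does under the hood. The following is a dependency-free sketch of the basic k-means loop; it is not Scikit-Learn’s implementation, and the sample points and the choice of starting centroids are assumptions for illustration only.

```python
from math import dist


def kmeans(points, centroids, iterations=10):
    """Plain k-means: alternate assignment and centroid-update steps."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
# Assumption: the first and last points serve as starting centroids.
centroids, clusters = kmeans(points, [points[0], points[-1]])
print(centroids)
print(clusters)
```

Note that no labels are involved: the algorithm groups the points purely by distance, which is what makes it unsupervised.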

Also check out the following tutorials, which are static and will require you to work in an IDE:

Machine Learning in Python, Step By Step: step-by-step tutorial with Scikit-Learn.
Develop Your First Neural Network in Python With Keras Step-By-Step: learn how to develop your first neural network with Keras thanks to this tutorial.
There are many more that you can consider, but the tutorials of Machine Learning Mastery are very good.

Besides the tutorials, there are also courses. Taking courses will help you apply the concepts that you’ve learned in a focused way. Experienced instructors will help you. Here are some interactive courses for Python and ML:

Supervised Learning with scikit-learn: you’ll learn how to build predictive models, tune their parameters, and predict how well they will perform on unseen data. All while using real world datasets. You’ll do so with Scikit-Learn.
Unsupervised Learning in Python: shows you how to cluster, transform, visualize, and extract insights from unlabeled datasets. At the end of the course, you’ll build a recommender system.
Deep Learning in Python: you’ll gain hands-on, practical knowledge of how to use deep learning with Keras 2.0, the latest version of a cutting-edge library for deep learning in Python.
Applied Machine Learning in Python: introduces the learner to applied ML and focuses more on the techniques and methods than on the statistics behind these methods.
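The idea of predicting how well a model will perform on unseen data can be sketched without any library: hold back part of the data, train on the rest, and score the model only on the held-back part. The example below is a simplified illustration in plain Python; the nearest-class-mean model, the toy data, and the 25% split are assumptions chosen for brevity, not what the courses above use.

```python
import random
from statistics import mean


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and hold back a fraction as unseen test data."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]


def fit_class_means(train):
    """'Training': remember the mean feature value of each class."""
    labels = {label for _, label in train}
    return {lab: mean([x for x, label in train if label == lab]) for lab in labels}


def predict(model, x):
    """Predict the class whose mean is closest to x."""
    return min(model, key=lambda lab: abs(x - model[lab]))


# Toy data: two well-separated groups of one-dimensional measurements.
rows = ([(float(v), "low") for v in range(1, 11)]
        + [(float(v), "high") for v in range(21, 31)])

train, test = train_test_split(rows)
model = fit_class_means(train)
correct = sum(predict(model, x) == label for x, label in test)
accuracy = correct / len(test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Because the two groups here are so far apart, the held-out accuracy is perfect; real datasets are rarely this kind, which is exactly why the score must come from data the model has not seen.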

After the theory, take your time to apply the knowledge you have gained.

For those who are learning ML with R, there are also these interactive courses:

Introduction to Machine Learning gives you a broad overview of the discipline’s most common techniques and applications. You’ll gain more insight into the assessment and training of different ML models. The rest of the course focuses on an introduction to three of the most basic ML tasks: classification, regression, and clustering.
R: Unsupervised Learning provides a basic introduction to clustering and dimensionality reduction in R from a ML perspective. This allows you to get from data to insights as quickly as possible.
Practical Machine Learning covers the basic components of building and applying prediction functions with an emphasis on practical applications.

Lastly, there are also books that go over ML topics in a very applied way. If you’re looking to learn with the help of text and an IDE, check out these books:

The Python Machine Learning Book by Sebastian Raschka
The Introduction to Artificial Neural Networks and Deep Learning: A Practical Guide with Applications in Python by Sebastian Raschka
Machine Learning with R by Brett Lantz

4. Practice

Practice is even more important than getting hands-on and revising the material with Python. This step was probably the hardest one for me. Check out how other people have implemented ML algorithms when you have done some exercises. Then, get started on your own projects that illustrate your understanding of ML algorithms and theories.

One of the most straightforward ways to practice is to scale the exercises up a little. You want to do a bigger exercise which requires you to do more data cleaning and feature engineering.

Start with Kaggle. If you need additional help to conquer the so-called “data fear,” check out the Kaggle Python Tutorial on Machine Learning and Kaggle R Tutorial on Machine Learning. These will bring you up to speed in no time.
Afterwards, you can also start doing challenges by yourself. Check out these sites, where you can find lots of ML datasets: UCI Machine Learning Repository, Public datasets for machine learning, and data.world.
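To make “data cleaning and feature engineering” concrete, here is a minimal sketch in plain Python. The records, column names, and the choice of median imputation are hypothetical; real datasets from the sites above will need their own cleaning decisions.

```python
from statistics import mean, median, pstdev

# Hypothetical raw records; None marks a missing measurement.
raw = [
    {"height_cm": 170.0, "weight_kg": 65.0},
    {"height_cm": None, "weight_kg": 80.0},
    {"height_cm": 180.0, "weight_kg": 81.0},
    {"height_cm": 160.0, "weight_kg": None},
]


def impute_median(records, key):
    """Data cleaning: replace missing values with the column median."""
    fill = median([r[key] for r in records if r[key] is not None])
    return [{**r, key: fill if r[key] is None else r[key]} for r in records]


def add_bmi(records):
    """Feature engineering: derive body-mass index from two raw columns."""
    return [{**r, "bmi": r["weight_kg"] / (r["height_cm"] / 100) ** 2}
            for r in records]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


clean = impute_median(impute_median(raw, "height_cm"), "weight_kg")
features = add_bmi(clean)
scaled_bmi = standardize([r["bmi"] for r in features])
print([round(z, 2) for z in scaled_bmi])
```

Each function returns new records instead of modifying the input, which makes it easier to compare the data before and after each step.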

Practice makes perfect.

5. Projects

Doing small exercises is good. But in the end, you’ll want to make a project in which you can demonstrate your understanding of the ML algorithms with which you’ve been working.

The best exercise is to implement your own ML algorithm. You can read more about why you should do this exercise and what you can learn from it in the following pages:

Next, you can check out the following posts and repositories. They’ll give you some inspiration from others and will show how they have implemented ML algorithms.

Projects can be hard at the start, but they’ll increase your understanding even more.
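As one possible starting point for implementing your own ML algorithm, here is a sketch of a perceptron written from scratch in plain Python. It is only an illustration of the idea, assuming binary 0/1 targets and the logical AND function as toy data; it is not taken from any of the posts or repositories mentioned in this section.

```python
def predict(weights, bias, features):
    """Return 1 if the weighted sum is positive, else 0."""
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Learn weights and a bias with the classic perceptron update rule."""
    n_features = len(samples[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for features, target in samples:  # target is 0 or 1
            error = target - predict(weights, bias, features)
            # The error is 0 for correct predictions, so weights
            # only change when the prediction is wrong.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias


# Logical AND is linearly separable, so the perceptron can learn it.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_gate)
print([predict(weights, bias, x) for x, _ in and_gate])
```

A useful follow-up exercise is to swap in the XOR function and observe that the same code never settles on correct weights, because XOR is not linearly separable.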

6. Don’t stop

Learning ML is something that should never stop. As many will confirm, there are always new things to learn — even when you’ve been working in this area for a decade.

There are, for example, ML trends such as deep learning which are very popular right now. You might also focus on other topics that aren’t central at this point but which might be in the future. Check out this interesting question and the answers if you want to know more.

Papers may not be the first thing that springs to mind when you’re worried about mastering the basics. But they are your way to get up to date with the latest research. Papers are not for those who are just starting out. They are definitely a good fit for those who are more advanced.

Other technologies are also something to consider. But don’t worry about them when you’re just starting out. You can, for example, focus on adding Python or R (depending on which one you already know) to your skill set. You can look through this post to find interesting resources.

If you also want to move towards big data, you could consider looking into Spark. Here are some interesting resources:

Other programming languages, such as Java, JavaScript, C, and C++ are gaining importance in ML. In the long run, you can consider also adding one of these languages to your to-do list. You can use these blog posts to guide your choice:

You’re never done learning.

7. Make use of all the material that is out there

Machine learning is a difficult topic which can make you lose your motivation at some point. Or maybe you feel you need a change. In such cases, remember that there’s a lot of material on which you can fall back. Check out the following resources:

Podcasts. Great resource for continuing your journey into ML and staying up-to-date with the latest developments in the field:

There are, of course, many more podcasts.

Documentation and package source code are two ways to get deeper into the implementation of the ML algorithms. Check out some of these repositories:

Scikit-Learn: well-known Python ML package
Keras: Deep learning package for Python
caret: very popular R package for Classification and Regression Training

Visualizations are one of the newest and trendiest ways to get into the theory of ML. They’re fantastic for beginners, but also very interesting for more advanced learners. The following visualizations will intrigue you and will help you gain more understanding into the workings of ML:

A visual introduction to machine learning
Distill makes ML Research clear, dynamic and vivid.
Tensorflow — Neural Network Playground if you’re looking to play around with neural network architectures.
More here: What are the best visualizations of machine learning algorithms?

Some variety in your learning can and will motivate you even more.

You Can Get Started Now

Now it’s up to you. Learning ML is something that’s a continuous process, so the sooner you get started, the better. You have all of the tools in your hands now to get started. Good luck and make sure to let us know how you’re progressing.

This post is based on an answer I gave to the Quora question How Does A Total Beginner Start To Learn Machine Learning.

About the author:

Karlijn Willems

Data Science Journalist

via: https://medium.freecodecamp.org/how-machines-learn-a-practical-guide-203aae23cafb

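Author: Karlijn Willems. Translator: (translator ID). Proofreader: (proofreader ID).

This article was originally compiled by LCTT and is proudly presented by Linux China.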
