diff --git a/sources/tech/20170823 How Machines Learn A Practical Guide.md b/sources/tech/20170823 How Machines Learn A Practical Guide.md deleted file mode 100644 index 02de1c23b9..0000000000 --- a/sources/tech/20170823 How Machines Learn A Practical Guide.md +++ /dev/null @@ -1,389 +0,0 @@ -translating by flowsow - -How Machines Learn: A Practical Guide -============================================================ - -![](https://cdn-images-1.medium.com/max/1000/1*MxSBSJIqK19z2qhfspPL-g.png) - -You may have heard about machine learning from interesting applications like spam filtering, optical character recognition, and computer vision. - -Getting started with machine learning is long process that involves going through several resources. There are books for newbies, academic papers, guided exercises, and standalone projects. It’s easy to lose track of what you need to learn among all these options. - -So in today’s post, I’ll list seven steps (and 50+ resources) that can help you get started in this exciting field of Computer Science, and ramp up toward becoming a machine learning hero. - -Note that this list of resources is not exhaustive and is meant to get you started. There are many more resources around. - -### 1\. Get the necessary background knowledge - -You might remember from DataCamp’s [Learn Data Science][77] infographic that mathematics and statistics are key to starting machine learning (ML). The foundations might seem quite easy because it’s just three topics. But don’t forget that these are in fact three broad topics. - -There are two things that are very important to keep in mind here: - -* First, you’ll definitely want some further guidance on what exactly you need to cover to get started. - -* Second, these are the foundations of your further learning. Don’t be scared to take your time. Get the knowledge on which you’ll build everything. - -The first point is simple: it’s a good idea to cover linear algebra and statistics. 
These two are the bare minimum that one should understand. But while you’re at it, you should also try to cover topics such as optimization and advanced calculus. They will come in handy when you’re getting deeper into ML. - -Here are some pointers on where to get started if you are starting from zero: - -* [Khan Academy][1] is a good resource for beginners. Consider taking the Linear Algebra and Calculus courses. - -* Go to [MIT OpenCourseWare][2] and take the[ Linear Algebra][3] course. - -* Take [this Coursera course][4] for an introduction to descriptive statistics, probability theory, and inferential statistics. - - -![](https://cdn-images-1.medium.com/max/800/1*Uw8YXNlt5VGKTXFDbtFEig.png) -Statistics is one of the keys to learning ML - -If you’re more into books, consider the following: - -* [_Linear Algebra and Its Applications_][5] _,_ - -* [_Applied Linear Algebra_][6] , - -* [_3,000 Solved Problems in Linear Algebra_][7] _,_ - -* [MIT Online Texbooks][8] - -However, in most cases, you’ll start off already knowing some things about statistics and mathematics. Or maybe you have already gone through all the theory resources listed above. - -In these cases, it’s a good idea to recap and assess your knowledge honestly. Are there any areas that you need to revise or are you good for now? - -If you’re all set, it’s time to go ahead and apply all that knowledge with R or Python. As a general guideline, it’s a good idea to pick one and get started with that language. Later, you can still add the other programming language to your skill set. - -Why is all this programming knowledge necessary? - -Well, you’ll see that the courses listed above (or those you have taken in school or university) will provide you with a more theoretical (and not applied) introduction to mathematics and statistics topics. However, ML is very applied and you’ll need to be able to apply all the topics you have learned. 
So it’s a good idea to go over the materials again, but this time in an applied way. - -If you want to master the basics of R and Python, consider the following courses: - -* DataCamp’s introductory Python or R courses: [Intro to Python for Data Science][9] or [Introduction to R Programming][10]. - -* Introductory Python and R courses from Edx: [Introduction to Python for Data Science][11] and [Introduction to R for Data Science][12]. - -* There are many other free courses out there. Check out [Coursera][13] or [Codeacademy][14] for more. - -When you have nailed down the basics, check out DataCamp’s blog on the [40+ Python Statistics For Data Science Resources][78]. This post offers 40+ resources on the statistics topics you need to know to get started with data science (and by extension also ML). - -Also make sure you check out [this SciPy tutorial][79] on vectors and arrays and [this workshop][80] on Scientific Computing with Python. - -To get hands-on with Python and calculus, you can check out the [SymPy package][81]. - -### 2\. Don’t be scared to invest in the “theory” of ML - -A lot of people don’t make the effort to go through some more theoretical material because it’s “dry” or “boring.” But going through the theory and really investing your time in it is essential and invaluable in the long run. You’ll better understand new advancements in machine learning, and you’ll be able to link back to your background knowledge. This will help you stay motivated. - -Additionally, the theory doesn’t need to be boring. As you read in the introduction, there are so many materials that will make it easier for you to get into it. - -Books are one of the best ways to absorb the theoretical knowledge. They force you to stop and think once in a while. Of course, reading books is a very static thing to do and it might not agree with your learning style. 
Nonetheless, try out the following books and see if it might be something for you: - -* [_Machine Learning textbook_][15] , by Tom Mitchell might be old but it’s gold. This book goes over the most important topics in machine learning in a well-explained and step-by-step way. - -* _Machine Learning: The Art and Science of Algorithms that Make Sense of Data _ (you can see the slides of the book [here][16]): this book is great for beginners. There are many real-life applications discussed, which you might find lacking in Tom Mitchell’s book. - -* [_Machine Learning Yearning_][17] : this book by Andrew Ng is not yet complete, but it’s bound to be an excellent reference for those who are learning ML. - -* [_Algorithms and Data Structures_][18]  by Jurg Nievergelt and Klaus Hinrichs - -* Also check out the  [_Data Mining for the Masses_][19]  by Matthew North. You’ll find that this book guides you through some of the most difficult topics. - -* [_Introduction to Machine Learning_][20]  by Alex Smola and S.V.N. Vishwanathan. - - -![](https://cdn-images-1.medium.com/max/800/1*TpLLAIKIRVHq6VQs3Q9IJA.png) -Take your time to read books and to study the material covered in them - -Videos / MOOCs are awesome for those who learn by watching and listening. There are a lot of MOOCs and videos out there, but it can also be hard to find your way through all those materials. Below is a list of the most notable ones: - -* [This well-known Machine Learning MOOC][21], taught by Andrew Ng, introduces you to Machine Learning and the theory. Don’t worry — it’s well-explained and takes things step-by-step, so it’s excellent for beginners. - -* The [playlist of the MIT Open Courseware 6034 course][22]: already a bit more advanced. You’ll definitely need some previous work on ML theory before you start this series, but you won’t regret it. - -At this point, it’s important for you to go over the separate techniques and grasp the whole picture. 
This starts with understanding key concepts: the distinction between supervised and unsupervised learning, classification and regression, and so on. Manual (written) exercises can come in handy. They can help you understand how algorithms work and how you should go about them. You’ll most often find these written exercises in courses from universities. Check out [this ML course][82] by Portland State University. - -### 3\. Get hands-on - -Knowing the theory and understanding the algorithms by reading and watching is all good. But you also need to surpass this stage and get started with some exercises. You’ll learn to implement these algorithms and apply the theory that you’ve learned. - -First, you have tutorials which introduce you to the basics of machine learning in Python and R. The best way is, of course, to go for interactive tutorials: - -* In [Python Machine Learning: Scikit-Learn Tutorial][23], you will learn more about well-known algorithms KMeans and Support Vector Machines (SVM) to construct models with Scikit-Learn. - -* [Machine Learning in R for beginners][24] introduces you to ML in R with the class and caret packages. - -* [Keras Tutorial: Deep Learning in Python covers ][25]how to build Multi-Layer Perceptrons (MLPs) for classification and regression tasks, step-by-step. - -Also check out the following tutorials, which are static and will require you to work in an IDE: - -* [Machine Learning in Python, Step By Step][26]: step-by-step tutorial with Scikit-Learn. - -* [Develop Your First Neural Network in Python With Keras Step-By-Step][27]: learn how to develop your first neural network with Keras thanks to this tutorial. - -* There are many more that you can consider, but the tutorials of [Machine Learning Mastery][28] are very good. - -Besides the tutorials, there are also courses. Taking courses will help you apply the concepts that you’ve learned in a focused way. Experienced instructors will help you. 
Here are some interactive courses for Python and ML: - -* [Supervised Learning with scikit-learn][29]: you’ll learn how to build predictive models, tune their parameters, and predict how well they will perform on unseen data. All while using real world datasets. You’ll do so with Scikit-Learn. - -* [Unsupervised Learning in Python][30]: shows you how to cluster, transform, visualize, and extract insights from unlabeled datasets. At the end of the course, you’ll build a recommender system. - -* [Deep Learning in Python][31]: you’ll gain hands-on, practical knowledge of how to use deep learning with Keras 2.0, the latest version of a cutting-edge library for deep learning in Python. - -* [Applied Machine Learning in Python][32]: introduces the learner to applied ML and focuses more on the techniques and methods than on the statistics behind these methods. - - - -![](https://cdn-images-1.medium.com/max/800/1*xYFavqTjvPDUCfMVrfPr-A.png) -After the theory, take your time to apply the knowledge you have gained. - -For those who are learning ML with R, there are also these interactive courses: - -* [Introduction to Machine Learning][33] gives you a broad overview of the discipline’s most common techniques and applications. You’ll gain more insight into the assessment and training of different ML models. The rest of the course focuses on an introduction to three of the most basic ML tasks: classification, regression, and clustering. - -* [R: Unsupervised Learning][34] provides a basic introduction to clustering and dimensionality reduction in R from a ML perspective. This allows you to get from data to insights as quickly as possible. - -* [Practical Machine Learning][35] covers the basic components of building and applying prediction functions with an emphasis on practical applications. - -Lastly, there are also books that go over ML topics in a very applied way. 
If you’re looking to learn with the help of text and an IDE, check out these books: - -* The  [_Python Machine Learning Book_][36]  by Sebastian Raschka - -* The [Introduction to Artificial Neural Networks and Deep Learning: A Practical Guide with Applications in Python][37] by Sebastian Raschka - -* [_Machine Learning with R_][38]  by Brett Lantz - -### 4\. Practice - -Practice is even more important than getting hands-on and revising the material with Python. This step was probably the hardest one for me. Check out how other people have implemented ML algorithms when you have done some exercises. Then, get started on your own projects that illustrate your understanding of ML algorithms and theories. - -One of the most straightforward ways is to see the exercises a tiny bit bigger. You want to do a bigger exercise which requires you to do more data cleaning and feature engineering. - -* Start with[ Kaggle][39]. If you need additional help to conquer the so-called “data fear,” check out the [Kaggle Python Tutorial on Machine Learning][40]and[ Kaggle R Tutorial on Machine Learning][41]. These will bring you up to speed in no time. - -* Afterwards, you can also start doing challenges by yourself. Check out these sites, where you can find lots of ML datasets: [UCI Machine Learning Repository][42], [Public datasets for machine learning][43], and [data.world][44]. - - -![](https://cdn-images-1.medium.com/max/800/1*ZbZrcoYWENMQuKLbDkdG4A.png) -Practice makes perfect. - -### 5\. Projects - -Doing small exercises is good. But in the end, you’ll want to make a project in which you can demonstrate your understanding of the ML algorithms with which you’ve been working. - -The best exercise is to implement your own ML algorithm. 
You can read more about why you should do this exercise and what you can learn from it in the following pages: - -* [Why is there a need to manually implement machine learning algorithms when there are many advanced APIs like tensorflow available?][45] - -* [Why Implement Machine Learning Algorithms From Scratch?][46] - -* [What I Learned Implementing a Classifier from Scratch in Python][47] - -Next, you can check out the following posts and repositories. They’ll give you some inspiration from others and will show how they have implemented ML algorithms. - -* [How to Implement a Machine Learning Algorithm][48] - -* [ML From Scratch][49] - -* [Machine Learning Algorithms From Scratch][50] - - - -![](https://cdn-images-1.medium.com/max/800/1*k0vqKBz-LwnMElA0o2FhOg.png) -Projects can be hard at start, but they’ll increase your understanding even more. - -### 6\. Don’t stop - -Learning ML is something that should never stop. As many will confirm, there are always new things to learn — even when you’ve been working in this area for a decade. - -There are, for example, ML trends such as deep learning which are very popular right now. You might also focus on other topics that aren’t central at this point but which might be in the future. Check out this [interesting question and the answers][83] if you want to know more. - -Papers may not be the first thing that spring to mind when you’re worried about mastering the basics. But they are your way to get up to date with the latest research. Papers are not for those who are just starting out. They are definitely a good fit for those who are more advanced. - -* [Top 20 Recent Research Papers on Machine Learning and Deep Learning][51] - -* [Journal of Machine Learning Research][52] - -* [Awesome Deep Learning Papers][53] - -* [What are some of the best research papers/books for Machine learning?][54] - -Other technologies are also something to consider. But don’t worry about them when you’re just starting out. 
You can, for example, focus on adding Python or R (depending on which one you already know) to your skill set. You can look through this post to find interesting resources. - -If you also want to move towards big data, you could consider looking into Spark. Here are some interesting resources: - -* [Introduction to Spark in R with sparklyr][55] - -* [Data Science And Engineering With Spark][56] - -* [Introduction to Apache Spark][57] - -* [Distributed Machine Learning with Apache Spark][58] - -* [Big Data Analysis with Apache Spark][59] - -* [Apache Spark in Python: Beginner’s Guide][60] - -* [PySpark RDD Cheat Sheet][61] - -* [PySpark SQL Cheat Sheet][62]. - -Other programming languages, such as Java, JavaScript, C, and C++ are gaining importance in ML. In the long run, you can consider also adding one of these languages to your to-do list. You can use these blog posts to guide your choice: - -* [Most Popular Programming Languages for Machine Learning and Data Science][63] - -* [The Most Popular Language For Machine Learning And Data Science Is…][64] - - -![](https://cdn-images-1.medium.com/max/800/1*6J6tjlMIi0OcNdm7tyJQ4Q.png) -You’re never done learning. - -### 7\. Make use of all the material that is out there - -Machine learning is a difficult topic which can make you lose your motivation at some point. Or maybe you feel you need a change. In such cases, remember that there’s a lot of material on which you can fall back. Check out the following resources: - -Podcasts. Great resource for continuing your journey into ML and staying up-to-date with the latest developments in the field: - -* [Talking Machines][65] - -* [Data Skeptic][66] - -* [Linear Digressions][67] - -* [This Week in Machine Learning & AI][68] - -* [Learning Machines 101][69] - -There are, of course, many more podcasts. - -Documentation and package source code are two ways to get deeper into the implementation of the ML algorithms. 
Check out some of these repositories: - -* [Scikit- Learn][70]: Well-known Python ML package - -* [Keras][71]: Deep learning package for Python - -* [caret][72]: very popular R package for Classification and Regression Training - -Visualizations are one of the newest and trendiest ways to get into the theory of ML. They’re fantastic for beginners, but also very interesting for more advanced learners. The following visualizations will intrigue you and will help you gain more understanding into the workings of ML: - -* [A visual introduction to machine learning][73] - -* [Distill][74] makes ML Research clear, dynamic and vivid. - -* [Tensorflow — Neural Network Playground][75] if you’re looking to play around with neural network architectures. - -* More here:[ What are the best visualizations of machine learning algorithms?][76] - - -![](https://cdn-images-1.medium.com/max/800/1*nCt9ZsXRksdOMown4vuxJA.png) -Some variety in your learning can and will motivate you even more. - -### You Can Get Started Now - -Now it’s up to you. Learning ML is something that’s a continuous process, so the sooner you get started, the better. You have all of the tools in your hands now to get started. Good luck and make sure to let us know how you’re progressing. 
- - _This post is based on an answer I gave to the Quora question _ [_How Does A Total Beginner Start To Learn Machine Learning_][84] _._ - --------------------------------------------------------------------------------- -作者简介: - -Karlijn Willems - -Data Science Journalist - ------------------------ - -via: https://medium.freecodecamp.org/how-machines-learn-a-practical-guide-203aae23cafb - -作者:[ Karlijn Willems][a] -译者:[译者ID](https://github.com/译者ID) -校对:[校对者ID](https://github.com/校对者ID) - -本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出 - -[a]:https://medium.freecodecamp.org/@kacawi -[1]:http://www.khanacademy.org/ -[2]:https://ocw.mit.edu/index.htm -[3]:https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/ -[4]:https://www.coursera.org/learn/basic-statistics -[5]:https://www.amazon.com/Linear-Algebra-Its-Applications-4th/dp/0030105676 -[6]:https://www.amazon.com/Applied-Linear-Algebra-3rd-Noble/dp/0130412600 -[7]:https://www.amazon.de/Solved-Problems-Linear-Algebra-Schaums/dp/0070380236 -[8]:https://ocw.mit.edu/courses/online-textbooks/ -[9]:https://www.datacamp.com/courses/intro-to-python-for-data-science -[10]:https://www.datacamp.com/courses/free-introduction-to-r -[11]:https://www.edx.org/course/introduction-python-data-science-microsoft-dat208x-5 -[12]:https://www.edx.org/course/introduction-r-data-science-microsoft-dat204x-4 -[13]:http://www.coursera.org/ -[14]:https://www.codecademy.com/ -[15]:http://www.cs.cmu.edu/~tom/mlbook.html -[16]:http://www.cs.bris.ac.uk/~flach/mlbook/materials/mlbook-beamer.pdf -[17]:http://www.mlyearning.org/ -[18]:https://www.amazon.com/Algorithms-Data-Structures-Applications-Practitioner/dp/0134894286 -[19]:https://www.amazon.com/Data-Mining-Masses-Matthew-North/dp/0615684378 -[20]:http://alex.smola.org/drafts/thebook.pdf -[21]:https://www.coursera.org/learn/machine-learning -[22]:https://youtu.be/TjZBTDzGeGg?list=PLnvKubj2-I2LhIibS8TOGC42xsD3-liux 
-[23]:https://www.datacamp.com/community/tutorials/machine-learning-python -[24]:https://www.datacamp.com/community/tutorials/machine-learning-in-r -[25]:https://www.datacamp.com/community/tutorials/deep-learning-python -[26]:http://machinelearningmastery.com/machine-learning-in-python-step-by-step/ -[27]:http://machinelearningmastery.com/tutorial-first-neural-network-python-keras/ -[28]:http://www.machinelearningmastery.com/ -[29]:https://www.datacamp.com/courses/supervised-learning-with-scikit-learn -[30]:https://www.datacamp.com/courses/unsupervised-learning-in-python -[31]:https://www.datacamp.com/courses/deep-learning-in-python -[32]:https://www.coursera.org/learn/python-machine-learning -[33]:https://www.datacamp.com/courses/introduction-to-machine-learning-with-r -[34]:https://www.datacamp.com/courses/unsupervised-learning-in-r -[35]:https://www.coursera.org/learn/practical-machine-learning -[36]:https://github.com/rasbt/python-machine-learning-book -[37]:https://github.com/rasbt/deep-learning-book -[38]:https://books.google.be/books/about/Machine_Learning_with_R.html?id=ZQu8AQAAQBAJ&source=kp_cover&redir_esc=y -[39]:http://www.kaggle.com/ -[40]:https://www.datacamp.com/community/open-courses/kaggle-python-tutorial-on-machine-learning -[41]:https://www.datacamp.com/community/open-courses/kaggle-tutorial-on-machine-learing-the-sinking-of-the-titanic -[42]:http://archive.ics.uci.edu/ml/ -[43]:http://homepages.inf.ed.ac.uk/rbf/IAPR/researchers/MLPAGES/mldat.htm -[44]:https://data.world/ -[45]:https://www.quora.com/Why-is-there-a-need-to-manually-implement-machine-learning-algorithms-when-there-are-many-advanced-APIs-like-tensorflow-available -[46]:http://www.kdnuggets.com/2016/05/implement-machine-learning-algorithms-scratch.html -[47]:http://www.jeannicholashould.com/what-i-learned-implementing-a-classifier-from-scratch.html -[48]:http://machinelearningmastery.com/how-to-implement-a-machine-learning-algorithm/ 
-[49]:https://github.com/eriklindernoren/ML-From-Scratch -[50]:https://github.com/madhug-nadig/Machine-Learning-Algorithms-from-Scratch -[51]:http://www.kdnuggets.com/2017/04/top-20-papers-machine-learning.html -[52]:http://www.jmlr.org/ -[53]:https://github.com/terryum/awesome-deep-learning-papers -[54]:https://www.quora.com/What-are-some-of-the-best-research-papers-books-for-Machine-learning -[55]:https://www.datacamp.com/courses/introduction-to-spark-in-r-using-sparklyr -[56]:https://www.edx.org/xseries/data-science-engineering-apache-spark -[57]:https://www.edx.org/course/introduction-apache-spark-uc-berkeleyx-cs105x -[58]:https://www.edx.org/course/distributed-machine-learning-apache-uc-berkeleyx-cs120x -[59]:https://www.edx.org/course/big-data-analysis-apache-spark-uc-berkeleyx-cs110x -[60]:https://www.datacamp.com/community/tutorials/apache-spark-python -[61]:https://www.datacamp.com/community/blog/pyspark-cheat-sheet-python -[62]:https://www.datacamp.com/community/blog/pyspark-sql-cheat-sheet -[63]:https://fossbytes.com/popular-top-programming-languages-machine-learning-data-science/ -[64]:http://www.kdnuggets.com/2017/01/most-popular-language-machine-learning-data-science.html -[65]:http://www.thetalkingmachines.com/ -[66]:https://dataskeptic.com/ -[67]:http://lineardigressions.com/ -[68]:https://twimlai.com/ -[69]:http://www.learningmachines101.com/ -[70]:https://github.com/scikit-learn/scikit-learn -[71]:http://www.github.com/fchollet/keras -[72]:http://topepo/caret -[73]:http://www.r2d3.us/visual-intro-to-machine-learning-part-1/ -[74]:http://distill.pub/ -[75]:http://playground.tensorflow.org/ -[76]:https://www.quora.com/What-are-the-best-visualizations-of-machine-learning-algorithms -[77]:https://www.datacamp.com/community/tutorials/learn-data-science-infographic -[78]:https://www.datacamp.com/community/tutorials/python-statistics-data-science -[79]:https://www.datacamp.com/community/tutorials/python-scipy-tutorial 
-[80]:http://www.math.pitt.edu/~siam/workshops/python10/python.pdf -[81]:http://docs.sympy.org/latest/tutorial/calculus.html -[82]:http://web.cecs.pdx.edu/~mm/MachineLearningSpring2017/ -[83]:https://www.quora.com/Should-I-quit-machine-learning -[84]:https://www.quora.com/How-does-a-total-beginner-start-to-learn-machine-learning/answer/Karlijn-Willems-1 diff --git a/translated/tech/20170823 How Machines Learn A Practical Guide.md b/translated/tech/20170823 How Machines Learn A Practical Guide.md new file mode 100644 index 0000000000..d924be36c9 --- /dev/null +++ b/translated/tech/20170823 How Machines Learn A Practical Guide.md @@ -0,0 +1,376 @@ +实用指南:机器是如何学习的 +============================================================ + +![](https://cdn-images-1.medium.com/max/1000/1*MxSBSJIqK19z2qhfspPL-g.png) + +你可能通过一些有趣的应用听说过机器学习,比如垃圾邮件过滤、光学字符识别和计算机视觉。 + +开启机器学习之旅是一个涉及多方面的漫长旅途。对于新手,有很多的书籍,有学术论文,有指导练习,有独立项目。在这些众多的选择里面,很容易迷失方向,忘了自己真正需要学习什么。 + +所以在今天的文章中,我会列出7个步骤(和50多个资源)帮助你开启这个令人兴奋的计算机科学领域的大门,并逐渐成为一个机器学习的大神。 + +请注意,这个资源列表并不详尽,只是为了让你入门。 除此之外,还有更多的资源。 + +### 1\. 学习必要的背景知识 + +你可能还记得 DataCamp 网站上的[学习数据科学][77]这篇文章里面的信息图:数学和统计学是开始机器学习(ML)的关键。 基础可能看起来很容易,因为它只是三个主题。 但不要忘记这些实际上是三个广泛的话题。 + +在这里需要记住两件非常重要的事情: + +- 首先,你一定会想要进一步的指导,了解开始机器学习到底需要覆盖哪些知识点。 +- 其次,这些是你进一步学习的基础。 不要害怕花时间。掌握后续一切学习所依赖的知识。 + +第一点很简单:学习线性代数和统计学是个好主意。这两门知识是最低限度必须理解的。但是在你学习的同时,也应该尝试学习优化和高等微积分等主题。当你越来越深入 ML 的时候,它们就能派上用场。 + +如果是从零开始的,这里有一些入门指南可供参考: + +* [Khan 学院][1] 对于初学者是非常好的资源。 可以考虑学习其中的线性代数和微积分课程。 + +* 在 [麻省理工学院 OpenCourseWare][2] 网站上学习[ 线性代数][3] 课程。 + +* 学习[这门 Coursera 课程][4],它介绍了描述性统计、概率论和推论统计。 + + +![](https://cdn-images-1.medium.com/max/800/1*Uw8YXNlt5VGKTXFDbtFEig.png) +统计是学习ML的关键之一 + +如果你更喜欢阅读书籍,请参考以下内容: + +* [_线性代数及其应用_][5] _,_ + +* [_应用线性代数_][6] , + +* [_线性代数3000道习题详解_][7] _,_ + +* [麻省理工学院在线教材][8] + +然而,在大多数情况下,你都会对统计和数学有一个初步的了解。也有可能你已经浏览过上面列举的那些资源。 + +在这种情况下,诚实地回顾和评估你的知识是一个好主意。是否有一些领域是需要复习的,还是说目前掌握得已经足够好了? + +如果你一切都准备好了,那么现在是时候使用R或者Python应用这些知识了。作为一个通用的指导方针,选择一门语言开始是个好主意。之后,你仍然可以将另一门语言加入到你的技能池里。 + +为什么这些编程知识是必需的? 
+ +嗯,你会看到上面列出的课程(或你在学校或大学学习的课程)将为你提供更偏理论(而不是应用)的数学和统计学主题介绍。 然而,ML非常注重应用,你需要能够应用你所学到的所有主题。 所以最好再次复习一遍之前的材料,但是这次需要付诸应用。 + +如果你想掌握R和Python的基础,可以看以下课程: + +* DataCamp 上 Python 或者 R 的介绍性课程: [Python语言数据科学介绍][9] 或者 [R语言编程介绍][10]。 + +* Edx上 Python 或者 R 的介绍性课程: [Python语言数据科学介绍][11] 和[R语言数据科学介绍][12]。 + +* 还有很多其他免费的课程。查看 [Coursera][13] 或者 [Codecademy][14] 了解更多。 + +当你掌握基础知识后,请查看DataCamp上的博客[40+ Python Statistics For Data Science Resources][78]。 这篇文章提供了统计学方面的40多个资源,这些资源都是你开始数据科学(以及ML)需要学习的。 + +还要确保你查看了关于向量和数组的[这篇 SciPy 教程][79],以及使用Python进行科学计算的[研讨会][80]。 + +要使用Python和微积分进行实践,你可以查看[SymPy软件包][81]。 + +### 2\. 不要害怕在ML的“理论”上投入时间 + +很多人并不会努力地去浏览更多的理论材料,因为理论“枯燥”或者“无聊”。但从长远来看,在理论知识上投入时间是至关重要的,非常值得的。 这样,你将更好地了解机器学习的新进展,也能和背景知识结合起来。 这将有助于你保持积极性。 + +此外,理论并不一定无聊。 正如你在介绍中所看到的,你可以借助非常多的资料深入学习。 + +书籍是吸收理论知识的最佳途径之一。 它们会迫使你不时停下来想一会儿。 当然,看书是一件非常静态的事情,可能不符合你的学习风格。 不过,请尝试阅读下列书籍,看看它是否适合你: + +* [_Machine Learning textbook_][15] , Tom Mitchell著,书可能比较旧,但是却很经典。这本书很好地解释了机器学习中最重要的课题,步骤详尽,逐层深入。 +* _Machine Learning: The Art and Science of Algorithms that Make Sense of Data_(你可以在[这里][16]看到这本书的幻灯片版本):这本书对初学者来说非常棒。 里面讨论了许多现实中的应用,而这些在Tom Mitchell的书中可能有所欠缺。 +* [_Machine Learning Yearning_][17] :这本书由Andrew Ng编写,还不完整,但对于那些正在学习ML的学生来说,这一定是很好的参考资料。 +* [_Algorithms and Data Structures_][18]  由Jurg Nievergelt和Klaus Hinrichs著。 +* 也可以参阅Matthew North的[_Data Mining for the Masses_][19] 。 你会发现这本书引导你攻克一些最困难的话题。 +* [_Introduction to Machine Learning_][20]  由Alex Smola和S.V.N. 
Vishwanathan著。 + + +![](https://cdn-images-1.medium.com/max/800/1*TpLLAIKIRVHq6VQs3Q9IJA.png) +花些时间看书并研究其中涵盖的资料 + +视频 / 慕课对于边听边看来学习的人来说非常棒。 慕课和视频非常的多,多到可能你都很难找到适合你的。 下面列出了最知名的几个: + +* [This well-known Machine Learning MOOC][21],是Andrew Ng讲的,介绍了机器学习及其理论。 别担心,这个慕课讲的非常好,一步一步深入,所以对初学者来说非常合适。 + +* [playlist of the MIT Open Courseware 6034 course][22],已经更进阶一些了。 在你开始本系列之前,你需要做一些ML理论方面的准备工作,但是你不会后悔的。 + +在这一点上,重要的是要将各种独立的技术融会贯通,形成整体的认识。 首先了解关键的概念:监督和无监督学习的区别,分类和回归等。 手动(书面)练习可以派上用场。 它们能帮你了解算法是如何工作以及如何应用这些算法。 在大学课程里你经常会找到一些书面练习。 看看波特兰州立大学的[ML课程][82]。 + +### 3\. 开始动手 + +通过看书和看视频了解理论和算法都非常好,但是需要超越这一阶段,开始做一些练习。你要学着去实现这些算法,应用学到的理论。 + +首先,有很多教程介绍如何用Python和R进行机器学习的基础知识。当然最好的方法就是使用交互式教程: + +- 在Python机器学习:Scikit学习教程中,能了解有关知名算法KMeans和支持向量机(SVM)的更多信息,以使用Scikit-Learn构建模型。 +- R中的机器学习为初学者提供了带有类和插入包的R中的ML。 +- 克拉斯教程:Python中的深度学习涵盖了如何为分类和回归任务构建多层感知器(MLP),一步一步。 + +* [Python Machine Learning: Scikit-Learn Tutorial][23], 在这篇教程里面,你可以进一步了解知名算法KMeans和支持向量机(SVM),并使用Scikit-Learn构建模型。 + +* [Machine Learning in R for beginners][24] 用R中的class和caret包介绍机器学习。 + +* [Keras Tutorial: Deep Learning in Python ][25] 涵盖了如何一步一步的为分类和回归任务构建多层感知器。 + +还请查看以下静态的教程,这些需要你在IDE中操作: + +- 使用Keras开发你的第一个神经网络的步骤:通过本教程,学习如何使用Keras开发你的第一个神经网络。 +- 还有更多的你可以考虑,但机器学习掌握的教程是非常好的。 + +* [Machine Learning in Python, Step By Step][26]: 一步一步学习Scikit-Learn。 + +* [Develop Your First Neural Network in Python With Keras Step-By-Step][27]: 一步一步使用Keras开发你的第一个神经网络。 + +* 你可以考虑看更多的教程,但是[Machine Learning Mastery][28]上的教程是非常好的。 + +除了教程之外,还有一些课程。 参加课程可以帮助你有针对性地应用学到的概念。 经验丰富的导师很有帮助。 以下是Python和机器学习的一些互动课程: + +* [Supervised Learning with scikit-learn][29]: 学习如何构建预测模型,调整参数,并预测模型在未知数据上的表现。全程使用真实世界的数据集,并借助Scikit-Learn完成。 + +* [Unsupervised Learning in Python][30]: 展示如何对未标记的数据集进行聚类、转换、可视化并提取关键信息。 在课程结束时,还会构建一个推荐系统。 + +* [Deep Learning in Python][31]: 你将获得如何使用Keras 2.0进行深度学习的实践知识,Keras 2.0是Python深度学习前沿库Keras的最新版本。 + +* [Applied Machine Learning in Python][32]: 将学习者引入到机器学习实践中,更多地关注技术和方法,而不是这些方法背后的统计学。 + + 
+![](https://cdn-images-1.medium.com/max/800/1*xYFavqTjvPDUCfMVrfPr-A.png) +理论学习之后,花点时间来应用你所学到的知识。 + +对于那些正在学习R语言机器学习的人,还有这些互动课程: + +- 实用机器学习涵盖构建和应用预测功能的基本组成部分,重点是实际应用。 + +* [Introduction to Machine Learning][33] 广泛了解机器学习学科最常见的技术和应用,还可以更多的了解不同机器学习模型的评估和训练。这门课程剩下的部分重点介绍三个最基本的机器学习任务: 分类,回归和聚类。 + +* [R: Unsupervised Learning][34] 从ML角度对R中的聚类和降维做了基本介绍。 这能让你尽快从数据中获得关键信息。 + +* [Practical Machine Learning][35] 涵盖了构建和应用预测函数的基本组成部分,重点是实际应用。 + +最后,还有很多书籍以偏向实践的方式介绍了ML主题。 如果你想借助书籍内容和IDE来学习,请查看这些书籍: + +* [_Python Machine Learning Book_][36]  Sebastian Raschka 著 + +* [Introduction to Artificial Neural Networks and Deep Learning: A Practical Guide with Applications in Python][37] Sebastian Raschka 著 + +* [_Machine Learning with R_][38]  Brett Lantz 著 + +### 4\. 练习 + +练习比动手操作和用Python复习资料更为重要。 这一步对我来说可能是最难的。 在做了一些练习后看看其他人是如何实现ML算法的。 然后,开始你自己的项目,展示你对ML算法和理论的理解。 + +最直接的方法之一就是将练习做得更大些。 你需要做一个更大的练习,它要求你做更多的数据清理和特征工程。 + +- 从[ Kaggle][39]开始。 如果你需要额外的帮助来征服所谓的“数据恐惧”,请查看[Kaggle Python Tutorial on Machine Learning][40] 和 [ Kaggle R Tutorial on Machine Learning][41]。 这些将帮助你快速上手。 +- 此后,你也可以自己开始挑战。 查看这些网站,您可以在其中找到大量的ML数据集:[UCI Machine Learning Repository][42],[Public datasets for machine learning][43] 和 [data.world][44]。 + +![](https://cdn-images-1.medium.com/max/800/1*ZbZrcoYWENMQuKLbDkdG4A.png) +熟能生巧。 + +### 5\. 项目 + +做小练习是好的。 但是在最后,您需要做一个项目,可以在其中展示您对使用到的ML算法的理解。 + +最好的练习是实现你自己的ML算法。 您可以在以下页面中阅读更多关于为什么您应该做这样的练习以及您可以从中学到什么内容: + +- [为什么有许多先进的API,比如tensorflow,还需要自己手动实现机器学习的算法?][45] +- [为什么从头开始实现机器学习算法?][46] +- [使用python从头开始实现一个分类器,我能从中学到什么][47] + +Next, you can check out the following posts and repositories. They’ll give you some inspiration from others and will show how they have implemented ML algorithms. + +接下来,您可以查看以下文章和仓库。 可以从中获得一些灵感,并且了解他们如何实现ML算法。 + +- [如何实现机器学习算法][48] +- [从头实现机器学习][49] +- [从头实现机器学习算法][50] + + +![](https://cdn-images-1.medium.com/max/800/1*k0vqKBz-LwnMElA0o2FhOg.png) +开始时项目可能会很难,但是可以极大增加你的理解。 + +### 6\. 
不要停止 + +对机器学习的学习永远不应该停止,即使你在这个领域工作了十年,总是有新的东西要学习,许多人都将会证实这一点。 + +例如,深度学习等ML趋势现在就很受欢迎。你也可以专注于那些现在不怎么火,但是将来可能会火的话题上。如果你想了解更多,可以看看[这个有趣的问题和答案][83]。 + +当你担心掌握基础知识时,你最先想到的可能不是论文。 但是它们是你紧跟最新研究的一个途径。 论文并不适合刚刚开始学习的人,但是绝对适合更高阶的学习者。 + +- [机器学习和深度学习领域20篇最新的顶级研究论文][51] +- [机器学习研究杂志][52] +- [优秀的深度学习论文][53] +- [机器学习的一些最好的研究论文/书籍是什么?][54] + +其他技术也是需要考虑的。 但是当你刚开始学习时,不要担心这些。 例如,您可以专注于把Python或R(取决于你已经掌握了哪一个)中的另一门加入到你的技能池里。 你可以通过这篇文章来查找有趣的资源。 + +如果您还想转向大数据,您可以考虑研究Spark。 这里有一些有趣的资源: + +* [Introduction to Spark in R with sparklyr][55] + +* [Data Science And Engineering With Spark][56] + +* [Introduction to Apache Spark][57] + +* [Distributed Machine Learning with Apache Spark][58] + +* [Big Data Analysis with Apache Spark][59] + +* [Apache Spark in Python: Beginner’s Guide][60] + +* [PySpark RDD Cheat Sheet][61] + +* [PySpark SQL Cheat Sheet][62]. + +其他编程语言,比如Java,JavaScript,C和C ++在ML中越来越重要。 从长远来看,您可以考虑将其中一种语言添加到待办事项列表中。 你可以使用这些博客文章来指导你选择: + +* [机器学习和数据科学最流行的编程语言][63] +* [机器学习和数据科学最流行的语言是...][64] + + +![](https://cdn-images-1.medium.com/max/800/1*6J6tjlMIi0OcNdm7tyJQ4Q.png) +学无止境。 + +### 7\. 利用一切可以利用的资源 + +机器学习是一个充满难度的话题,有时候可能会让你失去动力。 或者也许你觉得你需要点改变。 在这种情况下,请记住,有很多资料可以供你依靠。 查看以下资源: + +播客。 很棒的资源,让你继续你的ML旅程,紧跟这个领域最新的发展: + +* [Talking Machines][65] + +* [Data Skeptic][66] + +* [Linear Digressions][67] + +* [This Week in Machine Learning & AI][68] + +* [Learning Machines 101][69] + +当然,还有更多的播客。 + +文档和包的源代码是深入了解ML算法的实现的两种方法。 查看这些仓库: + +Documentation and package source code are two ways to get deeper into the implementation of the ML algorithms. 
Check out some of these repositories: + +* [Scikit- Learn][70]: 知名的Python ML包 + +* [Keras][71]: Python深度学习软件包 + +* [caret][72]: 用于分类和回归训练非常受欢迎的R包 + +可视化是深入ML理论的最新也是最流行的方式之一。 他们对初学者来说非常棒,但对于更高级的学习者来说也是非常有趣的。 你肯定会被下面这些可视化资源所吸引,能更加了解ML的工作原理: + +- [机器学习的可视化介绍][73] +- [Distill][74] 使ML研究清晰,动态和生动。 +- [Tensorflow - 神经网络游乐场][75],如果你想玩下神经网络架构。 +- 更多的看这里:[机器学习算法最佳的可视化方法是什么?][76] + +![](https://cdn-images-1.medium.com/max/800/1*nCt9ZsXRksdOMown4vuxJA.png) +学习中的一些变化更加能激励你。 + +### 现在你可以开始了 + +现在一切都取决于你自己了。学习机器学习是一个持续的过程,所以开始的越早就会越好。 运用你手边的一切工具开始吧。 祝你好运,并确保让我们知道你的进步。 + +_这篇文章是我基于Quora问题给出的答案,[小白该如何开始机器学习][84]。_ + +-------------------------------------------------------------------------------- +作者简介: + +Karlijn Willems,数据科学记者 + +----------------------- + +via: https://medium.freecodecamp.org/how-machines-learn-a-practical-guide-203aae23cafb + +作者:[ Karlijn Willems][a] +译者:[Flowsnow](https://github.com/Flowsnow) +校对:[校对者ID](https://github.com/校对者ID) + +本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出 + +[a]:https://medium.freecodecamp.org/@kacawi +[1]:http://www.khanacademy.org/ +[2]:https://ocw.mit.edu/index.htm +[3]:https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/ +[4]:https://www.coursera.org/learn/basic-statistics +[5]:https://www.amazon.com/Linear-Algebra-Its-Applications-4th/dp/0030105676 +[6]:https://www.amazon.com/Applied-Linear-Algebra-3rd-Noble/dp/0130412600 +[7]:https://www.amazon.de/Solved-Problems-Linear-Algebra-Schaums/dp/0070380236 +[8]:https://ocw.mit.edu/courses/online-textbooks/ +[9]:https://www.datacamp.com/courses/intro-to-python-for-data-science +[10]:https://www.datacamp.com/courses/free-introduction-to-r +[11]:https://www.edx.org/course/introduction-python-data-science-microsoft-dat208x-5 +[12]:https://www.edx.org/course/introduction-r-data-science-microsoft-dat204x-4 +[13]:http://www.coursera.org/ +[14]:https://www.codecademy.com/ +[15]:http://www.cs.cmu.edu/~tom/mlbook.html 
+[16]:http://www.cs.bris.ac.uk/~flach/mlbook/materials/mlbook-beamer.pdf
+[17]:http://www.mlyearning.org/
+[18]:https://www.amazon.com/Algorithms-Data-Structures-Applications-Practitioner/dp/0134894286
+[19]:https://www.amazon.com/Data-Mining-Masses-Matthew-North/dp/0615684378
+[20]:http://alex.smola.org/drafts/thebook.pdf
+[21]:https://www.coursera.org/learn/machine-learning
+[22]:https://youtu.be/TjZBTDzGeGg?list=PLnvKubj2-I2LhIibS8TOGC42xsD3-liux
+[23]:https://www.datacamp.com/community/tutorials/machine-learning-python
+[24]:https://www.datacamp.com/community/tutorials/machine-learning-in-r
+[25]:https://www.datacamp.com/community/tutorials/deep-learning-python
+[26]:http://machinelearningmastery.com/machine-learning-in-python-step-by-step/
+[27]:http://machinelearningmastery.com/tutorial-first-neural-network-python-keras/
+[28]:http://www.machinelearningmastery.com/
+[29]:https://www.datacamp.com/courses/supervised-learning-with-scikit-learn
+[30]:https://www.datacamp.com/courses/unsupervised-learning-in-python
+[31]:https://www.datacamp.com/courses/deep-learning-in-python
+[32]:https://www.coursera.org/learn/python-machine-learning
+[33]:https://www.datacamp.com/courses/introduction-to-machine-learning-with-r
+[34]:https://www.datacamp.com/courses/unsupervised-learning-in-r
+[35]:https://www.coursera.org/learn/practical-machine-learning
+[36]:https://github.com/rasbt/python-machine-learning-book
+[37]:https://github.com/rasbt/deep-learning-book
+[38]:https://books.google.be/books/about/Machine_Learning_with_R.html?id=ZQu8AQAAQBAJ&source=kp_cover&redir_esc=y
+[39]:http://www.kaggle.com/
+[40]:https://www.datacamp.com/community/open-courses/kaggle-python-tutorial-on-machine-learning
+[41]:https://www.datacamp.com/community/open-courses/kaggle-tutorial-on-machine-learing-the-sinking-of-the-titanic
+[42]:http://archive.ics.uci.edu/ml/
+[43]:http://homepages.inf.ed.ac.uk/rbf/IAPR/researchers/MLPAGES/mldat.htm
+[44]:https://data.world/
+[45]:https://www.quora.com/Why-is-there-a-need-to-manually-implement-machine-learning-algorithms-when-there-are-many-advanced-APIs-like-tensorflow-available
+[46]:http://www.kdnuggets.com/2016/05/implement-machine-learning-algorithms-scratch.html
+[47]:http://www.jeannicholashould.com/what-i-learned-implementing-a-classifier-from-scratch.html
+[48]:http://machinelearningmastery.com/how-to-implement-a-machine-learning-algorithm/
+[49]:https://github.com/eriklindernoren/ML-From-Scratch
+[50]:https://github.com/madhug-nadig/Machine-Learning-Algorithms-from-Scratch
+[51]:http://www.kdnuggets.com/2017/04/top-20-papers-machine-learning.html
+[52]:http://www.jmlr.org/
+[53]:https://github.com/terryum/awesome-deep-learning-papers
+[54]:https://www.quora.com/What-are-some-of-the-best-research-papers-books-for-Machine-learning
+[55]:https://www.datacamp.com/courses/introduction-to-spark-in-r-using-sparklyr
+[56]:https://www.edx.org/xseries/data-science-engineering-apache-spark
+[57]:https://www.edx.org/course/introduction-apache-spark-uc-berkeleyx-cs105x
+[58]:https://www.edx.org/course/distributed-machine-learning-apache-uc-berkeleyx-cs120x
+[59]:https://www.edx.org/course/big-data-analysis-apache-spark-uc-berkeleyx-cs110x
+[60]:https://www.datacamp.com/community/tutorials/apache-spark-python
+[61]:https://www.datacamp.com/community/blog/pyspark-cheat-sheet-python
+[62]:https://www.datacamp.com/community/blog/pyspark-sql-cheat-sheet
+[63]:https://fossbytes.com/popular-top-programming-languages-machine-learning-data-science/
+[64]:http://www.kdnuggets.com/2017/01/most-popular-language-machine-learning-data-science.html
+[65]:http://www.thetalkingmachines.com/
+[66]:https://dataskeptic.com/
+[67]:http://lineardigressions.com/
+[68]:https://twimlai.com/
+[69]:http://www.learningmachines101.com/
+[70]:https://github.com/scikit-learn/scikit-learn
+[71]:http://www.github.com/fchollet/keras
+[72]:http://topepo/caret
+[73]:http://www.r2d3.us/visual-intro-to-machine-learning-part-1/
+[74]:http://distill.pub/
+[75]:http://playground.tensorflow.org/
+[76]:https://www.quora.com/What-are-the-best-visualizations-of-machine-learning-algorithms
+[77]:https://www.datacamp.com/community/tutorials/learn-data-science-infographic
+[78]:https://www.datacamp.com/community/tutorials/python-statistics-data-science
+[79]:https://www.datacamp.com/community/tutorials/python-scipy-tutorial
+[80]:http://www.math.pitt.edu/~siam/workshops/python10/python.pdf
+[81]:http://docs.sympy.org/latest/tutorial/calculus.html
+[82]:http://web.cecs.pdx.edu/~mm/MachineLearningSpring2017/
+[83]:https://www.quora.com/Should-I-quit-machine-learning
+[84]:https://www.quora.com/How-does-a-total-beginner-start-to-learn-machine-learning/answer/Karlijn-Willems-1