This message could have been posted as a response to a question -- "How to get started with Python and machine learning?" -- asked in another thread.
I think a separate thread devoted to tools and techniques for machine learning will be valuable. Here goes.
====================
Here is a link to the bibliography that is an appendix to the "Foundations" book.
http://www.blueowlpress.com/wp-content/uploads/2016/08/FT-Bibliography-Appendix-D.pdf
There are two areas you will need to study.
1. Python.
2. Machine Learning.
-------------------
Python is your base language. Unless you already have substantial experience with, and support for, R, look no further. If you are uncertain and trying to decide between Python and R, choose Python. Do not learn another language in preparation for learning Python. The pandas library of Python is very similar to the libraries of R, so quite a lot of R experience will transfer to Python easily. But the data science profession is overwhelmingly moving to Python over R for applications beyond statistics.
Download and install the Anaconda distribution of Python.
https://www.continuum.io/downloads
It is free. It is available for Windows, Mac, and Unix/Linux. It is a widely accepted standard Python distribution. Most texts recommend Anaconda.
There are two major versions -- Python 2 and Python 3. I am still using version 2. Version 3 has been available for several years. Machine learning depends on libraries that extend the capabilities of the base language. Python 2 and Python 3 have some incompatibilities. Many of those libraries are available for both versions, but not all. Progress is being made in converting everything to version 3, but many practitioners continue with version 2. The changes to the base language are minor and will not seriously confuse people programming in straight Python. Learn either.
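To make the incompatibilities concrete, here is a minimal sketch of two well-known differences: in Python 2, print is a statement and 7 / 2 gives 3, while in Python 3, print is a function and 7 / 2 gives 3.5. The variable names are only for illustration. With the __future__ import on the first line, the same file should behave the same way under Python 2.7 and Python 3.

```python
from __future__ import division, print_function

# With the import above, "/" is true division and print is a function
# in both Python 2.7 and Python 3.
half = 7 / 2     # true division
whole = 7 // 2   # floor division, the old Python 2 behavior of "/" on integers

print("7 / 2 =", half)
print("7 // 2 =", whole)
```

Writing code this way is one option if you start in one version and expect to move to the other later.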
Anaconda Python comes with several development platforms. Two that you will want to consider are Spyder and Jupyter.
Spyder includes an editor and execution module all-in-one.
Jupyter is an outgrowth of IPython Notebook. It includes editing, execution, and documentation all-in-one.
You can sort of move back and forth between them, but I recommend picking one and using it exclusively.
To be clear -- installing Anaconda Python will automatically install both Spyder and Jupyter. Your choice is which to use day-by-day.
Jupyter's website:
http://jupyter.org/
-------------------
For home study of Python, there are numerous texts, pocket guides, free online courses, and paid online courses.
I like the work of Dr. Allen Downey. He has written several books, including "Think Python," which can be legally downloaded for free:
http://greenteapress.com/thinkpython/thinkpython.pdf
Or buy a printed copy from Amazon.
Many people like the approach where the student does a lot of exercises, typing the code by hand rather than downloading it or using cut and paste. "Learn Python the Hard Way" is one of the better ones. Here is a link to a version that can be read online for free:
https://learnpythonthehardway.org/book/
Or buy a printed copy from Amazon.
Coursera has offered several Python courses, ranging from absolute beginner to relatively advanced. Check to see what is available for the time period you plan to study. Some of the previous courses have been archived, and their resources, including videos of lectures, can be downloaded. Coursera is in the process of changing from free to paid. For most courses, but not all, you can still enroll and get access to the materials for free. I have watched the videos from several of these. None that I have seen is, in my opinion, excellent. Several are poor. Your method of learning will influence how effective each course is for you.
https://www.coursera.org/courses?languages=en&query=python
---------------------
For home study of machine learning, there is much to learn and there are many sources.
Among the many points to keep in mind, one is very important. Building machine learning models to identify profitable trades requires everything that is required to differentiate between species of iris or to determine whether a borrower is likely to repay a loan. It also requires that the time-sequence organization of the data, and the monotonic increase in the efficiency of the markets as time progresses, be recognized and properly dealt with. I know of no book or online material that adequately addresses these special requirements. Indeed, several seem to intentionally disregard them. Begin by watching my video on "The Importance of Being Stationary."
http://www.blueowlpress.com/video-presentations
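As a rough illustration of what respecting the time sequence can mean in practice, here is a minimal sketch of a walk-forward split, where each model is fit on one block of data and tested only on the block that follows it in time. This is one common approach and an assumption on my part for this example, not a summary of the video. The function name walk_forward_splits and its parameters are hypothetical.

```python
def walk_forward_splits(n_samples, train_size, test_size):
    """Yield (train_indices, test_indices) pairs that never look ahead.

    Hypothetical helper for illustration: every test index comes
    after every train index in the same pair.
    """
    start = 0
    while start + train_size + test_size <= n_samples:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size,
                          start + train_size + test_size))
        yield train, test
        # Slide the whole window forward by one test period.
        start += test_size


# Ten time-ordered observations, four to train on, two to test on.
splits = list(walk_forward_splits(n_samples=10, train_size=4, test_size=2))
for train, test in splits:
    print("train", train, "test", test)
```

Contrast this with the random shuffling used in most textbook examples, which lets a model train on data from after the period it is tested on.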
For a basic university-level introduction to machine learning, Dr. Andrew Ng's Stanford Open Classroom course is very good:
http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=MachineLearning
I also like Dr. Yaser Abu-Mostafa's Caltech online course:
https://work.caltech.edu/
To incorporate machine learning into Python, a key library is pandas. Pandas is a Python library for data handling, with particular features for time series. The pandas library was developed by Wes McKinney while he was an analyst at Cliff Asness' AQR Capital Management hedge fund. Wes has left AQR but continues to be active in the applications of machine learning. There are several videos of his presentations on YouTube. His book, "Python for Data Analysis," was the first of several that describe use of pandas:
https://www.amazon.com/Python-Data-...8064&sr=8-2&keywords=python+for+data+analysis
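To give a feel for the time series features mentioned above, here is a small sketch using a date-indexed pandas Series. The prices are made-up numbers for illustration only, and the variable names are my own. It assumes pandas is installed, which it is with Anaconda.

```python
import pandas as pd

# Ten business days of made-up closing prices (illustrative data only).
dates = pd.date_range("2017-01-02", periods=10, freq="B")
close = pd.Series([100.0, 101.0, 102.0, 101.0, 103.0,
                   104.0, 103.0, 105.0, 106.0, 107.0],
                  index=dates, name="close")

# Day-over-day change; the first value is NaN because there is no prior day.
daily_change = close.diff()

# Three-day moving average, aligned to the last day of each window.
ma3 = close.rolling(window=3).mean()

# Last close of each calendar week (weekly bins end on Sunday by default).
weekly_close = close.resample("W").last()

print(weekly_close)
```

Because the index holds dates rather than row numbers, operations such as resampling to weekly bars take one line.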
Dr. Jake Vanderplas is an astronomer at the University of Washington who is very active in use of Python, pandas, and machine learning. His book, "Python Data Science Handbook," is outstanding:
https://www.amazon.com/Python-Data-...&keywords=python+for+data+analysis+vanderplas
Also watch his presentations, many of which are posted to YouTube.
For some of the details of machine learning techniques, I like Sebastian Raschka's books:
"Python Machine Learning"
https://www.amazon.com/Python-Machine-Learning-Sebastian-Raschka/dp/1783555130/ref=sr_1_2?ie=UTF8&qid=1488218373&sr=8-2&keywords=Raschka,+Sebastian
"Python, Deeper Insights into Machine Learning"
https://www.amazon.com/dp/B01LD8K994/ref=rdr_kindle_ext_tmb
--------------
There are many more resources available. But this much is probably already an overload. I hope this helps you get started.
Best, Howard