# Getting Started in Machine Learning for Trading



## howardbandy (28 February 2017)

This message could have been posted as a response to a question -- "How to get started with Python and machine learning?" -- asked in another thread.

I think a separate thread devoted to tools and techniques for machine learning will be valuable.  Here goes.

====================

Here is a link to the bibliography that is an appendix to the "Foundations" book. 
http://www.blueowlpress.com/wp-content/uploads/2016/08/FT-Bibliography-Appendix-D.pdf

There are two areas you will need to study.
1.  Python.
2.  Machine Learning.

-------------------

Python is your base language.  Unless you already have substantial experience with, and support for, R, look no further.  If you are uncertain and trying to decide between Python and R, choose Python.  Do not learn another language in preparation for learning Python.  The pandas library of Python is very similar to the libraries of R, so quite a lot of R experience will transfer to Python easily.  But the data science profession is overwhelmingly moving to Python over R for applications beyond statistics. 

Download and install the Anaconda distribution of Python.
https://www.continuum.io/downloads

It is free.  It is available for Windows, Mac, and Unix/Linux.  It is the widely accepted standard Python distribution.  Most texts recommend Anaconda. 

There are two major versions -- Python 2 and Python 3.  I am still using version 2.  Version 3 has been available for several years.  Python 2 and Python 3 have some incompatibilities.  Machine learning depends on libraries that extend the capabilities of the base language, and many of those libraries are available for both versions, but not all.  Progress is being made in converting everything to version 3, but many practitioners continue with version 2.  The changes to the base language are minor and will not seriously confuse people programming straight Python.  Learn either.
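For example, one of the visible base-language differences is the division operator (a minimal illustration):

```python
# One visible Python 2 / Python 3 incompatibility: division.
# In Python 2, 7 / 2 evaluates to 3 (integer division).
# In Python 3, / always produces a float and // does floor division.
print(7 / 2)    # Python 3 prints 3.5
print(7 // 2)   # both versions print 3

# Python 2 code can opt in to the Python 3 behavior with:
# from __future__ import division
```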

Anaconda Python comes with several development platforms.  Two that you will want to consider are Spyder and Jupyter.
Spyder includes an editor and execution module all-in-one.
Jupyter is an outgrowth of IPython Notebook.  It includes editing, execution, and documentation all-in-one. 
You can sort of move back and forth between them, but I recommend picking one and using it exclusively.
To be clear -- installing Anaconda Python will automatically install both Spyder and Jupyter.  Your choice is which to use day-by-day.
Jupyter's website:
http://jupyter.org/ 

-------------------

For home study of Python, there are numerous texts, pocket guides, free online courses, and paid online courses. 

I like the work of Dr. Allen Downey.  He has written several books, including "Think Python" which can be legally downloaded for free:
http://greenteapress.com/thinkpython/thinkpython.pdf
Or buy a printed copy from Amazon.

Many people like the approach where the student does a lot of exercises -- not downloading or using cut and paste.  "Learn Python the Hard Way" is one of the better ones.  Here is a link to a version that can be read online for free:
https://learnpythonthehardway.org/book/
Or buy a printed copy from Amazon.

Coursera has offered several Python courses, ranging from absolute beginner to relatively advanced.  Check to see what is available for the time period you plan to study.  Some of the previous courses have been archived and resources, including videos of lectures, can be downloaded.  Coursera is in the process of changing from free to paid.  For most courses, but not all, you can still enroll and get access to the materials for free.  I have watched the videos from several of these.  None that I have seen are, in my opinion, excellent.  Several are poor.  Your method of learning will influence how effective each course is for you. 
https://www.coursera.org/courses?languages=en&query=python

---------------------

For home study of machine learning, there is much to learn and there are many sources. 

Among the many points to keep in mind, one is very important.  Building machine learning models to identify profitable trades requires everything that learning to differentiate between species of iris or determining whether a borrower is likely to repay a loan requires.  It also requires that the time sequence organization of the data and the monotonic increase in efficiency of the markets as time progresses be recognized and properly dealt with.  I know of no book or online material that adequately addresses these special requirements.  Indeed, several seem to intentionally disregard them.  Begin by watching my video on "The Importance of Being Stationary." 
http://www.blueowlpress.com/video-presentations
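A minimal sketch of what "properly dealt with" implies for the train/test split (hypothetical bar counts, not code from the book): every training bar must come strictly before every test bar, unlike a shuffled cross-validation split, which leaks future data into training.

```python
# Sketch: a walk-forward (time-ordered) train/test split, in contrast to
# a shuffled split that would leak future bars into the training set.
def walk_forward_splits(n_bars, train_size, test_size):
    """Yield (train, test) index lists; each test block follows its train block in time."""
    start = 0
    while start + train_size + test_size <= n_bars:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size, start + train_size + test_size))
        yield train, test
        start += test_size  # roll the window forward one test block

# Hypothetical history of 1000 daily bars, oldest first.
for train, test in walk_forward_splits(1000, train_size=500, test_size=100):
    assert max(train) < min(test)  # no future data leaks into training
```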

For a basic university-level introduction to machine learning, Dr. Andrew Ng's Stanford Open Classroom course is very good:
http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=MachineLearning

I also like Dr. Yaser Abu-Mostafa's Cal Tech Online Course:
https://work.caltech.edu/

To incorporate machine learning into Python, a key library is pandas.  Pandas is a Python library for data handling, with particular features for time series.  The pandas library was developed by Wes McKinney while he was an analyst at Cliff Asness' AQR Capital Management hedge fund.  Wes has left AQR but continues to be active in the applications of machine learning.  There are several videos of his presentations on YouTube.  His book, "Python for Data Analysis," was the first of several that describe use of pandas:
https://www.amazon.com/Python-Data-...8064&sr=8-2&keywords=python+for+data+analysis
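A small taste of what pandas provides for time series (hypothetical prices; the method names are standard pandas):

```python
import pandas as pd

# Hypothetical closing prices on a business-day index.
idx = pd.date_range("2017-01-02", periods=5, freq="B")
close = pd.Series([100.0, 101.0, 99.5, 100.5, 102.0], index=idx, name="close")

returns = close.pct_change()          # daily percent change, aligned by date
weekly = close.resample("W").last()   # downsample to weekly closing prices
```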

Dr. Jake Vanderplas is an astronomer at the University of Washington who is very active in use of Python, pandas, and machine learning.  His book, "Python Data Science Handbook," is outstanding:
https://www.amazon.com/Python-Data-...&keywords=python+for+data+analysis+vanderplas
Also watch his presentations, many of which are posted to YouTube.

For some of the details of machine learning techniques, I like Sebastian Raschka's books:
"Python Machine Learning"
https://www.amazon.com/Python-Machine-Learning-Sebastian-Raschka/dp/1783555130/ref=sr_1_2?ie=UTF8&qid=1488218373&sr=8-2&keywords=Raschka,+Sebastian
"Python, Deeper Insights into Machine Learning"
https://www.amazon.com/dp/B01LD8K994/ref=rdr_kindle_ext_tmb

--------------

There are many more resources available.  But this much is probably already an overload.  I hope this helps getting started.

Best,  Howard


----------



## Trembling Hand (28 February 2017)

+1 for "Learn Python the Hard Way" if you are starting from scratch.



howardbandy said:


> It also requires that the time sequence organization of the data and the monotonic increase in efficiency of the markets as time progresses be recognized and properly dealt with.  I know of no book or online material that adequately addresses these special requirements.  Indeed, several seem to intentionally disregard them.



This is the big problem with all the material on the net about ML and trading. It just ignores it.  You soon see that people who are putting stuff out there might as well be doing weekend courses on Gann. It's just rubbish. The only material I have found is this dude,

https://www.google.com.au/webhp?sou...me+series+http://machinelearningmastery.com&*

But it's advanced stuff. It is taking me some time to get my head around it (but I'm a dumbarse so maybe some of the quants in the making will smash it). I would be interested in what you think of it, Howard?

A good site to play around with once you have some basic Python skills is
https://www.quantopian.com/
Has the data on the site (US EOD and 1 min) and lots of examples. As it's a back-testing/forward-testing site based on Python it takes away a lot of the problems of getting the data into Python.

A site I keep on going to that is handy for little code snippets is,
http://chrisalbon.com/


----------



## howardbandy (28 February 2017)

Greetings --

MachineLearningMastery is the website and material of Jason Brownlee.  I have most of his material, and it is quite good.  (I believe he lives in Melbourne, Australia.)  His most recent, and his only material related to time series as far as I know, is "Introduction to Time Series Forecasting with Python" which focuses on ARIMA-like models which are not very useful to traders.

I have not found much of value in Quantopian.  But I am willing to be convinced otherwise.  A lot of their material seems to be longer term and / or fundamental -- both of which have little to no value for trading.

I am skeptical of sites that do the hosting.  I believe every trader should run his or her own code on his or her own computer.  Learn from others, but when you have discovered something that works for you, be cautious about who else sees it.

Thanks for the introduction to Chris Albon's site.  I will do some exploring.

----------------

A note to people who are not familiar with my work.

I have posted several videos on YouTube which will give some background into my thoughts, research, and experience.  Start here:
http://www.blueowlpress.com/video-presentations

Pay particular attention to the material on risk.  Begin with an assessment of personal risk tolerance, assess the risk of all financial systems being considered, normalize position size for risk, then estimate the future rate of return using the metric CAR25.  Trade the system that has the highest CAR25.  Position size must be kept out of the signal generation model.  It is only useful when it is in the trading management model.  Manage the trading day-by-day, adjusting position size using the dynamic position sizing technique as necessary to hold risk within tolerance.  When the risk of the system being traded increases, which will be indicated by a drop in the CAR25 metric, take it offline and replace it with the then-best system.  

The sweet spot is: trade often, trade accurately, hold a short period, avoid serious losses.  Very few investment / trading systems give a return that is high enough to beat risk-free use of funds if they have a holding period longer than about 5 days and/or accuracy lower than about 65%.  Any serious losses or sequences of small losses will raise the probability of a serious drawdown, causing position size to drop, making the system untradable.

Best regards,  Howard


----------



## KAO (1 March 2017)

In addition to the resources you've listed above, I highly recommend DataCamp for learning data science related programming, statistics & computational finance skills in either R or Python.

Also, for anyone struggling to make a choice between R and Python, this may be of assistance - Choosing R or Python for data analysis? An infographic.


----------



## qldfrog (1 March 2017)

Gents,
Very busy lately but I will try to follow your leads.
just want to thank you for your inputs


----------



## Trembling Hand (1 March 2017)

howardbandy said:


> I have not found much of value in Quantopian.  But I am willing to be convinced otherwise.  A lot of their material seems to be longer term and / or fundamental -- both of which have little to no value for trading.



Not sure about that; they have data in 1-min bars and daily, so you can pick your time frame. But my point is that it's useful to get a start. One big problem with Python is data. There is no simple off-the-shelf solution to run your algos on. You are going to have to write something to do portfolio testing on. That in itself will lead to having to know SQL or some other database. When starting it can be a step too far. Something like Quantopian has it all sorted so you just program in Python.... and the data is free!



howardbandy said:


> I am skeptical of sites that do the hosting.  I believe every trader should run his or her own code on his or her own computer.  Learn from others, but when you have discovered something that works for you, be cautious about who else sees it.




Yeah I probably agree. Once you get to do any sort of decent work I too would go private. But let's face it, as a first step we are not going to be doing work that will be putting Jim Simons out of business, so for the ease of use and free data I would have a look.


----------



## DaveDaGr8 (1 March 2017)

Trembling Hand said:


> One big problem with Python is data




I really have very limited experience in Python, however I would have thought that data handling would be one of its strengths.

SQL? .... for this type of analysis don't overlook flat files. Flat files are easier, better, and way faster by orders of magnitude. Stay away from SQL unless you are really convinced you need it; the overhead is too expensive. The old Computrac -> Metastock filesystem that most programs can read is still one of the fastest, even with converting all the floats from MSBIN to IEEE standards. Premium Data still pushes its data in that format and Amibroker still reads it. I would suggest learning to read your data from there; I am sure Howard sent that link in another thread that has all the nuts and bolts required.

On a side note, avoid using CSV files as working files because you are converting strings to numbers, which is expensive too. Use them exclusively for import and export to and from other programs, but load and save your working data in a flat file or series of flat files.
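The same point in Python terms (a sketch using numpy's flat binary `.npy` format; the file names and array size are arbitrary):

```python
import os
import tempfile

import numpy as np

# Sketch: the CSV-versus-binary point.  Text formats must parse every
# number from a string; a binary flat file is read back directly.
data = np.random.rand(100_000)

tmpdir = tempfile.mkdtemp()
csv_path = os.path.join(tmpdir, "prices.csv")
npy_path = os.path.join(tmpdir, "prices.npy")

np.savetxt(csv_path, data)        # text: every float formatted as a string
np.save(npy_path, data)           # binary: raw IEEE doubles plus a small header

from_csv = np.loadtxt(csv_path)   # must parse 100,000 strings back to floats
from_npy = np.load(npy_path)      # reads the bytes straight back
```

The round trips produce identical values, but the binary file is both smaller and far faster to load.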

Data handling and your data structure is the first thing you need to get right, do it wrong and your program will be slow and clunky and will suffer forever until you do it right.

THEN you can start looking at machine learning algorithms.


----------



## Trembling Hand (1 March 2017)

DaveDaGr8 said:


> I really have very limited experience in Python, however i would have thought that data handling would be one of it's strengths.
> 
> THEN you can start looking at machine learning algorithms.



Yeah mate, that's kinda my point. It's all very well to say this is the best. But for someone learning you want to avoid re-inventing the wheel -- just to get to the start of doing the real work. Python is good.... once you get in there. As it stands there is no simple solution to getting your data files like Premium Data into Python and running portfolio tests. Very easy to do single files, but that is useless for testing across a whole market. Nothing that I know of. 

Anyone?


----------



## Alpha27 (2 March 2017)

I heard that you can link Amibroker to Matlab and Matlab has lots of options for machine learning.

Has anyone linked Matlab to Amibroker?


----------



## howardbandy (2 March 2017)

Years ago I was teaching computer science as Microsoft began publishing Windows, Word, and Excel.  The conversations were similar then -- Windows versus MSDOS versus CP/M -- Excel versus VisiCalc -- Word versus WordPerfect.  Now, for us, Python versus R.  

The two primary reasons to pick one over the other are capabilities and support.  

To test capabilities.  Install both and perform the functions that will be used over the life of the project.

To test support.  The bookstore test helped then -- an equivalent internet test might be helpful now.  I sent students to local bookstores -- Borders, Barnes & Noble, etc -- with the assignment of noting the number and quality of books for each.  Also to read the employment ads to see which skills were most in demand.     

For several years, I have heard the discussion about advantages and disadvantages of R and Python.  In terms of capabilities, and in keeping with my own advice, I tried to develop machine learning trading systems in both.  I worked through an entire trading system application -- from data acquisition through data munging, data transformations, train-test splits, crossvalidations, model selection, hyperparameter selection, model fit, model storage, model retrieval, prediction on new data, reporting and emailing results, running dynamic position sizing.  I found that Python was easier to work with, provided a consistent set of tools, and allowed me to focus on the trading system aspects of the project.  

In terms of support, first consider knowledge you already have and support you will receive from your friends and employer.  If that is heavily oriented toward R, then choose R.  Otherwise, do the bookstore and internet test.  Look for the reference material you will need -- tutorials, books, websites.  In my opinion, Python has a better support base.

You choose.  Pick one.  Become an expert in programming in it, in knowledge and use of the libraries you will need.

Then focus on machine learning for trading.    

In very broad terms, the steps and components of developing a machine learning trading system are:
Data acquisition -- free or subscription services.
Data munging -- alignment, identifying errors, correcting or dropping erroneous data.
Data transformation -- create indicators/predictors, lagged values, prediction target.
Data selection -- extract in-sample data, reserving out-of-sample data.
Model selection -- decision tree, support vector, ensemble, etc.
Model fit -- learning.
Model test -- validation.
Model evaluation -- simple metrics, computing safe-f, CAR25.
Model storage -- save to disk for future use without refitting.
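The development steps above can be sketched with scikit-learn; everything here (the synthetic predictors, the model choice, the file name) is illustrative, not a real system:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data standing in for transformed market data.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))                               # three hypothetical predictors
y = (X[:, 0] + 0.1 * rng.normal(size=n) > 0).astype(int)  # target: 1 = beLong

# Data selection -- time-ordered split, out-of-sample data reserved at the end.
split = int(0.8 * n)
X_in, y_in = X[:split], y[:split]
X_oos, y_oos = X[split:], y[split:]

# Model selection and fit.
model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X_in, y_in)

# Model test -- validation on the reserved data.
oos_accuracy = accuracy_score(y_oos, model.predict(X_oos))

# Model storage -- save to disk for future use without refitting.
model_path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(model, model_path)
```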

The steps and components of using a machine learning trading system are:
Data acquisition -- gathering current values of the data series.
Data munging.
Data transformation -- applying the same transformations that were used in development.
Model retrieval from disk.
Prediction using the stored model and the new data.
Determination of system health and position size based on recent performance.
Trading based on the prediction.
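A sketch of those usage steps with an illustrative toy model (the fit-and-save lines exist only to make the example self-contained; in live use the model would already sit on disk from development):

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Setup only: fit and store a toy model so the example is self-contained.
rng = np.random.default_rng(1)
X_old = rng.normal(size=(200, 3))
y_old = (X_old[:, 0] > 0).astype(int)
model_path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(DecisionTreeClassifier(max_depth=2).fit(X_old, y_old), model_path)

# --- the day-by-day usage steps ---
model = joblib.load(model_path)        # model retrieval from disk
X_new = rng.normal(size=(1, 3))        # today's transformed data row
signal = model.predict(X_new)[0]       # prediction: 1 = beLong, 0 = beFlat
```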

------------------

As I have written, the sweet spot is to use daily data, trade a single issue, long/flat or short/flat.  Trade frequently, trade accurately, hold a short period, and avoid serious losses.  That is fortunate, because those characteristics are the easiest to model.  Pick one issue, choose the target to be whether the next close is higher or lower than the current close, mark-to-market daily, manage daily.  

The concept of a trading system changes considerably.  There are no preset rules.  The model decides what is important from analysis of the training data.  It is data mining, searching for signals among the noise.  

The result will be a series of state signals, each valid for one day.  That is -- a sequence of "beLong" or "beFlat" states for a single tradable issue, each one day long.  Holding periods longer than one day will be indicated by several "beLong" states in succession.  There are no maximum loss stops, no trailing stops, no imposed holding periods, no predefined critical parameter values, no portfolios, no position sizing.  If you are not comfortable with this, machine learning-based trading systems will not work for you.
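With hypothetical closes and a hypothetical state series, marking to market daily looks like:

```python
import numpy as np

# Hypothetical closes and one state per bar: 1 = beLong, 0 = beFlat.
close = np.array([100.0, 101.0, 100.0, 102.0, 103.0])
state = np.array([1, 1, 0, 1, 1])

# A beLong state on bar t earns the change from close[t] to close[t+1].
bar_return = np.diff(close) / close[:-1]
strategy_return = state[:-1] * bar_return

# Mark-to-market equity curve, one value per bar transition.
equity = np.cumprod(1.0 + strategy_return)
```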

------------

In response to some of the comments made.

I think Python with Pandas is an easy solution to handling input data.  There are several free sources of daily data -- including Quandl, Yahoo, Google.  All of these come with the caveat that using free data moves some of the data quality issues to the end user.  Quandl also has subscription data.  I have not evaluated Quantopian's data.  

If a single source does not provide every data series you need, it is easy to gather data from multiple sources, store them in one or more Pandas dataframes, and use Pandas utilities to align, adjust, combine, etc.
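For instance, pandas aligns two hypothetical sources with different trading calendars on their date indexes:

```python
import pandas as pd

# Two hypothetical data sources; source_b is missing one trading day.
a = pd.Series([1.0, 2.0, 3.0],
              index=pd.to_datetime(["2017-01-02", "2017-01-03", "2017-01-04"]),
              name="source_a")
b = pd.Series([10.0, 30.0],
              index=pd.to_datetime(["2017-01-02", "2017-01-04"]),
              name="source_b")

combined = pd.concat([a, b], axis=1)  # outer join aligns on the date index
adjusted = combined.ffill()           # carry the last known value forward
```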

Long-lookback indicators and long-term filter indicators are not helpful.  The modeling process is searching for signals.  A signal can only occur when there are changes within the data.  In order to generate, say, 50 signals a year, there must be 50 changes in predictor values each year.  The data analysis routines (often hidden) within each model will evaluate and discard data series that are overly redundant -- that either duplicate other series or present long sequences of constant values. 

In the end, you will wind up with one or two indicators that oscillate at about the same frequency as the signals you are looking for. 

The signals generated by the model will correspond to the bars used to build the model.  Each bar results in one row in the data matrix presented to the modeling routine.  There will be a signal for each bar.  If you plan to trade one-minute bars, then model one-minute bars.  Modeling one-minute bars, or even hourly bars, will not help if your plan is to trade daily.

Similarly, using daily bars will not help if your plan is to trade once a week.  But, if you are managing trades less frequently than daily, the risk of the position is certain to exceed your risk tolerance, and the CAR25 value will suggest that the system not be traded. 

Similarly for portfolios.  CAR25 is a Dominant metric.  Given two or more trading systems, each with signals to enter and exit a single issue, pick the one with the highest CAR25.  Splitting trading funds to take two positions and form a portfolio might provide a feeling of comfort due to diversification.  What it really does is ensure that some of the funds are being used sub-optimally.

Database issues do not arise.  Once the data has been read into the dataframe, there is no use of external data storage -- no need for csv files, no need for SQL databases.  After fitting, the trained model will be stored on disk.  The main component of the model is a matrix of coefficients that are the solution to the AX = Y set of simultaneous equations -- the "A" matrix.  Python stores that for us, and we have no control over the format of the storage.  When we later want to use the model, Python retrieves the model.  All we need to do for both operations is provide the file name and path we want to use.

Best regards,  Howard


----------



## howardbandy (2 March 2017)

I reread the thread.  There seems to be a question of efficiency of data access.  This is the complete sequence necessary to load 20 years of daily data for SPY into a pandas dataframe named qt:

import Quandl
qt = Quandl.get("GOOG/NYSE_SPY")

best,  Howard


----------



## Trembling Hand (2 March 2017)

howardbandy said:


> I reread the thread.  There seems to be a question of efficiency of data access.  This is the complete sequence necessary to load 20 years of daily data for SPY into a pandas dataframe named qt:
> 
> import Quandl
> qt = Quandl.get("GOOG/NYSE_SPY")
> ...





That's not the issue!!!! It's one instrument! If I wanted to do a test against the S&P 500 constituents your code blows out to 1000 lines just for the data query of poor quality free data.

Show me some code to feed in and filter data that is actually useful. I have Norgate's ASX data in 
C:\Trading Data\Stocks\ASX\Equities which is in 24 folders and has countless .dat files. 
I also have 37.5 GB of futures data spread over 849,803 files in 1000s of folders as flat files. I'd love for you to show me how to do a portfolio test against that with two lines of code. I can read each individual file, but that doesn't mean it's an easy task to consolidate a PORTFOLIO to test against.

I have stated this a few times: one of the big problems with Python or any of the 'new' programming languages is the woeful practical application outside of overly simplistic examples like Howard has just given. The 'old' backtesting software where programmers have done all the nutz'n'boltz work for you saves you hundreds of hours and saves you from having to re-invent the wheel just to get started. Which all brings me back to Quantopian.... they have done the nutz'n'boltz.


----------



## howardbandy (3 March 2017)

Hi TH --

From your comments, you have not read and worked through the examples of risk-normalized profit potential.  

Here is a brief summary:

Begin by analyzing the price data itself before applying a model.  There is a procedure called the Data Prospector, fully disclosed, that will analyze the risk and profit potential of any data series, even before attempting to develop a model to trade it.  Some issues will have too little volatility to provide profit; some too much to be tradable.  There is a middle group of goldilocks issues that Might work -- we do not know yet.  Continue with that group and check liquidity.  I recommend that there be enough liquidity so that the trader can exit his or her entire position in any minute of any day without substantially affecting the bid-ask spread.  I also look for bid-ask spreads that are one cent at almost any time.  
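The Data Prospector itself is documented in the books; as an illustrative stand-in (my sketch, with arbitrary thresholds), a first-pass "goldilocks" volatility screen might look like:

```python
import numpy as np

# Illustrative stand-in for a first-pass volatility screen -- not the
# actual Data Prospector.  Thresholds are arbitrary placeholders.
def annualized_volatility(closes):
    closes = np.asarray(closes, dtype=float)
    returns = np.diff(closes) / closes[:-1]
    return returns.std() * np.sqrt(252)

def goldilocks(closes, lo=0.10, hi=0.40):
    """Keep issues whose volatility is neither too little nor too much."""
    vol = annualized_volatility(closes)
    return lo <= vol <= hi

# Hypothetical price series with low, moderate, and wild volatility.
rng = np.random.default_rng(2)
quiet = 100 * np.cumprod(1 + rng.normal(0, 0.001, 500))
normal = 100 * np.cumprod(1 + rng.normal(0, 0.012, 500))
wild = 100 * np.cumprod(1 + rng.normal(0, 0.05, 500))
```

Here only the moderate series would pass to the next stage (liquidity checks).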

There will be a list of a couple dozen issues that pass those filters.  Now begin system development.  Try to model each one long/flat.  Or short/flat (but this is much harder).  Each data series and its associated model create a trading system -- a system that trades a single issue long/flat.  Validate each of those individually to ensure that the system is likely to be profitable in the future.  

Define your personal risk tolerance.  For example, wanting to hold the chance of a drawdown in excess of 20% to less than 5%.  Each system you trade will have its position size adjusted trade-by-trade to keep it within your risk tolerance.

Apply the risk analysis to each of the several systems that look promising.  Each has a maximum safe position size, safe-f.  This is the portion of funds that can be used to take positions.  The remainder of the trading account must stay in a risk-free account to act as ballast to compensate when the funds traded enter a drawdown.  When traded at safe-f, each system has a profit potential.  It can be quantified.  It is called CAR25.  
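The precise definitions of safe-f and CAR25 are given in the books; as a rough, simplified sketch of the idea (bootstrap the trade history, find the largest fraction of capital that keeps drawdown risk within tolerance, report the 25th percentile of compound annual return at that fraction):

```python
import numpy as np

# Simplified sketch of the safe-f / CAR25 idea -- not the exact definitions.
rng = np.random.default_rng(3)
trade_returns = rng.normal(0.004, 0.02, size=250)  # hypothetical per-trade returns

def max_drawdown(equity):
    peaks = np.maximum.accumulate(equity)
    return ((peaks - equity) / peaks).max()

def risk_and_car25(f, n_sims=500, trades_per_year=250):
    """Bootstrap one-year equity curves at position fraction f."""
    drawdowns, cars = [], []
    for _ in range(n_sims):
        sample = rng.choice(trade_returns, size=trades_per_year, replace=True)
        equity = np.cumprod(1.0 + f * sample)
        drawdowns.append(max_drawdown(equity))
        cars.append(equity[-1] - 1.0)  # one-year horizon, so growth = CAR
    return (np.array(drawdowns) > 0.20).mean(), np.percentile(cars, 25)

# safe-f: the largest fraction keeping P(drawdown > 20%) below 5%.
safe_f, car25 = 0.0, 0.0
for f in np.arange(0.1, 2.01, 0.1):
    risk, car = risk_and_car25(f)
    if risk < 0.05:
        safe_f, car25 = f, car
```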

CAR25 is a Dominant metric.  The best use of funds is to trade the single system with the highest CAR25.  

Expect distribution drift.  Monitoring each of the systems day-by-day, compute CAR25 for each.  As trading performance changes, trade the one that has the highest CAR25.  If the CAR25 of the one at the top of the list is not higher than the risk-free funds, do not trade any of the systems.

Portfolio construction is not necessary, or even desirable.  Assume there are two systems, one with CAR25 of 12%, the other with CAR25 of 22%.  Assume safe-f is 100% for both, so all funds are available to buy shares.  Forming a two-issue portfolio creates a profit stream that is 17% -- half from the 12% system, and half from the 22% system.  The trader is better off using all funds for the 22% system, watching carefully to switch when some other system shows better performance as indicated by CAR25.

Best regards,  Howard


----------



## howardbandy (3 March 2017)

Greetings -- 
Regarding:  Thats not the issue!!!! It's one instrument! If I wanted to do a test against the S&P500 constituents your code blows out to 1000 lines just for the data query of poor quality free data.

-----------------

You will have read in my post of a few minutes ago my recommendation for issue selection and portfolio construction -- that each system be a single issue traded long/flat or short/flat.  If a person wanted to analyze many issues, all in the same run, with all data in core at the same time, here is how.

-----------------

First -- choose poor quality free data or curated premium data.  There is no change other than the string identifying the ticker of the issue to be loaded.

The example I posted used free data which can be sourced, at your preference, from Google or Yahoo.  The call is exactly the same to use curated premium data.  It changes from:
qt = Quandl.get("GOOG/NYSE_SPY")  #  Free data
to:
qt = Quandl.get("EOD/SPY")  #  Curated premium data

----

From the Quandl site:
"Quandl hosts several commercial-grade "premium" databases in addition to our free data. These premium databases are of a higher quality, accuracy, timeliness and documentation standard than our free data and are intended for professional use.  
Premium databases are not free; you have to subscribe to access them.  However, we do offer generous free trials so that you can try before you buy."

-----------------

Then -- to load many issues into core.

To load the prices for several issues, or perhaps several thousand issues (there is no limit), the code changes very little.  There will be a call to the data server for each issue, but the block of code is three lines longer (a little shy of 1000 lines) in order to store any number of additional data series into additional columns of the dataframe.

To fill a dataframe with closing prices from a few issues:
import pandas as pd
import Quandl

#  Or read the tickerlist from a diskfile, perhaps stored as a watchlist.
#  The exchange prefix in each Quandl code varies with the listing.
tickerlist = ["GOOG/NYSE_IBM", "GOOG/NYSE_SPY", "GOOG/NASDAQ_AAPL"]
qt = pd.DataFrame()
for issue in tickerlist:
    colName = issue.split("_")[-1]            #  e.g. "IBM"
    qt[colName] = Quandl.get(issue)["Close"]  #  one column of closes per issue

--------------------

I understand that many people will be resistant to the ideas being suggested.  Questions are welcome.  Civil questions preferred.  If, after working through the math involved, this approach is not for you, please revert to lurking rather than disruption.  

There is math -- as already seen in the data prospector, safe-f, and CAR25 -- and there will be much more math as we get into machine learning.

----------------  

Best regards,  Howard


----------



## howardbandy (3 March 2017)

Trembling Hand said:


> I have stated this a few time that one of the big problems with Python or any of the 'new' programming languages is the woeful practical applications outside of overly simplistic examples like Howard has just given. The 'old' backtesting software where programmers have done all the nutz'n'boltz work for you saves you hundreds of hours and saves you from having to re-invent the wheel just to get started. Which all brings me back to quantopian.... they have done nutz'n'boltz.




Greetings --

This isn't about Python.  It is about machine learning.  Specifically for trading.  Python is the base language which provides access to a set of libraries that implement the machine learning fitting and testing routines we will discuss and develop. 

If something being used is already providing satisfactory results, look no further.  Continue to use it.  Please do not disrupt this thread.

The vast majority of traditional trading system development platforms implement "old backtesting software" that is based on simple decision trees.  That is the simplistic part. 

By the time we get to the end of this thread, there will be few readers who think machine learning, and the scientific, data driven, distribution oriented, Bayesian, and crossvalidation-based techniques it uses, are simplistic.  Machine learning broadens available models to include dozens of techniques that are often significantly better -- higher risk-normalized profit -- than individual decision trees.  There will be some math.

Best regards,  Howard


----------



## nevsupra (13 March 2017)

howardbandy said:


> Greetings --
> 
> This isn't about Python.  It is about machine learning.  Specifically for trading.  Python is the base language which provides access to a set of libraries that implement the machine learning fitting and testing routines we will discuss and develop.
> 
> ...




Greetings Howard,
Most of us don't have time to write a stock-selecting, stock-trading program. But I think it's a great idea if you can get it working. It would be great to leave all the hard analysing to an AI that would automatically trade and make profits. If you could perfect it, it would be worth millions. I would be interested; keep up the good work and give us updates when you can.


----------



## Gringotts Bank (14 March 2017)

http://www.cnbc.com/2017/03/13/mark...-trillionaire-will-be-an-ai-entrepreneur.html

fwiw


----------



## howardbandy (15 March 2017)

Trillionaire --

1,000,000 bank accounts, each with a balance of $1,000,000.  Or the equivalent.

Earning 1% per year, the return would be $10,000,000,000 per year -- over $25 million per day.

In accumulating funds, there is risk.  Whenever there is risk, there are two absorbing boundaries -- winning and bankruptcy.  When there is a non-zero chance of bankruptcy, no matter how small, a prudent person would decide what level constituted "enough" and remove funds from further risk.  

Assuming living costs are not significantly different than they are today -- Why??  Money decreases in utility as the accumulation of it increases.  After buying a reasonable, or even an excessive, house, what does the 10th million enable that the 9th did not?  Or the 110th and 109th.  Or fill in your own numbers.  But at some point one additional million dollars adds an undetectable amount of value.  Well before reaching one trillion, in my opinion and for my lifestyle.


Best,  Howard


----------



## skc (15 March 2017)

howardbandy said:


> Assuming living costs are not significantly different than they are today -- Why??  Money decreases in utility as the accumulation of it increases.  After buying a reasonable, or even an excessive, house, what does the 10th million enable that the 9th did not?  Or the 110th and 109th.  Or fill in your own numbers.  But at some point one additional million dollars adds an undetectable amount of value.  Well before reaching one trillion, in my opinion and for my lifestyle.




Yes... I'd rather be this guy.


----------



## howardbandy (18 March 2017)

Data Structures used in Machine Learning for Trading

There are many development platforms that support traditional trading systems, including TradeStation, AmiBroker, NinjaTrader, WealthLab, and dozens of others.  There is not yet a trading system development platform that gives developers who want to use impulse signals and multi-day holding periods access to a wide variety of model-fitting techniques.  The reason to look beyond the traditional platforms is that all of them are limited in model choice to decision trees, while other models may produce trading systems with better performance.

Fortunately, trading systems that use state signals and mark-to-market every bar are easily implemented.  They fit directly into the machine learning techniques supported by Python and scikit-learn.

What follows is a simplified list of the steps of trading system development and trading management, annotated with short descriptions of the data structure and program associated with each step.

Development

... Acquire historical data.  A temporary Pandas dataframe (similar to a two-dimensional array or spreadsheet -- references at the end of this post) will be used to receive the data from the vendor.  Each day or bar of data creates an observation.  Most likely your program will open and read data files from a data provider such as Yahoo, Google, or Quandl.

... Data examination and cleaning.  The temporary Pandas dataframe.  Programs that examine the data series looking for missing data, inconsistencies, outliers.  Programs that plot the data giving an opportunity for visual inspection.

... Data consolidation.  The main Pandas dataframe.  Data from the individual streams and sources are combined into a single dataframe.  Pandas performs date alignment and time zone adjustment automatically.

... Indicator computation.  The main Pandas dataframe.  Functions that compute indicators (such as RSI or detrended price oscillator) that will be used as predictors are computed using functions that operate on columns of the dataframe, creating additional columns.  Previous values of indicators (lagged values) are copied into individual columns creating new indicators.

... Target computation.  The main Pandas dataframe.  The machine learning algorithm will make the best fit it can to the target variable, based on the predictor variables.  There will be a target for every observation.  The target has its own column in the dataframe.

... Data preparation for machine learning.  Conversion from Pandas dataframe to numpy array (more basic two dimensional array).  A single assignment statement performs the conversion to the data format that the scikit-learn models are programmed to expect.

... Hyperparameter determination (model selection; period of stationarity; performance metrics; predictor variable selection; train/validate/test split).  Hyperparameters are variables set before fitting the model to predict the target.  In Python, these are set using the cross-validation utilities of scikit-learn, typically together with grid searches.

... Model fitting.  Two numpy arrays are passed to the model fitting procedure -- a two dimensional array with columns of predictor variables and rows of observations; and a one dimensional array with an entry for each observation, holding the target value we want the model to learn.  The scikit-learn model has been designed to implement a particular fitting technique, programmed, verified for correctness, and optimized for execution speed.  The fitting process produces a storable model that can be used later to predict the target value for a given set of predictor variables.

... Prediction.  The previously fitted and stored model, together with a two dimensional numpy array of predictor variables.  The prediction process applies the model to the data and produces a one dimensional array with predicted target values -- one per observation.

... Model assessment.  Two one dimensional arrays -- one of known target values, the second of predicted target values.  Built-in assessment routines compare the two arrays and produce goodness-of-fit metrics.  Alternatively, a custom program (perhaps written by you) evaluates the risk profile and profit potential of the predictions.

... Model storage.  The model is essentially the coefficients to a set of simultaneous equations, together with the definition of the model.  It is stored on disk for later retrieval and use.
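The development steps above can be sketched end to end in a few lines of pandas and scikit-learn.  This is a minimal illustration, not a tradable system -- the synthetic price series, the momentum indicator, and the logistic-regression model are arbitrary stand-ins chosen for demonstration:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Acquire / consolidate: a synthetic daily close series stands in for vendor data.
rng = np.random.default_rng(0)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 500))), name="close")
df = pd.DataFrame({"close": close})

# Indicator computation: a 5-day momentum plus two lagged copies as predictors.
df["mom5"] = df["close"].pct_change(5)
df["mom5_lag1"] = df["mom5"].shift(1)
df["mom5_lag2"] = df["mom5"].shift(2)

# Target computation: 1 if the next day's close is higher, else 0.
df["target"] = (df["close"].shift(-1) > df["close"]).astype(int)
df = df.dropna()

# Data preparation: convert the Pandas dataframe to the numpy arrays
# that the scikit-learn models expect.
X = df[["mom5", "mom5_lag1", "mom5_lag2"]].to_numpy()
y = df["target"].to_numpy()

# Train/test split in time order -- never shuffle a time series.
split = int(0.7 * len(df))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Model fitting, prediction, and assessment.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = model.predict(X_test)
print("out-of-sample accuracy:", accuracy_score(y_test, pred))
```

On pure noise like this synthetic series, the out-of-sample accuracy should hover near 50% -- which is itself a useful sanity check for a pipeline.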


Trading and trading management

... Acquire current data.  Extend the Pandas dataframes used to acquire historical data.  Examine the data, consolidate it, compute indicators, and prepare the predictor variable array expected by the model.  The target is unknown, and no target array is prepared.

... Model retrieval.  Retrieve the previously fitted model from disk.

... Prediction.  The model processes the updated array of predictor variables and produces prediction for the new observations.

... Model assessment.  Using the trading management model -- which has been prepared using a similar procedure to the trading model -- determine system health, estimate risk-normalized profit potential, determine maximum safe position size.

... Trade placement.  Place an order.

-----------------------------

The new data structure is the Pandas dataframe.  Pandas was developed by Wes McKinney while he was an analyst at Cliff Asness's AQR Capital Management hedge fund.  Wes is no longer at AQR, but he continues to develop machine learning tools and techniques.  His description of Pandas can be found in his book, "Python for Data Analysis."  A second edition is due to be published in August 2017.

Jake VanderPlas, an astronomer active in the machine learning community, has published "Python Data Science Handbook" which explains Pandas dataframes and numpy arrays with excellent examples.  I highly recommend Jake's book.

Sebastian Raschka, a PhD candidate in computer science at Michigan State University, has published several excellent books related to machine learning.  Begin with "Python Machine Learning."

Best, Howard


----------



## ThingyMajiggy (3 May 2017)

Why does a time series make it more difficult for machine learning? I don't really understand. They're teaching machine learning/AI to play video games or board games, and it's nailing it after running overnight on thousands/millions of tries -- by morning it's better than any human. Why can't that be done with trading? Load as much history into it as you can, treat each day as a "level" (obviously intra-day data would be optimal), and let it go thousands or millions of times learning the different days. I know each day is different in trading, but surely it would recognise patterns that we might not have noticed before, or have a fair idea of what to expect from the training set?


----------



## tech/a (4 May 2017)

It is being done


----------



## howardbandy (4 May 2017)

ThingyMajiggy said:


> Why does a time series make it more difficult for machine learning? I don't really understand. They're teaching machine learning/AI to play video games or board games, and it's nailing it after running overnight on thousands/millions of tries -- by morning it's better than any human. Why can't that be done with trading? Load as much history into it as you can, treat each day as a "level" (obviously intra-day data would be optimal), and let it go thousands or millions of times learning the different days. I know each day is different in trading, but surely it would recognise patterns that we might not have noticed before, or have a fair idea of what to expect from the training set?



Hi TM --

Machine learning, as we will be using it, is a two step process.
1.  Examine some data, looking for patterns that precede profitable trades.  Fit a model to the data in such a way that the patterns are identified.  For example, each day at the close of trading, identify whether the next day's close will be higher or lower.
2.  Verify that the patterns persist.  That they continue to exist in data that was not examined during development.  

In many machine learning applications, such as determining / estimating / predicting whether a borrower will repay a loan or default, the data points are independent.  Each data point is related to a single person and consists of his or her employment, residency, age, etc.  We could write the data on slips of paper, one per person, put them in a big bowl, draw them out in any order, and treat them as being independent.

Time series data introduces a complication in that data points -- data for individual days -- are not independent.  For one, we rely on comparisons between current data and previous data (did the RSI fall through 20?).  For another, patterns that were profitable in the past may no longer be profitable.  And, the data is very noisy (there may not be actual profit-indicating patterns -- it may all be noise).   

So, in addition to the model fitting and validation process called for by the scientific method, the analysis of time series must take all of these extra issues into consideration.  One of the concepts is "stationarity."  My video, "The Importance of Being Stationary," might help.
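The look-ahead issue described here is why time-series validation uses time-ordered splits rather than random shuffles.  scikit-learn's `TimeSeriesSplit` implements exactly this; a small sketch (the series length and number of splits are arbitrary choices):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# 100 daily observations; each split trains only on data that precedes its test window.
n = 100
tscv = TimeSeriesSplit(n_splits=4)
for train_idx, test_idx in tscv.split(np.arange(n)):
    # Training indices always end before the test indices begin -- no look-ahead.
    assert train_idx.max() < test_idx.min()
    print(f"train {train_idx[0]}-{train_idx[-1]}  test {test_idx[0]}-{test_idx[-1]}")
```

Each successive fold extends the training window forward in time, mimicking how a trader would actually have seen the data arrive.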


Best regards
Howard


----------



## Skate (4 May 2017)

howardbandy said:


> Hi TM --
> 
> Machine learning, as we will be using it, is a two step process.
> 1.  Examine some data, looking for patterns that precede profitable trades.  Fit a model to the data in such a way that the patterns are identified.  For example, each day at the close of trading, identify whether the next day's close will be higher or lower.
> ...





Hi Howard

Machine Learning -- Versus – AmiBroker Mechanical System

To get my head around the two approaches, am I correct in assuming that they work completely differently?

1. An AmiBroker mechanical system searches for patterns that meet the conditions of the Buy code

== whereas ==

2. Machine Learning looks for any pattern that precedes a profitable trade, and once that pattern is repeatedly evaluated and found to be TRUE -- then a Buy signal is generated.

In a nutshell, what I'm fumbling to ask is -- a mechanical system searches for a pattern specified in our Buy formula and generates a Buy signal once that condition is met -- whereas -- Machine Learning looks for a multi-confirmed pattern that precedes a profitable trade, and then a Buy signal is generated.

Meaning - Machine Learning is NOT looking for a 'specified pattern', as it searches for ANY repeated pattern that ends in a profitable trade.

Is this correct?


----------



## tech/a (4 May 2017)

This is a Ducks explanation.

Amibroker can run system tests on a pre-defined formula,

Machine learning identifies patterns and learns how to improve the buy and sell signals by analysing data.  It's a continuing process.

Systems are static.

Howard will make it sound really exciting and interesting.


----------



## Trembling Hand (4 May 2017)

Dudes. Traditionally, systems are designed as decision trees.

If A = X and B = Y then do C else do D

ML is a collection of standard algos based on multiple statistical and modelling methods.  Literally tens to hundreds of different methods.  Some are simple, like linear regression; some have serious maths behind them, like some of the classification and clustering algos.  Very few of them have a direct link to traditional trading models, though there is an ML algo based on decision trees.

Simple ML applications work very well at classifying data that is not time series.  The classic ML data set describes 3 species of Iris.


```
Fisher's Iris Data
Sepal length    Sepal width    Petal length    Petal width    Species
5.1             3.5            1.4             0.2            I. setosa
4.9             3.0            1.4             0.2            I. setosa
4.7             3.2            1.3             0.2            I. setosa
7.0             3.2            4.7             1.4            I. versicolor
6.4             3.2            4.5             1.5            I. versicolor
6.9             3.1            4.9             1.5            I. versicolor
etc
```
As you can see it's just rows of independent data.

Others that work very well are medical test data, like cancer screening tests, or, as Howard has mentioned, credit scores -- where one row of data is not linked to what preceded it or what will follow.  And the difference between the classes you are trying to separate is very clear, as you can see from the petal width above.

Market data is nothing like this.  Every row of data is linked to what came before it, and what you are trying to predict is what has not yet been seen.  Even more so, market data is very noisy, as we all know.  Most of the ML algos are just **** when you feed in daily or intraday data and expect a result.

If you want to have a look at what algos are used one of the main collections are the Scikit-Learn algos,
http://scikit-learn.org/stable/
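As a concrete illustration of how little code a scikit-learn classification takes on independent rows like the Iris data, here is a minimal sketch (the decision-tree model and 70/30 split are arbitrary choices for demonstration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Fisher's Iris data: 150 independent rows of measurements, 3 species.
X, y = load_iris(return_X_y=True)

# Because the rows are independent, a random shuffled split is legitimate here
# (unlike market bars, where shuffling would leak future information).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```

The clean class separation visible in the petal-width column above is why even a simple classifier scores very highly on this data set.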



If you want to force market data into these, you are going to have to be very smart and creative to make it work.

The algos probably hold the most promise in classification of market regimes etc rather than finding patterns.


----------



## howardbandy (5 May 2017)

tech/a said:


> This is a Ducks explanation.
> 
> Amibroker can run systems tests on a pre defined formula,
> 
> ...




Hi Tech/a, and all --

Modeling data is exciting and interesting, in a nerdy sort of way.

Pardon me for repeating myself.  I have already posted most of what follows in one or more ASF threads, but I'll make a few comments for the benefit of new readers.

1.  In general, a system is a combination of a model and some data.  The model consists of the rules, indicators, parameters, etc.  The purpose of the model is to identify some aspect of the data -- often identification of patterns that are important to the person who will be using the system.  Changing either the model or the data creates a new system.

2.  We are developing trading systems.  Assume it is a system to trade IBM stock using daily data.  The data is the IBM price series (and perhaps volume), along with perhaps some auxiliary data series.  The model is the set of rules that identify good times to buy and sell.  

3.  There are many -- dozens to hundreds -- of possible model templates.  As simple as linear regression.  As complex as deep neural networks.  One of the possible models is decision trees.

4.  The purpose of the system is to identify some important "signal" in the data.  There are two general approaches.  One.  Compute some indicator, then see what happens later.  Two.  Identify some important events, then see what happened earlier.

5.  Traditional trading system development platforms (including TradeStation, NinjaTrader, AmiBroker, WealthLab, etc) use a single model type -- decision tree.  Traditional platforms are primarily of the "indicator first" approach.  

6.  Machine learning development tools (such as those in scikit-learn or R) give the person developing the system a very wide choice of models.  Machine learning is primarily the "identify something important first" approach.

7.  There is no general rule, and no way to determine in advance, which model will best identify the signal in a set of data.  Try dozens and rank them.  Single decision tree is rarely (in my experience, never) among the top performers.

8.  Most trading systems tried and tested using either traditional or machine learning are worthless.  The gold standard is use of the scientific method, including one-time testing of data that is more recent than that used to fit the model to the data.
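Point 7 -- try many models and rank them -- can be sketched with scikit-learn's cross-validation utilities.  The synthetic data set and the four candidate models below are arbitrary choices for illustration, not a recommendation:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

# A synthetic classification problem stands in for real predictor/target data.
X, y = make_classification(n_samples=400, n_features=8, random_state=1)

candidates = {
    "single decision tree": DecisionTreeClassifier(random_state=1),
    "logistic regression": LogisticRegression(max_iter=1000),
    "k nearest neighbors": KNeighborsClassifier(),
    "random forest": RandomForestClassifier(random_state=1),
}

# Rank candidate models by mean cross-validated accuracy.
scores = {name: cross_val_score(m, X, y, cv=5).mean()
          for name, m in candidates.items()}
for name, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{s:.3f}  {name}")
```

Swapping in real predictor and target arrays, and a trading-specific scoring function, turns this loop into the "try dozens and rank them" procedure described above.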

Best regards,  Howard


----------



## howardbandy (5 May 2017)

One more thing.  Time series systems add a considerable complexity.  They are not stationary.  They are not static.  They are dynamic.  No matter how the system was developed (traditional platform or machine learning),  there will be periods where the signal is clearly identified and can be profitably traded.  And periods where it is inaccurate and is not profitable.  We cannot tell whether a period of poor performance is a temporary loss of synchronization between the model and the data and will recover; or the first indications that the system is broken and will never recover.  Prudent system management reduces position size in response to poor performance.  The relationship between the recent trades and the safe position size that results in the highest wealth over a period of time is itself a system -- the trading management system.  

Best,  Howard


----------



## luisenoz (20 July 2017)

Hi Howard,
I see you recommended Dr. Andrew Ng's ML course at Stanford Open Classroom.
Unfortunately, he works with a Windows version of Octave and I work in a Mac environment.
Do you know where I can find instructions for installing Octave on a Mac in a way that would work with Dr Ng's course? Or is there a forum or other communication channel for students of those courses?
Thanks,
Luis


----------



## Indoril (21 July 2017)

luisenoz said:


> Hi Howard,
> I see you recommended the course on ML from Dr. Andrew Ng's at Stanford Open Classroom.
> Unfortunately, he works with a Windows version of Octave and I work in a Mac environment.
> Do you know where I can find instructions to implement Octave for Mac in a way that would work within Dr Ng's course? Or if there is a forum or other communication channels for students of those courses?
> ...



Do you mean GNU Octave? It's cross platform and so it works on Mac. https://www.gnu.org/software/octave/


----------



## luisenoz (21 July 2017)

Indoril said:


> Do you mean GNU Octave? It's cross platform and so it works on Mac. https://www.gnu.org/software/octave/



Thanks Indoril,
You're right, I was asking about GNU Octave. I know there is a macOS version of it, but I'm not sure it'll work with the Open Classroom course. The course has a specific link from which to download the Windows version, which is somehow linked to their servers so you can lodge your exercises and get them corrected.
Anyway, I'll install it and give it a try.
Thanks


----------



## Indoril (22 July 2017)

Oh I see. Sorry, I didn't realise.


----------



## luisenoz (22 July 2017)

Indoril said:


> Oh I see. Sorry, I didn't realise.



All good. Installed and working with the course. 
Thanks


----------



## Wysiwyg (5 August 2017)

Isn't data an issue? What data, where to get it, data authenticity, cost, ready availability, and whether to use all data or only partial data.



Gringotts Bank said:


> How long before there's a point-and-click retail version of machine learning software?  Something like Watson's Analytics for financial markets?



How would you use 'Watson Analytics' by IBM?


----------



## CanOz (5 August 2017)

There are already at least two retail machine learning platforms for trading ... they've been out for years.


----------



## Wysiwyg (5 August 2017)

Interesting research ...


----------



## Wysiwyg (5 August 2017)

Free course ...

https://www.udacity.com/course/machine-learning-for-trading--ud501

Paid course ... 

https://www.coursera.org/learn/computational-investing


----------



## Trembling Hand (6 August 2017)

Wysiwyg said:


> Free course ...
> 
> https://www.udacity.com/course/machine-learning-for-trading--ud501
> 
> ...



I would be very skeptical about the ultimate usefulness of such material. At best they are extreme oversimplifications of the problems, leaving you with a big hole in your applications and knowledge.

At the other end, some are straight rip-offs -- like paying someone to teach you how to do an MA cross system and saying that's what hedge funds do!


----------



## entropy (23 November 2021)

howardbandy said:


> One more thing.  Time series systems add a considerable complexity.  They are not stationary.  They are not static.  They are dynamic.  No matter how the system was developed (traditional platform or machine learning),  there will be periods where the signal is clearly identified and can be profitably traded.  And periods where it is inaccurate and is not profitable.  We cannot tell whether a period of poor performance is a temporary loss of synchronization between the model and the data and will recover; or the first indications that the system is broken and will never recover.  Prudent system management reduces position size in response to poor performance.  The relationship between the recent trades and the safe position size that results in the highest wealth over a period of time is itself a system -- the trading management system.
> 
> Best,  Howard



Thank you for your excellent series of posts in this thread!

Your web site videos and articles were greatly appreciated.

You have motivated me to learn the Python 3 language and to investigate some machine learning algorithms that may be useful for stock price movement prediction.

Your suggestion re holding stocks for 1 to 5 days was an eye-opener!

My Python program can now get and preprocess 20 years of price and volume data: is this time frame necessary or should I focus just on more recent data?

I have implemented some xgboost models (uses trees) and some Deep Neural Network models using keras as these two machine learning models seem to feature in the winning entries of many of the data prediction competitions (eg Kaggle).

I am currently pondering as to which features I could test with a view to being included in a model.
There are a plethora of simple and exponential moving averages but I am assuming their signals are already factored into the stock price and are unlikely to lead to an edge.

If you are still active on this site I would be grateful for any advice.


----------



## DaveDaGr8 (23 November 2021)

I haven't seen or heard from Howard in a long time. Last time I spoke to him he had stopped trading.

Personally I use RSI().
I also use c/ma(c,X) a lot (in almost everything), usually as a series to add a momentum input.

The two examples are auto-normalising formulas, hence they will work on a variety of different instruments and classes. So to answer your question fully: use any indicator that self-normalises; it makes life easier.

The timeframes you need to use are a LOT faster than what you would use as a system trader. For RSI() I would use 2 to 5 on a daily. But there is no reason why you can't throw ALL of them in; the model should work out which is relevant and which isn't.

In that vein, any indicator that you overlay on a price chart (Bollinger Bands, moving averages, etc.) is out, OR if you really want to use them you have to normalise them to the underlying price somehow.
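The two self-normalising features mentioned above can be sketched in pandas. Note this uses Cutler's simple-moving-average form of RSI rather than Wilder's smoothing, and the periods are just the examples from the post -- both are assumptions for illustration:

```python
import numpy as np
import pandas as pd

def rsi(close: pd.Series, period: int = 2) -> pd.Series:
    """Cutler's RSI: simple moving averages of gains and losses; bounded 0-100."""
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(period).mean()
    loss = (-delta.clip(upper=0)).rolling(period).mean()
    # 100 * gain / (gain + loss) is algebraically 100 - 100 / (1 + RS).
    return (100 * gain / (gain + loss)).fillna(50)

def close_over_ma(close: pd.Series, period: int = 20) -> pd.Series:
    """c / ma(c, X): a momentum ratio that oscillates around 1.0."""
    return close / close.rolling(period).mean()

# Synthetic price series for demonstration.
rng = np.random.default_rng(7)
close = pd.Series(50 * np.exp(np.cumsum(rng.normal(0, 0.01, 250))))

features = pd.DataFrame({
    "rsi2": rsi(close, 2),           # fast RSI, as suggested for ML features
    "c_ma20": close_over_ma(close, 20),
})
print(features.dropna().describe())
```

Because both features are ratios of the price to itself, they land in a comparable range on any instrument, which is what makes them convenient ML inputs.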

You also need to input the underlying market as an input. This becomes another complex can of worms.

In terms of models, I like neural networks better. Most quants prefer trees because they can create a human-readable system. THIS IS NOT THE POINT OF ML. More edge can be gained from ML models by finding patterns that humans can't, and NNs are more adept at finding patterns. Trees just repeat a process that quants have been working on for years.


----------



## entropy (23 November 2021)

DaveDaGr8 said:


> I haven't seen or heard from Howard in a long time. Last time i spoke to him he had stopped trading.
> 
> Personally i use RSI().
> I also use c/ma(c,X) a lot ( in almost everything ), usually as a series to add a momentum input.
> ...



Thanks Dave, brilliant post!

And plenty for me to go on with. I have only traded since last year so missed the input from Dr Bandy of four to five years ago.

Your comments re tree-based methods and the need for quants to use these so their clients can understand the output are apt and also encouraging.

When you say to input the underlying market do you mean something like the All Ordinaries Index?
If I was trying to predict, say, CBA movement, include some sort of Bank Index, to predict FMG, some sort of Mining Index etc?


----------



## DaveDaGr8 (24 November 2021)

Yes the index or indexes. 

The inputs should be whatever is relative to what output you're trying to achieve.

If you are looking for specific patterns within the stock itself, then you don't need to know about the market. However if you're looking for trading signals, or price prediction then the market as a minimum is crucial.

I must admit, I've never seen xgboost before ... it looks interesting so I might have a play with it.


----------



## markrmau (25 November 2021)

Blow me down, I didn't expect people to be talking about using machine learning on this forum!  I was on this path myself and was just putting some tools in place to start investigating. I have no doubt it will be challenging to get useful algorithms - machine learning won't be a magic bullet due to the inherent randomness of the stock market.

I did the following course a year or so ago and thoroughly enjoyed it. I think it gives a lot of the basic concepts which are important to understand before launching into the more complex details: https://www.coursera.org/learn/machine-learning

Useful comments in the above thread!

As for implementation, I am in the process of putting tools together for the training and testing of algorithms.  If anyone is interested, I am happy to share.  The big proviso is that I am not an expert coder.  However the hard work is already done in python libraries so I don't think this will be a barrier.

The tools I have put together follow the old Unix ethos - lots of little tools to do specific tasks:

Download stock data from yahoo finance and save
Search stock data to provide a series of buy/sell signals (currently for testing I have implemented techtrader, but the code can use the buy/sell signals from a machine learning algorithm).
Undertake Monte Carlo analysis based on taking a selection of the above buy/sell signals (not finished yet though). Arguably this could be done using AmiBroker or other programs, but I prefer my own tools so I can verify I am not using "future signals".  It is not a complex program.

So to reiterate, I am not an expert coder, but here is my simple code to download and save ASX stock data (based on a download of stocks listed in a CSV file). Obviously make a directory somewhere to hold the project, with a subdirectory 'Data'.



> import yfinance as yf
> import pandas as pd
> import time, os, pickle
> 
> ...



This is Python 3. Edit - Python won't like that - the indentation seems to have disappeared!  I'll attach the file - you'll need to change the extension, of course.

This code simply downloads buckets of data from Yahoo and saves it as a pickled dataframe.  The next steps would be to add code to update the data as time goes on; for machine learning we also need to create separate training and validation sets.

If there is interest, I can share more of this sort of basic code - or someone might point out that this framework already exists in Python!
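For anyone following along, the save-and-reload half of this workflow can be sketched with pandas' own pickle helpers. The download step itself (yfinance's `yf.download`) is omitted here because it needs a network connection, so a small synthetic frame stands in for the downloaded history:

```python
import os
import tempfile
import pandas as pd

# A stand-in for a downloaded price history; in the real script this frame
# would come from yf.download(ticker) -- omitted here since it needs a network.
dates = pd.date_range("2021-01-04", periods=5, freq="B")
prices = pd.DataFrame({"Close": [10.0, 10.2, 10.1, 10.4, 10.3]}, index=dates)

# Save to a Data-style directory as a pickled dataframe, one file per ticker.
data_dir = tempfile.mkdtemp()          # stands in for the project's 'Data' folder
path = os.path.join(data_dir, "XYZ.pkl")
prices.to_pickle(path)

# Later runs reload the frame instead of re-downloading.
restored = pd.read_pickle(path)
assert restored.equals(prices)
```

Pickling preserves the DatetimeIndex and dtypes exactly, which is why it is a convenient cache format between download runs -- though it is Python-specific, unlike CSV.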


----------



## qldfrog (26 November 2021)

markrmau said:


> Blow me down, I didn't expect people to be talking about using machine learning on this forum!  I was on this path myself and was just putting some tools in place to start investigating. I have no doubt it will be challenging to get useful algorithms - machine learning won't be a magic bullet due to the inherent randomness of the stock market.
> 
> I did the following course a year or so ago and thoroughly enjoyed it. I think it gives a lot of the basic concepts which are important to understand before launching into the more complex details: https://www.coursera.org/learn/machine-learning
> 
> ...



@markrmau ,
 following with interest.
Spent the last 2y using AB for basic system building and trading, but with a background in AI, ML training has obviously never been far from my mind.
I believe the experience I have gained so far in standard system building is not wasted if heading the ML way, and I would not be surprised to see myself in that area in 4 or 5 years.
Good luck in your progress: all these posts are really encouraging.


----------



## KevinBB (26 November 2021)

@markrmau well ... I haven't been following this thread, as I don't do machine learning ... but my latest project is something similar. Your Python reference has sparked my interest. I am like you, a novice Python programmer.

I currently trade a futures system, very similar to the Carver system, which I am trying to adapt to Australian stocks. So far, unsuccessfully. Those who know the Carver system will know that lately he has been playing with portfolio optimisation. So, this is what I've been working on in Python.

The process I am going through is to select stocks based on their minimal correlation to an "uncorrelated portfolio". I add stocks one by one to this portfolio based on the low correlation of their returns. Breakthrough last night ... got the code working.
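One plausible reading of this greedy low-correlation selection can be sketched with pandas. The seeding rule and the mean-absolute-correlation criterion below are my assumptions for illustration, not necessarily KH's exact method:

```python
import numpy as np
import pandas as pd

def greedy_uncorrelated(returns: pd.DataFrame, k: int) -> list:
    """Greedily build a k-stock list, each addition having the lowest mean
    absolute correlation with the stocks already selected."""
    corr = returns.corr().abs()
    # Seed with the stock least correlated with everything else on average.
    chosen = [corr.mean().idxmin()]
    while len(chosen) < k:
        rest = [c for c in returns.columns if c not in chosen]
        # Pick the candidate whose average |corr| to the chosen set is smallest.
        nxt = min(rest, key=lambda c: corr.loc[c, chosen].mean())
        chosen.append(nxt)
    return chosen

# Synthetic returns: A and B are near-duplicates, C and D are independent.
rng = np.random.default_rng(3)
base = rng.normal(0, 0.01, 500)
rets = pd.DataFrame({
    "A": base + rng.normal(0, 0.001, 500),
    "B": base + rng.normal(0, 0.001, 500),
    "C": rng.normal(0, 0.01, 500),
    "D": rng.normal(0, 0.01, 500),
})
print(greedy_uncorrelated(rets, 3))
```

With three slots, the near-duplicate pair A/B should contribute at most one member, which is the whole point of the selection rule.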

Now for the fun bit ... with this "uncorrelated portfolio", apply the Carver system and see how it turns out.

KH


----------



## markrmau (26 November 2021)

Here is a modified form of techtrader - it applies all buy/sell signals.  The techtrader buy/sell signals can be replaced with ML signals.

This produces a list of all possible trades (assume a sell on the last day of the data series).  The next step would be to do Monte Carlo analysis on these trades to fit with available capital, commissions, etc.  I have this 50% complete (it excludes the Monte Carlo analysis - it just does one run).

This code is a bit messy.  I originally coded it using pandas, which produced beautiful clean code but was too slow.  I moved to numpy for speed, but it is a little less readable - I know I could use dictionaries for the column numbers, but I wanted to keep the code fast.
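The signal-to-trade-list step described here can be sketched in numpy: given a boolean in-market state series, pair up entries and exits, closing any open position on the last bar. The exit-at-signal-change convention is an assumption for illustration:

```python
import numpy as np

def trades_from_signal(close: np.ndarray, in_market: np.ndarray):
    """Pair entries and exits from a boolean in-market state signal.
    Returns a list of (entry_index, exit_index, return) tuples; an open
    position is closed on the last bar of the series."""
    state = in_market.astype(np.int8)
    # +1 marks a flat-to-long transition (entry), -1 a long-to-flat (exit).
    change = np.diff(np.concatenate(([0], state)))
    entries = np.flatnonzero(change == 1)
    exits = np.flatnonzero(change == -1)
    if len(exits) < len(entries):                # still holding at the end
        exits = np.append(exits, len(close) - 1)
    return [(int(e), int(x), close[x] / close[e] - 1.0)
            for e, x in zip(entries, exits)]

# A toy example: hold bars 1-2, then bars 5 onward.
close = np.array([10.0, 10.0, 11.0, 11.0, 10.0, 10.0, 12.0])
in_market = np.array([0, 1, 1, 0, 0, 1, 1], dtype=bool)
for t in trades_from_signal(close, in_market):
    print(t)
```

Because everything is vectorised up to the final pairing, this stays fast on long price histories -- the same reason given above for moving from pandas to numpy.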

@qldfrog & KH  - I haven't heard of Carver - I'll have to look it up.  Thanks for the interest!


----------



## DaveDaGr8 (26 November 2021)

Rob Carver's book Systematic Trading was a real eye-opener, and the depth he goes into is not for the faint-hearted. It made me realise just how important it is to trade uncorrelated systems.

Andreas Clenow's Trading Evolved is also a MUST READ for anyone who wants to get started using Python for any sort of trading application.


----------



## markrmau (26 November 2021)

Thanks Dave!  I had a quick flip through the book - I see that a lot of what I have been doing is already implemented here:

GitHub - quantopian/zipline: Zipline, a Pythonic Algorithmic Trading Library
https://github.com/quantopian/zipline
I think I'll ditch what I have been doing and use zipline.


----------



## KevinBB (26 November 2021)

DaveDaGr8 said:


> Rob Carvers book systematic trading was a real eye opener and the depth he goes into is not for the faint hearted. It made me realise just how important it is to trade uncorrelated systems.
> 
> Andreas Clenow's Trading evolved is also a MUST READ for anyone who wants to get started in using Python for any sort of trading application.



True. I found his books about one year ago, implemented Leveraged Trading first, and then Systematic Trading, both with CFDs, before progressing on to a small futures portfolio. I'm now trying to implement ST into Australian stocks.

Sometimes, though, his Python coding and his recent writing are far beyond my understanding. It's taken me a couple of months, and much discussion with another person (afaik not on this forum), to get this "optimised portfolio" to a stage where it can be tested.

We're off the topic of machine learning .. so not too much more.

KH


----------



## markrmau (28 November 2021)

Ick, zipline isn't what I want to use.  This looks a lot better:

Backtesting.py - Backtest trading strategies in Python: a fast Python framework for backtesting trading and investment strategies on historical candlestick data.
https://kernc.github.io/backtesting.py/




It seems well suited to integration with ML and even has an ML-specific tutorial:

https://kernc.github.io/backtesting.py/doc/examples/Trading%20with%20Machine%20Learning.html


I can assist with python if anyone needs help.
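As a rough illustration of what a framework like Backtesting.py automates, the core of a vectorized moving-average backtest can be sketched in plain numpy; the function name and parameters here are hypothetical, and the real library adds order handling, commissions, and statistics on top of this idea:

```python
import numpy as np

def sma_crossover_returns(close, fast=10, slow=20):
    """Toy long-only backtest: hold while the fast SMA is above the slow SMA.

    Returns the per-bar strategy returns (zero while flat).
    """
    def sma(x, n):
        # Trailing simple moving average; NaN until the window fills.
        out = np.full_like(x, np.nan, dtype=float)
        c = np.cumsum(np.insert(x, 0, 0.0))
        out[n - 1:] = (c[n:] - c[:-n]) / n
        return out

    fast_sma, slow_sma = sma(close, fast), sma(close, slow)
    position = (fast_sma > slow_sma).astype(float)   # 1 = long, 0 = flat
    bar_returns = np.diff(close) / close[:-1]
    # Apply each bar's position to the *next* bar's return,
    # avoiding look-ahead bias.
    return position[:-1] * bar_returns
```

On a steadily rising series this goes long once both averages are defined and the fast one overtakes the slow one; a library replaces the last two lines with a proper order simulator.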


----------



## entropy (29 November 2021)

markrmau said:


> Blow me down, I didn't expect people to be talking about using machine learning on this forum!  I was on this path myself and was just putting some tools in place to start investigating. I have no doubt it will be challenging to get useful algorithms - machine learning won't be a magic bullet due to the inherent randomness of the stock market.
> 
> I did the following course a year or so ago and thoroughly enjoyed it. I think it gives a lot of the basic concepts which are important to understand before launching into the more complex details.     https://www.coursera.org/learn/machine-learning
> 
> ...



Excellent post mark, and your generous offer to share is appreciated!

I have been learning Python and some of the ML and DeepNN models.

Currently using Python 3, Anaconda Navigator, Jupyter Notebooks.

The Kaggle site is interesting: working through the revealed code and the thought processes of prizewinners is instructive.
Looking forward to all discussion and posts re ML and the markets.


----------

