Reinforcement Learning applied to Forex Trading

It is already well-known that in 2016, the computer program AlphaGo became the first Go AI to beat a world champion Go player in a five-game match. AlphaGo utilizes a combination of reinforcement learning and Monte Carlo tree search algorithm, enabling it to play against itself and for self-training. This no doubt inspired numerous people around the world, including me. After constructing the automated forex trading system, I decided to implement reinforcement learning for the trading model and acquire real-time self-adaptive ability to the forex environment.

Environment Setup

The model runs on a Windows 10 OS (i9-9900K CPU) with DDR4 2666MHz 16G RAM and NVIDIA GeForce RTX 2060 GPU. Tensorflow is used for constructing the artificial neural network (ANN), and a multilayer perceptron (MLP) is used. The code is modified from the Frozen-Lake example of reinforcement learning using Q-Networks. The model training process follows the Q-learning algorithm (off-policy TD control), which is illustrated in Fig. 1.

Figure 1. Algorithm for Q-learning and the agent-environment interaction in a Markov decision process (MDP) [1].

For each step, the agent first observes the current state, feeds the state values into the MLP and outputs an action that is estimated to attain the highest reward, performs that action on the environment, and fetches the true reward for correcting its parameters. The agent follows the epsilon-greedy policy (ε = 0.1) for striking a balance between exploration and exploitation.

State, Action and Reward

For the 1st generation, price values at certain time points and technical indicators are used for constructing the states. The technical indicators used are the exponential moving average (EMA) and Bollinger bands (N=20, k=2), and time frames of 1, 5 and 15min are used with the last 10 time points being recorded. A total number of 36 inputs are connected to the MLP.

There are three action values for the agent: buy, sell and do nothing. The action being taken by the agent is determined by the corresponding three outputs of the MLP, where sigmoid activation functions are used for mapping the outputs to a value range of 0 ~ 1, representing the probability of the agent taking that action.

For the reward function, the difference between the trade price (the price when a buy/sell action is taken) and the averaged future price is considered. If a buy action is taken, then the reward function is calculated by subtracting the averaged future price with the trade price; if a sell action is taken then the reward is calculated the other way around. For “do nothing” actions, the reward is 0. A spread is subtracted from the reward for buy/sell actions to obtain the final reward. This prevents the agent to perform actions that result in insignificant profit, which would likely lead to a loss for real trades (Fig. 2).

Figure 2. Reward calculation method for buy/sell actions.

Noisy Sine Function Test

For preliminary verification of effectiveness for the training model and methods, a noisy sine wave is generated with Brownian motion of offset and distortion in frequency. This means at a certain time point (min), the price is determined by the following equation:

$$P(t)=P_{bias} + P_{amp} sin{2\pi \over T}t+P_{noise}$$

where Pbias is an offset value with Brownian motion, Pamp is the price vibration amplitude, T is the period with fluctuating values, and Pnoise is the noise of the price with randomly generated values. (Note that the “price” mentioned here is defined as the exchange rate between two currencies)

Fig. 3 shows a randomly generated price vs time sequence within a range of 50,000 minutes with an initial values Pbias = 1.0, T = 120 min, Pamp = 0.005, and Pnoise amplitude = 0.001. Generally, the price seems to fluctuate randomly with no obvious highs or lows. However, if it is viewed close-up, waves with clear highs and lows can be observed (Fig. 4).

Figure 3. Price vs time of the noisy sine wave from 0 to 50,000 min.

Figure 4. Price vs time of the noisy sine wave from 20000 to 20600 min.

The whole time period is 1,000,000 min (approximately 700 days, or 2 years). Initially, a random time period is set for the environment. Every time the agent takes an action, there is a certain chance (= 1%) that the time will jump to another random point within the whole period. Otherwise, the time will move on to a random point which is around 1 ~ 2 day(s) in the future. This setting is expected to correspond to real conditions, where a profitable strategy can have stable earnings and can also adapt quickly to rapid changing environments.

Fig. 5 plots the cumulative profit for trading using the noisy sine wave signal for 50,000 steps. Although it took approximately 25,000 steps to make the model get “on track”, I recognize this result as an important start for implementing real data.

Figure 5. Cumulative profit from trading using a noisy sine wave signal.

Fundamental Analysis for Economic Events

Fundamental analysis is a tricky part in forex trading, since economic events not only correlate with each other, but also might have opposite effects on the price at different conditions. In this project, I extracted the events that are considered significant, and contain previous, forecast and actual values for analysis. Data from 14 countries of the past 10 years are downloaded and columns with incomplete values are abandoned, making a complete table of economic events.

Because different events have different impacts on forex, the price change after the occurrence of an event is monitored, and a correlation between each event and the seven major pairs (commodity pairs). Table 1 displays a portion of the correlation table for different economic events. The values are positive, which indicates the significance of an event on the currency pair. Here, a pair is denoted by the currency other than the USD (e.g. USD/JPY is denoted as JPY).

Table 1. Correlation table between 14 events and 5 currency pairs. Here, a pair is abbreviated as the currency other than the USD.

Country Economic Event (Index)AUDCADEURGBPJPY
AUDCommodity Prices0.00313 0.00268 0.00266 0.00339 0.00278
AUDMI Inflation Expectations0.003380.001680.002170.002000.00266
AUDRBA Interest Rate Decision0.004280.002620.002580.002980.00225
EURManufacturing PMI0.003310.002840.002630.002980.00278
EURItalian CPI0.003150.003190.002950.003160.00255
EURServices PMI0.003410.002900.002930.002950.00284
EURCPI0.003040.002940.002620.003170.00241
EURGerman Unemployment Rate 0.003150.003150.002730.003130.00246
EURECB President Trichet Speaks0.003440.002480.003410.003020.00268
EURGerman Unemployment Change 0.003130.003130.002680.003070.00243
EURGerman Trade Balance0.003060.002550.003000.002840.00268
EURGerman Factory Orders0.002920.002650.003120.002800.00275
EURGerman Retail Sales0.003040.003040.003530.003100.00275
EURFrench Trade Balance0.003120.003120.002960.003010.00299

A total of 983 events are analyzed. However, due to the fact that a large portion of events have little influence on the price, only 125 events that have a relatively significant impact are selected as the inputs of the MLP.

Real Data Implementation Results

Per-minute exchange rate data of the seven currency pair is downloaded from histdata.com. A period from 2010 to 2019 is extracted, and blank values are filled by interpolation. This gives us a total of approximately 23 million records of price data (note that weekends have no forex data records), and is deemed sufficient for model training. The data is integrated into a table, and technical indices are calculated using ta, a technical analysis library for Python built on Pandas and Numpy.

Figure 6. EUR/USD exchange rate from 2010 to 2019.

Summing the inputs from technical analysis, fundamental analysis, and pure price data, a total of 1049 inputs are fed into the MLP. Within the hidden layers, ReLU activation is used, and a sigmoid activation function is used for the output layer. The output has a shape of 7×3, which represents the probability of the seven currency pairs and the three actions (buy, sell, do nothing).

Fig. 7 shows the accumulative profit from 2,000,000 steps in a single episode and its win rate (percentage of profitable trades within a moving average). An increasing spread value from 0.00001 to 0.00004 is applied, which the spread value starts from 0.00001 and increases by 0.00001 every 50,000 step. It can be seen that overall, the accumulative profit rises steadily. However, the win rate usually falls below the 50% line. How could a profitable trading strategy be possible? This is due to the fact that the average profit of a winning trade (=0.003736) is larger than the average loss of a losing trade (=0.003581). Thus, the overall result is a profitable trading strategy.

Figure 7. Accumulative profit and win rate from the training procedure of 2,000,000 steps.

Conclusion

In conclusion, a trading model for profitable forex trading is developed using reinforcement learning. The model can automatically adapt to dynamic environments to maximize its profits. Although for real conditions that have a larger spread, the model hasn’t achieved a stable and profitable result, the potential for optimizing is promising. In the future, I am planning to integrate this trading model with the automated forex trading system that I have made, and become a competitive player in this fascinating game of forex.

[Source code of RL model training section]

References

[1] R.S. Sutton, A.G. Barto, Reinforcement Learning: An Introduction, MIT Press2018.

Automated Microfluidic Controlling Platform

I have always wanted to automate the cumbersome experimental operation process. For biochemical experiments, often a whole day in web lab is spent in order just to acquire one single set of data. Having the hands-on experience of developing hardware and software integrated systems for several projects (e.g. Surface Plasmon Resonance Platform, Real-time Impedance Detection Systems), I initiated this project for constructing a microfluidic controlling platform that can automatically manipulate liquid-based solutions of little volume, and also assist real-time detection experiments.

Platform Structure

The platform is constructed by four sections: A Raspberry Pi, an actuator module that consists of an Arduino and two H-bridge circuits, a fluid controlling system made up of a syringe pump/syringe device, and the microfluidic platform (Fig. 1).

Figure 1. System architecture of the automated microfluidic controlling platform.

Here, the Raspberry Pi serves as the main processing unit, which an Apache web server is constructed on, and is used to communicate with a remote user by website. The front-end of the website is designed using HTML, Javascript and CSS, and the back-end is designed using PHP. Javascript and PHP communicate using jQuery, and the PHP code is written for controlling peripheral devices.

For x/y position control, the Raspberry Pi sends data through type-B USB to the Arduino, which afterwards commands the H-bridge circuits, then control the x/y stepper motors on the microfluidic platform for moving the position of the racket. The servo motor is used to control the high/low position of the tube it holds, where the high (low) position means the tube isn’t (is) inserted into the microtube (Fig. 2).

Figure 2. The microfluidic platform and surrounding modules.

For fluid control, the Raspberry Pi sends an infuse/withdraw signal to the syringe pump using another type-B USB, which subsequently pushes/pulls the syringe on it. A simple flow for moving liquid from microtube A to microtube B is:

1. Confirm that the tube (for fluid conveyance) position is high. If not, then move it up by servo motor.
2. Move to microtube A by stepper motor.
3. Change tube position to low by servo motor.
4. Withdraw liquid by syringe pump.
5. Change tube position to high by servo motor.
6. Move to microtube B by stepper motor.
7. Change tube position to low by servo motor.
8. Infuse liquid by syring pump.

Hardware Development

Fig. 3 shows a photograph of the whole platform. For the microcontrollers, Raspberry Pi 3 B+ and Arduino Uno are used. For the H-bridge, L298N dual driver module is used. Legato® 111 syringe pump (kd Scientific) and Series 700 Microliter syringe (Hamilton) are used for fluid control. HMS-25BY46L38 stepper motors and an SG90 servo motor are used for the microfluidic platform.

Figure 3. Photograph of the automated microfluidic controlling platform.

The circuitry for this platform is relatively easy compared with other projects (e.g. Aroma Alarm Clock), and is consisted simply with wires and resistors. Except the motors and the racket with microtubes, all the other components of the for microfluidic platform are fabricated using 3D printing. 3D-printed gears are fixed with the stepper motors. Precise control of racket position (~ 0.2mm) is realized by combining the gears with a 3D-printed linear gear that fits on the racket and another linear gear of a subsidiary platform which the racket sits on. Below is a clip demonstrating how the 3D-printed components, the stepper motors, the racket with microtubes, and a microfluidic electrode chip are integrated together, along with x/y position control of the racket.

Software Development

The program structure written inside Raspberry Pi is quite similar to the program structure in another project “Real-time Impedance Detection Systems”. The difference for this project is that PHP is used to directly communicate with peripheral devices (Fig. 1).

Fig. 4 displays the website-based user interface for controlling this platform. The UI is divided in four sections: procedure window, buttons field, racket window, and status window. Briefly, the user can save/load settings from the connected Arduino, add/delete a control command, start/pause/stop the current procedure, download the control procedure to a text file, and append a command after another one.

Figure 4. Website user interface of the platform.

Fig. 5 shows the settings menu. Here, the user needs to find the device url for Arduino and the syringe pump beforehand and insert them. Several controlling preferences, such as motor operation delay time, steps for the stepper motor to move per cell (microtube), precise high/low position of the servo motor, current cell of the racket … etc. can be set.

Figure 5. Settings menu of the UI.

Fig. 6 shows the add command menu. The user can either move the stepper motor to a target cell (microtube), infuse/withdraw fluid at a self-defined rate and target volume, move servo high/low position, or perform a time delay.

Figure 6. The add command menu of the UI.

Concentration Gradient Generation

An automated concentration gradient generation process is written, and is carried out by the automated platform. Here is a clip for demonstration (speed = 10x):

Real-time Impedimetric Detection

The tube doesn’t have to directly connect to a syringe. A detection chip can be inserted between for real-time detection of different solutions (similar to the method illustrated in Fig. 1 of another project Surface Plasmon Resonance Platform). A microfluidic interdigitated electrode chip using microfabrication technique is previously developed (which is a part of my research), and is used for detection of electrochemical impedimetric properties of the fluid. Here, potassium ferricyanide (K3Fe(CN)6) and potassium ferrocyanide (K4Fe(CN)6) are serial diluted using the platform, and real-time impedance detection is carried out using an electrochemical analyzer (CHI614b, CH Instruments) and the microfluidic chip.

Fig. 7 shows the detection result. It can be seen that the solution switching time is relatively fast and stable, and the detection time is consistent, which demonstrates the advantages of this automated platform.

Figure 7. Real-time impedimetric detection plot for different diluted concentrations of K3Fe(CN)6/K4Fe(CN)6.

Summary

In summary, a website-controlled automatic microfluidic controlling platform is designed and fabricated for real-time microfluidic sensing and other applications. Solution manipulation using this platform is stable, repeatable, and time-saving compared with manual operation.