Predicting Stock Prices using Reinforcement Learning (with Python Code!) (2024)

This article was published as a part of the Data Science Blogathon.

Introduction

The share price of HDFC Bank is going up. It’s on an increasing trend. People are selling in higher numbers and making some instant money.

These are sentences we hear about the stock market on a regular basis nowadays. You can replace HDFC with any other stock that thrived during a tumultuous 2020 and the narrative remains pretty similar.

The stock market is an interesting medium to earn and invest money. It is also a lucrative option that can feed greed and lead to drastic decisions, largely because of the market's volatility. Trading is a gamble that can end in a profit or a loss. No model predicts stock prices reliably, since price movement is driven largely by the balance of demand and supply.

In this article, we will try to mitigate that uncertainty through the use of reinforcement learning. We will go through the reinforcement learning techniques that have been used for stock market prediction.

Techniques We Can Use for Predicting Stock Prices

As it is a prediction of continuous values, any kind of regression technique can be used:

  • Linear regression predicts a continuous value from one or more input features
  • Time series models are built for data that is ordered in time
  • ARIMA is one such time series model, used for forecasting future values of a series
  • LSTM (Long Short-Term Memory) is a type of recurrent neural network that has also been used for stock price prediction. LSTMs are powerful and are known for retaining information over long sequences

However, there is another technique that can be used for stock price predictions which is reinforcement learning.


What is Reinforcement Learning?

Reinforcement learning is another type of machine learning besides supervised and unsupervised learning. It is an agent-based learning system in which the agent takes actions in an environment with the goal of maximizing the reward. Unlike supervised learning, reinforcement learning does not require labeled data.

Reinforcement learning can work with relatively little historical data, because the agent learns from the rewards of its own actions rather than from labeled examples. It relies on a value function, which estimates the expected future reward of a state or action under the policy the agent follows.

Reinforcement learning is modeled as a Markov Decision Process (MDP):

  • An Environment E and agent states S

  • A set of actions A taken by the agent

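  • P_a(s, s') = Pr(s_{t+1} = s' | s_t = s, a_t = a) is the probability that taking action a in state s at time t leads to state s' at time t+1

  • R_a(s, s') is the immediate reward received after the transition from state s to state s' under action a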
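
To make the notation concrete, here is a minimal sketch of a toy MDP in Python. The state names, the probabilities, and the rewards are all invented for illustration and are not taken from any stock data; the helper expected_reward is likewise a hypothetical name.

```python
# Toy MDP: every name and number below is hypothetical.
# P[(s, a)] maps each next state s' to Pr(s_{t+1} = s' | s_t = s, a_t = a).
P = {
    ("rising", "hold"): {"rising": 0.7, "falling": 0.3},
    ("falling", "hold"): {"rising": 0.4, "falling": 0.6},
}

# R[(s, a, s')] is the immediate reward for that transition.
R = {
    ("rising", "hold", "rising"): 1.0,
    ("rising", "hold", "falling"): -1.0,
    ("falling", "hold", "rising"): 1.0,
    ("falling", "hold", "falling"): -1.0,
}


def expected_reward(state, action):
    """Average the immediate reward over all possible next states."""
    return sum(
        prob * R[(state, action, next_state)]
        for next_state, prob in P[(state, action)].items()
    )


for (state, action), transitions in P.items():
    # Each row of transition probabilities should sum to 1.
    assert abs(sum(transitions.values()) - 1.0) < 1e-9
    print(state, action, round(expected_reward(state, action), 3))
```

The trading agent in this article never writes P down explicitly. It only observes transitions by stepping through the price series, which is why a model-free method is used.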

How can we predict stock market prices using reinforcement learning?

Reinforcement learning can be applied to a specific stock on the same basis: it needs relatively little historical data, and the agent learns which actions lead to higher returns given the current state of the environment. We will work through an example for one stock, using the Q-learning technique explained further below.

The steps for designing a reinforcement learning model are:

  • Importing Libraries
  • Create the agent who will make all decisions
  • Define basic functions for formatting the values, sigmoid function, reading the data file, etc
  • Train the agent
  • Evaluate the agent performance

Define the Reinforcement Learning Environment

MDP for Stock Price Prediction:

  • Agent – An Agent A that works in Environment E
  • Action – Buy/Sell/Hold
  • States – Data values (here, a window of recent closing-price changes)
  • Rewards – Profit / Loss

The Role of Q-Learning

Q-learning is a model-free reinforcement learning algorithm that learns the quality of actions, telling an agent what action to take under what circumstances. Q-learning finds an optimal policy in the sense of maximizing the expected value of the total reward over all successive steps, starting from the current state.
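
As a rough illustration, the standard tabular Q-learning update is Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)]. The sketch below applies it to a dictionary-backed table. The state labels, the learning rate alpha, and the function name q_update are assumptions made for this example; the agent later in the article replaces the table with a neural network but builds its target the same way (reward plus discounted best next value).

```python
from collections import defaultdict

ACTIONS = ("hold", "buy", "sell")


def q_update(Q, state, action, reward, next_state, done, alpha=0.1, gamma=0.95):
    """Apply one tabular Q-learning update and return the new Q(s, a)."""
    # A terminal transition has no future value to bootstrap from.
    best_next = 0.0 if done else max(Q[(next_state, a)] for a in ACTIONS)
    target = reward + gamma * best_next
    Q[(state, action)] += alpha * (target - Q[(state, action)])
    return Q[(state, action)]


Q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
first = q_update(Q, "s0", "buy", reward=1.0, next_state="s1", done=False)
second = q_update(Q, "s1", "sell", reward=2.0, next_state="s2", done=True)
third = q_update(Q, "s0", "buy", reward=0.0, next_state="s1", done=False)
print(first, second, third)
```

The third call shows value flowing backwards: the reward earned by selling in s1 raises the estimate for buying in s0, even though that step itself paid nothing.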

Obtaining Data

  1. Go to Yahoo Finance

  2. Type in the company’s name, e.g. HDFC Bank

  3. Select the time period, e.g. 5 years

  4. Click on Download to download the CSV file
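
Before training, it can help to check what the downloaded file contains. The sketch below reads the Close column with the standard csv module. It assumes the export has a header row with a column named Close, the sample rows are made up, and read_close_prices is a hypothetical helper that skips rows whose closing price is not a number.

```python
import csv
import io

# Made-up rows in the assumed layout of a price-history export.
SAMPLE_CSV = """Date,Open,High,Low,Close,Adj Close,Volume
2020-01-01,100.0,102.0,99.0,101.5,101.5,1000
2020-01-02,101.5,103.0,100.5,null,null,0
2020-01-03,101.0,104.0,100.0,103.25,103.25,1200
"""


def read_close_prices(handle):
    """Return the Close column as floats, skipping non-numeric rows."""
    closes = []
    for row in csv.DictReader(handle):
        try:
            closes.append(float(row["Close"]))
        except ValueError:
            continue  # e.g. a placeholder such as "null"
    return closes


closes = read_close_prices(io.StringIO(SAMPLE_CSV))
print(closes)
```

For a real file, pass an open file handle instead of the StringIO object, for example open("your_file.csv", newline="").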


Let’s Implement Our Model in Python

Importing Libraries

To build the reinforcement learning model, import the required Python libraries for modeling the neural network layers and the NumPy library for some basic operations.

import keras
from keras.models import Sequential
from keras.models import load_model
from keras.layers import Dense
from keras.optimizers import Adam
import math
import numpy as np
import random
from collections import deque

Creating the Agent

The Agent code begins with some basic initializations for the various parameters. Four hyperparameters are fixed up front: gamma is the discount factor applied to future rewards, epsilon is the probability of taking a random (exploratory) action, epsilon_min is the floor below which epsilon is not reduced, and epsilon_decay is the factor by which epsilon is multiplied after each training step. Together they shift the agent from exploring at random towards following what it has learned.

The agent builds a layered neural network that scores the three actions: buy, sell, or hold. The act method picks the next action, either at random (with probability epsilon during training) or by choosing the action with the highest predicted value for the current state. Past experiences are stored in a memory that keeps at most the 1000 most recent entries. The expReplay method takes the latest experiences from that memory, fits the network to them, and then decays epsilon.

class Agent:
    def __init__(self, state_size, is_eval=False, model_name=""):
        self.state_size = state_size  # normalized previous days
        self.action_size = 3  # sit, buy, sell
        self.memory = deque(maxlen=1000)
        self.inventory = []
        self.model_name = model_name
        self.is_eval = is_eval
        self.gamma = 0.95
        self.epsilon = 1.0
        self.epsilon_min = 0.01
        self.epsilon_decay = 0.995
        self.model = load_model(model_name) if is_eval else self._model()

    def _model(self):
        # Small fully connected network: state in, one value per action out
        model = Sequential()
        model.add(Dense(units=64, input_dim=self.state_size, activation="relu"))
        model.add(Dense(units=32, activation="relu"))
        model.add(Dense(units=8, activation="relu"))
        model.add(Dense(self.action_size, activation="linear"))
        # learning_rate replaces the older lr argument name
        model.compile(loss="mse", optimizer=Adam(learning_rate=0.001))
        return model

    def act(self, state):
        # Explore at random during training, otherwise act greedily
        if not self.is_eval and random.random() <= self.epsilon:
            return random.randrange(self.action_size)
        options = self.model.predict(state)
        return np.argmax(options[0])

    def expReplay(self, batch_size):
        # Take the most recent experiences from memory
        mini_batch = []
        l = len(self.memory)
        for i in range(l - batch_size + 1, l):
            mini_batch.append(self.memory[i])
        for state, action, reward, next_state, done in mini_batch:
            target = reward
            if not done:
                # Q-learning target: reward plus discounted best next value
                target = reward + self.gamma * np.amax(self.model.predict(next_state)[0])
            target_f = self.model.predict(state)
            target_f[0][action] = target
            self.model.fit(state, target_f, epochs=1, verbose=0)
        if self.epsilon > self.epsilon_min:
            self.epsilon *= self.epsilon_decay
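
To get a feel for the exploration schedule, the sketch below replays only the epsilon update from expReplay, using the same constants as the Agent class. The function name decay_steps is a hypothetical helper introduced for this illustration.

```python
def decay_steps(epsilon=1.0, epsilon_min=0.01, epsilon_decay=0.995):
    """Count how many decays it takes for epsilon to fall to the floor."""
    steps = 0
    while epsilon > epsilon_min:  # same condition as in expReplay
        epsilon *= epsilon_decay
        steps += 1
    return steps, epsilon


steps, final_epsilon = decay_steps()
print(steps, final_epsilon)
```

With these constants the floor is reached after roughly 920 calls. Since the training loop calls expReplay once per time step after the memory exceeds batch_size, a five-year daily series would likely spend most of its first episode exploring and act almost entirely greedily after that.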

Define Basic Functions

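The formatPrice() function formats a number as a currency string. The getStockDataVec() function reads the stock data into Python, keeping the fifth column of each CSV row (the closing price in a Yahoo Finance export). The sigmoid() function squashes any number into the range 0 to 1. The getState() function returns the current state: the sigmoid of each day-to-day price change across the most recent window, padded with the first price when the window starts before the beginning of the data.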

def formatPrice(n):
    return ("-Rs." if n < 0 else "Rs.") + "{0:.2f}".format(abs(n))


def getStockDataVec(key):
    vec = []
    with open(key + ".csv", "r") as f:
        lines = f.read().splitlines()
    for line in lines[1:]:  # skip the header row
        vec.append(float(line.split(",")[4]))  # fifth column
    return vec


def sigmoid(x):
    return 1 / (1 + math.exp(-x))


def getState(data, t, n):
    d = t - n + 1
    block = data[d:t + 1] if d >= 0 else -d * [data[0]] + data[0:t + 1]  # pad with t0
    res = []
    for i in range(n - 1):
        res.append(sigmoid(block[i + 1] - block[i]))
    return np.array([res])
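
A small worked example shows what the state looks like. The sketch below mirrors the logic of getState() in plain Python so it runs without NumPy; state_features is a hypothetical name for this variant, and the four prices are made up.

```python
import math


def sigmoid(x):
    return 1 / (1 + math.exp(-x))


def state_features(data, t, n):
    """Sigmoid of the n - 1 consecutive price changes ending at index t."""
    d = t - n + 1
    # Pad with the first price when the window starts before index 0.
    block = data[d:t + 1] if d >= 0 else -d * [data[0]] + data[0:t + 1]
    return [sigmoid(block[i + 1] - block[i]) for i in range(n - 1)]


prices = [10.0, 11.0, 9.0, 12.0]
early = state_features(prices, 0, 3)  # window is padded: no change yet
late = state_features(prices, 3, 3)   # changes of -2.0 and +3.0
print(early, late)
```

A value of 0.5 means no change, values below 0.5 mark a fall, and values above 0.5 mark a rise. Because the raw difference is passed to the sigmoid, moves of more than a few rupees saturate close to 0 or 1.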

Training the Agent

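Depending on the action chosen by the agent, a buy call adds the current price to the inventory and a sell call removes the oldest purchase and books the difference as profit or loss. Training runs over multiple episodes; each episode is one full pass over the price series, comparable to an epoch in deep learning. The model is saved every 10 episodes.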

# The three values are entered one per line
stock_name = input("Enter stock_name, window_size, Episode_count")
window_size = input()
episode_count = input()
stock_name = str(stock_name)
window_size = int(window_size)
episode_count = int(episode_count)

agent = Agent(window_size)
data = getStockDataVec(stock_name)
l = len(data) - 1
batch_size = 32

for e in range(episode_count + 1):
    print("Episode " + str(e) + "/" + str(episode_count))
    state = getState(data, 0, window_size + 1)
    total_profit = 0
    agent.inventory = []

    for t in range(l):
        action = agent.act(state)

        # sit
        next_state = getState(data, t + 1, window_size + 1)
        reward = 0

        if action == 1:  # buy
            agent.inventory.append(data[t])
            print("Buy: " + formatPrice(data[t]))
        elif action == 2 and len(agent.inventory) > 0:  # sell
            bought_price = agent.inventory.pop(0)  # oldest purchase first
            reward = max(data[t] - bought_price, 0)
            total_profit += data[t] - bought_price
            print("Sell: " + formatPrice(data[t]) + " | Profit: " + formatPrice(data[t] - bought_price))

        done = True if t == l - 1 else False
        agent.memory.append((state, action, reward, next_state, done))
        state = next_state

        if done:
            print("--------------------------------")
            print("Total Profit: " + formatPrice(total_profit))
            print("--------------------------------")

        if len(agent.memory) > batch_size:
            agent.expReplay(batch_size)

    if e % 10 == 0:
        # Newer Keras versions may require a file extension such as ".keras"
        agent.model.save(str(e))

Training Output at the end of the first episode:

Total Profit: Rs.340.03
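
The buy/sell bookkeeping inside the training loop can be isolated to see how the reward differs from the profit. The sketch below uses the same action codes as the article (1 for buy, 2 for sell, anything else for sit); the function name apply_action and the price sequence are made up for illustration.

```python
def apply_action(inventory, action, price):
    """Return (reward, profit) for one step; updates inventory in place."""
    if action == 1:  # buy: remember the purchase price
        inventory.append(price)
    elif action == 2 and inventory:  # sell the oldest purchase first
        bought_price = inventory.pop(0)
        profit = price - bought_price
        return max(profit, 0), profit  # the reward never goes below zero
    return 0, 0  # sit, buy, or a sell with nothing to sell


inventory = []
steps = [(1, 100.0), (1, 90.0), (2, 95.0), (2, 110.0), (2, 120.0)]
log = [apply_action(inventory, action, price) for action, price in steps]
total_profit = sum(profit for _, profit in log)
print(log, total_profit)
```

The first sale loses 5 but earns a reward of 0 rather than -5, so under this reward definition the agent is not penalized for losing trades, while the reported total profit does include the loss.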

Evaluation of the model

Once the model has been trained, you can test it on new data to see the profit or loss it produces. You can then judge the credibility of the model accordingly.

# The two values are entered one per line
stock_name = input("Enter Stock_name, Model_name")
model_name = input()
model = load_model(model_name)
window_size = model.layers[0].input.shape.as_list()[1]

agent = Agent(window_size, True, model_name)
data = getStockDataVec(stock_name)
print(data)
l = len(data) - 1
batch_size = 32

state = getState(data, 0, window_size + 1)
print(state)
total_profit = 0
agent.inventory = []
print(l)

for t in range(l):
    action = agent.act(state)
    print(action)

    # sit
    next_state = getState(data, t + 1, window_size + 1)
    reward = 0

    if action == 1:  # buy
        agent.inventory.append(data[t])
        print("Buy: " + formatPrice(data[t]))
    elif action == 2 and len(agent.inventory) > 0:  # sell
        bought_price = agent.inventory.pop(0)
        reward = max(data[t] - bought_price, 0)
        total_profit += data[t] - bought_price
        print("Sell: " + formatPrice(data[t]) + " | Profit: " + formatPrice(data[t] - bought_price))

    done = True if t == l - 1 else False
    agent.memory.append((state, action, reward, next_state, done))
    state = next_state

    if done:
        print("--------------------------------")
        print(stock_name + " Total Profit: " + formatPrice(total_profit))
        print("--------------------------------")

print("Total profit is:", formatPrice(total_profit))

End Notes

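In this experiment, the Q-learning agent ended its first training episode with a profit, which is encouraging but not proof that the approach generalizes. By using Q-learning, different experiments can be performed, for example with other stocks, window sizes, or reward definitions. More research in reinforcement learning will make it possible to apply it to trading with more confidence.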


Ekta, 23 Dec, 2020

