How to get historical stock data

In the first three articles about coding with financial data (How to code a backtest, How to code a basic stock scanner and How to code the first indicators), I did not explain how to get historical stock data.

You will know about it after reading this article! 🙂

Historical stock data

To create a candlestick chart you need four prices for each candle:
the open and close prices (the prices when the market opened and closed) as well as the high and the low of the time period the candle represents. The volume is useful too: it's the number of shares traded in that time period (1 minute, 1 day, 1 week, 1 month, … per candle).

For our example we use daily candles, so we need the date/day, open, close, high, low and volume.

You know from the other articles that pandas is the best module for data sets like these. You can simply import it in your code with

import pandas as pd
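For illustration, here is what a table of daily candles can look like as a pandas DataFrame: one row per day with the date as index, and one column each for open, high, low, close and volume. The prices below are made up; the column names simply follow the ones used later in this article.

```python
import pandas as pd

# Hand-made daily candles (made-up numbers, for illustration only):
# one row per day, the date as index, one column per price field plus volume.
candles = pd.DataFrame(
    {
        "Open":   [80.10, 80.90, 79.50],
        "High":   [81.20, 81.00, 80.30],
        "Low":    [79.80, 79.40, 78.90],
        "Close":  [80.95, 79.60, 80.10],
        "Volume": [12_000_000, 15_500_000, 11_200_000],
    },
    index=pd.to_datetime(["2019-11-04", "2019-11-05", "2019-11-06"]),
)
candles.index.name = "Date"

print(candles)
# Close price of a single day, selected by date and column name
print(candles.loc[pd.Timestamp("2019-11-05"), "Close"])
```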

The 6 best sources for stock data

For collecting the data we have several possibilities:

  1. Yahoo Data
  2. IEX Cloud
  3. Alpha Vantage
  4. Quandl
  5. Quantopian
  6. IB API

To be honest, I tried all of them, but I don't like all of them.

When I wrote my first financial applications in Python, I used the Yahoo API, but then Yahoo stopped this service and I looked for an alternative.
Alpha Vantage is supposed to be one of the easiest, but I couldn't get it to run without errors. The Quandl system, website and data feeds seemed too complicated to me (but maybe one of them is perfect for you).
Quantopian is of very high quality and has a helpful community, but you have to use their own environment. I coded many backtests there, but I prefer to have my own data and my own code, so that I know every mistake is mine and not one of another system.

Why I finally ended up with Yahoo and why I am also working with the most complicated API (Interactive Brokers API), you will learn in the next sections.

What about Yahoo Finance?

Yes, you got it right. Yahoo stopped its API service, but I am using Yahoo data again.

You can enter a ticker symbol on the Yahoo Finance page, choose the tab “Historical Data”, select the time period and frequency, and then download a csv-file (like the ones we used for our last examples) by clicking the little link called “Download Data”.
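If you prefer the manual download, pandas can read such a file directly. This is a minimal sketch that assumes the file starts with a Date column followed by the price columns (the header below is an assumption about the export format, and the numbers are made up). To keep it runnable, an in-memory string stands in for a hypothetical file XOM.csv:

```python
import io
import pandas as pd

# Stand-in for the downloaded file; with a real file you would pass its path,
# e.g. pd.read_csv("XOM.csv", ...) (hypothetical file name).
# Assumed header layout, made-up values.
csv_text = """Date,Open,High,Low,Close,Adj Close,Volume
2019-11-04,70.00,71.00,69.50,70.80,68.00,12000000
2019-11-05,70.80,71.20,70.10,70.40,67.60,11000000
"""

data = pd.read_csv(
    io.StringIO(csv_text),
    index_col="Date",   # use the date column as row index
    parse_dates=True,   # turn the index into real datetime values
)

print(data.tail())
print(type(data))
```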

And whenever data is displayed online or can be downloaded, there are ways to automate these steps and “collect” the data in a variable.

This isn't legal in every context, but because there are modules created for this purpose and Yahoo didn't block anything like that, we can use it for some stocks. Because it isn't an official API, I wouldn't recommend loading huge amounts of data, but you can use it for your own individual charts, some calculations or our examples.

But now enough words about the theory.

Get your candles from Yahoo

As I wrote above, there's already a module (library) written for this purpose. It was renamed from fix-yahoo-finance (the old name came from fixing the Yahoo download of the older library pandas_datareader) to yfinance.

Like any other module, you can install it (and pandas as well) simply with

pip3 install yfinance
pip3 install pandas

In our example we want to load the Yahoo data for XOM (ExxonMobil) into a pandas DataFrame:

import yfinance as yf

ticker  = "XOM"
data    = yf.download(ticker, start="2018-11-08", end="2019-11-08")

To show the last 5 rows of our data (pass a number to tail() if you want more) and to check that the returned variable data really is a DataFrame, we can add these lines:

print("Last 5 rows:")
print(data.tail())
print("And the data type:", type(data))

Because we always need to handle dates in our code, I will now change the example to a version without the (distracting) progress bar and with some date functions. The result has the same structure, but instead of fixed dates it always covers the last 365 days:

import yfinance as yf
from datetime import datetime, timedelta

ticker      = "XOM"

date_format = "%Y-%m-%d"
today       = datetime.now().strftime(date_format)
from_date   = (datetime.now() - timedelta(days=365)).strftime(date_format)

data        = yf.download(ticker, start=from_date, end=today, progress=False)

print("Last 5 rows:")
print(data.tail())
print("And the data type:", type(data))

More stock data…

If you need more than one stock for your calculations, you can simply pass several tickers separated by spaces, like "XOM PG EDIT". In this case you have to work with 2 levels of columns: first the price field (Close, Volume and so on), then the ticker:

import yfinance as yf
from datetime import datetime, timedelta

ticker      = "XOM PG EDIT"
data        = yf.download(ticker, start="2018-11-08", end="2019-11-08", progress=False)

print("Last 5 rows:")
print(data.tail())
print("And the data type:", type(data))

print(data.Close.EDIT.tail())
print(data.Volume.XOM.tail())
print(data.columns)

Personally, I do not like this style of DataFrame. I prefer to call yf.download() once per ticker. It's a little bit slower, but in my opinion much easier to handle.
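If you already have the two-level DataFrame, you can also split it into one simple DataFrame per ticker with pandas' xs instead of downloading again. This sketch uses a small hand-built frame with made-up numbers and assumes the column order described above (price field first, ticker second):

```python
import pandas as pd

# Two-level columns like the multi-ticker download: (field, ticker)
columns = pd.MultiIndex.from_product([["Close", "Volume"], ["XOM", "PG"]])
data = pd.DataFrame(
    [[70.4, 123.1, 11_000_000, 6_000_000],
     [70.9, 122.5, 12_500_000, 5_800_000]],
    index=pd.to_datetime(["2019-11-06", "2019-11-07"]),
    columns=columns,
)

# Select each ticker on the second column level (level=1);
# the result is an ordinary one-level DataFrame per ticker.
tickers = data.columns.get_level_values(1).unique()
per_ticker = {t: data.xs(t, axis=1, level=1) for t in tickers}

print(per_ticker["XOM"])
```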

It's simple and easy, isn't it? That's why I like Yahoo as a source for historical stock data when I only need a few ticker symbols.

With a DataFrame full of price data, you can do all the calculations you want, including those you find here on Market and Us. Many more will follow!
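As a small taste, here is a sketch of two common calculations on a column of closing prices: the daily return and a simple moving average. The prices are made up (in practice you would use the Close column of your DataFrame), and the 3-day window is only chosen to keep the example short:

```python
import pandas as pd

# Made-up closing prices for five business days
close = pd.Series(
    [70.0, 71.0, 69.0, 72.0, 73.0],
    index=pd.date_range("2019-11-04", periods=5, freq="B"),
)

daily_return = close.pct_change()        # today / yesterday - 1; first value is NaN
sma_3 = close.rolling(window=3).mean()   # 3-day simple moving average; first 2 values NaN

print(pd.DataFrame({"Close": close, "Return": daily_return, "SMA3": sma_3}))
```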

No limits? IB API!

Why do we need more? Isn’t it enough?

No. As you already read, Yahoo isn't suitable for larger amounts of stock data. And what about the trading itself? Wouldn't it be marvelous to scan for good setups and to place the calculated orders directly?

Of course! And that is what’s possible with the API of Interactive Brokers (IB).
To be fair, I must mention that IEX Cloud and Quantopian can also both be used for live trading. But you know: I do not like them. And I already use IB as my broker (cheap, everything in one place, and an ugly, antiquated trading software 😉; the app is getting better and better).

I will just focus on handling historical data because that's the topic of this article. But because installing the official API and handling its objects isn't easy, more articles about IB and its API will follow.

Tricks for the IB API

Nevertheless, a few important things:

  • Activate the Read-Only-Mode in the API settings of TWS/Gateway. Then it isn't possible to place orders or do anything else that would change your portfolio.
  • TWS or the Gateway must be running to access the API.
  • The official API (unlike the easier but less powerful module ibPy) is object-oriented and asynchronous. You cannot write the code from top to bottom: you send a request, and the answers arrive later in callback functions, while a separate thread receives the messages in the background.
  • At the beginning it's much easier to start from a code frame and add features step by step than to build everything on your own.
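To get a feeling for this request-and-callback style without TWS, here is a toy model in plain Python. It is not IB code: the class ToyApp and its simplified methods are hypothetical, and only the callback names mimic the ones used later in this article. A background thread plays the server and puts answers into a queue; run() takes them out and calls the matching callback:

```python
import queue
import threading


class ToyApp:
    """Hypothetical toy model of an event-driven API client (not IB code)."""

    def __init__(self):
        self.msg_queue = queue.Queue()
        self.bars = []
        self.done = False

    # Callbacks: called by the message loop, never directly by our main code
    def historicalData(self, reqId, bar):
        self.bars.append(bar)

    def historicalDataEnd(self, reqId):
        self.done = True

    def reqHistoricalData(self, reqId, n_bars):
        # Returns immediately; a background thread "answers" later
        def answer():
            for i in range(n_bars):
                self.msg_queue.put(("bar", reqId, i))
            self.msg_queue.put(("end", reqId, None))
        threading.Thread(target=answer, daemon=True).start()

    def run(self):
        # Message loop: take messages from the queue and dispatch them
        while not self.done:
            kind, reqId, payload = self.msg_queue.get()
            if kind == "bar":
                self.historicalData(reqId, payload)
            else:
                self.historicalDataEnd(reqId)


app = ToyApp()
app.reqHistoricalData(4000, 3)  # no data yet when this line returns
app.run()                       # blocks until historicalDataEnd sets done
print(app.bars)
```

The order of the code lines no longer tells you when your own code runs: the two callbacks are executed from inside run(), whenever a message arrives.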

The module importing section

For the IB API we need more modules:

from ibapi.wrapper import EWrapper
from ibapi.client import EClient
from ibapi.contract import Contract
from ibapi.common import BarData, TickerId

import pandas as pd
from datetime import datetime

EWrapper and EClient are the main classes of the API (you will learn more about classes later in a separate article): EClient sends your requests to TWS/Gateway, and EWrapper contains the callback functions that receive the answers. Contract is a data type (also a class) for contracts like stocks, currencies, futures and options. BarData and TickerId are types that we can use in the function signatures of our code.
To make it easier for later projects you can import all parts of a module with from ibapi.common import *.
The last two modules are well-known from above.

The main class for API projects

When you do not need the two classes separately, you can combine EWrapper and EClient into one class. Here is the basic class with its __init__ function:

class ibApp(EWrapper, EClient):
    def __init__(self, ipaddress, portid, clientid):
        EWrapper.__init__(self)
        EClient.__init__(self, wrapper=self)

        self.connect(ipaddress, portid, clientid)

        self.data = pd.DataFrame(columns=['Open', 'High', 'Low', 'Close', 'Volume'])

Whenever you create an object of the class ibApp (you can use another name if you want), its __init__ function is called. There, both parent classes are initialized (ibApp inherits from both of them, which is called inheritance), and the method connect is called. We don't have to write connect ourselves because ibApp inherits it from EClient, but we have to pass three parameters: the IP address, the port and the client ID. Finally, I already added the initialization of the DataFrame we want to fill.

Because ibApp inherits from these classes, you can override their functions and add your own code, but you do not have to.
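The same pattern in a tiny, self-contained form could look like this. Wrapper, Client and App are hypothetical stand-ins for EWrapper, EClient and ibApp (the real classes do much more); the sketch only shows how the two parent classes are initialized, how connect is inherited, and how an overridden callback can still call the parent version with super():

```python
class Wrapper:
    """Hypothetical stand-in for EWrapper: a callback with default behaviour."""

    def historicalDataEnd(self, reqId, start, end):
        print("Wrapper: request", reqId, "finished")


class Client:
    """Hypothetical stand-in for EClient: knows its wrapper and can connect."""

    def __init__(self, wrapper):
        self.wrapper = wrapper
        self.address = None

    def connect(self, host, port, clientId):
        # The real EClient opens a connection here; we only store the address
        self.address = (host, port, clientId)


class App(Wrapper, Client):
    def __init__(self, host, port, clientId):
        Wrapper.__init__(self)               # Wrapper has no own __init__, so object.__init__ runs
        Client.__init__(self, wrapper=self)  # the app is its own wrapper
        self.connect(host, port, clientId)   # inherited from Client
        self.finished = False

    def historicalDataEnd(self, reqId, start, end):
        super().historicalDataEnd(reqId, start, end)  # run the parent version first
        self.finished = True                          # then our own code


app = App("127.0.0.1", 7496, 999)
app.historicalDataEnd(4000, "", "")
```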

And the main code

Yes, I call it the main code but it’s only the outer frame of the code:

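def main():
    app = None  # defined up front so that "finally" also works if creating the app fails
    try:
        app         = ibApp("127.0.0.1", 7496, 999)

        tickerId    = 4000
        stock       = setContract("XOM")
        date_format = "%Y%m%d %H:%M:%S"
        endDateTime = datetime.now().strftime(date_format)
        duration    = "1 Y"
        candleSize  = "1 day"

        # After the bar size: what to show ("TRADES"), useRTH=1 (regular trading
        # hours only), formatDate=1 (dates as strings), keepUpToDate=False, chartOptions=[]
        app.reqHistoricalData(tickerId, stock, endDateTime, duration, candleSize, "TRADES", 1, 1, False, [])

        app.run()  # starts the message loop that calls our callback functions
    except Exception as e:
        print("Something went wrong, I am sorry ({}).".format(str(e)))
    finally:
        if app is not None:
            app.disconnect()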

if __name__ == '__main__':
    main()

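To shorten the explanation:
Our main class is initialized with the connection parameters (the port must be the same as entered in the TWS/Gateway API settings! By default it's 7496/7497 for live/paper trading in TWS and 4001/4002 for the Gateway). Then the contract, the end date and the duration of the requested data are set, the request is sent, app.run() starts the message loop, and we catch some errors.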

For a better overview, I moved setting the contract into its own function (define it before main() is called at the bottom of the script):

def setContract(CSymbol, CSecType="STK", CExchange="ISLAND", CCurrency="USD"):
    stock = Contract()
    stock.symbol = CSymbol
    stock.secType = CSecType
    stock.exchange = CExchange
    stock.currency = CCurrency

    return stock

An interesting point is the exchange of the contract. Instead of selecting the right exchange (like NYSE or NASDAQ) you simply set "ISLAND" for US stocks. For German stocks it’s "IBIS" (the former name for XETRA) most of the time.

But what about IB’s historical stock data?

The code can already be started, but you won't see any candles yet. Okay, you'll see some status lines about the market data connections. But what about the results of the request we sent in main()?

Two functions are called as a result of our request. ibApp inherits them from EWrapper, and we override them in the ibApp class with our own code:

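    # Called once for every candle (bar) received
    def historicalData(self, reqId: int, bar: BarData):
        row = [bar.open, bar.high, bar.low, bar.close, bar.volume]
        # Daily bars arrive with dates like "20191108"; store them as "2019-11-08"
        self.data.loc[datetime.strptime(bar.date, "%Y%m%d").strftime("%Y-%m-%d")] = row

    # Called once after the last candle of the request
    def historicalDataEnd(self, reqId: int, start: str, end: str):
        super().historicalDataEnd(reqId, start, end)

        print(self.data.tail())
        print(type(self.data))

        self.done = True  # signal that we are finished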

Every time the data of one candle arrives, the function historicalData is called, and the candle's values are in the variable bar.
So we put this candle data into a row and add it to our DataFrame, with the reformatted date as index.

When all candles have been received, historicalDataEnd is called. At this point we don't need anything else, so we print the last 5 candles and the type of the variable (as above), and then tell the class that we're done so that the program can finish.
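Appending to a DataFrame row by row with .loc works, but every enlargement can create a new copy of the data, which can get slow for long histories. A common pandas alternative is to collect the rows in a list and build the DataFrame once at the end, which also makes it easy to use real dates as index. This is a sketch under assumptions: Bar is a hypothetical stand-in for BarData with only the fields used here, and Collector stands for the relevant part of ibApp:

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class Bar:
    """Hypothetical stand-in for ibapi's BarData (only the fields used here)."""
    date: str       # daily bars: "YYYYMMDD"
    open: float
    high: float
    low: float
    close: float
    volume: float


class Collector:
    def __init__(self):
        self.rows = []   # a plain list is cheap to append to
        self.data = None

    def historicalData(self, reqId, bar):
        # One call per candle: just remember the values
        self.rows.append((bar.date, bar.open, bar.high, bar.low, bar.close, bar.volume))

    def historicalDataEnd(self, reqId, start, end):
        # All candles received: build the DataFrame once, with a DatetimeIndex
        df = pd.DataFrame(self.rows, columns=["Date", "Open", "High", "Low", "Close", "Volume"])
        df["Date"] = pd.to_datetime(df["Date"], format="%Y%m%d")
        self.data = df.set_index("Date")


c = Collector()
c.historicalData(4000, Bar("20191107", 70.5, 71.0, 70.1, 70.9, 12_000_000))
c.historicalData(4000, Bar("20191108", 70.9, 71.4, 70.6, 71.2, 10_500_000))
c.historicalDataEnd(4000, "", "")
print(c.data)
```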

Complicated? Yes!

But now you already know what's possible with the IB API. The complex object-oriented and thread-based API system is hard to understand, and for some challenges you even have to change the source code of the module yourself.
The complexity comes from the almost direct access to the interactions with the IB servers. Therefore, the range of functions is enormous. Everything is possible!

Of course, I will write more detailed articles about the parts of coding with the IB API. And more features will follow: requesting fundamental data, details about your currently running trades, and placing orders.

Happy coding with the two methods of this article!

Alexander

Alexander bought his first stock in October 2009 without knowing how lucky that timing was. In 2016 he started to trade, since 2017 he has noted down watchlists and statistics every day, and because he has known how to code since he was a child, he uses Python, PHP, HTML5 and JS to make the daily to-dos easier. Because many of his friends didn't want him to stop writing about the markets, he started this blog to share his ideas and tools.

9 Comments

  • Dear Alexander, could you explain why the Readonly mode should be active? We could actually deactivate it, and test our code in the demo TWS, so nothing can happen with our real money ….

  • Is it actually possible to get the volume (number of shares traded per day) of X months before a certain date?

  • I will answer to both questions here:

    1. Why Read-Only-Mode?
    I recommend activating the Read-Only-Mode because this is the safest solution. Maybe you code some scripts for the paper account and some for the live account. With the activation you do not have to think about anything. Just code your stuff. Of course, when you want to place or change orders, you have to deactivate it. But you can try it first with Read-Only-Mode because you will get notifications and you can check your code results in the API log file.

    2. Sum of volume for time period?
    Of course. Like in our example, you can get the volume for each day. You can sum it up easily with data.Volume.sum(), for example. And for your historical data you can choose whatever time period you want by setting the last date and the duration of the request.

  • What do you mean exactly by “TWS or the Gateway must be open for getting access to the API”? How/where to open it?

    • You cannot use the IB API without having started TWS or Gateway. When you use an automated trading system, you will run it on a server where you have no graphical interface. That's not easy, but it's possible. I can write about it in another article. But on your local computer you cannot run Python code using the IB API without running TWS or Gateway.

  • It seems I simply can’t get my IB API running, even though I followed all steps described on the IB webpage.

    I get following error:
    from ibapi.wrapper import EWrapper
    ModuleNotFoundError: No module named ‘ibapi’

    It seems the IB API does not get installed on my python environment

    • I am sorry but when you get this message IB API is not installed on your system. You are right.
      Maybe you can wait for my article about the installation because my first trials some years ago were also not successful. Or you can write about the steps you did with a reply.

  • Dear Alexander,
    thanks to this article and the clarity with which you explain the coding process, even as a Python newbie, I'm now able to extract candlestick values and I'm seeing them right now in my Terminal. This is crazy exciting and I can't wait for the next articles!!!!!
