Skip to main content
Effective Frontier.

Make a Profitable Portfolio using Python

In this tutorial, you will learn how to find a combination of stocks with high expected return and low risk using Python.

Introduction to Modern Portfolio Theory

I will not go in-depth about the details of Modern Portfolio Theory, but I will just mention the most important bits of it. Modern Portfolio Theory (MPT or mean-variance analysis) was invented by Harry Markowitz back in 1952 and he was rewarded by a Nobel Prize in Economics for his work. MPT is a mathematical framework for constructing a portfolio of assets such that the expected return is maximized for a given level of risk. For example, suppose you accept a risk level of 10%. How can the expected return be maximized?

Portfolio Theory by Figures

Stocks.

Suppose we have 3 stocks and we want to figure out which stocks are the best to choose. Stock A and stock C have an expected return of 4% (0.04) per month, stock B has an expected return of 2% (0.02) per month. The volatility (the standard deviation, a measure of how much a stock varies) of A and B is 0.05 per month and the volatility of C is 0.07 per month. What stock should we choose? First compare stock A and stock B. For the same risk level (0.05), we get an higher expected return for stock A than for stock B. Therefore, stock A performs (based on past observations) better than stock B. Now compare stock A to stock C. Which stock is better? Now stock A is definitely better because you get the same expected return for a lower level of risk! If C would have an higher expected return, then there is a trade-off. We will explore this trade-off in the following figure:

Effective Frontier.

You can see here the effective frontier (the red line). Stocks near this boundary are optimal as by the arguments in the previous section. It would be inefficient to go for stocks far from the red line. Now here is the trade-off: the farther you go to the left in the figure, the less the volatility (risk) is and the lower the expected return. The more you go to the right, the higher the volatility and higher the expected return. This is an question for yourself. Do you prefer higher risk (and higher returns) or lower risk (and lower returns)? In the remainder of this tutorial, we will gather financial data and make a similar plot.

Gather Financial Data using Python

In this article, I will use Python 3. First, we will install pandas-datareader in order to fetch some stocks from the web. We also need some other libraries (like Pandas, Numpy for computations and Matplotlib for the visualization). This can be done by using the following one-liner:

pip install pandas-datareader==0.4.0 pandas==0.19.2 numpy==1.12.1 matplotlib==2.0.0

Using the pandas-datareader package, we can simply fetch stock data and store the fetched data as CSV files inside a folder:

import pandas_datareader as web
import datetime
import os

# Bootstrap the folder in which we will store the stocks
stock_path = 'stocks'
if not os.path.exists(stock_path):
    os.makedirs(stock_path)

# Fetch some stocks
symbols = ['PHG', 'ABY', 'GOLD', 'ADAP', 'ASML', 'MTLS']
for symbol in symbols:
    print('Fetching %s...' % symbol)
    start = datetime.datetime(2000, 1, 1)
    df = web.DataReader(symbol, 'google', start)
    df.to_csv(os.path.join(stock_path, '%s.csv' % symbol))

After running the code, a folder called “stocks” is created and should contain some CSV files.

Combine the Data

Take a look at one of the data files, for example on GOLD.csv:

Date,Open,High,Low,Close,Volume
2002-07-11,3.29,3.32,3.1,3.26,1022400
2002-07-12,3.25,3.26,3.16,3.18,127200
2002-07-15,3.21,3.25,3.13,3.14,225600
2002-07-16,3.13,3.13,3.11,3.13,16800
2002-07-17,3.14,3.18,3.11,3.12,81600
2002-07-18,3.1,3.11,3.05,3.1,87200
2002-07-19,3.25,3.25,3.11,3.11,352000
2002-07-22,3.1,3.1,3.02,3.1,55200
...

The columns we are interested in, are the Date column and the Open column. We will compute the expected return and standard deviation (based on months) for each of the stocks and generate a plot out of the data:

import matplotlib.pyplot as plt
import pandas as pd
import os

stock_path = 'stocks'
df = pd.DataFrame()
for file in os.listdir(stock_path):
    # Derive the symbol from the filename
    symbol = file.split('.')[0]

    # Load the data
    path = os.path.join(stock_path, file)
    df_stock = pd.read_csv(path)

    # Set the Date as index column
    df_stock['Date'] = pd.to_datetime(df_stock['Date'])
    df_stock = df_stock.set_index('Date')

    # Resample based on months and compute the changes
    resampled = df_stock.resample('BM')
    monthly = resampled.apply(lambda x: x[-1])
    df[symbol] = monthly['Open'].pct_change()

# Drop NaNs
df = df.dropna()

# Make the plot
x = df.std().tolist()
y = df.mean().tolist()
symbols = df.columns

# Scatterplot and annotation
plt.scatter(x, y)
for index, symbol in enumerate(symbols):
    plt.annotate(symbol, (x[index], y[index]))

# Title and axis
plt.xlabel('Risk')
plt.ylabel('Expected Return')
plt.title('Expected Return versus Risk')

# Save the plot
plt.savefig('expected_return_vs_risk.png')

This creates the following figure:

Expected return versus risk.

From this plot, it is clear that (based on past observations at this point in time) AMSL, PHG and MTLS are good candidates. Of course, you should do this with many more stocks in order to choose even better candidates.

Conclusion

Using only a fraction of Modern Portfolio Theory, it is easy to spot good candidates for your portfolio. Beware that the predictions are made based on past observations and this is not a guarantee for the future. However, this would make an educated guess at least. If you have any questions or comments, feel free to post them below!

Kevin Jacobs

Kevin Jacobs

Kevin Jacobs is a certified Data Scientist and blog writer for Data Blogger. He is passionate about any project that involves large amounts of data and statistical data analysis. Kevin can be reached using Twitter (@kmjjacobs), LinkedIn or via e-mail: mail@kevinjacobs.nl.