

April 2023 · 5 min read

Don't be scared of the math


Investors who discuss quantitative strategies keep returning to the same handful of terms, most of them mathematical in nature. In this post, we try to simplify these concepts and give some formulae and code so that readers can try them out for themselves if they wish. We will cover:

  • returns: daily, cumulative and rolling returns
  • risk: volatility, measured on the same basis as returns
  • risk-adjusted performance, i.e. the Sharpe ratio
  • drawdowns
  • correlation
  • linear regression and residuals

Returns and risk

Let's discuss the two most important concepts in quantitative investing: returns and risk. A daily return is today's asset price minus yesterday's price, divided by yesterday's price. Over the course of a year, we obtain a series of daily returns, one for each trading day.

  • The return of a strategy over a year is its compounded daily returns, assuming 252 trading days. This tells you the total annual performance of your strategy.
  • Risk is also referred to as volatility and is usually measured by the standard deviation of the same return series. The standard deviation measures how widely your daily returns are dispersed around the average return.

$$R_{\text{annual}} = \prod_{t=1}^{252} (1 + r_t) - 1, \qquad \sigma = \sqrt{E\left[(r - E[r])^2\right]}$$

where $r_t$ is the return on day $t$ and $E[r]$ is the average return.
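
As a minimal sketch of these definitions, the snippet below computes daily returns, the compounded return and the volatility with pandas. The price series is made up for illustration, and scaling the daily volatility by the square root of 252 is the usual annualisation convention rather than something specific to this post.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices for six consecutive trading days (illustrative only)
prices = pd.Series([100.0, 101.0, 99.0, 102.0, 103.0, 101.5])

# Daily return: (today's price - yesterday's price) / yesterday's price
daily_returns = prices.pct_change().dropna()

# Compounded return over the whole sample: product of (1 + r) minus 1
total_return = (1 + daily_returns).prod() - 1

# Daily volatility is the standard deviation of the daily returns;
# multiplying by sqrt(252) annualises it, assuming 252 trading days
daily_volatility = daily_returns.std()
annualised_volatility = daily_volatility * np.sqrt(252)

print(f"Compounded return: {total_return:.2%}")
print(f"Annualised volatility: {annualised_volatility:.2%}")
```

Compounding the daily returns recovers the simple price change from the first day to the last, which is a handy sanity check on any return series.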

Let's assume we have a quantitative investment strategy and we want to plot the equity graph of our portfolio assuming we invested US$10k in that strategy.

# Import Python libraries
import matplotlib.pyplot as plt
import pandas as pd

# Assume we have a series of dates "my_dates" and a series of daily returns "my_returns"
# Plot the equity graph of a US$10k investment in the strategy
fig, ax = plt.subplots(figsize=(18, 12))
ax.plot(my_dates, 10000 * (1 + my_returns).cumprod())
ax.set_xlabel("Date")
ax.set_ylabel("Portfolio value (US$)")
plt.show()

Sharpe ratio and rolling Sharpe ratio

The Sharpe ratio measures the risk-adjusted return of an investment. "Risk-adjusted" means that your returns are scaled by the risk (volatility) taken to achieve them. A higher risk-adjusted return should be the primary objective for all rational investors. The formula and code for the Sharpe ratio calculation are given below, where E(rp - rf) is the average excess return of the portfolio above the risk-free rate and sigma is the volatility of the portfolio.

$$\text{Sharpe ratio} = \frac{E(r_p - r_f)}{\sigma}$$

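# Import Python libraries
import numpy as np
import pandas as pd

# Assume "rets" is a pandas Series of daily returns (the risk-free rate is taken as zero)
# Calculate the rolling average daily return and volatility, then the annualised rolling Sharpe ratio
average_return = rets.rolling(window=252).mean()
volatility     = rets.rolling(window=252).std()
rolling_sharpe = np.sqrt(252) * average_return / volatility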
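
The rolling calculation above works on raw returns, which amounts to assuming a risk-free rate of zero. The sketch below shows one way to compute a single Sharpe ratio for the whole period with an explicit risk-free rate. The function name `sharpe_ratio`, its parameters and the synthetic return series are illustrative assumptions, and dividing the annual rate by 252 is a simple approximation of the daily rate.

```python
import numpy as np
import pandas as pd


def sharpe_ratio(returns, annual_risk_free_rate=0.0, periods_per_year=252):
    """Annualised Sharpe ratio of a series of periodic returns."""
    # Approximate the per-period risk-free rate by simple division
    rf_per_period = annual_risk_free_rate / periods_per_year
    excess = returns - rf_per_period
    # Average excess return divided by its volatility, then annualised
    return np.sqrt(periods_per_year) * excess.mean() / excess.std()


# Synthetic daily returns covering roughly two years (illustration only)
rng = np.random.default_rng(seed=0)
rets = pd.Series(rng.normal(loc=0.0005, scale=0.01, size=504))

full_period_sharpe = sharpe_ratio(rets, annual_risk_free_rate=0.02)
print(f"Sharpe ratio: {full_period_sharpe:.2f}")
```

With a positive risk-free rate the result is lower than the zero-rate figure, because the same volatility is paid for a smaller excess return.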

We dedicate a separate post to drawdowns, with graphs and a code snippet for calculating them, so we only touch on the topic briefly here by giving the formula for the maximum drawdown. The maximum drawdown, commonly referred to as MDD, is given by the following formula, where pv is the most recent peak value and lv is the lowest value reached after that peak. The drawdown is always expressed in percentage terms, hence the multiplication by 100, and is a negative number.

$$\text{MDD} = \frac{lv - pv}{pv} \times 100$$
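
A minimal sketch of this calculation, assuming a hypothetical series of portfolio values called `equity`: the running peak plays the role of pv, and the most negative percentage gap to that peak is the maximum drawdown.

```python
import pandas as pd

# Hypothetical portfolio values (illustrative only)
equity = pd.Series([100.0, 120.0, 90.0, 110.0, 130.0, 117.0])

# Highest value reached so far at each point in time (the running peak, pv)
running_peak = equity.cummax()

# Drawdown at each point, in percent: 100 * (value - peak) / peak
drawdown_pct = 100 * (equity - running_peak) / running_peak

# The maximum drawdown is the most negative drawdown in the series
max_drawdown_pct = drawdown_pct.min()
print(f"Maximum drawdown: {max_drawdown_pct:.1f}%")  # prints "Maximum drawdown: -25.0%"
```

Here the worst fall is from the peak of 120 to the trough of 90, a drop of 25%.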


One of the most important concepts in investing is correlation. Diversification and higher risk-adjusted returns can be aided by careful correlation analysis and portfolio construction.
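
As a rough illustration, the sketch below builds three synthetic return series and computes their pairwise correlations with pandas. The asset names and the way the data are generated are assumptions made purely for the example: two of the series share a common driver and so should be highly correlated, while the third is independent.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)

# A common driver shared by the first two hypothetical assets
common = rng.normal(0, 0.01, 252)

returns = pd.DataFrame({
    "asset_a": common + rng.normal(0, 0.005, 252),
    "asset_b": common + rng.normal(0, 0.005, 252),
    "asset_c": rng.normal(0, 0.01, 252),  # independent of the other two
})

# Pairwise Pearson correlation of the daily returns
correlation_matrix = returns.corr()
print(correlation_matrix.round(2))
```

Each entry lies between -1 and 1. Assets whose correlation with the rest of the portfolio is low are the ones that contribute most to diversification.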

Linear regression

One of the most tractable models is linear regression, in which an unknown variable is predicted from a known variable scaled by some coefficient. In a standard linear regression, y is the unknown variable, x is the known variable and beta is the coefficient. The alpha term is the intercept, which is the value of y when x is zero.

$$y = \alpha + \beta x$$

The coefficient beta is essentially the slope of a line, which most people learned about in school. It is estimated by a simple algorithm called least squares; however, you can just let Python do this for you in one line of code.

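import statsmodels.api as sm

# Now fit unknown y using known x (both assumed to be pandas Series)
# add_constant adds the intercept column, so params holds alpha first, then beta
alpha, beta = sm.OLS(y, sm.add_constant(x), missing="drop").fit().params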

The natural extension of linear regression is to make it multi-dimensional, which means we take more than one independent variable x as input into our model, so there is more than one coefficient to estimate. If we assume we have p known variables of type x, then our multi-dimensional linear regression is solved using the same Python code and is based on the following equation.

$$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$$

To see how this relates to finance, a toy linear model for the S&P 500 could be built. The researcher could hypothesise that the future S&P annual return depends on five factors, such as:

  • crypto-currency returns
  • US GDP change
  • US treasury bond returns
  • emerging market equity returns
  • US inflation yearly change

This linear model would generate coefficients beta 1 to beta 5, which can then be used to make predictions for future S&P returns.
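
A five-factor fit of this kind can be sketched as follows. No real market data are used: the factor values, the "true" coefficients and the noise level are all synthetic assumptions chosen so that the estimates can be checked. The sketch uses NumPy's least-squares solver to keep it short, though statsmodels' OLS would work just as well.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
n_obs, n_factors = 200, 5

# Synthetic factor data: one column per known variable x_1 ... x_5
X = rng.normal(size=(n_obs, n_factors))

# Hypothetical "true" coefficients, used only to generate the synthetic target
true_alpha = 0.5
true_betas = np.array([0.1, -0.2, 0.3, 0.0, 0.4])
y = true_alpha + X @ true_betas + rng.normal(scale=0.1, size=n_obs)

# Prepend a column of ones so the first coefficient is the intercept alpha
X_with_const = np.column_stack([np.ones(n_obs), X])

# Least-squares estimate of [alpha, beta_1, ..., beta_5]
coefficients, *_ = np.linalg.lstsq(X_with_const, y, rcond=None)
alpha_hat, betas_hat = coefficients[0], coefficients[1:]

# Prediction for a new, hypothetical observation of the five factors
x_new = np.array([0.2, -0.1, 0.0, 0.3, 0.1])
y_pred = alpha_hat + x_new @ betas_hat
print("Estimated betas:", betas_hat.round(3))
print("Prediction:", round(float(y_pred), 3))
```

With real factors the fit would be far noisier than in this clean synthetic setting, so the estimated coefficients should be treated with corresponding caution.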

What are residuals?

After a linear model has been fitted to your regression problem, it can report the model residuals. Residuals are the model errors. If we assume we have n points in our dataset, then the fitted model gives you n residuals; where the model predicts a point exactly, the residual for that point is simply zero. As a formula, where y is the correct answer and y hat is the estimate, we have the following:

$$e_i = y_i - \hat{y}_i, \qquad i = 1, \dots, n$$
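
As a small worked example, the snippet below fits a straight line to five made-up points and computes one residual per point. The data are illustrative, and `np.polyfit` is used here simply as a convenient least-squares fitter.

```python
import numpy as np

# Five hypothetical observations (illustrative only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Fit y = alpha + beta * x by least squares (polyfit returns the slope first)
beta, alpha = np.polyfit(x, y, deg=1)

# Model estimates (y hat) and residuals (y minus y hat), one per data point
y_hat = alpha + beta * x
residuals = y - y_hat
print("Residuals:", residuals.round(3))
```

When the model includes an intercept, the least-squares residuals sum to zero (up to floating-point error), which makes a quick check that the fit went through as expected.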

