
Efficient Way To Process Pandas Dataframe Timeseries With Numba

I have a DataFrame with 1,500,000 rows. It's one-minute-level stock market data (Open, High, Low, Close, Volume) that I bought from QuantQuote.com. I'm trying to run some home-made …

Solution 1:

Numba is a NumPy-aware just-in-time compiler. You can pass NumPy arrays as parameters to your Numba-compiled functions, but not Pandas series.

Your only option, still as of 2017-06-27, is to pass the values of each Series (its .values attribute), which for numeric columns are plain NumPy arrays.
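Here is a minimal sketch of that pattern. It assumes NumPy and pandas are installed. The try/except stand-in only exists so the sketch still runs without Numba, at pure-Python speed. The helper simple_returns and the sample prices are hypothetical. The point is that the compiled function receives df["Close"].to_numpy(), the NumPy array behind the Series, rather than the Series itself. to_numpy() is the newer spelling of .values (pandas 0.24 and later).

```python
import numpy as np
import pandas as pd

try:
    from numba import njit
except ImportError:
    # Hypothetical stand-in: run the function as plain Python if Numba is missing.
    def njit(func):
        return func


@njit
def simple_returns(close):
    # close must be a 1-D float64 NumPy array; under Numba a Series argument fails to compile.
    n = close.shape[0]
    out = np.empty(n)
    if n == 0:
        return out
    out[0] = np.nan  # no previous bar for the first row
    for i in range(1, n):
        out[i] = close[i] / close[i - 1] - 1.0
    return out


index = pd.date_range("2017-06-27 09:30", periods=3, freq="min")
df = pd.DataFrame({"Close": [100.0, 110.0, 99.0]}, index=index)

# Pass the underlying array, not the Series.
returns = simple_returns(df["Close"].to_numpy())
print(returns)  # approximately [nan, 0.1, -0.1]
```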

Also, you ask whether the values are "guaranteed to not be a copy of the data". For a DataFrame whose columns all share one dtype, they are not a copy, and you can verify that:

import pandas


df = pandas.DataFrame([0, 1, 2, 3])
df.values[2] = 8  # writes into df itself; raises ValueError where Copy-on-Write makes .values read-only
print(df)  # Should show you the value `8`
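Another way to check the no-copy question, without writing into the data, is np.shares_memory, which reports whether two arrays overlap in memory. This sketch assumes a reasonably recent NumPy and pandas, and the prices and volumes are made-up sample values. A single-dtype Series hands back its own buffer on every call. By contrast, .values on a DataFrame with mixed column dtypes has to build a new common-dtype array, so that case is a copy.

```python
import numpy as np
import pandas as pd

# Single dtype: the Series holds one float64 array, and to_numpy() returns
# that buffer (possibly as a read-only view) instead of copying it.
close = pd.Series([100.0, 101.5, 99.8], name="Close")
same_buffer = np.shares_memory(close.to_numpy(), close.to_numpy())

# Mixed dtypes (float64 + int64): .values must allocate a new float64 array,
# so each access produces a separate copy.
mixed = pd.DataFrame({"Close": [100.0, 101.5, 99.8], "Volume": [1200, 900, 1500]})
mixed_copy_shares = np.shares_memory(mixed.values, mixed.values)

print(same_buffer)        # True
print(mixed_copy_shares)  # False
```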

In my opinion, Numba is a great (if not the best) approach to processing market data if you want to stick to Python only. For the biggest performance gains, use @numba.jit(nopython=True) (equivalently @numba.njit). This restricts the JIT-compiled functions to NumPy arrays, scalars and the subset of Python that Numba supports, so regular Python dictionaries and other arbitrary Python objects cannot be used inside them, but the code runs much faster.
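The following sketch shows the kind of loop that benefits from nopython mode. It implements a stateful position rule that is awkward to vectorize: go long when the close rises above an upper level, and go flat when it falls below a lower level. The rule, the function name hysteresis_position and the sample prices are hypothetical. Inside the function only NumPy arrays, scalars and plain loops are used, which is what nopython mode compiles. The try/except stand-in lets it run without Numba installed.

```python
import numpy as np

try:
    from numba import jit
except ImportError:
    # Hypothetical stand-in mirroring jit(nopython=True) when Numba is missing.
    def jit(*args, **kwargs):
        def decorate(func):
            return func
        return decorate


@jit(nopython=True)
def hysteresis_position(close, upper, lower):
    # Returns 1 while long and 0 while flat, one entry per bar.
    n = close.shape[0]
    pos = np.zeros(n, dtype=np.int64)
    state = 0
    for i in range(n):
        if state == 0 and close[i] > upper:
            state = 1  # enter long
        elif state == 1 and close[i] < lower:
            state = 0  # exit to flat
        pos[i] = state
    return pos


close = np.array([1.0, 3.0, 2.5, 1.5, 0.5, 2.0, 3.5])
positions = hysteresis_position(close, 2.8, 1.0)
print(positions)  # [0 1 1 1 0 0 1]
```

Because each bar's position depends on the previous state, a plain Python loop is the natural formulation, and nopython mode is what keeps that loop fast over 1,500,000 rows.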

Note that some of the indicators you are working with (moving averages, for example) may already have an efficient implementation in Pandas, so consider pre-computing them with Pandas and then passing the values (the NumPy array) to your Numba backtesting function.
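Here is a sketch of that split, assuming pandas and NumPy, with Numba optional via the same stand-in. pandas' rolling().mean() computes the moving average, and only the resulting arrays reach the compiled loop. The three-bar window, the crossover rule and the name crossover_up are illustrative assumptions. The first two averages are NaN, and because comparisons with NaN are False, those bars never produce a signal.

```python
import numpy as np
import pandas as pd

try:
    from numba import njit
except ImportError:
    # Hypothetical stand-in: plain Python when Numba is unavailable.
    def njit(func):
        return func


@njit
def crossover_up(close, sma):
    # True on bars where the close moves from at/below the average to above it.
    n = close.shape[0]
    signal = np.zeros(n, dtype=np.bool_)
    for i in range(1, n):
        if close[i - 1] <= sma[i - 1] and close[i] > sma[i]:
            signal[i] = True
    return signal


df = pd.DataFrame({"Close": [1.0, 2.0, 3.0, 2.0, 1.0, 2.0, 4.0]})

# Pre-compute the indicator with pandas (NaN for the first two bars).
df["SMA3"] = df["Close"].rolling(window=3).mean()

# Hand only the NumPy arrays to the compiled function.
signals = crossover_up(df["Close"].to_numpy(), df["SMA3"].to_numpy())
print(np.flatnonzero(signals))  # [5]
```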
