
What Is the Fastest Way to Prepare Data for an RNN with NumPy?

I currently have a (1631160,78) np array as my input to a neural network. I would like to try something with LSTM which requires a 3D structure as input data. I'm currently using t

Solution 1:

Here's an approach using NumPy strides to vectorize the creation of output_x -

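import numpy as np

# Number of windows that fit when sliding one row at a time
nrows = input_x.shape[0] - window_size + 1
p, q = input_x.shape      # rows, features
m, n = input_x.strides    # byte steps along rows and columns
strided = np.lib.stride_tricks.as_strided
# Window i starts at row i, so the first two axes both step by one row (m bytes)
out = strided(input_x, shape=(nrows, window_size, q), strides=(m, m, n))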

Sample run -

In [83]: input_x
Out[83]: 
array([[ 0.73089384,  0.98555845,  0.59818726],
       [ 0.08763718,  0.30853945,  0.77390923],
       [ 0.88835985,  0.90506367,  0.06204614],
       [ 0.21791334,  0.77523643,  0.47313278],
       [ 0.93324799,  0.61507976,  0.40587073],
       [ 0.49462016,  0.00400835,  0.66401908]])

In [84]: window_size = 4

In [85]: out
Out[85]: 
array([[[ 0.73089384,  0.98555845,  0.59818726],
        [ 0.08763718,  0.30853945,  0.77390923],
        [ 0.88835985,  0.90506367,  0.06204614],
        [ 0.21791334,  0.77523643,  0.47313278]],

       [[ 0.08763718,  0.30853945,  0.77390923],
        [ 0.88835985,  0.90506367,  0.06204614],
        [ 0.21791334,  0.77523643,  0.47313278],
        [ 0.93324799,  0.61507976,  0.40587073]],

       [[ 0.88835985,  0.90506367,  0.06204614],
        [ 0.21791334,  0.77523643,  0.47313278],
        [ 0.93324799,  0.61507976,  0.40587073],
        [ 0.49462016,  0.00400835,  0.66401908]]])
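
As a sanity check, the strided result can be compared against a plain Python loop that builds the same windows by slicing. This is a minimal sketch rather than part of the original answer: the helper names windows_strided and windows_loop are hypothetical, and the input is assumed to be a small 2D float array of random values.

```python
import numpy as np

def windows_strided(input_x, window_size):
    """Overlapping windows as a view, no copy (hypothetical helper name)."""
    nrows = input_x.shape[0] - window_size + 1
    q = input_x.shape[1]
    m, n = input_x.strides
    return np.lib.stride_tricks.as_strided(
        input_x, shape=(nrows, window_size, q), strides=(m, m, n)
    )

def windows_loop(input_x, window_size):
    """Reference version: slice each window and stack the copies."""
    nrows = input_x.shape[0] - window_size + 1
    return np.stack([input_x[i:i + window_size] for i in range(nrows)])

rng = np.random.default_rng(0)
input_x = rng.random((6, 3))

out_strided = windows_strided(input_x, 4)
out_loop = windows_loop(input_x, 4)

print(out_strided.shape)                      # (3, 4, 3)
print(np.array_equal(out_strided, out_loop))  # True
```

The loop version copies every window, so its memory use grows with window_size, while the strided version only reinterprets the existing buffer.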

The strided array out is a view into the input array, so no data is copied and the result is memory-efficient. In most cases this also benefits the performance of later operations on it. Let's verify that it's indeed a view -

In [86]: np.may_share_memory(out,input_x)
Out[86]: True  # Not a guarantee, but sufficient in most cases

A more definitive check is to write some values into the output and then inspect the input -

In [87]: out[0] = 0

In [88]: input_x
Out[88]: 
array([[ 0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ],
       [ 0.93324799,  0.61507976,  0.40587073],
       [ 0.49462016,  0.00400835,  0.66401908]])
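
Because the windows overlap in memory, a single write through the view changes several windows and the input at once, as the four zeroed rows above show. If that is not wanted, one option is a read-only view. The sketch below is an alternative to the original answer's method, assuming NumPy 1.20 or later, where numpy.lib.stride_tricks.sliding_window_view is available; it returns a read-only view by default and appends the window axis last, so the last two axes are swapped to get the (windows, window_size, features) layout. The sample input is made up for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

input_x = np.arange(18, dtype=float).reshape(6, 3)
window_size = 4

# The window axis is appended last, giving (nrows, q, window_size);
# swap the last two axes to get (nrows, window_size, q).
out = sliding_window_view(input_x, window_size, axis=0).transpose(0, 2, 1)

print(out.shape)                       # (3, 4, 3)
print(np.shares_memory(out, input_x))  # True (exact overlap check)
print(out.flags.writeable)             # False

# Writing through the read-only view is rejected, so input_x stays intact.
try:
    out[0] = 0
except ValueError as exc:
    write_error = exc
else:
    write_error = None

# Take an explicit copy when a writable, independent array is needed.
windows = out.copy()
windows[0] = 0
```

The copy costs roughly window_size times the memory of the input, so it is only worth taking when the windows really need to be modified.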
