Filter Numpy Array Of Tuples

The scikit-learn library has a brilliant example of data clustering: stock market structure. It works fine with US stocks. But when one adds tickers from other markets, numpy's err

Solution 1:

So quotes is a list of recarrays, and in dates_all you collect the intersection of all values in the date field.
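As a rough sketch of that setup, the intersection of the date fields can be computed with plain Python sets. Everything here is made up for illustration: the two-field dtype is a trimmed-down stand-in for the real one, make_quotes is a hypothetical helper, and the name dates_all simply follows the list comprehension used later in this answer.

```python
import datetime as dt
from functools import reduce

import numpy as np

# Hypothetical, trimmed-down dtype: only two fields instead of the full quote dtype.
quote_dtype = np.dtype([("date", "O"), ("close", "<f8")])


def make_quotes(days):
    """Build a small structured array with one record per given day of January 2020."""
    records = [(dt.date(2020, 1, d), float(d)) for d in days]
    return np.array(records, dtype=quote_dtype)


# Three made-up "tickers" that trade on slightly different days.
quotes = [
    make_quotes([1, 2, 3, 6]),
    make_quotes([2, 3, 6, 7]),
    make_quotes([2, 6, 8]),
]

# Turn each date field into a set and intersect them all.
dates_all = reduce(set.intersection, (set(q["date"]) for q in quotes))
print(sorted(dates_all))
```

Only the days present in every array survive, which is why each array then has to be filtered down to dates_all before the arrays can be combined.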

I can recreate one such array with:

In [286]: dt=np.dtype([('date', 'O'), ('year', '<i2'), ('month', 'i1'), ('day', 
     ...:
     ...: ), ('low', '<f8'), ('volume', '<f8'), ('aclose', '<f8')])
In [287]: 
In [287]: arr=np.ones((2,), dtype=dt)  # 2 element structured array
In [288]: arr
Out[288]: 
array([(1, 1, 1, 1,  1.,  1.,  1.,  1.,  1.,  1.,  1.),
       (1, 1, 1, 1,  1.,  1.,  1.,  1.,  1.,  1.,  1.)], 
      dtype=[('date', 'O'), ('year', '<i2'), ('month', 'i1'), ('day', 'i1'), ... ('aclose', '<f8')])
In [289]: type(arr[0])
Out[289]: numpy.void

Turn that into a recarray (I don't use those as much as plain structured arrays):

In [291]: np.rec.array(arr)
Out[291]: 
rec.array([(1, 1, 1, 1,  1.,  1.,  1.,  1.,  1.,  1.,  1.),
 (1, 1, 1, 1,  1.,  1.,  1.,  1.,  1.,  1.,  1.)], 
          dtype=[('date', 'O'), ('year', '<i2'), ('month', 'i1'), ('day', 'i1'), .... ('aclose', '<f8')])

The dtype of the recarray displays slightly differently:

In [292]: _.dtype
Out[292]: dtype((numpy.record, [('date', 'O'), ('year', '<i2'), ('month', 'i1'), ....('aclose', '<f8')]))
In [293]: __.date
Out[293]: array([1, 1], dtype=object)

In any case the date field is an array of objects, possibly of datetime?

q is one of these arrays; i is an element, and i.date is the date field.

 [i for i in q if i.date in dates_all]

So filtered is a list of recarray elements. np.stack does a better job of reassembling them into an array (and it works with the recarray too).

np.stack([i for i in arr if i['date'] in alist])
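A self-contained sketch of this step, assuming a reduced two-field dtype and made-up dates (neither comes from the original question). It also shows a boolean-mask alternative, since np.stack raises a ValueError when no record matches and the list is empty.

```python
import datetime as dt

import numpy as np

# Hypothetical two-field dtype standing in for the full quote dtype.
quote_dtype = np.dtype([("date", "O"), ("close", "<f8")])
arr = np.array(
    [
        (dt.date(2020, 1, 1), 10.0),
        (dt.date(2020, 1, 2), 11.0),
        (dt.date(2020, 1, 3), 12.0),
    ],
    dtype=quote_dtype,
)
alist = [dt.date(2020, 1, 1), dt.date(2020, 1, 3)]  # made-up dates to keep

# Route 1: collect the matching records, then stack them back into an array.
kept = [rec for rec in arr if rec["date"] in alist]
# np.stack fails on an empty list, so fall back to an empty slice of arr.
stacked = np.stack(kept) if kept else arr[:0]

# Route 2: build a boolean mask and index the array directly.
mask = np.array([d in alist for d in arr["date"]], dtype=bool)
masked = arr[mask]

print(stacked["close"], masked["close"])
```

Both routes should give the same records here; the mask version never leaves array land, so it needs no special case for an empty result.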

Or you could collect the indices of the matching records and use them to index the quote array:

In [319]: [i for i,v in enumerate(arr) if v['date'] in alist]
Out[319]: [0, 1]
In [320]: arr[_]

or pull out the date field first:

In [321]: [i for i,v in enumerate(arr['date']) if v in alist]
Out[321]: [0, 1]

np.in1d might also work for the search:

In [322]: np.in1d(arr['date'],alist)
Out[322]: array([ True,  True], dtype=bool)
In [323]: np.where(np.in1d(arr['date'],alist))
Out[323]: (array([0, 1], dtype=int32),)
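The membership test can also feed a boolean mask straight into the array. The sketch below uses np.isin, which the NumPy documentation recommends over np.in1d for new code; the dtype and dates are again made up for illustration. One caveat I am sure of: if the wanted dates live in a set, convert it to a list first, because NumPy would otherwise wrap the whole set in a single object element.

```python
import datetime as dt

import numpy as np

# Hypothetical two-field dtype standing in for the full quote dtype.
quote_dtype = np.dtype([("date", "O"), ("close", "<f8")])
arr = np.array(
    [
        (dt.date(2020, 1, 1), 10.0),
        (dt.date(2020, 1, 2), 11.0),
        (dt.date(2020, 1, 3), 12.0),
    ],
    dtype=quote_dtype,
)
wanted = {dt.date(2020, 1, 1), dt.date(2020, 1, 3)}  # e.g. a set of shared dates

# Convert the set to a list so np.isin sees the individual dates.
mask = np.isin(arr["date"], list(wanted))

filtered = arr[mask]        # records whose date is wanted
idx = np.flatnonzero(mask)  # the matching positions, if indices are needed

print(idx, filtered["close"])
```

With object-dtype dates this still compares element by element under the hood, so it is mainly a tidier spelling than the list comprehension rather than a large speedup.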
