Can A Dask Dataframe With An Unordered Index Cause Silent Errors?
Methods around dask.DataFrame all seem to make sure that the index column is sorted. However, by using from_delayed, it is possible to construct a dask dataframe that has an index
Solution 1:
Many dask.dataframe operations will refuse to operate or will operate with slower algorithms on dataframes without known divisions. See http://dask.pydata.org/en/latest/dataframe-design.html#partitions
For example, df.loc is fast if dask.dataframe knows that the index is sorted and knows the min/max of each partition. However, if this information is not known, then df.loc has to look through all of the partitions exhaustively.
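To make the difference concrete, here is a minimal pure-Python sketch of the idea, not Dask's actual implementation. The helper name partitions_to_search is hypothetical. It assumes divisions follow the Dask convention of npartitions + 1 sorted boundary values, with the last boundary inclusive, and that unknown divisions are represented as None.

```python
from bisect import bisect_right


def partitions_to_search(divisions, key):
    """Return the partition numbers that could contain `key`.

    Hypothetical helper for illustration only; it mimics the idea behind
    a division-aware lookup, not Dask's real code.
    """
    npartitions = len(divisions) - 1

    # Unknown divisions: nothing can be ruled out, so scan every partition.
    if any(d is None for d in divisions):
        return list(range(npartitions))

    # Key lies outside the overall index range: no partition can hold it.
    if key < divisions[0] or key > divisions[-1]:
        return []

    # Known, sorted divisions: binary search for the single candidate.
    # Partition i covers [divisions[i], divisions[i + 1]); the last
    # partition also includes its upper boundary.
    i = bisect_right(divisions, key) - 1
    return [min(i, npartitions - 1)]


known = (0, 10, 20, 30)            # three partitions with known boundaries
unknown = (None, None, None, None)  # three partitions, boundaries unknown

print(partitions_to_search(known, 15))
print(partitions_to_search(unknown, 15))
```

With known divisions only one partition needs to be read; with unknown divisions every partition is a candidate, which is why the lookup gets slower but stays correct.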
Generally speaking dask.dataframe is aware of the possibility that you bring up and should act accordingly. Some operations will be slower. Some operations will refuse to operate.