Why Isn't My Dict Lookup Faster Than My List Lookup In Python?

February 15, 2024 Post a Comment

I'm reading each line of a file into both a list and a dict, with open('../data/title/pruned2_titleonly.txt', 'rb') as f_titles: titles_lst = f_titles.read().split('\n') assert

Solution 1:

The above result seems to imply that dictionaries are not as efficient as lists for lookup tables, even though list lookups are O(n) while dict lookups are O(1). I've tested the following to see if the O(n)/O(1) performance was true... turns out it isn't...

It's not true that dict lookups are O(N), in the sense of "getting an item" which is the sense your code seems to test. Determining where (if at all) an element exists could be O(N), e.g. somelist.index(someval_not_in_the_list) or someval_not_in_the_list in somelist will both have to scan over each element. Try comparing x in somelist with x in somedict to see a major difference.

But simply accessing somelist[index] is O(1) (see the Time Complexity page). And the coefficient is probably going to be smaller than in the case of a dictionary, also O(1), because you don't have to hash the key.

Solution 2:

What is the deal?

The reason for this behaviour is that for dictionary access, the index (key) is hashed, and the hash is then used to access the value in a hash table. In a list, the index is a direct mapping to the respective slot in the list, which is essentially a calculated offset into memory.

See implementation of dictionaries and lists, respectively.

The above result seems to imply that dictionaries are not as efficient as lists for lookup tables,

No, we cannot conclude that. What your experiment shows is one particular instance, comparing numeric index access in a list to hashed-key access in a hash table. Obviously there is more work to be done to access the dictionary (namely, hash the index).

I've tested the following to see if the O(n)/O(1) performance was true... turns out it isn't...

I also ran some tests, see below. Before we go into details, note that all O(1) states is that the access time is independent of the size of the respective collection object. To state Wikipedia:

An algorithm is said to be constant time (also written as O(1) time) if the value of T(n) is bounded by a value that does not depend on the size of the input.

In other words, given some collection object x, all access to this object will be independent of the size of the object. Note that O(1) also does not mean, however, that all accesses will have the exact same running time. Again to quote Wikipedia:

Despite the name "constant time", the running time does not have to be independent of the problem size, but an upper bound for the running time has to be bounded independently of the problem size.

The key word here is bounded. We can verify that the access time is within reasonable bounds by slightly modifying your program. Note that I am generating random input instead of building the list and dict objects from a file. Also I have simplified the code to test access only (your code builds a new list for each round, which may change the scale of the result).

import sha
from timeit import timeit
import random
import numpy as np

N = 1000# generate random input of size N
titles_lst = [sha.sha(str(x)).digest() for x inrange(1, N)]
titles_dict = {}
for i inrange(0, len(titles_lst)):
    titles_dict[i] = titles_lst[i]
# permutate access pattern
n = len(titles_lst)
a = np.random.permutation(n)

defaccess_list():
    x = titles_lst[b]

defaccess_dict():
    x = titles_dict[b]

# run measurements for dictionary
X = []
C = 1000for i inrange(C):
    for b in a:
       X.append(timeit(access_dict, number=C))
print"dict std=%f avg=%f" % (np.std(X), np.mean(X))
# run measurements for list       
X = []
for i inrange(C):
    for b in a:
       X.append(timeit(access_list, number=C))
print"list std=%f avg=%f" % (np.std(X), np.mean(X))

On my 2.7GHz macmini this yields the following results.

N=100 C=1000 dict std=0.000001 avg=0.000002N=100 C=1000 list std=0.000001 avg=0.000001N=1000 C=1000 dict std=0.000001 avg=0.000002N=1000 C=1000 list std=0.000001 avg=0.000001N=10000 C=1000 dict std=0.000001 avg=0.000002N=10000 C=1000 list std=0.000001 avg=0.000001

Clearly, the tests confirm that both dictionary and lists have indeed access time in O(1). They also confirm that dictionary look ups are, on average, slower than list access in this particular case, using a numeric index/key.

Solution 3:

list[] is geting something from a specific known location in the list - O(1)

dict[] is searching the dictionary using a key - O(1) on average

if you want to compare searching try to search a list - O(n)

[x for x in list if x==val]

to see the real difference

lacucinadiadine

Why Isn't My Dict Lookup Faster Than My List Lookup In Python?

Solution 1:

Solution 2:

Solution 3:

Post a Comment for "Why Isn't My Dict Lookup Faster Than My List Lookup In Python?"

Widget HTML #3