
What Is A Good Way To Speed Up Test Runs Utilizing Larger Spacy Models?

I have constructed some tests relying on the en_core_web_md model. The model takes ~15 sec to load into memory on my computer making the tests a pain to run. Is there a smart way t

Solution 1:

The v2.2.[0-5] md models have a minor bug that makes them particularly slow to load (see https://github.com/explosion/spaCy/pull/4990).

You can reformat one file in the model package to improve the load time. Run the following from the vocab directory of the model package (e.g., lib/python3.7/site-packages/en_core_web_md/en_core_web_md-2.2.5/vocab):

import srsly

orig_data = srsly.read_msgpack("key2row")
new_data = {}
for key, value in orig_data.items():
    new_data[int(key)] = int(value)
srsly.write_msgpack("key2row", new_data)
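
If you would rather not change into the vocab directory by hand, the same conversion can be wrapped in a small helper. This is only a sketch under a few assumptions: the function names and the key2row.bak backup file are made up for illustration, and the caller is expected to pass in the vocab directory path.

```python
import shutil
from pathlib import Path


def to_int_mapping(data):
    """Return a copy of data with every key and value coerced to a plain int."""
    return {int(key): int(value) for key, value in data.items()}


def rewrite_key2row(vocab_dir):
    """Back up key2row in vocab_dir, then rewrite it with plain int keys and values."""
    import srsly  # imported lazily so to_int_mapping works without srsly installed

    path = Path(vocab_dir) / "key2row"
    backup = path.with_name("key2row.bak")  # hypothetical backup file name
    if not backup.exists():
        shutil.copy2(path, backup)  # keep the original so the change can be undone
    srsly.write_msgpack(path, to_int_mapping(srsly.read_msgpack(path)))
```

Calling rewrite_key2row with the vocab path from above should have the same effect as the snippet, while leaving a copy of the original file next to it.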

In my tests, reformatting key2row this way nearly halves the loading time (18s to 10s). The remaining time is mostly spent loading strings and lexemes for the model, which is harder to optimize further at this point. So this improves things a bit, but the overall load time is still relatively burdensome for short tests.
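
Since the remaining load time cannot easily be removed, a complementary approach (not part of the fix above) is to pay it only once per test run by caching the loaded model. The following is a minimal sketch: cache_loader is a hypothetical helper name, and slow_load is a stand-in for spacy.load so that the example runs without spaCy installed.

```python
import functools


def cache_loader(load_fn):
    """Return a version of load_fn that loads each model name at most once per process."""
    return functools.lru_cache(maxsize=None)(load_fn)


# Stand-in for spacy.load; it records each call so the caching is visible.
calls = []


def slow_load(name):
    calls.append(name)
    return {"model": name}


get_nlp = cache_loader(slow_load)  # in a real test suite: cache_loader(spacy.load)
first = get_nlp("en_core_web_md")
second = get_nlp("en_core_web_md")
print(first is second, len(calls))  # prints: True 1
```

With pytest, a fixture declared with scope="session" gives the same effect. Either way, all tests share one pipeline object, so they should avoid modifying it.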
