Rebuilding an HNSW index is one of the most resource-intensive aspects of using HNSW in production workloads. Unlike traditional databases, where a deletion can be handled by simply removing a row from a table, a vector database that uses HNSW often requires a complete rebuild to maintain optimal performance and accuracy.
Why is Rebuilding Necessary?
Because of its layered graph structure, HNSW isn’t inherently designed for dynamic datasets that change frequently. Yet adding new data and deleting existing data is essential for keeping the index up to date, especially for use cases like RAG, which aims to improve search relevance.
Most databases distinguish between “hard” and “soft” deletes. Hard deletes permanently remove data, while soft deletes flag data as ‘to-be-deleted’ and remove it later. The issue with soft deletes is that the to-be-deleted data still uses significant memory until it’s permanently removed. This is particularly problematic in vector databases that use HNSW, where memory consumption is already a significant issue.
HNSW creates a graph where nodes (vectors) are connected based on their proximity in the vector space, and traversing an HNSW graph works much like traversing a skip list. To support that, the layers of the graph are designed so that some layers have very few nodes. When vectors are deleted, especially those on sparse layers that serve as important connectors in the graph, the whole HNSW structure can become fragmented. This fragmentation may leave nodes (or layers) disconnected from the main graph, which requires rebuilding the entire graph or, at the very least, degrades search efficiency.
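The fragmentation effect can be illustrated with a toy example. The sketch below is not an HNSW implementation: it assumes a single, hand-built layer in which a hypothetical node named `hub` is the only link between two clusters, and uses a plain breadth-first search to show which nodes remain reachable once that connector is removed.

```python
from collections import deque

def reachable(graph, start, removed=frozenset()):
    """Return the set of nodes reachable from start, skipping removed nodes."""
    if start in removed:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in removed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

# Toy single-layer graph: two clusters joined only through the node "hub".
graph = {
    "a": ["b", "hub"],
    "b": ["a", "hub"],
    "hub": ["a", "b", "c", "d"],
    "c": ["hub", "d"],
    "d": ["hub", "c"],
}

before = reachable(graph, "a")                  # every node is reachable
after = reachable(graph, "a", removed={"hub"})  # only the first cluster remains
orphaned = set(graph) - {"hub"} - after         # nodes a search from "a" can no longer find
```

In this toy graph, a search entering at `a` can no longer reach `c` or `d` after `hub` is removed, even though both vectors are still stored. A real HNSW graph has many more edges per node, so a single deletion is less likely to cut it cleanly, but the same mechanism applies as deletions accumulate.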
HNSW therefore uses a soft-delete technique, which marks vectors for deletion but doesn’t immediately remove them. This approach reduces the cost of frequent full rebuilds, although periodic reconstruction is still needed to keep the graph in its optimal state.
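A minimal sketch of this soft-delete-then-rebuild pattern follows. It is an illustration under stated assumptions rather than how any particular vector database works: the class `SoftDeleteIndex`, its `rebuild_ratio` threshold, and the brute-force search standing in for graph traversal are all hypothetical.

```python
import math

class SoftDeleteIndex:
    """Toy stand-in for a vector index; search is brute force, not HNSW."""

    def __init__(self, rebuild_ratio=0.3):
        self.vectors = {}        # id -> vector, including tombstoned entries
        self.tombstones = set()  # ids marked deleted but still held in memory
        self.rebuild_ratio = rebuild_ratio  # assumed threshold for a rebuild
        self.rebuilds = 0

    def add(self, vec_id, vector):
        self.vectors[vec_id] = vector
        self.tombstones.discard(vec_id)

    def delete(self, vec_id):
        # Soft delete: flag the entry, keep its data until the next rebuild.
        if vec_id in self.vectors:
            self.tombstones.add(vec_id)
        if self.vectors and len(self.tombstones) / len(self.vectors) > self.rebuild_ratio:
            self.rebuild()

    def rebuild(self):
        # Hard delete: drop tombstoned entries and start from the live set.
        self.vectors = {i: v for i, v in self.vectors.items() if i not in self.tombstones}
        self.tombstones.clear()
        self.rebuilds += 1

    def search(self, query, k=1):
        # Tombstoned entries are filtered out of results but still stored.
        live = ((i, v) for i, v in self.vectors.items() if i not in self.tombstones)
        ranked = sorted(live, key=lambda item: math.dist(query, item[1]))
        return [i for i, _ in ranked[:k]]

index = SoftDeleteIndex(rebuild_ratio=0.3)
for i in range(10):
    index.add(i, (float(i), 0.0))

index.delete(0)
stored_after_one_delete = len(index.vectors)     # tombstoned entry still occupies memory
nearest_after_delete = index.search((0.0, 0.0))  # but it no longer appears in results

for vec_id in (1, 2, 3):
    index.delete(vec_id)  # the fourth delete pushes tombstones past 30%
```

The threshold is the design choice to tune: a low ratio reclaims memory sooner at the cost of more frequent rebuilds, while a high ratio defers the rebuild and lets tombstoned vectors accumulate.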