CAGRA new vector addition #2157

enp1s0 · 2024-02-06T04:35:15Z

This PR introduces the new vector addition feature to CAGRA.

CAGRA-Q is not supported

Usage

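// Host matrix holding the vectors to add (updated_dataset_size rows, dim columns).
auto additional_dataset = raft::make_host_matrix<float, int64_t>(handle, updated_dataset_size, dim);
// Add the vectors to the existing CAGRA index.
raft::neighbors::cagra::extend(handle, raft::make_const_mdspan(additional_dataset.view()), cagra_index);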

Algorithm

Graph degree: d

The algorithm consists of two stages: rank-based reordering and reverse edge addition.

Rank-based reordering
1-1. Obtain the d' (= 2d) nearest neighbor vectors (V) of a given new vector using the CAGRA search.
1-2. Count the number of detourable edges using the result of step 1-1 and the neighbor lists of the input index. Then prune 3d/2 edges in the same way as the CAGRA graph optimization. This leaves d/2 neighbors for the new vector.
Reverse edge addition
2-1. Count the number of incoming edges for all nodes.
2-2. Add d/2 reverse edges from the nodes added to the neighbor list in stage 1 by replacing one node in each of their neighbor lists with the new node. To avoid losing the connection to the replaced node, add the replaced node to the neighbor list of the new node, which provides a detour connection. The replaced node is the one with the largest number of incoming edges among the last d/2 nodes of the neighbor list, excluding nodes that would duplicate an entry already in the neighbor list.
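
The two stages above can be sketched on the CPU as follows. This is a simplified illustration under stated assumptions, not the code in this PR: the `Graph` alias and the functions `select_forward_neighbors`, `count_in_degree`, and `add_reverse_edges` are hypothetical names, a candidate is counted as detourable once for every closer candidate whose neighbor list already contains it, ties are broken by search rank, and the duplication check in step 2-2 is assumed to be against the neighbor list of the new node.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Fixed-degree graph: graph[i] is the neighbor list of node i (hypothetical layout).
using Graph = std::vector<std::vector<std::uint32_t>>;

static bool contains(const std::vector<std::uint32_t>& list, std::uint32_t id)
{
  return std::find(list.begin(), list.end(), id) != list.end();
}

// Stage 1: `candidates` holds the d' = 2d search results, closest first.
// Returns the `num_keep` (= d/2) candidates with the fewest detourable routes.
std::vector<std::uint32_t> select_forward_neighbors(const Graph& graph,
                                                    const std::vector<std::uint32_t>& candidates,
                                                    std::size_t num_keep)
{
  std::vector<std::pair<std::uint32_t, std::uint32_t>> counted;  // (node id, detour count)
  for (std::size_t j = 0; j < candidates.size(); ++j) {
    std::uint32_t count = 0;
    // new -> candidates[i] -> candidates[j] is a detour when i is closer than j.
    for (std::size_t i = 0; i < j; ++i) {
      if (contains(graph[candidates[i]], candidates[j])) { ++count; }
    }
    counted.emplace_back(candidates[j], count);
  }
  // A stable sort keeps the search rank as the tie-breaker.
  std::stable_sort(counted.begin(), counted.end(),
                   [](const auto& a, const auto& b) { return a.second < b.second; });
  std::vector<std::uint32_t> kept;
  for (std::size_t k = 0; k < std::min(num_keep, counted.size()); ++k) {
    kept.push_back(counted[k].first);
  }
  return kept;
}

// Step 2-1: number of incoming edges of every node.
std::vector<std::uint32_t> count_in_degree(const Graph& graph)
{
  std::vector<std::uint32_t> in_degree(graph.size(), 0);
  for (const auto& list : graph) {
    for (const auto id : list) { ++in_degree[id]; }
  }
  return in_degree;
}

// Step 2-2: `forward` is the stage 1 result; graph[new_id] must already exist (empty).
void add_reverse_edges(Graph& graph,
                       std::vector<std::uint32_t>& in_degree,
                       std::uint32_t new_id,
                       const std::vector<std::uint32_t>& forward)
{
  assert(new_id < graph.size() && in_degree.size() == graph.size());
  graph[new_id] = forward;
  for (const auto u : forward) { ++in_degree[u]; }
  for (const auto u : forward) {
    auto& list          = graph[u];
    const std::size_t d = list.size();
    std::size_t best    = d;  // d means "no eligible node found"
    // Search only the last d/2 entries of the neighbor list of u.
    for (std::size_t k = d - d / 2; k < d; ++k) {
      if (contains(graph[new_id], list[k])) { continue; }  // would duplicate an entry
      if (best == d || in_degree[list[k]] > in_degree[list[best]]) { best = k; }
    }
    if (best == d) { continue; }  // assumption: skip the reverse edge if nothing is eligible
    const std::uint32_t replaced = list[best];
    list[best]                   = new_id;  // reverse edge u -> new
    graph[new_id].push_back(replaced);      // detour new -> replaced
    ++in_degree[new_id];                    // in-degree of `replaced` is unchanged (-1 +1)
  }
}
```

With d/2 forward neighbors and up to d/2 replaced nodes appended, the new node ends up with at most d neighbors, which matches the fixed graph degree.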

Performance

In this experiment, we first split the dataset into two parts: the initial part and the additional part. Then, we extend the CAGRA index built from the initial part to include the additional part.

The recall drop relative to the baseline grows as the number of added vectors increases.
Therefore, rebuilding the CAGRA index is recommended when a large number of vectors must be added.
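
For reference, the recall used in this kind of comparison can be computed as in the sketch below. It is a generic illustration, not the benchmark code behind these results: the function name `recall` and the nested-vector layout are assumptions, and the ground truth is assumed to come from an exact search over the full dataset.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// found[q]: neighbor ids returned for query q by the index under test.
// truth[q]: neighbor ids returned for query q by an exact search.
// Returns the fraction of ground-truth neighbors that the index also returned.
double recall(const std::vector<std::vector<std::uint32_t>>& found,
              const std::vector<std::vector<std::uint32_t>>& truth)
{
  assert(found.size() == truth.size());
  std::size_t hits  = 0;
  std::size_t total = 0;
  for (std::size_t q = 0; q < truth.size(); ++q) {
    for (const auto id : truth[q]) {
      if (std::find(found[q].begin(), found[q].end(), id) != found[q].end()) { ++hits; }
      ++total;
    }
  }
  // An empty ground truth is reported as 0 to avoid dividing by zero.
  return total == 0 ? 0.0 : static_cast<double>(hits) / static_cast<double>(total);
}
```

Computing this value once for the extended index and once for an index rebuilt from the full dataset gives the recall drop discussed above.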

TODO

Implementation
Test

copy-pr-bot · 2024-02-06T04:35:20Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

tfeher · 2024-04-25T12:11:22Z

/ok to test

enp1s0 · 2024-04-25T12:27:50Z

/ok to test

tfeher · 2024-04-25T12:28:13Z

/ok to test


tfeher · 2024-04-30T15:47:18Z

/ok to test

tfeher · 2024-05-02T07:01:29Z

/ok to test

enp1s0 · 2024-05-02T10:27:48Z

The DataT=I8/U8 tests show low recall due to #2287. Unless the dataset vectors are normalized, all additional vectors tend to be connected to the dataset vector nodes that have a large L2 norm.
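
A minimal sketch of row-wise L2 normalization is shown below for illustration. It assumes a row-major float matrix and a hypothetical helper name `normalize_rows`; it is not the code used in the tests, and for I8/U8 data the normalized values would additionally have to be rescaled to the integer range.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Scales every row of a row-major (n_rows x dim) matrix to unit L2 norm in place.
// All-zero rows are left unchanged.
void normalize_rows(std::vector<float>& data, std::size_t dim)
{
  assert(dim > 0 && data.size() % dim == 0);
  const std::size_t n_rows = data.size() / dim;
  for (std::size_t row = 0; row < n_rows; ++row) {
    float* v      = data.data() + row * dim;
    float sq_norm = 0.0f;
    for (std::size_t i = 0; i < dim; ++i) { sq_norm += v[i] * v[i]; }
    if (sq_norm == 0.0f) { continue; }
    const float inv_norm = 1.0f / std::sqrt(sq_norm);
    for (std::size_t i = 0; i < dim; ++i) { v[i] *= inv_norm; }
  }
}
```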

tfeher · 2024-05-07T22:44:05Z

/ok to test

tfeher · 2024-05-08T15:25:57Z

/ok to test

cjnolet · 2024-05-17T04:09:40Z

@enp1s0 now that CAGRA has been moved over to cuVS, this PR will also have to be migrated over to cuVS. No rush, of course, just letting you know.

cjnolet · 2024-05-21T15:47:56Z

@enp1s0 just a heads up- now that we've migrated CAGRA over to cuVS, we'll need to migrate these changes over as well. It should be a fairly straightforward merge because the CAGRA impl in cuVS is a direct migration. We are no longer updating the vector search implementations in RAFT and they will be removed soon.

enp1s0 added 5 commits February 6, 2024 01:56

Update copyright

5708c38

Fix the permission of cagra_build.cuh

34a6dcb

Initial implemention of new vector addition

77bcf6e

Fix add_node

61fc176

Update

6e950a3

enp1s0 requested review from a team as code owners February 6, 2024 04:35

enp1s0 self-assigned this Feb 6, 2024

github-actions bot added cpp CMake python ci labels Feb 6, 2024

enp1s0 added feature request New feature or request non-breaking Non-breaking change 5 - DO NOT MERGE Hold off on merging; see PR for details and removed cpp CMake python ci labels Feb 6, 2024

Merge branch 'branch-24.04' into cagra-add-new-vectors

37985a0

github-actions bot added cpp CMake python ci labels Feb 6, 2024

enp1s0 and others added 4 commits February 6, 2024 17:30

Update func name

aa855f5

Fix add_nodes

9b7f476

Add test of add_nodes

19baa92

Merge branch 'branch-24.04' into cagra-add-new-vectors

25dad94

enp1s0 added the 5 - DO NOT MERGE Hold off on merging; see PR for details label Apr 22, 2024

Merge branch 'branch-24.06' into cagra-add-new-vectors

6aabe6e

enp1s0 removed the 5 - DO NOT MERGE Hold off on merging; see PR for details label Apr 24, 2024

Fix copyright

edcd0ef

github-actions bot removed the CMake label Apr 25, 2024

enp1s0 and others added 5 commits April 26, 2024 10:35

Merge branch 'branch-24.06' into cagra-add-new-vectors

c6d246f

Fix docs

6415c30

Merge branch 'cagra-add-new-vectors' of github.com:enp1s0/raft into c…

527036f

…agra-add-new-vectors

Merge branch 'branch-24.06' into cagra-add-new-vectors

64a38d8

Merge branch 'branch-24.06' into cagra-add-new-vectors

2db47b7

Skip Add new node test when NN Descent && InnerProduct

cbddbe1

enp1s0 and others added 3 commits May 2, 2024 19:27

Merge branch 'branch-24.06' into cagra-add-new-vectors

917aeb8

Merge branch 'branch-24.06' into cagra-add-new-vectors

5299378

Merge branch 'branch-24.06' into cagra-add-new-vectors

63d26c1

enp1s0 and others added 2 commits May 8, 2024 12:59

Update AddNodeTest to use normalized dataset

a6f01c6

Merge branch 'branch-24.06' into cagra-add-new-vectors

56a6f98

Merge branch 'branch-24.06' into cagra-add-new-vectors

78dbc12

cjnolet added the 5 - DO NOT MERGE Hold off on merging; see PR for details label May 17, 2024

enp1s0 mentioned this pull request May 24, 2024

CAGRA new vector addition rapidsai/cuvs#151

Open

enp1s0 closed this May 24, 2024
