CAGRA new vector addition #2157

Closed
wants to merge 79 commits into from

Conversation

enp1s0
Member

@enp1s0 enp1s0 commented Feb 6, 2024

This PR introduces the new vector addition feature to CAGRA.

Rel: #1775

CAGRA-Q is not supported

Usage

// Host matrix for the vectors to add.
auto additional_dataset = raft::make_host_matrix<float, int64_t>(res, updated_dataset_size, dim);
// Add the vectors to the existing CAGRA index.
raft::neighbors::cagra::extend(res, raft::make_const_mdspan(additional_dataset.view()), cagra_index);

Algorithm

Graph degree: d

The algorithm consists of two stages: rank-based reordering and reverse edge addition.

  1. Rank-based reordering
    1-1. Obtain d' (=2d) nearest neighbor vectors (V) of a given new vector using the CAGRA search
    1-2. Count the number of detourable edges using the result of step 1-1 and the neighbor lists of the input index. Then prune 3d/2 of the d' = 2d candidate edges in the same way as the CAGRA graph optimization. This leaves d/2 neighbors for the new vector.
  2. Reverse edge addition
    2-1. Count the number of incoming edges for all nodes.
    2-2. Add d/2 reverse edges, one from each node selected in step 1 to the new node, by replacing an existing entry in that node's neighbor list with the new node. To avoid losing the connection to the replaced node, we add the replaced node to the neighbor list of the new node. This allows us to make a detour connection. The replaced node is chosen from the last d/2 entries of the neighbor list as the one with the largest number of incoming edges, skipping nodes that are already in the new node's neighbor list.
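
The two stages above can be sketched on the host as follows. This is a simplified illustration, not the implementation in this PR: the function names `select_neighbors_by_rank` and `add_node_with_reverse_edges` are hypothetical, the graph is a plain `std::vector` adjacency list, and the detour count uses a simplified rule (a candidate is detourable through any closer candidate that already links to it), which may differ from the exact rule in the CAGRA graph optimization.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

using Graph = std::vector<std::vector<std::uint32_t>>;

// Step 1 (simplified): `candidates` holds the d' = 2d search results, closest first.
// Keeps the `keep` (= d/2) candidates with the fewest detour routes; ties go to the closer one.
std::vector<std::uint32_t> select_neighbors_by_rank(const Graph& graph,
                                                    const std::vector<std::uint32_t>& candidates,
                                                    std::size_t keep)
{
  const std::size_t n = candidates.size();
  std::vector<std::size_t> detour_count(n, 0);
  for (std::size_t j = 0; j < n; ++j) {
    for (std::size_t i = 0; i < j; ++i) {
      // Candidate j is reachable through the closer candidate i if i already links to j.
      const auto& via = graph[candidates[i]];
      if (std::find(via.begin(), via.end(), candidates[j]) != via.end()) { ++detour_count[j]; }
    }
  }
  std::vector<std::size_t> order(n);
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
    return detour_count[a] < detour_count[b];
  });
  std::vector<std::uint32_t> selected;
  for (std::size_t k = 0; k < std::min(keep, n); ++k) { selected.push_back(candidates[order[k]]); }
  return selected;
}

// Step 2 (simplified): appends the new node to `graph` and adds the reverse edges.
// `selected` holds the distinct d/2 neighbors from step 1. Returns the id of the new node.
std::uint32_t add_node_with_reverse_edges(Graph& graph, const std::vector<std::uint32_t>& selected)
{
  // 2-1: count incoming edges for every existing node.
  std::vector<std::size_t> in_degree(graph.size(), 0);
  for (const auto& list : graph) {
    for (auto v : list) { ++in_degree[v]; }
  }

  const auto new_id = static_cast<std::uint32_t>(graph.size());
  std::vector<std::uint32_t> new_list = selected;

  // 2-2: in each selected node, replace one of the last d/2 entries with the new node.
  for (auto v : selected) {
    auto& list             = graph[v];
    const std::size_t tail = std::min(list.size(), selected.size());
    std::size_t best       = list.size();  // sentinel: nothing chosen yet
    for (std::size_t k = list.size() - tail; k < list.size(); ++k) {
      const bool duplicate =
        std::find(new_list.begin(), new_list.end(), list[k]) != new_list.end();
      if (list[k] == new_id || duplicate) { continue; }
      if (best == list.size() || in_degree[list[k]] > in_degree[list[best]]) { best = k; }
    }
    if (best == list.size()) { continue; }  // no replaceable entry in the tail
    new_list.push_back(list[best]);         // keep a detour to the replaced node
    list[best] = new_id;                    // reverse edge v -> new node
  }
  graph.push_back(new_list);
  return new_id;
}
```

With d = 4, calling `select_neighbors_by_rank(graph, candidates, 2)` followed by `add_node_with_reverse_edges(graph, selected)` gives the new node up to d neighbors: d/2 from the pruning stage and up to d/2 replaced nodes.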

Performance

In this experiment, we first split the dataset into two parts: the initial and the additional part. Then, we extend the CAGRA index built by the initial part to include the additional part.
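
A minimal sketch of such a split, assuming a row-major host dataset and a contiguous cut (the first rows form the initial part, the rest the additional part). The helper `split_dataset` is hypothetical, and the experiment may have split the data differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Splits a row-major dataset (n_rows x dim) into the first `initial_rows` rows and the remainder.
std::pair<std::vector<float>, std::vector<float>> split_dataset(const std::vector<float>& data,
                                                                std::size_t dim,
                                                                std::size_t initial_rows)
{
  const std::size_t n_rows = dim == 0 ? 0 : data.size() / dim;
  // Clamp so that asking for more rows than exist yields an empty additional part.
  const auto cut = static_cast<std::ptrdiff_t>(std::min(initial_rows, n_rows) * dim);
  std::vector<float> initial(data.begin(), data.begin() + cut);
  std::vector<float> additional(data.begin() + cut, data.end());
  return {std::move(initial), std::move(additional)};
}
```

The initial part would be used to build the index, and the additional part would be passed to `extend`.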
[Figure: search-eval]

The recall drop relative to the baseline grows as the number of added vectors increases.
Therefore, we recommend rebuilding the CAGRA index when adding a large number of vectors.

TODO

  • Implementation
  • Test

@enp1s0 enp1s0 requested review from a team as code owners February 6, 2024 04:35

copy-pr-bot bot commented Feb 6, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@enp1s0 enp1s0 self-assigned this Feb 6, 2024
@enp1s0 enp1s0 added feature request New feature or request non-breaking Non-breaking change 5 - DO NOT MERGE Hold off on merging; see PR for details and removed cpp CMake python ci labels Feb 6, 2024
@enp1s0 enp1s0 added the 5 - DO NOT MERGE Hold off on merging; see PR for details label Apr 22, 2024
@enp1s0 enp1s0 removed the 5 - DO NOT MERGE Hold off on merging; see PR for details label Apr 24, 2024
@tfeher
Contributor

tfeher commented Apr 25, 2024

/ok to test

@github-actions github-actions bot removed the CMake label Apr 25, 2024
@enp1s0
Member Author

enp1s0 commented Apr 25, 2024

/ok to test

1 similar comment
@tfeher
Contributor

tfeher commented Apr 25, 2024

/ok to test

@tfeher
Contributor

tfeher commented Apr 30, 2024

/ok to test

@tfeher
Contributor

tfeher commented May 2, 2024

/ok to test

@enp1s0
Member Author

enp1s0 commented May 2, 2024

Recall is low in the DataT=I8/U8 tests due to #2287. If the dataset vectors are not normalized, all additional vectors tend to be connected to the dataset vector nodes with a large L2 norm.
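
For reference, normalizing int8 dataset vectors could look like the sketch below. It is an illustration only: `normalize_rows` is a hypothetical helper, it converts to float and scales each row to unit L2 norm, and it is not claimed to be the fix tracked in #2287.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts row-major int8 rows to float and scales each row to unit L2 norm.
// All-zero rows stay zero; trailing elements that do not fill a full row stay zero.
std::vector<float> normalize_rows(const std::vector<std::int8_t>& data, std::size_t dim)
{
  std::vector<float> out(data.size(), 0.0f);
  if (dim == 0) { return out; }
  const std::size_t n_rows = data.size() / dim;
  for (std::size_t row = 0; row < n_rows; ++row) {
    float sq_norm = 0.0f;
    for (std::size_t k = 0; k < dim; ++k) {
      const float v = static_cast<float>(data[row * dim + k]);
      sq_norm += v * v;
    }
    const float inv_norm = sq_norm > 0.0f ? 1.0f / std::sqrt(sq_norm) : 0.0f;
    for (std::size_t k = 0; k < dim; ++k) {
      out[row * dim + k] = static_cast<float>(data[row * dim + k]) * inv_norm;
    }
  }
  return out;
}
```

After this step every non-zero row has the same norm, so no node is favored purely because of its magnitude.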

@tfeher
Contributor

tfeher commented May 7, 2024

/ok to test

@tfeher
Contributor

tfeher commented May 8, 2024

/ok to test

@cjnolet cjnolet added the 5 - DO NOT MERGE Hold off on merging; see PR for details label May 17, 2024
@cjnolet
Member

cjnolet commented May 17, 2024

@enp1s0 now that CAGRA has been moved over to cuVS, this PR will also have to be migrated over to cuVS. No rush, of course, just letting you know.

@cjnolet
Member

cjnolet commented May 21, 2024

@enp1s0 just a heads up: now that we've migrated CAGRA over to cuVS, we'll need to migrate these changes over as well. It should be a fairly straightforward merge because the CAGRA impl in cuVS is a direct migration. We are no longer updating the vector search implementations in RAFT and they will be removed soon.

Labels
5 - DO NOT MERGE Hold off on merging; see PR for details cpp feature request New feature or request non-breaking Non-breaking change Vector Search

3 participants