Favor upsert over add + use element_id to prevent duplicates #2926
Make chroma etl pipeline idempotent :)
@0xjgv Thanks for this! And thanks for alerting me. Great idea. I will bring it up to the team.
Thanks for the PR! I do agree with the upsert. I had doubts about assigning the element id; however, based on the latest state of the id calculation, collisions seem unlikely. Looks good to me; it needs to pass the ingest tests.
@ahmetmeleq @0xjgv can you see any downsides to doing it this way? (It seems like a good idea all around.)
@ryannikolaidis Do you think this change will affect any users of ChromaDB who expect duplicates? (Not sure why they would want duplicates...)
With matching ids? I think that's okay.
🤔 I can't think of any downsides. Collisions are not very likely, and there are no clear advantages to allowing duplicates in the DB. Idempotency reduces storage use, improves query performance, and simplifies data maintenance and integrity.
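To make the idempotency argument concrete, here is a minimal sketch of why upserting under a deterministic id keeps re-runs from creating duplicates. It is a toy model, not the PR's code: `ToyCollection` is a hypothetical in-memory stand-in rather than Chroma's API, `element_id` here is an assumed content hash (the real element id calculation may differ), and the random-UUID branch is only an assumed contrast case.

```python
import hashlib
import uuid


class ToyCollection:
    """Hypothetical in-memory stand-in for a vector-store collection (not Chroma's API)."""

    def __init__(self):
        self.rows = {}

    def upsert(self, ids, documents):
        # Insert new ids and overwrite rows whose id already exists.
        for id_, doc in zip(ids, documents):
            self.rows[id_] = doc


def element_id(text):
    # Assumption: a deterministic id derived from the content.
    # The real element_id calculation may use different inputs.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:32]


def ingest(collection, texts, deterministic=True):
    # With deterministic ids, the same text always maps to the same row.
    ids = [element_id(t) if deterministic else str(uuid.uuid4()) for t in texts]
    collection.upsert(ids=ids, documents=texts)


texts = ["first chunk", "second chunk"]

stable = ToyCollection()
ingest(stable, texts)
ingest(stable, texts)  # re-running the pipeline overwrites the same rows
print(len(stable.rows))  # 2

random_ids = ToyCollection()
ingest(random_ids, texts, deterministic=False)
ingest(random_ids, texts, deterministic=False)  # every run adds new rows
print(len(random_ids.rows))  # 4
```

In this sketch the second run with deterministic ids leaves the row count unchanged, which is the property the comments above describe; with fresh ids per run, the same content accumulates once per run.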
Implemented in #3086
Make chroma ingest pipeline idempotent :)
@potter-potter