Implement "minimum number should match" on BooleanQuery #2398

Open
fulmicoton opened this issue May 13, 2024 · 2 comments

Comments

@fulmicoton
Collaborator

fulmicoton commented May 13, 2024

In a BooleanQuery, it can be useful to require that at least 2 out of 3 terms match.
In Lucene, this is possible with setMinimumNumberShouldMatch: https://lucene.apache.org/core/6_1_0/core/org/apache/lucene/search/BooleanQuery.Builder.html#setMinimumNumberShouldMatch-int-

Regardless of the implementation, it must have no impact on the performance of the existing union queries.
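
To make the requested semantics concrete, here is a minimal reference sketch in plain Rust. It is not tantivy code: the function name `docs_matching_at_least` and the representation of each clause as a `Vec<u32>` of doc ids are assumptions made for illustration only. It counts, for every doc, how many clauses it appears in and keeps the docs that reach the threshold.

```rust
use std::collections::BTreeMap;

/// Naive reference for "minimum number should match".
/// `postings[i]` is the list of doc ids matched by the i-th `should` clause
/// (hypothetical representation; each doc id appears at most once per list).
/// Returns, in increasing order, the doc ids matched by at least `min_match` clauses.
fn docs_matching_at_least(postings: &[Vec<u32>], min_match: usize) -> Vec<u32> {
    // doc id -> number of clauses that matched it.
    let mut counts: BTreeMap<u32, usize> = BTreeMap::new();
    for list in postings {
        for &doc in list {
            *counts.entry(doc).or_insert(0) += 1;
        }
    }
    // A BTreeMap iterates in key order, so the output is sorted by doc id.
    counts
        .into_iter()
        .filter(|&(_, count)| count >= min_match)
        .map(|(doc, _)| doc)
        .collect()
}
```

With three clauses matching `[1, 2, 5]`, `[2, 3, 5]` and `[2, 4]`, a threshold of 2 keeps docs 2 and 5. A real scorer would not materialize counts like this; the sketch only pins down the expected result.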

@LebranceBW

I'm willing to implement this feature. However, my familiarity with this project is limited. Here are my initial considerations:

  • The minimum_should_match functionality operates as an additional constraint on BooleanQuery. It appears that the existing UnionScorer cannot accommodate it. A new scorer, MinimumRequirementScorer, based on UnionScorer, needs to be developed; it rejects docs that match fewer clauses than required. minimum_should_match will be passed in as one of its members.
  • To minimize the performance impact, MinimumRequirementScorer, UnionScorer, and RequiredOptionalScorer will each be used for different BooleanQuery clause combinations.

Advice wanted. 😄

@fulmicoton
Collaborator Author

@LebranceBW I agree with both points.

The BooleanQuery -> BooleanWeight -> BooleanScorer chain is there precisely so that, while users define their query by instantiating a Query object, this query can be converted at runtime into different Scorers.
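
As a rough illustration of that runtime choice, the sketch below picks a scorer kind for the `should` clauses from the threshold alone. Everything in it is hypothetical: `ShouldScorerKind` and `choose_should_scorer` are not tantivy types or functions, and the variant names only echo the scorer names used in this thread. The point is that a threshold of 0 or 1 falls through to the plain union, so existing union queries take the same path as before.

```rust
/// Hypothetical scorer kinds for the `should` clauses of a boolean query.
#[derive(Debug, PartialEq, Eq)]
enum ShouldScorerKind {
    /// Plain union: a doc matches as soon as any `should` clause matches.
    Union,
    /// Union that also rejects docs matched by fewer than
    /// `minimum_should_match` clauses.
    MinimumRequirement { minimum_should_match: usize },
}

/// Chooses the scorer kind at "weight" time (assumed dispatch rule).
/// A threshold of 0 or 1 adds no constraint beyond the union itself,
/// so the existing union path is kept untouched in that case.
fn choose_should_scorer(minimum_should_match: usize) -> ShouldScorerKind {
    if minimum_should_match <= 1 {
        ShouldScorerKind::Union
    } else {
        ShouldScorerKind::MinimumRequirement { minimum_should_match }
    }
}
```

Because the decision is made once when the scorer is built, the per-document cost of the union path does not change.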

LebranceBW added a commit to LebranceBW/tantivy that referenced this issue May 20, 2024
…h`. see issue quickwit-oss#2398

In this commit, a new scorer named DisjunctionScorer is introduced, which performs the union of posting lists while requiring a minimum number of matching clauses. It is implemented with a min-heap. The necessary modifications to `BooleanQuery` and `BooleanWeight` are made as well.
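
The min-heap idea mentioned in the commit message can be sketched as follows. This is not the code from the commit: `union_with_minimum` is a hypothetical name, and posting lists are assumed to be plain `Vec<u32>` values sorted in strictly increasing order. The heap holds one cursor per list, ordered by current doc id; all cursors sitting on the smallest doc are popped together, which gives the number of clauses matching that doc.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Union of sorted posting lists, keeping only the docs that appear in at
/// least `min_match` lists. Assumes each list is strictly increasing.
fn union_with_minimum(postings: &[Vec<u32>], min_match: usize) -> Vec<u32> {
    // Each heap entry is (current doc, list index, position in that list).
    // `Reverse` turns the std max-heap into a min-heap on doc id.
    let mut heap = BinaryHeap::new();
    for (list_idx, list) in postings.iter().enumerate() {
        if let Some(&doc) = list.first() {
            heap.push(Reverse((doc, list_idx, 0usize)));
        }
    }

    let mut result = Vec::new();
    while let Some(&Reverse((current, _, _))) = heap.peek() {
        let mut count = 0;
        // Pop every cursor positioned on `current` and advance it.
        while let Some(&Reverse((doc, list_idx, pos))) = heap.peek() {
            if doc != current {
                break;
            }
            heap.pop();
            count += 1;
            if let Some(&next) = postings[list_idx].get(pos + 1) {
                heap.push(Reverse((next, list_idx, pos + 1)));
            }
        }
        // `count` is the number of lists containing `current`.
        if count >= min_match {
            result.push(current);
        }
    }
    result
}
```

Docs come out in increasing order, and each posting is pushed and popped once, so the cost is O(n log k) for n postings across k lists.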