Implement Multi-query processing #511

AlekseySh · 2024-03-24T12:30:27Z

The idea is that a single query consists of several objects rather than one, and we want to retrieve results for all of these objects at once. A straightforward approach is frequency voting:

First, retrieve N results for each of the X sub-queries, which yields N*X results in total.
Then, keep only the N most frequent results.

As a result, we should have an example similar to "Using a trained model for retrieval" (https://github.com/OML-Team/open-metric-learning?tab=readme-ov-file#examples).
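
The frequency-voting steps above can be sketched in plain Python as follows. This is only an illustration under assumptions: the function name `frequency_vote` is hypothetical and not part of OML, and the tie-breaking rule (best rank first, then order of first appearance) is my own choice, since the issue does not specify one.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def frequency_vote(sub_query_results: Sequence[Sequence[Hashable]], n: int) -> List[Hashable]:
    """Keep the n items that appear in the largest number of sub-query result lists."""
    counts: Counter = Counter()
    best_rank: Dict[Hashable, int] = {}
    for results in sub_query_results:
        seen = set()  # count an item at most once per sub-query
        for rank, item in enumerate(results):
            if item in seen:
                continue
            seen.add(item)
            counts[item] += 1
            best_rank[item] = min(rank, best_rank.get(item, rank))
    # Most frequent first; ties go to the better (lower) rank.
    # Assumption: remaining ties keep the order of first appearance (sorted() is stable).
    ordered = sorted(counts, key=lambda item: (-counts[item], best_rank[item]))
    return ordered[:n]


# Two sub-queries; item 2 is the only one returned by both.
merged = frequency_vote([[0, 1, 2, 3], [4, 2]], n=3)
print(merged)  # [2, 0, 4]
```

Voting on frequency alone ignores the distances, so items returned by a single sub-query are ordered only by the tie-breaking rule.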

AlekseySh · 2024-04-12T00:36:58Z

EXAMPLE

AlekseySh · 2024-05-22T18:00:07Z

DRAFT

from collections import defaultdict

import numpy as np
import torch
from torch import FloatTensor, LongTensor

from oml.retrieval import RetrievalResults

rr = RetrievalResults(
    distances=[
        FloatTensor([0.1, 0.3, 0.6, 0.9]),
        FloatTensor([0.5, 0.8]),
        FloatTensor([0.1, 0.2]),
        FloatTensor([]),
    ],
    retrieved_ids=[
        LongTensor([0, 1, 2, 3]),
        LongTensor([4, 2]),
        LongTensor([10, 20]),
        LongTensor([])
    ],
    gt_ids=[
        LongTensor([0, 2, 50]),
        LongTensor([0, 2, 50]),  # todo: gt_ids within a group may not be consistent
        LongTensor([10, 30]),
        LongTensor([50])
    ]
)

query_groups = [[0, 1], [2], [3]]

rr_expected = RetrievalResults(
    distances=[
        FloatTensor([0.1, 0.3, 0.5, 0.7, 0.9]),
        FloatTensor([0.1, 0.3, 0.5, 0.7, 0.9]),
        FloatTensor([0.1, 0.2]),
        FloatTensor([]),
    ],
    retrieved_ids=[
        LongTensor([0, 1, 4, 2, 3]),
        LongTensor([0, 1, 4, 2, 3]),
        LongTensor([10, 20]),
        LongTensor([])
    ],
    gt_ids=[
        LongTensor([0, 2, 50]),
        LongTensor([0, 2, 50]),
        LongTensor([10, 30]),
        LongTensor([50])
    ]
)

# Merge the results of all queries in a group and assign the merged ranking to each of them.
distances_upd, retrieved_ids_upd = dict(), dict()
for group in query_groups:
    group_lens = [len(rr.retrieved_ids[ig]) for ig in group]
    if set(group_lens) == {0}:
        # Nothing was retrieved for any query in the group: keep the results empty.
        for ig in group:
            distances_upd[ig] = FloatTensor([])
            retrieved_ids_upd[ig] = LongTensor([])

    else:
        dist_group = torch.concat([rr.distances[ig] for ig in group])
        ri_group = torch.concat([rr.retrieved_ids[ig] for ig in group])

        # Collect every distance observed for each retrieved id across the group.
        ri2dist = defaultdict(list)
        for d, ri in zip(dist_group, ri_group):
            ri2dist[int(ri)].append(float(d))

        # Average the distances per id and sort ascending (closest first).
        ri_dist = [(ri, float(np.mean(d))) for ri, d in ri2dist.items()]
        ri_dist = sorted(ri_dist, key=lambda x: x[1])
        ri_upd, dist_upd = zip(*ri_dist)

        for ig in group:
            distances_upd[ig] = FloatTensor(dist_upd)
            retrieved_ids_upd[ig] = LongTensor(ri_upd)

distances_upd_final = []
retrieved_ids_upd_final = []
for iq in range(len(rr.retrieved_ids)):
    distances_upd_final.append(distances_upd[iq])
    retrieved_ids_upd_final.append(retrieved_ids_upd[iq])

rr_produced = RetrievalResults(distances=distances_upd_final, retrieved_ids=retrieved_ids_upd_final, gt_ids=rr.gt_ids)

print(rr_expected)
print(rr_produced)
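
The draft above prints both objects so they can be compared by eye. A minimal sketch of an explicit check is given below. It is an assumption-laden illustration: the helper name `rankings_match` and the tolerance value are hypothetical and not part of OML, and the helper works on plain Python lists rather than tensors so that it stays self-contained.

```python
import math
from typing import Sequence


def rankings_match(
    ids_a: Sequence[Sequence[int]],
    dists_a: Sequence[Sequence[float]],
    ids_b: Sequence[Sequence[int]],
    dists_b: Sequence[Sequence[float]],
    tol: float = 1e-6,  # hypothetical tolerance that absorbs float32 rounding
) -> bool:
    """Return True if both rankings hold the same ids in the same order
    and the corresponding distances agree within `tol`."""
    if not (len(ids_a) == len(dists_a) == len(ids_b) == len(dists_b)):
        return False
    for q_ids_a, q_d_a, q_ids_b, q_d_b in zip(ids_a, dists_a, ids_b, dists_b):
        if list(q_ids_a) != list(q_ids_b) or len(q_d_a) != len(q_d_b):
            return False
        if not all(math.isclose(x, y, abs_tol=tol) for x, y in zip(q_d_a, q_d_b)):
            return False
    return True


# Values taken from the draft: one merged group, one single query, one empty query.
expected_ids = [[0, 1, 4, 2, 3], [10, 20], []]
expected_dists = [[0.1, 0.3, 0.5, 0.7, 0.9], [0.1, 0.2], []]
# Averaging float32 values may add small noise, e.g. 0.7 becomes 0.70000002.
produced_dists = [[0.1, 0.3, 0.5, 0.70000002, 0.9], [0.1, 0.2], []]

same = rankings_match(expected_ids, expected_dists, expected_ids, produced_dists)
print(same)  # True
```

To apply this to the draft's objects, the tensors would first need converting, for example `[t.tolist() for t in rr_produced.retrieved_ids]`.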

AlekseySh added the good first issue Good for newcomers label Mar 24, 2024

AlekseySh added this to To do in OML-planning via automation Mar 24, 2024

AlekseySh added the documentation Improvements or additions to documentation label Mar 24, 2024

AlekseySh moved this from To do to In progress in OML-planning Mar 25, 2024

AlekseySh assigned VSXV Mar 25, 2024

AlekseySh moved this from In progress to To do in OML-planning Apr 12, 2024

AlekseySh moved this from To do to In progress in OML-planning May 22, 2024

AlekseySh unassigned VSXV Jun 8, 2024

AlekseySh moved this from In progress to To do in OML-planning Jun 8, 2024

AlekseySh changed the title ~~Add an example of using multi query~~ Implement Multi-query processing Jun 8, 2024

AlekseySh removed the good first issue Good for newcomers label Jun 10, 2024
