Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

data: implement resultspecs dataclass compatibility #7611

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

tdesveaux
Copy link
Contributor

As mentioned in #7610

resultspecs were only really working with dict data. This fix them when used with dataclass.

Change to includeFields implements field filtering, which convert a dataclass to a dict.
I think this should just not work, but implementation was trivial so I'm still looking for feedback.

Contributor Checklist:

  • I have updated the unit tests
  • [n/a?] I have created a file in the newsfragments directory (and read the README.txt in that directory)
  • I have updated the appropriate documentation


class ResultSpecMKListMixin:
@staticmethod
def mkdata(fld: Sequence[str] | str, *values):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just have free function? It doesn't use self, so mixin is possibly an overkill.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, it's used below for inheritance.

@p12tic
Copy link
Member

p12tic commented May 15, 2024

I think this should just not work, but implementation was trivial so I'm still looking for feedback

I don't understand what downsides are with the current approach and implementation. What future problems could we have if this PR is merged?

From a practical standpoint I think the current approach is perfectly fine.

@p12tic
Copy link
Member

p12tic commented May 15, 2024

The PR needs rebase due to conflicts.

@tdesveaux
Copy link
Contributor Author

tdesveaux commented May 15, 2024

I think this should just not work, but implementation was trivial so I'm still looking for feedback

I don't understand what downsides are with the current approach and implementation. What future problems could we have if this PR is merged?

From a practical standpoint I think the current approach is perfectly fine.

When using ResultSpec.apply() where datais a dataclass. If the ResultSpec has fields, it will convert the dataclass to a dict silently. This seem pretty risky.

I see two options:

  • always convert the dataclass to a dict in apply
  • raise an exception if apply is called with a dataclass

I was mostly worried about when a ResultSpec was used in the DB layer, but this codepath will ignore the ResultSpec.fields, so it won't change DB API return types.

@tdesveaux tdesveaux force-pushed the data/resultspec/dataclass-compat branch from d65e0f7 to 9c4d61d Compare May 15, 2024 19:50
@p12tic
Copy link
Member

p12tic commented May 15, 2024

More conflicts again.

Regarding the risks, resultspecs are used both in DB and data layers, so we need to support both.

@p12tic
Copy link
Member

p12tic commented May 15, 2024

I think that we should have 2 apply functions in ResultSpec: one (e.g. apply_db) for DB layer, another (e.g. apply_data) for data layer. The current code is a mess that sort of works everywhere, but is very brittle because things in DB layer are actually quite different from things in data layer. We then have code like isinstance(data, base.ListResult) checks which are only relevant for data layer. The introduction of data classes made this defect more visible.

I think for the database layer we can simply ignore fields. Most queries return full rows anyway. In the cases where incomplete row is returned we can either change the code to request full row or adjust dataclass to allow null fields.

This will not have any user impact because resultspec filtering is only relevant to REST queries and they do resultspec filtering again. See

# post-process any remaining parts of the resultspec
data = rspec.apply(data)

in master/buildbot/www/rest.py.

@tdesveaux
Copy link
Contributor Author

Marking as Draft as I need a chunk of time to take a good look at this and I won't be able until next week.

@tdesveaux tdesveaux marked this pull request as draft May 16, 2024 20:20
@tdesveaux tdesveaux force-pushed the data/resultspec/dataclass-compat branch from 9c4d61d to e32974a Compare May 21, 2024 09:20
@tdesveaux
Copy link
Contributor Author

I think that we should have 2 apply functions in ResultSpec: one (e.g. apply_db) for DB layer, another (e.g. apply_data) for data layer. The current code is a mess that sort of works everywhere, but is very brittle because things in DB layer are actually quite different from things in data layer. We then have code like isinstance(data, base.ListResult) checks which are only relevant for data layer. The introduction of data classes made this defect more visible.

I think for the database layer we can simply ignore fields. Most queries return full rows anyway. In the cases where incomplete row is returned we can either change the code to request full row or adjust dataclass to allow null fields.

This will not have any user impact because resultspec filtering is only relevant to REST queries and they do resultspec filtering again. See

# post-process any remaining parts of the resultspec
data = rspec.apply(data)

in master/buildbot/www/rest.py.

@p12tic from what I saw, ResultSpec.apply is never used in the DB layer, only ResultSpec.thd_execute.
apply is used in FakeDB in test contexts leading to the change I made to make it work, but it's unnecessary.

A better solution would be to change FakeDB to use ResultSpec without apply for now.

Longer term, I think its a bit dangerous that some DB functions can take a ResultSpec as arg and trust they will only call thd_execute.
From the caller perspective, it can lead to weird case were the fields and properties are ignored.

So a second change would be to remove the result_spec arg from the DB functions and replace it with a new arg from a new class that only allow to add order, limit, offset and where directives to the query.
My opinion is that trying to factor the Data and DB usage of ResultSpec in one class would result in a complex implementation with too many edge cases.

So to summarize, I'll now:

  • update FakeDB implementations to not use ResultSpec.apply

If you confirm my understanding and agree with my proposal, above:

  • Create an issue to split ResultSpec DB implementation into another class.

Depending on my availability, I could take care of the split once I'm done with the move to dataclasses for DB models.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants