Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Usage of sgqlc prevents typing for response objects #129

Open
reconman opened this issue Feb 6, 2021 · 14 comments
Open

Usage of sgqlc prevents typing for response objects #129

reconman opened this issue Feb 6, 2021 · 14 comments
Assignees
Labels
enhancement New feature or request

Comments

@reconman
Copy link
Contributor

reconman commented Feb 6, 2021

I just tried out sqlgc and I'm able to build the queries just fine.

But if I use the generated classes for parsing the response, the IDE is utterly confused since every attribute is a Field. You can't assign a Field to an int variable or iterate over it.

Take the example from the README. Just change the line

repo = (op + data).repository

to

repo: Repository = (op + data).repository

Then pylint generates this error when you try to iterate over repository.issues.nodes:
Instance of 'Field' has no 'nodes' member

VS Code with pylance generates this error:

Cannot access member "nodes" for type "Field"
  Member "nodes" is unknown

You would probably have to generate each type twice: Once for the query parameters and another time for the response objects. Using Union types for the fields instead would require lots of instanceof checks:

if isinstance(repository.issues, IssueConnection):
    for issue in repository.issues.nodes
        if isinstance(issue, Issue):
            title: str = issue.title
@reconman reconman changed the title Usage of sgqlc prevents typing Usage of sgqlc prevents typing for response objects Feb 6, 2021
@barbieri
Copy link
Member

barbieri commented Feb 8, 2021

yeah, SGQLC predates typing and I'll work on this later. In your example it's wrong, you can't : Repository as it's not what you get. You get a selection based on it, it's partial.

I wrote the sgqlc-codegen operation thinking that it will be the one generating the actual types you get when you "interpret" the results, but I still have no time to do it.

In JavaScript world you can see how Apollo GraphQL generates, it will generate one type for each query, with only the fields you did query. Likely I'll do the same and take care of the __add__ to return the proper type so IDE can know exactly what's in there.

@barbieri barbieri self-assigned this Feb 8, 2021
@barbieri barbieri added the enhancement New feature or request label Feb 8, 2021
@reconman
Copy link
Contributor Author

reconman commented Feb 8, 2021

I also looked at the logic of https://github.com/graphql-python/gql-next, which generates the types based on the queries, but:

  1. It uses outdated dataclasses-json and json libraries, generating an exception when running the generation
  2. pylance in VS code doesn't understand that @dataclass_json creates to_json() and from_json() methods: Support dataclasses-json microsoft/pylance-release#926
  3. Classes are generated multiple times, especially if the queries only have different search conditions
  4. The auto-generated HTTP client code doesn't offer a headers parameter, so you would have to provide a on_before_callback function which adds your desired HTTP headers
  5. The generated code contains expressions like film: Film = None instead of film: Optional[Film] = None.

@barbieri
Copy link
Member

barbieri commented Feb 8, 2021

Yeah, I know your pain. But you'll have to go with a : Any for the time being, better not to verify than verify it wrongly :-(

As for 3, I think all the generators will do that... none that I know will try to coalesce similar types, they just output as it goes.
My suggestion is you to spill the conditions into variables, then you just pass different values, rather than building different queries. With the sgqlc-codegen operation you can do this (it's recommended).

But I'm really busy with some work-related issues, likely can't get to this until next month or so. If you wish to help, I'm more than welcome to take patches.

My idea is that the operations functions will return types (not actual classes) that will provide the proper type tree once __add__, alternatively even return using a generic that you execute the operation in the endpoint and it will return the correct data, so it will block op + data if they do not match (use the incorrect result with the operation).

It would be something like (eventually just generate a .pyi companion file, but you get the idea):

class MyOperation_Repository_Issues_Nodes:
   number: int
   title: str

class MyOperation_Repository_Issues_PageInfo:
   has_next_page: bool
   end_cursor: str

class MyOperation_Repository_Issues:
   nodes: MyOperation_Repository_Issues_Nodes
   page_info: Sequence[MyOperation_Repository_Issues_PageInfo]

class MyOperation_Repository:
   issues: MyOperation_Repository_Issues

@reconman
Copy link
Contributor Author

reconman commented Feb 8, 2021

My suggestion is you to spill the conditions into variables, then you just pass different values, rather than building different queries. With the sgqlc-codegen operation you can do this (it's recommended).

I think I already did what you're describing, the response objects had 20 nested attributes and I refactored the query code so there's 1 function adding all the attributes to the query.

As for your comment about me possibly implementing it myself, I'm a python noob, who only found out about the existance of typing 1 1/2 weeks ago. I know nothing about Python best practices.

@YassineElbouchaibi
Copy link

Would you have some insights to share on how and where in the code you would implement this ?

I feel like without this, I'm just not getting any "static" typing on my queries. It almost makes feel as if I was using the library wrong, maybe I am?

I feel like this only offers typing but at runtime, is this the current case ?

I would be interested in helping implement this.

@barbieri
Copy link
Member

@YassineElbouchaibi yes, we just have runtime checking at the moment. The major issue is that the code is based on Metaclasses (thus, dynamic) and to make declaration easy (just define class members as types, use the instance members as value), it conflicts with much of the the mypy and typing expectations -- any user of metaclass has such problems, unless support is explicitly written in mypy, such as done for enum.Enum

However we could mitigate the issues with our sgqlc-codegen operation. We could return a casted Operation that maps explicitly to the written operation and with a __add__ that returns a properly casted type instances. However we need to write this code and I'm out of time to do that :-(

Basically we need to output a specific type for each declared operation. So, in addition to output the def my_operation(), we would output the fields and __add__ that maps. Similar to what Apollo TypeScript codegen does.

It's wrong to output generic types (ie: just based on the schema), as the resulting objects will only contain certain fields and they may be renamed if __alias__ are used.

Not a ultra hard task, but definitely not a few minutes... it's quite time consuming (although simple). Need to change https://github.com/profusion/sgqlc/blob/master/sgqlc/codegen/operation.py#L296 GraphQLToPython visitor to also generate the types, then output them before the actual function.

@fishi0x01
Copy link

fishi0x01 commented May 24, 2022

Do you think it is feasible to generate pydantic dataclasses instead? Pydantic's BaseModel can take care of converting nested dicts into classes.

I wrote a small PoC custom query-to-code generator and example output can be seen here.

Untyped query data can then get static types:

from generated import MyQueryOperation

query_data: dict[Any, Any] = do_query()
op: MyQueryOperation = MyQueryOperation(**query_data)

Im wondering if it makes sense to add sth similar here too? Do you see any issues with that approach?

@barbieri
Copy link
Member

I never used pydantic's dataclasses... but there is one possible issue: GraphQL types does not always match 1:1 with the query result, since the query result may be a subset of the fields.

What most tools do (ex: Apollo) is to generate query-specific types, such as:

type T {
  a: String!
  b: String!
}
type Query {
  t: T!
}

query MyOperationNameHere { t { a } }

Would generate:

class MyOperationNameHere_t:
   a: str

class MyOperationNameHere:
   t: MyOperationNameHere_t

Which is correct, but can cause issues with most typing systems that are based on names (it works nice in TypeScript since it's shape-based, the type names doesn't matter).

For other languages (Kotlin, Swift) they have to resort to fragments to enable multiple queries to produce the same type, ex:

so it's not that trivial do cover it properly... my preferred way would be to follow the iOS approach.

@fishi0x01
Copy link

fishi0x01 commented May 24, 2022

The proposed code generator does not generate schema types. It generates dataclasses for queries. I.e., every query has its own classes. Does that address your concern?

Which is correct, but can cause issues with most typing systems that are based on names (it works nice in TypeScript since it's shape-based, the type names doesn't matter).

I am not really sure why that would cause issues? You mean if there is a name collision, e.g., one query addressing the same type multiple times?

@fishi0x01
Copy link

The code generator also addresses inline fragments, i.e., interface types by using Union. E.g., https://github.com/app-sre/qontract-reconcile/pull/2389/files#diff-094336e65ab4cc473bbdbfd2bff9e55096d7fc3c4bfb685816865fa89dc538dfR111

@barbieri
Copy link
Member

The proposed code generator does not generate schema types. It generates dataclasses for queries. I.e., every query has its own classes. Does that address your concern?

seems so, I'll need to take a deeper look to confirm.

Which is correct, but can cause issues with most typing systems that are based on names (it works nice in TypeScript since it's shape-based, the type names doesn't matter).

I am not really sure why that would cause issues? You mean if there is a name collision, e.g., one query addressing the same type multiple times?

in the links for ios/kotlin you can see the issues, but it basically boils down to gave a huge set of types in an union, even if the fields are all the same. But solving with a fragment (assuming unique names) is okay (also avoid possible inconsistencies when people change one query but forget the others).

@fishi0x01
Copy link

fishi0x01 commented Jun 1, 2022

I understand - at least I think so 😅

Lets take a concrete example: The query in the Kotlin text https://www.apollographql.com/docs/kotlin/advanced/response-based-codegen/#the-responsebased-codegen

query HeroForEpisode($ep: Episode!) {

  hero(episode: $ep) {
    name
    ... on Droid {
      primaryFunction
    }
    ... on Human {
      height
    }
  }
}

The proposed generator would transform this into

class ThingWithName(BaseModel):
  name: str = Field(..., alias="name")


class Droid(ThingWithName):
  primary_function: str = Field(..., alias="primaryFunction")
  

class Human(ThingWithName):
  height: str = Field(..., alias="height")


class QueryData(BaseModel):
  hero: list[Union[Droid, Human, ThingWithName]] = Field(..., alias="hero")
  
  class Config:
    # This is set so pydantic can properly match the data to union, i.e., properly infer the correct type
    smart_union = True
    extra = 'forbid'

EDIT: adjusted the sub-types to actually inherit attributes from the interface type.

The data from that query can then be turned into a type via:

query_data: dict[Any, Any] = do_query()
typed_query_data: QueryData = QueryData(**query_data)  # Pydantic does the mapping magic

My understanding is that this is in accordance with the above mentioned article. However, you mention another interesting aspect

but it basically boils down to gave a huge set of types in an union, even if the fields are all the same.

Lets assume the inline fragments would be like this

    ... on Droid {
      primaryFunction
    }
    ... on Human {
      primaryFunction
    }

I.e., the subtypes all query exactly the same attributes. Im not sure if such a query makes any sense. But in that case it would be impossible to properly infer the type from the untyped query data. Is that what you mean?

Sorry if I again misunderstood the core issue and thank you for bearing with me on this one 🙇

@j30ng
Copy link

j30ng commented Dec 13, 2022

Hi @fishi0x01.
@barbieri's point, as I understood, is that when you have

query HeroForEpisode($ep: Episode!) {
  hero(episode: $ep) {
    name
    ... on Droid {
      primaryFunction
    }
    ... on Human {
      height
    }
  }
}

query HeroByNameContaining($nameContains: String!) {
  droid(id: $id) {
    name
    primaryFunction
  }
  human(id: $id) {
    name
    height
  }
}

the generator would generate something similar to what you wrote

class HeroForEpisode_Hero(BaseModel):
  name: str = Field(..., alias="name")


class HeroForEpisode_FragmentOnDroid(HeroForEpisode_Hero):
  primary_function: str = Field(..., alias="primaryFunction")
  

class HeroForEpisode_FragmentOnHuman(HeroForEpisode_Hero):
  height: str = Field(..., alias="height")


class HeroForEpisode(BaseModel):
  hero: list[Union[HeroForEpisode_FragmentOnDroid, HeroForEpisode_FragmentOnHuman, HeroForEpisode_Hero]] = Field(..., alias="hero")


class HeroByNameContaining_Droid(BaseModel):
  name: str = Field(..., alias="name")
  primary_function: str = Field(..., alias="primaryFunction")


class HeroByNameContaining_Human(BaseModel):
  name: str = Field(..., alias="name")
  height: str = Field(..., alias="height")


class HeroByNameContaining(BaseModel):
  droid: HeroByNameContaining_Droid = Field(..., alias="droid")  # which is not 'HeroForEpisode_FragmentOnDroid'
  human: HeroByNameContaining_Human = Field(..., alias="human")  # which is not 'HeroForEpisode_FragmentOnHuman'

HeroByNameContaining_Human and HeroForEpisode_FragmentOnHuman represent basically the same data structure, but are not compatible. Also in TypeScript's type system, you would have no problem passing HeroByNameContaining.human where typeof HeroForEpisode.hero is expected, but in Python you might not (I might be wrong on this, since I don't know about pydantic very much). But also as @barbieri said you could get around a lot by basically having all the possible data structures as named fragments that can be shared between queries.

Can I offer to help with thie issue @barbieri? What's your thoughts on this issue, since it's been some time?

@fishi0x01
Copy link

fishi0x01 commented Dec 14, 2022

@j30ng thank you for the clarification :)

In our case we did not necessarily need a gql client, but we were in urgent need of a generator for pydantic dataclasses.
For that reason, it was decided to go ahead and implement this https://github.com/app-sre/qenerate

Besides inline fragments as you showed above, it also supports fragment {} definitions. We are also using it actively in one of our bigger projects. You can find some examples there on how the generated classes and their corresponding gql definitions look like. Please excuse this shameless plug, but maybe it also helps for your use-case or it might serve as an inspiration to implement a similar functionality in sgqlc.

Note however, that it is not a gql client like sgqlc. It is purely a generator for the dataclasses. Fetching the raw, untyped data from a gql server must still happen through some client library. Sgqlc offers a more holistic approach here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants