Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(connlib): use events for pushing updated resource list #5035

Merged
merged 3 commits into from
May 20, 2024

Conversation

thomaseizinger
Copy link
Member

@thomaseizinger thomaseizinger commented May 20, 2024

The API of connlib is designed around a uni-directional dataflow where commands flow one way and events flow the other way. By design, this creates a system of eventual consistency: We don't exactly know when connlib will emit an event. This is important because it gives us flexibility in what the internals of connlib look like. It also forces the downstream apps to be able to handle any event at any point which avoids bugs where clients rely on a certain order that may just be an implementation detail.

To achieve all of this, it is important that we don't introduce APIs with return values. As soon as a function returns a value, it commits to being able to compute this return value synchronously. Any refactoring that may make the computation of the return value asynchronous is then a breaking change.

Consequently, APIs like handle_timeout should never return a value. Instead, they should queue an event that the layer above reacts to accordingly.

Copy link

vercel bot commented May 20, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
firezone ✅ Ready (Inspect) Visit Preview 💬 Add feedback May 20, 2024 9:06pm

@github-actions github-actions bot added the kind/refactor Code refactoring label May 20, 2024
Copy link

github-actions bot commented May 20, 2024

Performance Test Results

TCP

Test Name Received/s Sent/s Retransmits
direct-tcp-client2server 238.8 MiB (-2%) 240.1 MiB (-2%) 241 (-1%)
direct-tcp-server2client 244.5 MiB (+0%) 246.0 MiB (+0%) 122 (-81%)
relayed-tcp-client2server 229.9 MiB (-0%) 230.8 MiB (-0%) 276 (+5%)
relayed-tcp-server2client 239.1 MiB (+0%) 239.6 MiB (+0%) 356 (-13%)

UDP

Test Name Total/s Jitter Lost
direct-udp-client2server 500.0 MiB (-0%) 0.03ms (+5%) 40.76% (-4%)
direct-udp-server2client 500.0 MiB (+0%) 0.01ms (+7%) 22.16% (+1%)
relayed-udp-client2server 500.0 MiB (-0%) 0.04ms (+2%) 56.49% (+3%)
relayed-udp-server2client 500.0 MiB (-0%) 0.02ms (+95%) 42.43% (+11%)

Copy link
Collaborator

@ReactorScram ReactorScram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried it on Windows and it gets the resources fine.
Getting rid of the callbacks eventually sounds great.
I left a comment asking if we could even remove the resource list from the buffered events, since the Clients (AFAIK) will never care about anything but the most recent resource list.

rust/connlib/clients/shared/src/eventloop.rs Show resolved Hide resolved
Copy link
Collaborator

@conectado conectado left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM,

Though wrt the description, handle_timeout changes are internal to connlib, it didn't expose any of the changes to the external API, so what do you mean with:

Any refactoring that may make the computation of the return value asynchronous is then a breaking change.

@thomaseizinger
Copy link
Member Author

thomaseizinger commented May 20, 2024

LGTM,

Though wrt the description, handle_timeout changes are internal to connlib, it didn't expose any of the changes to the external API, so what do you mean with:

Any refactoring that may make the computation of the return value asynchronous is then a breaking change.

For the state machine tests, the public API is the API of the states. I consider Tunnel to be just an integration layer between Io and the states and that layer shouldn't contain any logic. handle_timeout is part of the API of the states.

Additionally to that, it is still an abstraction layer even if the module boundary isn't the crate boundary in this case. Breaking changes to module boundaries can still be annoying during refactorings because you suddenly gotta adapt lots of tests or other callsites so I think it is worth it to try and uphold them even if they are internal APIs :)
This is basically what we have here: The API of the states is what we write the tests against. So if we want confidence in the tests, we need to make sure the integration layer above is "dumb".

@conectado
Copy link
Collaborator

LGTM,
Though wrt the description, handle_timeout changes are internal to connlib, it didn't expose any of the changes to the external API, so what do you mean with:

Any refactoring that may make the computation of the return value asynchronous is then a breaking change.

For the state machine tests, the public API is the API of the states. I consider Tunnel to be just an integration layer between Io and the states and that layer shouldn't contain any logic. handle_timeout is part of the API of the states.

Gotcha, that makes sense. Maybe a re-naming is in order? Something that makes it clearer Tunnel is just an integration layer.

Though Tunnel is what makes the most sense for downstream crates, but instead of calling it ClientState we can call it ClientModel(and GatewayModel) or something like that, what do you think? I'm not convinced with that name but State ,though accurate, I'm not sure conveys the idea of "this is where the real API" is

Copy link

Terraform Cloud Plan Output

Plan: 15 to add, 15 to change, 15 to destroy.

Terraform Cloud Plan

@thomaseizinger
Copy link
Member Author

LGTM,
Though wrt the description, handle_timeout changes are internal to connlib, it didn't expose any of the changes to the external API, so what do you mean with:

Any refactoring that may make the computation of the return value asynchronous is then a breaking change.

For the state machine tests, the public API is the API of the states. I consider Tunnel to be just an integration layer between Io and the states and that layer shouldn't contain any logic. handle_timeout is part of the API of the states.

Gotcha, that makes sense. Maybe a re-naming is in order? Something that makes it clearer Tunnel is just an integration layer.

Though Tunnel is what makes the most sense for downstream crates, but instead of calling it ClientState we can call it ClientModel(and GatewayModel) or something like that, what do you think? I'm not convinced with that name but State ,though accurate, I'm not sure conveys the idea of "this is where the real API" is

Agreed! I was going to send another PR with a docs update to Tunnel to make this clearer but forgot :(

Is just Client and Gateway too simple? State and Model are both kind of "nothing-saying" modifiers (like Service, Manager etc). Back when I named this, I chose State because it at least conveys that it should be pure (in the functional sense).

I'll have a word with ChatGPT and see what we can come up with.

@thomaseizinger thomaseizinger added this pull request to the merge queue May 20, 2024
Merged via the queue into main with commit 3fba574 May 20, 2024
135 checks passed
@thomaseizinger thomaseizinger deleted the chore/handle-timeout-no-return-values branch May 20, 2024 21:34
github-merge-queue bot pushed a commit that referenced this pull request May 24, 2024
This is a follow-up to #5035. I didn't end up renaming `Tunnel`,
`GatewayState` or `ClientState` but I've added some comments with my
understanding of what the state is we are tracking and tried to group
the fields together in a logical way.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/refactor Code refactoring
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants