Is there an option to reduce the time the exabgp will re-try the connection establishment after the peering goes down (default seems to be 60s) #1176

Open
ijukic2003 opened this issue Sep 10, 2023 · 6 comments


@ijukic2003

Describe the bug

Is there an option to reduce the interval at which exabgp retries connection establishment after a peering goes down (the default seems to be 60s)? exabgp establishes the BGP peering, and if that connection goes down (for example, because the peer is no longer reachable), exabgp retries the connection only once every 60s.
This is too slow when using exabgp to simulate a huge number of peers connecting at the same time.

@thomas-mangin
Member

thomas-mangin commented Sep 11, 2023

Does this patch make things work better for you?

diff --git a/src/exabgp/reactor/peer.py b/src/exabgp/reactor/peer.py
index 2dc5f5c8..7123f879 100644
--- a/src/exabgp/reactor/peer.py
+++ b/src/exabgp/reactor/peer.py
@@ -419,6 +419,7 @@ class Peer(object):
         self.neighbor.rib.outgoing.replace_restart(previous, current)
         self.neighbor.previous = None
 
+        self._delay.reset()
         while not self._teardown:
             # we are here following a configuration change
             if self._neighbor:

I wrote it at a conference without any testing, so it may not do what it should.
That said, it looks like we were missing a reset of the exponential backoff delay timer when a connection is successfully established, so it should work.
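
For readers unfamiliar with the pattern, here is a minimal sketch of an exponential backoff timer with the kind of reset the patch adds. `BackoffDelay`, its doubling factor and its 1s starting value are assumptions made for illustration. They are not exabgp's actual `_delay` implementation; only the 60s ceiling comes from this thread.

```python
class BackoffDelay:
    """Hypothetical retry timer: grows after each failure, capped at a ceiling."""

    def __init__(self, initial=1.0, factor=2.0, ceiling=60.0):
        self.initial = initial    # assumed first wait, in seconds
        self.factor = factor      # assumed growth factor per failure
        self.ceiling = ceiling    # 60s matches the cap observed in this issue
        self.current = 0.0        # no wait before the very first attempt

    def increase(self):
        """Call after a failed attempt; returns the wait before the next one."""
        if self.current == 0.0:
            self.current = self.initial
        else:
            self.current = min(self.current * self.factor, self.ceiling)
        return self.current

    def reset(self):
        """Call once a session is established, so the next drop retries quickly."""
        self.current = 0.0


delay = BackoffDelay()
waits = [delay.increase() for _ in range(8)]  # eight failures in a row
delay.reset()                                 # session came up
after_reset = delay.increase()                # first failure after the reset
```

Without the `reset()` call, the wait stays at the ceiling after a session has been established, so the first reconnection attempt after a later drop would already be spaced 60s apart.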

@ijukic2003
Author

Hi Thomas,

Sorry for the delay in testing. It is still the same: after the connection goes down, a SYN is sent exactly every 60s.

@thomas-mangin
Member

Thank you for the feedback, I will look into it again.

@thomas-mangin
Member

@ijukic2003 can you please tell me how you are performing your test, and whether you checked the 4.2 or the main branch? (The change was only applied to main.)

Testing by forcing a connection drop in the code seems to work as expected: the connection delay timer no longer increases when the session can be set up.

I was seeing the delay increase after each failed reconnection attempt (up to 60s), but it never jumped to 60s immediately.

@ijukic2003
Author

Hi Thomas,

Yes, sorry, I see now that after the connection drops, it tries to reconnect quickly, and then the interval between retries increases exponentially with every new connection attempt.
The problem is that, in the scale tests I am running, it takes around 4-5 minutes after the connection-down trigger for the network to stabilize and for the peer to be ready to accept the BGP connection again. By that time, the exabgp retry timer has already climbed back to 60s.
Is there any way to make this a configurable option in the code, so that I can set a more aggressive fixed timer?
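
To illustrate the request, here is a rough sketch of how a configurable ceiling would change the retry pattern during a 5 minute outage. The `retry_schedule` function, the doubling factor and the `ceiling` parameter are hypothetical and only model the behaviour described above; none of them is an existing exabgp option.

```python
def retry_schedule(outage, initial=1.0, factor=2.0, ceiling=60.0):
    """Return the times (seconds after the drop) at which a retry is attempted
    during an outage, for a wait that grows by `factor` up to `ceiling`."""
    times = []
    now, wait = 0.0, initial
    while now + wait <= outage:
        now += wait                         # sleep, then attempt to reconnect
        times.append(now)
        wait = min(wait * factor, ceiling)  # every attempt in the outage fails
    return times


# 300s outage with the 60s ceiling seen in this issue (other values assumed)
default = retry_schedule(outage=300)
# same outage with a hypothetical, more aggressive 5s ceiling
aggressive = retry_schedule(outage=300, ceiling=5.0)
```

Under these assumed parameters, the 60s ceiling gives 9 attempts in 300s with the last one at 243s, so the peer can wait up to a minute for the next SYN once it is ready. The 5s ceiling gives 61 attempts, and the peer is retried within 5s of becoming ready.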

@thomas-mangin
Member

As far as I know, this behaviour is now fixed on master. If it needs backporting, let me know.
