Active streams affect yamux::connection::rtt #5387

Open
0x7CFE opened this issue May 14, 2024 · 10 comments

@0x7CFE
Contributor

0x7CFE commented May 14, 2024

Summary

I noticed that yamux::connection::rtt reports round-trip times of up to 70 seconds while a stream is actively transferring data, especially on slower connections.

When the stream stops transferring data, the RTT drops back to normal values of a few hundred ms.

Expected behavior

I expect active streams to make no difference: the ping/pong frames used for the RTT measurement should be delivered promptly regardless of stream traffic.

Actual behavior

The PONG message can be delayed by up to a minute even though the channel itself is 12 Mbit/s and the real latency is around 250 ms.

Relevant log output

2024-05-14T16:35:29.026725Z DEBUG new_established_connection{remote_addr=/ip4/65.109.94.118/tcp/9000/p2p/12D3KooWGaXrPiY8TbiNvLHXEvS8P9yFJF5gUoLEahzXWdk9if9y id=1 peer=12D3KooWGaXrPiY8TbiNvLHXEvS8P9yFJF5gUoLEahzXWdk9if9y}:Connection::poll: yamux::connection::rtt: sending ping 3710280974    
2024-05-14T16:35:31.799514Z  INFO p2p::protocol: Received 63.2% of state data (47185920 of 74680770) ...    
2024-05-14T16:35:35.369081Z  INFO p2p::protocol: Received 64.6% of state data (48234496 of 74680770) ...    
2024-05-14T16:35:35.562127Z DEBUG p2p::service: Timer tick    
2024-05-14T16:35:38.503010Z  INFO p2p::protocol: Received 66.0% of state data (49283072 of 74680770) ...    
2024-05-14T16:35:42.043910Z  INFO p2p::protocol: Received 67.4% of state data (50331648 of 74680770) ...    
2024-05-14T16:35:45.555739Z  INFO p2p::protocol: Received 68.8% of state data (51380224 of 74680770) ...    
2024-05-14T16:35:45.555755Z DEBUG yamux::connection::stream::flow_control: old window_max: 32 mb, new window_max: 64 mb    
2024-05-14T16:35:45.562532Z DEBUG p2p::service: Timer tick    
2024-05-14T16:35:49.110664Z  INFO p2p::protocol: Received 70.2% of state data (52428800 of 74680770) ...    
2024-05-14T16:35:53.237703Z  INFO p2p::protocol: Received 71.6% of state data (53477376 of 74680770) ...    
2024-05-14T16:35:55.562952Z DEBUG p2p::service: Timer tick    
2024-05-14T16:35:57.297695Z  INFO p2p::protocol: Received 73.0% of state data (54525952 of 74680770) ...    
2024-05-14T16:36:00.752351Z  INFO p2p::protocol: Received 74.4% of state data (55574528 of 74680770) ...    
2024-05-14T16:36:04.265680Z  INFO p2p::protocol: Received 75.8% of state data (56623104 of 74680770) ...    
2024-05-14T16:36:05.562676Z DEBUG p2p::service: Timer tick    
2024-05-14T16:36:07.709742Z  INFO p2p::protocol: Received 77.2% of state data (57671680 of 74680770) ...    
2024-05-14T16:36:10.735264Z  INFO p2p::protocol: Received 78.6% of state data (58720256 of 74680770) ...    
2024-05-14T16:36:15.066334Z  INFO p2p::protocol: Received 80.0% of state data (59768832 of 74680770) ...    
2024-05-14T16:36:15.562633Z DEBUG p2p::service: Timer tick    
2024-05-14T16:36:19.116259Z  INFO p2p::protocol: Received 81.4% of state data (60817408 of 74680770) ...    
2024-05-14T16:36:22.563326Z  INFO p2p::protocol: Received 82.8% of state data (61865984 of 74680770) ...    
2024-05-14T16:36:25.562763Z DEBUG p2p::service: Timer tick    
2024-05-14T16:36:25.720506Z  INFO p2p::protocol: Received 84.2% of state data (62914560 of 74680770) ...    
2024-05-14T16:36:28.427539Z  INFO p2p::protocol: Received 85.6% of state data (63963136 of 74680770) ...    
2024-05-14T16:36:31.106017Z  INFO p2p::protocol: Received 87.1% of state data (65011712 of 74680770) ...    
2024-05-14T16:36:34.106333Z  INFO p2p::protocol: Received 88.5% of state data (66060288 of 74680770) ...    
2024-05-14T16:36:35.562594Z DEBUG p2p::service: Timer tick    
2024-05-14T16:36:38.267050Z  INFO p2p::protocol: Received 89.9% of state data (67108864 of 74680770) ...    
2024-05-14T16:36:42.433313Z DEBUG new_established_connection{remote_addr=/ip4/65.109.94.118/tcp/9000/p2p/12D3KooWGaXrPiY8TbiNvLHXEvS8P9yFJF5gUoLEahzXWdk9if9y id=1 peer=12D3KooWGaXrPiY8TbiNvLHXEvS8P9yFJF5gUoLEahzXWdk9if9y}:Connection::poll: yamux::connection::rtt: received pong 3710280974, estimated round-trip-time 73.406589082s    

Possible Solution

Check whether the stream API properly shares the connection when multiplexing. Ping/pong and other control frames should probably be given higher priority than stream data.
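
As an illustration of that direction (not the actual yamux implementation), here is a minimal standalone sketch of a frame scheduler that always drains control frames before stream data:

// Illustrative sketch only, not yamux source code: keeping control frames
// (PING/PONG, GO_AWAY) in a separate queue that is always drained first
// lets them overtake any amount of queued stream data.
use std::collections::VecDeque;

enum Frame {
    Control(&'static str), // e.g. "PONG"
    Data(Vec<u8>),
}

#[derive(Default)]
struct Scheduler {
    control: VecDeque<Frame>,
    data: VecDeque<Frame>,
}

impl Scheduler {
    fn next_to_send(&mut self) -> Option<Frame> {
        // Control frames take strict priority over stream data.
        if let Some(frame) = self.control.pop_front() {
            return Some(frame);
        }
        self.data.pop_front()
    }
}

fn main() {
    let mut scheduler = Scheduler::default();
    scheduler.data.push_back(Frame::Data(vec![0u8; 16 * 1024]));
    scheduler.control.push_back(Frame::Control("PONG"));
    // The PONG is sent first even though the data frame was queued earlier.
    assert!(matches!(scheduler.next_to_send(), Some(Frame::Control("PONG"))));
}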

Version

libp2p = { version = "0.53.2", features = [
    "tokio",
    "quic",
    "dns",
    "kad",
    "mdns",
    "noise",
    "macros",
    "tcp",
    "websocket",
    "yamux",
    "request-response",
    "cbor",
    "serde",
] }
libp2p-swarm = "0.44.1"
libp2p-stream = "0.1.0-alpha"

Would you like to work on fixing this bug?

Maybe

@dariusc93
Contributor

Did you make any configuration changes to yamux?

@mxinden
Member

mxinden commented May 16, 2024

Thank you for reporting @0x7CFE!

Did you make any configuration changes to yamux?

Please link or post your configuration.

Which yamux version are you using? Can you check your Cargo.lock?

The latest yamux version should prioritize responding with PONG:

https://github.com/libp2p/rust-yamux/blob/24561a62ba6e244ef65f72883c0d791e6ec17b1f/yamux/src/connection.rs#L402-L411

That said, it does prioritize sending over receiving. Though especially on slow connections, I expect the socket to return Poll::Pending regularly, and thus give a PONG the chance to be queued.

Based on the logs above, I assume you are seeing the large rtt on the receiver side.

In case you are using yamux v0.13.1, my suspicion would be that your pending_frames queue is large and thus delays pongs:

https://github.com/libp2p/rust-yamux/blob/618b141786ea29fa5b7617cd2d97544e18a67b00/yamux/src/connection.rs#L289

In case you are using yamux v0.13.2, I am out of ideas for now.

@0x7CFE
Contributor Author

0x7CFE commented May 16, 2024

Did you make any configuration changes to yamux?

No, everything is default. The only setting I changed was the idle connection timeout:

        let swarm = {
            // let pings = libp2p_ping::Behaviour::new(libp2p_ping::Config::new());
            let tasks = cbor::Behaviour::new(
                [(StreamProtocol::new("/grad/tasks/1"), ProtocolSupport::Full)],
                request_response::Config::default(),
            );

            let state = libp2p_stream::Behaviour::new();

            libp2p::SwarmBuilder::with_existing_identity(net_keypair)
                .with_tokio()
                // .with_quic()
                .with_tcp(
                    tcp::Config::default(),
                    noise::Config::new,
                    yamux::Config::default,
                )?
                .with_behaviour(|_| MyBehaviour { tasks, state })?
                .with_swarm_config(|c| c.with_idle_connection_timeout(Duration::from_secs(300)))
                .build()
        };
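
For a second latency reading on the same connection, the ping protocol that is commented out above could be re-enabled; a minimal sketch, assuming the libp2p_ping crate from that commented-out line and the libp2p "macros" feature (the behaviour struct is illustrative; the real one would also carry the tasks and state behaviours):

// Hedged sketch: the libp2p ping protocol measures RTT on its own stream,
// giving an application-level number to compare against yamux::connection::rtt.
// Note it runs over the same multiplexed connection, so it is subject to the
// same queuing — which is exactly what makes the comparison a useful data point.
use std::time::Duration;
use libp2p::swarm::NetworkBehaviour;

#[derive(NetworkBehaviour)]
struct PingOnlyBehaviour {
    ping: libp2p_ping::Behaviour,
}

fn ping_behaviour() -> libp2p_ping::Behaviour {
    // Ping every 10 seconds; each measured RTT surfaces as a swarm event.
    libp2p_ping::Behaviour::new(libp2p_ping::Config::new().with_interval(Duration::from_secs(10)))
}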

@0x7CFE
Contributor Author

0x7CFE commented May 16, 2024

Which yamux version are you using? Can you check your Cargo.lock?

My cargo tree is:

│   │   ├── libp2p-yamux v0.45.1
│   │   │   ├── either v1.10.0
│   │   │   ├── futures v0.3.30 (*)
│   │   │   ├── libp2p-core v0.41.2 (*)
│   │   │   ├── thiserror v1.0.57 (*)
│   │   │   ├── tracing v0.1.40 (*)
│   │   │   ├── yamux v0.12.1
│   │   │   │   ├── futures v0.3.30 (*)
│   │   │   │   ├── log v0.4.21 (*)
│   │   │   │   ├── nohash-hasher v0.2.0
│   │   │   │   ├── parking_lot v0.12.1 (*)
│   │   │   │   ├── pin-project v1.1.4 (*)
│   │   │   │   ├── rand v0.8.5 (*)
│   │   │   │   └── static_assertions v1.1.0
│   │   │   └── yamux v0.13.1
│   │   │       ├── futures v0.3.30 (*)
│   │   │       ├── instant v0.1.12 (*)
│   │   │       ├── log v0.4.21 (*)
│   │   │       ├── nohash-hasher v0.2.0
│   │   │       ├── parking_lot v0.12.1 (*)
│   │   │       ├── pin-project v1.1.4 (*)
│   │   │       ├── rand v0.8.5 (*)
│   │   │       └── static_assertions v1.1.0

Based on the logs above, I assume you are seeing the large rtt on the receiver side.

Yes, it's the receiver side that sent the ping ages ago.

In case you are using yamux v0.13.2, I am out of ideas for now.

Thanks for the hint, I will try to update and see what happens.

@0x7CFE
Contributor Author

0x7CFE commented May 16, 2024

I did cargo update -p yamux (along with libp2p, libp2p-yamux, libp2p-stream, libp2p-swarm) and now see the following:

2024-05-16T09:50:54.958461Z DEBUG yamux::connection::stream::flow_control: old window_max: 0.5 mb, new window_max: 1 mb    
2024-05-16T09:51:02.677801Z DEBUG p2p::service: Timer tick    
2024-05-16T09:51:05.775626Z DEBUG new_established_connection{remote_addr=/ip4/x.x.x.x/tcp/9000/p2p/12D3KooWGaXrPiY8TbiNvLHXEvS8P9yFJF5gUoLEahzXWdk9if9y id=1 peer=12D3KooWGaXrPiY8TbiNvLHXEvS8P9yFJF5gUoLEahzXWdk9if9y}:Connection::poll: yamux::connection::rtt: sending ping 1034422600    
2024-05-16T09:51:12.678448Z DEBUG p2p::service: Timer tick    
2024-05-16T09:51:20.814787Z  INFO p2p::protocol: Received 1.4% of state data (1048576 of 74680770) ...    
2024-05-16T09:51:22.678269Z DEBUG p2p::service: Timer tick    
2024-05-16T09:51:32.677825Z DEBUG p2p::service: Timer tick    
2024-05-16T09:51:42.677642Z DEBUG p2p::service: Timer tick    
2024-05-16T09:51:52.678642Z DEBUG p2p::service: Timer tick    
2024-05-16T09:52:02.678515Z DEBUG p2p::service: Timer tick    
2024-05-16T09:52:12.678062Z DEBUG p2p::service: Timer tick    
2024-05-16T09:52:22.677501Z DEBUG p2p::service: Timer tick    
2024-05-16T09:52:32.678470Z DEBUG p2p::service: Timer tick    
2024-05-16T09:52:33.683171Z DEBUG new_established_connection{remote_addr=/ip4/x.x.x.x/tcp/9000/p2p/12D3KooWGaXrPiY8TbiNvLHXEvS8P9yFJF5gUoLEahzXWdk9if9y id=1 peer=12D3KooWGaXrPiY8TbiNvLHXEvS8P9yFJF5gUoLEahzXWdk9if9y}:Connection::poll: yamux::connection::rtt: received pong 1034422600, estimated round-trip-time 87.907605325s    
2024-05-16T09:52:37.918470Z DEBUG yamux::connection::stream::flow_control: old window_max: 1 mb, new window_max: 2 mb    
2024-05-16T09:52:39.291560Z  INFO p2p::protocol: Received 2.8% of state data (2097152 of 74680770) ...    
2024-05-16T09:52:42.678381Z DEBUG p2p::service: Timer tick    
2024-05-16T09:52:43.913931Z DEBUG new_established_connection{remote_addr=/ip4/x.x.x.x/tcp/9000/p2p/12D3KooWGaXrPiY8TbiNvLHXEvS8P9yFJF5gUoLEahzXWdk9if9y id=1 peer=12D3KooWGaXrPiY8TbiNvLHXEvS8P9yFJF5gUoLEahzXWdk9if9y}:Connection::poll: yamux::connection::rtt: sending ping 2572943090    
2024-05-16T09:52:45.086762Z DEBUG yamux::connection::stream::flow_control: old window_max: 2 mb, new window_max: 4 mb    
2024-05-16T09:52:45.225339Z  INFO p2p::protocol: Received 4.2% of state data (3145728 of 74680770) ...    
2024-05-16T09:52:46.450208Z DEBUG new_established_connection{remote_addr=/ip4/x.x.x.x/tcp/9000/p2p/12D3KooWGaXrPiY8TbiNvLHXEvS8P9yFJF5gUoLEahzXWdk9if9y id=1 peer=12D3KooWGaXrPiY8TbiNvLHXEvS8P9yFJF5gUoLEahzXWdk9if9y}:Connection::poll: yamux::connection::rtt: received pong 2572943090, estimated round-trip-time 2.536338607s    
2024-05-16T09:52:52.677568Z DEBUG p2p::service: Timer tick    
2024-05-16T09:52:54.503397Z  INFO p2p::protocol: Received 5.6% of state data (4194304 of 74680770) ...    
2024-05-16T09:52:56.607530Z DEBUG new_established_connection{remote_addr=/ip4/x.x.x.x/tcp/9000/p2p/12D3KooWGaXrPiY8TbiNvLHXEvS8P9yFJF5gUoLEahzXWdk9if9y id=1 peer=12D3KooWGaXrPiY8TbiNvLHXEvS8P9yFJF5gUoLEahzXWdk9if9y}:Connection::poll: yamux::connection::rtt: sending ping 2577459376    
2024-05-16T09:53:00.015636Z DEBUG new_established_connection{remote_addr=/ip4/x.x.x.x/tcp/9000/p2p/12D3KooWGaXrPiY8TbiNvLHXEvS8P9yFJF5gUoLEahzXWdk9if9y id=1 peer=12D3KooWGaXrPiY8TbiNvLHXEvS8P9yFJF5gUoLEahzXWdk9if9y}:Connection::poll: yamux::connection::rtt: received pong 2577459376, estimated round-trip-time 3.408063635s    
2024-05-16T09:53:00.388459Z  INFO p2p::protocol: Received 7.0% of state data (5242880 of 74680770) ...    
2024-05-16T09:53:02.678509Z DEBUG p2p::service: Timer tick    
2024-05-16T09:53:06.145671Z  INFO p2p::protocol: Received 8.4% of state data (6291456 of 74680770) ...    
2024-05-16T09:53:07.668796Z  INFO p2p::protocol: Received 9.8% of state data (7340032 of 74680770) ...    
2024-05-16T09:53:09.015326Z  INFO p2p::protocol: Received 11.2% of state data (8388608 of 74680770) ...    
2024-05-16T09:53:10.022202Z DEBUG new_established_connection{remote_addr=/ip4/x.x.x.x/tcp/9000/p2p/12D3KooWGaXrPiY8TbiNvLHXEvS8P9yFJF5gUoLEahzXWdk9if9y id=1 peer=12D3KooWGaXrPiY8TbiNvLHXEvS8P9yFJF5gUoLEahzXWdk9if9y}:Connection::poll: yamux::connection::rtt: sending ping 4237553985    
2024-05-16T09:53:10.225469Z DEBUG yamux::connection::stream::flow_control: old window_max: 4 mb, new window_max: 8 mb    
2024-05-16T09:53:10.275404Z  INFO p2p::protocol: Received 12.6% of state data (9437184 of 74680770) ...    
2024-05-16T09:53:11.512246Z DEBUG new_established_connection{remote_addr=/ip4/x.x.x.x/tcp/9000/p2p/12D3KooWGaXrPiY8TbiNvLHXEvS8P9yFJF5gUoLEahzXWdk9if9y id=1 peer=12D3KooWGaXrPiY8TbiNvLHXEvS8P9yFJF5gUoLEahzXWdk9if9y}:Connection::poll: yamux::connection::rtt: received pong 4237553985, estimated round-trip-time 1.490016195s    
2024-05-16T09:53:11.644362Z  INFO p2p::protocol: Received 14.0% of state data (10485760 of 74680770) ...    
2024-05-16T09:53:12.677855Z DEBUG p2p::service: Timer tick    
2024-05-16T09:53:13.347167Z  INFO p2p::protocol: Received 15.4% of state data (11534336 of 74680770) ...    
2024-05-16T09:53:14.903350Z  INFO p2p::protocol: Received 16.8% of state data (12582912 of 74680770) ...    
2024-05-16T09:53:22.178847Z DEBUG new_established_connection{remote_addr=/ip4/x.x.x.x/tcp/9000/p2p/12D3KooWGaXrPiY8TbiNvLHXEvS8P9yFJF5gUoLEahzXWdk9if9y id=1 peer=12D3KooWGaXrPiY8TbiNvLHXEvS8P9yFJF5gUoLEahzXWdk9if9y}:Connection::poll: yamux::connection::rtt: sending ping 2924600396    
2024-05-16T09:53:22.678122Z DEBUG p2p::service: Timer tick    
2024-05-16T09:53:24.332585Z  INFO p2p::protocol: Received 18.3% of state data (13631488 of 74680770) ...    
2024-05-16T09:53:32.678050Z DEBUG p2p::service: Timer tick    

I am sitting in a café with a really poor internet connection (~300 Kb/s) and apparently pings are now moving faster. It could just be a TCP windowing fluke, but still, you can see that the first ping traveled for 87 seconds. At the very same time, the stream was transferring a megabyte chunk of data.

After that hiccup, subsequent pings were delivered within a few seconds. It's way better than before, but still inconsistent.

Could it be that stream frame granularity somehow prevents the multiplexer from scheduling other frames?
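
For scale, a rough back-of-envelope check (assuming "~300 Kb/s" means kilobits per second) of how much queued stream data it would take to explain the delays above:

// Rough consistency check, not a measurement: how long does a PONG wait if it
// is queued behind stream data on a ~300 kbit/s link?
fn main() {
    let link_bytes_per_sec = 300_000.0 / 8.0; // ~37.5 kB/s, assuming kilobits
    // 1 MiB (one chunk of the transfer above) and ~3.3 MB of backlog;
    // ~3.3 MB would account for the ~88 s pong delay seen in the log.
    for queued_bytes in [1_048_576.0_f64, 3_300_000.0] {
        println!(
            "{:>9.0} bytes ahead of the PONG -> ~{:.0} s extra delay",
            queued_bytes,
            queued_bytes / link_bytes_per_sec
        );
    }
}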

@mxinden
Member

mxinden commented May 21, 2024

I am sitting in a café with a really crappy internet (~300 Kb/s) and apparently pings are now moving faster. It could just be a TCP windowing fluke, but still, you can see that the first ping traveled for 87 seconds. At very the same time, the stream was transferring a megabyte chunk of data.

Can you reproduce this behavior in other networks? What latency do you get with ICMP's ping?

@0x7CFE
Contributor Author

0x7CFE commented May 21, 2024

Can you reproduce this behavior in other networks? What latency do you get with ICMP's ping?

Yes, I saw similar behavior on other networks (see the issue description). The ping was a few hundred ms.

@mxinden
Member

mxinden commented May 21, 2024

Are you able to run with a patched yamux version? If so, can you test with:

Unstaged changes (1)
modified   yamux/src/connection.rs
@@ -416,6 +416,33 @@ impl<T: AsyncRead + AsyncWrite + Unpin> Active<T> {
                 Poll::Pending => {}
             }
 
+            if self.pending_read_frame.is_none() {
+                match self.socket.poll_next_unpin(cx) {
+                    Poll::Ready(Some(frame)) => {
+                        match self.on_frame(frame?)? {
+                            Action::None => {}
+                            Action::New(stream) => {
+                                log::trace!("{}: new inbound {} of {}", self.id, stream, self);
+                                return Poll::Ready(Ok(stream));
+                            }
+                            Action::Ping(f) => {
+                                log::trace!("{}/{}: pong", self.id, f.header().stream_id());
+                                self.pending_read_frame.replace(f.into());
+                            }
+                            Action::Terminate(f) => {
+                                log::trace!("{}: sending term", self.id);
+                                self.pending_read_frame.replace(f.into());
+                            }
+                        }
+                        continue;
+                    }
+                    Poll::Ready(None) => {
+                        return Poll::Ready(Err(ConnectionError::Closed));
+                    }
+                    Poll::Pending => {}
+                }
+            }
+
             if self.pending_write_frame.is_none() {
                 match self.stream_receivers.poll_next_unpin(cx) {
                     Poll::Ready(Some((_, Some(StreamCommand::SendFrame(frame))))) => {
@@ -448,33 +475,6 @@ impl<T: AsyncRead + AsyncWrite + Unpin> Active<T> {
                 }
             }
 
-            if self.pending_read_frame.is_none() {
-                match self.socket.poll_next_unpin(cx) {
-                    Poll::Ready(Some(frame)) => {
-                        match self.on_frame(frame?)? {
-                            Action::None => {}
-                            Action::New(stream) => {
-                                log::trace!("{}: new inbound {} of {}", self.id, stream, self);
-                                return Poll::Ready(Ok(stream));
-                            }
-                            Action::Ping(f) => {
-                                log::trace!("{}/{}: pong", self.id, f.header().stream_id());
-                                self.pending_read_frame.replace(f.into());
-                            }
-                            Action::Terminate(f) => {
-                                log::trace!("{}: sending term", self.id);
-                                self.pending_read_frame.replace(f.into());
-                            }
-                        }
-                        continue;
-                    }
-                    Poll::Ready(None) => {
-                        return Poll::Ready(Err(ConnectionError::Closed));
-                    }
-                    Poll::Pending => {}
-                }
-            }
-
             // If we make it this far, at least one of the above must have registered a waker.
             return Poll::Pending;
         }
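
For testing, one way to run against the patch above is Cargo's [patch] mechanism; a hedged sketch where the path is a placeholder for a local checkout of libp2p/rust-yamux with the diff applied:

# Hypothetical Cargo.toml override; the path is a placeholder.
# Only the semver-compatible (0.13.x) yamux dependency is replaced;
# the 0.12.x entry in the tree stays on crates.io.
[patch.crates-io]
yamux = { path = "../rust-yamux/yamux" }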

@mxinden
Member

mxinden commented May 21, 2024

Do I understand correctly, that you are using the same yamux version on both the server and the client?

@0x7CFE
Contributor Author

0x7CFE commented May 21, 2024

Are you able to run with a patched yamux version?

Will try when I have time. Probably this week.

Do I understand correctly, that you are using the same yamux version on both the server and the client?

Yes, both nodes run the same binary, so I'm pretty sure they both use the same version.
