CSL Radio::ReceiveAt failed in some cases #9025
Comments
Hi @Irving-cl, thanks for the investigation. In the past I've seen cases in which This seems to be more severe, with all subsequent receive slots missed as well. Do you think the 2 ms margin will be enough, or will the times eventually overlap again? On the other hand, if the problem is harder to reproduce with shorter CSL periods, does it make sense to apply the same margin? Could the margin be proportional to the CSL period value?
Thanks! Setting the margin proportional to the CSL period sounds good to me. I'll run more tests to see what ratio is appropriate. Just to check with you: if the margin is fixed, does it have any impact for short CSL periods? As I understand it, the radio still keeps sleeping during the margin time.
It shouldn't have a big impact; it just doesn't seem natural to increase the margin unnecessarily. Since this is added to
@edmont Makes sense to me.
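The proportional-margin idea discussed above could be sketched roughly as follows. This is only an illustration, not OpenThread code: the function name, the 400 ppm ratio, and the 500 us lower bound are hypothetical placeholders (the thread explicitly leaves the ratio to further testing), and only the 2 ms upper bound comes from the tests reported in this issue.

```cpp
#include <cstdint>

// All names and constants below are hypothetical, chosen for illustration only.
constexpr uint32_t kMinMarginUs = 500;  // assumed lower bound for short CSL periods
constexpr uint32_t kMaxMarginUs = 2000; // 2 ms, the fixed margin tested in this issue
constexpr uint32_t kMarginPpm   = 400;  // assumed ratio: 400 us of margin per second of CSL period

// Returns a timer margin that grows with the CSL period and is clamped to
// [kMinMarginUs, kMaxMarginUs]. The multiplication is done in 64 bits so that
// long periods cannot overflow.
constexpr uint32_t CslTimerMarginUs(uint32_t aPeriodUs)
{
    return (static_cast<uint64_t>(aPeriodUs) * kMarginPpm / 1000000u < kMinMarginUs)
               ? kMinMarginUs
               : (static_cast<uint64_t>(aPeriodUs) * kMarginPpm / 1000000u > kMaxMarginUs)
                     ? kMaxMarginUs
                     : static_cast<uint32_t>(static_cast<uint64_t>(aPeriodUs) * kMarginPpm / 1000000u);
}
```

With these placeholder values a 5 s period gets the full 2 ms margin, while a 1 s period is held at the lower bound.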
@Irving-cl Thanks for looking further into this. It could be useful to include the build/commit id for both openthread and the Nordic platform code used for the experiments. Looking at the latest Nordic platform code, it has a setting for Maybe it would be useful to find out or describe here why the task processor calls tasks too late in this particular case, and/or which tasks are taking so long to execute in this case. And why these tasks don't take long with the shorter sleep periods :-)
Thanks! You're right. It's weird that Note that this issue only happens with the Regarding the build/commit, I used the latest commit for the CSL receiver. However, as I mentioned, the CSL transmitter is a Thread product. When I tested with two DKs, the issue didn't occur. This is kind of interesting. But as far as I can tell, the issue is on the receiver, because the tx time of the transmitter is correct and I can see the errors of
@Irving-cl Maybe it's worth trying to measure this again after #9322 has been completed (on CSL timestamp reference points). Also, the CSL code has changed in the meantime; it now uses
Describe the bug
Recently I have been running some stricter tests of CSL transmission. The CSL transmitter is a Thread product (a border router using host+RCP mode) and the CSL receiver is ot-cli-ftd on nRF52840DK. I sometimes see Radio::ReceiveAt fail with an error, and as a result the CSL transmission fails. This doesn't happen all the time: when the CSL period is set to 5 s it happens a lot, while with CSL periods of 1, 2, or 3 s it works well.
Debug and Analysis
I debugged this issue by adding logs on the CSL receiver and checking them through JLink. The failure was indeed caused by the call to Radio::ReceiveAt failing. I added the logs in SubMac::HandleCslTimer and looked at the output for the failed cases.
In that output, param is the value of mCslSampleTime.GetValue() - periodUs - timeAhead, which is the parameter passed to ReceiveAt. In the failed cases, when ReceiveAt is called, param is close to (or even later than) the timestamp of the log line. For example: the log timestamp (when ReceiveAt is called) is 17855, and param is also 17855(811).
That means that by the time ReceiveAt is called, the requested start time has already passed. I guess that's the reason ReceiveAt failed.
Proposed Solution
I think we can let the timer fire a little earlier than CslSampleTime - Ahead. This advance is not for the receive window; it compensates for the accuracy of the SubMac timer. If the timer is set to fire at exactly the time that is also passed to ReceiveAt, the timer sometimes fires later than expected and ReceiveAt fails.
So I think we can fire the timer 1-2 ms earlier.
As I understand it, this advance won't have any negative effect, since the receive will still start at the specified time.
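To make the proposal concrete, here is a minimal sketch of the scheduling arithmetic. It is not the SubMac implementation: the struct, function names, and the simplified start-time formula (sample time minus the ahead time) are assumptions for illustration, and timestamps are assumed to be 32-bit microsecond values that may wrap. The point is that the timer fires a margin earlier while the time handed to ReceiveAt stays unchanged.

```cpp
#include <cstdint>

// Hypothetical names for illustration; not taken from the OpenThread sources.
constexpr uint32_t kTimerMarginUs = 2000; // 2 ms, the margin tested in this issue

struct CslSchedule
{
    uint32_t timerFireUs; // when the SubMac timer should fire
    uint32_t receiveAtUs; // start time that would be passed to ReceiveAt
};

// Wrap-safe "a is strictly before b" for 32-bit microsecond timestamps.
constexpr bool IsBefore(uint32_t a, uint32_t b)
{
    return static_cast<int32_t>(a - b) < 0;
}

// The receive start time is unchanged; only the timer is moved earlier by the margin.
constexpr CslSchedule ScheduleCslReceive(uint32_t aSampleTimeUs,
                                         uint32_t aTimeAheadUs,
                                         uint32_t aMarginUs = kTimerMarginUs)
{
    return CslSchedule{aSampleTimeUs - aTimeAheadUs - aMarginUs, aSampleTimeUs - aTimeAheadUs};
}

// True if a request made at aNowUs still names a start time in the future.
// The failed cases in this issue correspond to this returning false.
constexpr bool StartTimeStillAhead(uint32_t aNowUs, uint32_t aReceiveAtUs)
{
    return IsBefore(aNowUs, aReceiveAtUs);
}
```

If the timer handler runs at or shortly after timerFireUs, the margin leaves room for timer latency before receiveAtUs is reached; without the margin, any latency at all puts the start time in the past.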
I ran some tests with 2 ms as the timer margin and got good results.
In this case, param is 1~2 ms after the log timestamp.
Any thoughts? @EskoDijk @edmont
If you think it's fine I'll create a PR.