reef: qa: add a YAML to ignore MGR_DOWN warning #57565

Open · 1 commit to be merged into base: reef

dparmar18 (Contributor) commented May 20, 2024

Backport of #56944
Backport tracker: https://tracker.ceph.com/issues/66061
Parent tracker: https://tracker.ceph.com/issues/65265

Root cause analysis (RCA) showed that the NFS code is not what led to the warning, since the warning occurred before the test cases started to execute. After some discussion with Venky and Greg, it was found that recent clog changes cause this warning to be added to the clog.

Digging further, it was found that the warning is generated when `mgr fail` is run while no mgr is available. The mgr becomes unavailable because, when `setup_mgrs()` in class `MgrTestCase` stops the mgr daemons, the mgr sometimes just crashes (`mgr handle_mgr_signal *** Got signal Terminated ***`). `mgr fail` (also part of `setup_mgrs()`) is then run, and the `MGR_DOWN` warning is generated.

This warning is only evident in nfs because it is the only fs suite that makes use of class `MgrTestCase`. To support my analysis, I ran about eight jobs in teuthology and could not reproduce this warning. Since it does not harm the execution of the NFS test cases, and the logs do mention that the mgr daemon got restarted (`INFO:tasks.cephadm.mgr.x:Restarting mgr.x (starting--it wasn't running)...`), it is reasonable to conclude that ignoring this warning is the simplest solution.
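
For illustration, an override of the kind this PR adds might look like the fragment below. This is a minimal sketch, not the contents of the actual commit: it assumes the `overrides` / `ceph` / `log-ignorelist` convention used by other YAML files in the qa suites, and the exact pattern shown is an assumption.

```yaml
# Hypothetical override fragment; file name, location and exact pattern
# are assumptions, not taken from the commit.
overrides:
  ceph:
    log-ignorelist:
      # Regex matched against cluster log lines; the parentheses are
      # escaped so the health-check code is matched literally as "(MGR_DOWN)".
      - \(MGR_DOWN\)
```

With an entry like this in place, a cluster log line containing `(MGR_DOWN)` should no longer, on its own, cause a job to be flagged as failed.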

Fixes: https://tracker.ceph.com/issues/65265
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
(cherry picked from commit 7d954ce)

github-actions bot added the cephfs (Ceph File System) label May 20, 2024
github-actions bot added this to the reef milestone May 20, 2024
@leonid-s-usov
Contributor

jenkins test api

@leonid-s-usov
Contributor

jenkins test make check
