Do not panic when recovering from disk failure #354
Closed
We use dragonboat to run an on-disk state machine. To simulate a physical disk failure, we removed all dragonboat data, including the raft log and the NodeHost data, and then restarted the dragonboat process. The process panicked in handleHeartbeatMessage. This is not reasonable, because it means a single disk failure makes the replica fail permanently.
Version: v4.0.0-20231222133740-1d6e2d76cd57
Action: 1. stop process 2. rm -fr /path/to/dragonboat-data/* 3. start process
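The failure can be illustrated with a minimal sketch. This is not dragonboat's code: `entryLog`, `commitTo` and `handleHeartbeat` below are simplified, hypothetical stand-ins that model only the invariant named in the panic message, and they return an error where the real code panics. The assumption is that the leader still remembers this replica's progress from before the wipe, so its heartbeat carries a commit index (46355, which appears as 0xb513 in the stack trace) far beyond the three entries the freshly initialized replica holds.

```go
package main

import "fmt"

// entryLog is a hypothetical, stripped-down stand-in for a raft log.
// It is not dragonboat's internal type.
type entryLog struct {
	committed uint64 // highest index known to be committed
	lastIndex uint64 // highest index actually present in the local log
}

// commitTo models the invariant from the panic message: a replica must
// never be asked to commit past the end of its own log. The sketch returns
// an error instead of panicking so the condition can be inspected.
func (l *entryLog) commitTo(index uint64) error {
	if index <= l.committed {
		return nil // the commit index never moves backwards
	}
	if index > l.lastIndex {
		return fmt.Errorf("invalid commitTo index %d, lastIndex() %d", index, l.lastIndex)
	}
	l.committed = index
	return nil
}

// handleHeartbeat models a follower applying the commit index carried by a
// leader heartbeat (assumed behavior, simplified for illustration).
func handleHeartbeat(l *entryLog, leaderCommit uint64) error {
	return l.commitTo(leaderCommit)
}
```

With a wiped replica modelled as `&entryLog{committed: 0, lastIndex: 3}`, calling `handleHeartbeat(l, 46355)` trips the check, while a commit index of 3 or lower is accepted.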
Log:
5月 20 17:21:08 kk1 xx[193242]: 2024-05-20 17:21:08.872609 D | dragonboat: [00002:00001] on disk SM is beng initialized
5月 20 17:21:08 kk1 xx[193242]: 2024-05-20 17:21:08.882507 I | rsm: [00003:00001] opened disk SM, index 0
5月 20 17:21:08 kk1 xx[193242]: 2024-05-20 17:21:08.882526 I | rsm: [00003:00001] no snapshot available during launch
5月 20 17:21:08 kk1 xx[193242]: 2024-05-20 17:21:08.882541 D | dragonboat: [00003:00001] completed recoverRequested
5月 20 17:21:08 kk1 xx[193242]: 2024-05-20 17:21:08.882556 I | rsm: [00002:00001] opened disk SM, index 0
5月 20 17:21:08 kk1 xx[193242]: 2024-05-20 17:21:08.882560 I | rsm: [00002:00001] no snapshot available during launch
5月 20 17:21:08 kk1 xx[193242]: 2024-05-20 17:21:08.882566 I | rsm: [00004:00001] opened disk SM, index 0
5月 20 17:21:08 kk1 xx[193242]: 2024-05-20 17:21:08.882570 I | rsm: [00004:00001] no snapshot available during launch
5月 20 17:21:08 kk1 xx[193242]: 2024-05-20 17:21:08.882577 I | dragonboat: [00003:00001] initialized using <00003:00001:0>
5月 20 17:21:08 kk1 xx[193242]: 2024-05-20 17:21:08.882597 I | dragonboat: [00003:00001] initial index set to 0
5月 20 17:21:08 kk1 xx[193242]: 2024-05-20 17:21:08.882605 I | dragonboat: [00004:00001] initialized using <00004:00001:0>
5月 20 17:21:08 kk1 xx[193242]: 2024-05-20 17:21:08.882633 I | dragonboat: [00004:00001] initial index set to 0
5月 20 17:21:08 kk1 xx[193242]: 2024-05-20 17:21:08.882645 I | dragonboat: [00002:00001] initialized using <00002:00001:0>
5月 20 17:21:08 kk1 xx[193242]: 2024-05-20 17:21:08.882677 I | dragonboat: [00002:00001] initial index set to 0
5月 20 17:21:08 kk1 xx[193242]: 2024-05-20 17:21:08.882692 D | dragonboat: [00004:00001] completed recoverRequested
5月 20 17:21:08 kk1 xx[193242]: 2024-05-20 17:21:08.882722 D | dragonboat: [00002:00001] completed recoverRequested
5月 20 17:21:08 kk1 xx[193242]: 2024-05-20 17:21:08.882830 I | rsm: [00001:00001] opened disk SM, index 0
5月 20 17:21:08 kk1 xx[193242]: 2024-05-20 17:21:08.882846 I | rsm: [00001:00001] no snapshot available during launch
5月 20 17:21:08 kk1 xx[193242]: 2024-05-20 17:21:08.882862 D | dragonboat: [00001:00001] completed recoverRequested
5月 20 17:21:08 kk1 xx[193242]: 2024-05-20 17:21:08.882882 I | dragonboat: [00001:00001] initialized using <00001:00001:0>
5月 20 17:21:08 kk1 xx[193242]: 2024-05-20 17:21:08.882900 I | dragonboat: [00001:00001] initial index set to 0
5月 20 17:21:08 kk1 xx[193242]: 2024-05-20 17:21:08.895241 W | raft: [f:1,l:3,t:1,c:3,a:0] [00004:00001] t3 received Heartbeat with higher term (66) from n00002
5月 20 17:21:08 kk1 xx[193242]: 2024-05-20 17:21:08.895281 W | raft: [f:1,l:3,t:1,c:3,a:0] [00004:00001] t3 become follower after receiving higher term from n00002
5月 20 17:21:08 kk1 xx[193242]: 2024-05-20 17:21:08.895318 I | raft: [f:1,l:3,t:1,c:3,a:0] [00004:00001] t66 became follower
5月 20 17:21:08 kk1 xx[193242]: 2024-05-20 17:21:08.895325 C | raft: invalid commitTo index 46355, lastIndex() 3
5月 20 17:21:08 kk1 xx[193242]: panic: invalid commitTo index 46355, lastIndex() 3
5月 20 17:21:08 kk1 xx[193242]: goroutine 318 [running]:
5月 20 17:21:08 kk1 xx[193242]: github.com/lni/goutils/logutil/capnslog.(*PackageLogger).Panicf(0x20?, {0x36444c5?, 0xc00012a110?}, {0xc0002b22a0?, 0xc000e5c410?, 0xc001159a18?})
5月 20 17:21:08 kk1 xx[193242]: /root/go/pkg/mod/github.com/lni/goutils@v1.4.0/logutil/capnslog/pkg_logger.go:88 +0xbb
5月 20 17:21:08 kk1 xx[193242]: github.com/lni/dragonboat/v4/logger.(*capnsLog).Panicf(0xc000e5c3f0?, {0x36444c5?, 0x41e225?}, {0xc0002b22a0?, 0x2ef8700?, 0xc18ae361355d7800?})
5月 20 17:21:08 kk1 xx[193242]: /root/go/pkg/mod/github.com/lni/dragonboat/v4@v4.0.0-20231222133740-1d6e2d76cd57/logger/capnslogger.go:74 +0x26
5月 20 17:21:08 kk1 xx[193242]: github.com/lni/dragonboat/v4/logger.(*dragonboatLogger).Panicf(0xb513?, {0x36444c5, 0x29}, {0xc0002b22a0, 0x2, 0x2})
5月 20 17:21:08 kk1 xx[193242]: /root/go/pkg/mod/github.com/lni/dragonboat/v4@v4.0.0-20231222133740-1d6e2d76cd57/logger/logger.go:135 +0x57
5月 20 17:21:08 kk1 xx[193242]: github.com/lni/dragonboat/v4/internal/raft.(*entryLog).commitTo(0xc0002b4310, 0xb513)
5月 20 17:21:08 kk1 xx[193242]: /root/go/pkg/mod/github.com/lni/dragonboat/v4@v4.0.0-20231222133740-1d6e2d76cd57/internal/raft/logentry.go:341 +0x102
5月 20 17:21:08 kk1 xx[193242]: github.com/lni/dragonboat/v4/internal/raft.(*raft).handleHeartbeatMessage(_, {0x11, 0x1, 0x2, 0x4, 0x42, 0x0, 0x0, 0xb513, 0x0, ...})
5月 20 17:21:08 kk1 xx[193242]: /root/go/pkg/mod/github.com/lni/dragonboat/v4@v4.0.0-20231222133740-1d6e2d76cd57/internal/raft/raft.go:1398 +0x48
5月 20 17:21:08 kk1 xx[193242]: github.com/lni/dragonboat/v4/internal/raft.(*raft).handleFollowerHeartbeat(_, {0x11, 0x1, 0x2, 0x4, 0x42, 0x0, 0x0, 0xb513, 0x0, ...})
5月 20 17:21:08 kk1 xx[193242]: /root/go/pkg/mod/github.com/lni/dragonboat/v4@v4.0.0-20231222133740-1d6e2d76cd57/internal/raft/raft.go:2134 +0x85
5月 20 17:21:08 kk1 xx[193242]: github.com/lni/dragonboat/v4/internal/raft.defaultHandle(_, {0x11, 0x1, 0x2, 0x4, 0x42, 0x0, 0x0, 0xb513, 0x0, ...})
5月 20 17:21:08 kk1 xx[193242]: /root/go/pkg/mod/github.com/lni/dragonboat/v4@v4.0.0-20231222133740-1d6e2d76cd57/internal/raft/raft.go:2332 +0x7a
5月 20 17:21:08 kk1 xx[193242]: github.com/lni/dragonboat/v4/internal/raft.(*raft).Handle(_, {0x11, 0x1, 0x2, 0x4, 0x42, 0x0, 0x0, 0xb513, 0x0, ...})
5月 20 17:21:08 kk1 xx[193242]: /root/go/pkg/mod/github.com/lni/dragonboat/v4@v4.0.0-20231222133740-1d6e2d76cd57/internal/raft/raft.go:1601 +0x102
5月 20 17:21:08 kk1 xx[193242]: github.com/lni/dragonboat/v4/internal/raft.(*Peer).Handle(_, {0x11, 0x1, 0x2, 0x4, 0x42, 0x0, 0x0, 0xb513, 0x0, ...})