ceph logs are not pruned/cleaned up for components that are no longer scheduled on a node #14202
This is definitely a challenging edge condition. On the one hand, when a component moves to another node, we don't necessarily want to immediately delete that component's logs, in case the move was caused by a failure someone would want to debug. At the same time, I do think that since Rook is keeping and rotating on-disk logs, Rook should also take care to clean up any old lingering files with some regularity. Relying on user environments to do cleanup would risk a too-many-cooks scenario that could end up interfering with the existing log rotation mechanism.

Logs are currently managed and rotated via a container sidecar mechanism. To handle broader-scope cleanup, Rook would probably need to run a rotator daemon on each node, or periodically run a rotation job on every node. If Rook were to implement either mechanism, that would basically make the existing log rotator sidecar containers obsolete. Overall, that is probably a good thing, since those sidecars each eat up a portion of node CPU/mem resources. A per-node rotator daemonset would be more resource efficient. A Job that runs on a cron would be even more resource efficient, but Rook would have to create a Job per node, which would be harder to implement in code. A good initial implementation could use a daemonset.

I think Seb added the collector sidecars initially, but @subhamkrai made some modifications in recent memory. Let's get his and @travisn's input here also.

At first glance, I don't think we can put a super high priority on this work. However, there are likely other users who are (or are about to be) encountering this scenario themselves, and being able to reduce overall resource consumption is a compelling factor. For me, I'd like to see this make it into 1.15 or 1.16 if possible, and this seems like something we could consider backporting to 1.14.
Without requiring any code changes, what if we had a script running in a daemonset (added to the examples folder) that periodically deletes stale log files from the host log directory? Users who notice this issue could choose to deploy the daemonset. See the sketch below.
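A minimal sketch of what such an example manifest might look like; the name, image, daily cadence, and 30-day age threshold are all illustrative assumptions, not settled decisions:

```yaml
# Illustrative sketch only: a DaemonSet that prunes stale Ceph logs from the
# Rook hostpath on every node. Name, image, and thresholds are assumptions.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: rook-ceph-log-cleanup
  namespace: rook-ceph
spec:
  selector:
    matchLabels:
      app: rook-ceph-log-cleanup
  template:
    metadata:
      labels:
        app: rook-ceph-log-cleanup
    spec:
      containers:
        - name: cleanup
          image: busybox:1.36
          command:
            - /bin/sh
            - -c
            # Delete log files not modified in 30 days, then sleep for a day.
            # "|| true" keeps the pod from crash-looping on transient errors.
            - |
              while true; do
                find /var/log/host-ceph -type f -mtime +30 -delete || true
                sleep 86400
              done
          volumeMounts:
            - name: ceph-log
              mountPath: /var/log/host-ceph
      volumes:
        - name: ceph-log
          hostPath:
            path: /var/lib/rook/rook-ceph/log
```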
@BlaineEXE I think it's fair to say this isn't a high priority. In my env, this issue takes months to years to become noticeable. However, it has occurred on several different clusters, and it manifests as a node becoming tainted with disk-pressure. It has happened enough times that I'm considering implementing a periodic cleanup external to k8s, e.g. a cron job that removes files older than 30 days (sketched below). @travisn I suspect that a daemonset is the best that can be done without adding a dependency on an external controller. I use a label on storage nodes that such a daemonset could be scheduled against.
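For reference, the out-of-band host cron being considered could be as small as a single crontab entry per storage node; a sketch assuming the hostpath from this report and the 30-day threshold mentioned above:

```
# Host-level crontab entry (sketch): at 03:00 daily, delete Rook/Ceph log
# files on this node that haven't been modified in 30 days.
0 3 * * * find /var/lib/rook/rook-ceph/log -type f -mtime +30 -delete
```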
@travisn @BlaineEXE @jhoblitt Additionally, most log rotation tools have an option to keep logs for a specific duration or count, like keeping logs for the last week or keeping the last 5 log files, etc. Should we have that mechanism as well?
@Madhu-1 we do have log rotation settings like that already (see the example below). I think the above problem is for the case where the pod got re-scheduled to another node, so the older log file on the old node isn't deleted for that daemon...
@parth-gr is that used to delete the old logs as well after a certain time? Yes, I agree the re-schedule case is problematic and needs to be looked into as well.
Yes, that's correct; it deletes them at a certain interval.
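For context, the settings being discussed live in the CephCluster CR; a typical snippet looks roughly like this (values are examples, not recommendations):

```yaml
# Existing per-daemon log rotation knobs in the CephCluster CR (example values).
spec:
  logCollector:
    enabled: true
    periodicity: daily  # how often the sidecar invokes logrotate
    maxLogSize: 500M    # rotate once a daemon's log exceeds this size
```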
That would be a good idea. We can add a YAML manifest in the examples folder which does this simpler cleaning job, and later Rook can run it alongside logrotate if needed.
A very simple option would also be to update the logrotate sidecar script so that it periodically deletes really old files. Then we don't need any redesign or separate daemonset. The only case that doesn't cover is if no ceph daemons are running on a node, but that doesn't seem worth worrying about.
We learned in huddle today that logrotate has a mechanism that sends a HUP signal to a process after rotating its logs, telling the process to reopen its log files. This prevents files from getting corrupted, to my understanding. I think what this means is that we truly do need the sidecar containers for logrotate to work well with ceph. So perhaps it would be a good thing to have the script also periodically check the fuller-scope log directory and delete files older than some threshold, like 3-6 months. This assumes that the sidecars have access to (or can be given access to) the full log dir. I'm not sure how configurable logrotate is in that respect, though. Is it possible to tell it to rotate files beyond the one daemon's own log? Maybe Travis' suggestion from here to have a daemonset with a cleanup script would be good if we have trouble getting logrotate to do the wide-scale cleanup.
Agreed, this is what I was trying to suggest with my previous comment. We should have some flexibility in our script, independent of whether logrotate itself can be configured to do that cleanup.
@Madhu-1 As an operator, the main concern I have is ensuring that the total space used by logs doesn't grow unbounded, IOW that it doesn't require periodic human intervention. The specific case here is not caused by log rotation failing for running pods, but by logs for pods which no longer exist (on that node) never being cleaned up; over time this leads to unbounded growth. @travisn Another, possibly cheap, solution would be for the operator to launch a job once per day on nodes which have any rook/ceph component running on them that deletes files over a certain age threshold.
To aid in discussion, this is what the default logrotate file looks like in the ceph container.
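For reference, the stock file looks roughly like the following (a reconstruction; exact contents vary by Ceph release):

```
/var/log/ceph/*.log {
    rotate 7
    daily
    compress
    sharedscripts
    postrotate
        killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw rbd-mirror cephfs-mirror || pkill -1 -x "ceph-mon|ceph-mgr|ceph-mds|ceph-osd|ceph-fuse|radosgw|rbd-mirror|cephfs-mirror" || :
    endscript
    missingok
    notifempty
    su root root
}
```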
Then Rook does some sed replacements to make it apply only to the current daemon, applies some of the user config overrides, and then calls logrotate via a script every 15 minutes (see rook/pkg/operator/ceph/controller/spec.go, lines 103 to 131 at commit 3d0049c).
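The referenced Go code templates a small shell script into the log-collector sidecar; paraphrased from memory of the Rook source, it does roughly the following (variable names are illustrative):

```sh
# Paraphrased sketch of the log-collector sidecar's templated shell script.
# CEPH_CLIENT_ID, PERIODICITY, and ROTATE are filled in by the operator.
LOG_ROTATE_CEPH_FILE=/etc/logrotate.d/ceph

# Narrow the stock config so it only rotates this daemon's log file.
sed -i "s|*.log|$CEPH_CLIENT_ID.log|" "$LOG_ROTATE_CEPH_FILE"
# Apply the user's periodicity and rotate-count overrides.
sed -i "s/daily/$PERIODICITY/g" "$LOG_ROTATE_CEPH_FILE"
sed -i "s/rotate 7/rotate $ROTATE/g" "$LOG_ROTATE_CEPH_FILE"

# Let logrotate itself decide whether a rotation is due on each pass.
while true; do
  logrotate --verbose "$LOG_ROTATE_CEPH_FILE"
  sleep 15m
done
```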
I believe all log collector pods have /var/log/ceph mounted without a specific subdir, so I think Travis' suggestion that the scripts could be tailored to do overall cleanup would be fairly straightforward. I would expect that there will be race conditions where old files are being cleaned up by 2 log rotators at once, but we can probably just ensure that the script doesn't return an error in that case. I think it's fine if the log cleanup is a best-effort task -- eventually the cleanup should succeed if it fails for any non-race reason.
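Concretely, the wide-scope cleanup could be a single best-effort line added inside the sidecar loop above; a sketch, assuming a 90-day threshold:

```sh
# Best-effort cleanup across the whole shared log dir. Two sidecars may race
# to delete the same file; suppress the resulting noise and never fail.
find /var/log/ceph -type f -mtime +90 -delete 2>/dev/null || true
```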
Original issue report:

Is this a bug report or feature request?
Deviation from expected behavior:
The size of the logs in the hostpath `/var/lib/rook/rook-ceph/log` continues to grow unbounded over time.

Expected behavior:
That there would eventually be a high water mark for logs that isn't exceeded.
How to reproduce it (minimal and precise):
Basically, run a Rook-Ceph cluster for years and have various components (mons, rgw, etc.) be rescheduled between nodes because of drains.
File(s) to submit:

- `cluster.yaml`, if necessary

Logs to submit:

- Operator's logs, if necessary
- Crashing pod(s) logs, if necessary

To get logs, use `kubectl -n <namespace> logs <pod name>`. When pasting logs, always surround them with backticks or use the insert code button from the GitHub UI. Read the GitHub documentation if you need help.
Cluster Status to submit:

- Output of kubectl commands, if necessary

To get the health of the cluster, use `kubectl rook-ceph health`. To get the status of the cluster, use `kubectl rook-ceph ceph status`. For more details, see the Rook kubectl Plugin documentation.
Environment:

- Rook version (use `rook version` inside of a Rook Pod): 1.3-ish through 1.14.3