
throttling not working as one would expect? #1353

Open
thisIsLoading opened this issue May 20, 2024 · 2 comments

Comments

@thisIsLoading

Hi,

I wanted to use throttling for a job that downloads files from a website. I don't want to limit the number of jobs that can be scheduled, just the number of jobs that are executed within a certain amount of time. I figured one job every 5 seconds would be enough to avoid triggering any 429s on the remote server, so I set this:

  good_job_control_concurrency_with(
    # Maximum number of jobs with the concurrency key to be
    # concurrently performed (excludes enqueued jobs)
    # Can be an Integer or Lambda/Proc that is invoked in the context of the job
    perform_limit: 1,

    # Maximum number of jobs with the concurrency key to be performed within
    # the time period, looking backwards from the current time. Must be an array
    # with two elements: the number of jobs and the time period.
    perform_throttle: [12, 1.minute],

    key: -> { self.class.name }
  )

which I thought would do exactly what I need.

However, when running the jobs I realized that the concurrency control seems to work via an exception: it raises an error about an exceeded throttle:

[image: screenshot of the exceeded-throttle error]

and it reschedules the job about 3 hours later(?)

There are definitely a lot of moving parts and my simple world view isn't enough. I clearly don't understand how to configure it to perform the way I want.

I thought it would take this array, divide the duration by the number in [0], then execute a job, wait the calculated amount, and execute the next job.

Can you help me figure out what I'm doing wrong and how I can get it to execute jobs continuously until the queue is empty, without long pauses?

Thank you!
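
To make that expectation concrete, here is a minimal sketch of the "evenly spaced" model described above. The helper name `evenly_spaced_start_times` is hypothetical and is not part of GoodJob; it only illustrates the arithmetic of dividing the period by the count (60 seconds / 12 jobs = 5 seconds per job).

```ruby
# Hypothetical helper (not a GoodJob API): start offsets, in seconds, that
# an evenly spaced scheduler would produce for a [limit, period] pair.
def evenly_spaced_start_times(limit, period_seconds, job_count)
  interval = period_seconds.to_f / limit # e.g. 60 / 12 = 5.0 seconds
  Array.new(job_count) { |i| i * interval }
end

# Mirrors perform_throttle: [12, 1.minute] for the first four jobs.
p evenly_spaced_start_times(12, 60, 4) # => [0.0, 5.0, 10.0, 15.0]
```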

@bensheldon
Owner

Your understanding of how Throttling actually works is correct. I tried to explain that in this section in the Readme: https://github.com/bensheldon/good_job?tab=readme-ov-file#how-concurrency-controls-work

You're seeing 3 hours because it is using retry_on ... wait: :polynomially_longer. You can add your own retry_on handler to your job with a fixed retry e.g.

retry_on(
  GoodJob::ActiveJobExtensions::Concurrency::ConcurrencyExceededError,
  attempts: Float::INFINITY,
  wait: ->(executions) { 30.seconds + (10 * Kernel.rand) }
)
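
As a rough illustration of where a delay of about 3 hours can come from: assuming Active Job's `:polynomially_longer` strategy computes roughly `executions**4 + 2` seconds plus random jitter (this formula is an assumption based on recent Rails versions, and jitter is ignored here), the wait grows quickly with each retry. The method name `polynomial_wait_seconds` is made up for this sketch.

```ruby
# Approximate wait, in seconds, for a given execution count under a
# polynomial backoff of executions**4 + 2 (assumed formula, jitter ignored).
def polynomial_wait_seconds(executions)
  (executions**4) + 2
end

(1..10).each do |n|
  seconds = polynomial_wait_seconds(n)
  puts format("execution %2d: %5d s (~%.1f h)", n, seconds, seconds / 3600.0)
end
```

Under that assumption, the tenth execution waits 10,002 seconds, which is a little under 3 hours. A fixed `wait:` like the one above keeps the delay constant instead.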

The challenge with throttling and concurrency control is that there's a conflict between the goal of a general job queue (run tasks as quickly as possible) and a throttled queue (run tasks at a managed rate). GoodJob's "dequeue, check constraints, retry" pattern is the same one I've seen implemented elsewhere, but I'm open to contributions or outside inspiration.
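
A minimal in-memory sketch of the "dequeue, check constraints, retry" idea, assuming a sliding window that counts performs looking backwards from the current time. The class `SlidingWindowThrottle` and its methods are hypothetical and only model the behavior; GoodJob performs its check against the database and signals a rejection by raising `ConcurrencyExceededError`.

```ruby
# Hypothetical model of a throttle check, not GoodJob's implementation.
class SlidingWindowThrottle
  def initialize(limit, period_seconds)
    @limit = limit
    @period = period_seconds
    @performed_at = [] # timestamps (in seconds) of jobs that were allowed to run
  end

  # Returns true and records the run if fewer than `limit` jobs ran within
  # the last `period` seconds; returns false otherwise, in which case the
  # caller is expected to retry the job later.
  def try_perform(now)
    @performed_at.reject! { |t| t <= now - @period }
    return false if @performed_at.size >= @limit

    @performed_at << now
    true
  end
end

throttle = SlidingWindowThrottle.new(12, 60)

# One attempt per second at t = 0..12: the first 12 run, the 13th is rejected.
results = Array.new(13) { |t| throttle.try_perform(t) }
p results.count(true) # => 12
p results.last        # => false

# At t = 60 the run from t = 0 has left the window, so a job may run again.
p throttle.try_perform(60) # => true
```

Note that nothing in this model spaces jobs evenly: up to 12 jobs can run back to back, and any further job has to be retried until enough of the window has passed.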

@thisIsLoading
Author

I understand. Thank you, @bensheldon.

As this was just the early stages of my project, I must admit I jumped ship to Sidekiq after I found https://github.com/ixti/sidekiq-throttled, which does exactly what I needed.

I still feel good_job is doing a better job than Sidekiq, just not for this particular use case. Unfortunately I don't feel ready to contribute anything (yet), so I had to take the easy exit.

With that said, thanks a lot for doing all this and providing this gem.
