
Configure Puma

Sometimes the Ruby agent pushes the application past the server's timeout threshold, which prevents the server from starting. You can avoid this by adjusting the server configuration.

Puma can be configured directly through the CLI or in a configuration file such as config/puma.rb.


The agent should work with the default configuration or with your custom one, but it adds overhead to the first request. As a result, you may need to increase the timeouts described below.


Some of the options are available only in cluster mode. All of the available timeout options are defined in puma/dsl.rb.

  • persistent_timeout(seconds) - Defines how long persistent connections can be idle before Puma closes them. Pass the number of seconds as an integer.

  • first_data_timeout(seconds) - Defines how long the TCP socket stays open if no data has been received. Pass the number of seconds as an integer.

  • force_shutdown_after(val=:forever) - How long to wait for threads to stop when shutting them down. You can pass a number of seconds, or one of the symbols :forever or :immediately.

    • :forever - the value is set to -1

    • :immediately - the value is set to 0

    • seconds - the value is used directly as the timeout


    Puma always waits a couple of seconds before shutdown, even in immediately mode.
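As a rough illustration of the options above, a config/puma.rb could raise these timeouts as follows. The numbers are placeholder assumptions, not recommended values; choose them based on how long your application takes to serve its first request with the agent loaded.

```ruby
# config/puma.rb
# Evaluated by Puma's configuration DSL, not as a standalone Ruby script.
# All values below are illustrative assumptions.

# Close idle keep-alive connections after 30 seconds.
persistent_timeout 30

# Keep a new TCP socket open for up to 60 seconds while waiting for data.
first_data_timeout 60

# Wait up to 10 seconds for threads to stop on shutdown.
# :forever or :immediately can be passed instead of a number.
force_shutdown_after 10
```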

The following options are only available in cluster mode:

  • worker_timeout(seconds) - Verifies that all workers have checked in to the master process within the given timeout. This timeout is to protect against dead or hung processes. Setting this value will not protect against slow requests. The minimum value is 6 seconds, the default value is 60 seconds.

  • worker_boot_timeout(seconds) - Changes the timeout that applies while a worker is booting. If unspecified, it defaults to the value of worker_timeout.

  • worker_shutdown_timeout(seconds) - Set the timeout for worker shutdown.

  • wait_for_less_busy_worker(val=0.005) - attempts to route traffic to less-busy workers by causing them to delay listening on the socket, allowing workers which are not processing any requests to pick up new requests first.


    This setting only works with MRI. For all other interpreters, this setting does nothing.
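The cluster-mode options above might be combined as in the sketch below. The worker count and all timeout values are assumptions chosen for illustration; only the option names come from the list above.

```ruby
# config/puma.rb
# Evaluated by Puma's configuration DSL, not as a standalone Ruby script.
# All values below are illustrative assumptions.

# These options only take effect in cluster mode, so run at least one worker.
workers 2

# Workers must check in with the master at least every 120 seconds.
worker_timeout 120

# Give workers more time to boot, for example while the agent initializes.
worker_boot_timeout 180

# Allow workers up to 60 seconds to shut down.
worker_shutdown_timeout 60

# Delay busy workers slightly so idle workers pick up new requests first (MRI only).
wait_for_less_busy_worker 0.005
```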

Puma initially sets two default timeout values:

  • DefaultWorkerTimeout = 60

  • DefaultWorkerShutdownTimeout = 30
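To make the fallback rules concrete, here is a small plain-Ruby sketch of how the effective values follow from the defaults and from the rules stated above. The method effective_timeouts and its constants are hypothetical helpers written for this illustration; they are not part of Puma and do not reproduce its internal code.

```ruby
# Hypothetical helper, not part of Puma: it only restates the documented rules.
DEFAULT_WORKER_TIMEOUT = 60           # mirrors DefaultWorkerTimeout
DEFAULT_WORKER_SHUTDOWN_TIMEOUT = 30  # mirrors DefaultWorkerShutdownTimeout
MIN_WORKER_TIMEOUT = 6                # minimum worker_timeout stated above

# Returns the timeouts that would apply for a given set of configured values.
# A nil argument means the option was left unset in the configuration.
def effective_timeouts(worker_timeout: nil, worker_boot_timeout: nil, worker_shutdown_timeout: nil)
  check_in = worker_timeout || DEFAULT_WORKER_TIMEOUT
  if check_in < MIN_WORKER_TIMEOUT
    raise ArgumentError, "worker_timeout must be at least #{MIN_WORKER_TIMEOUT} seconds"
  end

  {
    worker_timeout: check_in,
    # Falls back to worker_timeout when unspecified.
    worker_boot_timeout: worker_boot_timeout || check_in,
    worker_shutdown_timeout: worker_shutdown_timeout || DEFAULT_WORKER_SHUTDOWN_TIMEOUT
  }
end
```

For example, effective_timeouts(worker_timeout: 120) yields a boot timeout of 120 as well, because worker_boot_timeout was left unset.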

To apply all of the timeout settings, Puma must be configured to run in cluster mode.


The fork_worker option, introduced in Puma 5, lets Puma fork workers from worker 0 instead of directly from the master process.

Similar to the preload_app option, the fork_worker option allows your application to be initialized only once for copy-on-write memory savings.

This mode has a couple of advantages. The first is that it is compatible with a phased restart, because the master process does not preload the application. When worker 0 reloads as part of a phased restart, it initializes a new copy of your application first. The other workers then reload by forking from this new worker, which already contains the newly loaded application.


A phased restart replaces all running workers in a Puma cluster. It kills an old worker, starts a new one, and waits until the new worker has started successfully before moving on to the next worker, until all workers have been replaced. The master process is not restarted.

With fork_worker, a phased restart can complete as quickly as a hot restart while still minimizing downtime by staggering the restart across cluster workers.

The other advantage is the refork command, which provides additional copy-on-write improvements in running applications. It reloads all nonzero workers by re-forking them from worker 0.

This command can potentially improve memory utilization in large or complex applications that don't fully pre-initialize on startup, because the re-forked workers can share copy-on-write memory with a worker that has been running for a while and serving requests.

A refork will also automatically trigger once, after a certain number of requests have been processed by worker 0 (default 1000). To configure the number of requests before the auto-refork, pass a positive integer argument to fork_worker (e.g., fork_worker 1000), or 0 to disable.
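A minimal sketch of enabling this mode in config/puma.rb, assuming a four-worker cluster (the worker count is an illustrative assumption; the request threshold is the default quoted above):

```ruby
# config/puma.rb
# Evaluated by Puma's configuration DSL, not as a standalone Ruby script.

# fork_worker applies to cluster mode, so at least one worker is needed.
# The count of 4 is an assumption for illustration.
workers 4

# Fork workers from worker 0 and trigger one automatic refork
# after worker 0 has processed 1000 requests.
fork_worker 1000

# Alternative: keep fork_worker enabled but disable the automatic refork.
# fork_worker 0
```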


  • The fork_worker option is not compatible with preload_app.

  • In order to fork new workers cleanly, worker 0 shuts down its server and stops serving requests so there are no open file descriptors or other kinds of shared global state between processes, and to maximize copy-on-write efficiency across the newly-forked workers. This may temporarily reduce total capacity of the cluster during a phased restart / refork.

The following options are specific to cluster mode (fork_worker is repeated here for reference):

  • workers(count) - How many worker processes to run. Typically this is set to the number of available cores. The default is the value of the environment variable WEB_CONCURRENCY if set, otherwise 0.

  • before_fork(&block) - code to run immediately before the master process forks workers (once on boot). These hooks can block if necessary to wait for background operations unknown to Puma to finish before the process terminates.

  • on_worker_boot(&block) - code to run in a worker when it boots, to set up the process before booting the app.

  • on_worker_shutdown(&block) - code to run immediately before a worker shuts down (after it has finished processing HTTP requests)

  • on_worker_fork(&block) - code to run in the master right before a worker is started. The worker's index is passed as an argument.

  • after_worker_fork(&block) - code to run in the master after a worker has been started. The worker's index is passed as an argument.

  • on_refork(&block) - When fork_worker is enabled, code to run in worker 0 before all other workers are re-forked from this process, after the server has temporarily stopped serving requests (once per complete refork cycle). This can be used to trigger extra garbage collection to maximize copy-on-write efficiency, or to close any connections to remote servers (database, Redis, ...) that were opened while the server was running.

  • out_of_band(&block) - code to run out-of-band when the worker is idle. These hooks run immediately after a request has finished processing and there are no busy threads on the worker. The worker doesn't accept new requests until this code finishes. This hook is useful for running out-of-band garbage collection or scheduling asynchronous tasks to execute after a response.

  • fork_worker(after_request=1000) - When enabled, workers will be forked from worker 0 instead of from the master process. This option is similar to `preload_app` because the app is preloaded before forking, but it is compatible with phased restart. This option also enables the refork command.

  • nakayoshi_fork(enabled=true) - When enabled, Puma runs GC 4 times before forking workers. This increases the time to boot and fork; see your logs for details on how much time it adds to your boot process. For most apps, it is less than one second. This fork method is based on the work of Koichi Sasada and Aaron Patterson, and it may decrease memory utilization of preload-enabled cluster-mode Puma servers.


    If available (Ruby 2.7+), it will also call GC.compact.

    Not recommended for non-MRI Rubies.
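The hooks in this list could be wired up roughly as shown below. This is a sketch only: the log messages, the worker count, and the choice to run garbage collection in on_refork and out_of_band are assumptions for illustration. The hook names are taken from the list above; check puma/dsl.rb for the names your Puma version supports.

```ruby
# config/puma.rb
# Evaluated by Puma's configuration DSL, not as a standalone Ruby script.

workers 2     # illustrative worker count
fork_worker   # needed for on_refork to have any effect

before_fork do
  # Runs once in the master process, right before workers are forked on boot.
  puts "master #{Process.pid}: about to fork workers"
end

on_worker_fork do |index|
  # Runs in the master right before the worker with this index is started.
  puts "master: starting worker #{index}"
end

after_worker_fork do |index|
  # Runs in the master after the worker with this index has been started.
  puts "master: worker #{index} started"
end

on_worker_boot do
  # Runs in each worker before the app boots.
  puts "worker #{Process.pid}: booting"
end

on_worker_shutdown do
  # Runs in each worker after it has finished processing HTTP requests.
  puts "worker #{Process.pid}: shutting down"
end

on_refork do
  # Runs in worker 0 before the other workers are re-forked from it.
  # Extra GC passes here aim to maximize copy-on-write sharing.
  3.times { GC.start }
end

out_of_band do
  # Runs when the worker is idle; new requests wait until this finishes.
  GC.start
end
```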

See also