Faster parallel execution of Ansible playbooks

Published in kiratech.io

5 min read · Aug 2, 2021

In this blog post I will show you a technique that can be used to parallelize any task loop in Ansible.

Ansible is an automation platform that is (relatively) simple and very versatile. Unfortunately, it is rather limited when it comes to parallelization. There are fundamentally two ways to parallelize operations in Ansible:

work on multiple hosts at the same time
launch multiple tasks concurrently with the async and poll: 0 keywords, and poll them for completion later.

Unfortunately, these standard solutions have drawbacks. To parallelize work across several hosts, you need to have different hosts in the first place. If you aim to parallelize a loop of tasks within a single host, you are out of luck. The second option requires significant modifications to the playbook, because concurrent tasks must be polled explicitly for completion with the async_status task, which is very impractical. Most importantly, asynchronous execution is not supported for all tasks. In particular, it is not supported for tasks that include roles or other task lists. This means there is no way to run third-party roles concurrently, which severely limits the usefulness of this feature.
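For reference, the async pattern looks roughly like the sketch below. Each task is started with async and poll: 0, and a separate async_status task then polls the saved job IDs until they finish. The command path and the job_list variable are hypothetical placeholders. Only async, poll, register, until and async_status are real Ansible features. The same trick cannot be applied to an include_role task, which is the limitation described above.

```yaml
- name: Start every job without waiting
  ansible.builtin.command: "/usr/local/bin/run-job {{ item }}"  # hypothetical command
  async: 600              # maximum runtime allowed for each job, in seconds
  poll: 0                 # do not wait: return the async job ID immediately
  loop: "{{ job_list }}"  # hypothetical list of work items
  register: started_jobs

- name: Poll every job until it completes
  ansible.builtin.async_status:
    jid: "{{ item.ansible_job_id }}"
  loop: "{{ started_jobs.results }}"
  register: job_result
  until: job_result.finished  # retry this item until its job has finished
  retries: 60
  delay: 10
```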

The most reliable and effective way to parallelize work in Ansible is, in fact, the implicit host loop, possibly paired with a large value of the forks parameter and the free strategy. How can we (ab)use the host loop to parallelize tasks? It turns out that it is possible, and quite easy, with minimal modifications to the playbook file.
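As a minimal illustration of host-level parallelism, the play below uses the free strategy so that each host advances through its tasks without waiting for the others. The number of hosts that actually run at once is capped by the forks setting, for example `ansible-playbook --forks 20`. The task itself is only a placeholder.

```yaml
- name: Let every host proceed at its own pace
  hosts: all
  strategy: free        # hosts do not wait for each other between tasks
  gather_facts: false
  tasks:
    - name: Placeholder work item
      ansible.builtin.command: sleep 1
      changed_when: false
```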

Virtual host slices

The idea is to create multiple virtual copies of your real inventory hosts and distribute work among them. Each virtual host owns a slice of your original workload and runs just that. It runs concurrently with the other virtual hosts, subject to the number of forks and the play strategy.

Fortunately, it is possible to add new hosts during the execution of a playbook! You just use the ansible.builtin.add_host task. The new hosts can then be targeted by a later play in the same playbook.
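A minimal sketch of this mechanism follows. The first play registers an in-memory host in a group, and the second play targets that group. The host name, address and job_ids values are purely illustrative. Any extra key passed to add_host becomes a host variable of the new host.

```yaml
- name: Create a host at runtime
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Register an extra in-memory host
      ansible.builtin.add_host:
        name: web1-copy            # hypothetical name for the new host
        groups: virtual            # group that a later play can target
        ansible_host: 192.0.2.10   # real machine to connect to (example address)
        job_ids: [a, b, c]         # extra keys become host variables

- name: Target the host created above
  hosts: virtual
  gather_facts: false
  tasks:
    - name: Show the variable attached by add_host
      ansible.builtin.debug:
        var: job_ids
```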

Consider this example: your Ansible playbook configures some hosts. The configuration consists of running some jobs or workloads with a fictitious role run-job, and the job items are listed in a variable job_ids. Here is a playbook that implements this logic:
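A minimal sketch of such a playbook follows. It assumes that job_ids is already defined for each host and that run-job reads the current item from a variable called job_id. Both names are assumptions, since the role is fictitious.

```yaml
- name: Configure hosts
  hosts: all
  gather_facts: false
  tasks:
    - name: Launch all job items
      ansible.builtin.include_role:
        name: run-job
      loop: "{{ job_ids }}"
      loop_control:
        loop_var: job_id   # per-item variable seen by the role (assumption)
```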

We want to parallelize the task Launch all job items without editing the task files of the run-job role. This requires creating some virtual copies of your hosts, each holding a part of the job_ids array. Then you define a second play in your playbook that targets the new host copies and runs the Launch all job items task unmodified.

To showcase the result, I have written a ready-to-use Ansible role, pisto.virtual_slice_hosts, which you can install from Ansible Galaxy.
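One way to install the role is with a standard requirements.yml file, followed by `ansible-galaxy install -r requirements.yml`. The snippet below is a sketch of that file.

```yaml
# requirements.yml
roles:
  - name: pisto.virtual_slice_hosts
```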

The role takes two mandatory inputs:

payload_var: the variable holding the array that should be sliced
batch: the size of each slice.

The number of slice hosts is determined implicitly by batch. By default, the new hosts are added to the new group virtual, which can be used as a host pattern to target them.

Here is how you would modify the playbook:
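A sketch of the modified playbook might look like the following. It relies on several assumptions, so check the role's README for the exact interface:

- payload_var takes the name of the variable (here job_ids) as a string.
- Each virtual host receives its slice under that same variable name, so the job loop can stay unmodified.
- batch accepts an integer.

The run-job role and the job_id loop variable carry over the assumptions from the naive version.

```yaml
- name: Slice the workload into virtual hosts
  hosts: all
  gather_facts: false
  tasks:
    - name: Create virtual slice hosts
      ansible.builtin.include_role:
        name: pisto.virtual_slice_hosts
      vars:
        payload_var: job_ids           # assumption: name of the list to slice
        batch: "{{ job_batch_size }}"  # items per virtual host, e.g. 10

- name: Run the job slices in parallel
  hosts: virtual                       # default group of the new hosts
  gather_facts: false
  strategy: free
  tasks:
    - name: Launch all job items       # unchanged from the original playbook
      ansible.builtin.include_role:
        name: run-job
      loop: "{{ job_ids }}"            # now only this virtual host's slice
      loop_control:
        loop_var: job_id
```

With 100 job IDs per host and a job_batch_size of 10, each real host should yield 10 virtual hosts, so three real hosts become 30 virtual hosts.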

The modifications are pretty simple. Now we can try to benchmark the playbooks.

Benchmark

The code in this post is available here. The generate-jobs role simulates work IDs by creating a list of job_count strings. The run-job role simulates running a job by sleeping for a random delay between job_duration_ms_min and job_duration_ms_max milliseconds (defaults: 10 and 200, respectively).
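The two mock roles could be written roughly as follows. This is a hypothetical reconstruction for illustration, not the code from the repository, and the role defaults are omitted. generate-jobs builds job_count strings of the form `<inventory_hostname>-<index>`. run-job sleeps for a random number of milliseconds using the sleep command on the target, which assumes a sleep implementation that accepts fractional seconds, such as GNU coreutils.

```yaml
# roles/generate-jobs/tasks/main.yml (hypothetical)
- name: Generate mock job ids such as host1-0, host1-1, ...
  ansible.builtin.set_fact:
    job_ids: "{{ range(job_count | int) | map('string') | map('regex_replace', '^', inventory_hostname ~ '-') | list }}"

# roles/run-job/tasks/main.yml (hypothetical)
- name: Simulate a job by sleeping for a random delay
  ansible.builtin.command:
    argv:
      - sleep
      # random integer in [min, max] milliseconds, converted to seconds
      - "{{ (range(job_duration_ms_min | int, job_duration_ms_max | int + 1) | random) / 1000 }}"
  changed_when: false
```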

Let’s write an inventory with three hosts and generate 100 jobs for each, with durations between 0 and 200 milliseconds:
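A possible inventory for this setup is sketched below. The host names and the local connection, which runs everything on the control node, are assumptions. The original inventory may differ.

```yaml
# inventory.yml (hypothetical host names)
all:
  hosts:
    host1:
    host2:
    host3:
  vars:
    ansible_connection: local   # assumption: benchmark on the control node
    job_count: 100
    job_duration_ms_min: 0
    job_duration_ms_max: 200
```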

We can now time the execution of the first playbook, which naively includes the run-job role in a loop:

That is 113 seconds! We also notice something interesting. The loop has 100 iterations for each host, and Ansible runs all of them sequentially, disregarding host-level parallelization. This is due to a known bug, which I do not expect to be fixed anytime soon, that causes Ansible to break host parallelization when the loop items differ across hosts. The generate-jobs mock role creates job items that are all different, strings of the form {{ inventory_hostname }}-{{ job_id }}, so the execution ends up completely serialized.

Now let’s run the modified playbook with virtual hosts, using a job_batch_size of 10 and --forks 20:

That is 21 seconds, a reduction of roughly 80% in wall-clock execution time! The console also shows interleaved task output from different hosts, which confirms that the tasks run in parallel across all (virtual) hosts.

Implementation

But what is under the hood of pisto.virtual_slice_hosts? It is rather simple:

The role essentially loops over all hosts and, for each host, over all slices of the variable named by payload_var. For each slice, it creates a new host in the specified group_name that holds its own slice of the content of payload_var.
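To make the mechanism concrete, here is a minimal hand-rolled equivalent that uses only built-in modules and filters. It is not the role's actual code. It hardcodes the group virtual and the variable job_ids, copies only ansible_host and ansible_connection, and introduces a hypothetical helper fact, slice_pairs.

Because add_host bypasses the play's host loop and runs only once, it loops explicitly over the (host, slice index) pairs collected from every host.

```yaml
- name: Slice the workload into virtual hosts
  hosts: all
  gather_facts: false
  # keep the default linear strategy here: every host must finish set_fact
  # before the single add_host run reads slice_pairs from hostvars
  tasks:
    - name: List the (host, slice index) pairs owned by this host
      ansible.builtin.set_fact:
        slice_pairs: "{{ [inventory_hostname] | product(range(slice_count | int)) | list }}"
      vars:
        # ceil(len(job_ids) / job_batch_size), using integer arithmetic
        slice_count: "{{ ((job_ids | length) + (job_batch_size | int) - 1) // (job_batch_size | int) }}"

    - name: Create virtual slice hosts
      ansible.builtin.add_host:
        name: "{{ item.0 }}-slice-{{ item.1 }}"
        groups: virtual
        # connection variables copied from the real host
        ansible_host: "{{ hostvars[item.0].ansible_host | default(item.0) }}"
        ansible_connection: "{{ hostvars[item.0].ansible_connection | default(omit) }}"
        # this virtual host's share of the original job_ids list
        job_ids: "{{ (hostvars[item.0].job_ids | batch(job_batch_size | int) | list)[item.1] }}"
      # one flat list of [host, index] pairs gathered from all play hosts
      loop: "{{ ansible_play_hosts | map('extract', hostvars, 'slice_pairs') | flatten(levels=1) }}"

# A second play with "hosts: virtual" then runs the unchanged job loop.
```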

The important implementation detail here is that, to copy a host into a virtual slice host, one needs to know which variables should be applied to the new host. These certainly include connection and privilege escalation variables such as ansible_host or ansible_become. By default, the role copies a set of known connection variables, specified in the role default variable implicit_copy_vars. It also copies all variables explicitly listed in the role variable copy_vars.

For example, you may want to run a benchmark like the one above, but with a different amount of work for each host. To do that, you can specify host-specific values for job_duration_ms_min and job_duration_ms_max in the inventory, and then modify the Create virtual slice hosts task into:
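For instance, assuming copy_vars accepts a list of variable names (as its description suggests), the task might become the sketch below. The payload_var and batch values carry over the same assumptions as before.

```yaml
- name: Create virtual slice hosts
  ansible.builtin.include_role:
    name: pisto.virtual_slice_hosts
  vars:
    payload_var: job_ids             # assumption: name of the list to slice
    batch: "{{ job_batch_size }}"
    copy_vars:                       # copied in addition to implicit_copy_vars
      - job_duration_ms_min
      - job_duration_ms_max
```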

Of course the list of variables depends on your original playbook. You may want to override copy_vars once and for all at the playbook level.

Conclusion

In this post I showed how you can parallelize arbitrary sections and loops of your playbooks with minimal modifications to the playbook itself. Loops over items that differ from host to host are especially slow with a naive implementation in Ansible. The trick is to generate new hosts during playbook execution with the add_host task, and then execute operations in parallel across these virtual copies of the original hosts.

The implementation logic is simple, and a basic but versatile implementation is available from Ansible Galaxy as the role pisto.virtual_slice_hosts. The speedup over serial Ansible code is significant: in the synthetic benchmark shown here, wall-clock execution time drops by roughly 80%.

The only significant requirements for employing this technique are these:

- The user must know which variables should be copied over to the virtual hosts.
- In most cases, the ansible.builtin.free strategy must be used to unlock the performance gains. Otherwise, Ansible will most likely serialize your loops due to long-standing limitations of Ansible itself (#30816, #36978).


Written by Lorenzo Pistone