We’ve learned so far, both from working with coarrays in chapter 7 and from developing the parallel tsunami simulator, that synchronizing images is crucial for writing correct parallel programs. Recall that when there’s a data dependency between images, one image must wait for data from another image before proceeding with its own calculation. This subsection explains how synchronization works within teams and how to synchronize multiple teams as a whole.
Synchronizing images within a team
The essential synchronization mechanism you learned in chapter 7 was the sync all statement, which places a barrier in the code at which every image must wait for all others before proceeding. Once every image has reached a sync all statement, we consider all images to be synchronized. When we need to synchronize the current image with some, but not all, other images, we can use the sync images statement instead. For example, we used sync all in the sync_edges method of the Field type in the tsunami simulator (see section 10.4) to synchronize every image with all other images. Using sync images, we can instead synchronize each image only with its four neighbors, as in subroutine sync_edges in mod_field.f90:
...
sync images(set(neighbors)) ❶
edge(1:je-js+1,1)[neighbors(1)] = self % data(is,js:je) ❷
edge(1:je-js+1,2)[neighbors(2)] = self % data(ie,js:je) ❷
edge(1:ie-is+1,3)[neighbors(3)] = self % data(is:ie,js) ❷
edge(1:ie-is+1,4)[neighbors(4)] = self % data(is:ie,je) ❷
sync images(set(neighbors)) ❸
self % data(is-1,js:je) = edge(1:je-js+1,2) ❹
self % data(ie+1,js:je) = edge(1:je-js+1,1) ❹
self % data(is:ie,js-1) = edge(1:ie-is+1,4) ❹
self % data(is:ie,je+1) = edge(1:ie-is+1,3) ❹
...
❶ Synchronizes with neighbors before copy into buffer
❷ Copies data into the coarray buffer, edge
❸ Synchronizes with neighbors again before copying out of buffer
❹ Copies data from coarray buffer into the field array
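To see the same two-phase exchange in isolation, here’s a minimal one-dimensional sketch that runs on any number of images. It assumes periodic (ring) neighbors, and the names mod_halo_1d, exchange_halo, and sync_neighbors are hypothetical, not part of the tsunami simulator. With one or two images, an image’s left and right neighbors are the same image, so sync_neighbors passes that index to sync images only once.

```fortran
module mod_halo_1d
  ! Hypothetical module: a 1-D, periodic version of the two-phase exchange
  implicit none
  private
  public :: exchange_halo

  ! Coarray buffer: edge(1) receives the left halo, edge(2) the right halo
  real :: edge(2)[*]

contains

  subroutine exchange_halo(u)
    ! u(0) and u(ubound) are halo cells; everything in between is interior
    real, intent(in out) :: u(0:)
    integer :: left, right, n

    n = ubound(u, 1) - 1   ! index of the last interior cell

    ! Periodic neighbors on a ring of images
    left = merge(num_images(), this_image() - 1, this_image() == 1)
    right = merge(1, this_image() + 1, this_image() == num_images())

    ! Phase 1: wait until neighbors are done reading their buffers
    call sync_neighbors(left, right)
    edge(2)[left] = u(1)    ! my first interior cell is my left neighbor's right halo
    edge(1)[right] = u(n)   ! my last interior cell is my right neighbor's left halo

    ! Phase 2: wait until neighbors have written into my buffer
    call sync_neighbors(left, right)
    u(0) = edge(1)
    u(n+1) = edge(2)
  end subroutine exchange_halo

  subroutine sync_neighbors(left, right)
    ! The image set must not contain repeated indices
    integer, intent(in) :: left, right
    if (left == right) then
      sync images(left)
    else
      sync images([left, right])
    end if
  end subroutine sync_neighbors

end module mod_halo_1d

program halo_demo
  use mod_halo_1d, only: exchange_halo
  implicit none
  real :: u(0:5)

  u = real(this_image())   ! interior and halos start as this image's number
  call exchange_halo(u)
  print '(a, i0, a, 2f6.1)', 'image ', this_image(), ' halos:', u(0), u(5)
end program halo_demo
```

After the exchange, each image should hold its left neighbor’s image number in u(0) and its right neighbor’s image number in u(5).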
Both statements also work inside teams, but their scope narrows: sync all and sync images operate only within the team in which they’re executed. For example, if you have two teams and you’ve switched the images to them using the change team construct, issuing sync all synchronizes the images within each team, but not the teams themselves. The same goes for sync images, whose image indices are also interpreted relative to the current team. Although this may be confusing at first, you’ll get used to it as you practice working with teams. Just remember: sync all and sync images statements always operate only within the current team and can’t affect the images outside of it. In the next subsection, you’ll see how to synchronize between teams.
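A minimal sketch of this team-local behavior follows, assuming a compiler and runtime with Fortran 2018 teams support. The odd/even split and the names mod_team_sync and split_and_sync are illustrative only. Inside change team, this_image() and num_images() report team-relative values, and sync all waits only for the images of the executing image’s own team.

```fortran
module mod_team_sync
  ! Hypothetical module illustrating that sync all is team-local
  use iso_fortran_env, only: team_type
  implicit none
  private
  public :: split_and_sync

contains

  subroutine split_and_sync(my_team, local_image, local_size)
    ! Returns the team the executing image joined, plus its image index
    ! and the image count as seen from inside that team
    integer, intent(out) :: my_team, local_image, local_size
    type(team_type) :: halves

    ! Odd-numbered images join team 1, even-numbered images join team 2
    my_team = mod(this_image() - 1, 2) + 1
    form team(my_team, halves)

    change team(halves)
      ! Inside the construct, image indices and counts are team-relative
      local_image = this_image()
      local_size = num_images()
      ! Barrier for the images of this team only; the other team
      ! neither waits here nor is waited for
      sync all
    end team
  end subroutine split_and_sync

end module mod_team_sync

program team_sync_demo
  use mod_team_sync, only: split_and_sync
  implicit none
  integer :: my_team, local_image, local_size

  call split_and_sync(my_team, local_image, local_size)
  print '(3(a, i0))', 'team ', my_team, ': image ', local_image, ' of ', local_size
end program team_sync_demo
```

With five images, for example, team 1 holds three images and team 2 holds two. Because form team is called here without new_index=, the standard leaves the order of image indices within each new team up to the processor.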
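In the sync images snippet, set(neighbors) ensures that we pass only unique values of neighbors to sync images. This matters because the image set given to sync images can’t contain repeated values, yet the same image can appear more than once in neighbors, for example when an image’s left and right neighbors are the same image. We’ll define set in the same module in mod_field.f90, as shown in the following listing.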
Listing 12.3 Function set to return unique elements of an array
pure recursive function set(a) result(res) ❶
integer, intent(in) :: a(:)
integer, allocatable :: res(:)
if (size(a) > 1) then
res = [a(1), set(pack(a(2:), .not. a(2:) == a(1)))] ❷
else
res = a
end if
end function set
❶ The recursive attribute allows a function to call itself.
❷ Eliminates nonunique elements from the array, one at a time
This is the first time we encounter the recursive attribute, which allows a function or subroutine to invoke itself. The crux of this function is in the fifth line of the listing: we keep the first element, use the built-in function pack to remove all of its copies from the rest of the array, and recursively apply set to what remains, eliminating duplicates one value at a time. For a refresher on pack, see section 5.4, where we used it for the first time. Note that Fortran 2018 (the latest iteration of the language as of this writing) makes all procedures recursive by default, so the recursive attribute is no longer strictly necessary. I still include it here because most Fortran compilers have yet to catch up with this recent development.
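Here’s a self-contained way to try set on its own. The module name mod_set and the sample neighbor values are made up for illustration, and the filter is written as a(2:) /= a(1), which is equivalent to the listing’s .not. a(2:) == a(1). For example, set([2, 2, 3, 3]) evaluates to [2] followed by set([3, 3]), which in turn yields [3] followed by set applied to an empty array, giving [2, 3].

```fortran
module mod_set
  ! Hypothetical module holding a copy of the set function from listing 12.3
  implicit none
  private
  public :: set

contains

  pure recursive function set(a) result(res)
    ! Returns the unique elements of a, in order of first appearance
    integer, intent(in) :: a(:)
    integer, allocatable :: res(:)
    if (size(a) > 1) then
      ! Keep a(1), drop every later copy of it, and recurse on the remainder
      res = [a(1), set(pack(a(2:), a(2:) /= a(1)))]
    else
      ! Zero or one element: already unique
      res = a
    end if
  end function set

end module mod_set

program set_demo
  use mod_set, only: set
  implicit none
  ! Neighbors as they might look on a small grid of images,
  ! where the same image appears in more than one direction
  integer, parameter :: neighbors(4) = [2, 2, 3, 3]

  print '(*(i0, 1x))', set(neighbors)
end program set_demo
```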
Synchronizing whole teams
Having established that sync all and sync images statements operate only within the current team and can’t affect the images outside of it, we need a mechanism to synchronize between teams. Let’s return to our tsunami example from listing 12.1, where we began incorporating teams for the simulation and logging tasks:
change team(new_team)
if (team_num == 1) then
... ❶
else if (team_num == 2) then
... ❷
end if
end team
❶ Code executed by the simulation team
❷ Code executed by the logging team
As logging depends on the data from the simulation team, we need a way to synchronize images between different teams. This is where the new sync team statement comes in, as shown in the following listing.
Listing 12.4 Synchronizing images within the initial team using the sync team statement
use iso_fortran_env, only: initial_team, team_type ❶
...
change team(new_team)
if (team_num == 1) then
... ❷
sync team(get_team(initial_team)) ❸
else if (team_num == 2) then
sync team(get_team(initial_team)) ❸
... ❹
end if
end team
❶ Imports the initial_team constant and the team_type derived type from the module
❷ Simulation team code
❸ Synchronizes with all images that belong to the initial team
❹ Logging team code
The sync team statement lets us synchronize the images of the parent team, or of another ancestor team, without leaving the change team construct. To use it, we need to provide it a team value over which to synchronize. In practice, this will typically be the parent team or some other ancestor team (see the “Exercise 1” sidebar for an example of multiple levels of teams), but it can also be the current team or a child team of the current team. Every image of the specified team must execute a matching sync team statement, which is why both branches in listing 12.4 contain one. To refer to a team that we never defined as a variable, such as the initial team, we use the get_team built-in function and pass it the initial_team constant available from the iso_fortran_env module. Besides the initial_team integer constant, iso_fortran_env also provides the parent_team and current_team constants.
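The following sketch turns the pattern of listing 12.4 into a runnable form, assuming a compiler and runtime with Fortran 2018 teams and collectives support. The module, the mailbox coarray received, the choice of the last image as the logger, and the stand-in “work” (a co_sum of team-relative image indices) are all hypothetical. The simulation team writes its result to image 1 of the logging team, and the matching sync team statements on the initial team guarantee that the write has completed before the logging image reads it.

```fortran
module mod_sync_team_demo
  ! Hypothetical module: team 1 "simulates", team 2 "logs"
  use iso_fortran_env, only: team_type, initial_team
  implicit none
  private
  public :: run_two_teams

  integer :: received[*]   ! mailbox read by the logging image

contains

  subroutine run_two_teams(got)
    ! got is the value the logging image received, or -1 on other images
    integer, intent(out) :: got
    type(team_type) :: new_team
    integer :: team_num, partial

    ! The last image logs (team 2); every other image simulates (team 1)
    team_num = merge(2, 1, this_image() == num_images())

    received = 0
    sync all                  ! nobody writes to a mailbox before it's zeroed
    form team(team_num, new_team)

    got = -1
    change team(new_team)
      if (team_num == 1) then
        ! Stand-in for real work: sum the team-relative image indices
        partial = this_image()
        call co_sum(partial)  ! collective over team 1 only
        if (this_image() == 1) received[1, team_number=2] = partial
        sync team(get_team(initial_team))   ! pairs with the one below
      else
        sync team(get_team(initial_team))   ! waits for team 1's write
        got = received
      end if
    end team
  end subroutine run_two_teams

end module mod_sync_team_demo

program sync_team_demo
  use mod_sync_team_demo, only: run_two_teams
  implicit none
  integer :: got

  call run_two_teams(got)
  if (this_image() == num_images()) print '(a, i0)', 'logger received ', got
end program sync_team_demo
```

With N images, the logging image should receive (N - 1)N/2, the sum of 1 through N - 1. With a single image there’s no simulation team, so the logger reads the initial 0.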
For brevity, we won’t get bogged down with the exact code that the logging team will execute. In practice, it could be monitoring the time-stepping progress of the simulation team, checking and processing files written to disk, printing simulation statistics to the screen, or even serving them over the web. An important element of most of these activities is getting the data from the simulation team.
Exchanging data between teams
I mentioned in the previous subsection that one of the activities the logging team could be performing is monitoring the time stepping of the simulation team. If they’re operating independently and concurrently, how can the logging team know each time the simulation team steps forward? To demonstrate the exchange of data between teams, let’s send the time step count from the simulation team to the logging team. To do this, we’ll make our time step count variable a coarray, and we’ll use the team number in the image selector when referencing that coarray, as shown in the following listing.
Listing 12.5 Exchanging data between teams using image selectors
integer(ik) :: time_step_count[*] ❶
...
change team(new_team)
if (team_num == 1) then
...
time_loop: do n = 1, num_time_steps
...
time_step_count[1, team_number=2] = n ❷
end do time_loop
else if (team_num == 2) then
n = 0
time_step_count = 0
do ❸
if (time_step_count > n) then ❹
n = time_step_count
print *, 'tsunami logger: step ', n, 'of', num_time_steps, 'done'
if (n == num_time_steps) exit ❺
end if
end do
end if
end team
❶ Declares time step count as a coarray
❷ Copies n into time_step_count on image 1 of team 2
❸ Loops indefinitely, polling time_step_count
❹ Runs this code if time_step_count has been updated
❺ Leaves the loop if we’ve reached the end
In listing 12.5, we’ve declared the time_step_count integer coarray, which we’ll use to exchange the time step count between the simulation team and the logging team. To send the data, we’ll use the usual coarray indexing syntax from chapter 7, with a twist: here, we also specify the team number in the image selector (the values between square brackets). When we write time_step_count[1, team_number=2] = n, we’re saying “Copy the value of n into the time_step_count variable on image 1 of team number 2.” This means that the image number is relative to the team in question: image 1 of team 1 is a different image from image 1 of team 2. On the logging team, we initialize the local value of time_step_count to zero, loop indefinitely, and check its value in each iteration. Every time the simulation team increments time_step_count, we print its new value to the screen.
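Strictly speaking, the Fortran standard doesn’t allow one image to reference a variable while another image defines it in an unordered segment, unless both sides use atomic subroutines. The polling loop in listing 12.5 therefore relies on the behavior of a particular compiler and runtime, and zeroing time_step_count inside the logging team could overwrite an update the simulation team has already sent. The following sketch shows one more defensive variant, assuming support for atomic_define and atomic_ref on a coindexed object with a team_number= selector. It zeroes the counter before the teams are formed, and the module name, num_time_steps = 10, and the choice of image 1 as the logger are hypothetical.

```fortran
module mod_step_logger
  ! Hypothetical module: a variant of listing 12.5 built on atomic subroutines
  use iso_fortran_env, only: team_type, atomic_int_kind
  implicit none
  private
  public :: run_step_logger, num_time_steps

  integer, parameter :: num_time_steps = 10   ! hypothetical step count
  integer(atomic_int_kind) :: time_step_count[*]

contains

  subroutine run_step_logger(last_seen)
    ! last_seen is the final step the logger observed, or -1 elsewhere
    integer, intent(out) :: last_seen
    type(team_type) :: new_team
    integer(atomic_int_kind) :: seen
    integer :: team_num, n

    last_seen = -1
    if (num_images() < 2) return   ! needs one logging and one simulating image

    ! Image 1 logs (team 2); all other images simulate (team 1)
    team_num = merge(2, 1, this_image() == 1)

    ! Zero the counter everywhere before any image can update it
    call atomic_define(time_step_count, 0_atomic_int_kind)
    sync all
    form team(team_num, new_team)

    change team(new_team)
      if (team_num == 1) then
        do n = 1, num_time_steps
          ! ... one time step of the simulation would go here ...
          if (this_image() == 1) &
            call atomic_define(time_step_count[1, team_number=2], int(n, atomic_int_kind))
        end do
      else
        n = 0
        do   ! poll until the final step has been reported
          call atomic_ref(seen, time_step_count)
          if (seen > n) then
            n = int(seen)
            print *, 'tsunami logger: step ', n, 'of', num_time_steps, 'done'
            if (n == num_time_steps) exit
          end if
        end do
        last_seen = n
      end if
    end team
  end subroutine run_step_logger

end module mod_step_logger

program step_logger_demo
  use mod_step_logger, only: run_step_logger
  implicit none
  integer :: last_seen

  call run_step_logger(last_seen)
end program step_logger_demo
```

Because the logger may poll less often than the simulation advances, it can skip intermediate steps, but it always sees the final one: the counter only increases, and the last value stays in place.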
While this is a somewhat trivial example (printing a single integer to the screen is not much work), it illustrates how to offload work to dedicated teams. In a real-world app, while the simulation team is busy crunching numbers, one team could be writing the output files to disk, while another could be serving them over the web. The results of the tsunami simulator won’t change with the introduction of teams into the code, because teams affect only how the code and its order of execution are organized. The simulation part of the code, which is responsible for producing numerical results, now runs in its dedicated team rather than on all images. While teams don’t necessarily unlock any new capability relative to the original image control and synchronization mechanisms, they allow you to express the distribution of work among images more cleanly. This becomes especially important for larger, more complex apps.