Let’s set some simple objectives for our data analysis exercise. The exercise should be challenging enough to require data decomposition and communication between parallel processes, but simple enough that we don’t get bogged down in the details of math or statistics. To that end, I’d like to know:
- What was the maximum measured wind speed in the Gulf of Mexico in the 2005-2017 period? Which buoy recorded the maximum value?
- Which buoy had the highest average wind speed, and which had the lowest? What were their respective values?
To find the answers, our program will need to perform a few tasks:
- Reading each CSV file and storing the wind speed data in arrays
- Finding the maximum and mean (average) wind speed values for each buoy
- Comparing the maximum and mean wind speed between all buoys
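The three tasks above can be sketched serially in a few lines. This is a minimal illustration in Python, not the program developed in this exercise. The column name `wind_speed`, the function names, and the station labels in the usage note are all hypothetical, since the layout of the CSV files has not been described yet.

```python
import csv
from statistics import fmean


def read_wind_speeds(lines, column="wind_speed"):
    """Return the wind speeds found in CSV text lines.

    `column` is an assumed header name; rows with a missing or empty
    value in that column are skipped.
    """
    reader = csv.DictReader(lines)
    return [float(row[column]) for row in reader if row.get(column)]


def station_stats(speeds):
    """Return (maximum, mean) wind speed for one buoy."""
    return max(speeds), fmean(speeds)


def compare_stations(stats):
    """Compare buoys, given a dict of station id -> (maximum, mean).

    Returns the ids of the station with the largest maximum, the
    highest mean, and the lowest mean, in that order.
    """
    max_station = max(stats, key=lambda s: stats[s][0])
    highest_mean = max(stats, key=lambda s: stats[s][1])
    lowest_mean = min(stats, key=lambda s: stats[s][1])
    return max_station, highest_mean, lowest_mean
```

To try it on a real file, pass an open file object to `read_wind_speeds`, call `station_stats` on the result for each buoy, collect the pairs in a dictionary keyed by station id, and hand that dictionary to `compare_stations`.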
You could program each of these tasks without any parallel considerations. However, if we run the program serially, the files are processed in order, one at a time. With many large files, this approach can become impractically slow or even impossible. This is where parallel data decomposition will come to our aid!
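To make the idea of data decomposition concrete, here is a small Python sketch, given as one possible scheme rather than the method used in this exercise. It assumes the list of files is split into contiguous, nearly equal blocks, one per worker. Each worker reduces its block to a single local result, and the local results are then combined, which is the step that requires communication. The function names and the sample values are hypothetical.

```python
def block_ranges(n_items, n_workers):
    """Split range(n_items) into n_workers contiguous, nearly equal blocks.

    Returns a list of (start, end) pairs with `end` exclusive. The first
    `n_items % n_workers` workers receive one extra item each.
    """
    base, extra = divmod(n_items, n_workers)
    ranges = []
    start = 0
    for worker in range(n_workers):
        size = base + (1 if worker < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def local_max(block):
    """Reduce one worker's block of (speed, station) pairs to its maximum."""
    return max(block) if block else None


def global_max(local_results):
    """Combine the per-worker maxima, ignoring workers that had no data."""
    return max(r for r in local_results if r is not None)


# Made-up per-station maxima, for illustration only.
maxima = [(12.5, "s1"), (30.0, "s2"), (18.0, "s3"), (25.0, "s4"), (9.0, "s5")]
blocks = [maxima[start:end] for start, end in block_ranges(len(maxima), 2)]
result = global_max([local_max(block) for block in blocks])
```

Here the loop over blocks runs serially. In a parallel program each block would be handled by a separate process, and only the small local results would need to be exchanged.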
If we implement our program correctly, we should get output like this:
Maximum wind speed measured is 40.9000015 at station 42001
Highest mean wind speed is 6.47883749 at station 42020
Lowest mean wind speed is 5.43456125 at station 42036