Before we do any data analysis, we need to take care of the logistics. These steps will generally apply to each of the three challenges in this exercise:
- Define the arrays to hold the data. We’ll learn how to declare dynamic arrays whose size isn’t known at compile time. We’ll use built-in types: real for stock prices and character for timestamps.
- Read data from the CSV files. We’ll implement a custom subroutine to read the files and store the data in the arrays defined in the first step.
- Calculate statistics from raw data. Most of the number crunching described in this chapter relates to this step.
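As a rough illustration of the first step, the following sketch allocates a timestamp array and a price array whose size is known only at run time. The module name mod_alloc_sketch, the subroutine make_arrays, and the timestamp length of 10 characters (enough for a date such as 2000-01-03) are assumptions made for this example; they aren't part of the program developed in this chapter.

```fortran
module mod_alloc_sketch
  ! Hypothetical module, used only for this illustration.
  implicit none
contains

  subroutine make_arrays(n, time, price)
    ! Allocates a timestamp array and a price array of run-time size n.
    integer, intent(in) :: n
    character(len=:), allocatable, intent(out) :: time(:)
    real, allocatable, intent(out) :: price(:)

    ! Assumed timestamp width: 10 characters, as in 'YYYY-MM-DD'.
    allocate(character(len=10) :: time(n))
    allocate(price(n))

    time(:) = ''   ! blank timestamps; assigning to a section keeps len = 10
    price = 0.0    ! zero prices until real data is read
  end subroutine make_arrays

end module mod_alloc_sketch
```

Because both dummy arguments are allocatable with intent(out), the caller only needs to declare the arrays; their size is decided inside the subroutine when it runs.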
To start, we can estimate the performance of different stocks by calculating their gain over the whole period of the time series data, from January 2000 to May 2018. When we implement the solution to our first challenge, the output of the program will look like this:
2000-01-03 through 2018-05-14
Symbol, Gain (USD), Relative gain (%)
-------------------------------------
AAPL 184.594589 5192
AMZN 1512.16003 1692
CRAY 9.60000038 56
CSCO 1.71649933 4
HPQ 1.55270004 7
IBM 60.9193039 73
INTC 25.8368015 89
MSFT 59.4120979 154
NVDA 251.745300 6964
ORCL 20.3501987 77
For each stock, we’ll calculate its gain, that is, the difference between the closing prices at the end and at the beginning of the time series, as well as the gain in percent relative to the starting price. The main program (not including the utility functions) looks like the following listing.
Listing 5.1 Calculating stock gains over the whole time series
program stock_gain
  use mod_arrays, only: reverse ❶
  use mod_io, only: read_stock ❷
  implicit none
  character(len=4), allocatable :: symbols(:) ❸
  character(len=:), allocatable :: time(:) ❹
  real, allocatable :: open(:), high(:), low(:),& ❺
                       close(:), adjclose(:), volume(:) ❺
  integer :: n
  real :: gain
  symbols = ['AAPL', 'AMZN', 'CRAY', 'CSCO', 'HPQ ',& ❻
             'IBM ', 'INTC', 'MSFT', 'NVDA', 'ORCL'] ❻
  do n = 1, size(symbols)
    call read_stock( & ❼
      'data/' // trim(symbols(n)) // '.csv', & ❼
      time, open, high, low, close, adjclose, volume) ❼
    adjclose = reverse(adjclose) ❽
    gain = (adjclose(size(adjclose)) - adjclose(1)) ❾
    if (n == 1) then ❿
      print *, & ❿
        time(size(time)) // ' through ' // time(1) ❿
      print *, 'Symbol, Gain (USD), Relative gain (%)' ❿
      print *, '-------------------------------------' ❿
    end if ❿
    print *, symbols(n), gain, & ⓫
      nint(gain / adjclose(1) * 100) ⓫
  end do
end program stock_gain
❶ Function to reverse an array
❷ Subroutine to read data from CSV files
❸ Array to store stock symbols
❹ Array to store timestamps
❺ Arrays to store stock price data
❻ List of stocks that we’ll analyze
❼ Uses a custom subroutine to read the data from CSV files
❽ Reverses the order of adjusted close price
❾ Calculates the absolute stock gain
❿ Writes the table header to the screen, only in the first iteration
⓫ Calculates relative gain and prints results to the screen
In this program, we first declare the dynamic (allocatable) arrays to hold the list of stock symbols and the timestamps and stock price data for each stock. We then loop over each stock and read the data from the CSV files one at a time. Finally, we calculate the difference between the end and start price and print the results to the screen. The following subsections go into detail about how dynamic Fortran arrays work. Specifically, we’ll examine how to declare them, allocate them in memory, initialize their values, and finally clear them from memory when done.
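To make the arithmetic in the listing concrete, here is a minimal sketch of the reversal and the two gain calculations. The implementation of reverse in mod_arrays isn't shown in this section, so the version below is only one plausible way to write it. The module name mod_gain_sketch and the functions absolute_gain and relative_gain are hypothetical helpers introduced for this example.

```fortran
module mod_gain_sketch
  ! Hypothetical module, used only for this illustration.
  implicit none
contains

  pure function reverse(x)
    ! Returns the elements of x in reverse order, using a
    ! negative-stride array section. Assumed implementation.
    real, intent(in) :: x(:)
    real :: reverse(size(x))
    reverse = x(size(x):1:-1)
  end function reverse

  pure function absolute_gain(price) result(gain)
    ! Difference between the last and the first price.
    ! Assumes price is non-empty and ordered oldest first.
    real, intent(in) :: price(:)
    real :: gain
    gain = price(size(price)) - price(1)
  end function absolute_gain

  pure function relative_gain(price) result(percent)
    ! Gain relative to the first price, rounded to a whole percent.
    ! Assumes price is non-empty, ordered oldest first, and price(1) /= 0.
    real, intent(in) :: price(:)
    integer :: percent
    percent = nint(absolute_gain(price) / price(1) * 100)
  end function relative_gain

end module mod_gain_sketch
```

For a made-up series stored newest first as [15.0, 12.0, 10.0], reverse returns [10.0, 12.0, 15.0], so absolute_gain gives 5.0 and relative_gain gives 50. The reversal matters because both gain functions assume the oldest price comes first.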