So how do we update the script to get the desired output files? The answer lies at the bottom of the code below the comment "# User Modifiable Subroutines" in csvref.pl. There are 3 subroutines that need to be modified which are the following:
1
writenewheader();
2
processline($line, $file1prefix);
3
setdatetimestring($dts);
Before editing these routines, the first thing that needs to be done is to record the column format of the input csv files. Lets use my example files from first post. So in ANZ.csv we have:
<PER>,<DTYYYYMMDD>,<TIME>,<OPEN>,<HIGH>,<LOW>,<CLOSE>,<VOL>,<OPENINT>
Note: if there is no header then it's still a good idea to record what your looking at.
And say we want to display the result output as
Ticker,Date,Open,High,Low,Close,Volume
Lets go through the code changes in the above order:
1) In subroutine writenewheader(), we need to simply change the print statement to the following:
print FILE2 "Ticker,Date,Open,High,Low,Close,Volume";
FILE2 is just an output file handle for writing the above line.
If you enter "y" to "Write New Header (y/n) ?" when running the script, the above header will be written to the beginning of the resultant output file/(s).
2) processline($line, $fileprefix) takes 2 arguments, $line which is a variable that contains the line (or row) from the input file and $fileprefix which contains the input file name (without the extension) from which the line/row comes from. The only time we need to use the $fileprefix is if we want to store this as a column like in this example.
The $line variable contains all the column values seperated by the delimiter "," and the code in the while statement stores these column values in array @tmp. $tmp[0] contains the first column value, $tmp[1] contains the second column value and so on. So in the above example we have:
$tmp[0] = value of column <Per>
$tmp[1] = value of column <DTYYYYMMDD>
$tmp[2] = value of column <TIME>
$tmp[3] = value of column <OPEN>
$tmp[4] = value of column <HIGH>
$tmp[5] = value of column <LOW>
$tmp[6] = value of column <CLOSE>
$tmp[7] = value of column <VOL>
$tmp[8] = value of column <OPENINT>
Now the columns we want displayed are "Ticker,Date,Open,High,Low,Close,Volume". Since the Ticker column is not found here we need to generate it some how (A special case). Fortunately the Ticker just happens to be the input file name (2nd argument passed in), so we can use variable $fileprefix. All we need to change in the code now is the print statement so the following should do the trick:
print FILE2 "$fileprefix,$tmp[1],$tmp[3],$tmp[4],$tmp[5],$tmp[6],$tmp[7]";
This will display output values like so:
ANZ,20141017,31.84,32.12,31.78,31.93,7450100
Note that the second value here is "20141017" (from $tmp[1]) and when you run the script you get "17/10/2014". This is because I wanted to change the way the date was displayed. The line "$datetimestr = setdatetimestring($tmp[1])" does the change. So variable $datetimestr contains the newly formated date value and subroutine setdatetimestring() is discussed in the next section. Once you are happy with the newly created date column we can update the print statement to:
print FILE2 "$fileprefix,$datetimestr,$tmp[3],$tmp[4],$tmp[5],$tmp[6],$tmp[7]";
3) setdatetimestring() is probably the most difficult code change to make because you have to seperate all the small things that make up the date and/or time and then merge them back together again. This function takes one argument which is the original date/time value from the input file. This argument is stored as $_[0]. As there are many combinations for date format values, I shall only explain the current code so you get an idea of what to do.
So the original date format looks something like "20141017".
The code $yyyy = substr($_[0],0,4) takes the first 4 characters of the original date starting at the first character (position 0) and stores the result in variable $yyyy.
The code $mm = substr($_[0],4,2) takes 2 characters of the original date starting at the 5th character (position 4) and stores the result in variable $mm.
The code $dd = substr($_[0],6,2) takes 2 characters of the original date starting at the 7th character (position 6) and stores the result in variable $dd.
We then merge the variables into one with the following code "$datetimestr = "$dd/$mm/$yyyy" and return this value to the calling subroutine.
Hopefully this helps clear up the code changes people need to make and here ends the Lesson
Andrew