Using TSDB to store U.S. Equities data from TickData

2012-07-06 17:18:20 -0400

In this article, I'll show you how to use my Time Series Database</a> to import some sample U.S. equities data from TickData</a>, a data provider of high-frequency historical financial data. First, get the sample data</a> and unzip it on your computer. Look in the Quotes folder for a file called GLP_27667.asc. This is file has quotes for Global Partners LP, a petroleum company. If you are running on Mac OS X or Linux, you can use the head</code> command to see the first few lines, which in my download look like this:

If you look on TickData's website, you can find a
document describing the format</a> of these files in detail, but for this example, we're only going to look at the first 7 fields.

  • Date (MM/DD/YYYY)</li>
  • Time</li>
  • Exchange indicator (P means ARCA, M means Chicago, T means NASD)</li>
  • Bid Price</li>
  • Ask Price</li>
  • Bid Amount (100 share lots)</li>
  • Ask Amount (100 share lots)</li>
    Let's create a new TSDB file to store this data. To do this, use the tsdbcreate command.

    tsdbcreate glp.tsdb quotes char exchange double bid_price double ask_price int32 bid_size int32 ask_size</pre>
    This command is creating a new tsdb database called glp.tsdb with a series called "quotes". That series has 5 columns: exchange (a character), bid_price (floating point double), ask_price (floating point double), bid_size and ask_size (both 32-bit integers).

    You can add more than one series to a TSDB file. If you were to call the tsdbcreate command and glp.tsdb already existed, then it would add another series to the file.

    Now that you have an empty TSDB file, we'll create some import instructions and import the data. TSDB has a program called TSDB import that uses an XML file to describe how to parse the delimited data in a file like a CSV file and append it to the end of a time series.  Whenever you import data, you are always appending it to the end of a time series. </strong>You can import from more than one file into one series in a TSDB file, but you need to start with the earliest file. You also must make sure each of the files you import are in chronological order from oldest timestamp to newest (repeated timestamps are okay).

    Here's an example XML file that shows how you could import the TickData quotes files.

    tokens="0,1" format_string="%m/%d/%Y %H:%M:%S%F" />

    The XML import tells the tsdbimport command how to map the comma-separated values in each line into columns in the database. Notice in the timestamp fieldparser</code> block, we specify to use tokens 0 and 1 to make the timestamp. This tells the parser to join tokens zero and one with a space and then to parse the combined string as a timestamp.

    You can save that XML as a text file called "instructions.xml" and then import the GLP_27667.asc file like this:

    tsdbimport instructions.xml GLP_27667.asc glp.tsdb quotes</pre>
    After you run this command, you'll have the complete GLP time series imported. Now, you can access this data quickly via either the R or MATLAB TSDB bindings. Stay tuned for a article on how to do that within the next few days.