U. S. Flight Track Database: Data Summary


Data Qualification, Slicing, and Validation

The data files consist of raw downloaded data that have been quality checked and sorted. Quality checking is essentially the elimination of corrupt, excess, redundant, or inadequate data. Corrupt data take the form of an occasional isolated en route report that does not follow the correct format; their frequency may approach one in one hundred thousand reports. Corrupt records are detected by automated scanning of the data file and eliminated only after human examination. Some records flagged by the scanning procedure are merely run-on lines (missing a line feed character) that can easily be repaired. Excess data are those that fall outside the analysis volume defined by latitude, longitude, and altitude limits. Redundant data are reports that exactly duplicate others. Excess and redundant data are eliminated by a procedure delineated here. Inadequate data are single en route reports, orphans if you will, that are apparently unrelated to any others. Since a line cannot be made from a single point, inadequate data are eliminated.
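The qualification steps above can be sketched in code. This is only an illustration of the logic, not the actual procedure: the field names, analysis-volume limits, and report representation are all assumptions.

```python
# Hypothetical sketch of the qualification step: drop excess, redundant,
# and orphan en route reports.  Limits and field names are assumed, not
# taken from the actual database.
from collections import Counter

LAT_MIN, LAT_MAX = 20.0, 55.0      # assumed analysis-volume limits
LON_MIN, LON_MAX = -130.0, -60.0
ALT_MIN, ALT_MAX = 0.0, 60000.0    # assumed altitude limits, feet

def qualify(reports):
    """reports: list of dicts with 'flight', 'lat', 'lon', 'alt' keys."""
    seen = set()
    kept = []
    for r in reports:
        key = tuple(sorted(r.items()))
        if key in seen:                       # redundant: exact duplicate
            continue
        seen.add(key)
        if not (LAT_MIN <= r["lat"] <= LAT_MAX and
                LON_MIN <= r["lon"] <= LON_MAX and
                ALT_MIN <= r["alt"] <= ALT_MAX):
            continue                          # excess: outside analysis volume
        kept.append(r)
    # inadequate: a flight with only one report cannot define a track
    counts = Counter(r["flight"] for r in kept)
    return [r for r in kept if counts[r["flight"]] > 1]
```

Corrupt-record detection is omitted here because it depends on the report format and, as noted above, requires human examination before elimination.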

The qualified data are sliced into pieces by a procedure delineated here, and a series of files is produced, with descriptions provided here.

Validation takes place after the processed data files have been created and saved. No changes are made to those files on the basis of the validation procedure, which is delineated here. The only output is a monthly validation summary file, with a name in the form tmoYYYYMM.val and a format delineated here. The only way to tell which hours of data are considered valid is to examine the columns identifying valid flight segments in the appropriate tmoYYYYMM.val file. Since estimated data validity is intended to help identify data useful for further analysis, validation results are summarized below.
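The tmoYYYYMM.val naming convention can be handled with a small helper like the following. This is an illustrative sketch only; the functions and the parsing of the file's contents are assumptions.

```python
# Hypothetical helpers for the tmoYYYYMM.val file-name convention.
import re

def val_filename(year, month):
    """Build a monthly validation summary file name, e.g. tmo200409.val."""
    return "tmo%04d%02d.val" % (year, month)

def parse_val_filename(name):
    """Recover (year, month) from a tmoYYYYMM.val name, or None if it
    does not match the convention."""
    m = re.fullmatch(r"tmo(\d{4})(\d{2})\.val", name)
    return (int(m.group(1)), int(m.group(2))) if m else None
```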


The following links give a limited statistical summary of the data available on the FTP server.


NOTE: There is no way to determine whether the data set is complete or incomplete from the data alone. Data were identified as 'valid' or 'not valid' by a statistical procedure based on the assumption that weeks of data, consisting of sequential collections of hourly data from the entire analysis volume, could be collected into stationary ensembles that exhibit reasonable behavior. It might be better to think of the process as identifying data as 'probably invalid' or 'probably not invalid'.
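The ensemble idea above can be illustrated as follows. This is not the actual validation procedure; the use of hourly report counts, the z-score test, and the threshold are all assumptions made for the sake of the sketch.

```python
# Illustrative sketch: flag an hour's report count as 'probably invalid'
# when it falls far from the ensemble of the same hour-of-week across
# several weeks, per the stationarity assumption described above.
from statistics import mean, stdev

def flag_hours(weekly_counts, threshold=3.0):
    """weekly_counts: list of weeks, each a list of hourly report counts
    (168 per week in practice).  Returns, for each week, a list of
    booleans: True = probably invalid for that hour."""
    flags = []
    for week in weekly_counts:
        week_flags = []
        for hour, count in enumerate(week):
            # ensemble: the same hour of the week, across all weeks
            ensemble = [w[hour] for w in weekly_counts]
            mu, sigma = mean(ensemble), stdev(ensemble)
            week_flags.append(sigma > 0 and abs(count - mu) > threshold * sigma)
        flags.append(week_flags)
    return flags
```

A flagged hour is only "probably invalid": a genuinely unusual traffic pattern would be flagged just as readily as a data outage, which is exactly the caveat stated above.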

If you have questions about this site, you may send email to Don Garber at donald.p.garber@nasa.gov.


This page was last modified on 29 September 2004.