U. S. Flight Track Database: Data Summary
Data Qualification, Slicing, and Validation
The data files consist of raw downloaded data that have been quality checked and sorted.
The quality checking is essentially the elimination of corrupt, excess, redundant,
or inadequate data. Corrupt data take the form of an occasional isolated en route report
that does not follow the correct format. The frequency of corrupt data may approach as much as
one report per one hundred thousand. Corrupt reports are detected by automated scanning of the
data file and eliminated only after human examination. Some records detected by the scanning
procedure are only run-on lines (no line feed character) that can easily be repaired.
Excess data are those that fall outside the analysis volume defined by latitude,
longitude, and altitude limits. Redundant data are those reports that exactly duplicate
others. Excess and redundant data are eliminated by a procedure delineated
here. Inadequate
data are single en route reports, orphans if you will, that are apparently unrelated
to any others. Since you can't make a line with just one point, inadequate data are
eliminated.
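The three elimination steps above can be sketched roughly as follows. This is only an illustration and not the procedure used to build the database: the Report fields, the flight_id key used to decide which reports are related, the order of the steps, and the numeric limits of the analysis volume are all assumptions made for the example.

```python
from collections import Counter
from typing import NamedTuple


class Report(NamedTuple):
    """One en route report (hypothetical field layout)."""
    flight_id: str  # assumed key tying reports to one flight
    time: float     # seconds, arbitrary origin
    lat: float      # degrees north
    lon: float      # degrees east
    alt: float      # feet


# Hypothetical analysis-volume limits; the real limits are not given here.
LAT_LIMITS = (20.0, 50.0)
LON_LIMITS = (-130.0, -60.0)
ALT_LIMITS = (0.0, 60000.0)


def in_volume(r: Report) -> bool:
    """True if the report lies inside the assumed analysis volume."""
    return (LAT_LIMITS[0] <= r.lat <= LAT_LIMITS[1]
            and LON_LIMITS[0] <= r.lon <= LON_LIMITS[1]
            and ALT_LIMITS[0] <= r.alt <= ALT_LIMITS[1])


def qualify(reports: list[Report]) -> list[Report]:
    """Drop excess, redundant, and inadequate reports, in that order."""
    # Excess: outside the latitude, longitude, and altitude limits.
    kept = [r for r in reports if in_volume(r)]
    # Redundant: exact duplicates; keep the first copy, preserve order.
    kept = list(dict.fromkeys(kept))
    # Inadequate: a flight left with a single report cannot form a line.
    counts = Counter(r.flight_id for r in kept)
    return [r for r in kept if counts[r.flight_id] > 1]
```

Orphans are removed last in this sketch so that a flight reduced to a single report by the earlier steps is also dropped; the actual procedure may order the steps differently.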
The qualified data are sliced into pieces by a procedure delineated
here, and a series of
files are produced with descriptions provided
here.
Validation takes place after processed data files have been created and saved. No changes
are made to those processed data files on the basis of the validation procedure, delineated
here. There is only a
monthly validation summary file with a name in the form tmoYYYYMM.val that has
the following
format. The only way to
tell which hours of data are considered valid is to examine the columns identifying valid
flight segments in the appropriate tmoYYYYMM.val file. Since estimated data validity
is intended to help identify data useful for further analysis, validation
results are summarized below.
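As a small convenience, the name of the validation summary file for a given month can be built from the tmoYYYYMM.val pattern. The helper name validation_filename is hypothetical, and the sketch assumes a four-digit year and a zero-padded two-digit month; it does not attempt to read the file, whose column layout is described separately.

```python
def validation_filename(year: int, month: int) -> str:
    """Build a name of the form tmoYYYYMM.val (zero-padded month assumed)."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be 1..12, got {month}")
    return f"tmo{year:04d}{month:02d}.val"


print(validation_filename(2004, 9))  # tmo200409.val
```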
The following links give a limited statistical summary of the data available on the
FTP server.
- The
hourly summary table has a separate row for each month and a number of columns for
various monthly statistics. The third column shows the total hours in the month. The
next shows the number of hours in which there are no processed data. The next shows the
number of hours for which there are some processed data in the data files. The following
columns show the results of the validation procedure. The sixth column shows
the number of hours considered incomplete (or partial) and therefore not valid as
representative of a full hour. The next shows the number of hours
considered complete (or full) and therefore valid, or useful for further analysis.
The next shows the valid hours as a percentage of the total hours in the month. The last
two columns show the percentage of data (non-empty) hours that were identified as invalid
(rejected) or valid (remaining).
- The
segment summary table has a separate row for each month and a number of columns for
various monthly statistics. The third column shows the total number of data segments in the
month. The next shows the percentage of segments that were rejected because they were in
hours considered incomplete (or partial) and therefore not valid as complete or
representative of a full hour. The next shows the percentage of segments remaining in hours
considered complete (or full) and therefore valid or useful for further analysis. The next
shows the number of segments remaining. The next four columns give the same information
about the length of all data segments in the month.
- The
daily summary table has a separate row for each month and a number of columns for
various monthly statistics. The third column shows the total days in the month. The
next shows the number of days in which there are no valid hourly data. The next shows the
number of days for which there are some valid hourly data. The next shows the
number of days for which there is a full day of valid hourly data.
The next shows the full days of valid data as a percentage of the total days in the month.
The next two columns show the average daily track length and the average daily number of
flights. These numbers were computed only for months in which there was at least one of
each day of the week (i.e., at least one Monday, at least one Tuesday, and so on) with a full
24 hours of valid data.
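The arithmetic behind the hourly summary columns, and the day-of-week condition used for the daily averages, can be illustrated with a short sketch. The function names and the dictionary keys are hypothetical, the input counts are made up for the example, and the total is assumed to be 24 hours for every calendar day; this is not the code that produced the tables.

```python
import calendar
from datetime import date


def hourly_summary(year: int, month: int,
                   data_hours: int, valid_hours: int) -> dict:
    """Derive the hourly summary columns for one month."""
    # monthrange returns (weekday of day 1, number of days in the month).
    total_hours = calendar.monthrange(year, month)[1] * 24
    invalid_hours = data_hours - valid_hours

    def pct(part: int, whole: int) -> float:
        return 100.0 * part / whole if whole else 0.0

    return {
        "total_hours": total_hours,
        "empty_hours": total_hours - data_hours,
        "data_hours": data_hours,
        "invalid_hours": invalid_hours,
        "valid_hours": valid_hours,
        "valid_pct_of_total": pct(valid_hours, total_hours),
        "rejected_pct_of_data": pct(invalid_hours, data_hours),
        "remaining_pct_of_data": pct(valid_hours, data_hours),
    }


def covers_every_weekday(full_days: list[date]) -> bool:
    """True if the fully valid days include each day of the week."""
    return {d.weekday() for d in full_days} == set(range(7))
```

For example, a 30-day month with 700 non-empty hours, 630 of them valid, gives 720 total hours, 20 empty hours, and 87.5 percent valid hours; 10 percent of the data hours are rejected and 90 percent remain.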
NOTE: There is no way to determine if the data set is complete or incomplete from
information provided only by the data. Data were identified as 'not valid' or 'valid' by a
statistical procedure based on the assumption that weeks of data consisting of sequential
collections of hourly data from the entire analysis volume could be collected in stationary
ensembles that exhibit reasonable behavior. It might be better to think of the process as
identifying data as 'probably invalid' or 'probably not invalid'.
If you have questions about this site, you may send email to Don Garber at
donald.p.garber@nasa.gov.
This page was last modified on 29 September 2004.