Charles is a student at the University of San Francisco, majoring in data science and minoring in design. As a keen person in the field of both photography and computer science, he always enjoys engaging in various activities, taking challenges, and meeting new people. His organization and communicational skills also allow him to collaborate with other people in the group to achieve goals and objectives. He seldom hesitates to share the knowledge of his own with people in need, or with the public through volunteering services.
The dataset used in this project is about the landing statistics of San Francisco International Airport (SFO), published by the SF Airport Commission through DataSF. It has 18.7K rows and 14 columns, with each row being of a landing record of a specific model of aircraft during a specific activity period between July 2005 and December 2016. The reason for me to choose this dataset is because I am always fascinated with airplanes when I was little. I decided that, by looking closely at the data released by SFO, I get to know more about the air traffic situation around the San Francisco Bay Area. Thus, the visualizations derived from this dataset should satisfy my curiosity. Specifically, I would like to know the aircraft that are commonly found in SFO, as well as the time period in which the most air traffic takes place.
The dataset has the following columns:
For this project, only Activity Period, Published Airline, GEO Summary, Aircraft Manufacturer, Aircraft Model and Landing Count columns are used.
The original dataset can be accessed at https://data.sfgov.org/Transportation/Air-Traffic-Landings-Statistics/fpux-q53t. It is available under the Public Domain Dedication and License.
The reason for me to choose this dataset is because I am always fascinated with airplanes when I was little. I decided that, by looking closely at the data released by SFO, I get to know more about the air traffic situation around the San Francisco Bay Area. Thus, the visualizations derived from this dataset should satisfy my curiosity. Specifically, I would like to know the aircraft that are commonly found in SFO, as well as the time period in which the most air traffic takes place.
A number of steps were taken to clean up the data from the original dataset.
Trifacta Wrangler was used namely to transform the Activity Period column from a YYYYMM
string format to a YYYY-MM-DD
date format, as well as to append B
to the model number of all Boeing aircraft for consistency. Meanwhile, all Belair-operated flights were incorrectly marked with an IATA code of BBB
, which were corrected with the code 4T
. Quotation marks in airline names are also removed in the process. A full recipe of the operation is available here.
Further data processing was performed due to a change in the format of the Aircraft Model that took place after the June 2013 activity period, in which the ICAO aircraft type designators (which typically concatenate the aircraft model and version) were used to document the aircraft model number of each landing, as oppose to landings before June 2013 that have a separate model and version number. For example, a landing that involves a Boeing 737-800 prior to June 2013 was recorded as B737
in the Aircraft Model column and 800
in the Aircraft Version column, whereas landings after that were recorded as B738
in the Aircraft Model column and -
in the Aircraft Version column. As a result, it is not possible to compare landings before and after the June 2013 activity period. Therefore, a manual cleanup was performed on landings took place after the June 2013 activity period, which involves a lookup of the ICAO aircraft type designator of each landing from this table published by ICAO and a modification to the associated Aircraft Model and Aircraft Version columns.
Finally, all United Airlines flights that have United Airlines - Pre 07/01/2013
in the Operating Airline and Published Airline columns are merged with the rest of the United Airlines flights for easy comparison.