About the Author

Charles Chun Hei Siu

Charles is a student at the University of San Francisco, majoring in data science and minoring in design. As a keen person in the field of both photography and computer science, he always enjoys engaging in various activities, taking challenges, and meeting new people. His organization and communicational skills also allow him to collaborate with other people in the group to achieve goals and objectives. He seldom hesitates to share the knowledge of his own with people in need, or with the public through volunteering services.

About the Dataset

Introduction

The dataset used in this project is about the landing statistics of San Francisco International Airport (SFO), published by the SF Airport Commission through DataSF. It has 18.7K rows and 14 columns, with each row being of a landing record of a specific model of aircraft during a specific activity period between July 2005 and December 2016. The reason for me to choose this dataset is because I am always fascinated with airplanes when I was little. I decided that, by looking closely at the data released by SFO, I get to know more about the air traffic situation around the San Francisco Bay Area. Thus, the visualizations derived from this dataset should satisfy my curiosity. Specifically, I would like to know the aircraft that are commonly found in SFO, as well as the time period in which the most air traffic takes place.

The dataset has the following columns:

  • Activity Period contains the data period of the flight landed.
  • Operating Airline and Operating Airline IATA Code contain the airline that operates the flight landed and its IATA code.
  • Published Airline and Published Airline IATA Code contain the airline being published of the flight landed and its IATA code.
  • GEO Summary indicates whether the flight landed is international or domestic.
  • GEO Region indicates which region the landed flight is originated from.
  • Landing Aircraft Type indicates whether the flight landed is a passenger, freighter or combi aircraft.
  • Aircraft Body Type contains the body type info of the aircraft landed.
  • Aircraft Manufacturer, Aircraft Model and Aircraft Version contain the info of the aircraft landed.
  • Landing Count contains the number of times the same type of aircraft has landed.
  • Total Landed Weight contains the sum of weight of the same type of aircraft landed.

For this project, only Activity Period, Published Airline, GEO Summary, Aircraft Manufacturer, Aircraft Model and Landing Count columns are used.

The original dataset can be accessed at https://data.sfgov.org/Transportation/Air-Traffic-Landings-Statistics/fpux-q53t. It is available under the Public Domain Dedication and License.

Motivation

The reason for me to choose this dataset is because I am always fascinated with airplanes when I was little. I decided that, by looking closely at the data released by SFO, I get to know more about the air traffic situation around the San Francisco Bay Area. Thus, the visualizations derived from this dataset should satisfy my curiosity. Specifically, I would like to know the aircraft that are commonly found in SFO, as well as the time period in which the most air traffic takes place.

Data Processing

A number of steps were taken to clean up the data from the original dataset.

Trifacta Wrangler was used namely to transform the Activity Period column from a YYYYMM string format to a YYYY-MM-DD date format, as well as to append B to the model number of all Boeing aircraft for consistency. Meanwhile, all Belair-operated flights were incorrectly marked with an IATA code of BBB, which were corrected with the code 4T. Quotation marks in airline names are also removed in the process. A full recipe of the operation is available here.

Further data processing was performed due to a change in the format of the Aircraft Model that took place after the June 2013 activity period, in which the ICAO aircraft type designators (which typically concatenate the aircraft model and version) were used to document the aircraft model number of each landing, as oppose to landings before June 2013 that have a separate model and version number. For example, a landing that involves a Boeing 737-800 prior to June 2013 was recorded as B737 in the Aircraft Model column and 800 in the Aircraft Version column, whereas landings after that were recorded as B738 in the Aircraft Model column and - in the Aircraft Version column. As a result, it is not possible to compare landings before and after the June 2013 activity period. Therefore, a manual cleanup was performed on landings took place after the June 2013 activity period, which involves a lookup of the ICAO aircraft type designator of each landing from this table published by ICAO and a modification to the associated Aircraft Model and Aircraft Version columns.

Finally, all United Airlines flights that have United Airlines - Pre 07/01/2013 in the Operating Airline and Published Airline columns are merged with the rest of the United Airlines flights for easy comparison.