29 Oct After reading the chapter by Capri (2015) on manual data collection, answer the following questions: What were the traditional methods of data collection in the transit system?
After reading the chapter by Capri (2015) on manual data collection, answer the following questions:
- What were the traditional methods of data collection in the transit system?
- Why are the traditional methods insufficient in satisfying the requirement of data collection?
- Give a synopsis of the case study and your thoughts on the optimization and performance measurement requirements and on the impact of the expensive and labor-intensive nature of manual data collection.
Answer all of the questions above in APA 7 format, with a heading for each question. Ensure there are at least two peer-reviewed sources to support your work. The paper should be at least 2 pages of content (this does not include the cover page or reference page).
In: Data Mining ISBN: 978-1-63463-738-1
Editor: Harold L. Capri © 2015 Nova Science Publishers, Inc.
Chapter 1
TRANSIT PASSENGER ORIGIN INFERENCE
USING SMART CARD DATA AND GPS DATA
Xiaolei Ma1, Ph.D. and Yinhai Wang2, Ph.D.
1 School of Transportation Science and Engineering,
Beihang University, Beijing, China
2 Department of Civil and Environmental Engineering,
University of Washington, Seattle, WA, US
ABSTRACT
To improve customer satisfaction and reduce operation costs, transit
authorities have been striving to monitor their transit service quality and
identify the key factors to attract the transit riders. Traditional manual
data collection methods are unable to satisfy the transit system
optimization and performance measurement requirement due to their
expensive and labor-intensive nature. The recent advent of passive data
collection techniques (e.g., Automated Fare Collection and Automated
Vehicle Location) has shifted a data-poor environment to a data-rich
environment, and offered the opportunities for transit agencies to conduct
comprehensive transit system performance measures. Although it is
possible to collect highly valuable information from ubiquitous transit
data, data usability and accessibility are still difficult. Most Automatic
Fare Collection (AFC) systems are not designed for transit performance
monitoring, and additional passenger trip information cannot be directly
Copyright 2014. Nova Science Publishers, Inc.
All rights reserved. May not be reproduced in any form without permission from the publisher, except fair uses permitted under U.S. or applicable copyright law.
EBSCO Publishing : eBook Collection (EBSCOhost) – printed on 10/28/2022 9:45 AM via UNIVERSITY OF THE CUMBERLANDS AN: 956104 ; Ma, Xiaolei, Capri, Harold L..; Data Mining: Principles, Applications and Emerging Challenges Account: s8501869.main.ehost
retrieved. Interoperating and mining heterogeneous datasets would
enhance both the depth and breadth of transit-related studies. This study
proposed a series of data mining algorithms to extract individual transit
rider’s origin using transit smart card and GPS data. The primary data
source of this study comes from the AFC system in Beijing, where a
passenger’s boarding stop (origin) and alighting stop (destination) on a
flat-rate bus are not recorded on the check-in and check-out scan. The bus
arrival time at each stop can be inferred from GPS data, and individual
passenger’s boarding stop is then estimated by fusing the identified bus
arrival time with smart card data. In addition, a Markov chain based
Bayesian decision tree algorithm is proposed to mine the passengers’
origin information when GPS data are absent. Both passenger origin
mining algorithms are validated based on either on-board transit survey
data or personal GPS logger data. The results demonstrate the
effectiveness and efficiency of the proposed algorithms in extracting
passenger origin information. The estimated passenger origin data are
highly valuable for transit system planning and route optimization.
Keywords: Automated fare collection system, transit GPS, passenger origin
inference, Bayesian decision tree, Markov chain
INTRODUCTION
According to the Census of 2000 in the United States, approximately 76%
of people chose privately owned vehicles to commute to work in 2000 (ICF
Consulting, 2003). Recent studies conducted by the 2009 American
Community Survey indicate 79.5% of home-based workers drive alone for
commuting (McKenzie and Rapino, 2009). Many developing countries, e.g.,
China, also rely on privately owned vehicles to commute. For example, more
than 34% of the Beijing residents chose cars as their primary travel mode
while only 28.2% chose transit in 2010 (Beijing Transportation Research
Center, 2012). Public transit has been considered an effective
countermeasure to reduce congestion, air pollution, and energy consumption
(Federal Highway Administration, 2002). According to the 2005 urban mobility
report by the Texas Transportation Institute (2005), travel delay in
2003 would have increased by 27 percent without public transit. In the
most congested metropolitan cities of the U.S., public transit services saved
more than 1.1 billion hours of travel time. Moreover, public transit can help
enhance business and reduce city sprawl through transit-oriented development
(TOD). During certain emergency scenarios, public transit can even act as a
safe and efficient transportation mode for evacuation (Federal Highway
Administration, 2002). For these reasons, it is of critical
importance to improve the efficiency of the public transit system and to
encourage more roadway users to use public transit. To fulfill these objectives,
transit agencies need to understand where further improvements can be
made and whether community goals are being met. A well-developed
performance measure system will facilitate decision making for transit
agencies. Transit agencies can evaluate the transit ridership trends with fare
policy changes and identify where and when better transit service should be
provided. In addition, transit agencies are also required to summarize transit
performance statistics for reporting to either the National Transit Database
(Kittelson & Associates et al., 2003), or the general public, who are interested
in knowing how well transit service is being provided. Nevertheless, developing
a set of structured performance measures often requires a large amount of data
and the corresponding domain knowledge to process and analyze these data.
These requirements are challenging for transit agencies, which must spend
considerable time and effort to meet them. Traditionally, transit agencies have
relied heavily on manual data collection methods to gather transit operation
and planning data (Ma et al., 2012). However, traditional data collection
methods (e.g., travel diaries and surveys) are fairly costly and difficult to
implement at a multiday level due to their
low response rate and accuracy. Transit agencies have spent tremendous
manpower and resources on manual data collection, and consumed a
significant amount of energy and time post-processing the raw data. With
advances in information technologies in intelligent transportation systems
(ITS), the availability of public transit data has been increasing in the past
decades, which has gradually shifted public transit systems into a data-rich
paradigm. The Automatic Fare Collection (AFC) system and the Automatic Vehicle
Location (AVL) system are two common passive data collection methods. The AFC
system, also known as the smart card system, records and processes fare-related
information using either contactless or contact cards to complete the
financial transaction (Chu, 2010). There are two typical types of AFC
systems: entry-only and distance-based. In an entry-only AFC system,
passengers are only required to swipe their smart cards over
the card reader when boarding, whereas in a distance-based AFC system
passengers need to check in when boarding and check out when alighting.
AVL and AFC technologies hold substantial promise for transit
performance analysis and management at a relatively low cost. However,
historically, both AVL and AFC data have not been used to their full
potential. Many AVL and AFC systems do not archive data in a readily
usable manner (Furth, 2006). The AFC system was initially designed to reduce
the workload of tedious manual fare collection, not for transit operation and
planning purposes; as a result, certain critical information, such as the
specific spatial location of each transaction, may not be directly captured. The AVL
system tracks transit vehicles’ geospatial locations by Global Positioning
System (GPS) at either a constant or varying time interval. The accuracy of
GPS occasionally suffers from signal loss due to tall building obstructions in
urban areas (Ma et al., 2011). Both the AFC system and the AVL system
have inherent drawbacks in monitoring transit system performance, and
require analytical approaches to eliminate erroneous data, fill in
missing values, and mine unseen and indirect information.
The remainder of this paper is organized as follows. Transit smart card data
and GPS data are described in Section 2. Based on these data sets, a data
fusion method that integrates roadway geospatial data is first proposed
to estimate transit vehicle arrival information. Then, a Bayesian decision
tree algorithm is presented to estimate each passenger’s boarding stop when
GPS data are unavailable. Considering the heavy computational burden of
decision tree algorithms, the Markov chain property is taken into account to
reduce the algorithm complexity. On-board survey and GPS data from the
Beijing transit system are used to test and verify the proposed algorithms.
Conclusions and future research efforts are summarized at the end of this paper.
RESEARCH BACKGROUND
Data from the AFC system and the AVL system are the two primary sources in
this study. Beijing Transit Incorporated began to issue smart cards on May 10,
2006. The smart card can be used in both the Beijing bus and subway systems.
Due to discounted fares (up to 60% off) provided by the smart card, more than
90% of transit riders paid for their transit trips with smart cards in
2010 (Beijing Transportation Research Center, 2010). Two types of AFC
systems exist in Beijing transit: flat fare and distance-based fare. Transit riders
pay a fixed rate on flat fare buses by tapping their
smart cards on the card reader when entering. Thus, only check-in scans are
necessary. For the distance-based AFC system, transit riders need to swipe
their smart cards during both check-in and check-out: they hold their
smart cards near the card reader device to complete transactions when entering
or exiting buses. The smart card can be used in the Beijing subway system as well,
where passengers need to tap their smart cards on top of fare gates when
entering and exiting subway stations. Both boarding and alighting
information (time and location) are recorded by the fare gates. Although the
transit smart card is superior in convenience and efficiency, the
following issues still prevent transit agencies from taking full advantage of
smart cards for operational purposes:
Passenger boarding and alighting information missing
Due to a design deficiency in the smart card scan system, the AFC system
on flat fare buses does not save any boarding location information, whereas
the AFC system on distance-based fare buses stores boarding and alighting
locations but not boarding time. Key information stored in the
database includes smart card ID, route number, driver ID, transaction time,
remaining balance, transaction amount, boarding stop (only available for
distance-based fare buses), and alighting stop (only available for distance-
based fare buses).
Massive data sets
More than 16 million smart card transactions are generated per day.
Among these transactions, 52% are from flat-rate bus riders. These smart card
transactions are scattered across a large-scale transit network with 52,386
links and 43,432 nodes, as presented in Figure 1:
Figure 1. Beijing Transit GIS Network.
Limited external data with poor quality
Only approximately 50% of transit vehicles in Beijing are equipped with
GPS devices for tracking. GPS data are periodically sent to the central server
at a pre-determined interval of 30 seconds. However, the collected GPS data
suffer from two major data quality issues: (1) vehicle direction information is
missing; (2) GPS points fluctuate (Lou et al., 2009). Map matching
algorithms are needed to align the inaccurate GPS spatial records onto the road
network. In addition, most transit routes are not designed to have fixed
schedules because of high ridership demands, and only certain routes with a
long distance or headway follow schedules at each stop (Chen, 2009). The
above characteristics of the Beijing AFC and AVL systems make it more
challenging to process the data and mine useful information.
It is noteworthy that the AFC system used in Beijing is not a unique case.
Most cities in China employ a similar AFC system in which passengers’
origin information is absent, such as Chongqing City (Gao and Wu, 2011),
Nanning City (Chen, 2009), and Kunming City (Zhou et al., 2007). In other
developing countries, such as Brazil, the AFC system does not record any
boarding location information either (Farzin, 2008). Therefore, a solution for
passenger boarding and alighting information extraction is beneficial to
transit agencies with imperfect smart card (SC) data internationally.
TRANSIT PASSENGER ORIGIN INFERENCE
Because smart card readers on flat-rate buses do not record passengers’
boarding stops, it is desirable to infer individual boarding locations using
smart card transaction data. In this section, two primary approaches are
presented to achieve this goal. Approximately 50% of transit vehicles in the
Beijing entry-only AFC system are equipped with GPS devices. Therefore, a data
fusion method with GPS data, smart card data and GIS data is first developed to
estimate each bus’s arrival time at each stop and infer each passenger’s boarding
stop. Then, for those buses without GPS devices, a Bayesian decision tree
algorithm is proposed to utilize smart card transaction time and apply
Bayesian inference theory to depict the likelihood of each possible boarding
stop. In order to expand the usability of the proposed Bayesian decision tree
algorithm to large-scale datasets, Markov chain optimization is used to reduce
the algorithm’s computational complexity. Both transit passenger origin
inference algorithms are validated using external data (e.g., on-board survey
data and GPS data).
Passenger Origin Inference with GPS Data
In the first step, a GPS-based arrival information inference algorithm is
presented to estimate the arrival time at each transit stop. The
inferred stop-level arrival time is then matched with the timestamp recorded in
the AFC system, and each smart card transaction record is assigned the ID of
the stop whose arrival time is temporally closest. The logic flow chart is shown in
Figure 2. The major data processing procedure is detailed below.
Figure 2. Flow Chart for Passenger Origin Inference with GPS Data.
Bus Arrival Time Extraction
Three primary data sources are involved in the passenger information
extraction: vehicle GPS data; transit stop spatial location data; and flat-fare-
based smart card transaction data. A transit GIS network contains the
geospatial location of each stop for any transit routes. The GPS device
mounted in the bus can record each bus’s location and timestamp every 30
seconds, but the quality of the collected GPS records is unsatisfactory: no
directional information is recorded in the Beijing AVL system, and GPS points fall off
the roadway network due to satellite signal fluctuation. Data preprocessing
is required prior to bus arrival time estimation. A program was written to parse
and import raw GPS data into a database automatically. Key fields
of a GPS record are shown in Table 1.
Table 1. Examples of GPS raw data
Vehicle ID   Date time             Latitude   Longitude   Spot speed   Route ID
00034603     2010-04-07 09:28:57   39.73875   116.1355    9.07         00022
00034603     2010-04-07 09:29:27   39.73710   116.1358    14.26        00022
00034603     2010-04-07 09:29:58   39.73592   116.1357    19.63        00022
00034603     2010-04-07 09:30:28   39.73479   116.1357    0            00022
00034603     2010-04-07 09:30:58   39.73420   116.1357    3.52         00022
The first step is to estimate the bus arrival time for each stop by joining
GPS data and the stop-level geo-location data. A buffer area can be created
around each particular stop for a certain transit route using the GIS software.
Within this area, several GPS records are likely to be captured. However,
identifying the geospatially closest GPS record to each particular stop is
challenging since there could be a number of GPS records with unknown direction
within the specified buffer zone. Thanks to the powerful geospatial
analysis functions in GIS, each link (i.e., polyline) on which a transit stop is
located is composed of a start node and an end node, which implies that the
directional information for each GPS record can be inferred by comparing the
link direction with the direction of travel between two consecutive GPS records.
With the identified direction, the distance from each GPS point to this
particular stop can be calculated, and the timestamp with the minimum
distance will be regarded as the bus arrival time at the particular stop. Figure 3
illustrates the above procedure. The inbound stop represents
the physical location of a particular transit stop, and this stop is snapped to a
transit link, whose direction is regulated by both a start node and an end node.
By comparing the driving direction from GPS records with the link direction,
the GPS record nearest to this particular stop can be identified; it is marked by
the red five-pointed star on the map. The timestamp associated with this
five-pointed star will be considered as the arrival time for this inbound stop. The
merit of the bus arrival time estimation algorithm lies in its efficiency. Rather
than searching all the GPS data to identify the traveling direction for each stop,
the proposed algorithm narrows the search area and filters out
unlikely GPS data. This greatly alleviates the computational burden
and is relatively easy to implement on large-scale datasets, which is
particularly critical for processing a tremendous amount of data within an
acceptable time period.
Figure 3. Boarding Time Estimation with GPS Data and Transit Stop Location Data.
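The arrival time extraction described above can be sketched roughly as follows. This is an illustrative Python sketch, not the authors' implementation: the function names, the 100 m buffer radius, the stop coordinates and the link node coordinates are all assumptions made for the example, and a simple local projection stands in for the GIS software. Only the GPS fixes are taken from Table 1.

```python
import math
from datetime import datetime

EARTH_RADIUS_M = 6371000.0

def to_xy(lat, lon, ref_lat):
    """Project degrees to approximate local metres (equirectangular)."""
    x = math.radians(lon) * EARTH_RADIUS_M * math.cos(math.radians(ref_lat))
    y = math.radians(lat) * EARTH_RADIUS_M
    return x, y

def estimate_arrival(gps, stop, link_start, link_end, buffer_m=100.0):
    """Return the timestamp of the GPS fix nearest to `stop`, considering only
    fixes inside the buffer whose movement agrees with the link direction.
    gps: list of (timestamp, lat, lon) sorted by time. Returns None if no fix
    qualifies. buffer_m is an assumed value, not one given in the chapter."""
    ref = stop[0]
    sx, sy = to_xy(stop[0], stop[1], ref)
    ax, ay = to_xy(link_start[0], link_start[1], ref)
    bx, by = to_xy(link_end[0], link_end[1], ref)
    link_dx, link_dy = bx - ax, by - ay
    best = None
    # The first fix has no predecessor, so its direction cannot be inferred.
    for prev, cur in zip(gps, gps[1:]):
        px, py = to_xy(prev[1], prev[2], ref)
        cx, cy = to_xy(cur[1], cur[2], ref)
        dist = math.hypot(cx - sx, cy - sy)
        if dist > buffer_m:
            continue  # outside the buffer around the stop
        # A positive dot product means the bus moves along the link direction.
        if (cx - px) * link_dx + (cy - py) * link_dy <= 0:
            continue
        if best is None or dist < best[0]:
            best = (dist, cur[0])
    return best[1] if best else None

# GPS fixes from Table 1 (vehicle 00034603, route 00022).
GPS = [
    (datetime(2010, 4, 7, 9, 28, 57), 39.73875, 116.1355),
    (datetime(2010, 4, 7, 9, 29, 27), 39.73710, 116.1358),
    (datetime(2010, 4, 7, 9, 29, 58), 39.73592, 116.1357),
    (datetime(2010, 4, 7, 9, 30, 28), 39.73479, 116.1357),
    (datetime(2010, 4, 7, 9, 30, 58), 39.73420, 116.1357),
]
# Hypothetical stop and southbound link, invented for illustration only.
STOP = (39.73480, 116.1357)
LINK_START = (39.74000, 116.1357)
LINK_END = (39.73300, 116.1357)

print(estimate_arrival(GPS, STOP, LINK_START, LINK_END))
```

With these assumed inputs the sketch selects the 09:30:28 fix; reversing the link direction rejects every fix, which is how records from the opposite travel direction would be filtered out.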
Passenger Boarding Location Identification with Smart Card Data
For each smart card data transaction record, the boarding stop can be
estimated by matching the recorded timestamp and the identified bus arrival
time. As presented in Figure 4, for each smart card transaction record, the
transaction time is compared with the inferred bus arrival time at each stop.
This record will be assigned to the stop whose bus arrival time is
temporally closest to its transaction time. Since passengers board
the bus within a relatively short time interval, this data fusion method is able
to capture almost all missing boarding stops.
Figure 4. Boarding Stop Identification with Bus Arrival Time.
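The matching step can be illustrated with a short Python sketch. It assumes the arrival times for one bus trip are already known and sorted; the function name, stop IDs, card IDs and timestamps are hypothetical and are not taken from the chapter.

```python
from bisect import bisect_left
from datetime import datetime

def assign_boarding_stops(arrivals, transactions):
    """Assign each transaction to the stop with the temporally closest arrival.
    arrivals: non-empty list of (arrival_time, stop_id), sorted by time.
    transactions: list of (card_id, transaction_time).
    Returns a dict {card_id: stop_id}."""
    times = [t for t, _ in arrivals]
    result = {}
    for card_id, ts in transactions:
        i = bisect_left(times, ts)
        # Only the arrivals just before and just after ts can be the closest.
        candidates = arrivals[max(i - 1, 0): i + 1]
        _, stop_id = min(candidates,
                         key=lambda a: abs((a[0] - ts).total_seconds()))
        result[card_id] = stop_id
    return result

# Hypothetical arrivals and smart card transactions for one trip.
arrivals = [
    (datetime(2010, 4, 7, 9, 30, 28), "S1"),
    (datetime(2010, 4, 7, 9, 33, 10), "S2"),
    (datetime(2010, 4, 7, 9, 36, 45), "S3"),
]
transactions = [
    ("card_A", datetime(2010, 4, 7, 9, 30, 35)),
    ("card_B", datetime(2010, 4, 7, 9, 30, 50)),
    ("card_C", datetime(2010, 4, 7, 9, 33, 5)),
    ("card_D", datetime(2010, 4, 7, 9, 37, 0)),
]
stops = assign_boarding_stops(arrivals, transactions)
print(stops)
```

A binary search keeps each lookup logarithmic in the number of stops, which matters when millions of transactions are processed per day.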
In addition, because the arrival times at all stops of a particular transit
route can be estimated, the average travel time between two adjacent stops can
be calculated as well. This travel time statistic is not only critical for transit
performance measures, but also provides prior information for passenger
origin inference when GPS data are absent.
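As a rough illustration of this statistic, the following Python sketch averages the travel time of each adjacent stop pair over several trips. The function name, stop IDs and arrival times are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def mean_segment_times(trips):
    """Average travel time in seconds between adjacent stops.
    trips: list of trips, each a list of (stop_id, arrival_time) in stop order.
    Returns a dict {(from_stop, to_stop): mean_seconds}."""
    sums = defaultdict(lambda: [0.0, 0])  # segment -> [total seconds, count]
    for trip in trips:
        for (s0, t0), (s1, t1) in zip(trip, trip[1:]):
            acc = sums[(s0, s1)]
            acc[0] += (t1 - t0).total_seconds()
            acc[1] += 1
    return {seg: total / n for seg, (total, n) in sums.items()}

# Two hypothetical trips on the same route.
trips = [
    [("S1", datetime(2010, 4, 7, 9, 0, 0)),
     ("S2", datetime(2010, 4, 7, 9, 3, 0)),
     ("S3", datetime(2010, 4, 7, 9, 7, 0))],
    [("S1", datetime(2010, 4, 7, 10, 0, 0)),
     ("S2", datetime(2010, 4, 7, 10, 5, 0)),
     ("S3", datetime(2010, 4, 7, 10, 8, 0))],
]
segment_means = mean_segment_times(trips)
print(segment_means)
```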
Validation
Compared with bus arrival time, door opening time can be more
accurately matched with smart card transaction time. This is because a bus
may not stop exactly at each transit stop for passenger boarding. The inferred
bus arrival time is therefore subject to errors when it is matched with smart
card data. To validate the accuracy of the proposed data fusion algorithm for
passenger origin inference, an on-board transit survey was undertaken to collect
bus door opening time and arrival location for each stop of route 651 on
January 13, 2013. Handheld GPS devices were used to track the
geospatial location of moving buses every 15 seconds. The survey ran
from 8:00 AM to 1:00 PM, and a total of 75 bus door opening times were
manually recorded. These bus door opening time records were then compared
with smart card transactions from 417 passengers, and the resulting stops
can be considered as the ground-truth data. By comparing the ground-truth
data with the results from the proposed GPS data fusion approach, 406
boarding stops were accurately inferred and 11 boarding stops differed from the
ground-truth data within a one-stop error range. The proposed algorithm
thus achieves an accuracy of 97.4% (406 of 417).
Passenger Origin Inference with Smart Card Data
There are still a fair number of buses without GPS devices, and thus the
bus arrival time at each transit stop is not directly measured. However, most
passengers scan their cards immediately when boarding and almost all
passengers should complete the check-in scan before arriving at the next stop.
This indicates that the first passenger’s transaction time can be safely taken
as the boarding time of the group of passengers at the same stop. The challenge is
then to identify the bus location at the moment of the SC transaction so that we
can infer the boarding stop for that passenger. However, this is not easy
because the SC system for the flat-rate bus does not record bus location. We
know the time each transaction occurred on a bus of a particular route under
the operation of a particular driver, but nothing else is known from the SC
transaction database. Nonetheless, we are able to extract boarding volume
changes with time and passengers who made transfers. By mining these data
and combining transit route maps, we may be able to accomplish our goal.
Therefore, a two-step approach is designed for passenger origin data
extraction: smart card data clustering and transit stop recognition. To
implement the proposed algorithm in an efficient manner, a Markov Chain
based optimization approach is applied to reduce the computational
complexity.
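To make the first-transaction assumption concrete, the sketch below groups the transaction times of one bus by time gaps and takes the first time in each group as the group's boarding time. The chapter's own clustering procedure is not reproduced in this text, so the gap-threshold rule, the 60-second threshold and the function names are assumptions chosen only for illustration.

```python
def cluster_by_gap(times, max_gap_s=60.0):
    """Group sorted transaction times (seconds since midnight) into clusters.
    A new cluster starts whenever the gap to the previous transaction exceeds
    max_gap_s. The threshold is an assumed value, not one from the chapter."""
    clusters = []
    for t in times:
        if clusters and t - clusters[-1][-1] <= max_gap_s:
            clusters[-1].append(t)
        else:
            clusters.append([t])
    return clusters

def boarding_times(times, max_gap_s=60.0):
    """Take the first transaction of each cluster as the boarding time."""
    return [c[0] for c in cluster_by_gap(sorted(times), max_gap_s)]

# Hypothetical transaction times on one bus, in seconds since midnight.
taps = [34200, 34205, 34212, 34390, 34395, 34600]
groups = cluster_by_gap(taps)
firsts = boarding_times(taps)
print(groups)
print(firsts)
```

Each resulting group would still need to be mapped to a stop, which is the transit stop recognition step named above.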
Smart Card Data Clustering