data preprocessing for uber-movement travel times data

siljuovix/uber-movement

Description

Data extraction pipeline that adds geographic features from OpenStreetMap to the trip information from Uber Movement. Possible applications for this workflow include ETA prediction models and traffic or seasonality analysis.

Requirements

  • Hourly aggregate CSV file from the Uber Movement download page. This file contains the aggregated travel times of trips between the cadastral zones of a city. Downloading requires a free registration.
  • Cadastral distribution of the city in question, also downloadable from Uber Movement. This JSON file lists the names of the cadastral zones referenced in the hourly aggregate CSV and the coordinates of the polygon that delimits each zone.
  • The distance between cadastral zones is queried using OSRM, a routing engine based on OpenStreetMap (OSM) maps. The OSRM Docker image is probably the simplest way to get the routing engine up and accepting requests.
  • OSM extracts for the places to be queried. Geofabrik offers different map segmentation options for the whole planet.
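
As an illustration of the OSRM requirement, the following is a minimal sketch of how a driving distance could be requested from an OSRM server through its HTTP route service. The base URL (a local server on port 5000) and the function names are assumptions made for this example and are not taken from the repository.

```python
import json
from urllib.request import urlopen

# Assumption: an OSRM server started from the Docker image, listening locally on port 5000.
OSRM_BASE_URL = "http://localhost:5000"


def build_route_url(source, dest, base_url=OSRM_BASE_URL):
    """Build an OSRM route URL; source and dest are (latitude, longitude) pairs."""
    (lat_s, lon_s), (lat_d, lon_d) = source, dest
    # OSRM expects each coordinate in longitude,latitude order.
    coords = f"{lon_s},{lat_s};{lon_d},{lat_d}"
    # overview=false skips the route geometry, which is not needed for a distance.
    return f"{base_url}/route/v1/driving/{coords}?overview=false"


def parse_distance(response):
    """Return the distance in meters of the first route in a decoded OSRM response."""
    if response.get("code") != "Ok" or not response.get("routes"):
        raise ValueError(f"OSRM returned no route: {response.get('code')}")
    return response["routes"][0]["distance"]


def route_distance(source, dest, base_url=OSRM_BASE_URL):
    """Query the OSRM server and return the driving distance in meters."""
    with urlopen(build_route_url(source, dest, base_url)) as reply:
        return parse_distance(json.load(reply))
```

Calling `route_distance((40.4168, -3.7038), (40.45, -3.6883))` needs a running server loaded with an OSM extract that covers both points.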

Both the hourly aggregate CSV and the cadastral information JSON are expected in a folder named data/ at the root of the repository.
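
A quick pre-flight check of the data/ folder can catch a missing download before the pipeline runs. The sketch below assumes that the folder holds exactly one CSV and one JSON file; the helper name `find_input_files` is hypothetical and the repository may locate its inputs differently.

```python
from pathlib import Path


def find_input_files(data_dir="data"):
    """Return (csv_path, json_path) for the only CSV and JSON file in data_dir."""
    folder = Path(data_dir)
    csv_files = sorted(folder.glob("*.csv"))
    json_files = sorted(folder.glob("*.json"))
    # Assumption: one hourly aggregate CSV and one cadastral JSON per city.
    if len(csv_files) != 1 or len(json_files) != 1:
        raise FileNotFoundError(
            f"expected one .csv and one .json in {folder}, "
            f"found {len(csv_files)} and {len(json_files)}"
        )
    return csv_files[0], json_files[0]
```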

In addition, install the Python dependencies inside the development environment:

pip install -r requirements.txt

Example usage

The code in the pipeline was developed for the city of Madrid. The Uber Movement files for other cities might differ slightly, so check their structure before running the extraction.

from uber_movement import data_extraction

madrid_trips = data_extraction.extraction_pipeline()

In the example, madrid_trips will be a pandas DataFrame with the following columns:

  • hod (hour of the day)

  • mean_travel_time

  • latitude_dest (the latitude of the center of the polygon of the destination zone)

  • longitude_dest (the longitude of the center of the polygon of the destination zone)

  • postcode_dest (postal code of the destination zone)

  • latitude_source

  • longitude_source

  • postcode_source

  • distance (fastest driving route distance between source and destination according to OSRM)
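
The latitude and longitude columns refer to the center of each zone polygon. One common way to obtain such a point is the area-weighted centroid, sketched below for a single ring of [longitude, latitude] pairs. This is an illustrative assumption: the repository may compute the center differently (for example as the mean of the vertices), and `polygon_centroid` is a hypothetical name.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as [longitude, latitude] pairs.

    Coordinates are treated as planar, a reasonable approximation for city-sized zones.
    """
    points = [tuple(p) for p in ring]
    # Close the ring if the first vertex is not repeated at the end.
    if points[0] != points[-1]:
        points.append(points[0])
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    # Centroid = sum / (6 * area) = sum / (3 * area2).
    return cx / (3 * area2), cy / (3 * area2)
```

The function returns the pair as (longitude, latitude), matching the coordinate order used in GeoJSON polygons.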
