New York Taxi Analysis

Team member(s): Naohiro Miyaguchi
Modified by Naohiro Miyaguchi on April 11, 2023

As a submission for the Digital Tools & Big Data II course, we analyzed the New York taxi data.

We were given data on New York taxi trips from 2016.

The test data contains the following columns:
- id
- passenger_count
- pickup_longitude
- pickup_latitude
- dropoff_longitude
- dropoff_latitude
- trip_duration
- pick_month
- pick_day
- pick_hour
- drop_hour
- trip_distance

From the original data, I derived the following new columns:
- pick_month
- pick_day
- pick_hour
- drop_hour
- trip_distance

I also dropped some columns and cleaned the data.
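The write-up does not show how the new columns were derived. The following is a minimal sketch, assuming the raw records carry `pickup_datetime` and `dropoff_datetime` timestamps (hypothetical field names, not listed above) and that `trip_distance` is the straight-line haversine distance in kilometres; the original may have used a different distance measure or unit. The helpers `haversine_km` and `add_features` are illustrative names.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def add_features(row):
    """Return a copy of a trip record with the derived columns added.

    Assumes hypothetical raw fields 'pickup_datetime' and 'dropoff_datetime'
    formatted as 'YYYY-MM-DD HH:MM:SS'.
    """
    pick = datetime.strptime(row["pickup_datetime"], "%Y-%m-%d %H:%M:%S")
    drop = datetime.strptime(row["dropoff_datetime"], "%Y-%m-%d %H:%M:%S")
    out = dict(row)
    out["pick_month"] = pick.month
    out["pick_day"] = pick.day
    out["pick_hour"] = pick.hour
    out["drop_hour"] = drop.hour
    out["trip_distance"] = haversine_km(
        row["pickup_longitude"], row["pickup_latitude"],
        row["dropoff_longitude"], row["dropoff_latitude"],
    )
    return out

# Hypothetical record for illustration only
trip = {
    "id": "id0001",
    "passenger_count": 1,
    "pickup_datetime": "2016-03-14 17:24:55",
    "dropoff_datetime": "2016-03-14 17:32:30",
    "pickup_longitude": -73.982, "pickup_latitude": 40.768,
    "dropoff_longitude": -73.965, "dropoff_latitude": 40.766,
}
features = add_features(trip)
print(features["pick_month"], features["pick_day"], features["pick_hour"], features["drop_hour"])  # 3 14 17 17
```

Haversine distance ignores the street grid, so it underestimates the actual driven distance; it is used here only because it needs nothing beyond the coordinates already in the data.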

Reading the data

Data matrix: I compared the columns with one another.
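The data matrix is a visual, pairwise comparison of the columns. As a numeric companion (a swapped-in technique; the write-up does not say correlations were computed), here is a sketch of a Pearson correlation matrix using only the standard library. The column values below are made up for illustration, and a constant column would cause a division by zero.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)

def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict mapping column name -> values."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

# Made-up values, not the real data
toy = {
    "trip_distance": [1.0, 2.0, 3.0, 4.0],
    "trip_duration": [300, 620, 890, 1210],
    "passenger_count": [1, 1, 2, 1],
}
corr = correlation_matrix(toy)
print(round(corr[("trip_distance", "trip_duration")], 3))
```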

Taxi pickups per day of the month:

Pickups drop on the 31st, probably because not every month has a 31st day.

Taxi pickups per hour of the day:

People use taxis most between 18:00 and 21:00.
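Both charts are counts of trips grouped by one column. A minimal stdlib sketch of that grouping, on toy records rather than the real data, follows. It also includes a check for one possible explanation of the drop on the 31st: counting how many months in a period contain a given day. The January to June 2016 range used below is an assumption for illustration, and `count_by` and `months_with_day` are hypothetical helper names.

```python
import calendar
from collections import Counter

def count_by(rows, key):
    """Count trips per value of `key` (e.g. 'pick_day' or 'pick_hour'), sorted by key."""
    counts = Counter(row[key] for row in rows)
    return dict(sorted(counts.items()))

def months_with_day(day, year, months):
    """How many of the given months in `year` contain calendar day `day`."""
    # calendar.monthrange returns (weekday of the 1st, number of days in the month)
    return sum(1 for m in months if calendar.monthrange(year, m)[1] >= day)

# Toy records for illustration only, not the real data
rows = [
    {"pick_day": 30, "pick_hour": 18},
    {"pick_day": 30, "pick_hour": 19},
    {"pick_day": 31, "pick_hour": 19},
    {"pick_day": 1, "pick_hour": 8},
]
print(count_by(rows, "pick_day"))   # {1: 1, 30: 2, 31: 1}
print(count_by(rows, "pick_hour"))  # {8: 1, 18: 1, 19: 2}

# Assuming January to June 2016: only 3 of those months have a 31st,
# while all 6 have a 29th (2016 is a leap year)
print(months_with_day(31, 2016, range(1, 7)))  # 3
print(months_with_day(29, 2016, range(1, 7)))  # 6
```

Dividing each day's count by `months_with_day` for that day gives an average per calendar day, which would show whether the drop on the 31st is only a calendar effect.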

Linear regression

First, I tried simple linear regression.

I set “drop_hour” as the input (X) and “trip_duration” as the target (y).
R² = -1.18832232287591

I set “trip_distance” as the input (X) and “trip_duration” as the target (y).
R² = 0.02318791130349851
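To make the two scores easier to read, here is a stdlib sketch of simple least-squares regression and the R² score, R² = 1 − SS_res / SS_tot. For ordinary least squares with an intercept, R² on the training data cannot be negative, so a negative value like the first result suggests the score was computed on separate test data, although the write-up does not say so. The numbers below are toy values chosen to show that effect, not the taxi data.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def r2_score(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Toy numbers only: fit on one set, score on a held-out set
train_x, train_y = [1, 2, 3, 4], [2, 4, 6, 8]
test_x, test_y = [1, 2, 3], [5, 5, 6]

a, b = fit_line(train_x, train_y)    # a = 0.0, b = 2.0
preds = [a + b * x for x in test_x]  # [2.0, 4.0, 6.0]
print(r2_score(test_y, preds))       # about -14: worse than always predicting the test mean
print(r2_score(train_y, [a + b * x for x in train_x]))  # 1.0 on the training data
```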

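Neither model gave a good score. A negative R² means the model predicts worse than simply using the mean trip duration, and an R² of 0.023 means trip distance explains only about 2% of the variance in trip duration. So I tried the k-nearest neighbors (k-NN) algorithm instead.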

K-nearest neighbors

The k-NN algorithm compares an input data point with the K closest data points in the training set (usually measured by Euclidean distance), where K is a positive integer chosen by the user. It then predicts the class (for classification, by majority vote) or the value (for regression, typically the mean) of the input point from the classes or values of those K nearest neighbors.
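The procedure above can be sketched in a few lines of standard-library Python. This is an illustrative implementation, not the code used in the project; `knn_predict` is a hypothetical name, and the training data are toy (trip_distance, pick_hour) pairs with made-up passenger counts.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3, mode="classify"):
    """Predict for `query` from its k nearest training points.

    mode="classify" returns the majority label; mode="regress" returns the mean value.
    """
    # Sort training points by Euclidean distance to the query point
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    neighbour_targets = [y for _, y in ranked[:k]]
    if mode == "classify":
        return Counter(neighbour_targets).most_common(1)[0][0]
    if mode == "regress":
        return sum(neighbour_targets) / len(neighbour_targets)
    raise ValueError(f"unknown mode: {mode!r}")

# Toy (trip_distance, pick_hour) -> passenger_count data, for illustration only
train_X = [(1.0, 8), (1.2, 9), (5.0, 19), (5.5, 20), (6.0, 21)]
train_y = [1, 1, 2, 2, 3]

print(knn_predict(train_X, train_y, (5.2, 20), k=3))                  # 2
print(knn_predict(train_X, train_y, (5.2, 20), k=3, mode="regress"))  # 2.333...
```

Because the distance is computed on raw values, a feature with a wide range (pick_hour spans 0 to 23) can dominate one with a narrow range, so standardizing features first is a common choice. If the project used scikit-learn (the `knn.score` name suggests it, but the write-up does not say), note that `KNeighborsClassifier.score` returns mean accuracy while `KNeighborsRegressor.score` returns R².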

I used “trip_distance” and “pick_hour” as the input features (X) and “passenger_count” as the target (y).

knn.score = 0.7202628926879543

*Still being written.

Tags: Big Data, Python

New York Taxi Analysis is a project of IAAC, Institute for Advanced Architecture of Catalonia developed in the Master in City & Technology 01 - 2022-2023 by the student(s) Naohiro Miyaguchi during the course MaCT01 2022/23 Digital Tools & Big Data II with Andre Resende.