Interpreting coefficients
Remember that origin airport, org, has eight possible values (ORD, SFO, JFK, LGA, SMF, SJC, TUS and OGG), which have been one-hot encoded to seven dummy variables in org_dummy.
The values for km and org_dummy have been assembled into features, which has eight columns with sparse representation. Column indices in features are as follows:
- 0: km
- 1: ORD
- 2: SFO
- 3: JFK
- 4: LGA
- 5: SMF
- 6: SJC
- 7: TUS
Note that OGG does not appear in this list because it is the reference level for the origin airport category.
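The column layout above can be written down as a small lookup, which makes it harder to pick the wrong coefficient later. This is a minimal sketch in plain Python: the names FEATURE_COLUMNS and coefficient_index are hypothetical helpers introduced for illustration and are not part of the exercise environment.

```python
# Column order of `features` as listed in the text (hypothetical helper names).
# OGG is the reference level, so it has no column of its own.
FEATURE_COLUMNS = ["km", "ORD", "SFO", "JFK", "LGA", "SMF", "SJC", "TUS"]


def coefficient_index(name):
    """Return the column index for a feature, or None for the reference level OGG."""
    if name == "OGG":
        return None
    return FEATURE_COLUMNS.index(name)


print(coefficient_index("km"))
print(coefficient_index("JFK"))
print(coefficient_index("OGG"))
```

A return value of None for OGG is a reminder that its effect is absorbed by the intercept rather than by a separate coefficient.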
A fitted LinearRegression model is available in regression. In this exercise you'll be using its intercept and coefficients attributes to interpret the model.
The coefficients attribute is a vector that can be indexed like a list. Its first element (index 0) indicates how flight duration, in minutes, changes with each additional km of flight distance.
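Because the distance coefficient is expressed in minutes per km, it has to be inverted and rescaled to read as a speed. The sketch below shows that arithmetic in plain Python. The value 0.075 is a made-up coefficient chosen only for illustration, not a result from the model, and avg_speed_km_per_hour is a hypothetical helper name.

```python
def avg_speed_km_per_hour(minutes_per_km):
    """Convert a duration-per-distance coefficient into an average speed.

    minutes per km -> km per minute is 1 / coefficient;
    multiplying by 60 minutes gives km per hour.
    """
    return 60 / minutes_per_km


# Hypothetical coefficient for km (NOT taken from the fitted model).
example_km_coefficient = 0.075

print(avg_speed_km_per_hour(example_km_coefficient))
```

With this assumed coefficient the speed comes out at roughly 800 km per hour. With the fitted model, the same conversion would be applied to the first element of regression.coefficients.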
This exercise is part of the course Machine Learning with PySpark.
Exercise instructions
- Find the average speed in km per hour. This will be different to the value that you got earlier because your model is now more sophisticated.
- What's the average time on the ground at OGG?
- What's the average time on the ground at JFK?
- What's the average time on the ground at LGA?
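The ground-time questions follow from how dummy variables work: at zero distance the model predicts the intercept for the reference airport (OGG), and the intercept plus that airport's coefficient for any other origin. The sketch below illustrates this in plain Python. All numbers are hypothetical placeholders, not output from the model, and avg_ground_minutes is a helper name invented for this example.

```python
# Column order of `features`; OGG is the reference level and has no column.
FEATURE_COLUMNS = ["km", "ORD", "SFO", "JFK", "LGA", "SMF", "SJC", "TUS"]

# Hypothetical model parameters (NOT results from the fitted model).
example_intercept = 16.0
example_coefficients = [0.075, 28.0, 20.5, 52.0, 46.5, 15.0, 18.5, 17.0]


def avg_ground_minutes(airport, intercept, coefficients):
    """Predicted duration in minutes at zero distance for a given origin airport."""
    if airport == "OGG":
        # Reference level: no dummy variable is switched on.
        return intercept
    # Other airports shift the intercept by their own coefficient.
    return intercept + coefficients[FEATURE_COLUMNS.index(airport)]


for airport in ("OGG", "JFK", "LGA"):
    print(airport, avg_ground_minutes(airport, example_intercept, example_coefficients))
```

With the fitted model, the same pattern would combine regression.intercept with the elements of regression.coefficients at indices 3 (JFK) and 4 (LGA).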
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Average speed in km per hour
avg_speed_hour = ____
print(avg_speed_hour)
# Average minutes on ground at OGG
inter = regression.____
print(inter)
# Average minutes on ground at JFK
avg_ground_jfk = ____ + regression.____[____]
print(avg_ground_jfk)
# Average minutes on ground at LGA
avg_ground_lga = ____ + regression.____[____]
print(avg_ground_lga)