
Predicting movie revenue

Let's begin the challenge of predicting movie revenue by building a simple linear regression model that estimates the log-revenue of movies from the 'budget' feature. The metric you will use here is the RMSE (root mean squared error). To calculate it with scikit-learn, call the mean_squared_error() function from the sklearn.metrics module and then take the square root of the result with np.sqrt().
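As a small illustration of the metric described above, the RMSE can be computed on a toy example. The arrays y_true and y_pred are made-up values for illustration only and are not part of the movies dataset.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical toy values, chosen only to make the arithmetic easy to follow
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.0, 5.0, 9.0])

# Squared errors are 1, 0 and 4, so the MSE is 5/3
mse = mean_squared_error(y_true, y_pred)

# The RMSE is the square root of the MSE, expressed in the same units as the target
rmse = np.sqrt(mse)
print('RMSE: {:.3f}'.format(rmse))
```

Because the target in this exercise is the log-revenue, the RMSE is also expressed in log units.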

The movies dataset has been loaded for you and split into train and test sets. Additionally, the missing values have been replaced with zeros, and the input feature has been standardized using StandardScaler(). Check out DataCamp's courses on cleaning data and feature engineering if you want to learn more about preprocessing for machine learning.
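The preprocessing described above might look roughly like the following sketch. It is not the course's actual code: the data is randomly generated, the names budget and log_revenue are hypothetical, and the order of the steps (fill, split, then scale) is an assumption.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: a 'budget' feature with some missing values
rng = np.random.default_rng(0)
budget = rng.uniform(1e5, 1e8, size=200)
budget[::25] = np.nan
log_revenue = rng.normal(loc=15.0, scale=2.0, size=200)  # made-up target

# Replace missing values with zeros and reshape to a single-column 2D array
X = np.nan_to_num(budget, nan=0.0).reshape(-1, 1)

# Split into train and test sets (the split ratio is an assumption)
X_train_raw, X_test_raw, y_train, y_test = train_test_split(
    X, log_revenue, test_size=0.25, random_state=0
)

# Fit the scaler on the training data only, then apply it to both sets
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train_raw)
X_test = scaler.transform(X_test_raw)
```

Fitting the scaler on the training set alone keeps information from the test set out of the preprocessing step.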

This exercise is part of the course

Ensemble Methods in Python


Exercise instructions

  • Instantiate the default LinearRegression model.
  • Calculate the predictions on the test set.
  • Calculate the RMSE. The mean_squared_error() function requires two arguments: y_test, followed by the predictions.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Build and fit linear regression model
reg_lm = ____
reg_lm.fit(X_train, y_train)

# Calculate the predictions on the test set
pred = ____

# Evaluate the performance using the RMSE
rmse = np.sqrt(____)
print('RMSE: {:.3f}'.format(rmse))
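One way the blanks could be filled in is sketched below. Since the exercise's X_train, X_test, y_train and y_test are preloaded and not available here, this version generates synthetic stand-in data (an assumption), so the printed RMSE will not match the value obtained on the movies dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): one standardized feature, noisy linear target
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Build and fit linear regression model
reg_lm = LinearRegression()
reg_lm.fit(X_train, y_train)

# Calculate the predictions on the test set
pred = reg_lm.predict(X_test)

# Evaluate the performance using the RMSE (true values first, then predictions)
rmse = np.sqrt(mean_squared_error(y_test, pred))
print('RMSE: {:.3f}'.format(rmse))
```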