1. Learn
  2. /
  3. Courses
  4. /
  5. Ensemble Methods in Python

Exercise

Predicting movie revenue

Let's begin the challenge of predicting movie revenue by building a simple linear regression to estimate the log-revenue of movies based on the 'budget' feature. The metric you will use here is the RMSE (root mean squared error). To calculate this using scikit-learn, you can use the mean_squared_error() function from the sklearn.metrics module and then take its square root using numpy.

The movies dataset has been loaded for you and split into train and test sets. Additionally, the missing values have been replaced with zeros. We also standardized the input feature by using StandardScaler(). Check out DataCamp's courses on cleaning data and feature engineering if you want to learn more about preprocessing for machine learning.

Instructions

100 XP
  • Instantiate the default LinearRegression model.
  • Calculate the predictions on the test set.
  • Calculate the RMSE. The mean_squared_error() function requires two arguments: y_test, followed by the predictions.