Create features and targets
We almost have features and targets that are machine-learning ready -- we have features from current price changes (5d_close_pct) and indicators (moving averages and RSI), and we created targets of future price changes (5d_close_future_pct). Now we need to break these up into separate numpy arrays so we can feed them into machine learning algorithms.
Our indicators also leave missing values at the beginning of the DataFrame, because an indicator computed over a window of past rows has no value until enough rows are available. We could backfill this data, fill it with a single value, or drop the rows. Dropping the rows is a good choice, so our machine learning algorithms aren't confused by any sort of backfilled or 0-filled data. Pandas DataFrames have a .dropna() method, which we will use to drop any rows with missing values.
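To make the three options concrete, here is a minimal sketch on a toy price series. The DataFrame `prices` and the column `ma3` are hypothetical names chosen for illustration and are not part of the exercise data; the 3-row window is likewise an assumption.

```python
import pandas as pd

# Hypothetical toy prices; this is NOT the course's lng_df
prices = pd.DataFrame({"close": [10.0, 10.5, 10.2, 10.8, 11.0, 10.9, 11.3, 11.6]})

# A 3-row moving average is undefined for the first 2 rows, so they are NaN
prices["ma3"] = prices["close"].rolling(window=3).mean()

# Option 1: backfill copies the first valid value into the earlier rows
backfilled = prices.bfill()

# Option 2: fill the gaps with a single value (here 0)
zero_filled = prices.fillna(0)

# Option 3: drop every row that contains a missing value
dropped = prices.dropna()

print(prices["ma3"].isna().sum(), "rows have a missing moving average")
print(len(prices), "rows before dropping,", len(dropped), "rows after")
```

Backfilling and 0-filling both put values into rows where the indicator was never actually computed, which is why dropping the rows is usually the safer default here.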
This exercise is part of the course
Machine Learning for Finance in Python
Instructions
- Drop the missing values from lng_df with .dropna() from pandas.
- Create a variable containing our targets, which are the '5d_close_future_pct' values.
- Create a DataFrame containing both targets (5d_close_future_pct) and features (contained in the existing list feature_names) so we can check the correlations.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Drop all na values
lng_df = lng_df.____
# Create features and targets
# use feature_names for features; '5d_close_future_pct' for targets
features = lng_df[feature_names]
targets = lng_df[____]
# Create DataFrame from target column and feature columns
feature_and_target_cols = ['5d_close_future_pct'] + ____
feat_targ_df = lng_df[feature_and_target_cols]
# Calculate correlation matrix
corr = feat_targ_df.corr()
print(corr)
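For readers without access to the course data, the same workflow can be sketched end to end on synthetic prices. Everything below is an assumption made for illustration: the DataFrame `df` stands in for lng_df, the random-walk prices are made up, and the `ma14` column and the way the percent changes are computed are one plausible construction rather than the course's own preprocessing.

```python
import numpy as np
import pandas as pd

# Synthetic random-walk prices standing in for lng_df (assumption)
rng = np.random.default_rng(0)
df = pd.DataFrame({"close": 100 + rng.normal(0, 1, 60).cumsum()})

# Feature: percent change over the previous 5 rows
df["5d_close_pct"] = df["close"] / df["close"].shift(5) - 1
# Feature: 14-row moving average relative to the price (hypothetical name)
df["ma14"] = df["close"].rolling(14).mean() / df["close"]
# Target: percent change over the following 5 rows
df["5d_close_future_pct"] = df["close"].shift(-5) / df["close"] - 1

feature_names = ["5d_close_pct", "ma14"]

# Drop rows with missing values at the start (indicators) and end (target)
df = df.dropna()

# Split into features and targets
features = df[feature_names]
targets = df["5d_close_future_pct"]

# NumPy arrays for libraries that expect them
X = features.to_numpy()
y = targets.to_numpy()

# Put the target first so its correlations appear in the first row and column
feature_and_target_cols = ["5d_close_future_pct"] + feature_names
feat_targ_df = df[feature_and_target_cols]

corr = feat_targ_df.corr()
print(corr)
```

On random-walk data the correlations between features and target carry no real signal, so the printed numbers only illustrate the shape of the output: a square matrix with one row and one column per selected column.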