Amazon SageMaker is a fully managed machine learning service from AWS that covers the phases of an ML pipeline.

## Data Collection & Integration

- Prediction: Label / Target.
- Good Data: Contains a signal about the phenomenon you’re trying to model.
- Observation: A single data point, made up of the label and the features.
- Dataset: A collection of observations stacked together.
- Data-points-to-features ratio: You need at least 10 times as many data points as features. So if you’ve got five features, you should have at least 50 data points in your training data.
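The 10:1 rule of thumb above can be sketched as a tiny helper; `min_observations` is a hypothetical name, not part of any library:

```python
# Hypothetical helper illustrating the rule of thumb:
# at least 10 observations per feature.
def min_observations(num_features: int, ratio: int = 10) -> int:
    """Return the suggested minimum dataset size for num_features."""
    return num_features * ratio

print(min_observations(5))  # 5 features -> 50 observations minimum
```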

## Data Preparation

## Data Visualization & Analysis

- Histograms: Histograms are effective visualizations for spotting outliers in data.
- Imputation: Makes a best guess at what a missing or outlying value should be. In a regression problem, you can deal with outliers or even missing data by assigning a new value via imputation (for example, the column mean).
- Scatter Plots: Visualize the relationship between the features and the labels. It’s important to understand if there’s a strong correlation between features and labels.
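Mean imputation, mentioned above, can be sketched in a few lines of pure Python; the data here is made up for illustration:

```python
import statistics

# Sketch of mean imputation: replace missing values (None) with the
# mean computed from the observed values in the same column.
def impute_mean(values):
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]

# The None entry is replaced by the mean of 1.0, 3.0, and 4.0.
print(impute_mean([1.0, None, 3.0, 4.0]))
```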

## Feature Selection & Engineering

## Model Training

- Randomize Data: Shuffle the data during your split to help your model avoid bias. This is especially important with structured data that arrives in a specific order.
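A randomized split can be sketched in plain Python; the 80/20 fraction and the seed are assumptions for the example, not prescribed values:

```python
import random

# Minimal sketch of a shuffled train/test split so that ordered
# data doesn't leak its ordering into the model.
def shuffled_split(dataset, train_fraction=0.8, seed=42):
    rows = list(dataset)
    random.Random(seed).shuffle(rows)  # seeded only for reproducibility
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

train, test = shuffled_split(range(10))
print(len(train), len(test))  # 8 2
```

Libraries such as scikit-learn offer the same idea via `train_test_split` with shuffling enabled.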
- Underfitting: Low variance and high bias. These models are overly simple and they can’t really see the underlying patterns in the data.
- Overfitting: High variance and low bias. These models are overly complex, and while they can detect patterns in the training data, they’re not accurate outside of the training data.
- Parameter:
  - Internal to the model; something the model can learn or estimate purely from the data.
  - Examples include the weights of an ANN or the coefficients in linear regression.
  - The model must have parameters to make predictions, and most often these aren’t set by humans.

- Hyperparameters: Set by humans. Typically you can’t know the best value of a hyperparameter in advance, but you can find a good one through trial and error.
  - An example is the learning rate for training a neural network.
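The parameter/hyperparameter distinction can be shown with a toy gradient-descent fit of y = w·x; everything here (the function name, the data, the defaults) is made up for illustration:

```python
# The weight w is a parameter: the model learns it from the data.
# The learning rate and epoch count are hyperparameters: a human
# picks them before training starts.
def fit_weight(data, learning_rate=0.1, epochs=50):
    w = 0.0  # parameter, updated below
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x  # d/dw of the squared error
            w -= learning_rate * grad
    return w

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # true relation: y = 2x
print(round(fit_weight(data), 3))  # converges to 2.0
```

Set the learning rate too high and the updates diverge; too low and training crawls, which is why this hyperparameter is tuned by trial and error.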

- Hyperparameter Tuning: One technique that can be used to combat underfitting and overfitting.
- Common hyperparameters to tune:
  - Loss function
  - Regularization
  - Learning parameters (e.g., learning rate)
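One common trial-and-error approach is grid search: try every combination of candidate values and keep the best-scoring one. This sketch uses a stand-in scoring function; in practice the score would be validation accuracy or loss:

```python
import itertools

# Hedged sketch of grid-search hyperparameter tuning: evaluate every
# combination in the grid and keep the best score.
def grid_search(param_grid, score_fn):
    best_params, best_score = None, float("-inf")
    for combo in itertools.product(*param_grid.values()):
        params = dict(zip(param_grid.keys(), combo))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy score that peaks at learning_rate=0.1, regularization=0.01.
grid = {"learning_rate": [0.01, 0.1, 1.0], "regularization": [0.001, 0.01]}
score = lambda p: -abs(p["learning_rate"] - 0.1) - abs(p["regularization"] - 0.01)
print(grid_search(grid, score))
```

SageMaker offers managed versions of this idea (automatic model tuning), and libraries like scikit-learn provide `GridSearchCV`.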

## Model Evaluation

## Prediction