[Notes] Hands-On Machine Learning - chapter 1.
the machine learning landscape
<examples of machine learning use>
- OCR optical character recognition
- spam filter
- recommendation
- voice search
<main regions>
supervised learning
unsupervised learning
online/batch learning
instance-based/model-based learning
workflow of typical ML project
<what is Machine Learning>
the science of programming computers so they can learn from data
training set
training instance = training sample
: a learning problem consists of training data (the experience), a performance measure such as accuracy, and a task.
<why you may want to use it>
a filter based on machine learning techniques automatically learns which words and phrases are good predictors of spam by detecting unusually frequent patterns of words in spam
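A minimal sketch of that idea, assuming a handful of made-up messages: it compares how often each word appears in spam versus non-spam ("ham") and picks the word with the largest gap. The data and the helper `word_rates` are hypothetical, and a real filter would use far more data and a proper classifier.

```python
from collections import Counter

# Hypothetical toy messages, made up for illustration.
spam = ["free money now", "win free prize now", "free credit card offer"]
ham = ["meeting at noon", "lunch now or later", "project status report"]

def word_rates(messages):
    """Return, for each word, the fraction of messages that contain it."""
    counts = Counter(word for msg in messages for word in set(msg.split()))
    return {word: n / len(messages) for word, n in counts.items()}

spam_rates = word_rates(spam)
ham_rates = word_rates(ham)

# Score each spam word by how much more often it appears in spam than in ham.
scores = {word: rate - ham_rates.get(word, 0.0) for word, rate in spam_rates.items()}
best_predictor = max(scores, key=scores.get)
print(best_predictor)
```

A word such as "now" appears in both groups, so its score is lower than that of a word seen only in spam.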
data mining
- digging into large amounts of data to discover patterns that were not immediately apparent
<types of machine learning systems>
1) supervised, unsupervised, semisupervised, reinforcement learning
** supervised
the training data includes labels
labels : the desired solutions
ex) tasks: classification, regression (predict a target numeric value); algorithms: linear regression, logistic regression, support vector machines, decision trees and random forests, neural networks
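A minimal sketch of supervised classification, assuming a made-up one-feature training set where every instance carries a label. The feature (count of suspicious words), the numbers, and the midpoint-threshold rule are assumptions chosen for brevity, not one of the algorithms listed above.

```python
# Labeled training set: (number of suspicious words, label). Hypothetical values.
training_set = [
    (1.0, "ham"), (2.0, "ham"), (3.0, "ham"),
    (7.0, "spam"), (8.0, "spam"), (9.0, "spam"),
]

def mean(values):
    return sum(values) / len(values)

def fit_threshold(data):
    """Learn a decision threshold halfway between the two class means."""
    ham = [x for x, label in data if label == "ham"]
    spam = [x for x, label in data if label == "spam"]
    return (mean(ham) + mean(spam)) / 2

def predict(threshold, x):
    # Assumes spam has the larger feature values, as in the toy data above.
    return "spam" if x > threshold else "ham"

threshold = fit_threshold(training_set)
print(threshold, predict(threshold, 6.5), predict(threshold, 1.5))
```

The labels are what make this supervised: the threshold is computed from the desired answers, not guessed.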
** unsupervised learning
learns without a label
ex) clustering, visualization and dimensionality reduction (feature extraction), anomaly detection, association rule learning
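A minimal sketch of clustering without labels, assuming one-dimensional points and exactly two clusters. It is a stripped-down k-means; the function name `kmeans_1d` and the min/max initialization are assumptions made to keep the example deterministic.

```python
def kmeans_1d(points, iterations=10):
    """Split 1-D points into two clusters by repeatedly updating two centers."""
    # Deterministic start: the smallest and largest point.
    centers = [min(points), max(points)]
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for p in points:
            # Assign each point to the nearer center.
            nearest = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]  # no labels anywhere
centers, clusters = kmeans_1d(points)
print(centers, clusters)
```

The algorithm finds the two groups on its own; nobody told it which point belongs where.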
** semisupervised learning
a lot of unlabeled data plus a few labels, e.g. one label per person in a photo service
** reinforcement learning
agent : observe the environment, select and perform actions, get rewards/penalties
policy : the best strategy for choosing actions, which the agent learns by itself over time
ex) AlphaGo
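A minimal sketch of the agent / reward / policy loop, assuming a made-up environment with two actions where one always gives a penalty and the other a reward. The epsilon-greedy exploration rule is an assumption picked for brevity; it is far simpler than anything behind AlphaGo.

```python
import random

random.seed(0)

# Hypothetical environment: "left" is penalized, "right" is rewarded.
REWARDS = {"left": -1.0, "right": 1.0}

def step(action):
    """Perform an action and return the reward (or penalty) it earns."""
    return REWARDS[action]

values = {action: 0.0 for action in REWARDS}  # estimated reward per action
counts = {action: 0 for action in REWARDS}

def update(action, reward):
    """Update the running average reward for the chosen action."""
    counts[action] += 1
    values[action] += (reward - values[action]) / counts[action]

# Try every action once, then mostly exploit and occasionally explore.
for action in REWARDS:
    update(action, step(action))
for _ in range(100):
    if random.random() < 0.1:
        action = random.choice(list(REWARDS))
    else:
        action = max(values, key=values.get)
    update(action, step(action))

# The learned policy: pick the action with the highest estimated reward.
policy = max(values, key=values.get)
print(policy, values)
```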
2) online learning, batch learning
** batch learning
** online learning
feed the data sequentially by mini-batches
incremental learning
learning rate - how fast should they adapt to changing data?
challenge - bad data fed to the system -> performance gradually declines
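A minimal sketch of online learning, assuming a stream of (x, y) pairs generated from y = 2x and a one-parameter model y ≈ w·x. The model is updated one mini-batch at a time with a gradient step; `online_fit` and `minibatches` are hypothetical helper names.

```python
def minibatches(stream, size):
    """Yield consecutive mini-batches from a data stream."""
    batch = []
    for item in stream:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def online_fit(stream, learning_rate, batch_size=2):
    """Incrementally fit y ≈ w * x, one gradient step per mini-batch."""
    w = 0.0
    for batch in minibatches(stream, batch_size):
        # Gradient of the mean squared error with respect to w on this batch.
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= learning_rate * grad
    return w

# Hypothetical stream: 60 instances arriving one after another.
stream = [(x, 2.0 * x) for x in [1.0, 2.0, 3.0] * 20]

w = online_fit(stream, learning_rate=0.05)        # adapts quickly
w_slow = online_fit(stream, learning_rate=0.001)  # adapts slowly
print(w, w_slow)
```

With the larger learning rate the weight reaches roughly 2 within the stream; with the smaller one it is still far away, which is the trade-off the learning rate controls.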
3) instance-based, model-based learning
Q. how does ML generalize?
true goal is to perform well on new instances
** instance-based learning
measure of similarity
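A minimal sketch of instance-based learning, assuming k-nearest neighbors with Euclidean distance as the similarity measure. The training points and the name `knn_predict` are made up for illustration.

```python
import math
from collections import Counter

def knn_predict(training_set, query, k=3):
    """Label a new instance by majority vote among its k most similar instances."""
    # Smaller Euclidean distance means more similar.
    neighbors = sorted(training_set, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical stored instances: (features, label).
training_set = [
    ((1.0, 1.0), "ham"), ((1.5, 2.0), "ham"), ((2.0, 1.0), "ham"),
    ((8.0, 8.0), "spam"), ((9.0, 8.5), "spam"), ((8.5, 9.0), "spam"),
]

print(knn_predict(training_set, (2.0, 2.0)))
print(knn_predict(training_set, (8.0, 9.0)))
```

No model is built here: the system memorizes the examples and generalizes by comparing new instances to them.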
** model-based learning
build a model
use the model to make predictions
model selection - e.g. a linear model
define parameter values
specify a performance measure - e.g. a cost function that the linear regression algorithm minimizes to find the parameter values
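A minimal sketch of the model-based steps, assuming a linear model y = intercept + slope·x, mean squared error as the cost function, and the closed-form least-squares solution for a single feature. The data points are made up.

```python
def fit_linear(xs, ys):
    """Find the intercept and slope that minimize the mean squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def mse(xs, ys, intercept, slope):
    """Cost function: mean squared error of the linear model on the data."""
    return sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def predict(intercept, slope, x):
    return intercept + slope * x

# Hypothetical training data lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_linear(xs, ys)
print(intercept, slope, mse(xs, ys, intercept, slope))
print(predict(intercept, slope, 5.0))  # prediction for a new instance
```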
<main challenges of machine learning>
** bad algorithm < bad data
** insufficient quantity of training data
** nonrepresentative training data
cf) sampling bias - nonresponse bias
** poor-quality data
** irrelevant features
cf) feature engineering
feature selection : selecting the most useful features to train
feature extraction : combining existing features to produce a more useful one
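A minimal sketch of both ideas, assuming made-up used-car data. Selection keeps the features whose Pearson correlation with the target is strong (the 0.8 cutoff is an arbitrary assumption); extraction merges two correlated features into one hypothetical "wear" feature by averaging them.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical features for five cars, and the target (price).
features = {
    "age_years": [1.0, 2.0, 3.0, 4.0, 5.0],
    "mileage_10k_km": [1.2, 2.1, 2.9, 4.2, 5.1],
    "paint_code": [3.0, 1.0, 4.0, 1.0, 5.0],
}
price = [9.0, 8.1, 6.9, 6.0, 5.0]

# Feature selection: keep only features strongly correlated with the target.
selected = [name for name, values in features.items()
            if abs(pearson(values, price)) >= 0.8]

# Feature extraction: combine two existing features into a single new one.
wear = [(age + km) / 2
        for age, km in zip(features["age_years"], features["mileage_10k_km"])]

print(selected)
print(wear)
```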
** overfitting the training data
overfitting : overgeneralizing
regularization
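A minimal sketch of regularization, assuming a one-feature linear model with an L2 penalty on the slope (ridge-style; the choice of penalty is an assumption). The hyperparameter `alpha` sets how strongly the model is constrained: the larger it is, the flatter the fitted line.

```python
def fit_ridge(xs, ys, alpha):
    """Minimize sum of squared errors + alpha * slope**2 (intercept not penalized)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)  # alpha shrinks the slope toward 0
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical training data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

_, slope_plain = fit_ridge(xs, ys, alpha=0.0)  # no regularization
_, slope_reg = fit_ridge(xs, ys, alpha=5.0)    # constrained model
print(slope_plain, slope_reg)
```

With `alpha=0.0` this is ordinary least squares; raising `alpha` trades some fit on the training data for a simpler model.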
** underfitting the training data
** stepping back
<Testing and validating>
<exercises>