[Notes] Hands-On Machine Learning - Chapter 1

토마토. 2021. 7. 13. 13:20

the machine learning landscape

<examples of machine learning use>

- OCR (optical character recognition)

- spam filter

- recommendation

- voice search

<main regions>

supervised learning

unsupervised learning

online/batch learning

instance-based/model-based learning

workflow of typical ML project

<what is Machine Learning>

the science of programming computers so they can learn from data

training set

training instance = training sample

: consists of training data, accuracy (the performance measure), and a training task.

<why you may want to use it>

a filter based on machine learning techniques automatically learns which words and phrases are good predictors of spam by detecting unusually frequent patterns of words in spam
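
As a rough illustration of the idea above, the sketch below compares how often each word shows up in spam versus non-spam messages. The message lists and the `spam_indicators` helper are made up for this note; a real filter would use far more data and a proper classifier.

```python
from collections import Counter

def spam_indicators(spam_msgs, ham_msgs, min_ratio=2.0):
    """Return words whose smoothed frequency in spam is at least min_ratio times that in ham."""
    spam_counts = Counter(w for m in spam_msgs for w in m.lower().split())
    ham_counts = Counter(w for m in ham_msgs for w in m.lower().split())
    spam_total = sum(spam_counts.values())
    ham_total = sum(ham_counts.values())
    vocab = set(spam_counts) | set(ham_counts)
    indicators = {}
    for word in vocab:
        # Add-one smoothing so a word unseen in one class does not cause division by zero.
        p_spam = (spam_counts[word] + 1) / (spam_total + len(vocab))
        p_ham = (ham_counts[word] + 1) / (ham_total + len(vocab))
        ratio = p_spam / p_ham
        if ratio >= min_ratio:
            indicators[word] = ratio
    return indicators

# Hypothetical toy data.
spam = ["free money now", "win free prize now", "free credit card offer"]
ham = ["meeting at noon", "lunch now or later", "see you at the meeting"]
words = spam_indicators(spam, ham)
```

With this toy data, "free" comes out as the strongest indicator, while "now" (which also appears in normal mail) does not pass the threshold.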

data mining

- digging into large amounts of data to discover patterns that were not immediately apparent

<types of machine learning systems>

1) supervised, unsupervised, semisupervised, reinforcement learning

** supervised

the training data includes labels

labels : the desired solutions

ex) tasks: classification, regression (predict a target numeric value)

ex) algorithms: linear regression, logistic regression, support vector machines, decision trees and random forests, neural networks
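
A minimal sketch of supervised classification, assuming a single numeric feature and 0/1 labels. It fits a one-level decision tree (a "stump"), which is a heavy simplification of the decision trees listed above. The data and function names are hypothetical.

```python
def fit_stump(xs, labels):
    """Find the threshold on one numeric feature that misclassifies the fewest labeled examples.

    The resulting rule predicts 1 when x > threshold, else 0.
    """
    best_threshold, best_errors = None, len(xs) + 1
    sorted_xs = sorted(set(xs))
    # Candidate thresholds: midpoints between consecutive distinct feature values.
    candidates = [(a + b) / 2 for a, b in zip(sorted_xs, sorted_xs[1:])]
    for t in candidates:
        errors = sum((x > t) != bool(y) for x, y in zip(xs, labels))
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold

def predict(threshold, x):
    return 1 if x > threshold else 0

# Made-up training set: feature = count of suspicious words, label = 1 (spam) or 0 (ham).
xs = [0, 1, 1, 2, 5, 6, 7, 9]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
threshold = fit_stump(xs, labels)
```

The labels are what make this supervised: the threshold is chosen only by comparing predictions against the desired answers.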

** unsupervised learning

the training data is unlabeled; the system learns without labels

ex) clustering, visualization and dimensionality reduction (feature extraction), anomaly detection, association rule learning
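
To make clustering concrete, here is a small k-means sketch on 1-D points. No labels are used anywhere. The points and the initial centers are arbitrary values chosen for the example.

```python
def kmeans_1d(points, centers, iterations=10):
    """Cluster 1-D points around the given initial centers (Lloyd's algorithm)."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster (unchanged if empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
```

The algorithm finds the two groups on its own; it never sees what the groups "mean".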

** semisupervised learning

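partially labeled data: mostly unlabeled, with a few labels

ex) a photo service clusters the faces on its own, then needs only one label per person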

** reinforcement learning

agent : observe the environment, select and perform actions, get rewards/penalties

policy : the best strategy, learned over time; it defines which action the agent should choose in a given situation

ex) AlphaGo
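
A toy sketch of the agent / reward / policy loop, using tabular Q-learning on a made-up corridor environment. This is an illustrative technique chosen for the note, not how AlphaGo works. The environment, rewards, and hyperparameters are all assumptions.

```python
import random

def train_corridor_agent(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a corridor: start at state 0, reward +1 on reaching the last state."""
    rng = random.Random(seed)
    actions = (-1, +1)  # move left or move right
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Explore with probability epsilon, otherwise act greedily (ties broken at random).
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                best = max(q[state])
                a = rng.choice([i for i in range(2) if q[state][i] == best])
            # The environment responds with a new state and a reward.
            next_state = min(max(state + actions[a], 0), goal)
            reward = 1.0 if next_state == goal else 0.0
            # Q-learning update: move the estimate toward reward + discounted best future value.
            target = reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    # The learned policy: the best-valued action in each non-terminal state.
    return [actions[q[s].index(max(q[s]))] for s in range(goal)]

policy = train_corridor_agent()
```

Here the policy is just a list with one action per state; after training it should say "move right" everywhere, since that is the only way to reach the reward.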

2) online learning, batch learning

** batch learning

 

** online learning

feed the data sequentially, either individually or in small groups called mini-batches

incremental learning

learning rate - how fast the system should adapt to changing data

challenge - if bad data is fed to the system, performance gradually declines
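
A minimal sketch of online learning, assuming a linear model updated by one gradient step per mini-batch. The `make_stream` generator and the `OnlineLinearModel` class are hypothetical names, and the stream is synthetic (y = 2x + 1, no noise).

```python
def make_stream(n_batches, batch_size=5):
    """Yield mini-batches of (x, y) pairs from a made-up stream where y = 2x + 1."""
    i = 0
    for _ in range(n_batches):
        batch = []
        for _ in range(batch_size):
            x = (i % 10) / 10
            batch.append((x, 2 * x + 1))
            i += 1
        yield batch

class OnlineLinearModel:
    def __init__(self, learning_rate):
        self.w, self.b = 0.0, 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def partial_fit(self, batch):
        """Take one gradient step on the mean squared error of a single mini-batch."""
        n = len(batch)
        grad_w = sum(2 * (self.predict(x) - y) * x for x, y in batch) / n
        grad_b = sum(2 * (self.predict(x) - y) for x, y in batch) / n
        self.w -= self.learning_rate * grad_w
        self.b -= self.learning_rate * grad_b

model = OnlineLinearModel(learning_rate=0.2)
for batch in make_stream(n_batches=2000):
    model.partial_fit(batch)  # each batch is used once and can then be discarded
```

A higher learning rate makes the model follow new data faster but also forget old data faster; a lower one makes it slower to adapt but less sensitive to bad batches.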

3) instance-based, model-based learning

Q. how do ML systems generalize?

true goal is to perform well on new instances

** instance-based learning

learn the examples by heart, then generalize to new cases using a measure of similarity
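
A small sketch of instance-based learning, assuming k-nearest neighbors with Euclidean distance as the similarity measure. The 2-D points and their labels are invented for the example.

```python
import math
from collections import Counter

def knn_predict(training_set, query, k=3):
    """Label a query point by majority vote among its k most similar training instances."""
    # Similarity is measured by Euclidean distance: closer means more similar.
    by_distance = sorted(training_set, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical stored instances: (features, label).
training_set = [
    ((1.0, 1.0), "ham"), ((1.5, 2.0), "ham"), ((2.0, 1.0), "ham"),
    ((8.0, 8.0), "spam"), ((9.0, 8.5), "spam"), ((8.5, 9.0), "spam"),
]
label = knn_predict(training_set, (1.2, 1.4))
```

There is no training step here: the "model" is the stored data itself, and all the work happens at prediction time.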

** model-based learning

build a model

use the model to make predictions

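model selection - e.g. a linear model

define parameter values

specify a performance measure - a utility function (how good the model is) or a cost function (how bad it is); e.g. the linear regression algorithm finds the parameter values that minimize the cost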
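
The steps above can be sketched as follows, assuming a linear model y = theta0 + theta1 * x and mean squared error as the cost function. The data points are made up, and the closed-form least-squares fit is one common way to minimize this cost.

```python
def fit_linear(xs, ys):
    """Closed-form least squares for y = theta0 + theta1 * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    theta1 = sxy / sxx
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1

def mse(theta0, theta1, xs, ys):
    """Cost function: mean squared error of the model on the data."""
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Made-up training data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

theta0, theta1 = fit_linear(xs, ys)   # train: find the parameter values
prediction = theta0 + theta1 * 6.0    # use the model on a new instance
```

Any other choice of theta0 and theta1 gives a higher `mse` on this training data, which is what "minimizing the cost" means here.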

<main challenges of machine learning>

** bad algorithm < bad data

** insufficient quantity of training data

** nonrepresentative training data

cf) sampling bias - nonresponse bias

** poor-quality data

** irrelevant features

cf) feature engineering

feature selection : selecting the most useful features to train

feature extraction : combining existing features to produce a more useful one
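
A rough sketch of both ideas, assuming features are ranked by absolute Pearson correlation with the target (one simple selection heuristic among many). The feature names, values, and the "area" combination are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def select_features(features, target, top_n):
    """Feature selection: keep the top_n features most correlated with the target."""
    ranked = sorted(features, key=lambda name: abs(pearson(features[name], target)), reverse=True)
    return ranked[:top_n]

features = {
    "width": [2.0, 3.0, 4.0, 5.0, 6.0],
    "height": [6.0, 2.0, 5.0, 3.0, 4.0],
    "noise": [1.0, -1.0, 1.0, -1.0, 0.5],
}
# Feature extraction: combine two existing features into one more useful feature.
features["area"] = [w * h for w, h in zip(features["width"], features["height"])]

# Made-up target that depends on the combined feature.
price = [10.0 * a for a in features["area"]]
best = select_features(features, price, top_n=1)
```

In this toy setup the extracted "area" feature predicts the target better than either raw feature alone, so selection picks it first.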

** overfitting the training data

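overfitting : overgeneralizing; the model performs well on the training data but does not generalize well to new instances

regularization : constraining a model to make it simpler and reduce the risk of overfitting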
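
One way to see regularization in action, assuming a 1-D linear model with an L2 penalty on the slope (ridge-style). The cost here is the sum of squared errors plus `alpha * theta1**2`; the data and the `alpha` values are arbitrary.

```python
def fit_ridge_1d(xs, ys, alpha):
    """Fit y = theta0 + theta1 * x, minimizing squared error plus alpha * theta1**2."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # The penalty adds alpha to the denominator, shrinking the slope toward zero.
    theta1 = sxy / (sxx + alpha)  # alpha = 0 recovers ordinary least squares
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1

# Made-up training data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

_, slope_plain = fit_ridge_1d(xs, ys, alpha=0.0)
_, slope_regularized = fit_ridge_1d(xs, ys, alpha=10.0)
```

The larger `alpha` is, the flatter (simpler) the fitted line becomes; too large a value leads to underfitting instead.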

** underfitting the training data

** stepping back

<Testing and validating>

 

<exercises>