the machine learning landscape
<examples of machine learning in use>
- OCR (optical character recognition)
- spam filter
- recommendation
- voice search
<main regions>
supervised learning
unsupervised learning
online/batch learning
instance-based/model-based learning
workflow of typical ML project
<what is Machine Learning>
the science of programming computers so they can learn from data
training set
training instance = training sample
: consists of training data, accuracy (the performance measure), and a training task.
<why you may want to use it>
a filter based on machine learning techniques automatically learns which words and phrases are good predictors of spam by detecting unusually frequent patterns of words in spam
data mining
- digging into large amounts of data to discover patterns that were not immediately apparent
<types of machine learning systems>
1) supervised, unsupervised, semisupervised, reinforcement learning
** supervised
the training data includes labels
labels : the desired solutions
ex) tasks: classification, regression (predict a target numeric value)
ex) algorithms: linear regression, logistic regression, support vector machines, decision trees and random forests, neural networks
** unsupervised learning
learns without a label
ex) clustering, visualization and dimensionality reduction (feature extraction), anomaly detection, association rule learning
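A minimal sketch of clustering, to make "learns without a label" concrete. It uses k-means on 1-D points, which is just one possible clustering algorithm; the function name `kmeans_1d`, the data, and the starting centroids are made up for illustration.

```python
def kmeans_1d(points, centroids, n_iters=10):
    """Cluster 1-D points around the given starting centroids."""
    centroids = list(centroids)
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = sum(members) / len(members)
    return centroids, clusters


# Made-up data: two visibly separate groups, with no labels attached.
points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
print(centroids)
print(clusters)
```

The algorithm never sees a desired answer; the two groups come out of the distances alone.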
** semisupervised learning
a lot of unlabeled data plus a little labeled data
ex) photo tagging with one label per person
** reinforcement learning
agent : observe the environment, select and perform actions, get rewards/penalties
policy : the best strategy, which the agent learns by itself over time
ex) AlphaGo
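A toy sketch of the agent / reward / policy loop. This is an epsilon-greedy agent on a two-action bandit, far simpler than AlphaGo and without any real environment state to observe; `run_agent`, the reward probabilities, and all parameter values are assumptions chosen for illustration.

```python
import random


def run_agent(reward_probs, n_steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a toy multi-armed bandit."""
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    value = [0.0] * n_actions  # estimated average reward per action
    count = [0] * n_actions    # how often each action was tried
    for _ in range(n_steps):
        # Policy: mostly exploit the best-known action, sometimes explore.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: value[a])
        # Environment: reward 1 with the action's hidden probability, else 0.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        # Learn: update the running average reward of the chosen action.
        count[action] += 1
        value[action] += (reward - value[action]) / count[action]
    return value


value = run_agent(reward_probs=[0.2, 0.8])
best_action = max(range(len(value)), key=lambda a: value[a])
print(value, best_action)
```

The agent is never told which action is better; it only sees rewards and shifts its policy toward the action that pays off more often.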
2) online learning, batch learning
** batch learning
** online learning
feed the data sequentially, either individually or in mini-batches
incremental learning
learning rate - how fast the system should adapt to changing data
challenge - if bad data is fed to the system, performance gradually declines
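A minimal sketch of incremental learning with mini-batches, assuming a one-feature linear model trained by gradient steps on squared error. The function `online_fit`, the synthetic stream drawn from y = 2x + 1, and the learning rate value are all hypothetical.

```python
def online_fit(stream, learning_rate=0.05, w=0.0, b=0.0):
    """Update a linear model y = w*x + b one mini-batch at a time."""
    for batch in stream:
        # Average gradient of the squared error over this mini-batch.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
        grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
        # The learning rate controls how strongly new data moves the model.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical stream: mini-batches of (x, y) pairs drawn from y = 2x + 1.
data = [(i / 10, 2 * (i / 10) + 1) for i in range(20)]
batches = [data[i:i + 4] for i in range(0, len(data), 4)]
stream = batches * 200  # replayed only to mimic a long-running stream
w, b = online_fit(stream)
print(round(w, 2), round(b, 2))
```

Each mini-batch is used once and can then be discarded. A higher learning rate adapts faster to new data but also forgets old data faster, and it makes the model more sensitive to bad batches.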
3) instance-based, model-based learning
Q. how does an ML system generalize?
true goal is to perform well on new instances
** instance-based learning
measure of similarity
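A minimal sketch of instance-based learning: keep the training instances and label a new one by comparing it with the most similar stored ones. Here the similarity measure is Euclidean distance and the vote uses k nearest neighbors; `knn_predict` and the email features are made up for illustration.

```python
import math
from collections import Counter


def knn_predict(training_set, new_instance, k=3):
    """Label a new instance by majority vote among its k most similar training instances."""
    # Similarity measure: Euclidean distance (smaller means more similar).
    by_distance = sorted(training_set, key=lambda item: math.dist(item[0], new_instance))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical emails as (number of suspicious words, number of links), with labels.
training_set = [
    ((8, 5), "spam"), ((7, 6), "spam"), ((9, 4), "spam"),
    ((1, 0), "ham"), ((0, 1), "ham"), ((2, 1), "ham"),
]
print(knn_predict(training_set, (6, 5)))  # closest to the spam examples
print(knn_predict(training_set, (1, 1)))  # closest to the ham examples
```

No model is built here; generalization comes entirely from the similarity measure and the stored examples.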
** model-based learning
build a model
use the model to make predictions
model selection - e.g) linear model
define parameter values
specify a performance measure - e.g) a cost function; the linear regression algorithm finds the parameter values that minimize it
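The model-based steps above can be sketched as follows, assuming a one-feature linear model, mean squared error as the performance measure, and the closed-form least-squares solution as the training step. The helper names and the four data points are hypothetical.

```python
def fit_linear(xs, ys):
    """Find theta0, theta1 that minimize mean squared error for y = theta0 + theta1 * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    theta1 = cov_xy / var_x
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1


def mse(xs, ys, theta0, theta1):
    """Performance measure: mean squared error of the model on (xs, ys)."""
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up training data that roughly follows a straight line.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

theta0, theta1 = fit_linear(xs, ys)      # training: define the parameter values
print(theta0, theta1)
print(mse(xs, ys, theta0, theta1))       # how well the model fits the training data
print(theta0 + theta1 * 5.0)             # use the model to predict a new instance
```

Any other choice of the two parameters gives a higher cost on this training data, which is what "minimizing the cost function" means here.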
<main challenges of machine learning>
** bad algorithm < bad data
** insufficient quantity of training data
** nonrepresentative training data
cf) sampling bias - nonresponse bias
** poor-quality data
** irrelevant features
cf) feature engineering
feature selection : selecting the most useful features to train
feature extraction : combining existing features to produce a more useful one
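A small sketch of both ideas, under simple assumptions: feature selection ranks features by the absolute Pearson correlation with the target, and feature extraction averages two correlated features into one. The car data, the feature names, and the "wear" feature are all made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Hypothetical used-car data; price is the target.
features = {
    "age_years": [1, 3, 5, 7, 9],
    "mileage_10k_km": [1.2, 3.1, 4.8, 7.3, 8.9],
    "paint_code": [3, 1, 4, 1, 5],
}
price = [30, 25, 19, 14, 9]

# Feature selection: keep the two features most correlated with the target.
scores = {name: abs(pearson(col, price)) for name, col in features.items()}
selected = sorted(scores, key=scores.get, reverse=True)[:2]
print(selected)

# Feature extraction: merge two correlated features into a single "wear" feature.
wear = [(a + m) / 2 for a, m in zip(features["age_years"], features["mileage_10k_km"])]
print(wear)
```

Correlation is only one possible selection criterion, and averaging is the crudest way to combine features; dimensionality reduction algorithms do this more carefully.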
** overfitting the training data
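overfitting : overgeneralizing; the model performs well on the training data but does not generalize well to new instances
regularization : constraining a model to make it simpler and reduce the risk of overfitting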
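A minimal sketch of regularization, assuming a one-feature linear model with an L2 (ridge) penalty on the slope only. The hyperparameter name `alpha`, the function `fit_ridge`, and the data are illustrative assumptions.

```python
def fit_ridge(xs, ys, alpha):
    """Minimize sum of squared errors + alpha * theta1**2 for y = theta0 + theta1 * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    # The penalty term enlarges the denominator, shrinking the slope toward 0.
    theta1 = cov_xy / (var_x + alpha)
    theta0 = mean_y - theta1 * mean_x  # the intercept is not penalized
    return theta0, theta1


# Made-up training data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

slopes = []
for alpha in (0.0, 5.0, 1000.0):
    theta0, theta1 = fit_ridge(xs, ys, alpha)
    slopes.append(theta1)
    print(alpha, theta0, theta1)
```

With `alpha = 0` this is plain linear regression; as `alpha` grows the slope is pushed toward 0 and the model approaches a flat line through the mean, so it has less freedom to fit noise in the training data.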
** underfitting the training data
** stepping back
<Testing and validating>
<exercises>