[Notes] Hands on machine learning - chapter 1.

Tomato. 2021. 7. 13. 13:20

the machine learning landscape

<examples of machine learning in use>

- OCR (optical character recognition)

- spam filter

- recommendation

- voice search

supervised learning

unsupervised learning

online/batch learning

instance-based/model-based learning

workflow of typical ML project

the science of programming computers so they can learn from data

training set

training instance = training sample

: a learning problem consists of training data, a performance measure (e.g. accuracy), and a task.

a filter based on machine learning techniques automatically learns which words and phrases are good predictors of spam by detecting unusually frequent patterns of words in spam
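A minimal sketch of that idea, not the book's implementation: compare how often each word appears in spam versus ham and keep the words that are unusually frequent in spam. The function name `spam_indicator_words`, the `min_ratio` threshold, the add-one smoothing and the toy messages are all assumptions made for illustration.

```python
from collections import Counter

def spam_indicator_words(spam_msgs, ham_msgs, min_ratio=2.0):
    """Return words whose smoothed relative frequency in spam is at least
    min_ratio times their smoothed relative frequency in ham."""
    spam_counts = Counter(w for m in spam_msgs for w in m.lower().split())
    ham_counts = Counter(w for m in ham_msgs for w in m.lower().split())
    spam_total = sum(spam_counts.values())
    ham_total = sum(ham_counts.values())
    vocab_size = len(set(spam_counts) | set(ham_counts))
    flagged = {}
    for word, count in spam_counts.items():
        # add-one smoothing so words never seen in ham do not divide by zero
        p_spam = (count + 1) / (spam_total + vocab_size)
        p_ham = (ham_counts[word] + 1) / (ham_total + vocab_size)
        ratio = p_spam / p_ham
        if ratio >= min_ratio:
            flagged[word] = ratio
    return flagged

# toy data (hypothetical)
spam = ["free credit card offer", "free prize claim now", "amazing free offer"]
ham = ["meeting notes for monday", "lunch now or later", "card for your birthday"]
flagged = spam_indicator_words(spam, ham)
```

Words such as "card" and "now" appear in both classes, so they are not flagged; "free" is flagged with the highest ratio.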

data mining

- digging into large amounts of data to discover patterns that were not immediately apparent

1) supervised, unsupervised, semisupervised, reinforcement learning

** supervised

the training data includes labels

labels : the desired solutions

ex) tasks : classification, regression (predict a target numeric value) / algorithms : linear regression, logistic regression, support vector machines, decision trees and random forests, neural networks

** unsupervised learning

learns without a label

ex) clustering, visualization and dimensionality reduction (e.g. feature extraction), anomaly detection, association rule learning
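As an illustration of learning without labels, here is a rough k-means sketch on 1-D data. The function `kmeans_1d`, the starting centroids and the `visits_per_week` numbers are hypothetical; k-means is just one clustering algorithm among several.

```python
def kmeans_1d(values, centroids, n_iter=10):
    """Plain k-means on 1-D data; no labels are used anywhere."""
    centroids = list(centroids)
    clusters = []
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for v in values:
            # assign each value to its nearest centroid
            idx = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[idx].append(v)
        # move each centroid to the mean of its cluster (keep it if the cluster is empty)
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# toy data (hypothetical): weekly visits of blog readers
visits_per_week = [1, 2, 2, 3, 10, 11, 12, 13]
centroids, clusters = kmeans_1d(visits_per_week, centroids=[0.0, 5.0])
```

The algorithm ends up with two groups (occasional and frequent visitors) without ever being told which reader belongs where.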

** semisupervised learning

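a lot of unlabeled data plus a few labels

ex) photo-hosting service : one label per person is enough to name that person in every photo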

** reinforcement learning

agent : observes the environment, selects and performs actions, gets rewards/penalties

policy : the best strategy, which the agent learns by itself to get the most reward over time

ex) AlphaGo
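A toy sketch of the agent / reward / policy loop, assuming a made-up environment with two actions ("stay" is penalized, "move" is rewarded) and an epsilon-greedy agent. This is far simpler than AlphaGo and is only meant to show the vocabulary in code; every name and number here is hypothetical.

```python
import random

def run_agent(n_steps=500, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # hypothetical environment: mean reward per action (negative = penalty)
    mean_reward = {"stay": -1.0, "move": 1.0}
    actions = list(mean_reward)
    value = {a: 0.0 for a in actions}  # running estimate of each action's reward
    count = {a: 0 for a in actions}
    for _ in range(n_steps):
        if rng.random() < epsilon:
            action = rng.choice(actions)          # explore: try a random action
        else:
            action = max(actions, key=value.get)  # exploit: follow the current policy
        # the environment answers with a noisy reward or penalty
        reward = mean_reward[action] + rng.uniform(-0.5, 0.5)
        count[action] += 1
        # incremental mean: nudge the estimate toward the observed reward
        value[action] += (reward - value[action]) / count[action]
    policy = max(actions, key=value.get)  # learned strategy: best-valued action
    return policy, value

policy, value = run_agent()
```

Here the "policy" is simply the action with the highest estimated reward; real reinforcement learning problems also have states, which this sketch leaves out.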

2) online learning, batch learning

** batch learning

** online learning

feed the data sequentially, individually or in mini-batches

incremental learning

learning rate - how fast should the system adapt to changing data?

challenge - if bad data is fed to the system, performance gradually declines
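A small sketch of online (incremental) learning, assuming a linear model updated by gradient steps on one mini-batch at a time. `make_stream` is a hypothetical data source that emits noise-free points on y = 2x + 1; the `learning_rate` value is an arbitrary choice.

```python
def online_linear_fit(stream, learning_rate=0.05):
    """Incrementally fit y ~ w * x + b, one mini-batch at a time."""
    w, b = 0.0, 0.0
    for batch in stream:
        # gradient of the mean squared error on this mini-batch only
        grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
        grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
        # the learning rate controls how strongly each batch moves the model
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

def make_stream(n_batches, batch_size=4):
    """Hypothetical data source: points on y = 2x + 1 arriving in mini-batches."""
    for i in range(n_batches):
        xs = [((i * batch_size + j) % 10) / 10 for j in range(batch_size)]
        yield [(x, 2 * x + 1) for x in xs]

w, b = online_linear_fit(make_stream(20000))
```

Each batch is used once and then discarded, which is the point of online learning. A larger learning rate adapts faster to new data but also forgets old data faster, and lets bad data pull the model away more quickly.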

3) instance-based, model-based learning

Q. how does ML generalize?

true goal is to perform well on new instances

** instance-based learning

measure of similarity
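A minimal k-nearest-neighbors sketch of instance-based learning: the system keeps the training examples and classifies a new instance by its most similar stored examples. Euclidean distance is assumed as the similarity measure, and the two-number feature vectors are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label) pairs; query: feature_vector."""
    # similarity measure: Euclidean distance (smaller means more similar)
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    # majority vote among the k most similar training instances
    return Counter(nearest_labels).most_common(1)[0][0]

# toy data (hypothetical features, e.g. counts of two suspicious words)
train = [
    ((1.0, 1.0), "ham"), ((1.5, 0.5), "ham"), ((0.5, 1.5), "ham"),
    ((8.0, 9.0), "spam"), ((9.0, 8.5), "spam"), ((8.5, 9.5), "spam"),
]
prediction = knn_predict(train, (8.2, 8.8))
```

No model is built ahead of time; all the work happens at prediction time by comparing against stored instances.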

** model-based learning

build a model

use the model to make predictions

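model selection - e.g.) linear model

define parameter values

specify a performance measure (utility function or cost function) - e.g.) the linear regression algorithm finds the parameter values that minimize the cost function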
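The model-based steps above can be sketched with a one-feature linear model. This assumes mean squared error as the cost function and uses the closed-form least-squares solution; the names `theta0` and `theta1` and the data points are chosen for illustration.

```python
def fit_linear(xs, ys):
    """Least-squares fit of y = theta0 + theta1 * x (minimizes the MSE cost)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    theta1 = sxy / sxx
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1

def mse(xs, ys, theta0, theta1):
    """Cost function: mean squared error of the model on the data."""
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# toy data (hypothetical)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

theta0, theta1 = fit_linear(xs, ys)  # training: find the parameter values
prediction = theta0 + theta1 * 6.0   # inference: predict for a new instance
```

Any other parameter choice, for example the hand-picked `theta0 = 0, theta1 = 2`, gives a cost at least as high as the fitted values on this training data.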

<main challenges of machine learning>

** bad algorithm < bad data

** insufficient quantity of training data

** nonrepresentative training data

cf) sampling bias - nonresponse bias

** poor-quality data

** irrelevant features

cf) feature engineering

feature selection : selecting the most useful features to train

feature extraction : combining existing features to produce a more useful one
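A rough sketch of both ideas on made-up housing data. Selecting by absolute Pearson correlation with the target is only one simple heuristic, and the `threshold` value, the feature names and the numbers are all hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def select_features(features, target, threshold=0.5):
    """Feature selection: keep features strongly correlated with the target."""
    return [
        name for name, column in features.items()
        if abs(pearson(column, target)) >= threshold
    ]

# toy data (hypothetical)
features = {
    "rooms": [2, 3, 4, 5, 6],
    "house_number": [7, 1, 9, 3, 5],   # irrelevant feature
    "area_m2": [40, 62, 78, 105, 120],
}
price = [100, 150, 200, 250, 300]

selected = select_features(features, price)

# feature extraction: combine two existing features into a new one
area_per_room = [a / r for a, r in zip(features["area_m2"], features["rooms"])]
```

The irrelevant `house_number` column is dropped, while `area_per_room` is a new feature built from two existing ones.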

** overfitting the training data

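overfitting : overgeneralizing; the model performs well on the training data but does not generalize well

regularization : constraining a model to make it simpler and reduce the risk of overfitting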
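One way to picture regularization, assuming a one-feature linear model with an L2 (ridge) penalty on the slope. The hyperparameter name `alpha` and the data are hypothetical; a larger `alpha` constrains the model more.

```python
def fit_ridge_1d(xs, ys, alpha):
    """Fit y = theta0 + theta1 * x, minimizing
    sum of squared errors + alpha * theta1**2 (theta0 is not penalized)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # the penalty adds alpha to the denominator, shrinking the slope toward 0
    theta1 = sxy / (sxx + alpha)
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1

# toy data (hypothetical)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slopes = [fit_ridge_1d(xs, ys, alpha)[1] for alpha in (0.0, 10.0, 100.0)]
```

With `alpha = 0` this is ordinary least squares; as `alpha` grows the slope shrinks and the model gets simpler (flatter), trading some fit on the training data for less risk of overfitting.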

** underfitting the training data

** stepping back

<Testing and validating>

<exercises>
