
[AI/Hongong Machine Learning] 03-2. Linear Regression

Jaeseo Kim 2022. 12. 5. 21:25

First, prepare the data and train the model (see post 03-1 for details).

 

[AI/Hongong Machine Learning] 03-1. k-Nearest Neighbors Regression


avoc-o-d.tistory.com

# http://bit.ly/perch_data

# First, split the data into training and test sets,
# then convert the feature data into 2D arrays

import numpy as np

perch_length = np.array([8.4, 13.7, 15.0, 16.2, 17.4, 18.0, 18.7, 19.0, 19.6, 20.0, 21.0,
       21.0, 21.0, 21.3, 22.0, 22.0, 22.0, 22.0, 22.0, 22.5, 22.5, 22.7,
       23.0, 23.5, 24.0, 24.0, 24.6, 25.0, 25.6, 26.5, 27.3, 27.5, 27.5,
       27.5, 28.0, 28.7, 30.0, 32.8, 34.5, 35.0, 36.5, 36.0, 37.0, 37.0,
       39.0, 39.0, 39.0, 40.0, 40.0, 40.0, 40.0, 42.0, 43.0, 43.0, 43.5,
       44.0])

perch_weight = np.array([5.9, 32.0, 40.0, 51.5, 70.0, 100.0, 78.0, 80.0, 85.0, 85.0, 110.0,
       115.0, 125.0, 130.0, 120.0, 120.0, 130.0, 135.0, 110.0, 130.0,
       150.0, 145.0, 150.0, 170.0, 225.0, 145.0, 188.0, 180.0, 197.0,
       218.0, 300.0, 260.0, 265.0, 250.0, 250.0, 300.0, 320.0, 514.0,
       556.0, 840.0, 685.0, 700.0, 700.0, 690.0, 900.0, 650.0, 820.0,
       850.0, 900.0, 1015.0, 820.0, 1100.0, 1000.0, 1100.0, 1000.0,
       1000.0])

from sklearn.model_selection import train_test_split

# Split into a training set and a test set
train_input, test_input, train_target, test_target = train_test_split(perch_length, perch_weight, random_state=42)

# Convert to 2D arrays
train_input = train_input.reshape(-1, 1)
test_input = test_input.reshape(-1, 1)

# KNeighborsRegressor
from sklearn.neighbors import KNeighborsRegressor

# Model that uses 3 neighbors
knr = KNeighborsRegressor(n_neighbors=3)

# Train
knr.fit(train_input, train_target)

 

Limitations of k-nearest neighbors

Let's use this model to predict the weight of a perch that is 50 cm long.

# Predict the weight of a 50 cm perch
print(knr.predict([[50]]))

Result

The model predicts about 1033 g. However, the actual perch is said to weigh much more than this.

We have a problem...^^

 

Let's plot the training set as blue points, the training samples that are neighbors of the prediction as orange points, and the prediction (50 cm, 1033 g) as a green point.

Looking at the scatter plot, weight naturally increases with length.

But the predicted sample (50 cm, 1033 g) is longer than 45 cm, so it should also be heavier, and yet it isn't..?

The reason is that the only neighbors close to 50 cm are the samples around 45 cm, so the model simply averaged the samples near 45 cm.

So let's compute the mean of the samples near 45 cm ourselves and check whether it matches the model's prediction.

# Indices of the 3 training samples nearest to 50 cm
distances, indexes = knr.kneighbors([[50]])
print(np.mean(train_target[indexes]))

Mean computed by hand

Wow, it's exactly the same^^

In other words, when a new sample falls outside the range of the training set, the model has no choice but to predict an off-the-mark value.

That is, k-nearest neighbors has a limitation: it cannot properly predict samples that lie outside the range of the training set.

In the extreme case, this model predicts 1033 g even for a length of 1000 cm.🥲
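The limitation above can be reproduced with a few lines of plain Python. This is only a sketch of the idea, not scikit-learn's implementation: `knn_predict` is a hypothetical helper, and the five samples are a hand-picked subset of the longest perch from the data above rather than the real training split.

```python
# Minimal k-NN regression in pure Python (illustrative only).
def knn_predict(lengths, weights, query, k=3):
    """Average the targets of the k training samples closest to `query`."""
    # Sort sample indices by absolute distance to the query length.
    order = sorted(range(len(lengths)), key=lambda i: abs(lengths[i] - query))
    nearest = order[:k]
    return sum(weights[i] for i in nearest) / k

# Assumption: a hand-picked subset of the longest perch in the data above.
sample_lengths = [40.0, 42.0, 43.0, 43.5, 44.0]
sample_weights = [820.0, 1100.0, 1100.0, 1000.0, 1000.0]

pred_50 = knn_predict(sample_lengths, sample_weights, 50)
pred_1000 = knn_predict(sample_lengths, sample_weights, 1000)

# Both queries pick the same 3 neighbors, so both predictions are the same mean.
print(round(pred_50, 1), round(pred_1000, 1))
```

However far the query moves past the longest training sample, the three nearest neighbors stay the same, so the prediction never changes.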

 

📌 Linear regression

  • A regression algorithm
  • With a single feature: used when the relationship in the data being analyzed is a straight line
  • y = ax + b
📌 LinearRegression: the class that implements the linear regression algorithm
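To make y = ax + b concrete, here is a rough sketch of how a slope and intercept can be found for a single feature using the ordinary least squares closed form. The post itself does not describe the fitting procedure; the assumption here is that least squares is the criterion, which is what scikit-learn documents for LinearRegression. `fit_line` and the toy points are hypothetical.

```python
def fit_line(xs, ys):
    """Return (a, b) minimizing the squared error of y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    # The fitted line always passes through the point of means.
    b = mean_y - a * mean_x
    return a, b

# Toy points that lie exactly on y = 2x + 1 (made up for illustration).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

a, b = fit_line(xs, ys)
print(a, b)
```

Because the toy points sit exactly on a line, the fit recovers that line's slope and intercept; with noisy data such as the perch measurements, the result is the best straight-line compromise.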

 

Let's train a linear regression model and make a prediction.

from sklearn.linear_model import LinearRegression
lr = LinearRegression()

# Train the linear regression model
lr.fit(train_input, train_target)

# Predict the 50 cm perch
print(lr.predict([[50]]))

Predicted weight

📌 Model parameters

y = ax + b

  • a (slope): coef_
  • b (intercept): intercept_

coef_ and intercept_ are called model parameters, in the sense that they are values found by the machine learning algorithm.

Model-based learning: a training process that finds the optimal model parameters.

# y = ax + b
# a: coef_, the slope (a coefficient, also called a weight)
# b: intercept_, the intercept

print(lr.coef_, lr.intercept_)

We can see that the model learned weight = 39.02 × length - 709.02.
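As a quick sanity check, the learned line can be evaluated by hand. The sketch below plugs the rounded parameters from the sentence above into y = ax + b; `predict_weight` is a hypothetical helper, not part of scikit-learn.

```python
# Rounded model parameters quoted in the text above.
a = 39.02     # slope (rounded lr.coef_)
b = -709.02   # intercept (rounded lr.intercept_)

def predict_weight(length_cm):
    """Evaluate the learned line y = a*x + b for one length."""
    return a * length_cm + b

weight_50 = predict_weight(50)
print(round(weight_50, 2))
```

The result lands within a fraction of a gram of the 1241.8 used for the 50 cm point in the plot that follows; the small gap is presumably due to rounding the parameters to two decimals.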

 

Let's draw the straight line for perch lengths from 15 to 50.

import matplotlib.pyplot as plt

# Scatter plot of the training set
plt.scatter(train_input, train_target)

# Draw the first-degree equation (straight line) from 15 to 50
plt.plot([15, 50], [15*lr.coef_+lr.intercept_, 50*lr.coef_+lr.intercept_])

# The 50 cm perch
plt.scatter(50, 1241.8, marker="^")
plt.xlabel("length")
plt.ylabel("weight")
plt.show()

Oh! The prediction for the 50 cm perch lies on the extension of the line!

Now let's check the R^2 score.

Score result

Hmm, the coefficient of determination is low for both sets! We can say the model is underfit.😂
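For reference, the coefficient of determination can be written out by hand as R^2 = 1 - (sum of squared residuals) / (sum of squared deviations from the target mean). In scikit-learn, calling `score` on a regressor returns this value. The sketch below is a plain-Python version with made-up targets; `r_squared` is a hypothetical helper name.

```python
def r_squared(targets, predictions):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_t = sum(targets) / len(targets)
    # Squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(targets, predictions))
    # Squared error of always predicting the target mean.
    ss_tot = sum((t - mean_t) ** 2 for t in targets)
    return 1 - ss_res / ss_tot

# Hypothetical target values, for illustration only.
targets = [100.0, 200.0, 300.0, 400.0]

perfect = r_squared(targets, targets)            # predictions match exactly
mean_only = r_squared(targets, [250.0] * 4)      # always predict the mean
print(perfect, mean_only)
```

A score near 1 means the predictions track the targets closely, while a score near 0 means the model does no better than predicting the mean every time.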

 

🤔 Problem? First of all, the straight line produced by linear regression drops to 0 g and below.

That is, the model can predict a negative weight.
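The negative-weight problem is easy to see numerically. The sketch below uses the rounded line weight = 39.02 × length - 709.02 from earlier in the post, so the exact numbers are approximate.

```python
# Rounded parameters of the fitted line from earlier in the post.
a, b = 39.02, -709.02

def predict_weight(length_cm):
    """Evaluate the straight line y = a*x + b."""
    return a * length_cm + b

# Length at which the line crosses 0 g: solve a*x + b = 0.
zero_crossing = -b / a
weight_15 = predict_weight(15)

print(round(zero_crossing, 2))  # shorter perch than this get a negative weight
print(round(weight_15, 2))      # prediction for a 15 cm perch
```

Any perch shorter than roughly 18 cm would be assigned a negative weight by this line, even though the data set contains perch of 8.4 cm and 13.7 cm.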

💡 Solution! In the scatter plot, the data looks more like a curve than a straight line, so we use linear regression with a polynomial.

 

📌 Polynomial regression

  • Linear regression using a polynomial
  • With a single feature: used when the relationship in the data being analyzed is not a straight line
  • y = ax² + bx + c ...
  • This example uses a quadratic equation, so the squared term must be added to the training set
📌 Polynomial regression can still be called linear regression.
If we simply substitute another variable for x², say A = x², we can write y = aA + bx + c, so y can be expressed as a linear relationship in A and x.
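The substitution can be sketched in plain Python: once each length x is expanded into the pair (x², x), the prediction is just a weighted sum of those two columns plus an intercept, which is exactly the form of a linear model. The helper names and the coefficients a=1, b=2, c=3 are made up for illustration.

```python
def quadratic_features(lengths):
    """Expand each x into the row [x**2, x], i.e. the columns A and x."""
    return [[x ** 2, x] for x in lengths]

def linear_predict(row, coefs, intercept):
    """Plain linear model: weighted sum of the features plus an intercept."""
    return sum(c * v for c, v in zip(coefs, row)) + intercept

rows = quadratic_features([10.0, 20.0])

# Hypothetical coefficients a=1, b=2 and intercept c=3.
preds = [linear_predict(r, [1.0, 2.0], 3.0) for r in rows]

# Same values as evaluating 1*x**2 + 2*x + 3 directly.
print(rows, preds)
```

The model stays linear in its parameters a, b, and c; only the input features become nonlinear in x.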

📍 Note! Even when the data set is prepared to fit a quadratic equation, the target values are used as they are.

The target values never need to change, no matter what kind of curve we train!

# The squared term must be added to the training set, so we need an array of shape (number of samples, 2)!

# Stack the squared values side by side with the original values
train_poly = np.column_stack((train_input ** 2 , train_input))
test_poly = np.column_stack((test_input ** 2 , test_input))

The data is ready! Now let's train again.

lr = LinearRegression()

# After retraining, the model predicts a heavier weight!
lr.fit(train_poly, train_target)
print(lr.predict([[50**2, 50]]))

Weight prediction result

Let's look at the model parameters.

The model learned weight = 1.01 × length² - 21.6 × length + 116.05.
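As a rough check, the rounded quadratic above can be evaluated at 50 cm by hand. This is a sketch that uses only the rounded coefficients quoted in the sentence above, so it will not match the model's prediction exactly.

```python
# Rounded coefficients quoted in the text above.
a, b, c = 1.01, -21.6, 116.05

length = 50
# weight = a * length^2 + b * length + c
weight_50 = a * length ** 2 + b * length + c
print(round(weight_50, 2))
```

This gives about 1561 g, a little below the 1574 used for the 50 cm point in the plot that follows. The gap presumably comes from rounding: the squared term multiplies any rounding error in its coefficient by 2,500. Reading the values from `lr.coef_` and `lr.intercept_` directly avoids the mismatch.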

 

Let's draw the scatter plot.

# Create an integer array from 15 to 49 so the curve can be drawn as short line segments
point = np.arange(15, 50)

# Scatter plot of the training set
plt.scatter(train_input, train_target)

# Draw the quadratic equation from 15 to 49
plt.plot(point, 1.01*point**2 - 21.6*point + 116.05)

# The 50 cm perch
plt.scatter(50, 1574, marker="^")
plt.xlabel("length")
plt.ylabel("weight")
plt.show()

Scatter plot result

The curve came out nice and pretty😍

Shall we check the coefficient of determination too?

It's trained very well~!

However, the test set score is still a tiny bit higher than the training set score, which means a little underfitting remains..

We'll probably need a slightly more complex model.

 

🤔🤔