[AI/Hongong Machine Learning] 02-2. Data Preprocessing

Jaeseo Kim 2022. 12. 4. 04:45
- To get correct results, we run the data through a preprocessing step before using it.
- We also learn how to convert the scale of the features using standard scores.

 

๋ฐ์ดํ„ฐ ์ค€๋น„ํ•˜๊ธฐ

์ž…๋ ฅ, ํƒ€๊นƒ ๋ฐ์ดํ„ฐ๋ฅผ ์ค€๋น„ํ•ด๋ด…์‹œ๋‹ค.

# http://bit.ly/bream_smelt

# 49 samples
# 35 bream, 14 smelt

fish_length = [25.4, 26.3, 26.5, 29.0, 29.0, 29.7, 29.7, 30.0, 30.0, 30.7, 31.0, 31.0, 
                31.5, 32.0, 32.0, 32.0, 33.0, 33.0, 33.5, 33.5, 34.0, 34.0, 34.5, 35.0, 
                35.0, 35.0, 35.0, 36.0, 36.0, 37.0, 38.5, 38.5, 39.5, 41.0, 41.0, 9.8, 
                10.5, 10.6, 11.0, 11.2, 11.3, 11.8, 11.8, 12.0, 12.2, 12.4, 13.0, 14.3, 15.0]
fish_weight = [242.0, 290.0, 340.0, 363.0, 430.0, 450.0, 500.0, 390.0, 450.0, 500.0, 475.0, 500.0, 
                500.0, 340.0, 600.0, 600.0, 700.0, 700.0, 610.0, 650.0, 575.0, 685.0, 620.0, 680.0, 
                700.0, 725.0, 720.0, 714.0, 850.0, 1000.0, 920.0, 955.0, 925.0, 975.0, 950.0, 6.7, 
                7.5, 7.0, 9.7, 9.8, 8.7, 10.0, 9.9, 9.8, 12.2, 13.4, 12.2, 19.7, 19.9]

 

Let's use NumPy's column_stack, concatenate, ones, and zeros functions to build the input and target data the easy way!

📌 NumPy's column_stack, concatenate, ones, zeros
- column_stack : stands each list it receives up as a column, then joins the columns side by side in order
- concatenate : joins the lists it receives end to end (for 1-D lists, the result is one longer 1-D array)
- ones : returns an array filled with 1
- zeros : returns an array filled with 0

Usage example)

import numpy as np

print("Original data")
print([1,2,3], [4,5,6])
# column_stack : stand each list up as a column, then join the columns side by side
print("\ncolumn_stack result")
print(np.column_stack(([1,2,3], [4,5,6])))
# concatenate : join the lists end to end into one 1-D array
print("\nconcatenate result")
print(np.concatenate(([1,2,3], [4,5,6])))

Result

# ones : create an array of size n filled with 1
print(np.ones(5))
# zeros : create an array of size n filled with 0
print(np.zeros(5))

Result

 

Input and target data.. now let's build them for real!

We create input data in which each sample has length and weight as its features,

🐟 and target data in which bream is 1 and smelt is 0.

import numpy as np

# Build the input data
fish_data = np.column_stack((fish_length, fish_weight))

# Build the target data
fish_target = np.concatenate((np.ones(35), np.zeros(14)))

I printed the first five of each to check that they were built correctly.✨

Result

 

 

์‚ฌ์ดํ‚ท๋Ÿฐ์œผ๋กœ ํ›ˆ๋ จ ์„ธํŠธ์™€ ํ…Œ์ŠคํŠธ ์„ธํŠธ ๋‚˜๋ˆ„๊ธฐ

ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ(์ž…๋ ฅ + ํƒ€๊นƒ ๋ฐ์ดํ„ฐ)๋ฅผ ๋งŒ๋“ค์—ˆ์œผ๋‹ˆ, ์ด๋ฅผ ํ›ˆ๋ จ ์„ธํŠธ์™€ ํ…Œ์ŠคํŠธ ์„ธํŠธ๋กœ ๋‚˜๋ˆ„์–ด์ฃผ๊ฒ ์Šต๋‹ˆ๋‹ค.

ํ•œ ๋ฐฐ์—ด์„ ํ›ˆ๋ จ, ํ…Œ์ŠคํŠธ ์„ธํŠธ๋กœ ๋‚˜๋ˆ„๋‹ˆ๊นŒ x 2 ๊ฐœ์”ฉ ์ƒ์„ฑ๋˜๊ฒ ์ฃ ?

๐Ÿ“Œ ์‚ฌ์ดํ‚ท๋Ÿฐ์˜ train_test_split()
์ „๋‹ฌ๋œ ๋ฆฌ์ŠคํŠธ๋ฅผ ๋žœ๋ค์œผ๋กœ ์„ž๊ณ , ๋น„์œจ์— ๋งž๊ฒŒ ํ›ˆ๋ จ ์„ธํŠธ์™€ ํ…Œ์ŠคํŠธ ์„ธํŠธ๋กœ ๋‚˜๋ˆ„์–ด ์คŒ

 

train_test_split() ๋Š” ๊ธฐ๋ณธ์ ์œผ๋กœ 25%๋ฅผ ํ…Œ์ŠคํŠธ ์„ธํŠธ๋กœ ๋–ผ์–ด๋ƒ…๋‹ˆ๋‹ค.
๐Ÿค” ๋ฌธ์ œ ? ์ผ๋ถ€ ํด๋ž˜์Šค ๊ฐœ์ˆ˜๊ฐ€ ์ ์„ ๋•Œ, ์ƒ˜ํ”Œ๋ง ํŽธํ–ฅ์ด ๋‚˜ํƒ€๋‚  ๋•Œ๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค. (ํ•ด๋‹น ์˜ˆ์ œ์—์„œ๋Š” ๋น™์–ด๊ฐ€ ์ ์–ด์„œ ๋ฐœ์ƒ)
๐Ÿ’ก ํ•ด๊ฒฐ ! stratify ๋งค๊ฐœ๋ณ€์ˆ˜์— ํƒ€๊นƒ ๋ฐ์ดํ„ฐ๋ฅผ ์ „๋‹ฌํ•˜๋ฉด ํด๋ž˜์Šค ๋น„์œจ์— ๋งž๊ฒŒ ๋ฐ์ดํ„ฐ๋ฅผ ๋‚˜๋ˆ•๋‹ˆ๋‹ค.
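To see what stratify actually changes, here is a minimal sketch that splits the same 35/14 labels with and without it and counts the classes on each side. The placeholder input `sample_ids` and the helper `class_counts` are names I made up for illustration; they are not part of the original code.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Same labels as in this post: 35 bream (1) followed by 14 smelt (0)
fish_target = np.concatenate((np.ones(35), np.zeros(14)))
# Hypothetical stand-in for the input data: one id per sample
sample_ids = np.arange(49).reshape(-1, 1)


def class_counts(target):
    """Return (number of smelt, number of bream) in a target array."""
    return int(np.sum(target == 0)), int(np.sum(target == 1))


# Without stratify: the class ratio of the test set depends on the shuffle
_, _, _, plain_test = train_test_split(sample_ids, fish_target, random_state=42)

# With stratify: the 35:14 ratio is kept as closely as possible on both sides
_, _, strat_train, strat_test = train_test_split(
    sample_ids, fish_target, stratify=fish_target, random_state=42
)

print("test set size:", len(strat_test))
print("plain test (smelt, bream):", class_counts(plain_test))
print("stratified test (smelt, bream):", class_counts(strat_test))
print("stratified train (smelt, bream):", class_counts(strat_train))
```

With 49 samples the default 25% test set holds 13 samples, and the stratified split puts 4 smelt and 9 bream in it (10 smelt and 26 bream stay in the training set). The counts for the plain split depend on the random seed.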

from sklearn.model_selection import train_test_split

# The random_state parameter sets the random seed
train_input, test_input, train_target, test_target = train_test_split(fish_data, fish_target, stratify=fish_target, random_state=42)

To check that the data was shuffled well, I printed the target data of the training set and the test set!

Result

 

 

Identifying the current problem

Let's train k-nearest neighbors on the data we prepared.

from sklearn.neighbors import KNeighborsClassifier

kn = KNeighborsClassifier()
kn.fit(train_input, train_target)
kn.score(test_input, test_target)
# 1.0

The accuracy is 1.0~!

Now let's make a prediction for a new bream sample.

print(kn.predict([[25, 150]]))

???!!!

Wait. . . 🐟 Bream is 1 and smelt is 0, so we should get 1. Why does it come out as smelt?

Let's check with a scatter plot.

import matplotlib.pyplot as plt

plt.scatter(train_input[ : , 0], train_input[ : , 1])
plt.scatter(25, 150, marker="^") # the bream sample we want to predict
plt.xlabel("length")
plt.ylabel("weight")
plt.show()

Result

🤔 Problem ? On the scatter plot the sample looks closer to the bream, so why was it judged closer to the smelt data, giving 0?

🧠 Process... k-nearest neighbors predicts the majority class among the nearby samples. So let's find out which samples are nearby!

📌 kneighbors() of the KNeighborsClassifier class
- A method that finds the nearest neighbor samples (the default number of neighbors is 5)
- Returns the distances to the neighbors and the indexes of the neighbor samples
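Before using it on the fish data, here is a small sketch of what kneighbors() hands back. The four toy points and their labels are made up for illustration, chosen so the distances from the origin come out as whole numbers.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: four 2-D points with made-up class labels
toy_input = np.array([[3, 4], [6, 8], [5, 12], [8, 15]])
toy_target = np.array([0, 0, 1, 1])

kn_toy = KNeighborsClassifier(n_neighbors=3)
kn_toy.fit(toy_input, toy_target)

# kneighbors() takes a 2-D array of query points and returns two 2-D arrays,
# one row per query point, sorted from nearest to farthest
distances, indexes = kn_toy.kneighbors([[0, 0]])

print(distances)                  # distances 5, 10, 13
print(indexes)                    # rows 0, 1, 2 of toy_input
print(toy_target[indexes[0]])     # labels of those neighbors: 0, 0, 1
print(kn_toy.predict([[0, 0]]))   # majority label among them: 0
```

Because indexes is 2-D (one row per query point), indexing the training arrays with it, as in train_input[indexes, 0], picks out exactly the neighbor samples.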

Let's look at the nearby samples using array indexing.

distances, indexes = kn.kneighbors([[25,150]]) # find the neighbor samples

plt.scatter(train_input[:,0], train_input[:,1])
plt.scatter(25, 150, marker="^") # the bream sample we want to predict
plt.scatter(train_input[indexes, 0], train_input[indexes, 1], marker='D') # neighbor samples
plt.xlabel("length")
plt.ylabel("weight")
plt.show()

Result

 

Matching the scale

🤔Problem ?

Looking at the distances to the neighbor samples... [[ 92.00086956 130.48375378 130.73859415 138.32150953 138.39320793]]

The distance to the bream is 92, and the distances to the smelt are 130~138.

But if the distance to the bream on the scatter plot is 92, it is odd that the distance to the smelt is only 130. The proportions look off.

💡Solution ! The x axis has a narrow range (10~40) and the y axis has a wide range (0~1000). So a gap that looks small along the y axis on the plot is still computed as a very large distance!
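The arithmetic behind this can be sketched as follows. I am assuming the two nearest neighbors are the first bream (25.4, 242.0) and the last smelt (15.0, 19.9) from the lists above, because those two samples reproduce the first two reported distances. The helper name `euclidean` is my own.

```python
import numpy as np

query = np.array([25.0, 150.0])   # the sample being predicted
bream = np.array([25.4, 242.0])   # first bream in the lists above
smelt = np.array([15.0, 19.9])    # last smelt in the lists above


def euclidean(a, b):
    """Straight-line distance between two points."""
    return float(np.sqrt(np.sum((a - b) ** 2)))


# Squared difference per feature: [length part, weight part]
print((bream - query) ** 2)   # about [0.16, 8464.0]
print((smelt - query) ** 2)   # about [100.0, 16926.01]

d_bream = euclidean(query, bream)
d_smelt = euclidean(query, smelt)
print(round(d_bream, 4))      # about 92.0009
print(round(d_smelt, 4))      # about 130.4838
```

In both cases the weight term is more than a hundred times larger than the length term, so the distance is almost entirely decided by weight.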

๋”ฐ๋ผ์„œ, x์ถ•์˜ ๋ฒ”์œ„๋ฅผ y์ถ•์˜ ๋ฒ”์œ„์™€ ๋™์ผํ•˜๊ฒŒ 0~1000์œผ๋กœ ๋งž์ถ”๋„๋ก ํ•ฉ์‹œ๋‹ค!

๐Ÿ“Œ ๋งทํ”Œ๋กฏ๋ฆฝ์˜ xlim()
- x ์ถ•์˜ ๋ฒ”์œ„๋ฅผ ์ง€์ •
- y์ถ• ์ง€์ •ํ•˜๋ ค๋ฉด ylim()
plt.scatter(train_input[ : , 0], train_input[ : , 1])
plt.scatter(25, 150, marker="^")
plt.scatter(train_input[indexes, 0], train_input[indexes, 1], marker='D')

# xlim : x ์ถ•์˜ ๋ฒ”์œ„๋ฅผ ์ง€์ •
plt.xlim((0, 1000))

plt.xlabel("length")
plt.ylabel("weight")
plt.show()

Result

Seen this way, it is clear that the x axis (length) has almost no influence on which samples are chosen as neighbors. 😥

Scale : the range in which a feature's values lie

As above, when two features cover different ranges, we say the two features have different scales.

When the features are expressed on different scales, it is hard to predict correctly. We need to put them on a common scale.

 

 

๐Ÿ“Œ ๋ฐ์ดํ„ฐ ์ „์ฒ˜๋ฆฌ

  • ํŠน์„ฑ๊ฐ’์„ ์ผ์ •ํ•œ ๊ธฐ์ค€์œผ๋กœ ๋งž์ถฐ ์ฃผ๋Š” ์ž‘์—…
  • ์ „์ฒ˜๋ฆฌ ๋ฐฉ๋ฒ•์€ ๋‹ค์–‘ํ•จ
    • ํ‘œ์ค€์ ์ˆ˜ (z์ ์ˆ˜) : ๊ฐ ํŠน์„ฑ๊ฐ’์ด ํ‰๊ท ์—์„œ ํ‘œ์ค€ํŽธ์ฐจ์˜ ๋ช‡ ๋ฐฐ๋งŒํผ ๋–จ์–ด์ ธ ์žˆ๋Š”์ง€๋ฅผ ๋‚˜ํƒ€๋ƒ„
      • ๊ณ„์‚ฐ ๋ฐฉ๋ฒ• : ํ‰๊ท ์„ ๋บด๊ณ  ํ‘œ์ค€ํŽธ์ฐจ๋กœ ๋‚˜๋ˆ”

๊ทธ๋Ÿผ, ๊ธฐ์ค€์„ ๋งž์ถ”๊ธฐ ์œ„ํ•ด ํ›ˆ๋ จ ์„ธํŠธ๋ฅผ ํ‘œ์ค€ ์ ์ˆ˜๋กœ ๋ณ€ํ™˜ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

mean = np.mean(train_input, axis=0) # mean of each feature
std = np.std(train_input, axis=0) # standard deviation of each feature

train_scaled = (train_input - mean) / std # convert to standard scores (NumPy broadcasting)

📌 Why axis = 0?
The features have different scales, so the mean and standard deviation must be computed separately for each feature.
axis=0 computes the statistics down the rows (samples), which gives one value per column (feature)!
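Here is a small sketch of the difference between axis=0 and axis=1, and of how broadcasting applies the per-feature statistics to every row. The 3 x 2 array is made-up data, and the names `col_mean`, `row_mean` and `col_std` are mine.

```python
import numpy as np

# Hypothetical mini data set: 3 samples (rows) x 2 features (columns)
data = np.array([[10.0, 100.0],
                 [20.0, 300.0],
                 [30.0, 500.0]])

col_mean = np.mean(data, axis=0)   # one mean per feature: 20 and 300
row_mean = np.mean(data, axis=1)   # one mean per sample: 55, 160, 265 (mixes features)
col_std = np.std(data, axis=0)     # one standard deviation per feature

# Broadcasting: the shape (2,) statistics are applied to each row of the (3, 2) array
scaled = (data - col_mean) / col_std

print(col_mean.shape, row_mean.shape)   # (2,) (3,)
print(scaled.mean(axis=0))              # each feature now has mean (about) 0
print(scaled.std(axis=0))               # and standard deviation (about) 1
```

After the conversion both columns are centered on 0 with a spread of 1, even though the original columns differed by a factor of ten or more.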

 

์ „์ฒ˜๋ฆฌ ๋ฐ์ดํ„ฐ๋กœ ๋ชจ๋ธ ํ›ˆ๋ จํ•˜๊ธฐ

โœจ ํ›ˆ๋ จ ์„ธํŠธ๋ฅผ ํ‘œ์ค€ ์ ์ˆ˜๋กœ ๋ณ€ํ™˜ํ•ด์ฃผ์—ˆ๊ธฐ ๋•Œ๋ฌธ์—, ์˜ˆ์ธกํ•  ๋„๋ฏธ๋ฐ์ดํ„ฐ๋„ ํ•จ๊ป˜ ํ‘œ์ค€ ์ ์ˆ˜๋กœ ๋ณ€ํ™˜ํ•ด์ฃผ์–ด์•ผ ํ•ฉ๋‹ˆ๋‹ค!

๐Ÿ“Œ ๋˜ํ•œ, ํ›ˆ๋ จ ์„ธํŠธ์˜ ํ‰๊ท ๊ณผ ํ‘œ์ค€ ํŽธ์ฐจ๋ฅผ ์ด์šฉํ•˜์—ฌ ํ‘œ์ค€ ์ ์ˆ˜๋กœ ๋ณ€ํ™˜ํ•ด์ฃผ์–ด์•ผ ํ•œ๋‹ค๋Š” ์  ์ฃผ์˜!!
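A quick sketch of why the training statistics matter, using made-up numbers rather than the fish data. If new data is standardized with its own mean and standard deviation, it lands on a different scale from the training set; with only two samples, each feature always collapses to -1 and 1, whatever the original values were.

```python
import numpy as np

# Hypothetical training and test sets: columns are length and weight
train = np.array([[10.0, 100.0], [20.0, 300.0], [30.0, 500.0]])
test = np.array([[12.0, 120.0], [28.0, 480.0]])

train_mean = train.mean(axis=0)
train_std = train.std(axis=0)

# Correct: reuse the statistics computed from the training set
test_ok = (test - train_mean) / train_std

# Wrong: statistics computed from the test set itself
test_own = (test - test.mean(axis=0)) / test.std(axis=0)

print(test_ok)    # positions relative to the training data
print(test_own)   # [[-1, -1], [1, 1]]: the original positions are lost
```

Only test_ok can be compared with the scaled training samples, because both were shifted and stretched by the same amounts.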

 

Shall we check with a scatter plot?

new = ([25, 150] - mean) / std # convert the bream sample to standard scores

plt.scatter(train_scaled[:,0], train_scaled[:,1])
plt.scatter(new[0],new[1], marker="^")
plt.xlabel("length")
plt.ylabel("weight")
plt.show()

Result

We have confirmed that the two features of the training data now cover similar ranges, so let's train!

📌 Note that the test set must also be converted to standard scores using the mean and standard deviation of the training set!!

🤔 Why? Otherwise the test data would not end up on the same scale as the training data.

# Train on the preprocessed data
kn.fit(train_scaled, train_target)

# Convert the test set to standard scores using the training set's mean and standard deviation
test_scaled = (test_input - mean) / std

# Evaluate
kn.score(test_scaled, test_target)
# 1.0

The accuracy is 1.0~~!!

 

Let's make the prediction too!

# Predict
print(kn.predict([new]))

Bream~~!!

Let's check with a scatter plot as well!

We can see that all the neighbor samples are now bream.
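The neighbor check can also be done without a plot. The sketch below repeats the steps from this post end to end and prints the labels of the five nearest neighbors in the scaled space. The name `neighbor_targets` is my own, and I have left out expected output on purpose: the post reports that all five are bream, but the exact rows depend on the split.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

fish_length = [25.4, 26.3, 26.5, 29.0, 29.0, 29.7, 29.7, 30.0, 30.0, 30.7, 31.0, 31.0,
               31.5, 32.0, 32.0, 32.0, 33.0, 33.0, 33.5, 33.5, 34.0, 34.0, 34.5, 35.0,
               35.0, 35.0, 35.0, 36.0, 36.0, 37.0, 38.5, 38.5, 39.5, 41.0, 41.0, 9.8,
               10.5, 10.6, 11.0, 11.2, 11.3, 11.8, 11.8, 12.0, 12.2, 12.4, 13.0, 14.3, 15.0]
fish_weight = [242.0, 290.0, 340.0, 363.0, 430.0, 450.0, 500.0, 390.0, 450.0, 500.0, 475.0, 500.0,
               500.0, 340.0, 600.0, 600.0, 700.0, 700.0, 610.0, 650.0, 575.0, 685.0, 620.0, 680.0,
               700.0, 725.0, 720.0, 714.0, 850.0, 1000.0, 920.0, 955.0, 925.0, 975.0, 950.0, 6.7,
               7.5, 7.0, 9.7, 9.8, 8.7, 10.0, 9.9, 9.8, 12.2, 13.4, 12.2, 19.7, 19.9]

fish_data = np.column_stack((fish_length, fish_weight))
fish_target = np.concatenate((np.ones(35), np.zeros(14)))

train_input, test_input, train_target, test_target = train_test_split(
    fish_data, fish_target, stratify=fish_target, random_state=42
)

# Standardize with the training set's statistics
mean = np.mean(train_input, axis=0)
std = np.std(train_input, axis=0)
train_scaled = (train_input - mean) / std
new = ([25, 150] - mean) / std

kn = KNeighborsClassifier()
kn.fit(train_scaled, train_target)

# Look up the 5 nearest neighbors of the scaled sample
distances, indexes = kn.kneighbors([new])
neighbor_targets = train_target[indexes[0]]   # 1 = bream, 0 = smelt

print(distances)
print(neighbor_targets)
```

To draw the plot described above, the same indexes can be passed to plt.scatter(train_scaled[indexes, 0], train_scaled[indexes, 1], marker='D'), in the same way as in the earlier unscaled plot.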