
[Deep Learning] Drum Sound Classification with a CNN in TensorFlow

Jaeseo Kim 2023. 11. 24. 18:58

🏃‍♂️ This post uses TensorFlow, so a working TensorFlow environment must be set up first.

If you need help, search for "Anaconda3 + tensorflow" and follow one of the blog posts that come up!


00. Goal

We build a model that, given a drum sound file, identifies which drum was hit.

input : 

Tom Sample 30.wav
0.51MB

 

output : Tom


01. Background

Basic concepts about the characteristics of sound

📌 Spectrum
Describes a sound signal in terms of frequency and amplitude
Obtained by applying the Fourier transform, which converts a time-domain signal into the frequency domain
Time-domain and frequency-domain visualization (X axis: frequency, Y axis: amplitude) https://ratsgo.github.io/speechbook/docs/fe/ft
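To make the time-to-frequency conversion concrete, here is a minimal NumPy sketch. The 440 Hz test tone, the 8000 Hz sample rate, and the helper name `dominant_frequency` are illustrative assumptions, not part of the drum dataset.

```python
import numpy as np

def dominant_frequency(signal, sample_rate):
    """Return the frequency (Hz) whose amplitude is largest in the spectrum."""
    spectrum = np.abs(np.fft.rfft(signal))                      # amplitude per frequency bin
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)   # bin centers in Hz
    return freqs[np.argmax(spectrum)]

# Hypothetical input: one second of a 440 Hz sine wave sampled at 8000 Hz
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)

peak_hz = dominant_frequency(tone, sample_rate)
print(peak_hz)
```

The time-domain array `tone` only shows a wave wiggling over time, while the spectrum makes the single 440 Hz component stand out as a peak.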

 

📌 Mel Spectrogram
A spectrogram with the Mel scale applied, which reflects how human hearing perceives pitch
- Humans generally distinguish low frequencies better than high ones
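As a rough illustration of why the Mel scale favors low frequencies, the sketch below uses the HTK-style conversion formula. This is one common convention and an assumption on my part; libraries may use a different variant by default.

```python
import math

def hz_to_mel(freq_hz):
    """Convert Hz to mel with the HTK-style formula (one common convention)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

# The same 100 Hz step spans many more mels at low frequencies than at high ones
low_step = hz_to_mel(200) - hz_to_mel(100)
high_step = hz_to_mel(8100) - hz_to_mel(8000)
print(low_step, high_step)
```

A 100 Hz difference near 100 Hz is spread over a much larger mel interval than the same difference near 8000 Hz, which mirrors the finer resolution of hearing at low frequencies.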

📌 MFCC (Mel-Frequency Cepstral Coefficients)
Features that can be extracted from an audio signal; numeric values that capture the distinctive character of a sound
Generate spectrogram ➡️ apply Mel scale ➡️ Mel spectrogram ➡️ cepstral analysis ➡️ extract MFCC features
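The last two steps of that pipeline (cepstral analysis, then keeping the first coefficients) can be sketched in plain NumPy. This is a simplified illustration, not librosa's implementation: it uses an unnormalized DCT-II and a natural log, and the random "mel spectrogram" is a hypothetical stand-in for real data.

```python
import numpy as np

def dct_ii(x):
    """Unnormalized DCT-II along axis 0 of a (bands, frames) array."""
    n = x.shape[0]
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    return basis @ x

def cepstral_coefficients(mel_power, n_coeff):
    """Log-compress a mel spectrogram, apply the DCT, keep the first n_coeff rows."""
    log_mel = np.log(mel_power + 1e-10)   # small offset avoids log(0)
    return dct_ii(log_mel)[:n_coeff]

# Hypothetical mel spectrogram: 128 mel bands x 50 time frames
rng = np.random.default_rng(0)
mel_power = rng.random((128, 50))

coeffs = cepstral_coefficients(mel_power, n_coeff=40)
print(coeffs.shape)  # (40, 50)
```

Keeping 40 coefficients per frame is what gives the feature matrix its 40 rows later in this post.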

 


 

02. Extracting Audio Features

librosa is a useful Python library for analyzing audio features.

librosa documentation (version 0.10.1): librosa.org

 

 

We extract MFCCs with librosa, then use numpy to adjust the width of the feature array.

✨ The data we extract has shape (40, 174), and this shape can be changed freely.

You can keep the features 2-D as in the code, or reshape them and flatten them to 1-D.

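import librosa
import numpy as np

root = "path/to/wav/folder/"   # placeholder: folder that contains the wav file
test = root + "file_name.wav"  # placeholder: name of the wav file

max_pad_len = 174

def extract_feature(file_name):
    try:
        # Note: res_type='kaiser_fast' relies on the resampy package being installed
        audio, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
        mfccs = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)
        # Truncate clips longer than max_pad_len frames so pad_width is never negative
        mfccs = mfccs[:, :max_pad_len]
        pad_width = max_pad_len - mfccs.shape[1]
        mfccs = np.pad(mfccs, pad_width=((0, 0), (0, pad_width)), mode='constant')
        print(mfccs.shape)

    except Exception as e:
        print("Error encountered while parsing file: ", file_name)
        print(e)
        return None
    return mfccs

extract_feature(test)  # check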

Checking the feature extraction

Test a single wav file and confirm that the shape comes out as (40, 174)!
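The width adjustment and the 1-D alternative mentioned earlier can be tried without any audio files. In this sketch the helper name `fix_width` and the dummy arrays are hypothetical; only the target shape (40, 174) comes from this post.

```python
import numpy as np

def fix_width(features, width):
    """Truncate or zero-pad a (n_mfcc, frames) array to exactly `width` frames."""
    features = features[:, :width]
    pad = width - features.shape[1]
    return np.pad(features, ((0, 0), (0, pad)), mode="constant")

short_clip = np.ones((40, 100))   # hypothetical clip with 100 frames
long_clip = np.ones((40, 300))    # hypothetical clip with 300 frames

padded = fix_width(short_clip, 174)
cut = fix_width(long_clip, 174)
flat = padded.reshape(-1)         # 1-D alternative: 40 * 174 = 6960 values
print(padded.shape, cut.shape, flat.shape)
```

Every clip ends up with the same shape regardless of its length, which is what lets the clips be stacked into a single array for training.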

 

 

Now let's extract features from multiple wav files.

First, I prepared 40 samples each of Overhead, Snare, Tom, and Bass.

data is the MFCC feature extracted with librosa
class_label is the drum type (Overhead:1, Snare:2, Tom:3, Bass:4, assigned arbitrarily)
➡️ In other words, the model has 4 classes to distinguish
import os
import pandas as pd

root_path = "path/to/wav/folder"  # placeholder: folder that contains the wav files
wav_list = os.listdir(root_path)
wav_files = [os.path.join(root_path, file) for file in wav_list if file.endswith('.wav')]
print(len(wav_files))

features = []
for wav_file in wav_files:
    # data is the MFCC feature extracted with librosa,
    # class_label is the type of drum.
    data = extract_feature(wav_file)
    file_name = os.path.basename(wav_file)  # match on the file name, not the folder path
    if 'Overhead' in file_name:
        class_label = 1
    elif 'Snare' in file_name:
        class_label = 2
    elif 'Tom' in file_name:
        class_label = 3
    elif 'Bass' in file_name:
        class_label = 4
    else:
        class_label = 0
    # Skip files that failed to parse or have no known label,
    # so the dataset keeps exactly the 4 classes the model expects
    if data is None or class_label == 0:
        continue
    features.append([data, class_label])

# Convert into a pandas DataFrame
featuresdf = pd.DataFrame(features, columns=['feature','class_label'])

 

 

The features are now stored in featuresdf as a pandas DataFrame!
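The labeling rule can also be written as a small lookup table, which makes it easy to count how many files fall into each class before training. The helper `label_from_path` and the example paths are hypothetical; the label ids are the ones chosen in this post.

```python
from collections import Counter
import os

# Label ids as chosen in this post
LABELS = {"Overhead": 1, "Snare": 2, "Tom": 3, "Bass": 4}

def label_from_path(path):
    """Return the class id for a wav path, or None if no drum name matches."""
    name = os.path.basename(path)
    for drum, label in LABELS.items():
        if drum in name:
            return label
    return None

# Hypothetical file names, for illustration only
paths = [
    "data/Tom Sample 30.wav",
    "data/Snare Sample 01.wav",
    "data/Bass Sample 07.wav",
    "data/notes.wav",
]
counts = Counter(label_from_path(p) for p in paths)
print(counts)
```

A `None` entry in the counts is a quick hint that some files do not follow the naming scheme.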

 


 

03. Creating the Training (Train) and Validation (Test) Datasets

The feature column of featuresdf is stored as X, and class_label as y.

✨ Here, y has to be one-hot encoded.

📌 one-hot encoding
For example, the natural numbers 1, 2, 3 are converted as 1:[1,0,0] / 2:[0,1,0] / 3:[0,0,1]
We convert the labels this way because the model in this post performs multi-class classification (3 or more classes)
➡️ A basic way of turning data that is easy for people to read into data that is easy for a computer to work with
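import numpy as np
from sklearn.preprocessing import LabelEncoder
from keras.utils import to_categorical

X = np.array(featuresdf.feature.tolist())
y = np.array(featuresdf.class_label.tolist())

# LabelEncoder maps the labels 1..4 to 0..3 before one-hot encoding
le = LabelEncoder()
yy = to_categorical(le.fit_transform(y))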
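To see what this conversion produces, the same idea can be reproduced with plain NumPy. This is a sketch of the technique rather than the Keras/scikit-learn code path; the helper `one_hot` and the demo labels are assumptions for illustration.

```python
import numpy as np

def one_hot(labels):
    """One-hot encode integer labels; columns follow the sorted unique labels."""
    classes, indices = np.unique(labels, return_inverse=True)
    encoded = np.eye(len(classes), dtype=np.float32)[indices]
    return classes, encoded

y_demo = np.array([3, 1, 4, 2, 3])   # Tom, Overhead, Bass, Snare, Tom
classes, yy_demo = one_hot(y_demo)
print(classes)      # [1 2 3 4]
print(yy_demo[0])   # [0. 0. 1. 0.]
```

Because the labels are sorted first, column 0 corresponds to Overhead (label 1) and column 3 to Bass (label 4). The same column order applies to the model's output later on.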

 

 

The dataset is split into training and validation sets at a ratio of 8:2.

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(X, yy, test_size=0.2, random_state = 42)

# Check
print(x_train.shape)
print(x_test.shape)
print(y[:10])
print(yy[:10])
print(y_test[:10])
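For intuition about what an 8:2 split does, here is a bare-bones NumPy version: shuffle the indices, then cut. It is a simplified sketch and will not reproduce the exact rows that `train_test_split` selects; the sample count of 160 assumes all 4 x 40 files were loaded successfully.

```python
import numpy as np

def split_indices(n_samples, test_size, seed):
    """Shuffle sample indices and split them into train and test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_size))
    return order[n_test:], order[:n_test]

train_idx, test_idx = split_indices(160, test_size=0.2, seed=42)
print(len(train_idx), len(test_idx))  # 128 32
```

With only 32 validation clips, the split is not guaranteed to contain the four drum types in equal numbers.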

Checking the dataset

 

 

Finally, reshape the input x data of both the training and validation datasets into the form the deep learning model expects.

import tensorflow as tf

n_columns = 174
n_row = 40
n_channels = 1
n_classes = 4

# Adjust the input shape
# Run on the CPU
with tf.device('/cpu:0'):
    x_train = tf.reshape(x_train, [-1, n_row, n_columns, n_channels])
    x_test = tf.reshape(x_test, [-1, n_row, n_columns, n_channels])
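The reshape only adds a trailing channel axis, so each MFCC matrix is treated like a single-channel image. A NumPy sketch with a dummy batch (the batch size of 128 is an assumption) shows that the values themselves are untouched:

```python
import numpy as np

batch = np.arange(128 * 40 * 174, dtype=np.float32).reshape(128, 40, 174)  # dummy MFCC batch
with_channel = batch.reshape(-1, 40, 174, 1)   # -1 lets NumPy infer the batch size

print(with_channel.shape)  # (128, 40, 174, 1)
```

The `-1` in the first position means the same code works for both the training and validation arrays, whatever their sample counts are.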

 


 

04. Building the Deep Learning Model

We use a deep learning model called a CNN.

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential()

model.add(layers.Conv2D(input_shape=(n_row, n_columns, n_channels), filters=16, kernel_size=2, activation='relu'))
model.add(layers.MaxPooling2D(pool_size=2))
model.add(layers.Dropout(0.2))
model.add(layers.Conv2D(kernel_size=2, filters=32, activation='relu'))
model.add(layers.MaxPooling2D(pool_size=2))
model.add(layers.Dropout(0.2))

model.add(layers.Conv2D(kernel_size=2, filters=64, activation='relu'))
model.add(layers.MaxPooling2D(pool_size=2))
model.add(layers.Dropout(0.2))
model.add(layers.Conv2D(kernel_size=2, filters=128, activation='relu'))
model.add(layers.MaxPooling2D(pool_size=2))
model.add(layers.Dropout(0.2))

model.add(layers.GlobalAveragePooling2D())
model.add(layers.Dense(units=n_classes, activation='softmax'))

model.summary() # check the model
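The shapes and parameter count that the summary reports can be worked out by hand. The sketch below assumes the Keras defaults used in the model above (no padding, stride 1 for the convolutions, pooling stride equal to the pool size); the helper names are my own.

```python
def trace_feature_map(rows, cols, n_blocks=4, kernel=2, pool=2):
    """Follow (rows, cols) through Conv2D (no padding) + MaxPooling2D blocks."""
    shapes = []
    for _ in range(n_blocks):
        rows, cols = rows - kernel + 1, cols - kernel + 1   # convolution without padding
        rows, cols = rows // pool, cols // pool             # pooling floors odd sizes
        shapes.append((rows, cols))
    return shapes

def count_parameters(channels, kernel=2, n_classes=4):
    """Weights plus biases for the conv stack and the final Dense layer."""
    conv = sum((kernel * kernel * c_in + 1) * c_out
               for c_in, c_out in zip(channels[:-1], channels[1:]))
    dense = channels[-1] * n_classes + n_classes
    return conv + dense

shapes = trace_feature_map(40, 174)
total = count_parameters([1, 16, 32, 64, 128])
print(shapes)  # [(19, 86), (9, 42), (4, 20), (1, 9)]
print(total)   # 43828
```

After the fourth block only a 1 x 9 map with 128 channels remains, which the global average pooling layer reduces to a 128-value vector for the final 4-way softmax.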

Checking the model

 


 

05. Training

Training...

training_epochs = 72
num_batch_size = 128

learning_rate = 0.001
opt = keras.optimizers.Adam(learning_rate=learning_rate)

model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])

history = model.fit(x_train, y_train, batch_size=num_batch_size, epochs=training_epochs) # train
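For a feel of what the `categorical_crossentropy` loss measures, here is a small NumPy sketch of softmax followed by cross-entropy. It illustrates the formula only and is not Keras's implementation; the scores in `logits` are made up.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 per row."""
    shifted = logits - logits.max(axis=-1, keepdims=True)   # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-probability assigned to the true class."""
    y_prob = np.clip(y_prob, eps, 1.0)
    return float(-np.sum(y_true * np.log(y_prob), axis=-1).mean())

logits = np.array([[2.0, 0.5, 0.1, -1.0]])   # hypothetical scores for one clip
y_true = np.array([[1.0, 0.0, 0.0, 0.0]])    # true class: Overhead

probs = softmax(logits)
loss = categorical_crossentropy(y_true, probs)
print(probs, loss)
```

The loss shrinks toward 0 as the probability of the true class approaches 1, and a model that guesses uniformly over 4 classes scores ln(4), roughly 1.386.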

 

Training in progress

 

After training, let's plot the returned history.

It lets us check metrics such as accuracy and loss.

import matplotlib.pyplot as plt

def vis(history, key):
    y = list(history.history[key])
    x = np.arange(len(y))  # one point per recorded epoch
    plt.plot(x, y)
    plt.title(key)

def plot_history(history):
    # Strip the "val_" prefix and sort so the order is stable: accuracy, then loss
    key_value = sorted(set(i.split("val_")[-1] for i in history.history.keys()))
    plt.figure(figsize=(12, 4))
    for idx, key in enumerate(key_value):
        plt.subplot(1, len(key_value), idx+1)
        vis(history, key)
    plt.tight_layout()
    plt.show()

plot_history(history)

history (x: epochs, y: accuracy)

Looks great.

 

 


 

06. Model Validation

print('\n# Evaluate on test data')

results = model.evaluate(x_test, y_test, batch_size=128)
print('test loss, test acc:', results)

 

Validation accuracy: 80%!

The dataset is small, only a handful of samples, so this is not a particularly reliable model, though...
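The accuracy figure is simply the fraction of clips whose highest-probability class matches the true class. A sketch with five made-up predictions (not the actual results from this model):

```python
import numpy as np

def accuracy(y_true_onehot, y_prob):
    """Fraction of rows where the predicted class equals the true class."""
    true_cls = np.argmax(y_true_onehot, axis=1)
    pred_cls = np.argmax(y_prob, axis=1)
    return float(np.mean(true_cls == pred_cls))

# Hypothetical one-hot labels and predicted probabilities for 5 clips
y_true = np.eye(4)[[0, 1, 2, 3, 2]]
y_prob = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.6, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.1, 0.7],
    [0.5, 0.2, 0.2, 0.1],   # wrong: true class is 2, predicted 0
])
print(accuracy(y_true, y_prob))  # 0.8
```

With a validation set of about 32 clips (20% of 160, assuming every file loaded), a single misclassified clip shifts the accuracy by roughly 3 percentage points, which is one reason to read the number cautiously.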


07. Prediction

root = "path/to/wav/folder/"         # placeholder: folder that contains the wav file to predict
test = root + "file_to_predict.wav"  # placeholder: name of the wav file to predict

n_columns = 174
n_row = 40
n_channels = 1

# Adjust the input shape
# Run on the CPU
test = np.array(extract_feature(test))
with tf.device('/cpu:0'):
    test = tf.reshape(test, [-1, n_row, n_columns, n_channels])

# Output columns: Overhead (label 1), Snare (label 2), Tom (label 3), Bass (label 4)
model.predict(test, batch_size=128)

📌 array([[probability of Overhead, probability of Snare, probability of Tom, probability of Bass]], dtype=float32)

When given an Overhead.wav file, the model predicted Overhead with the highest probability!
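To turn that probability array into a drum name, take the index of the largest value and look it up in the class order. The helper `decode_prediction` and the example probabilities are hypothetical; the class order follows the labels 1 to 4 used in this post.

```python
import numpy as np

CLASS_NAMES = ["Overhead", "Snare", "Tom", "Bass"]   # column order of the model output

def decode_prediction(probabilities):
    """Return (class name, probability) for a (1, 4) prediction array."""
    idx = int(np.argmax(probabilities, axis=-1)[0])
    return CLASS_NAMES[idx], float(probabilities[0, idx])

# Hypothetical model output for a single clip
probs = np.array([[0.05, 0.10, 0.80, 0.05]], dtype=np.float32)
name, confidence = decode_prediction(probs)
print(name)  # Tom
```

In the real pipeline, `probs` would be the array returned by `model.predict(test, batch_size=128)`.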

 

Thank you.