[All About Computer Vision] Annotation Efficient Learning

2024. 12. 20. 02:03ㆍMOOC

Leveraging Pre-trained Information

Transfer Learning

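Definition: Transfer Learning transfers knowledge learned on one dataset to another dataset, so that high performance can be reached even with little data. It is used to overcome the time and cost of data labeling, and the label quality problems, that come with a new task.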

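First approach: remove the Fully Connected (FC) Layer from the pre-trained model, add a new FC Layer, freeze the existing Convolution Layers, and train only the new layer. Because the existing Convolution Layers are not updated, only a small number of parameters has to be learned, so this works well even on a small dataset.
Figure: the FC Layer of the existing model is replaced and only the new FC Layer is trained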

import tensorflow as tf
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical

# Load the pre-trained model (ResNet50 trained on ImageNet) without its original FC head
base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Freeze the existing Convolution Layers
base_model.trainable = False

# Add the new Fully Connected Layers
global_avg_pool = GlobalAveragePooling2D()(base_model.output)
new_fc_layer = Dense(128, activation='relu')(global_avg_pool)
out_layer = Dense(10, activation='softmax')(new_fc_layer)  # e.g. 10-class classification

# Define the new model
model = Model(inputs=base_model.input, outputs=out_layer)

# Compile the model
model.compile(optimizer=Adam(learning_rate=0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Prepare the data (CIFAR-10 as an example dataset)
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

def prepare(images, labels):
    # Resize batch by batch: resizing all 50,000 images up front would need
    # roughly 30 GB of memory as float32
    images = tf.image.resize(tf.cast(images, tf.float32), (224, 224))
    # The ImageNet weights of ResNet50 expect preprocess_input, not division by 255
    return preprocess_input(images), labels

train_ds = (tf.data.Dataset.from_tensor_slices((x_train, y_train))
            .shuffle(10000).batch(32).map(prepare).prefetch(tf.data.AUTOTUNE))
test_ds = (tf.data.Dataset.from_tensor_slices((x_test, y_test))
           .batch(32).map(prepare).prefetch(tf.data.AUTOTUNE))

# Train the model (only the new FC Layers are updated)
model.fit(train_ds, validation_data=test_ds, epochs=10)

# Evaluate the model
loss, accuracy = model.evaluate(test_ds)
print(f"Test Loss: {loss}, Test Accuracy: {accuracy}")

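Second approach: fine-tune the whole model. Both the existing Convolution Layers and the FC Layer are trained. The learning rate of the Convolution Layers is set lower than that of the FC Layer, so that the pre-trained weights are largely preserved while the model adapts to the new data.
Figure: the whole model is fine-tuned, with separate learning rates for the Convolution Layers and the FC Layer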

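import tensorflow as tf
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# Load the pre-trained model (ImageNet weights)
base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Add the new layers (kept as objects so their variables can be collected below)
fc_layer = Dense(1024, activation='relu')
out_layer = Dense(10, activation='softmax')
x = GlobalAveragePooling2D()(base_model.output)
predictions = out_layer(fc_layer(x))

# Create the new model
model = Model(inputs=base_model.input, outputs=predictions)

# Make every layer trainable
base_model.trainable = True

# A single compiled optimizer applies one learning rate to all layers, so two
# optimizers are used: a lower rate for the Convolution Layers, a higher one for the FC Layers
conv_vars = base_model.trainable_variables
fc_vars = fc_layer.trainable_variables + out_layer.trainable_variables
fc_optimizer = Adam(learning_rate=1e-4)
conv_optimizer = Adam(learning_rate=1e-5)
loss_fn = tf.keras.losses.CategoricalCrossentropy()

# Compiled only so that model.evaluate can report loss and accuracy
model.compile(loss=loss_fn, metrics=['accuracy'])

# Prepare the dataset (CIFAR-10 example)
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)

def prepare(images, labels):
    # Resize per batch to keep memory bounded, then apply ResNet50 preprocessing
    images = tf.image.resize(tf.cast(images, tf.float32), (224, 224))
    return preprocess_input(images), labels

train_ds = (tf.data.Dataset.from_tensor_slices((x_train, y_train))
            .shuffle(10000).batch(32).map(prepare).prefetch(tf.data.AUTOTUNE))
test_ds = (tf.data.Dataset.from_tensor_slices((x_test, y_test))
           .batch(32).map(prepare).prefetch(tf.data.AUTOTUNE))

@tf.function
def train_step(images, labels):
    with tf.GradientTape() as tape:
        loss = loss_fn(labels, model(images, training=True))
    conv_grads, fc_grads = tape.gradient(loss, [conv_vars, fc_vars])
    conv_optimizer.apply_gradients(zip(conv_grads, conv_vars))
    fc_optimizer.apply_gradients(zip(fc_grads, fc_vars))
    return loss

# Train the model
for epoch in range(10):
    for images, labels in train_ds:
        loss = train_step(images, labels)
    val_loss, val_accuracy = model.evaluate(test_ds, verbose=0)
    print(f"Epoch {epoch + 1}: val_loss={val_loss:.4f}, val_accuracy={val_accuracy:.4f}")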

Knowledge Distillation

Definition: Knowledge Distillation trains a Student model by passing on the knowledge of an already trained Teacher model. It is used to distill the knowledge of a complex Teacher model into a simpler Student model, or for pseudo-labeling.

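Process:
1. Pre-train the Teacher model.
2. Feed the same input data to the Teacher model and the Student model to produce an output from each.
3. Compute the KL divergence loss between the Teacher and Student outputs, and train the Student model so that it imitates the Teacher model's behavior.

Figure: basic structure in which the Teacher model passes its knowledge to the Student model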

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam

# Define the Teacher model (complex model); the last layer returns raw logits
# so that the temperature can be applied before the softmax
teacher_model = Sequential([
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(10)
])

# Compile the Teacher model (from_logits=True because there is no softmax layer)
hard_loss_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
teacher_model.compile(optimizer='adam', loss=hard_loss_fn, metrics=['accuracy'])

# Load and preprocess the MNIST data
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = np.expand_dims(x_train / 255.0, -1).astype('float32')  # add channel axis
x_test = np.expand_dims(x_test / 255.0, -1).astype('float32')
y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)

# Train the Teacher model
teacher_model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=5, batch_size=64)

# Generate the Teacher model's soft labels
temperature = 5.0  # Softmax temperature
teacher_logits = teacher_model.predict(x_train)
soft_labels = tf.nn.softmax(teacher_logits / temperature).numpy()

# Define the Student model (simple model), also returning logits
student_model = Sequential([
    Flatten(input_shape=(28, 28, 1)),
    Dense(64, activation='relu'),
    Dense(10)
])

# Define the Knowledge Distillation loss
kl_loss_fn = tf.keras.losses.KLDivergence()

def distillation_loss(y_true, student_logits, soft_targets, temperature):
    # Student loss: cross-entropy between the ground truth and the Student output
    student_loss = hard_loss_fn(y_true, student_logits)
    # Distillation loss: KL divergence between the softened Teacher and Student outputs
    # (soft_targets are already softmax probabilities, so they are not softmaxed again)
    soft_loss = kl_loss_fn(soft_targets, tf.nn.softmax(student_logits / temperature))
    # T^2 keeps the soft-loss gradients on a scale comparable to the hard-loss gradients
    return student_loss + temperature**2 * soft_loss

# Train the Student model
optimizer = Adam(learning_rate=0.001)
for epoch in range(5):
    for i in range(0, len(x_train), 64):
        x_batch = x_train[i:i+64]
        y_batch = y_train[i:i+64]
        soft_batch = soft_labels[i:i+64]
        with tf.GradientTape() as tape:
            student_logits = student_model(x_batch, training=True)
            loss = distillation_loss(y_batch, student_logits, soft_batch, temperature)
        grads = tape.gradient(loss, student_model.trainable_variables)
        optimizer.apply_gradients(zip(grads, student_model.trainable_variables))

# Evaluate the Student model
student_model.compile(optimizer='adam', loss=hard_loss_fn, metrics=['accuracy'])
student_model.evaluate(x_test, y_test)

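Loss computation:
- Distillation Loss: uses the Teacher model's softmax output (softened by the temperature) to push the Student model toward imitating the Teacher's behavior.
- Student Loss: cross-entropy loss computed from the ground truth and the Student model's output.

Figure: overall training structure using the Distillation Loss and the Student Loss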
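To make the two loss terms concrete, here is a minimal NumPy sketch of how they might be combined. The weighting factor `alpha`, the helper names, and the example logits are illustrative assumptions, not values from the course; the course material only states that both losses are used.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    scaled = logits / temperature
    scaled = scaled - scaled.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum(axis=-1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q), averaged over the batch."""
    return np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1))

def cross_entropy(one_hot, probs, eps=1e-12):
    """Cross-entropy between one-hot targets and predicted probabilities."""
    return -np.mean(np.sum(one_hot * np.log(probs + eps), axis=-1))

def total_loss(y_true, student_logits, teacher_logits, temperature=5.0, alpha=0.5):
    # Student Loss: ground truth vs. Student prediction at temperature 1
    student_loss = cross_entropy(y_true, softmax(student_logits))
    # Distillation Loss: softened Teacher vs. softened Student, scaled by T^2
    distill_loss = kl_divergence(softmax(teacher_logits, temperature),
                                 softmax(student_logits, temperature))
    # alpha is a hypothetical weighting between the two terms
    return alpha * student_loss + (1.0 - alpha) * temperature**2 * distill_loss

# Hypothetical logits for a single 3-class example
teacher_logits = np.array([[8.0, 2.0, 0.5]])
student_logits = np.array([[3.0, 2.5, 0.1]])
y_true = np.array([[1.0, 0.0, 0.0]])

print("Teacher probs, T=1:", softmax(teacher_logits, 1.0).round(3))
print("Teacher probs, T=5:", softmax(teacher_logits, 5.0).round(3))
print("Total loss:", total_loss(y_true, student_logits, teacher_logits))
```

Raising the temperature flattens the Teacher distribution, so the relative probabilities of the non-target classes become visible to the Student instead of being rounded away to nearly zero.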

Leveraging Unlabeled Dataset for Training

Definition: when labeled data is limited, unlabeled data is used to improve performance. Representative approaches are Semi-supervised Learning and Self-training.

Semi-supervised Learning

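Definition: Semi-supervised Learning trains on a small labeled set together with unlabeled data. In the pseudo-labeling form described here, a model trained on the labeled data generates pseudo-labels for the unlabeled data, and these are then used for training to address the lack of data.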
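The pseudo-labeling idea can be sketched on toy data as follows. This is only an illustration under stated assumptions: the nearest-centroid classifier stands in for a real network, and the two-blob data, the helper names, and the 0.95 confidence threshold are hypothetical choices that do not come from the course.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two Gaussian blobs; only 5 points per class carry labels
centers = np.array([[-2.0, 0.0], [2.0, 0.0]])
x_labeled = np.concatenate([rng.normal(c, 1.0, size=(5, 2)) for c in centers])
y_labeled = np.repeat([0, 1], 5)
x_unlabeled = np.concatenate([rng.normal(c, 1.0, size=(200, 2)) for c in centers])

def fit_centroids(x, y, num_classes=2):
    """'Training' here is just computing one centroid per class."""
    return np.stack([x[y == k].mean(axis=0) for k in range(num_classes)])

def predict_proba(centroids, x):
    """Softmax over negative squared distances to each centroid."""
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2
    logits = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

# 1) Train on the labeled data only
centroids = fit_centroids(x_labeled, y_labeled)

# 2) Generate pseudo-labels for the unlabeled data
probs = predict_proba(centroids, x_unlabeled)
pseudo_labels = probs.argmax(axis=1)
confident = probs.max(axis=1) >= 0.95  # assumed threshold: keep confident predictions only

# 3) Retrain on labeled + confidently pseudo-labeled data
x_all = np.concatenate([x_labeled, x_unlabeled[confident]])
y_all = np.concatenate([y_labeled, pseudo_labels[confident]])
centroids_new = fit_centroids(x_all, y_all)

print("Pseudo-labeled samples kept:", int(confident.sum()), "of", len(x_unlabeled))
print("Centroids before:", centroids.round(2).tolist())
print("Centroids after: ", centroids_new.round(2).tolist())
```

Filtering by confidence is one common way to limit the damage from wrong pseudo-labels; whether and how to filter is a design choice.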

Self-training

Definition: Self-training trains a Teacher model and then trains a Student model on the data produced by pseudo-labeling. RandAugment is applied in this process to augment the data.

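Process:
1. Train the Teacher model.
2. Use the Teacher model to generate pseudo-labels for the unlabeled data.
3. Combine the pseudo-labeled data with the labeled data, then augment it with RandAugment.
4. Train the Student model.
5. Replace the Teacher model with the Student model and repeat.

Figure: the first iteration of Self-training
Figure: structure in which the model is updated iteratively in Self-training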
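The five steps above can be sketched as a loop on toy data. This is a rough illustration, not the course's implementation: a nearest-centroid classifier stands in for the Teacher and Student networks, and Gaussian jitter stands in for RandAugment (which operates on images). All helper names, the noise level, and the number of iterations are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_blobs(n_per_class):
    """Two Gaussian blobs as a toy 2-class dataset."""
    centers = np.array([[-2.0, 0.0], [2.0, 0.0]])
    x = np.concatenate([rng.normal(c, 1.0, size=(n_per_class, 2)) for c in centers])
    y = np.repeat([0, 1], n_per_class)
    return x, y

def train(x, y):
    """The 'model' is one centroid per class (stand-in for a neural network)."""
    return np.stack([x[y == k].mean(axis=0) for k in range(2)])

def predict(model, x):
    """Assign each point to the class of its nearest centroid."""
    d2 = ((x[:, None, :] - model[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def augment(x, noise_std=0.3):
    """Gaussian jitter as a stand-in for RandAugment on images."""
    return x + rng.normal(0.0, noise_std, size=x.shape)

x_labeled, y_labeled = make_blobs(5)
x_unlabeled, _ = make_blobs(200)   # labels of this set are discarded
x_test, y_test = make_blobs(500)

teacher = train(x_labeled, y_labeled)                     # step 1
for iteration in range(3):
    pseudo_labels = predict(teacher, x_unlabeled)         # step 2
    x_all = np.concatenate([x_labeled, x_unlabeled])      # step 3: combine ...
    y_all = np.concatenate([y_labeled, pseudo_labels])
    student = train(augment(x_all), y_all)                # ... augment, step 4: train
    accuracy = (predict(student, x_test) == y_test).mean()
    print(f"iteration {iteration}: student test accuracy = {accuracy:.3f}")
    teacher = student                                     # step 5
```

In each iteration the new Teacher is the previous Student, so the pseudo-labels are regenerated with a model that has already seen the unlabeled data once.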
