[Project] Building a Segmentation Model with VPT

KimJake 2023. 12. 30. 15:16

The code is available in the GitHub repository below.

GitHub - KIM-JAKE/Segmentation-Using-VPT


github.com

Preliminary

Visual Prompt Tuning

The current modus operandi in adapting pre-trained models involves updating all the backbone parameters, ie, full fine-tuning. This paper introduces Visual Prompt Tuning (VPT) as an efficient and effective alternative to full fine-tuning for large-scale Tr

arxiv.org

TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation

Medical image segmentation is an essential prerequisite for developing healthcare systems, especially for disease diagnosis and treatment planning. On various medical image segmentation tasks, the u-shaped architecture, also known as U-Net, has become the

arxiv.org

In this project I built a segmentation model by adapting the Visual Prompt Tuning (VPT) paper. The original VPT paper focuses mainly on classification tasks, so I removed the classification head and added a new head for segmentation. (The VPT paper appears to include segmentation experiments as well, but it does not describe them in detail.)
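To make the prompt mechanism concrete, here is a minimal sketch of deep prompting, where a fresh set of learnable prompt tokens is inserted at the input of every Transformer layer while the backbone stays frozen. It is an illustration under assumptions, not the code from the repository: the class name `DeepPromptedEncoder` is hypothetical, `nn.TransformerEncoderLayer` stands in for the pre-trained ViT blocks, and the prompt initialization is a placeholder.

```python
import torch
import torch.nn as nn


class DeepPromptedEncoder(nn.Module):
    """Hypothetical sketch of VPT-Deep: frozen blocks, learnable prompts per layer."""

    def __init__(self, blocks: nn.ModuleList, embed_dim: int = 768, num_prompts: int = 5):
        super().__init__()
        self.blocks = blocks
        # Freeze the backbone; only the prompts below receive gradients.
        for p in self.blocks.parameters():
            p.requires_grad = False
        self.num_prompts = num_prompts
        # One set of prompt tokens per layer: (num_layers, num_prompts, embed_dim).
        self.prompts = nn.Parameter(torch.zeros(len(blocks), num_prompts, embed_dim))
        nn.init.uniform_(self.prompts, -0.1, 0.1)  # placeholder init (assumption)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, 1 + N, D) = [CLS] token followed by N embedded patch tokens.
        batch = x.shape[0]
        cls_tok, patches = x[:, :1], x[:, 1:]
        for i, blk in enumerate(self.blocks):
            prompt = self.prompts[i].unsqueeze(0).expand(batch, -1, -1)
            # Insert this layer's prompts between [CLS] and the patch tokens.
            x = blk(torch.cat([cls_tok, prompt, patches], dim=1))
            # Drop the prompt outputs; the next layer gets its own prompts.
            cls_tok = x[:, :1]
            patches = x[:, 1 + self.num_prompts:]
        return patches  # only patch tokens are needed for a segmentation head


# Tiny stand-in backbone so the example runs quickly (not a real ViT).
blocks = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model=32, nhead=4, dim_feedforward=64, batch_first=True)
    for _ in range(2)
)
encoder = DeepPromptedEncoder(blocks, embed_dim=32, num_prompts=5)
tokens = encoder(torch.randn(2, 1 + 196, 32))
print(tokens.shape)
```

Because the prompt outputs are discarded after each layer, the number of patch tokens leaving the encoder is unchanged, which is what lets a dense-prediction head be attached afterwards.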

I built the task-specific head using the upsampling CNN structure introduced in TransUNet, and used PASCAL VOC2012 as the dataset.
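A head of this kind can be sketched roughly as follows: the patch tokens are reshaped back into a 14×14 feature map and then upsampled by 2× four times to reach 224×224. This is a simplified illustration that assumes no CNN skip connections; the class name `UpsamplingHead`, the channel widths, and the use of batch normalization are my assumptions, not details confirmed by the repository. The 21 output channels correspond to the 20 VOC object classes plus background.

```python
import torch
import torch.nn as nn


class UpsamplingHead(nn.Module):
    """Hypothetical cascaded upsampler: (B, N, D) patch tokens -> (B, C, H, W) logits."""

    def __init__(self, embed_dim: int = 768, num_classes: int = 21,
                 channels: tuple = (256, 128, 64, 16)):  # widths are assumptions
        super().__init__()
        layers = []
        in_ch = embed_dim
        for out_ch in channels:
            layers += [
                nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            ]
            in_ch = out_ch
        self.decoder = nn.Sequential(*layers)
        self.classifier = nn.Conv2d(in_ch, num_classes, kernel_size=1)

    def forward(self, tokens: torch.Tensor, grid: int = 14) -> torch.Tensor:
        # tokens: (B, grid * grid, D) patch tokens without [CLS] or prompt tokens.
        batch, _, dim = tokens.shape
        x = tokens.transpose(1, 2).reshape(batch, dim, grid, grid)
        return self.classifier(self.decoder(x))  # 14 -> 28 -> 56 -> 112 -> 224


head = UpsamplingHead(embed_dim=32)  # small width so the example runs quickly
logits = head(torch.randn(1, 196, 32))
print(logits.shape)
```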

The PASCAL Visual Object Classes Challenge 2012 (VOC2012)


host.robots.ox.ac.uk

As a result, segmentation was possible with trainable parameters amounting to only about 0.78% of the parameters of the original model (ViT_base_patch16_224).
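As a rough sanity check on that figure, the prompt tokens themselves are very cheap. The sketch below counts the prompt parameters for the settings used in this post (deep prompts, 5 tokens per layer) on a ViT-B/16 with 12 layers and a hidden size of 768. The backbone size of about 86M parameters is an approximation used only for illustration, and the function name is hypothetical.

```python
def vpt_prompt_params(num_layers: int = 12, num_prompts: int = 5,
                      embed_dim: int = 768, deep: bool = True) -> int:
    """Number of learnable prompt parameters (deep: every layer, shallow: first layer only)."""
    layers_with_prompts = num_layers if deep else 1
    return layers_with_prompts * num_prompts * embed_dim


BACKBONE_PARAMS = 86_000_000  # approximate ViT-B/16 size (assumption for illustration)

prompt_params = vpt_prompt_params()
print(prompt_params)                                    # 46080
print(f"{100 * prompt_params / BACKBONE_PARAMS:.3f}%")  # 0.054%
```

Under these assumptions the prompts account for only about 0.05% of the backbone, so most of the 0.78% presumably comes from the new segmentation head. The post does not give that breakdown, so treat this as an estimate.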

• Best result: 58.7% mIoU

• ViT_base_patch16_224, deep prompts, batch size = 32, prompt tokens = 5

• Max lr = 0.001, epochs = 32, weight_decay = 0.0001

• Training: cross-entropy loss, AdamW, OneCycleLR
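The training settings listed above map onto PyTorch roughly as in the sketch below. It is a minimal illustration rather than the repository's training script: the 1×1 convolution is a stand-in for the real model, `steps_per_epoch = 10` is a made-up value, and `ignore_index=255` assumes the VOC convention of labelling void/boundary pixels as 255 (whether the project masks them this way is not stated).

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 21, kernel_size=1)  # stand-in for prompts + backbone + head
steps_per_epoch = 10                     # hypothetical; use len(train_loader) in practice

# Only parameters that require gradients (prompts and head) go to the optimizer.
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(params, lr=1e-3, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-3, epochs=32, steps_per_epoch=steps_per_epoch
)
criterion = nn.CrossEntropyLoss(ignore_index=255)  # skip void pixels (assumption)

# One training step on random data to show the call order.
images = torch.randn(2, 3, 8, 8)
masks = torch.randint(0, 21, (2, 8, 8))  # per-pixel class indices
masks[:, 0, 0] = 255                     # pretend these pixels are void

loss = criterion(model(images), masks)
optimizer.zero_grad()
loss.backward()
optimizer.step()
scheduler.step()  # OneCycleLR is stepped once per batch, not once per epoch
print(loss.item(), optimizer.param_groups[0]["lr"])
```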

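Below are example segmentation outputs for a variety of images. Although the results fell well short of SOTA, I came to understand the structure of VPT in more depth, and I learned a lot from swapping the head and preprocessing the dataset!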
