[AI] Trying Out Recognize Anything, Part 2

Continuing from the previous post, I ran one more test with the Recognize Anything model, this time with Tag2Text.

Test procedure

1. Get the inference_tag2text.py example code.

'''
 * The Tag2Text Model
 * Written by Xinyu Huang
'''
import argparse
import numpy as np
import random

import torch

from PIL import Image
from ram.models import tag2text
from ram import inference_tag2text as inference
from ram import get_transform


parser = argparse.ArgumentParser(
    description='Tag2Text inference for tagging and captioning')
parser.add_argument('--image',
                    metavar='DIR',
                    help='path to input image',
                    default='images/1641173_2291260800.jpg')
parser.add_argument('--pretrained',
                    metavar='DIR',
                    help='path to pretrained model',
                    default='pretrained/tag2text_swin_14m.pth')
parser.add_argument('--image-size',
                    default=384,
                    type=int,
                    metavar='N',
                    help='input image size (default: 384)')
parser.add_argument('--thre',
                    default=0.68,
                    type=float,
                    metavar='N',
                    help='threshold value')
parser.add_argument('--specified-tags',
                    default='None',
                    help='User input specified tags')


if __name__ == "__main__":

    args = parser.parse_args()

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    transform = get_transform(image_size=args.image_size)

    # delete some tags that may disturb captioning
    # 127: "quarter"; 2961: "back", 3351: "two"; 3265: "three"; 3338: "four"; 3355: "five"; 3359: "one"
    delete_tag_index = [127,2961, 3351, 3265, 3338, 3355, 3359]

    #######load model
    model = tag2text(pretrained=args.pretrained,
                             image_size=args.image_size,
                             vit='swin_b',
                             delete_tag_index=delete_tag_index)
    model.threshold = args.thre  # threshold for tagging
    model.eval()

    model = model.to(device)

    image = transform(Image.open(args.image)).unsqueeze(0).to(device)

    res = inference(image, model, args.specified_tags)
    print("Model Identified Tags: ", res[0])
    print("User Specified Tags: ", res[1])
    print("Image Caption: ", res[2])
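
The --thre option (0.68 by default) decides which tags the model keeps. As a rough illustration of the idea only, and not the actual Tag2Text internals, the sketch below applies a sigmoid to made-up per-tag scores and keeps the tags whose probability is above the threshold. The function name, tag names, and scores are all hypothetical.

```python
import math


def select_tags(logits, tag_names, threshold=0.68):
    """Keep tags whose sigmoid probability is strictly above the threshold."""
    kept = []
    for name, logit in zip(tag_names, logits):
        prob = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
        if prob > threshold:
            kept.append(name)
    return kept


# Hypothetical tags and raw scores, for illustration only
tag_names = ["dog", "grass", "ball", "car"]
logits = [2.0, 0.9, 0.5, -1.5]

print(select_tags(logits, tag_names))                 # default threshold
print(select_tags(logits, tag_names, threshold=0.5))  # lower threshold keeps more tags
```

Raising the threshold should give fewer but more confident tags, and lowering it should give more tags with more noise.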

2. Download the model.

https://huggingface.co/spaces/xinyu1205/recognize-anything/resolve/main/tag2text_swin_14m.pth
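
If you would rather fetch the checkpoint from a script than from the browser, something like the following might work. It is a minimal sketch using only the standard library. The helper names are my own, and it assumes the URL above is still reachable and that you want the file in the current directory, which is where the final code below looks for it.

```python
import os
import urllib.request
from urllib.parse import urlparse

CHECKPOINT_URL = (
    "https://huggingface.co/spaces/xinyu1205/recognize-anything"
    "/resolve/main/tag2text_swin_14m.pth"
)


def checkpoint_filename(url):
    """Return the last path component of the URL, used as the local file name."""
    return os.path.basename(urlparse(url).path)


def download_checkpoint(url=CHECKPOINT_URL, dest_dir="."):
    """Download the checkpoint into dest_dir unless it is already there."""
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, checkpoint_filename(url))
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)  # large file, may take a while
    return dest


# Usage (hypothetical helper; triggers the download on the first call):
# model_path = download_checkpoint()
```

The existence check means the script can be rerun without downloading the file again.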

3. Final code

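As in the first test, I downloaded the sample image from GitHub. I also replaced the command-line arguments with plain variables so the script runs as-is.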

"""
 * The Tag2Text Model
 * Written by Xinyu Huang
"""

# STEP 1 : Import modules
import torch
from PIL import Image
from ram.models import tag2text
from ram import inference_tag2text as inference
from ram import get_transform

# STEP 2 : Configuration and model setup
device = torch.device(
    "cuda" if torch.cuda.is_available() else "cpu"
)  # Check device (GPU or CPU)

# Model and image configurations
model_path = "tag2text_swin_14m.pth"  # Path to the pretrained model
image_path = "test.jpg"  # Path to the input image
image_size = 384  # Input image size
thre = 0.68  # Threshold for tagging confidence
specified_tags = ""  # User-specified tags (optional)

# Transform configuration
transform = get_transform(image_size=image_size)

# Tags to delete for cleaner captioning
# 127: "quarter"; 2961: "back", 3351: "two"; 3265: "three"; 3338: "four"; 3355: "five"; 3359: "one"
delete_tag_index = [127, 2961, 3351, 3265, 3338, 3355, 3359]

# Load the model
model = tag2text(
    pretrained=model_path,
    image_size=image_size,
    vit="swin_b",
    delete_tag_index=delete_tag_index,
)
model.threshold = thre  # Set tagging threshold
model.eval()
model = model.to(device)  # Move the model to the appropriate device (GPU or CPU)

# STEP 3 : Load and preprocess image
image = (
    transform(Image.open(image_path)).unsqueeze(0).to(device)
)  # Preprocess image and move to device

# STEP 4 : Perform inference
res = inference(image, model, specified_tags)

# STEP 5 : Post-processing and results
print("Model Identified Tags: ", res[0])  # Tags identified by the model
print("User Specified Tags: ", res[1])  # Tags specified by the user (if any)
print("Image Caption: ", res[2])  # Caption generated by the model
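
To use the result in other code rather than just printing it, the three values can be turned into a small dictionary. The sketch below assumes that the tags come back as a single string joined with " | " and that the user tags are None when none were specified. Check this against your installed version. The helper names and the example tuple are hypothetical.

```python
def parse_tags(tag_string):
    """Split 'a | b | c' into ['a', 'b', 'c']; return [] for None or empty input."""
    if not tag_string:
        return []
    return [t.strip() for t in tag_string.split("|") if t.strip()]


def summarize(res):
    """Turn the (model tags, user tags, caption) tuple into a dictionary."""
    model_tags, user_tags, caption = res
    return {
        "model_tags": parse_tags(model_tags),
        "user_tags": parse_tags(user_tags),
        "caption": caption,
    }


# Hypothetical result tuple, shaped like `res` in the script above
example = ("dog | grass | run", None, "a dog running on the grass")
print(summarize(example))
```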

There are many other models worth trying, but I have not tested these yet.

Open-source asynchronous speech recognition: WAAS, Whisper as a Service (https://github.com/schibsted/WAAS)

Open-source sentence similarity: Sentence Transformers (https://sbert.net/)
