Following up on the previous test, I ran one more test of the Recognize Anything models, this time with Tag2Text.
Test procedure
1. Get the inference_tag2text.py example code.
'''
* The Tag2Text Model
* Written by Xinyu Huang
'''
import argparse
import numpy as np
import random
import torch
from PIL import Image
from ram.models import tag2text
from ram import inference_tag2text as inference
from ram import get_transform
parser = argparse.ArgumentParser(
    description='Tag2Text inference for tagging and captioning')
parser.add_argument('--image',
                    metavar='DIR',
                    help='path to dataset',
                    default='images/1641173_2291260800.jpg')
parser.add_argument('--pretrained',
                    metavar='DIR',
                    help='path to pretrained model',
                    default='pretrained/tag2text_swin_14m.pth')
parser.add_argument('--image-size',
                    default=384,
                    type=int,
                    metavar='N',
                    help='input image size (default: 384)')
parser.add_argument('--thre',
                    default=0.68,
                    type=float,
                    metavar='N',
                    help='threshold value')
parser.add_argument('--specified-tags',
                    default='None',
                    help='User input specified tags')
if __name__ == "__main__":
    args = parser.parse_args()
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    transform = get_transform(image_size=args.image_size)
    # delete some tags that may disturb captioning
    # 127: "quarter"; 2961: "back", 3351: "two"; 3265: "three"; 3338: "four"; 3355: "five"; 3359: "one"
    delete_tag_index = [127, 2961, 3351, 3265, 3338, 3355, 3359]
    #######load model
    model = tag2text(pretrained=args.pretrained,
                     image_size=args.image_size,
                     vit='swin_b',
                     delete_tag_index=delete_tag_index)
    model.threshold = args.thre  # threshold for tagging
    model.eval()
    model = model.to(device)
    image = transform(Image.open(args.image)).unsqueeze(0).to(device)
    res = inference(image, model, args.specified_tags)
    print("Model Identified Tags: ", res[0])
    print("User Specified Tags: ", res[1])
    print("Image Caption: ", res[2])
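To make the roles of --thre and delete_tag_index more concrete, here is a minimal pure-Python sketch of how a tagging head could turn per-tag scores into tag names. It is only an illustration under assumptions, not the actual ram implementation: select_tags, the four-word vocabulary, and the logits are all hypothetical.

```python
import math


def select_tags(logits, tag_names, threshold=0.68, delete_tag_index=()):
    """Return the names of tags whose sigmoid score exceeds the threshold.

    Hypothetical helper for illustration only; not part of the ram package.
    """
    blocked = set(delete_tag_index)
    selected = []
    for idx, logit in enumerate(logits):
        if idx in blocked:
            continue  # skip tags that were removed to keep captions clean
        score = 1.0 / (1.0 + math.exp(-logit))  # sigmoid maps a logit to (0, 1)
        if score > threshold:
            selected.append(tag_names[idx])
    return selected


# Made-up vocabulary and logits, only to show the effect of each parameter.
tag_names = ["dog", "grass", "two", "ball"]
logits = [2.0, 0.5, 3.0, 1.0]

print(select_tags(logits, tag_names))                        # ['dog', 'two', 'ball']
print(select_tags(logits, tag_names, delete_tag_index=[2]))  # ['dog', 'ball']
print(select_tags(logits, tag_names, threshold=0.5))         # ['dog', 'grass', 'two', 'ball']
```

In this sketch, lowering the threshold admits more tags, while the delete list drops a tag regardless of its score.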
2. Download the model.
https://huggingface.co/spaces/xinyu1205/recognize-anything/resolve/main/tag2text_swin_14m.pth
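If you would rather fetch the checkpoint from a script than from the browser, something like the following should work. This is a sketch using only the standard library; download_checkpoint is a hypothetical helper name, and the destination filename is an assumption chosen to match the model path used in the final code.

```python
import os
import urllib.request

# URL taken from this post; the checkpoint is a large file.
MODEL_URL = (
    "https://huggingface.co/spaces/xinyu1205/recognize-anything/"
    "resolve/main/tag2text_swin_14m.pth"
)


def download_checkpoint(url=MODEL_URL, dest="tag2text_swin_14m.pth"):
    """Download the checkpoint to dest unless it already exists; return dest."""
    if os.path.exists(dest):
        return dest  # nothing to do when the file is already present
    tmp = dest + ".part"
    urllib.request.urlretrieve(url, tmp)  # write to a temporary name first
    os.replace(tmp, dest)  # rename only after the download has finished
    return dest


# Usage (uncomment to actually download):
# model_path = download_checkpoint()
```

Downloading to a temporary name first means an interrupted download is not mistaken for a complete checkpoint on the next run.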
3. Final code
As in the first test, I downloaded the sample image from GitHub.
"""
* The Tag2Text Model
* Written by Xinyu Huang
"""
# STEP 1 : Import modules
import torch
from PIL import Image
from ram.models import tag2text
from ram import inference_tag2text as inference
from ram import get_transform
# STEP 2 : Configuration and model setup
# Use the GPU when available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Model and image configurations
model_path = "tag2text_swin_14m.pth" # Path to the pretrained model
image_path = "test.jpg" # Path to the input image
image_size = 384 # Input image size
thre = 0.68 # Threshold for tagging confidence
specified_tags = "" # User-specified tags (optional)
# Transform configuration
transform = get_transform(image_size=image_size)
# Tags to delete for cleaner captioning
# 127: "quarter"; 2961: "back", 3351: "two"; 3265: "three"; 3338: "four"; 3355: "five"; 3359: "one"
delete_tag_index = [127, 2961, 3351, 3265, 3338, 3355, 3359]
# Load the model
model = tag2text(
    pretrained=model_path,
    image_size=image_size,
    vit="swin_b",
    delete_tag_index=delete_tag_index,
)
model.threshold = thre # Set tagging threshold
model.eval()
model = model.to(device) # Move the model to the appropriate device (GPU or CPU)
# STEP 3 : Load and preprocess image
# Preprocess the image, add a batch dimension, and move it to the device
image = transform(Image.open(image_path)).unsqueeze(0).to(device)
# STEP 4 : Perform inference
res = inference(image, model, specified_tags)
# STEP 5 : Post-processing and results
print("Model Identified Tags: ", res[0]) # Tags identified by the model
print("User Specified Tags: ", res[1]) # Tags specified by the user (if any)
print("Image Caption: ", res[2]) # Caption generated by the model
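The three values in res are easier to reuse downstream once they are put into a small structure. The sketch below assumes that the tag strings are joined with a "|" separator and that the user-tag slot is None when no tags were specified. Both are assumptions about the output format, so check them against your own output. split_tags and summarize are hypothetical helpers, and the example triple is made up.

```python
def split_tags(tag_string, sep="|"):
    """Split a separator-joined tag string into a clean list of tag names."""
    if not tag_string:
        return []  # covers None (no user tags) and empty strings
    return [t.strip() for t in tag_string.split(sep) if t.strip()]


def summarize(res):
    """Turn a (model_tags, user_tags, caption) triple into a dict."""
    model_tags, user_tags, caption = res
    return {
        "model_tags": split_tags(model_tags),
        "user_tags": split_tags(user_tags),
        "caption": caption,
    }


# Made-up triple standing in for the result of inference(image, model, specified_tags).
example = ("dog | grass | ball", None, "a dog playing with a ball on the grass")
print(summarize(example))
# {'model_tags': ['dog', 'grass', 'ball'], 'user_tags': [], 'caption': 'a dog playing with a ball on the grass'}
```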
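There are many other models worth trying as well, but I have not tested them yet.
- Open-source asynchronous speech recognition: WAAS, Whisper as a Service, a GUI and API with queuing for OpenAI Whisper (https://github.com/schibsted/WAAS)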
- Open-source sentence similarity: Sentence Transformers (https://sbert.net/)