[AI] Trying Out Recognize Anything, Part 2

2025. 1. 9. 23:23·AI

Continuing from the previous post, I ran one more test of the Recognize Anything model.

 

Test Process

 

1. Grab the inference_tag2text.py example code.

'''
 * The Tag2Text Model
 * Written by Xinyu Huang
'''
import argparse
import numpy as np
import random

import torch

from PIL import Image
from ram.models import tag2text
from ram import inference_tag2text as inference
from ram import get_transform


parser = argparse.ArgumentParser(
    description='Tag2Text inference for tagging and captioning')
parser.add_argument('--image',
                    metavar='DIR',
                    help='path to dataset',
                    default='images/1641173_2291260800.jpg')
parser.add_argument('--pretrained',
                    metavar='DIR',
                    help='path to pretrained model',
                    default='pretrained/tag2text_swin_14m.pth')
parser.add_argument('--image-size',
                    default=384,
                    type=int,
                    metavar='N',
                    help='input image size (default: 384)')
parser.add_argument('--thre',
                    default=0.68,
                    type=float,
                    metavar='N',
                    help='threshold value')
parser.add_argument('--specified-tags',
                    default='None',
                    help='User input specified tags')


if __name__ == "__main__":

    args = parser.parse_args()

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    transform = get_transform(image_size=args.image_size)

    # delete some tags that may disturb captioning
    # 127: "quarter"; 2961: "back", 3351: "two"; 3265: "three"; 3338: "four"; 3355: "five"; 3359: "one"
    delete_tag_index = [127, 2961, 3351, 3265, 3338, 3355, 3359]

    #######load model
    model = tag2text(pretrained=args.pretrained,
                     image_size=args.image_size,
                     vit='swin_b',
                     delete_tag_index=delete_tag_index)
    model.threshold = args.thre  # threshold for tagging
    model.eval()

    model = model.to(device)

    image = transform(Image.open(args.image)).unsqueeze(0).to(device)

    res = inference(image, model, args.specified_tags)
    print("Model Identified Tags: ", res[0])
    print("User Specified Tags: ", res[1])
    print("Image Caption: ", res[2])

 

2. Download the pretrained model.

https://huggingface.co/spaces/xinyu1205/recognize-anything/resolve/main/tag2text_swin_14m.pth
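
The checkpoint is a single .pth file served from the Hugging Face Space, so it can be fetched with a browser or with a few lines of Python. A minimal download sketch (assuming the requests package is installed; the output filename matches what the completed code below expects):

# Sketch: download the Tag2Text checkpoint into the working directory.
# Assumes the `requests` package is available; a browser or wget works too.
import requests

URL = ("https://huggingface.co/spaces/xinyu1205/recognize-anything/"
       "resolve/main/tag2text_swin_14m.pth")
OUT = "tag2text_swin_14m.pth"  # path the completed script below expects

with requests.get(URL, stream=True) as resp:
    resp.raise_for_status()
    with open(OUT, "wb") as f:
        for chunk in resp.iter_content(chunk_size=1 << 20):  # 1 MiB chunks
            f.write(chunk)
print("saved", OUT)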

 

3. Completed code

As in the first test, the sample image was downloaded from the GitHub repository.

(Test image)

"""
 * The Tag2Text Model
 * Written by Xinyu Huang
"""

# STEP 1 : Import modules
import torch
from PIL import Image
from ram.models import tag2text
from ram import inference_tag2text as inference
from ram import get_transform

# STEP 2 : Configuration and model setup
device = torch.device(
    "cuda" if torch.cuda.is_available() else "cpu"
)  # Check device (GPU or CPU)

# Model and image configurations
model_path = "tag2text_swin_14m.pth"  # Path to the pretrained model
image_path = "test.jpg"  # Path to the input image
image_size = 384  # Input image size
thre = 0.68  # Threshold for tagging confidence
specified_tags = ""  # User-specified tags (optional)

# Transform configuration
transform = get_transform(image_size=image_size)

# Tags to delete for cleaner captioning
# 127: "quarter"; 2961: "back", 3351: "two"; 3265: "three"; 3338: "four"; 3355: "five"; 3359: "one"
delete_tag_index = [127, 2961, 3351, 3265, 3338, 3355, 3359]

# Load the model
model = tag2text(
    pretrained=model_path,
    image_size=image_size,
    vit="swin_b",
    delete_tag_index=delete_tag_index,
)
model.threshold = thre  # Set tagging threshold
model.eval()
model = model.to(device)  # Move the model to the appropriate device (GPU or CPU)

# STEP 3 : Load and preprocess image
image = (
    transform(Image.open(image_path)).unsqueeze(0).to(device)
)  # Preprocess image and move to device

# STEP 4 : Perform inference
res = inference(image, model, specified_tags)

# STEP 5 : Post-processing and results
print("Model Identified Tags: ", res[0])  # Tags identified by the model
print("User Specified Tags: ", res[1])  # Tags specified by the user (if any)
print("Image Caption: ", res[2])  # Caption generated by the model
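
Since loading the checkpoint is the slow part, the same loaded model can be reused for as many images as needed. A small sketch of my own (not part of the original example; the "samples" folder name is just a placeholder) that tags every JPEG in a directory using the transform, model, device, specified_tags, and inference objects defined above:

# Sketch: reuse the already-loaded model to tag every image in a folder.
# Assumes `transform`, `model`, `device`, `specified_tags`, `Image`, and
# `inference` from the script above are in scope; "samples" is a placeholder.
from pathlib import Path

for img_path in sorted(Path("samples").glob("*.jpg")):
    img = transform(Image.open(img_path).convert("RGB")).unsqueeze(0).to(device)
    tags, _, caption = inference(img, model, specified_tags)
    print(f"{img_path.name}: {tags} | {caption}")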

 

 

There are plenty of other models as well, but I have not gotten around to testing them yet.

  • Asynchronous speech recognition open source (https://github.com/schibsted/WAAS)
 


 

  • Sentence similarity open source (https://sbert.net/) — a quick usage sketch follows below.
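
For reference, a sentence-similarity check with sentence-transformers only takes a few lines. A minimal sketch I have not run yet (the all-MiniLM-L6-v2 model name and the two sentences are arbitrary placeholder examples):

# Sketch: cosine similarity between two sentences with sentence-transformers.
# The model name and the sentences are placeholder examples.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose model
emb = model.encode(["A dog runs in the park.", "A puppy plays outside."])
print(util.cos_sim(emb[0], emb[1]).item())  # similarity score in [-1, 1]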
 


 
