In a crisis, a user's mobile device may have no connection to the Internet or to advanced AI services, so technology that still works under such conditions is needed. In this paper, we demonstrate through computer experiments that a method based on BLIP-2, a vision-language model, achieves promising accuracy in determining locally whether the subject of a photo is related to a disaster. The proposed method focuses on embedding, which serves as a preprocessing step for large language models (LLMs). We reasoned that when an LLM is unavailable, using its embedding vectors as the basis for processing is one of the most suitable approaches. BLIP-2 is used to vectorize the photo, and the judgment is made by comparing the resulting vector with a built-in database using the k-nearest neighbor method. Computer experiments show that, when an appropriate database is used, this method outperforms other methods in terms of accuracy and F1 score.
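The comparison step described above can be illustrated with a minimal sketch. It is not the implementation used in the experiments: the toy 3-dimensional vectors stand in for BLIP-2 embeddings that are assumed to be computed elsewhere, and the label names, the value of k, the use of cosine similarity as the distance measure, and the function names `cosine_similarity` and `knn_predict` are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(query, database, k=3):
    """Majority vote over the k database entries most similar to the query.

    database is a list of (vector, label) pairs stored on the device.
    """
    ranked = sorted(
        database,
        key=lambda item: cosine_similarity(query, item[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical built-in database: toy vectors standing in for image embeddings.
database = [
    ([0.9, 0.1, 0.0], "disaster"),
    ([0.8, 0.2, 0.1], "disaster"),
    ([0.7, 0.3, 0.0], "disaster"),
    ([0.1, 0.9, 0.2], "not_disaster"),
    ([0.0, 0.8, 0.3], "not_disaster"),
    ([0.2, 0.7, 0.1], "not_disaster"),
]

# Stand-in for the embedding of a newly taken photo.
query = [0.85, 0.15, 0.05]
prediction = knn_predict(query, database, k=3)
print(prediction)
```

Because the database and the vote are both plain in-memory operations, a lookup of this kind needs no network access once the embeddings are available on the device.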
Kubo et al. studied this question.