Exploring the Segment Anything Model: Object Detection Analysis
Chapter 1: Introduction to the Segment Anything Model
In my recent review of the Segment Anything Model (SAM), I discovered that beyond the vit-h (huge) variant, there are two additional configurations referred to as vit-b (base) and vit-l (large). The primary distinction among these three modes relates to their transformer layer configurations, attention heads, and hidden dimension sizes.
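To make that difference concrete, the snippet below summarizes the image-encoder settings of the three variants as a small dictionary. The numbers are the defaults listed in build_sam.py of the official segment-anything repository; verify them against the version of the code you are using:
# Image-encoder settings of the three SAM variants
# (defaults from build_sam.py in the official segment-anything repository)
SAM_ENCODER_CONFIGS = {
    "vit_b": {"hidden_dim": 768,  "layers": 12, "attention_heads": 12},
    "vit_l": {"hidden_dim": 1024, "layers": 24, "attention_heads": 16},
    "vit_h": {"hidden_dim": 1280, "layers": 32, "attention_heads": 16},
}
for name, config in SAM_ENCODER_CONFIGS.items():
    print(name, config)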
I was curious whether these variations would influence the number of objects each SAM mode could identify within the same image. To investigate this, I applied all three versions (vit-h, vit-l, and vit-b) to segment a Landsat-8 image taken over an agricultural region in California, aiming to count the number of delineated blocks. Subsequently, I compared the detected number of blocks against the actual count visible in the image.
If you're intrigued by this comparison and wish to replicate my process, I encourage you to read further.
Section 1.1: Downloading an RGB Landsat-8 Image
To implement different SAM modes on satellite imagery, I utilized a Landsat-8 image, which can be downloaded through the following link. For those wanting to use the same image, refer to the section titled "📥 Download Satellite Images in GeoTIFF Format for Your AOI."
After securing the image, execute the following code to create three directories in your workspace: Landsat-8, Image, and SAM_outputs:
import os
# Folder names
folders = ['Landsat-8', 'Image', 'SAM_outputs']
# Create the folders
for folder in folders:
    if not os.path.exists(folder):
        os.makedirs(folder)
The Landsat-8 directory will house the satellite image in GeoTIFF format, the Image folder will store the PNG file after visualizing the satellite image in RGB, and the SAM_outputs folder will keep the segmented satellite images resulting from the three SAM modes (vit-h, vit-l, vit-b).
Next, we will install the rasterio library, read each band, create a stacked array, visualize the stacked layer using Matplotlib, and save it in both GeoTIFF and PNG formats in the respective folders:
!pip install rasterio
import rasterio
import numpy as np
from rasterio.plot import show
import matplotlib.pyplot as plt
# Open B2, B3, and B4 Landsat-8 raster files
b4 = rasterio.open('/content/Landsat-8/L08.002_SR_B4_CU_doy2023166_aid0001.tif')
b3 = rasterio.open('/content/Landsat-8/L08.002_SR_B3_CU_doy2023166_aid0001.tif')
b2 = rasterio.open('/content/Landsat-8/L08.002_SR_B2_CU_doy2023166_aid0001.tif')
# Read the data from the raster files
b4_data = b4.read(1)
b3_data = b3.read(1)
b2_data = b2.read(1)
# Stack the bands into a single 3D array
rgb = np.stack((b4_data, b3_data, b2_data), axis=0)
# Normalize the data to the 0-1 range
rgb = rgb / np.max(rgb)
# Display the RGB map
show(rgb)
# Save the RGB map as a PNG file
plt.imsave('/content/Image/Landsat-8.png', np.transpose(rgb, (1, 2, 0)))
# Export the stacked data as a new GeoTIFF file
with rasterio.open(
    '/content/Landsat-8/Landsat-8_2023166_RGB.tif', 'w',
    driver='GTiff', height=b2.height, width=b2.width, count=3,
    dtype=rgb.dtype, crs=b2.crs, transform=b2.transform
) as dst:
    dst.write(rgb, indexes=[1, 2, 3])
The resulting plot will display a Landsat-8 image of California captured in June 2023.
Section 1.2: Downloading LandIQ Map
For an accurate block count within the Landsat-8 image, we will utilize a vector layer that outlines actual block boundaries. A good source for this data is the shapefile of agricultural blocks from LandIQ. You can download this layer from their website:
By opening this layer in QGIS and inspecting the attribute table, you can determine the precise number of blocks that fall within the Landsat-8 image: the highest ID in the table is 214, which corresponds to the total number of blocks in this scene.
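If you prefer to stay in Python rather than QGIS, a quick way to get a comparable count is to read the shapefile with geopandas and keep only the blocks inside the image footprint. This is only a sketch: the shapefile path below is a placeholder for wherever you saved the LandIQ download, and the selection uses the raster bounding box rather than the exact footprint, so the count may differ slightly from the attribute table:
import geopandas as gpd
import rasterio

# Placeholder path -- point it to your LandIQ shapefile
blocks = gpd.read_file('/content/LandIQ/landiq_blocks.shp')

# Use the footprint of the exported RGB GeoTIFF to keep only blocks inside the scene
with rasterio.open('/content/Landsat-8/Landsat-8_2023166_RGB.tif') as src:
    bounds = src.bounds
    raster_crs = src.crs.to_wkt()

# Reproject the vector layer to the raster CRS, then select features within the raster bounds
blocks = blocks.to_crs(raster_crs)
blocks_in_scene = blocks.cx[bounds.left:bounds.right, bounds.bottom:bounds.top]

print(f"Blocks intersecting the Landsat-8 scene: {len(blocks_in_scene)}")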
Chapter 2: Applying SAM Modes on the Landsat-8 Image
The first video titled "SAM - Segment Anything Model by Meta AI: Complete Guide | Python Setup & Applications" provides a comprehensive overview of SAM and its applications.
To analyze the various SAM modes on the Landsat-8 image, we'll write Python code. If you're unfamiliar with SAM or how to implement it in Python, please refer to this post:
To apply the different SAM modes to the satellite image, install the supervision library, clone the SAM repository from GitHub, and download the checkpoints for SAM vit-h, vit-l, and vit-b:
!pip install supervision
!git clone https://github.com/facebookresearch/segment-anything.git
%cd segment-anything
Download the models:
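For completeness, the three checkpoints can be fetched with wget. The URLs below are the download links listed in the official segment-anything README, and the commands assume you are already inside the cloned repository folder; double-check them against the repository before running:
!wget -q https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
!wget -q https://dl.fbaipublicfiles.com/segment_anything/sam_vit_l_0b3195.pth
!wget -q https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth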
In this step, we will set up the Google Colab environment to use the GPU if available, load the checkpoints, and write a Python script that detects objects in the satellite image with each SAM mode. The segmented images will be saved in PNG format as the script iterates through SAM's vit-h, vit-l, and vit-b modes:
import torch
import cv2
import supervision as sv
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Use the GPU if one is available
DEVICE = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

MODEL_TYPE = ["vit_h", "vit_l", "vit_b"]

# Read the PNG once and convert it to RGB for SAM
IMAGE_PATH = "/content/Image/Landsat-8.png"
image_bgr = cv2.imread(IMAGE_PATH)
image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)

for model in MODEL_TYPE:
    # Pick the checkpoint that matches the current model type
    if model == 'vit_h':
        CHECKPOINT_PATH = f"./sam_{model}_4b8939.pth"
    elif model == 'vit_l':
        CHECKPOINT_PATH = f"./sam_{model}_0b3195.pth"
    else:
        CHECKPOINT_PATH = f"./sam_{model}_01ec64.pth"

    # Load the checkpoint and build an automatic mask generator
    sam = sam_model_registry[model](checkpoint=CHECKPOINT_PATH).to(device=DEVICE)
    mask_generator = SamAutomaticMaskGenerator(sam)

    # Generate masks and count the detected segments
    sam_result = mask_generator.generate(image_rgb)
    detections = sv.Detections.from_sam(sam_result=sam_result)
    num_segments = len(detections)
    print(f"Number of segments detected based on Model {model}: {num_segments}")

    # Annotate the image with one color per detected mask
    mask_annotator = sv.MaskAnnotator(color_lookup=sv.ColorLookup.INDEX)
    annotated_image = mask_annotator.annotate(scene=image_bgr.copy(), detections=detections)

    # Resize the annotated image before saving
    scale_factor = 2  # Adjust this value to control the size of the saved image
    new_size = (int(annotated_image.shape[1] * scale_factor),
                int(annotated_image.shape[0] * scale_factor))
    resized_annotated_image = cv2.resize(annotated_image, new_size, interpolation=cv2.INTER_LINEAR)

    # Save the annotated image as a PNG file (cv2.imwrite expects BGR, matching the annotated scene)
    cv2.imwrite(f"/content/SAM_outputs/Segmented_by_{model}.png", resized_annotated_image)

    # Plot the original and segmented images side by side
    sv.plot_images_grid(
        images=[image_bgr, annotated_image],
        grid_size=(1, 2),
        titles=['Landsat-8', f'Segmented Image by Model: {model}']
    )
The second video titled "Segment Anything 2 (SAM 2): Meta AI's Newest Model | Community Q&A (Jul 30)" delves into the latest iteration of SAM and its functionalities.
Section 2.1: Visualizing the Segmented Images
Upon inspecting the segmented images saved in PNG format, you will find the original satellite image on the left and the corresponding segmented image on the right for each SAM mode:
- The satellite image vs. the segmented image by SAM vit-h
- The satellite image vs. the segmented image by SAM vit-l
- The satellite image vs. the segmented image by SAM vit-b
From the visual assessment, it appears that the performance of vit-h and vit-l is quite similar, indicating that the number of detected blocks in each case should be comparable. However, it is evident that SAM struggled to identify many blocks when operating in the base mode (vit-b), resulting in numerous missed detections.
Section 2.2: Comparing Detected Segments vs. Actual Count
The code above also prints the number of objects (blocks) that SAM detected in each mode. Let's review these figures:
- Detected segments for vit_h: 146
- Detected segments for vit_l: 149
- Detected segments for vit_b: 103
As anticipated, the detected counts for SAM in vit-h and vit-l are quite close (146 versus 149), while the base mode (vit-b) recorded only 103. Given that there are 214 blocks in this image according to the LandIQ layer, we can define accuracy as the ratio of detected blocks to actual blocks. The results are summarized as follows:
- SAM vit-h: (146/214)*100 = 68%
- SAM vit-l: (149/214)*100 = 70%
- SAM vit-b: (103/214)*100 = 48%
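The same percentages can be reproduced with a few lines of Python, using the segment counts printed by the loop above and the 214 blocks from the LandIQ layer:
# Counts reported by the SAM loop and the block total from LandIQ
actual_blocks = 214
detected_segments = {"vit_h": 146, "vit_l": 149, "vit_b": 103}

for model, count in detected_segments.items():
    accuracy = count / actual_blocks * 100
    print(f"SAM {model}: {count}/{actual_blocks} blocks detected ({accuracy:.0f}%)")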
Chapter 3: Conclusion
The Segment Anything Model (SAM) serves as a robust algorithm designed for the automatic segmentation of images and detection of objects without prior training. This algorithm features three distinct configurations. In this discussion, I assessed the efficacy of each mode on a satellite image. The findings indicated that both the "huge" and "large" modes performed similarly, successfully identifying nearly 70% of the blocks in an image containing 214 blocks. Conversely, the algorithm's performance in the "base" mode was notably poorer, missing over 50% of the blocks in the satellite image.
Chapter 4: References
Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A. C., Lo, W.-Y., Dollár, P., & Girshick, R. (2023). Segment Anything. arXiv:2304.02643.