# User tutorial: Classifying unlabeled images
This section walks through how to classify images using `zamba`. If you are new to `zamba` and just want to classify some images as soon as possible, see the Quickstart guide.

This tutorial goes over the steps for using `zamba` if:

- You already have `zamba` installed (for details, see the Installation page)
- You have unlabeled images that you want to generate labels for
- The possible species labels for your images are included in the list of possible `zamba` labels. If your species are not included in this list, you can retrain a model using your own labeled data and then run inference.
## Basic usage: command line interface

Say that we want to classify the images in a folder called `example_images` as simply as possible, using all of the default settings.

Minimum example for prediction in the command line:

```console
$ zamba image predict --data-dir example_images/
```
### Required arguments

To run `zamba image predict` in the command line, you must specify `--data-dir` and/or `--filepaths`.

- `--data-dir PATH`: Path to the folder containing your images. If you don't also provide `--filepaths`, `zamba` will recursively search this folder for images.
- `--filepaths PATH`: Path to a CSV file with a column for the filepath to each image you want to classify. The CSV must have a column named `filepath`. Filepaths can be absolute on your system or relative to the data directory that you provide in `--data-dir`.
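If you go the `--filepaths` route, the CSV only needs a single `filepath` column. A minimal sketch (these filenames are hypothetical, with paths relative to the data directory):

```csv
filepath
1.jpg
subdir/2.jpg
```

You would then run `zamba image predict --data-dir example_images/ --filepaths filepaths.csv`.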
All other flags are optional. To choose the model you want to use for prediction, either `--model` or `--checkpoint` must be specified. Use `--model` to specify one of the pretrained models that ship with `zamba`. Use `--checkpoint` to run inference with a locally saved model. `--model` defaults to `lila.science`.
## Basic usage: Python package

We also support using Zamba as a Python package. Say that we want to classify the images in a folder called `example_images` as simply as possible, using all of the default settings.

Minimum example for prediction using the Python package:

```python
from zamba.images.manager import predict
from zamba.images.config import ImageClassificationPredictConfig

predict_config = ImageClassificationPredictConfig(data_dir="example_images/")
predict(config=predict_config)
```
The only argument that can be passed to `predict` is `config`, so the only step is to instantiate an `ImageClassificationPredictConfig` and pass it to `predict`.
### Required arguments

To run `predict` in Python, you must specify either `data_dir` or `filepaths` when `ImageClassificationPredictConfig` is instantiated.

- `data_dir (DirectoryPath)`: Path to the folder containing your images. If you don't also provide `filepaths`, `zamba` will recursively search this folder for images.
- `filepaths (FilePath)`: Path to a CSV file with a column for the filepath to each image you want to classify. The CSV must have a column named `filepath`. Filepaths can be absolute or relative to the data directory provided as `data_dir`.
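A filepaths CSV like this can also be generated programmatically. As a quick sketch with pandas (the filenames here are hypothetical):

```python
import pandas as pd

# The only required column is "filepath"; paths can be absolute,
# or relative to the data_dir passed to ImageClassificationPredictConfig.
files = pd.DataFrame({"filepath": ["example_images/1.jpg", "example_images/2.jpg"]})
files.to_csv("filepaths.csv", index=False)
```

You would then pass `filepaths="filepaths.csv"` to `ImageClassificationPredictConfig` instead of (or alongside) `data_dir`.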
For detailed explanations of all possible configuration arguments, see All Optional Arguments.
## Default behavior

By default, the `lila.science` model will be used. `zamba` will output a `.csv` file with a row for each bounding box identified by MegaDetector (there may be multiple bounding boxes per image) and a column for each class (i.e., species or species group). A cell `(i, j)` in the CSV can be interpreted as the predicted likelihood that the animal present in bounding box `i` belongs to species group `j`. Most users will just want to look at the top species prediction for each bounding box.

By default, predictions will be saved to a file called `zamba_predictions.csv` in your working directory. You can save predictions to a custom directory using the `--save-dir` argument.
```console
$ cat zamba_predictions.csv
filepath,detection_category,detection_conf,x1,y1,x2,y2,species_acinonyx_jubatus,species_aepyceros_melampus,species_alcelaphus_buselaphus...
1.jpg,1,0.85,2015,1235,2448,1544,4.924246e-06,0.0001539439,9.6043495e-06...
2.jpg,1,0.921,527,1246,1805,1501,5.061601e-05,3.6830465e-05,2.4510617e-05...
3.jpg,1,0.9,1422,528,1629,816,1.1791806e-06,4.080566e-06,3.4533906e-07...
```
The `detection_category` and `detection_conf` columns come from MegaDetector, which we use to find bounding boxes around individual animals in images. A `detection_category` of `"1"` indicates the presence of an animal, and `detection_conf` is the detector's confidence in the presence of an animal. The columns `x1`, `y1`, `x2`, `y2` give the coordinates of the top-left and bottom-right corners of the bounding box, relative to the top-left corner of the image. The remaining columns are the scores assigned to each species for the individual animal in the given bounding box.
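For example, the top species prediction per bounding box can be pulled out of the output CSV with pandas. A small sketch, using column names that follow the sample output above (the scores here are made up):

```python
import pandas as pd

# A tiny stand-in for zamba_predictions.csv with two species columns.
preds = pd.DataFrame({
    "filepath": ["1.jpg", "2.jpg"],
    "detection_conf": [0.85, 0.921],
    "species_acinonyx_jubatus": [0.90, 0.05],
    "species_aepyceros_melampus": [0.02, 0.95],
})

# Score columns share the "species_" prefix, so the top prediction per
# bounding box is the column with the maximum value in each row.
species_cols = [c for c in preds.columns if c.startswith("species_")]
preds["top_species"] = preds[species_cols].idxmax(axis=1)
print(preds[["filepath", "top_species"]])
```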
## Step-by-step tutorial

### 1. Specify the path to your images

Save all of your images within one parent folder.

- Images can be in nested subdirectories within the folder.
- Your images should be saved in formats that are supported by Python's `pillow` library. Any images that fail a set of validation checks will be skipped during inference or training. By default, `zamba` will look for files with the following suffixes: `.jpg`, `.jpeg`, `.png`, and `.webp`. To use other image formats that are supported by `pillow`, set your `IMAGE_SUFFIXES` environment variable.
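For instance, to also pick up `.tif` files you might set the variable before running `zamba`. The exact value format expected is an assumption here (a JSON-style list of suffixes); check the configuration docs if it isn't picked up:

```bash
# Assumed format: a JSON-style list of suffixes, including the dot.
export IMAGE_SUFFIXES='[".jpg", ".jpeg", ".png", ".webp", ".tif"]'
```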
Add the path to your image folder to the command-line command. For example, if your images are in a folder called `example_images`:

```console
$ zamba image predict --data-dir example_images/
```

```python
from zamba.images.manager import predict
from zamba.images.config import ImageClassificationPredictConfig

predict_config = ImageClassificationPredictConfig(data_dir="example_images/")
predict(config=predict_config)
```
### 2. Choose a model for prediction

Right now, Zamba supports only a single model out of the box for images: `lila.science`. While there's only a single model, it has been trained to detect hundreds of species, so it's a great first pass. The `lila.science` model will be used if no model is specified.

If you've fine-tuned a model, you can select it instead of a built-in model by using the `--checkpoint` argument. For example:

```console
$ zamba image predict --data-dir example_images/ --checkpoint zamba-image-classification-dummy_modelepoch=00-val_loss=48.372.ckpt
```

```python
from zamba.images.manager import predict
from zamba.images.config import ImageClassificationPredictConfig

predict_config = ImageClassificationPredictConfig(
    data_dir="example_images/",
    checkpoint="zamba-image-classification-dummy_modelepoch=00-val_loss=48.372.ckpt",
)
predict(config=predict_config)
```
### 3. Choose the output format

There are two options for how to format predictions:

- CSV (default): Returns predictions with a row for each bounding box-filename combination and a column for each class label, with probabilities between 0 and 1. Bounding boxes are automatically created for each image, and each image can contain zero or more bounding boxes (for example, if there are many rabbits in a single image). Cell `(i, j)` is the probability that species `j` is present in bounding box `i`.
- MegaDetector: Returns much the same information as above, but in a JSON format similar to the COCO image format, augmented to be more useful for camera trap data.

Say we want to generate predictions for images in `example_images` in MegaDetector format:

```console
$ zamba image predict --data-dir example_images/ --results-file-format megadetector
```
```console
$ cat zamba_predictions.json
{
  "info": {},
  "detection_categories": { "1": "animal", "2": "person", "3": "vehicle" },
  "classification_categories": {
    "0": "species_acinonyx_jubatus",
    "1": "species_aepyceros_melampus",
    "2": "species_alcelaphus_buselaphus",
    ...
  },
  "images": [ {
    "file": "1.jpg",
    "detections": [
      { "category": "1", "conf": 0.85, "bbox": [ 0.7, 0.2, 0.12, 0.4 ],
        "classifications": [ [ 88, 0.9355112910270691 ],
    ...
```
```python
from zamba.images.manager import predict
from zamba.images.config import ImageClassificationPredictConfig, ResultsFormat

predict_config = ImageClassificationPredictConfig(
    data_dir="example_images/",
    results_file_format=ResultsFormat.MEGADETECTOR,
)
predict(config=predict_config)
```
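The MegaDetector-style JSON can then be read back with the standard library. A sketch that mirrors the structure of the sample output above (the indices, scores, and the `species_panthera_leo` label are illustrative, not real model output):

```python
import json

# Minimal stand-in for zamba_predictions.json in MegaDetector format.
raw = """{
  "classification_categories": {"0": "species_acinonyx_jubatus", "88": "species_panthera_leo"},
  "images": [{
    "file": "1.jpg",
    "detections": [{
      "category": "1", "conf": 0.85, "bbox": [0.7, 0.2, 0.12, 0.4],
      "classifications": [[88, 0.94], [0, 0.01]]
    }]
  }]
}"""
results = json.loads(raw)

# Each classifications entry is [category index, score]; map the
# top-scoring index back to a species name.
for image in results["images"]:
    for det in image["detections"]:
        top_idx, top_score = max(det["classifications"], key=lambda c: c[1])
        print(image["file"], results["classification_categories"][str(top_idx)], top_score)
```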
### 4. Specify any additional parameters

And there's so much more! You can also do things like specify your region for faster model download (`--weight-download-region`), change the detection threshold above which a bounding box is considered to contain an animal (`--detections-threshold`), or specify a different folder where your predictions should be saved (`--save-dir`). To read about a few common considerations, see the Guide to Common Optional Parameters page.
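These flags compose with the commands above. For instance (the threshold value here is just illustrative):

```console
$ zamba image predict --data-dir example_images/ --detections-threshold 0.3 --save-dir predictions/
```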