Welcome to the API docs of the ShiftHappens benchmark!¶
While the popularity of robustness benchmarks and new test datasets has increased over the past years, the performance of computer vision models is still largely evaluated directly on ImageNet, or on simulated or isolated distribution shifts like those in ImageNet-C. The goal of this two-stage workshop is twofold: First, we aim to enhance the landscape of robustness evaluation datasets for computer vision and to devise new test settings and metrics for quantifying desirable properties of computer vision models. Second, we expect these improvements in model evaluation to lead to a better guided and, thus, more efficient development phase for new models. This incentivizes the development of models and inference methods with meaningful improvements over existing approaches with respect to a broad range of desirable properties. Our goal is to bring the robustness, domain adaptation, and out-of-distribution detection communities together to work on a new broad-scale benchmark that tests diverse aspects of current computer vision models and guides the way towards the next generation of models.
Model implementations¶
Base classes and helper functions for adding models to the benchmark.
To add a new model, implement a new wrapper class inheriting from shifthappens.models.base.Model, and from any of the Mixins defined in this module.
Model results should be converted to numpy arrays, and packed into a shifthappens.models.base.ModelResult instance.
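A minimal sketch of this pattern, using only numpy: the `Model`, `LabelModelMixin`, and `ModelResult` classes below are simplified stand-ins mirroring the signatures documented on this page (in real code you would import them from the package instead), and `RandomTopKModel` is a hypothetical toy wrapper, not an actual baseline:

```python
import numpy as np

# Simplified stand-ins mirroring shifthappens.models.base; in a real
# wrapper, import these from the package instead of redefining them.
class Model:
    def predict(self, inputs, targets):
        raise NotImplementedError

class LabelModelMixin:
    """Marker mixin: the model returns predicted labels."""

class ModelResult:
    def __init__(self, class_labels, confidences=None,
                 uncertainties=None, ood_scores=None, features=None):
        self.class_labels = class_labels
        self.confidences = confidences

class RandomTopKModel(Model, LabelModelMixin):
    """Toy wrapper: returns k random ILSVRC2012 class indices per sample."""

    def __init__(self, k=5, seed=0):
        self.k = k
        self.rng = np.random.default_rng(seed)

    def predict(self, inputs, targets=None):
        n = len(inputs)
        # Pack the (N, k) top-k label array into a ModelResult.
        labels = self.rng.integers(0, 1000, size=(n, self.k))
        return ModelResult(class_labels=labels)

model = RandomTopKModel()
result = model.predict(np.zeros((4, 224, 224, 3)), None)
print(result.class_labels.shape)  # (4, 5)
```

A real wrapper would run its network on `inputs` and convert the outputs to numpy arrays before packing them, but the inheritance and packing structure stays the same.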
- class shifthappens.models.base.ModelResult(class_labels, confidences=None, uncertainties=None, ood_scores=None, features=None)
Bases:
object
Emissions of a model after processing a batch of data.
Each model needs to return class labels that are compatible with the ILSVRC2012 labels. We use the same convention as PyTorch regarding the ordering of labels.
- Parameters
class_labels (ndarray) – (N, k) array containing the top-k predictions for each sample in the batch. The choice of k is up to the user, and potentially influences which accuracy-based benchmarks the model can be run on. For standard ImageNet and ImageNet-C evaluation, choose at least k=5.
confidences (Optional[ndarray]) – optional, (N, 1000) confidences for each class. Standard PyTorch ImageNet class label order is expected for this array. Scores can be in the range -inf to inf.
uncertainties (Optional[ndarray]) – optional, (N, 1000), uncertainties for the different class predictions. Different from the confidences, this is a measure of the certainty of the given confidences, common e.g. in Bayesian deep neural networks.
ood_scores (Optional[ndarray]) – optional, (N,), score for interpreting the sample as an out-of-distribution class, in the range -inf to inf.
features (Optional[ndarray]) – optional, (N, d), where d can be arbitrary; the feature representation used to arrive at the given predictions.
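To make the documented shapes concrete, here is a plain-numpy sketch of the arrays a model might pack for a batch of N=2 samples; the values are random placeholders, and only the shapes follow the parameter descriptions above:

```python
import numpy as np

N, k, d = 2, 5, 128  # batch size, top-k, arbitrary feature dimension
rng = np.random.default_rng(0)

class_labels = rng.integers(0, 1000, size=(N, k))  # (N, k) top-k predictions
confidences = rng.standard_normal((N, 1000))       # (N, 1000), any real values
uncertainties = rng.standard_normal((N, 1000))     # (N, 1000), per-class
ood_scores = rng.standard_normal(N)                # (N,), one score per sample
features = rng.standard_normal((N, d))             # (N, d)

# Check each array against its documented shape.
for name, arr, shape in [("class_labels", class_labels, (N, k)),
                         ("confidences", confidences, (N, 1000)),
                         ("uncertainties", uncertainties, (N, 1000)),
                         ("ood_scores", ood_scores, (N,)),
                         ("features", features, (N, d))]:
    assert arr.shape == shape, name
print("all shapes consistent")
```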
- class shifthappens.models.base.PredictionTargets(class_labels=False, logits=False, confidences=False, uncertainties=False, ood_scores=False, features=False)
Bases:
object
- class shifthappens.models.base.Model
Bases:
abc.ABC
Model base class.
- predict(inputs, targets)
- Parameters
inputs (np.ndarray) – Batch of images.
targets (PredictionTargets) – Indicates which kinds of targets should be predicted.
- Return type
ModelResult
- Returns
Prediction results for the given batch. Depending on the target arguments, this includes the predicted labels, class confidences, class uncertainties, OOD scores, and image features, all as ``np.ndarray``s.
- class shifthappens.models.base.LabelModelMixin
Bases:
object
Inherit from this class if your model returns predicted labels.
- class shifthappens.models.base.ConfidenceModelMixin
Bases:
object
Inherit from this class if your model returns confidences.
- class shifthappens.models.base.UncertaintyModelMixin
Bases:
object
Inherit from this class if your model returns uncertainties.
- class shifthappens.models.base.OODScoreModelMixin
Bases:
object
Inherit from this class if your model returns OOD scores.
- class shifthappens.models.base.FeaturesModelMixin
Bases:
object
Inherit from this class if your model returns features.
Torchvision baselines.
- class shifthappens.models.torchvision.ResNet50
Bases:
shifthappens.models.base.Model
,shifthappens.models.base.LabelModelMixin
,shifthappens.models.base.ConfidenceModelMixin
,shifthappens.models.base.FeaturesModelMixin
Reference implementation for a torchvision ResNet50 model.
- predict(inputs, targets)
- Parameters
inputs (np.ndarray) – Batch of images.
targets (PredictionTargets) – Indicates which kinds of targets should be predicted.
- Returns
Prediction results for the given batch. Depending on the target arguments, this includes the predicted labels, class confidences, class uncertainties, OOD scores, and image features, all as ``np.ndarray``s.
Task implementations¶
Base definition of a task in the shift-happens benchmark.
- class shifthappens.tasks.base.Task
Bases:
abc.ABC
Task base class.
- evaluate(model)
Validates that the model is compatible with the task and then evaluates the model’s performance using the _evaluate function of this class.
- Return type
Optional
[Dict
[str
,float
]]
- abstract _evaluate(model)
Implement this function to evaluate the task and return a dictionary with the calculated metrics.
- Return type
Dict
[str
,float
]
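A minimal sketch of this evaluate/_evaluate contract: the `Task` base class below is a stand-in mirroring the documented interface (the package's real `evaluate` additionally validates model/task compatibility before dispatching), and `AccuracyTask` and `ConstantModel` are hypothetical examples, not part of the package:

```python
import abc
from typing import Dict, Optional

class Task(abc.ABC):
    """Stand-in mirroring shifthappens.tasks.base.Task."""

    def evaluate(self, model) -> Optional[Dict[str, float]]:
        # The real implementation first validates that the model is
        # compatible with the task, then dispatches to _evaluate.
        return self._evaluate(model)

    @abc.abstractmethod
    def _evaluate(self, model) -> Dict[str, float]:
        """Subclasses compute and return their metrics here."""

class AccuracyTask(Task):
    """Hypothetical task: fraction of correct top-1 predictions."""

    def __init__(self, true_labels):
        self.true_labels = true_labels

    def _evaluate(self, model) -> Dict[str, float]:
        predictions = model.predict(None)  # toy model ignores inputs
        correct = sum(p == t for p, t in zip(predictions, self.true_labels))
        return {"accuracy": correct / len(self.true_labels)}

class ConstantModel:
    def predict(self, inputs):
        return [0, 0, 0, 0]  # always predicts class 0

metrics = AccuracyTask([0, 1, 0, 2]).evaluate(ConstantModel())
print(metrics)  # {'accuracy': 0.5}
```

Note that callers only ever invoke `evaluate`; `_evaluate` is the single method a new task has to implement.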
- class shifthappens.tasks.base.LabelTaskMixin
Bases:
object
Indicates that the task requires the model to return the predicted labels.
- class shifthappens.tasks.base.ConfidenceTaskMixin
Bases:
object
Indicates that the task requires the model to return the confidence scores.
- class shifthappens.tasks.base.UncertaintyTaskMixin
Bases:
object
Indicates that the task requires the model to return the uncertainty scores.
- class shifthappens.tasks.base.OODScoreTaskMixin
Bases:
object
Indicates that the task requires the model to return the OOD scores.
- class shifthappens.tasks.base.FeaturesTaskMixin
Bases:
object
Indicates that the task requires the model to return the raw features.
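These marker mixins on the task side pair with the model mixins above, which makes compatibility checking a matter of `isinstance` tests. A sketch of how such a check could work, using stand-in mixin classes and a hypothetical `is_compatible` helper (not the package's actual validation logic):

```python
# Stand-ins for the marker mixins documented above.
class LabelModelMixin: ...
class OODScoreModelMixin: ...

class LabelTaskMixin: ...
class OODScoreTaskMixin: ...

# Hypothetical mapping from task requirements to model capabilities.
_REQUIREMENTS = {
    LabelTaskMixin: LabelModelMixin,
    OODScoreTaskMixin: OODScoreModelMixin,
}

def is_compatible(model, task) -> bool:
    """Check that the model provides every output the task requires."""
    for task_mixin, model_mixin in _REQUIREMENTS.items():
        if isinstance(task, task_mixin) and not isinstance(model, model_mixin):
            return False
    return True

class MyModel(LabelModelMixin): ...
class MyTask(LabelTaskMixin, OODScoreTaskMixin): ...

# MyTask requires OOD scores, which MyModel does not provide.
print(is_compatible(MyModel(), MyTask()))  # False
```

Because the mixins carry no behavior, adding a capability to a model (or a requirement to a task) is just a matter of extending its list of base classes.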