Welcome to the API docs of the ShiftHappens benchmark!

While the popularity of robustness benchmarks and new test datasets has increased over the past years, the performance of computer vision models is still largely evaluated directly on ImageNet, or on simulated or isolated distribution shifts like ImageNet-C.

The goal of this two-stage workshop is twofold: First, we aim to enhance the landscape of robustness evaluation datasets for computer vision and devise new test settings and metrics for quantifying desirable properties of computer vision models. Second, we expect these improvements in model evaluation to lead to a better guided and, thus, more efficient development phase for new models. This incentivizes the development of models and inference methods with meaningful improvements over existing approaches with respect to a broad scope of desirable properties. Our goal is to bring the robustness, domain adaptation, and out-of-distribution detection communities together to work on a new broad-scale benchmark that tests diverse aspects of current computer vision models and guides the way towards the next generation of models.

Model implementations

Base classes and helper functions for adding models to the benchmark.

To add a new model, implement a new wrapper class inheriting from shifthappens.models.base.Model, as well as from any of the mixins defined in this module.

Model results should be converted to numpy arrays and packed into a shifthappens.models.base.ModelResult instance.
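A minimal sketch of such a wrapper is shown below. To keep the example self-contained, Model, LabelModelMixin, and ModelResult are mirrored here as local stand-ins whose signatures follow the descriptions in this document; in a real submission you would import them from shifthappens.models.base instead.

```python
import numpy as np

# Local stand-ins mirroring the shifthappens.models.base API described below;
# real code would import Model, LabelModelMixin, and ModelResult instead.

class ModelResult:
    def __init__(self, class_labels, confidences=None, uncertainties=None,
                 ood_scores=None, features=None):
        self.class_labels = class_labels
        self.confidences = confidences
        self.uncertainties = uncertainties
        self.ood_scores = ood_scores
        self.features = features

class Model:
    def predict(self, inputs, targets):
        raise NotImplementedError

class LabelModelMixin:
    """Marker: this model returns predicted labels."""

class RandomTop5Model(Model, LabelModelMixin):
    """Toy model emitting random top-5 ImageNet class labels per sample."""

    def predict(self, inputs, targets):
        rng = np.random.default_rng(0)
        # (N, k) array with k=5, as expected by ModelResult.class_labels
        labels = rng.integers(0, 1000, size=(len(inputs), 5))
        return ModelResult(class_labels=labels)

result = RandomTop5Model().predict(np.zeros((8, 224, 224, 3)), targets=None)
```

The wrapper only needs to translate whatever the underlying model produces into the numpy arrays that ModelResult expects.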

class shifthappens.models.base.ModelResult(class_labels, confidences=None, uncertainties=None, ood_scores=None, features=None)

Bases: object

Outputs of a model after processing a batch of data.

Each model needs to return class labels that are compatible with the ILSVRC2012 labels. We follow the PyTorch convention regarding the ordering of labels.

Parameters
  • class_labels (ndarray) – (N, k) array containing the top-k predictions for each sample in the batch. The value of k can be chosen by the user and potentially influences which accuracy-based benchmarks the model can be run on. For standard ImageNet and ImageNet-C evaluation, choose at least k=5.

  • confidences (Optional[ndarray]) – optional, (N, 1000) confidences for each class. Standard PyTorch ImageNet class label order is expected for this array. Scores can be in the range -inf to inf.

  • uncertainties (Optional[ndarray]) – optional, (N, 1000) array of uncertainties for the different class predictions. Unlike the confidences, this is a measure of the certainty of the given confidences, as is common, e.g., in Bayesian deep neural networks.

  • ood_scores (Optional[ndarray]) – optional, (N,), score for interpreting the sample as an out-of-distribution class, in the range -inf to inf.

  • features (Optional[ndarray]) – optional, (N, d), where d can be arbitrary, feature representation used to arrive at the given predictions.
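The arrays above can be produced from raw network outputs with a few numpy operations. The sketch below uses random logits as a stand-in for the outputs of a hypothetical 1000-class model and extracts the (N, k) top-5 label array from the (N, 1000) confidence array.

```python
import numpy as np

# Random logits stand in for the outputs of a hypothetical 1000-class model.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 1000))  # batch of N=4 samples

# softmax over classes -> (N, 1000) confidences in PyTorch label order
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# (N, k) top-5 class labels per sample, highest confidence first
top5 = np.argsort(-probs, axis=1)[:, :5]
```

Both `top5` (as class_labels) and `probs` (as confidences) could then be passed to the ModelResult constructor.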

class shifthappens.models.base.PredictionTargets(class_labels=False, logits=False, confidences=False, uncertainties=False, ood_scores=False, features=False)

Bases: object

class shifthappens.models.base.Model

Bases: abc.ABC

Model base class.

predict(inputs, targets)
Parameters
  • inputs (np.ndarray) – Batch of images.

  • targets (PredictionTargets) – Indicates which kinds of targets should be predicted.

Return type

ModelResult

Returns

Prediction results for the given batch. Depending on the target arguments, this includes the predicted labels, class confidences, class uncertainties, OOD scores, and image features, all as ``np.ndarray``s.
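Since only the requested targets need to be returned, a predict implementation can skip any computation that the task did not ask for. The sketch below illustrates this pattern with a local stand-in for PredictionTargets (mirroring the constructor signature shown above) and dummy uniform confidences; it is an assumption-based illustration, not the library's implementation.

```python
import numpy as np

class PredictionTargets:
    """Local stand-in for shifthappens.models.base.PredictionTargets."""

    def __init__(self, class_labels=False, logits=False, confidences=False,
                 uncertainties=False, ood_scores=False, features=False):
        self.class_labels = class_labels
        self.logits = logits
        self.confidences = confidences
        self.uncertainties = uncertainties
        self.ood_scores = ood_scores
        self.features = features

def predict(inputs, targets):
    """Compute only the outputs the requested targets ask for."""
    # dummy uniform scores stand in for a real model's softmax outputs
    probs = np.full((len(inputs), 1000), 1.0 / 1000)
    out = {}
    if targets.class_labels:
        out["class_labels"] = np.argsort(-probs, axis=1)[:, :5]
    if targets.confidences:
        out["confidences"] = probs
    return out

targets = PredictionTargets(class_labels=True)
result = predict(np.zeros((2, 224, 224, 3)), targets)
```

Here only the labels are computed; the confidences branch is never executed because `targets.confidences` is False.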

class shifthappens.models.base.LabelModelMixin

Bases: object

Inherit from this class if your model returns predicted labels.

class shifthappens.models.base.ConfidenceModelMixin

Bases: object

Inherit from this class if your model returns confidences.

class shifthappens.models.base.UncertaintyModelMixin

Bases: object

Inherit from this class if your model returns uncertainties.

class shifthappens.models.base.OODScoreModelMixin

Bases: object

Inherit from this class if your model returns ood scores.

class shifthappens.models.base.FeaturesModelMixin

Bases: object

Inherit from this class if your model returns features.

Torchvision baselines.

class shifthappens.models.torchvision.ResNet50

Bases: shifthappens.models.base.Model, shifthappens.models.base.LabelModelMixin, shifthappens.models.base.ConfidenceModelMixin, shifthappens.models.base.FeaturesModelMixin

Reference implementation for a torchvision ResNet50 model.

predict(inputs, targets)
Parameters
  • inputs (np.ndarray) – Batch of images.

  • targets (PredictionTargets) – Indicates which kinds of targets should be predicted.

Returns

Prediction results for the given batch. Depending on the target arguments, this includes the predicted labels, class confidences, class uncertainties, OOD scores, and image features, all as ``np.ndarray``s.

Task implementations

Base definition of a task in the shift-happens benchmark.

class shifthappens.tasks.base.Task

Bases: abc.ABC

Task base class.

evaluate(model)

Validates that the model is compatible with the task and then evaluates the model’s performance using the _evaluate function of this class.

Return type

Optional[Dict[str, float]]

abstract _evaluate(model)

Implement this function to evaluate the task and return a dictionary with the calculated metrics.

Return type

Dict[str, float]
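A new task therefore only has to implement _evaluate and return its metrics as a dictionary. The sketch below shows the pattern with a local stand-in for the Task base class and a toy accuracy task; the model interface is simplified to a plain predict(inputs) call for brevity, so this is an illustration of the evaluate/_evaluate split rather than the library's actual code.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional

import numpy as np

class Task(ABC):
    """Local stand-in for shifthappens.tasks.base.Task (sketch only)."""

    def evaluate(self, model) -> Optional[Dict[str, float]]:
        # the real class first validates model/task compatibility here
        return self._evaluate(model)

    @abstractmethod
    def _evaluate(self, model) -> Dict[str, float]:
        ...

class AccuracyTask(Task):
    """Toy task: top-1 accuracy of a model on a stored dataset."""

    def __init__(self, inputs, labels):
        self.inputs = np.asarray(inputs)
        self.labels = np.asarray(labels)

    def _evaluate(self, model) -> Dict[str, float]:
        preds = model.predict(self.inputs)
        return {"accuracy": float((preds == self.labels).mean())}

class ZeroModel:
    """Toy model that predicts class 0 for every sample."""

    def predict(self, inputs):
        return np.zeros(len(inputs), dtype=int)

task = AccuracyTask(inputs=np.zeros((4, 3)), labels=[0, 0, 1, 2])
metrics = task.evaluate(ZeroModel())  # {'accuracy': 0.5}
```

Callers always go through evaluate, which lets the base class run its compatibility checks before the task-specific _evaluate executes.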

class shifthappens.tasks.base.LabelTaskMixin

Bases: object

Indicates that the task requires the model to return the predicted labels.

class shifthappens.tasks.base.ConfidenceTaskMixin

Bases: object

Indicates that the task requires the model to return the confidence scores.

class shifthappens.tasks.base.UncertaintyTaskMixin

Bases: object

Indicates that the task requires the model to return the uncertainty scores.

class shifthappens.tasks.base.OODScoreTaskMixin

Bases: object

Indicates that the task requires the model to return the OOD scores.

class shifthappens.tasks.base.FeaturesTaskMixin

Bases: object

Indicates that the task requires the model to return the raw features.
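These marker mixins carry no behavior of their own; they let the benchmark pair each task's requirements with a model's capabilities via isinstance checks. The sketch below illustrates one plausible way such a compatibility check could work, using local stand-ins for the mixin classes; the actual logic inside Task.evaluate may differ.

```python
# Local stand-ins for the marker mixins; real code would import them from
# shifthappens.models.base and shifthappens.tasks.base.

class LabelModelMixin: pass
class ConfidenceModelMixin: pass
class LabelTaskMixin: pass
class ConfidenceTaskMixin: pass

# hypothetical pairing of task requirements to model capabilities
REQUIREMENTS = {
    LabelTaskMixin: LabelModelMixin,
    ConfidenceTaskMixin: ConfidenceModelMixin,
}

def is_compatible(task, model):
    """True iff the model provides every output the task requires."""
    for task_mixin, model_mixin in REQUIREMENTS.items():
        if isinstance(task, task_mixin) and not isinstance(model, model_mixin):
            return False
    return True

class LabelTask(LabelTaskMixin): pass
class ConfidenceTask(ConfidenceTaskMixin): pass
class LabelOnlyModel(LabelModelMixin): pass

ok = is_compatible(LabelTask(), LabelOnlyModel())          # True
missing = is_compatible(ConfidenceTask(), LabelOnlyModel())  # False
```

A model that only inherits LabelModelMixin is thus accepted by label-based tasks but rejected by tasks that require confidences.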