How to use F1 score and Accuracy for binary classification? #1087
Hi! I'm having some doubts about a task I'm facing. I'm doing binary image classification and I have a question about the metrics. In my task I have two classes, which I defined as 0 and 1. Initially, I used the torchmetrics modules with the default arguments, but F1 score and Accuracy always returned identical values. Now, I changed the metrics' code in this way:

```python
self.train_accuracy = torchmetrics.Accuracy(num_classes=1, average='macro', multiclass=False)
self.test_accuracy = torchmetrics.Accuracy(num_classes=1, average='macro', multiclass=False)
self.val_accuracy = torchmetrics.Accuracy(num_classes=1, average='macro', multiclass=False)
self.val_MCC = torchmetrics.MatthewsCorrCoef(num_classes=2, multiclass=False)
self.train_MCC = torchmetrics.MatthewsCorrCoef(num_classes=2, multiclass=False)
self.test_MCC = torchmetrics.MatthewsCorrCoef(num_classes=2, multiclass=False)
self.train_recall = torchmetrics.Recall(num_classes=1, average='macro', multiclass=False)
self.test_recall = torchmetrics.Recall(num_classes=1, average='macro', multiclass=False)
self.train_f1 = torchmetrics.F1Score(num_classes=1, average='macro', multiclass=False)
self.test_f1 = torchmetrics.F1Score(num_classes=1, average='macro', multiclass=False)
```

and even in this case I always get the same values. I'm confused about the `num_classes` and `multiclass` arguments. One more piece of information: in the training (and validation) step I do the following:

```python
x, y = batch
y_hat = self(x)
loss = F.cross_entropy(y_hat, y)
output = torch.argmax(y_hat, dim=1)
```

and then I simply pass the output to the metrics. Thank you, have a nice day!
Replies: 2 comments
And where do you call the metrics?
@aedoardo: the reason F1 and Accuracy gave identical values is that with `torch.argmax` on binary outputs + `average="macro"` + a balanced dataset, they can mathematically converge. But the deeper issue was the old API's confusing `num_classes`/`multiclass` interaction.

The fix with today's API (v1.9.0):

```python
from torchmetrics import MetricCollection
from torchmetrics.classification import BinaryAccuracy, BinaryF1Score

metrics = MetricCollection({
    "acc": BinaryAccuracy(),
    "f1": BinaryF1Score(),
})

# In your step:
y_hat = self(x)                     # (N, 2) logits
loss = F.cross_entropy(y_hat, y)
probs = y_hat.softmax(dim=1)[:, 1]  # probability of the positive class
metric_dict = metrics(probs, y)
self.log_dict(metric_dict)
```

With this setup, F1 and Accuracy will correctly differ. Docs: BinaryF1Score | BinaryAccuracy
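The convergence claim can be checked by hand. Below is a small, self-contained sketch (pure Python, no torchmetrics; the labels and predictions are illustrative, not taken from this thread) showing that plain accuracy and binary F1 genuinely diverge once the predictions are imbalanced:

```python
def accuracy(preds, targets):
    # Fraction of predictions that match the targets.
    return sum(p == t for p, t in zip(preds, targets)) / len(targets)

def binary_f1(preds, targets):
    # F1 of the positive class (label 1): harmonic mean of precision and recall.
    tp = sum(p == 1 and t == 1 for p, t in zip(preds, targets))
    fp = sum(p == 1 and t == 0 for p, t in zip(preds, targets))
    fn = sum(p == 0 and t == 1 for p, t in zip(preds, targets))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

targets = [1, 1, 1, 0, 0, 0, 0, 0]  # imbalanced: 3 positives, 5 negatives
preds   = [1, 0, 0, 0, 0, 0, 0, 1]  # 1 TP, 1 FP, 2 FN, 4 TN

print(accuracy(preds, targets))   # 5/8 = 0.625
print(binary_f1(preds, targets))  # precision 0.5, recall 1/3 -> F1 = 0.4
```

Conversely, on a perfectly balanced dataset with symmetric errors (as many false positives as false negatives), precision, recall, F1, and accuracy all collapse to the same number, which is one way a setup like the original can keep reporting identical values.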