OAXMLC-Bench

Benchmarking Extreme Multi-Label Classification for Semantic Annotation with Multi-Taxonomy Datasets

Pietro Caforio*, Christophe Broillet*, Philippe Cudré-Mauroux, Julien Audiffren (University of Fribourg)
*Equal contribution

Extreme Multi-Label Classification (XMLC) is the task of predicting relevant labels from massive tag sets. Many real-world label collections are organized into taxonomies (hierarchical relationships), and recent methods show that leveraging taxonomic structure can boost performance. However, comprehensive evaluation remains challenging: it is hard to fairly compare methods and disentangle the effect of taxonomy from other dataset characteristics.

We introduce OAXMLC, a benchmark designed to evaluate how XMLC algorithms leverage taxonomic information across multiple tasks.

We provide two large XMLC datasets extracted from OpenAlex, each featuring two distinct taxonomies over the same documents. By benchmarking taxonomy-aware and taxonomy-agnostic methods, we analyze how tasks, datasets, and taxonomic properties impact performance.


Datasets

The benchmark is built on two large-scale XMLC datasets extracted from OpenAlex, each equipped with two independent taxonomies over the same documents: Topics (ASJC/CWTS-derived; 3 levels after extracting the domain-specific sub-taxonomy) and Concepts (MAG-derived; 5 hierarchical levels).

| Dataset / Taxonomy | Documents (N) | Labels (N) | Labels / doc (avg) | Labels / doc (median) |
|---|---|---|---|---|
| OAXMLC-CS | 3,725,870 | – | – | – |
| ↳ Topics | – | 775 | 3.6 | 3 |
| ↳ Concepts | – | 8,926 | 9.8 | 9 |
| OAXMLC-Med | 869,402 | – | – | – |
| ↳ Topics | – | 198 | 2.4 | 2 |
| ↳ Concepts | – | 2,453 | 4.1 | 4 |
| MAG-CS | 143,928 | 2,641 | 4.4 | 4 |
| EURLex | 51,000 | 4,492 | 10.4 | 10 |
| PubMed | 139,932 | 5,911 | 18.5 | 18 |

XMLC dataset statistics (Table 1 in the paper).

Metrics

We report both thresholded classification metrics (Micro/Macro P, R, F1) and ranking metrics (P@k, R@k, nDCG@k). For classification metrics, scores are binarized using a threshold selected on the validation set (e.g., maximizing Micro-F1).
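As an illustration, the threshold-selection step can be sketched as a small grid search over candidate thresholds, keeping the one that maximizes Micro-F1 on validation predictions. This is a minimal NumPy sketch; the function name `select_threshold` and the candidate grid are our own choices, not part of the benchmark code.

```python
import numpy as np

def select_threshold(scores, y_true, candidates=np.linspace(0.05, 0.95, 19)):
    """Pick the score threshold that maximizes Micro-F1 on a validation set.

    scores : (N, L) array of predicted label scores
    y_true : (N, L) binary ground-truth matrix
    """
    best_t, best_f1 = candidates[0], -1.0
    for t in candidates:
        y_pred = (scores >= t).astype(int)
        # Micro-averaging pools TP/FP/FN over all labels and documents.
        tp = np.sum((y_pred == 1) & (y_true == 1))
        fp = np.sum((y_pred == 1) & (y_true == 0))
        fn = np.sum((y_pred == 0) & (y_true == 1))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

At test time, the threshold chosen here is frozen and applied to the test-set scores.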

Notation

Let \(N\) be the number of documents and \(L\) the number of labels. For document \(i\), the set of true labels is \(Y_i\), and the predicted ranking is \(\hat{\pi}_i=(\hat{\pi}_{i,1},\hat{\pi}_{i,2},\dots)\). After thresholding, \(\hat{y}_{i\ell}\in\{0,1\}\) denotes whether label \(\ell\) is predicted for document \(i\), and \(y_{i\ell}\in\{0,1\}\) is the ground-truth.

Each metric is defined as follows:
Micro-P / Micro-R / Micro-F1 \[ \mathrm{TP}_\ell=\sum_{i=1}^N \mathbb{1}[\hat{y}_{i\ell}=1 \wedge y_{i\ell}=1],\quad \mathrm{FP}_\ell=\sum_{i=1}^N \mathbb{1}[\hat{y}_{i\ell}=1 \wedge y_{i\ell}=0],\quad \mathrm{FN}_\ell=\sum_{i=1}^N \mathbb{1}[\hat{y}_{i\ell}=0 \wedge y_{i\ell}=1] \] \[ P_{\mu}=\frac{\sum_{\ell=1}^{L}\mathrm{TP}_{\ell}}{\sum_{\ell=1}^{L}(\mathrm{TP}_{\ell}+\mathrm{FP}_{\ell})},\quad R_{\mu}=\frac{\sum_{\ell=1}^{L}\mathrm{TP}_{\ell}}{\sum_{\ell=1}^{L}(\mathrm{TP}_{\ell}+\mathrm{FN}_{\ell})},\quad F1_{\mu}=\frac{2P_{\mu}R_{\mu}}{P_{\mu}+R_{\mu}} \]
Macro-P / Macro-R / Macro-F1 \[ P_{\ell}=\frac{\mathrm{TP}_{\ell}}{\mathrm{TP}_{\ell}+\mathrm{FP}_{\ell}},\quad R_{\ell}=\frac{\mathrm{TP}_{\ell}}{\mathrm{TP}_{\ell}+\mathrm{FN}_{\ell}},\quad F1_{\ell}=\frac{2P_{\ell}R_{\ell}}{P_{\ell}+R_{\ell}} \] \[ P_{M}=\frac{1}{L}\sum_{\ell=1}^{L}P_{\ell},\quad R_{M}=\frac{1}{L}\sum_{\ell=1}^{L}R_{\ell},\quad F1_{M}=\frac{1}{L}\sum_{\ell=1}^{L}F1_{\ell} \]
P@k \[ P@k=\frac{1}{N}\sum_{i=1}^{N}\frac{1}{k}\sum_{j=1}^{k}\mathbb{1}[\hat{\pi}_{i,j}\in Y_i] \]
R@k \[ R@k=\frac{1}{N}\sum_{i=1}^{N}\frac{1}{|Y_i|}\sum_{j=1}^{k}\mathbb{1}[\hat{\pi}_{i,j}\in Y_i] \]
nDCG@k (N@k) \[ \mathrm{rel}_{i,j}=\mathbb{1}[\hat{\pi}_{i,j}\in Y_i],\quad \mathrm{DCG}@k(i)=\sum_{j=1}^{k}\frac{\mathrm{rel}_{i,j}}{\log_2(j+1)},\quad \mathrm{IDCG}@k(i)=\sum_{j=1}^{\min(k,|Y_i|)}\frac{1}{\log_2(j+1)} \] \[ \mathrm{nDCG}@k=\frac{1}{N}\sum_{i=1}^{N}\frac{\mathrm{DCG}@k(i)}{\mathrm{IDCG}@k(i)} \]

Note: when computing ranking metrics at cutoff \(k\), you may restrict to documents with \(|Y_i|\ge k\). In that case, P@3 can be larger than P@2 because they are averaged over different subsets.
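The ranking metrics follow directly from the definitions above. A minimal sketch (the helper `ranking_metrics` is hypothetical, not the benchmark's evaluation code; it averages over all documents, without the optional \(|Y_i|\ge k\) restriction):

```python
import math

def ranking_metrics(ranked, true_labels, k):
    """Compute P@k, R@k, nDCG@k averaged over documents.

    ranked      : list of per-document label rankings (best first)
    true_labels : list of per-document sets of true labels
    """
    n = len(ranked)
    p = r = ndcg = 0.0
    for pi, y in zip(ranked, true_labels):
        # rel[j] = 1 iff the j-th ranked label is a true label
        rel = [1 if lab in y else 0 for lab in pi[:k]]
        p += sum(rel) / k
        r += sum(rel) / len(y)
        dcg = sum(rl / math.log2(j + 2) for j, rl in enumerate(rel))
        idcg = sum(1 / math.log2(j + 2) for j in range(min(k, len(y))))
        ndcg += dcg / idcg
    return p / n, r / n, ndcg / n
```

For example, with ranking `["a", "b", "c"]` and true labels `{"a", "c"}` at `k = 3`, P@3 is 2/3 and R@3 is 1.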

Tasks

The OAXMLC benchmark evaluates Extreme Multi-Label Classification (XMLC) methods across multiple tasks, each probing a distinct capability of taxonomy-aware and taxonomy-agnostic models. All tasks assume the taxonomy completion hypothesis: if a document is annotated with a label, it is implicitly associated with all of its ancestor labels in the taxonomy.
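The taxonomy completion hypothesis amounts to closing each label set upward through the taxonomy. A minimal sketch, where the label names in the example are hypothetical and `parent` maps each label to its parent (roots are absent from the map):

```python
def close_upward(labels, parent):
    """Expand a label set with all of its ancestors
    (the taxonomy completion hypothesis)."""
    closed = set(labels)
    for lab in labels:
        # Walk up the taxonomy until a root is reached.
        while lab in parent:
            lab = parent[lab]
            closed.add(lab)
    return closed
```

For example, if `"cnn"` has parent `"deep_learning"`, which has parent `"machine_learning"`, a document tagged only with `"cnn"` is implicitly tagged with all three.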

XML Classification (XMLCl)

The standard XMLC setup: predict the full set of relevant labels for unseen documents using only their text. Models are trained on fully annotated documents and evaluated on a held-out test set.

XML Completion (XMLCo)

Documents are partially annotated: each document is initially provided with general labels (e.g., level-1), and the goal is to predict missing, more specific labels at deeper levels. This reflects realistic settings with incomplete annotations or evolving taxonomies.
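The completion setup can be sketched as splitting each (upward-closed) label set by taxonomy depth: labels at or above the kept level are given as input, deeper labels are the prediction targets. A minimal illustration with hypothetical label names; `level` maps each label to its depth (1 = most general):

```python
def completion_split(doc_labels, level, keep_level=1):
    """Split a document's label set into observed general labels
    and the deeper labels a completion model must predict."""
    observed = {lab for lab in doc_labels if level[lab] <= keep_level}
    targets = doc_labels - observed
    return observed, targets
```

With `level = {"ml": 1, "dl": 2, "cnn": 3}`, a document labeled `{"ml", "dl", "cnn"}` yields observed `{"ml"}` and targets `{"dl", "cnn"}`.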

XML Few-Shot Learning (XMLFS)

This task evaluates adaptation to new labels. During training, an entire subtree of the taxonomy is withheld and its labels are removed from the training annotations. After convergence, models are given a small number of labeled examples from the withheld subtree and fine-tuned on them. We report performance both before and after fine-tuning.
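The data preparation for this task can be sketched as follows (a simplified illustration under our own assumptions; `withhold_subtree` and the tuple-based dataset format are hypothetical, not the benchmark's actual pipeline):

```python
def withhold_subtree(dataset, subtree, shots):
    """Split data for the few-shot task: strip the withheld subtree's labels
    from training annotations, and reserve `shots` documents that carry them
    for the fine-tuning phase.

    dataset : list of (doc_id, label_set) pairs
    subtree : set of withheld labels
    """
    train, fewshot = [], []
    for doc_id, labels in dataset:
        if labels & subtree and len(fewshot) < shots:
            fewshot.append((doc_id, labels))          # full labels kept for FT
        else:
            train.append((doc_id, labels - subtree))  # withheld labels removed
    return train, fewshot
```

Evaluating the converged model before fine-tuning, then again after fine-tuning on the `fewshot` split, gives the "no FT" and "FT" columns reported below.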

Benchmarked methods

| Method | Venue / Publication | Year | Algorithm Type |
|---|---|---|---|
| MATCH | WWW | 2021 | Deep learning, taxonomy-aware (Transformer) |
| XML-CNN | SIGIR | 2017 | Deep learning (CNN-based) |
| AttentionXML | NeurIPS | 2019 | Deep learning, label-tree attention |
| FastXML | KDD | 2014 | Tree-based, non-deep learning |
| HECTOR | WWW | 2024 | Deep learning, taxonomy-aware (Seq2Seq) |
| TAMLEC | CIKM / arXiv | 2024–2025 | Deep learning, taxonomy-aware (parallel / path-based) |
| LightXML | AAAI | 2021 | Deep learning (Transformer, negative sampling) |
| CascadeXML | NeurIPS | 2022 | Deep learning (multi-resolution Transformer) |
| Parabel | WWW | 2018 | Tree-based, embedding-based |
| NGAME | WSDM | 2023 | Deep learning, Siamese / metric learning |
| DEXA | KDD | 2023 | Deep learning, Siamese with auxiliary parameters |

Tables

Radar Summary by Task

Top methods per metric and task, ranked by average over the four axes (OAXMLC-CS Topics, OAXMLC-CS Concepts, OAXMLC-Med Topics, OAXMLC-Med Concepts). Per-level entries average L1/L2/L3; few-shot entries use the task setting after fine-tuning (FT). "–" means not reported.

| Metric | Classification | Classification (per level) | Completion | Few-shot (task, FT) |
|---|---|---|---|---|
| P@1 | AttentionXML, LightXML, MATCH | LightXML, MATCH, CascadeXML | TAMLeC, Hector, AttentionXML | MATCH, AttentionXML, LightXML |
| R@1 | AttentionXML, MATCH, LightXML | MATCH, LightXML, XML-CNN | TAMLeC, Hector, AttentionXML | MATCH, AttentionXML, LightXML |
| P@2 | AttentionXML, LightXML, MATCH | LightXML, MATCH, CascadeXML | TAMLeC, Hector | MATCH, LightXML, AttentionXML |
| R@2 | AttentionXML, MATCH, LightXML | MATCH, LightXML, XML-CNN | TAMLeC, Hector | MATCH, AttentionXML, LightXML |
| P@3 | AttentionXML, LightXML, MATCH | LightXML, MATCH, CascadeXML | TAMLeC, Hector, AttentionXML | – |
| nDCG@3 | AttentionXML, LightXML, MATCH | LightXML, MATCH, CascadeXML | TAMLeC, Hector, AttentionXML | – |
| Micro-P | Hector, XML-CNN, LightXML | XML-CNN, LightXML, CascadeXML | Hector, TAMLeC, XML-CNN | MATCH, AttentionXML, LightXML |
| Micro-R | DEXA, NGAME, AttentionXML | DEXA, NGAME, MATCH | TAMLeC, DEXA, AttentionXML | MATCH, AttentionXML, LightXML |
| Micro-F1 | AttentionXML, MATCH, LightXML | MATCH, LightXML, CascadeXML | TAMLeC, Hector, AttentionXML | MATCH, AttentionXML, LightXML |
| Macro-P | MATCH, AttentionXML, CascadeXML | MATCH, LightXML, CascadeXML | Hector, TAMLeC, MATCH | MATCH, AttentionXML, LightXML |
| Macro-R | DEXA, NGAME, AttentionXML | NGAME, DEXA, MATCH | TAMLeC, Hector, DEXA | MATCH, AttentionXML, LightXML |
| Macro-F1 | AttentionXML, MATCH, LightXML | MATCH, LightXML, CascadeXML | TAMLeC, Hector, AttentionXML | MATCH, AttentionXML, LightXML |

OAXMLC-CS Concepts

Radar Summary

Top 3 methods per metric for OAXMLC-CS Concepts, ranked by average over the four axes (Classification, Per-level, Completion, Few-shot). Per-level averages L1/L2/L3; few-shot uses the task setting after fine-tuning (FT).

| Metric | Top 3 methods |
|---|---|
| P@1 | MATCH, AttentionXML, LightXML |
| R@1 | MATCH, AttentionXML, LightXML |
| P@3 | MATCH, AttentionXML, LightXML |
| nDCG@3 | MATCH, AttentionXML, LightXML |
| Micro-P | XML-CNN, LightXML, AttentionXML |
| Micro-R | MATCH, AttentionXML, LightXML |
| Micro-F1 | MATCH, AttentionXML, LightXML |
| Macro-P | MATCH, AttentionXML, LightXML |
| Macro-R | MATCH, AttentionXML, LightXML |
| Macro-F1 | MATCH, AttentionXML, LightXML |

Classification

| Method | P@1 | R@1 | P@2 | R@2 | P@3 | nDCG@3 | Micro-P | Micro-R | Micro-F1 | Macro-P | Macro-R | Macro-F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FastXML | 0.4357 | 0.0151 | 0.4158 | 0.0071 | 0.4092 | 0.4323 | 0.7718 | 0.0771 | 0.1402 | 0.0767 | 0.0044 | 0.0083 |
| Parabel | 0.5702 | 0.2749 | 0.5484 | 0.0220 | 0.5488 | 0.5565 | 0.4483 | 0.4338 | 0.4409 | 0.3494 | 0.0793 | 0.1293 |
| CascadeXML | 0.9263 | 0.7835 | 0.9025 | 0.0727 | 0.8871 | 0.8995 | 0.8013 | 0.5022 | 0.6174 | 0.4155 | 0.1509 | 0.2214 |
| AttentionXML | 0.9237 | 0.7947 | 0.9006 | 0.1736 | 0.8853 | 0.8976 | 0.7989 | 0.5091 | 0.6219 | 0.4135 | 0.1494 | 0.2195 |
| LightXML | 0.9220 | 0.8019 | 0.8993 | 0.1429 | 0.8834 | 0.8958 | 0.8094 | 0.4859 | 0.6072 | 0.3245 | 0.1044 | 0.1580 |
| MATCH | 0.9233 | 0.7933 | 0.9012 | 0.2087 | 0.8860 | 0.8980 | 0.7971 | 0.5053 | 0.6185 | 0.5220 | 0.1796 | 0.2673 |
| XML-CNN | 0.9140 | 0.7895 | 0.8860 | 0.1885 | 0.8675 | 0.8822 | 0.8196 | 0.4318 | 0.5656 | 0.3960 | 0.0862 | 0.1416 |
| Hector | 0.8805 | 0.6872 | 0.8476 | 0.0552 | 0.8201 | 0.8388 | 0.9677 | 0.0013 | 0.0026 | 0.0055 | 0.0001 | 0.0001 |
| DEXA | 0.5997 | 0.3026 | 0.5004 | 0.0302 | 0.4557 | 0.4895 | 0.0604 | 0.5803 | 0.1094 | 0.0010 | 0.0145 | 0.0019 |
| NGAME | 0.5768 | 0.3680 | 0.4185 | 0.0322 | 0.3895 | 0.4282 | 0.0600 | 0.5769 | 0.1088 | 0.0008 | 0.0144 | 0.0016 |

Classification (per level)

Level 1 (L1)

| Method | P@1 | R@1 | P@2 | R@2 | P@3 | nDCG@3 | Micro-P | Micro-R | Micro-F1 | Macro-P | Macro-R | Macro-F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FastXML | 0.7455 | 0.5931 | 0.7156 | 0.2713 | 0.7015 | 0.7249 | 0.6055 | 0.5252 | 0.5625 | 0.6751 | 0.2645 | 0.3800 |
| Parabel | 0.4987 | 0.4553 | 0.5219 | 0.2285 | 0.5558 | 0.5538 | 0.4365 | 0.6757 | 0.5304 | 0.3755 | 0.4716 | 0.4181 |
| CascadeXML | 0.9277 | 0.8464 | 0.9087 | 0.4064 | 0.8872 | 0.9052 | 0.8276 | 0.6569 | 0.7324 | 0.7836 | 0.5481 | 0.6451 |
| AttentionXML | 0.9247 | 0.8454 | 0.9058 | 0.4064 | 0.8839 | 0.9023 | 0.8336 | 0.6435 | 0.7263 | 0.7873 | 0.5391 | 0.6400 |
| LightXML | 0.9257 | 0.8487 | 0.9062 | 0.4070 | 0.8838 | 0.9023 | 0.8277 | 0.6484 | 0.7272 | 0.7980 | 0.5240 | 0.6326 |
| MATCH | 0.9263 | 0.8481 | 0.9067 | 0.4071 | 0.8844 | 0.9028 | 0.8208 | 0.6592 | 0.7312 | 0.7765 | 0.5543 | 0.6468 |
| XML-CNN | 0.9134 | 0.8288 | 0.8931 | 0.3961 | 0.8695 | 0.8896 | 0.8218 | 0.6148 | 0.7034 | 0.8133 | 0.4694 | 0.5953 |
| Hector | 0.8805 | 0.7832 | 0.8653 | 0.3737 | 0.8448 | 0.8647 | 0.6556 | 0.7108 | 0.6821 | 0.6222 | 0.5619 | 0.5905 |
| DEXA | 0.6036 | 0.4592 | 0.5735 | 0.1814 | 0.5451 | 0.5673 | 0.1679 | 0.9324 | 0.2845 | 0.1439 | 0.7995 | 0.2439 |
| NGAME | 0.5768 | 0.4005 | 0.4291 | 0.1548 | 0.4122 | 0.4485 | 0.1660 | 0.9471 | 0.2825 | 0.1381 | 0.8367 | 0.2370 |

Level 2 (L2)

| Method | P@1 | R@1 | P@2 | R@2 | P@3 | nDCG@3 | Micro-P | Micro-R | Micro-F1 | Macro-P | Macro-R | Macro-F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FastXML | 0.4357 | 0.0238 | 0.4060 | 0.0112 | 0.3912 | 0.4202 | 0.7800 | 0.0872 | 0.1569 | 0.0798 | 0.0041 | 0.0078 |
| Parabel | 0.4974 | 0.0098 | 0.4506 | 0.0043 | 0.4265 | 0.4619 | 0.4136 | 0.2830 | 0.3361 | 0.3367 | 0.0768 | 0.1251 |
| CascadeXML | 0.7891 | 0.0367 | 0.7206 | 0.0130 | 0.6714 | 0.7172 | 0.7999 | 0.3945 | 0.5284 | 0.4222 | 0.1442 | 0.2149 |
| AttentionXML | 0.7950 | 0.1275 | 0.7323 | 0.0535 | 0.6885 | 0.7318 | 0.7867 | 0.4136 | 0.5421 | 0.4381 | 0.1514 | 0.2251 |
| LightXML | 0.7887 | 0.0988 | 0.7194 | 0.0416 | 0.6728 | 0.7181 | 0.8073 | 0.3779 | 0.5148 | 0.3731 | 0.1172 | 0.1783 |
| MATCH | 0.7953 | 0.1438 | 0.7273 | 0.0579 | 0.6800 | 0.7250 | 0.7944 | 0.3985 | 0.5308 | 0.5349 | 0.1789 | 0.2681 |
| XML-CNN | 0.7585 | 0.1172 | 0.6868 | 0.0477 | 0.6420 | 0.6880 | 0.8381 | 0.2995 | 0.4413 | 0.4132 | 0.0850 | 0.1410 |
| Hector | 0.6854 | 0.0271 | 0.6179 | 0.0097 | 0.5747 | 0.6188 | 0.6902 | 0.2886 | 0.4070 | 0.2148 | 0.0645 | 0.0992 |
| DEXA | 0.0757 | 0.0110 | 0.0738 | 0.0049 | 0.0668 | 0.0723 | 0.0238 | 0.4096 | 0.0449 | 0.0008 | 0.0247 | 0.0016 |
| NGAME | 0.0336 | 0.0109 | 0.0473 | 0.0049 | 0.0489 | 0.0487 | 0.0227 | 0.3795 | 0.0428 | 0.0007 | 0.0248 | 0.0013 |

Level 3 (L3)

| Method | P@1 | R@1 | P@2 | R@2 | P@3 | nDCG@3 | Micro-P | Micro-R | Micro-F1 | Macro-P | Macro-R | Macro-F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FastXML | 0.3784 | 0.0101 | 0.4126 | 0.0072 | 0.4348 | 0.4658 | 0.7719 | 0.0648 | 0.1195 | 0.0693 | 0.0048 | 0.0089 |
| Parabel | 0.4172 | 0.0042 | 0.4286 | 0.0027 | 0.4340 | 0.4742 | 0.4758 | 0.2514 | 0.3290 | 0.3444 | 0.0857 | 0.1372 |
| CascadeXML | 0.6555 | 0.0082 | 0.6495 | 0.0047 | 0.6475 | 0.6918 | 0.7328 | 0.3694 | 0.4911 | 0.4196 | 0.1558 | 0.2272 |
| AttentionXML | 0.6862 | 0.0795 | 0.6840 | 0.0514 | 0.6858 | 0.7267 | 0.7273 | 0.3991 | 0.5154 | 0.4122 | 0.1534 | 0.2235 |
| LightXML | 0.6579 | 0.0538 | 0.6558 | 0.0373 | 0.6596 | 0.7024 | 0.7553 | 0.3418 | 0.4707 | 0.3200 | 0.1075 | 0.1609 |
| MATCH | 0.6792 | 0.0989 | 0.6716 | 0.0603 | 0.6707 | 0.7127 | 0.7378 | 0.3725 | 0.4950 | 0.5338 | 0.1870 | 0.2770 |
| XML-CNN | 0.6439 | 0.0825 | 0.6422 | 0.0525 | 0.6456 | 0.6880 | 0.7911 | 0.2783 | 0.4118 | 0.3985 | 0.0917 | 0.1492 |
| Hector | 0.4801 | 0.0076 | 0.4890 | 0.0041 | 0.4946 | 0.5318 | 0.4055 | 0.2337 | 0.2965 | 0.1811 | 0.0813 | 0.1122 |
| DEXA | 0.0504 | 0.0033 | 0.0445 | 0.0021 | 0.0409 | 0.0467 | 0.0166 | 0.1019 | 0.0286 | 0.0002 | 0.0048 | 0.0004 |
| NGAME | 0.0485 | 0.0033 | 0.0401 | 0.0021 | 0.0405 | 0.0470 | 0.0151 | 0.1122 | 0.0266 | 0.0001 | 0.0056 | 0.0002 |

Completion

| Method | P@1 | R@1 | P@2 | R@2 | P@3 | nDCG@3 | Micro-P | Micro-R | Micro-F1 | Macro-P | Macro-R | Macro-F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FastXML | 0.4357 | 0.0151 | – | – | 0.4092 | 0.4323 | 0.7718 | 0.0771 | 0.1402 | 0.0767 | 0.0044 | 0.0083 |
| Parabel | 0.5042 | 0.0055 | – | – | 0.4552 | 0.4838 | 0.4502 | 0.2548 | 0.3254 | 0.3493 | 0.0777 | 0.1271 |
| CascadeXML | 0.2954 | 0.0152 | – | – | 0.1489 | 0.1853 | 0.8081 | 0.0398 | 0.0759 | 0.2128 | 0.0260 | 0.0464 |
| AttentionXML | 0.7951 | 0.0870 | – | – | 0.7240 | 0.7570 | 0.7574 | 0.3993 | 0.5230 | 0.4119 | 0.1477 | 0.2174 |
| LightXML | 0.7881 | 0.0650 | – | – | 0.7083 | 0.7432 | 0.7835 | 0.3532 | 0.4869 | 0.3224 | 0.1026 | 0.1557 |
| MATCH | 0.7943 | 0.1085 | – | – | 0.7162 | 0.7506 | 0.7659 | 0.3797 | 0.5077 | 0.5208 | 0.1780 | 0.2653 |
| XML-CNN | 0.7590 | 0.0900 | – | – | 0.6813 | 0.7168 | 0.8158 | 0.2825 | 0.4196 | 0.3941 | 0.0845 | 0.1392 |
| Hector | 0.7105 | 0.0146 | 0.6867 | 0.0110 | 0.6894 | 0.7212 | 0.8833 | 0.3572 | 0.5087 | 0.6976 | 0.3631 | 0.4776 |
| DEXA | 0.0677 | 0.0085 | – | – | 0.0781 | 0.0753 | 0.0296 | 0.3589 | 0.0547 | 0.0006 | 0.0133 | 0.0011 |
| NGAME | 0.0628 | 0.0089 | – | – | 0.0910 | 0.0875 | 0.0289 | 0.3430 | 0.0533 | 0.0004 | 0.0131 | 0.0008 |
| TAMLeC | 0.7925 | 2.1286 | 0.7753 | 4.4814 | 0.7823 | 0.8137 | 0.8743 | 0.5483 | 0.6665 | 0.6711 | 0.4419 | 0.5299 |

Few-shot (global)

Before fine-tuning (no FT)

| Method | P@1 | R@1 | P@2 | R@2 | P@3 | nDCG@3 | Micro-P | Micro-R | Micro-F1 | Macro-P | Macro-R | Macro-F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CascadeXML | 0.1245 | 0.0049 | 0.0801 | 0.0026 | 0.0612 | 0.0806 | 0.7834 | 0.0174 | 0.0340 | 0.0689 | 0.0121 | 0.0206 |
| AttentionXML | 0.9086 | 0.0242 | 0.8883 | 0.0102 | 0.8714 | 0.8907 | 0.7926 | 0.6320 | 0.7032 | 0.5011 | 0.2555 | 0.3384 |
| LightXML | 0.8986 | 0.0153 | 0.8773 | 0.0065 | 0.8586 | 0.8796 | 0.8152 | 0.5691 | 0.6703 | 0.3913 | 0.1905 | 0.2562 |
| MATCH | 0.9053 | 0.0220 | 0.8877 | 0.0090 | 0.8721 | 0.8909 | 0.7985 | 0.6261 | 0.7018 | 0.5652 | 0.3147 | 0.4043 |
| XML-CNN | 0.6704 | 0.0261 | 0.6223 | 0.0110 | 0.5969 | 0.6328 | 0.8273 | 0.1967 | 0.3178 | 0.1193 | 0.0206 | 0.0352 |
| DEXA | 0.0001 | 0.0097 | 0.0001 | 0.0075 | 0.0000 | 0.0001 | 0.0010 | 0.0099 | 0.0019 | 0.0000 | 0.0119 | 0.0000 |

After fine-tuning (FT)

| Method | P@1 | R@1 | P@2 | R@2 | P@3 | nDCG@3 | Micro-P | Micro-R | Micro-F1 | Macro-P | Macro-R | Macro-F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CascadeXML | 0.1212 | 0.0049 | 0.0785 | 0.0026 | 0.0603 | 0.0792 | 0.7291 | 0.0173 | 0.0338 | 0.0603 | 0.0121 | 0.0201 |
| AttentionXML | 0.9068 | 0.1687 | 0.8886 | 0.0876 | 0.8734 | 0.8928 | 0.7654 | 0.6522 | 0.7043 | 0.5105 | 0.2461 | 0.3321 |
| LightXML | 0.8520 | 0.0656 | 0.8368 | 0.0318 | 0.8265 | 0.8472 | 0.7574 | 0.5568 | 0.6418 | 0.3685 | 0.1609 | 0.2240 |
| MATCH | 0.8757 | 0.1565 | 0.8600 | 0.0801 | 0.8480 | 0.8683 | 0.7012 | 0.6685 | 0.6844 | 0.5384 | 0.3126 | 0.3956 |
| XML-CNN | 0.5292 | 0.0199 | 0.5192 | 0.0097 | 0.5054 | 0.5344 | 0.7286 | 0.1264 | 0.2154 | 0.0670 | 0.0083 | 0.0148 |
| DEXA | 0.0001 | 0.0091 | 0.0001 | 0.0074 | 0.0001 | 0.0001 | 0.0001 | 0.0006 | 0.0001 | 0.0000 | 0.0118 | 0.0000 |

Few-shot (task)

Before fine-tuning (no FT)

| Method | P@1 | R@1 | P@2 | R@2 | P@3 | nDCG@3 | Micro-P | Micro-R | Micro-F1 | Macro-P | Macro-R | Macro-F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CascadeXML | 0.0074 | 0.0032 | 0.0113 | 0.0029 | 0.0121 | 0.0142 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| AttentionXML | 0.0040 | 0.0060 | 0.0071 | 0.0049 | 0.0111 | 0.0109 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| LightXML | 0.0048 | 0.0064 | 0.0084 | 0.0053 | 0.0121 | 0.0117 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| MATCH | 0.0300 | 0.0046 | 0.0420 | 0.0040 | 0.0286 | 0.0312 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| XML-CNN | 0.0055 | 0.0078 | 0.0094 | 0.0058 | 0.0131 | 0.0129 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| DEXA | 0.0012 | 0.0268 | 0.0013 | 0.0095 | 0.0008 | 0.0011 | 0.0010 | 0.4078 | 0.0021 | 0.0003 | 0.2427 | 0.0005 |

After fine-tuning (FT)

| Method | P@1 | R@1 | P@2 | R@2 | P@3 | nDCG@3 | Micro-P | Micro-R | Micro-F1 | Macro-P | Macro-R | Macro-F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CascadeXML | 0.0459 | 0.0033 | 0.0533 | 0.0029 | 0.0562 | 0.0708 | 0.7745 | 0.0134 | 0.0263 | 0.0715 | 0.0173 | 0.0278 |
| AttentionXML | 0.8936 | 0.6272 | 0.8550 | 0.2574 | 0.8770 | 0.9015 | 0.8476 | 0.5928 | 0.6977 | 0.5454 | 0.2142 | 0.3076 |
| LightXML | 0.9065 | 0.5747 | 0.8613 | 0.2189 | 0.8784 | 0.9038 | 0.8657 | 0.5994 | 0.7084 | 0.5012 | 0.1847 | 0.2700 |
| MATCH | 0.9407 | 0.7781 | 0.9140 | 0.3323 | 0.9212 | 0.9370 | 0.8359 | 0.7589 | 0.7956 | 0.7196 | 0.4877 | 0.5814 |
| XML-CNN | 0.6462 | 0.1059 | 0.6184 | 0.0247 | 0.6714 | 0.7281 | 0.8229 | 0.1481 | 0.2510 | 0.0085 | 0.0019 | 0.0031 |
| DEXA | 0.0049 | 0.1039 | 0.4346 | 0.0176 | 0.3337 | 0.2927 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |

OAXMLC-CS Topics

Radar Summary

Top methods per metric for OAXMLC-CS Topics, ranked by average over the four axes (Classification, Per-level, Completion, Few-shot). Per-level averages L1/L2/L3; few-shot uses the task setting after fine-tuning (FT).

| Metric | Top methods |
|---|---|
| P@1 | MATCH, LightXML, AttentionXML |
| R@1 | MATCH, LightXML, AttentionXML |
| P@2 | Hector |
| R@2 | Hector |
| P@3 | MATCH, AttentionXML, LightXML |
| nDCG@3 | MATCH, LightXML, AttentionXML |
| Micro-P | XML-CNN, LightXML, MATCH |
| Micro-R | MATCH, AttentionXML, LightXML |
| Micro-F1 | MATCH, LightXML, AttentionXML |
| Macro-P | MATCH, Hector, LightXML |
| Macro-R | MATCH, AttentionXML, LightXML |
| Macro-F1 | MATCH, AttentionXML, LightXML |

Classification

| Method | P@1 | R@1 | P@2 | R@2 | P@3 | nDCG@3 | Micro-P | Micro-R | Micro-F1 | Macro-P | Macro-R | Macro-F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FastXML | 0.6276 | 0.3618 | 0.5180 | 0.0481 | 0.5353 | 0.5745 | 0.6280 | 0.2348 | 0.3418 | 0.2795 | 0.0178 | 0.0335 |
| Parabel | 0.5906 | 0.4009 | 0.5035 | 0.0681 | 0.5223 | 0.5572 | 0.4552 | 0.3686 | 0.4074 | 0.4245 | 0.1721 | 0.2449 |
| CascadeXML | 0.8442 | 0.5833 | 0.7826 | 0.1471 | 0.7761 | 0.8078 | 0.7827 | 0.5533 | 0.6483 | 0.6296 | 0.3023 | 0.4085 |
| AttentionXML | 0.8800 | 0.6854 | 0.8331 | 0.2603 | 0.8327 | 0.8585 | 0.7775 | 0.6740 | 0.7214 | 0.6503 | 0.5179 | 0.5745 |
| LightXML | 0.8808 | 0.6842 | 0.8334 | 0.2563 | 0.8333 | 0.8591 | 0.8047 | 0.6488 | 0.7184 | 0.6997 | 0.4519 | 0.5488 |
| MATCH | 0.8834 | 0.7079 | 0.8372 | 0.2717 | 0.8355 | 0.8612 | 0.8012 | 0.6581 | 0.7226 | 0.7037 | 0.4726 | 0.5655 |
| XML-CNN | 0.8538 | 0.6742 | 0.7938 | 0.2222 | 0.7989 | 0.8283 | 0.8105 | 0.5512 | 0.6562 | 0.7066 | 0.2961 | 0.4173 |
| Hector | 0.8767 | 0.6362 | 0.8246 | 0.1349 | 0.7173 | 0.7752 | 0.7713 | 0.3251 | 0.4572 | 0.5434 | 0.1919 | 0.2832 |
| DEXA | 0.3638 | 0.1360 | 0.3546 | 0.0173 | 0.3677 | 0.3863 | 0.0258 | 0.7236 | 0.0498 | 0.0069 | 0.1864 | 0.0133 |
| NGAME | 0.2891 | 0.2553 | 0.1981 | 0.0167 | 0.3028 | 0.3026 | 0.0265 | 0.7437 | 0.0512 | 0.0043 | 0.1626 | 0.0084 |

Classification (per level)

Level 1 (L1)

| Method | P@1 | R@1 | P@2 | R@2 | P@3 | nDCG@3 | Micro-P | Micro-R | Micro-F1 | Macro-P | Macro-R | Macro-F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FastXML | 0.6276 | 0.5441 | 0.6200 | 0.2738 | 0.6467 | 0.6731 | 0.6104 | 0.5016 | 0.5506 | 0.6476 | 0.3251 | 0.4329 |
| CascadeXML | 0.8466 | 0.7798 | 0.8123 | 0.3746 | 0.8050 | 0.8329 | 0.8153 | 0.7038 | 0.7555 | 0.7878 | 0.6293 | 0.6997 |
| AttentionXML | 0.8819 | 0.8341 | 0.8525 | 0.4020 | 0.8398 | 0.8664 | 0.8212 | 0.7812 | 0.8004 | 0.7938 | 0.7243 | 0.7571 |
| LightXML | 0.8827 | 0.8345 | 0.8545 | 0.4030 | 0.8418 | 0.8680 | 0.8344 | 0.7715 | 0.8017 | 0.8101 | 0.7106 | 0.7570 |
| MATCH | 0.8864 | 0.8420 | 0.8573 | 0.4055 | 0.8451 | 0.8715 | 0.8379 | 0.7744 | 0.8049 | 0.8150 | 0.7132 | 0.7607 |
| XML-CNN | 0.8505 | 0.7962 | 0.8192 | 0.3812 | 0.8059 | 0.8343 | 0.8228 | 0.7021 | 0.7577 | 0.8080 | 0.6190 | 0.7010 |
| Hector | 0.8767 | 0.8283 | 0.8456 | 0.3981 | 0.8334 | 0.8604 | 0.9173 | 0.5053 | 0.6516 | 0.9022 | 0.4426 | 0.5937 |
| DEXA | 0.3638 | 0.3384 | 0.4505 | 0.1758 | 0.5177 | 0.4896 | 0.1324 | 0.8912 | 0.2306 | 0.1907 | 0.9357 | 0.3168 |
| NGAME | 0.2891 | 0.3112 | 0.2894 | 0.1910 | 0.5186 | 0.5131 | 0.1331 | 1.0000 | 0.2349 | 0.1331 | 1.0000 | 0.2349 |

Level 2 (L2)

| Method | P@1 | R@1 | P@2 | R@2 | P@3 | nDCG@3 | Micro-P | Micro-R | Micro-F1 | Macro-P | Macro-R | Macro-F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FastXML | 0.4216 | 0.0595 | 0.4204 | 0.0365 | 0.4142 | 0.4442 | 0.7925 | 0.0519 | 0.0973 | 0.4973 | 0.0233 | 0.0445 |
| CascadeXML | 0.7047 | 0.1984 | 0.6629 | 0.1056 | 0.6335 | 0.6775 | 0.7496 | 0.4581 | 0.5687 | 0.6911 | 0.3534 | 0.4677 |
| AttentionXML | 0.7796 | 0.4130 | 0.7426 | 0.2081 | 0.7087 | 0.7517 | 0.7395 | 0.5990 | 0.6612 | 0.6820 | 0.5060 | 0.5800 |
| LightXML | 0.7788 | 0.4037 | 0.7414 | 0.2038 | 0.7078 | 0.7509 | 0.7759 | 0.5623 | 0.6520 | 0.7292 | 0.4640 | 0.5669 |
| MATCH | 0.7837 | 0.4390 | 0.7448 | 0.2167 | 0.7114 | 0.7544 | 0.7687 | 0.5766 | 0.6589 | 0.7326 | 0.4790 | 0.5792 |
| XML-CNN | 0.7353 | 0.3446 | 0.6998 | 0.1748 | 0.6689 | 0.7136 | 0.7954 | 0.4477 | 0.5730 | 0.7482 | 0.3291 | 0.4571 |
| Hector | 0.7651 | 0.1758 | 0.7244 | 0.0798 | 0.6911 | 0.7330 | 0.8910 | 0.1995 | 0.3258 | 0.8322 | 0.1796 | 0.2951 |
| DEXA | 0.0301 | 0.0168 | 0.0538 | 0.0115 | 0.0576 | 0.0563 | 0.0142 | 0.6394 | 0.0277 | 0.0070 | 0.3280 | 0.0137 |
| NGAME | 0.0271 | 0.0149 | 0.0297 | 0.0107 | 0.0261 | 0.0257 | 0.0133 | 0.5952 | 0.0260 | 0.0039 | 0.2914 | 0.0076 |

Level 3 (L3)

| Method | P@1 | R@1 | P@2 | R@2 | P@3 | nDCG@3 | Micro-P | Micro-R | Micro-F1 | Macro-P | Macro-R | Macro-F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FastXML | 0.4673 | 0.0527 | 0.5213 | 0.0309 | 0.5591 | 0.6078 | 0.9034 | 0.0038 | 0.0075 | 0.0395 | 0.0006 | 0.0011 |
| CascadeXML | 0.7021 | 0.0665 | 0.7023 | 0.0333 | 0.6901 | 0.7407 | 0.7537 | 0.3056 | 0.4349 | 0.5623 | 0.2392 | 0.3357 |
| AttentionXML | 0.9106 | 0.6704 | 0.8988 | 0.3197 | 0.8829 | 0.9109 | 0.6989 | 0.6805 | 0.6883 | 0.6012 | 0.5318 | 0.5615 |
| LightXML | 0.9000 | 0.6485 | 0.8844 | 0.3083 | 0.8703 | 0.8999 | 0.7715 | 0.6184 | 0.6862 | 0.6609 | 0.4378 | 0.5263 |
| MATCH | 0.9085 | 0.7231 | 0.8950 | 0.3470 | 0.8781 | 0.9065 | 0.7568 | 0.6259 | 0.6851 | 0.6701 | 0.4581 | 0.5442 |
| XML-CNN | 0.8437 | 0.5800 | 0.8212 | 0.2697 | 0.7952 | 0.8370 | 0.8223 | 0.4330 | 0.5672 | 0.6615 | 0.2515 | 0.3644 |
| Hector | 0.5888 | 0.0940 | 0.6268 | 0.0378 | 0.6328 | 0.6843 | 0.1121 | 0.2187 | 0.1482 | 0.2317 | 0.2097 | 0.2202 |
| DEXA | 0.1332 | 0.0091 | 0.1084 | 0.0061 | 0.0757 | 0.1060 | 0.0111 | 0.1392 | 0.0206 | 0.0001 | 0.0175 | 0.0003 |
| NGAME | 0.1437 | 0.0092 | 0.1120 | 0.0061 | 0.0779 | 0.1097 | 0.0180 | 0.1313 | 0.0317 | 0.0001 | 0.0033 | 0.0001 |

Completion

| Method | P@1 | R@1 | P@2 | R@2 | P@3 | nDCG@3 | Micro-P | Micro-R | Micro-F1 | Macro-P | Macro-R | Macro-F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FastXML | 0.4216 | 0.0518 | – | – | 0.4059 | 0.4357 | 0.7930 | 0.0487 | 0.0918 | 0.2727 | 0.0121 | 0.0232 |
| Parabel | 0.4819 | 0.0843 | – | – | 0.4511 | 0.4879 | 0.4980 | 0.2260 | 0.3109 | 0.4242 | 0.1675 | 0.2402 |
| CascadeXML | 0.7035 | 0.1822 | – | – | 0.6363 | 0.6783 | 0.7483 | 0.4454 | 0.5584 | 0.6263 | 0.2954 | 0.4014 |
| AttentionXML | 0.7792 | 0.3995 | – | – | 0.7304 | 0.7690 | 0.7401 | 0.6010 | 0.6624 | 0.6476 | 0.5141 | 0.5710 |
| LightXML | 0.7783 | 0.3888 | – | – | 0.7290 | 0.7677 | 0.7774 | 0.5641 | 0.6537 | 0.6976 | 0.4471 | 0.5447 |
| MATCH | 0.7829 | 0.4249 | – | – | 0.7319 | 0.7705 | 0.7688 | 0.5783 | 0.6600 | 0.7016 | 0.4682 | 0.5616 |
| XML-CNN | 0.7349 | 0.3335 | – | – | 0.6848 | 0.7261 | 0.7972 | 0.4462 | 0.5722 | 0.7047 | 0.2901 | 0.4110 |
| Hector | 0.8360 | 0.6558 | 0.7884 | 0.3028 | 0.7647 | 0.8004 | 0.9072 | 0.5491 | 0.6841 | 0.9166 | 0.7080 | 0.7989 |
| DEXA | 0.0301 | 0.0165 | – | – | 0.0571 | 0.0555 | 0.0141 | 0.6066 | 0.0276 | 0.0036 | 0.1727 | 0.0070 |
| TAMLeC | 0.8326 | 1.4552 | 0.8259 | 2.9705 | 0.8236 | 0.8478 | 0.8143 | 0.7432 | 0.7748 | 0.8586 | 0.8224 | 0.8394 |

Few-shot (global)

Before fine-tuning (no FT)

| Method | P@1 | R@1 | P@2 | R@2 | P@3 | nDCG@3 | Micro-P | Micro-R | Micro-F1 | Macro-P | Macro-R | Macro-F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CascadeXML | 0.7375 | 0.1042 | 0.7260 | 0.0761 | 0.7013 | 0.7420 | 0.7589 | 0.5338 | 0.6268 | 0.6303 | 0.3902 | 0.4820 |
| AttentionXML | 0.7348 | 0.0339 | 0.7261 | 0.0303 | 0.7042 | 0.7441 | 0.7541 | 0.5381 | 0.6278 | 0.6257 | 0.4303 | 0.5095 |
| LightXML | 0.7359 | 0.0339 | 0.7258 | 0.0303 | 0.7009 | 0.7419 | 0.7727 | 0.5172 | 0.6194 | 0.6355 | 0.3815 | 0.4765 |
| MATCH | 0.7400 | 0.0349 | 0.7280 | 0.0310 | 0.7034 | 0.7438 | 0.7446 | 0.5501 | 0.6327 | 0.6263 | 0.4273 | 0.5079 |
| XML-CNN | 0.6973 | 0.0540 | 0.6878 | 0.0466 | 0.6620 | 0.7043 | 0.7859 | 0.4249 | 0.5516 | 0.6390 | 0.2648 | 0.3743 |
| Hector | 0.7032 | 0.0832 | 0.5237 | 0.0471 | 0.5486 | 0.5995 | 0.5742 | 0.1813 | 0.2751 | 0.4746 | 0.1648 | 0.2440 |
| TAMLeC | 0.8142 | 2.6425 | 0.6280 | 5.6554 | 0.7158 | 0.7522 | 0.4804 | 0.7116 | 0.5687 | 0.4312 | 0.6827 | 0.5224 |

After fine-tuning (FT)

| Method | P@1 | R@1 | P@2 | R@2 | P@3 | nDCG@3 | Micro-P | Micro-R | Micro-F1 | Macro-P | Macro-R | Macro-F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CascadeXML | 0.5274 | 0.1499 | 0.5688 | 0.0922 | 0.5787 | 0.6108 | 0.5913 | 0.3622 | 0.4492 | 0.6659 | 0.2906 | 0.4045 |
| AttentionXML | 0.5702 | 0.2155 | 0.5905 | 0.1251 | 0.5953 | 0.6298 | 0.5916 | 0.4185 | 0.4901 | 0.6790 | 0.3541 | 0.4654 |
| LightXML | 0.5663 | 0.2158 | 0.5942 | 0.1285 | 0.5982 | 0.6340 | 0.6267 | 0.3757 | 0.4697 | 0.6701 | 0.3048 | 0.4188 |
| MATCH | 0.5533 | 0.2341 | 0.5796 | 0.1312 | 0.5875 | 0.6207 | 0.5433 | 0.4528 | 0.4939 | 0.6688 | 0.3829 | 0.4869 |
| XML-CNN | 0.3591 | 0.1019 | 0.4077 | 0.0659 | 0.4211 | 0.4434 | 0.5584 | 0.1152 | 0.1906 | 0.3306 | 0.0459 | 0.0805 |
| Hector | 0.2007 | 0.0311 | 0.1675 | 0.0180 | 0.1818 | 0.1943 | 0.1721 | 0.0376 | 0.0571 | 0.1637 | 0.0453 | 0.0438 |
| TAMLeC | 0.8142 | 2.6425 | 0.6280 | 5.6554 | 0.7158 | 0.7522 | 0.4804 | 0.7116 | 0.5687 | 0.4312 | 0.6827 | 0.5224 |

Few-shot (task)

Before fine-tuning (no FT)

| Method | P@1 | R@1 | P@2 | R@2 | P@3 | nDCG@3 | Micro-P | Micro-R | Micro-F1 | Macro-P | Macro-R | Macro-F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CascadeXML | 0.0273 | 0.0344 | 0.0757 | 0.0265 | 0.0956 | 0.0964 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| AttentionXML | 0.0257 | 0.0407 | 0.0496 | 0.0300 | 0.0646 | 0.0650 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| LightXML | 0.0272 | 0.0426 | 0.0555 | 0.0314 | 0.0681 | 0.0693 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| MATCH | 0.0154 | 0.0394 | 0.0396 | 0.0303 | 0.0416 | 0.0397 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| XML-CNN | 0.0306 | 0.0443 | 0.0479 | 0.0306 | 0.0618 | 0.0617 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| Hector | 0.0001 | 0.0239 | 0.0011 | 0.0234 | 0.0014 | 0.0016 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| TAMLeC | 0.0385 | 22.0694 | 0.0975 | 37.4409 | 0.1404 | 0.1532 | 0.0440 | 0.0009 | 0.0017 | 0.0009 | 0.0005 | 0.0007 |

After fine-tuning (FT)

| Method | P@1 | R@1 | P@2 | R@2 | P@3 | nDCG@3 | Micro-P | Micro-R | Micro-F1 | Macro-P | Macro-R | Macro-F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CascadeXML | 0.8106 | 0.6332 | 0.7497 | 0.2532 | 0.7043 | 0.7456 | 0.8731 | 0.6626 | 0.7534 | 0.4771 | 0.3048 | 0.3719 |
| AttentionXML | 0.7792 | 0.6122 | 0.7234 | 0.2724 | 0.6885 | 0.7215 | 0.8355 | 0.6374 | 0.7231 | 0.5474 | 0.3040 | 0.3908 |
| LightXML | 0.8094 | 0.6645 | 0.7492 | 0.2698 | 0.6927 | 0.7349 | 0.8811 | 0.6454 | 0.7450 | 0.4584 | 0.2889 | 0.3544 |
| MATCH | 0.8411 | 0.7220 | 0.8171 | 0.3459 | 0.7973 | 0.8267 | 0.8525 | 0.7561 | 0.8012 | 0.6840 | 0.5213 | 0.5914 |
| XML-CNN | 0.6912 | 0.5132 | 0.5932 | 0.1638 | 0.5458 | 0.5892 | 0.8974 | 0.3523 | 0.5042 | 0.3204 | 0.1129 | 0.1666 |
| Hector | 0.8122 | 0.4652 | 0.6626 | 0.1998 | 0.6543 | 0.7016 | 0.6060 | 0.5344 | 0.5494 | 0.5008 | 0.4566 | 0.4591 |
| TAMLeC | 0.6734 | 3.2886 | 0.5201 | 7.9873 | 0.4741 | 0.5307 | 0.4551 | 0.6091 | 0.5209 | 0.3519 | 0.3968 | 0.3730 |

OAXMLC-Med Concepts

Radar Summary

Top methods per metric, averaged over the four radar axes (Classification, Per-level, Completion, Few-shot). Per-level is the average of L1/L2/L3; Few-shot uses the task setting after fine-tuning (FT).

- P@1: MATCH, AttentionXML, LightXML
- R@1: AttentionXML, MATCH, Hector
- P@2: Hector
- R@2: Hector
- Micro-P: MATCH, AttentionXML, XML-CNN
- Micro-R: NGAME, DEXA, MATCH
- Micro-F1: MATCH, AttentionXML, CascadeXML
- Macro-P: MATCH, AttentionXML, Hector
- Macro-R: NGAME, DEXA, MATCH
- Macro-F1: MATCH, AttentionXML, CascadeXML
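The ranking columns in the tables follow the standard binary-relevance definitions of P@k, R@k, and nDCG@k. As a minimal sketch (the `scores`/`relevant` inputs below are illustrative examples, not benchmark data):

```python
import math

def precision_at_k(scores, relevant, k):
    """Fraction of the top-k ranked labels that are relevant."""
    topk = sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
    return sum(1 for i in topk if i in relevant) / k

def recall_at_k(scores, relevant, k):
    """Fraction of the relevant labels recovered in the top k."""
    topk = sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
    return sum(1 for i in topk if i in relevant) / len(relevant)

def ndcg_at_k(scores, relevant, k):
    """Binary-relevance nDCG: DCG of the top k divided by the ideal DCG."""
    topk = sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
    dcg = sum(1.0 / math.log2(rank + 2) for rank, i in enumerate(topk) if i in relevant)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(min(k, len(relevant))))
    return dcg / idcg
```

For example, with `scores = [0.9, 0.1, 0.8, 0.3]` and `relevant = {0, 3}`, the top-2 labels are 0 and 2, so P@2 = R@2 = 0.5.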

Classification

| Method | P@1 | R@1 | P@2 | R@2 | P@3 | nDCG@3 | Micro-P | Micro-R | Micro-F1 | Macro-P | Macro-R | Macro-F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FastXML | 0.6643 | 0.0983 | 0.6073 | 0.0259 | 0.5719 | 0.6003 | 0.8123 | 0.2109 | 0.3333 | 0.2206 | 0.0391 | 0.0663 |
| Parabel | 0.7336 | 0.0695 | 0.6445 | 0.0156 | 0.5945 | 0.6332 | 0.5136 | 0.4010 | 0.4502 | 0.3285 | 0.1412 | 0.1975 |
| CascadeXML | 0.9402 | 0.5204 | 0.9022 | 0.0706 | 0.8675 | 0.8887 | 0.8217 | 0.6746 | 0.7409 | 0.5981 | 0.4102 | 0.4866 |
| AttentionXML | 0.9502 | 0.6674 | 0.9157 | 0.2372 | 0.8864 | 0.9054 | 0.8061 | 0.7307 | 0.7660 | 0.6152 | 0.4787 | 0.5375 |
| LightXML | 0.9413 | 0.5856 | 0.8988 | 0.2117 | 0.8600 | 0.8834 | 0.8390 | 0.6312 | 0.7204 | 0.5772 | 0.3528 | 0.4379 |
| MATCH | 0.9353 | 0.5740 | 0.8896 | 0.1980 | 0.8461 | 0.8714 | 0.8052 | 0.6368 | 0.7112 | 0.5794 | 0.3892 | 0.4655 |
| XML-CNN | 0.9092 | 0.5374 | 0.8496 | 0.1777 | 0.7990 | 0.8297 | 0.8420 | 0.4959 | 0.6242 | 0.5282 | 0.2287 | 0.3191 |
| Hector | 0.9061 | 0.4642 | 0.8022 | 0.0786 | 0.7425 | 0.7815 | 0.8335 | 0.1345 | 0.2315 | 0.5586 | 0.1186 | 0.1954 |
| DEXA | 0.8755 | 0.3437 | 0.8377 | 0.0651 | 0.7820 | 0.8089 | 0.0441 | 0.9382 | 0.0842 | 0.0355 | 0.8504 | 0.0681 |
| NGAME | 0.8770 | 0.3212 | 0.8338 | 0.0612 | 0.7797 | 0.8072 | 0.0430 | 0.9159 | 0.0822 | 0.0295 | 0.8469 | 0.0571 |
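The Micro/Macro columns are obtained by binarizing each method's scores at a threshold selected on the validation set (e.g., by maximizing Micro-F1). A minimal sketch of that protocol (function names are illustrative, not taken from the benchmark code):

```python
def micro_macro_f1(y_true, y_score, threshold):
    """y_true / y_score: one row per document over the same label set
    (binary ground truth / model score). Micro pools TP/FP/FN across all
    labels; macro averages the per-label F1 scores."""
    n_labels = len(y_true[0])
    tp = [0] * n_labels
    fp = [0] * n_labels
    fn = [0] * n_labels
    for truth, scores in zip(y_true, y_score):
        for j, (t, s) in enumerate(zip(truth, scores)):
            pred = s >= threshold
            if pred and t:
                tp[j] += 1
            elif pred:
                fp[j] += 1
            elif t:
                fn[j] += 1

    def f1(tp, fp, fn):
        denom = 2 * tp + fp + fn
        return 2 * tp / denom if denom else 0.0

    micro = f1(sum(tp), sum(fp), sum(fn))
    macro = sum(f1(*counts) for counts in zip(tp, fp, fn)) / n_labels
    return micro, macro

def best_threshold(y_true, y_score, grid):
    """Pick the threshold on a validation split that maximizes micro-F1."""
    return max(grid, key=lambda t: micro_macro_f1(y_true, y_score, t)[0])
```

The chosen threshold is then applied unchanged on the test split before computing the reported Micro/Macro scores.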

Classification (per level)

Level 1 (L1):

| Method | P@1 | R@1 | P@2 | R@2 | P@3 | nDCG@3 | Micro-P | Micro-R | Micro-F1 | Macro-P | Macro-R | Macro-F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FastXML | 0.6643 | 0.1515 | 0.5908 | 0.0609 | 0.5684 | 0.6047 | 0.8008 | 0.2497 | 0.3790 | 0.5012 | 0.1030 | 0.1705 |
| Parabel | 0.7183 | 0.1458 | 0.6241 | 0.0477 | 0.5885 | 0.6344 | 0.4953 | 0.4611 | 0.4774 | 0.5224 | 0.3142 | 0.3924 |
| CascadeXML | 0.9399 | 0.6220 | 0.8887 | 0.2202 | 0.8616 | 0.8880 | 0.8363 | 0.7090 | 0.7674 | 0.8034 | 0.6250 | 0.7031 |
| AttentionXML | 0.9466 | 0.7524 | 0.9044 | 0.3237 | 0.8832 | 0.9063 | 0.8181 | 0.7656 | 0.7903 | 0.7821 | 0.6987 | 0.7372 |
| LightXML | 0.9396 | 0.7120 | 0.8825 | 0.2862 | 0.8508 | 0.8798 | 0.8514 | 0.6661 | 0.7474 | 0.8112 | 0.5754 | 0.6733 |
| MATCH | 0.9357 | 0.7076 | 0.8718 | 0.2756 | 0.8347 | 0.8666 | 0.8264 | 0.6692 | 0.7395 | 0.7773 | 0.5780 | 0.6630 |
| XML-CNN | 0.9065 | 0.6222 | 0.8248 | 0.2277 | 0.7836 | 0.8217 | 0.8515 | 0.5374 | 0.6589 | 0.8090 | 0.4073 | 0.5417 |
| Hector | 0.9061 | 0.5769 | 0.8547 | 0.1928 | 0.8285 | 0.8554 | 0.9597 | 0.1273 | 0.2248 | 0.8912 | 0.1221 | 0.2147 |
| DEXA | 0.8871 | 0.4466 | 0.7946 | 0.1200 | 0.7392 | 0.7814 | 0.0514 | 0.9595 | 0.0976 | 0.0518 | 0.9114 | 0.0979 |
| NGAME | 0.8808 | 0.4395 | 0.7920 | 0.1123 | 0.7383 | 0.7795 | 0.0664 | 0.9326 | 0.1240 | 0.0584 | 0.8961 | 0.1096 |

Level 2 (L2):

| Method | P@1 | R@1 | P@2 | R@2 | P@3 | nDCG@3 | Micro-P | Micro-R | Micro-F1 | Macro-P | Macro-R | Macro-F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FastXML | 0.4966 | 0.0204 | 0.5205 | 0.0151 | 0.5469 | 0.5872 | 0.8507 | 0.1524 | 0.2573 | 0.1766 | 0.0302 | 0.0515 |
| Parabel | 0.5482 | 0.0111 | 0.5421 | 0.0063 | 0.5459 | 0.5932 | 0.5787 | 0.3070 | 0.4012 | 0.3106 | 0.1175 | 0.1704 |
| CascadeXML | 0.7936 | 0.0325 | 0.7727 | 0.0168 | 0.7656 | 0.8067 | 0.7998 | 0.6231 | 0.7004 | 0.5781 | 0.3824 | 0.4603 |
| AttentionXML | 0.8269 | 0.2681 | 0.8100 | 0.1458 | 0.7980 | 0.8353 | 0.7879 | 0.6795 | 0.7292 | 0.5968 | 0.4527 | 0.5139 |
| LightXML | 0.7846 | 0.2268 | 0.7652 | 0.1265 | 0.7571 | 0.7985 | 0.8188 | 0.5801 | 0.6791 | 0.5536 | 0.3247 | 0.4093 |
| MATCH | 0.7644 | 0.1949 | 0.7423 | 0.1060 | 0.7343 | 0.7781 | 0.7722 | 0.5895 | 0.6685 | 0.5597 | 0.3662 | 0.4426 |
| XML-CNN | 0.7154 | 0.1940 | 0.7000 | 0.1067 | 0.6993 | 0.7433 | 0.8256 | 0.4340 | 0.5689 | 0.4967 | 0.2054 | 0.2906 |
| Hector | 0.6341 | 0.0324 | 0.6213 | 0.0120 | 0.6196 | 0.6528 | 0.8358 | 0.1468 | 0.2494 | 0.5372 | 0.1149 | 0.1891 |
| DEXA | 0.7202 | 0.0285 | 0.6698 | 0.0126 | 0.6479 | 0.7005 | 0.0355 | 0.9019 | 0.0684 | 0.0335 | 0.8386 | 0.0644 |
| NGAME | 0.7178 | 0.0274 | 0.6681 | 0.0114 | 0.6448 | 0.6979 | 0.0276 | 0.8873 | 0.0536 | 0.0256 | 0.8369 | 0.0497 |

Level 3 (L3):

| Method | P@1 | R@1 | P@2 | R@2 | P@3 | nDCG@3 | Micro-P | Micro-R | Micro-F1 | Macro-P | Macro-R | Macro-F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FastXML | 0.6118 | 0.0720 | 0.7058 | 0.0781 | 0.7535 | 0.7753 | 0.8357 | 0.0455 | 0.0860 | 0.1257 | 0.0119 | 0.0216 |
| Parabel | 0.6079 | 0.0352 | 0.6716 | 0.0264 | 0.7083 | 0.7426 | 0.4922 | 0.1791 | 0.2626 | 0.2140 | 0.0685 | 0.1037 |
| CascadeXML | 0.8178 | 0.0729 | 0.8424 | 0.0520 | 0.8512 | 0.8772 | 0.7325 | 0.5289 | 0.6143 | 0.4772 | 0.3148 | 0.3793 |
| AttentionXML | 0.8660 | 0.5646 | 0.8794 | 0.2964 | 0.8789 | 0.8990 | 0.7329 | 0.5746 | 0.6434 | 0.5333 | 0.3704 | 0.4357 |
| LightXML | 0.8316 | 0.5180 | 0.8498 | 0.2954 | 0.8668 | 0.8857 | 0.7684 | 0.4724 | 0.5851 | 0.4437 | 0.2498 | 0.3196 |
| MATCH | 0.8078 | 0.4418 | 0.8338 | 0.2855 | 0.8568 | 0.8718 | 0.6947 | 0.4897 | 0.5745 | 0.4671 | 0.2996 | 0.3649 |
| XML-CNN | 0.7891 | 0.4805 | 0.8162 | 0.2947 | 0.8423 | 0.8624 | 0.7718 | 0.3182 | 0.4506 | 0.3848 | 0.1493 | 0.2151 |
| Hector | 0.7388 | 0.1094 | 0.7667 | 0.0841 | 0.7801 | 0.8094 | 0.2824 | 0.1497 | 0.1954 | 0.2846 | 0.1313 | 0.1794 |
| DEXA | 0.7572 | 0.0647 | 0.7492 | 0.0438 | 0.7840 | 0.8185 | 0.0292 | 0.8881 | 0.0566 | 0.0276 | 0.8399 | 0.0534 |
| NGAME | 0.7499 | 0.0622 | 0.7370 | 0.0348 | 0.7465 | 0.7870 | 0.0173 | 0.8790 | 0.0339 | 0.0170 | 0.8423 | 0.0333 |

Completion

| Method | P@1 | R@1 | P@3 | nDCG@3 | Micro-P | Micro-R | Micro-F1 | Macro-P | Macro-R | Macro-F1 | P@2 | R@2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FastXML | 0.4966 | 0.0180 | 0.5498 | 0.5886 | 0.8503 | 0.1419 | 0.2420 | 0.1663 | 0.0267 | 0.0460 | – | – |
| Parabel | 0.5492 | 0.0095 | 0.5495 | 0.5957 | 0.5728 | 0.2941 | 0.3887 | 0.2910 | 0.1077 | 0.1573 | – | – |
| CascadeXML | 0.7937 | 0.0275 | 0.7733 | 0.8111 | 0.7935 | 0.6134 | 0.6919 | 0.5584 | 0.3687 | 0.4441 | – | – |
| AttentionXML | 0.8272 | 0.2487 | 0.8047 | 0.8388 | 0.7828 | 0.6687 | 0.7207 | 0.5829 | 0.4362 | 0.4980 | – | – |
| LightXML | 0.7847 | 0.2114 | 0.7667 | 0.8050 | 0.8144 | 0.5691 | 0.6700 | 0.5320 | 0.3098 | 0.3916 | – | – |
| MATCH | 0.7639 | 0.1776 | 0.7453 | 0.7853 | 0.7650 | 0.5793 | 0.6593 | 0.5411 | 0.3527 | 0.4269 | – | – |
| XML-CNN | 0.7156 | 0.1828 | 0.7090 | 0.7506 | 0.8213 | 0.4223 | 0.5578 | 0.4740 | 0.1942 | 0.2755 | – | – |
| Hector | 0.8496 | 0.7372 | 0.7786 | 0.8101 | 0.8971 | 0.6769 | 0.7716 | 0.8291 | 0.6163 | 0.7070 | 0.8089 | 0.3372 |
| DEXA | 0.7196 | 0.0283 | 0.6747 | 0.7207 | 0.0348 | 0.9005 | 0.0669 | 0.0323 | 0.8386 | 0.0622 | – | – |
| NGAME | 0.7176 | 0.0270 | 0.6720 | 0.7190 | 0.0260 | 0.8864 | 0.0505 | 0.0240 | 0.8374 | 0.0466 | – | – |
| TAMLeC | 0.9022 | 1.1780 | 0.8219 | 0.8458 | 0.9180 | 0.8337 | 0.8697 | 0.8862 | 0.7908 | 0.8309 | 0.8556 | 2.5332 |

Few-shot (global)

Without fine-tuning (no FT):

| Method | P@1 | R@1 | P@2 | R@2 | P@3 | nDCG@3 | Micro-P | Micro-R | Micro-F1 | Macro-P | Macro-R | Macro-F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CascadeXML | 0.7910 | 0.0260 | 0.7793 | 0.0145 | 0.7699 | 0.8082 | 0.7876 | 0.6163 | 0.6915 | 0.5550 | 0.3704 | 0.4442 |
| AttentionXML | 0.8147 | 0.1759 | 0.8020 | 0.1041 | 0.7918 | 0.8269 | 0.7850 | 0.6445 | 0.7076 | 0.5798 | 0.4134 | 0.4822 |
| LightXML | 0.8053 | 0.1741 | 0.7944 | 0.1056 | 0.7850 | 0.8216 | 0.8177 | 0.6028 | 0.6940 | 0.5743 | 0.3498 | 0.4347 |
| MATCH | 0.7739 | 0.1627 | 0.7602 | 0.0949 | 0.7515 | 0.7912 | 0.7718 | 0.5884 | 0.6676 | 0.5488 | 0.3578 | 0.4332 |
| XML-CNN | 0.7166 | 0.1604 | 0.7120 | 0.0940 | 0.7087 | 0.7503 | 0.8200 | 0.4281 | 0.5625 | 0.4681 | 0.1962 | 0.2765 |
| Hector | 0.6447 | 0.0319 | 0.6212 | 0.0139 | 0.6176 | 0.6487 | 0.6888 | 0.1576 | 0.2565 | 0.4996 | 0.1274 | 0.2030 |
| DEXA | 0.8430 | 0.1478 | 0.8003 | 0.0291 | 0.7367 | 0.7672 | 0.0391 | 0.8330 | 0.0747 | 0.1509 | 0.6681 | 0.2463 |
| NGAME | 0.8772 | 0.3569 | 0.8378 | 0.0623 | 0.7826 | 0.8099 | 0.0442 | 0.9385 | 0.0845 | 0.0367 | 0.8524 | 0.0705 |
| TAMLeC | 0.9014 | 1.2190 | 0.8407 | 2.6363 | 0.8227 | 0.8489 | 0.8674 | 0.8291 | 0.8427 | 0.8505 | 0.7796 | 0.8077 |

After fine-tuning (FT):

| Method | P@1 | R@1 | P@2 | R@2 | P@3 | nDCG@3 | Micro-P | Micro-R | Micro-F1 | Macro-P | Macro-R | Macro-F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CascadeXML | 0.7816 | 0.0249 | 0.7712 | 0.0141 | 0.7634 | 0.8023 | 0.7436 | 0.6361 | 0.6856 | 0.5356 | 0.3877 | 0.4497 |
| AttentionXML | 0.8012 | 0.1677 | 0.7867 | 0.0939 | 0.7747 | 0.8120 | 0.8711 | 0.5002 | 0.6336 | 0.5857 | 0.2948 | 0.3907 |
| LightXML | 0.7887 | 0.1564 | 0.7786 | 0.0963 | 0.7713 | 0.8091 | 0.8087 | 0.5782 | 0.6741 | 0.5761 | 0.3298 | 0.4194 |
| MATCH | 0.7543 | 0.1112 | 0.7411 | 0.0654 | 0.7337 | 0.7756 | 0.7320 | 0.5861 | 0.6508 | 0.5237 | 0.3358 | 0.4090 |
| XML-CNN | 0.6919 | 0.1311 | 0.6890 | 0.0815 | 0.6875 | 0.7299 | 0.7673 | 0.4483 | 0.5654 | 0.4618 | 0.2102 | 0.2886 |
| Hector | 0.6176 | 0.0287 | 0.6056 | 0.0123 | 0.6071 | 0.6353 | 0.6630 | 0.1505 | 0.2454 | 0.4699 | 0.1186 | 0.1894 |
| DEXA | 0.8363 | 0.1319 | 0.7856 | 0.0277 | 0.7168 | 0.7504 | 0.0364 | 0.7741 | 0.0695 | 0.0765 | 0.7201 | 0.1384 |
| NGAME | 0.8769 | 0.2426 | 0.8351 | 0.0525 | 0.7762 | 0.8049 | 0.0414 | 0.8785 | 0.0791 | 0.0295 | 0.8652 | 0.0570 |
| TAMLeC | 0.9014 | 1.2190 | 0.8407 | 2.6363 | 0.8227 | 0.8489 | 0.8674 | 0.8291 | 0.8427 | 0.8505 | 0.7796 | 0.8077 |

Few-shot (task)

Without fine-tuning (no FT):

| Method | P@1 | R@1 | P@2 | R@2 | Micro-P | Micro-R | Micro-F1 | Macro-P | Macro-R | Macro-F1 |
|---|---|---|---|---|---|---|---|---|---|---|
| CascadeXML | 0.4828 | 0.6627 | 1.0000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| AttentionXML | 0.6366 | 0.7380 | 1.0000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| LightXML | 0.5230 | 0.6922 | 1.0000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| MATCH | 0.5894 | 0.7152 | 1.0000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| XML-CNN | 0.5818 | 0.7068 | 1.0000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| Hector | 0.5170 | 0.6744 | 1.0000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| DEXA | 0.5429 | 0.6863 | 1.0000 | 0.5000 | 0.0309 | 0.0811 | 0.0448 | 0.0184 | 0.0789 | 0.0299 |
| NGAME | 0.6620 | 0.7474 | 1.0000 | 0.5000 | 0.0429 | 0.5385 | 0.0794 | 0.0516 | 0.5238 | 0.0940 |
| TAMLeC | 0.3131 | 2.1111 | – | – | 0.2778 | 0.2020 | 0.2339 | 0.0926 | 0.1852 | 0.1235 |

After fine-tuning (FT):

| Method | P@1 | R@1 | P@2 | R@2 | Micro-P | Micro-R | Micro-F1 | Macro-P | Macro-R | Macro-F1 |
|---|---|---|---|---|---|---|---|---|---|---|
| CascadeXML | 0.4828 | 0.6627 | 1.0000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| AttentionXML | 0.8200 | 0.8517 | 1.0000 | 0.5000 | 0.3400 | 0.0434 | 0.0697 | 0.2014 | 0.0399 | 0.0622 |
| LightXML | 0.7516 | 0.8129 | 1.0000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| MATCH | 0.9540 | 0.9561 | 1.0000 | 0.5000 | 0.9414 | 0.5075 | 0.6302 | 0.8304 | 0.4951 | 0.6004 |
| XML-CNN | 0.6425 | 0.7416 | 1.0000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| Hector | 0.5170 | 0.6744 | 1.0000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| DEXA | 0.5857 | 0.7071 | 1.0000 | 0.5000 | 0.0047 | 0.7162 | 0.0094 | 0.0047 | 0.7142 | 0.0093 |
| NGAME | 0.8310 | 0.8554 | 1.0000 | 0.5000 | 0.0122 | 0.7949 | 0.0241 | 0.0176 | 0.7877 | 0.0345 |
| TAMLeC | 0.3838 | 1.7576 | – | – | 0.3704 | 0.1010 | 0.1587 | 0.4386 | 0.1029 | 0.1667 |

OAXMLC-Med Topics

Radar Summary

Top methods per metric, averaged over the four radar axes (Classification, Per-level, Completion, Few-shot). Per-level is the average of L1/L2; Few-shot uses the task setting after fine-tuning (FT).

- P@1: Hector, MATCH, CascadeXML
- R@1: Hector, MATCH, CascadeXML
- P@2: Hector
- R@2: Hector
- Micro-P: Hector, LightXML, XML-CNN
- Micro-R: NGAME, DEXA, MATCH
- Micro-F1: MATCH, CascadeXML, LightXML
- Macro-P: Hector, MATCH, CascadeXML
- Macro-R: NGAME, DEXA, MATCH
- Macro-F1: Hector, MATCH, CascadeXML

Classification

| Method | P@1 | R@1 | P@2 | R@2 | P@3 | nDCG@3 | Micro-P | Micro-R | Micro-F1 | Macro-P | Macro-R | Macro-F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FastXML | 0.7633 | 0.3903 | 0.7213 | 0.1765 | 0.6656 | 0.6942 | 0.8387 | 0.4289 | 0.5675 | 0.6599 | 0.2385 | 0.3503 |
| Parabel | 0.8153 | 0.5261 | 0.7667 | 0.1877 | 0.6858 | 0.7205 | 0.8391 | 0.3958 | 0.5379 | 0.7719 | 0.3020 | 0.4341 |
| CascadeXML | 0.9582 | 0.8617 | 0.9345 | 0.3818 | 0.8528 | 0.8806 | 0.8829 | 0.7864 | 0.8318 | 0.8448 | 0.7143 | 0.7740 |
| AttentionXML | 0.9656 | 0.9006 | 0.9469 | 0.4169 | 0.8670 | 0.8935 | 0.8846 | 0.8009 | 0.8405 | 0.8662 | 0.7617 | 0.8104 |
| LightXML | 0.9557 | 0.8483 | 0.9344 | 0.3765 | 0.8507 | 0.8788 | 0.8913 | 0.7771 | 0.8303 | 0.8583 | 0.7090 | 0.7765 |
| MATCH | 0.9536 | 0.8497 | 0.9300 | 0.3819 | 0.8459 | 0.8742 | 0.8675 | 0.7887 | 0.8262 | 0.8328 | 0.7382 | 0.7826 |
| XML-CNN | 0.9272 | 0.7930 | 0.8906 | 0.3389 | 0.8084 | 0.8394 | 0.8904 | 0.6740 | 0.7672 | 0.8172 | 0.5211 | 0.6364 |
| Hector | 0.9388 | 0.8448 | 0.8633 | 0.3715 | 0.7998 | 0.8279 | 0.9152 | 0.4873 | 0.6348 | 0.8717 | 0.5712 | 0.6897 |
| DEXA | 0.9102 | 0.5626 | 0.8845 | 0.1803 | 0.8067 | 0.8320 | 0.0720 | 0.9541 | 0.1339 | 0.0586 | 0.9449 | 0.1103 |
| NGAME | 0.9128 | 0.5330 | 0.8834 | 0.1712 | 0.8056 | 0.8317 | 0.0734 | 0.9513 | 0.1363 | 0.0598 | 0.9463 | 0.1124 |

Classification (per level)

Level 1 (L1):

| Method | P@1 | R@1 | P@2 | R@2 | P@3 | nDCG@3 | Micro-P | Micro-R | Micro-F1 | Macro-P | Macro-R | Macro-F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FastXML | 0.7633 | 0.4944 | 0.6586 | 0.1417 | 0.6503 | 0.6855 | 0.8357 | 0.4298 | 0.5676 | 0.7910 | 0.2924 | 0.4269 |
| Parabel | 0.8000 | 0.5128 | 0.6695 | 0.1372 | 0.6544 | 0.6965 | 0.8304 | 0.3818 | 0.5231 | 0.7752 | 0.3043 | 0.4370 |
| CascadeXML | 0.9585 | 0.8956 | 0.8375 | 0.2714 | 0.8148 | 0.8514 | 0.8682 | 0.7493 | 0.8044 | 0.8540 | 0.6775 | 0.7556 |
| LightXML | 0.9562 | 0.8851 | 0.8320 | 0.2599 | 0.8076 | 0.8457 | 0.8761 | 0.7345 | 0.7990 | 0.8631 | 0.6544 | 0.7443 |
| MATCH | 0.9538 | 0.8867 | 0.8290 | 0.2727 | 0.8057 | 0.8431 | 0.8533 | 0.7480 | 0.7971 | 0.8289 | 0.6797 | 0.7469 |
| XML-CNN | 0.9261 | 0.8294 | 0.7975 | 0.2438 | 0.7755 | 0.8147 | 0.8781 | 0.6542 | 0.7498 | 0.8368 | 0.5378 | 0.6548 |
| Hector | 0.9579 | 0.9254 | 0.8700 | 0.3052 | 0.8574 | 0.8844 | 0.9781 | 0.4112 | 0.5790 | 0.9595 | 0.4332 | 0.5968 |
| DEXA | 0.9160 | 0.7041 | 0.8084 | 0.1706 | 0.7733 | 0.8083 | 0.1022 | 0.9560 | 0.1846 | 0.0902 | 0.9362 | 0.1645 |
| NGAME | 0.9188 | 0.6762 | 0.8079 | 0.1640 | 0.7733 | 0.8104 | 0.1002 | 0.9539 | 0.1814 | 0.0897 | 0.9357 | 0.1637 |

Level 2 (L2):

| Method | P@1 | R@1 | P@2 | R@2 | P@3 | nDCG@3 | Micro-P | Micro-R | Micro-F1 | Macro-P | Macro-R | Macro-F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FastXML | 0.6757 | 0.3172 | 0.7365 | 0.1898 | 0.7535 | 0.8000 | 0.8442 | 0.4273 | 0.5673 | 0.5861 | 0.2081 | 0.3071 |
| Parabel | 0.7391 | 0.4353 | 0.7587 | 0.2243 | 0.7609 | 0.8089 | 0.8542 | 0.4204 | 0.5634 | 0.7685 | 0.3000 | 0.4315 |
| CascadeXML | 0.9041 | 0.6824 | 0.8880 | 0.3140 | 0.8875 | 0.9140 | 0.9065 | 0.8513 | 0.8780 | 0.8396 | 0.7351 | 0.7838 |
| LightXML | 0.9073 | 0.7021 | 0.8932 | 0.3337 | 0.8753 | 0.9070 | 0.9152 | 0.8516 | 0.8822 | 0.8556 | 0.7398 | 0.7934 |
| MATCH | 0.9010 | 0.7237 | 0.8895 | 0.3463 | 0.8811 | 0.9094 | 0.8902 | 0.8599 | 0.8748 | 0.8351 | 0.7711 | 0.8017 |
| XML-CNN | 0.8566 | 0.6497 | 0.8594 | 0.3262 | 0.8600 | 0.8935 | 0.9111 | 0.7086 | 0.7972 | 0.8061 | 0.5117 | 0.6260 |
| Hector | 0.7913 | 0.5980 | 0.7534 | 0.2380 | 0.7109 | 0.7382 | 0.8624 | 0.5860 | 0.6978 | 0.8310 | 0.6356 | 0.7202 |
| DEXA | 0.8383 | 0.2045 | 0.7982 | 0.0657 | 0.7828 | 0.8341 | 0.0473 | 0.9506 | 0.0901 | 0.0408 | 0.9498 | 0.0782 |
| NGAME | 0.8333 | 0.1865 | 0.7960 | 0.0668 | 0.7978 | 0.8455 | 0.0498 | 0.9467 | 0.0947 | 0.0429 | 0.9523 | 0.0821 |

Completion

| Method | P@1 | R@1 | P@3 | nDCG@3 | Micro-P | Micro-R | Micro-F1 | Macro-P | Macro-R | Macro-F1 | P@2 | R@2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FastXML | 0.6757 | 0.3172 | 0.7535 | 0.8000 | 0.8442 | 0.4273 | 0.5673 | 0.5861 | 0.2081 | 0.3071 | – | – |
| Parabel | 0.7393 | 0.4353 | 0.7579 | 0.8058 | 0.8538 | 0.4206 | 0.5636 | 0.7700 | 0.3006 | 0.4324 | – | – |
| CascadeXML | 0.9041 | 0.6824 | 0.8875 | 0.9140 | 0.9065 | 0.8513 | 0.8780 | 0.8396 | 0.7351 | 0.7838 | – | – |
| AttentionXML | 0.9295 | 0.8185 | 0.9325 | 0.9495 | 0.9103 | 0.8800 | 0.8948 | 0.8764 | 0.8045 | 0.8387 | – | – |
| LightXML | 0.9073 | 0.7021 | 0.8753 | 0.9070 | 0.9152 | 0.8516 | 0.8822 | 0.8556 | 0.7398 | 0.7934 | – | – |
| MATCH | 0.9010 | 0.7237 | 0.8811 | 0.9094 | 0.8902 | 0.8599 | 0.8748 | 0.8351 | 0.7711 | 0.8017 | – | – |
| XML-CNN | 0.8566 | 0.6497 | 0.8600 | 0.8935 | 0.9111 | 0.7086 | 0.7972 | 0.8061 | 0.5117 | 0.6260 | – | – |
| Hector | 0.9949 | 0.9943 | 1.0000 | 1.0000 | 0.9952 | 0.9644 | 0.9796 | 0.9900 | 0.9475 | 0.9683 | 0.9846 | 0.4924 |
| DEXA | 0.8383 | 0.2045 | 0.7828 | 0.8341 | 0.0473 | 0.9506 | 0.0901 | 0.0408 | 0.9498 | 0.0782 | – | – |
| NGAME | 0.8333 | 0.1865 | 0.7978 | 0.8455 | 0.0498 | 0.9467 | 0.0947 | 0.0429 | 0.9523 | 0.0821 | – | – |
| TAMLeC | 0.9740 | 1.0296 | 0.9855 | 0.9866 | 0.9761 | 0.9538 | 0.9641 | 0.9688 | 0.9377 | 0.9516 | 0.9539 | 2.1082 |

Few-shot (global)

Without fine-tuning (no FT):

| Method | P@1 | R@1 | P@2 | R@2 | P@3 | nDCG@3 | Micro-P | Micro-R | Micro-F1 | Macro-P | Macro-R | Macro-F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CascadeXML | 0.7777 | 0.0913 | 0.7791 | 0.0507 | 0.7833 | 0.8295 | 0.8684 | 0.7063 | 0.7786 | 0.7063 | 0.6167 | 0.6581 |
| AttentionXML | 0.7321 | 0.0597 | 0.7523 | 0.0341 | 0.7720 | 0.8160 | 0.8394 | 0.6254 | 0.7126 | 0.6590 | 0.5189 | 0.5776 |
| LightXML | 0.7561 | 0.0669 | 0.7520 | 0.0382 | 0.7443 | 0.7912 | 0.8771 | 0.6642 | 0.7555 | 0.6168 | 0.5219 | 0.5653 |
| MATCH | 0.7605 | 0.1059 | 0.7669 | 0.0621 | 0.7732 | 0.8152 | 0.8199 | 0.7071 | 0.7593 | 0.7167 | 0.6597 | 0.6869 |
| XML-CNN | 0.6854 | 0.0638 | 0.7193 | 0.0369 | 0.7347 | 0.7819 | 0.8768 | 0.4992 | 0.6360 | 0.5684 | 0.3159 | 0.4060 |
| Hector | 0.7115 | 0.1081 | 0.6854 | 0.0593 | 0.6563 | 0.6940 | 0.8091 | 0.5425 | 0.6495 | 0.7638 | 0.6493 | 0.7019 |
| DEXA | 0.8710 | 0.4066 | 0.8313 | 0.1100 | 0.7402 | 0.7702 | 0.0601 | 0.9187 | 0.1129 | 0.0508 | 0.9538 | 0.0964 |
| NGAME | 0.8715 | 0.4247 | 0.8320 | 0.1113 | 0.7384 | 0.7684 | 0.0600 | 0.9180 | 0.1126 | 0.0508 | 0.9503 | 0.0965 |
| TAMLeC | 0.9597 | 1.0449 | 0.9431 | 2.1179 | 0.9969 | 0.9973 | 0.9630 | 0.9386 | 0.9498 | 0.9525 | 0.9093 | 0.9278 |

After fine-tuning (FT):

| Method | P@1 | R@1 | P@2 | R@2 | P@3 | nDCG@3 | Micro-P | Micro-R | Micro-F1 | Macro-P | Macro-R | Macro-F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CascadeXML | 0.2153 | 0.3610 | 0.5556 | 0.1922 | 0.6661 | 0.6118 | 0.3600 | 0.4530 | 0.4011 | 0.5922 | 0.2483 | 0.3493 |
| AttentionXML | 0.2335 | 0.3624 | 0.5637 | 0.1923 | 0.6698 | 0.6369 | 0.3670 | 0.4942 | 0.4207 | 0.6462 | 0.3120 | 0.4176 |
| LightXML | 0.1929 | 0.4121 | 0.5666 | 0.2275 | 0.6729 | 0.6137 | 0.3655 | 0.4650 | 0.4092 | 0.5934 | 0.2553 | 0.3565 |
| MATCH | 0.1960 | 0.4358 | 0.5774 | 0.2542 | 0.7019 | 0.6367 | 0.4064 | 0.5581 | 0.4703 | 0.7536 | 0.3887 | 0.5125 |
| XML-CNN | 0.1659 | 0.3639 | 0.5317 | 0.2265 | 0.6496 | 0.5951 | 0.2783 | 0.3096 | 0.2931 | 0.5009 | 0.1124 | 0.1835 |
| Hector | 0.1344 | 0.1006 | 0.2975 | 0.0338 | 0.3555 | 0.3481 | 0.1348 | 0.1243 | 0.1294 | 0.0053 | 0.0202 | 0.0082 |
| DEXA | 0.8639 | 0.4581 | 0.8077 | 0.1101 | 0.7244 | 0.7547 | 0.0776 | 0.8553 | 0.1422 | 0.0684 | 0.8684 | 0.1267 |
| NGAME | 0.8297 | 0.3500 | 0.7705 | 0.0814 | 0.6901 | 0.7201 | 0.0832 | 0.8185 | 0.1510 | 0.0744 | 0.8494 | 0.1369 |
| TAMLeC | 0.9597 | 1.0449 | 0.9431 | 2.1179 | 0.9969 | 0.9973 | 0.9630 | 0.9386 | 0.9491 | 0.9518 | 0.9080 | 0.9268 |

Few-shot (task)

Without fine-tuning (no FT):

| Method | P@1 | R@1 | P@2 | R@2 | Micro-P | Micro-R | Micro-F1 | Macro-P | Macro-R | Macro-F1 |
|---|---|---|---|---|---|---|---|---|---|---|
| CascadeXML | 0.1521 | 0.3808 | 0.5956 | 0.3570 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| AttentionXML | 0.2784 | 0.4868 | 0.6728 | 0.3780 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| LightXML | 0.3302 | 0.5310 | 0.6142 | 0.3639 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| MATCH | 0.3748 | 0.5274 | 0.6622 | 0.3793 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| XML-CNN | 0.3481 | 0.5138 | 0.6438 | 0.3720 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| Hector | 0.0053 | 0.3406 | 0.5440 | 0.3439 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| DEXA | 0.2935 | 0.4436 | 0.6250 | 0.3636 | 0.0807 | 0.3381 | 0.1303 | 0.0772 | 0.5781 | 0.1362 |
| NGAME | 0.3035 | 0.4438 | 0.6667 | 0.3750 | 0.0794 | 0.3450 | 0.1291 | 0.0650 | 0.5113 | 0.1153 |
| TAMLeC | 0.2856 | 1.9234 | 1.0000 | 2.0000 | 0.1479 | 0.0299 | 0.0498 | 0.2987 | 0.0384 | 0.0681 |

After fine-tuning (FT):

| Method | P@1 | R@1 | P@2 | R@2 | Micro-P | Micro-R | Micro-F1 | Macro-P | Macro-R | Macro-F1 |
|---|---|---|---|---|---|---|---|---|---|---|
| CascadeXML | 0.9589 | 0.9566 | 0.9342 | 0.4704 | 0.9601 | 0.9546 | 0.9573 | 0.5334 | 0.4275 | 0.4684 |
| AttentionXML | 0.9441 | 0.9427 | 0.9203 | 0.4655 | 0.9362 | 0.9418 | 0.9389 | 0.4684 | 0.3786 | 0.4171 |
| LightXML | 0.9465 | 0.9449 | 0.9203 | 0.4655 | 0.9474 | 0.9435 | 0.9455 | 0.4188 | 0.3525 | 0.3760 |
| MATCH | 0.9878 | 0.9865 | 0.8635 | 0.4433 | 0.9860 | 0.9854 | 0.9857 | 0.7738 | 0.6748 | 0.7176 |
| XML-CNN | 0.9441 | 0.9427 | 0.9203 | 0.4655 | 0.9442 | 0.9410 | 0.9426 | 0.3147 | 0.3331 | 0.3237 |
| Hector | 0.9951 | 0.9936 | 0.9201 | 0.4660 | 0.9954 | 0.9925 | 0.9940 | 0.8663 | 0.8480 | 0.8568 |
| DEXA | 0.8908 | 0.8648 | 1.0000 | 0.5000 | 0.1315 | 0.9395 | 0.2307 | 0.0771 | 0.8351 | 0.1411 |
| NGAME | 0.9193 | 0.8926 | 0.9167 | 0.4615 | 0.1252 | 0.9591 | 0.2215 | 0.0680 | 0.8392 | 0.1259 |
| TAMLeC | 0.9829 | 1.0224 | 1.0000 | 2.0000 | 0.9829 | 0.9805 | 0.9817 | 0.6503 | 0.5949 | 0.6214 |

References

[1] Zhang, Y., Shen, Z., Dong, Y., Wang, K., & Han, J. (2021, April). MATCH: Metadata-aware text classification in a large hierarchy. In Proceedings of the Web Conference 2021 (pp. 3246–3257).

[2] Liu, J., Chang, W. C., Wu, Y., & Yang, Y. (2017, August). Deep learning for extreme multi-label text classification. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 115–124).

[3] You, R., Zhang, Z., Wang, Z., Dai, S., Mamitsuka, H., & Zhu, S. (2019). AttentionXML: Label tree-based attention-aware deep model for high-performance extreme multi-label text classification. Advances in Neural Information Processing Systems, 32.

[4] Jain, H., Prabhu, Y., & Varma, M. (2016, August). Extreme multi-label loss functions for recommendation, tagging, ranking & other missing label applications. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 935–944).

[5] Ostapuk, N., Audiffren, J., Dolamic, L., Mermoud, A., & CudrΓ©-Mauroux, P. (2024, May). Follow the Path: Hierarchy-Aware Extreme Multi-Label Completion for Semantic Text Tagging. In Proceedings of the ACM on Web Conference 2024 (pp. 2094–2105).

[6] Audiffren, J., Broillet, C., Dolamic, L., & CudrΓ©-Mauroux, P. (2024). Extreme Multi-label Completion for Semantic Document Labelling with Taxonomy-Aware Parallel Learning. arXiv preprint arXiv:2412.13809.

[7] Jiang, T., Wang, D., Sun, L., Yang, H., Zhao, Z., & Zhuang, F. (2021, May). LightXML: Transformer with dynamic negative sampling for high-performance extreme multi-label text classification. Proceedings of the AAAI Conference on Artificial Intelligence, 35(9), 7987–7994.

[8] Kharbanda, S., Banerjee, A., Schultheis, E., & Babbar, R. (2022). CascadeXML: Rethinking transformers for end-to-end multi-resolution training in extreme multi-label classification. Advances in Neural Information Processing Systems, 35, 2074–2087.

[9] Prabhu, Y., Kag, A., Harsola, S., Agrawal, R., & Varma, M. (2018, April). Parabel: Partitioned label trees for extreme classification with application to dynamic search advertising. In Proceedings of the 2018 World Wide Web Conference (pp. 993–1002).

[10] Dahiya, K., Gupta, N., Saini, D., Soni, A., Wang, Y., Dave, K., Jiao, J., Gururaj, K., Dey, P., Singh, A., Hada, D., Jain, V., Paliwal, B., Mittal, A., Mehta, S., Ramjee, R., Agarwal, S., Kar, P., & Varma, M. (2023, March). NGAME: Negative mining-aware mini-batching for extreme classification. In Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining (pp. 258–266).

[11] Dahiya, K., Yadav, S., Sondhi, S., Saini, D., Mehta, S., Jiao, J., Agarwal, S., Kar, P., & Varma, M. (2023). Deep Encoders with Auxiliary Parameters for Extreme Classification. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’23), 358–367.