rcv1sub5
Summary
Measure | Value |
---|---|
Instances | 6000 |
Attributes | 47336 |
Inputs | 47235 |
Labels | 101 |
Labelsets | 946 |
Single labelsets | 586 |
Max frequency | 526 |
Cardinality | 2.6415 |
Density | 0.0262 |
Mean IR | 69.6815 |
SCUMBLE | 0.2381 |
TCS | 22.2303 |
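
Several of the values in the summary are related to one another, which allows a quick consistency check. The sketch below assumes the usual definitions, none of which are stated on this page: attributes = inputs + labels, density = cardinality / labels, and TCS (theoretical complexity score) = natural logarithm of inputs × labels × labelsets. The variable names are illustrative only.

```python
import math

# Values copied from the summary of rcv1sub5.
inputs = 47235
labels = 101
labelsets = 946
cardinality = 2.6415

# Assumed definitions (not given on this page):
attributes = inputs + labels                   # total columns in the dataset
density = cardinality / labels                 # cardinality normalised by label count
tcs = math.log(inputs * labels * labelsets)    # theoretical complexity score

print(attributes)          # 47336
print(round(density, 4))   # 0.0262
print(round(tcs, 4))       # 22.2303
```

Under these assumptions the derived figures agree with the reported Attributes, Density, and TCS values.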
Citation
Lewis, D. D.; Yang, Y.; Rose, T. G.; Li, F. (2004). RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5, 361-397.
Concurrence plot
In this concurrence plot, sectors represent labels and the links between them depict label co-occurrences. SCUMBLE is a measure designed to assess the concurrence among imbalanced labels, that is, how often rare and frequent labels appear together in the same instances.
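
To make the Mean IR and SCUMBLE figures more concrete, here is a minimal sketch on a toy label matrix. It assumes the commonly cited definitions: IRLbl(l) is the count of the most frequent label divided by the count of label l, Mean IR is the average IRLbl, and the per-instance SCUMBLE is one minus the ratio of the geometric to the arithmetic mean of the IRLbl values of the labels active in that instance. Restricting both means to the active labels is one reading of the definition, and the function names and toy data are hypothetical, not taken from this dataset.

```python
from math import prod

def irlbl(Y):
    """Imbalance ratio per label; assumes every label occurs at least once."""
    counts = [sum(col) for col in zip(*Y)]
    top = max(counts)
    return [top / c for c in counts]

def mean_ir(Y):
    ir = irlbl(Y)
    return sum(ir) / len(ir)

def scumble(Y):
    """Average per-instance SCUMBLE over the rows of a 0/1 label matrix."""
    ir = irlbl(Y)
    scores = []
    for row in Y:
        active = [ir[j] for j, y in enumerate(row) if y]
        if not active:
            scores.append(0.0)  # assumption: unlabeled instances contribute 0
            continue
        geometric = prod(active) ** (1 / len(active))
        arithmetic = sum(active) / len(active)
        scores.append(1 - geometric / arithmetic)
    return sum(scores) / len(scores)

# Toy data: label 0 is frequent, labels 1 and 2 are rare.
Y = [
    [1, 1, 0],  # frequent and rare label together
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 1],  # frequent and rare label together
]

print(irlbl(Y))    # [1.0, 4.0, 4.0]
print(mean_ir(Y))  # 3.0
print(round(scumble(Y), 4))
```

Instances that carry only labels of similar frequency score 0, while instances mixing a frequent label with a rare one raise the average, which is the kind of link the plot makes visible.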