rcv1sub2
Summary
Measure | Value |
---|---|
Instances | 6000 |
Attributes | 47337 |
Inputs | 47236 |
Labels | 101 |
Labelsets | 954 |
Single labelsets | 589 |
Max frequency | 549 |
Cardinality | 2.6342 |
Density | 0.0261 |
Mean IR | 45.5138 |
SCUMBLE | 0.2092 |
TCS | 22.2387 |
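The simpler measures in this table can be recomputed from a binary label matrix. The following is a minimal sketch, not the code that produced the table. It assumes that "Single labelsets" counts label combinations seen on exactly one instance and that "Max frequency" is the number of instances carrying the most common combination. The function name `label_summary` and the toy matrix are illustrative only.

```python
from collections import Counter


def label_summary(Y):
    """Summarise a multilabel dataset.

    Y is a list of rows, one per instance; each row is a list of 0/1
    indicators, one per label (hypothetical input layout).
    """
    n_instances = len(Y)
    n_labels = len(Y[0])

    # Cardinality: mean number of active labels per instance.
    cardinality = sum(sum(row) for row in Y) / n_instances
    # Density: cardinality divided by the total number of labels.
    density = cardinality / n_labels

    # A labelset is a distinct combination of labels on an instance.
    labelset_counts = Counter(tuple(row) for row in Y)

    return {
        "instances": n_instances,
        "labels": n_labels,
        "cardinality": cardinality,
        "density": density,
        "labelsets": len(labelset_counts),
        # Assumption: combinations that occur on exactly one instance.
        "single_labelsets": sum(1 for c in labelset_counts.values() if c == 1),
        # Assumption: size of the most frequent combination.
        "max_frequency": max(labelset_counts.values()),
    }


# Toy data: 4 instances, 3 labels (not taken from rcv1sub2).
toy_Y = [[1, 0, 1], [1, 0, 1], [0, 1, 0], [1, 1, 0]]
toy_summary = label_summary(toy_Y)
```

As a consistency check on the table itself, Attributes equals Inputs plus Labels (47236 + 101 = 47337), and Density equals Cardinality divided by Labels (2.6342 / 101 ≈ 0.0261).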
Citation
Lewis, D. D.; Yang, Y.; Rose, T. G.; Li, F. (2004). RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5, 361-397.
Concurrence plot
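In this concurrence plot, sectors represent labels and the links between them depict label co-occurrences. SCUMBLE is a measure designed to assess the concurrence among imbalanced labels, that is, how often rare and frequent labels appear together in the same instance. For this dataset it is 0.2092.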
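To make the imbalance measures more concrete, here is a hedged sketch of Mean IR and SCUMBLE under one common formulation. It takes the imbalance ratio of a label as the count of the most frequent label divided by that label's count. An instance's SCUMBLE score is taken as one minus the ratio of the geometric mean to the arithmetic mean of the imbalance ratios of its active labels, and the dataset score is the mean over instances. The tool that produced the summary table may differ in details, such as how unlabeled instances or unused labels are handled, so the function names and these choices are assumptions.

```python
import math


def imbalance_ratios(Y):
    """Per-label imbalance ratio: max label count / this label's count."""
    n_labels = len(Y[0])
    counts = [sum(row[j] for row in Y) for j in range(n_labels)]
    if min(counts) == 0:
        # Assumption: this sketch requires every label to occur at least once.
        raise ValueError("every label must appear in at least one instance")
    max_count = max(counts)
    return [max_count / c for c in counts]


def mean_ir(Y):
    """Mean of the per-label imbalance ratios."""
    ir = imbalance_ratios(Y)
    return sum(ir) / len(ir)


def scumble(Y):
    """Dataset-level SCUMBLE: mean of the per-instance scores."""
    ir = imbalance_ratios(Y)
    scores = []
    for row in Y:
        active = [ir[j] for j, v in enumerate(row) if v]
        if not active:
            # Assumption: instances without labels contribute a score of 0.
            scores.append(0.0)
            continue
        arithmetic = sum(active) / len(active)
        geometric = math.exp(sum(math.log(x) for x in active) / len(active))
        # 0 when all active labels are equally frequent; approaches 1 as
        # very rare and very frequent labels share the same instance.
        scores.append(1.0 - geometric / arithmetic)
    return sum(scores) / len(scores)
```

Under this formulation, a dataset in which every label is equally frequent scores 0, and the score rises as minority labels increasingly share instances with majority labels.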