rcv1sub5
mldr.datasets::get.mldr("rcv1sub5")
Summary
Measure | Value |
---|---|
Instances | 6000 |
Attributes | 47336 |
Inputs | 47235 |
Labels | 101 |
Labelsets | 946 |
Single labelsets | 586 |
Max frequency | 526 |
Cardinality | 2.6415 |
Density | 0.0262 |
Mean IR | 69.6815 |
SCUMBLE | 0.2381 |
TCS | 22.2303 |
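
Several of these figures are related by simple arithmetic, which allows a quick sanity check. The R sketch below recomputes three of them from the table values alone. It assumes the usual definitions, which this page does not state: density as cardinality divided by the number of labels, and TCS as the natural logarithm of inputs × labels × labelsets. The variable names are illustrative only.

```r
# Values copied from the summary table above.
inputs      <- 47235
labels      <- 101
labelsets   <- 946
cardinality <- 2.6415

# Attributes should be inputs plus labels.
attributes <- inputs + labels

# Assumed definition: density = cardinality / number of labels.
density <- cardinality / labels

# Assumed definition: TCS = ln(inputs * labels * labelsets).
tcs <- log(inputs * labels * labelsets)

print(attributes)          # compare with "Attributes" in the table
print(round(density, 4))   # compare with "Density" in the table
print(round(tcs, 4))       # compare with "TCS" in the table
```

Under these assumptions the recomputed values agree with the table to four decimal places, which suggests the summary is internally consistent.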
Citation
Lewis, D. D.; Yang, Y.; Rose, T. G.; Li, F. (2004). RCV1: A new benchmark collection for text categorization research. The Journal of Machine Learning Research, 5, 361–397.
@article{lewis2004rcv1,
  title="RCV1: A new benchmark collection for text categorization research",
  author="Lewis, D. D. and Yang, Y. and Rose, T. G. and Li, F.",
  journal="The Journal of Machine Learning Research",
  volume="5",
  pages="361--397",
  year="2004"
}
Concurrence plot
In this concurrence plot, each sector represents a label and each link between two sectors represents instances in which those two labels co-occur. SCUMBLE quantifies how strongly frequent and rare labels co-occur within the same instances; for rcv1sub5 it is 0.2381 (see Summary).
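
To make the SCUMBLE idea concrete, here is a minimal R sketch on an invented toy label matrix (not data from rcv1sub5). It assumes one common reading of the measure: each label gets an imbalance ratio (count of the most frequent label divided by the label's own count), each instance scores one minus the ratio of the geometric to the arithmetic mean of the imbalance ratios of its active labels, and the dataset-level value is the mean over instances. The matrix `Y` and all names are hypothetical.

```r
# Toy label matrix: 6 instances, 3 labels (A frequent, B less so, C rare).
Y <- rbind(
  c(1, 0, 0),
  c(1, 0, 0),
  c(1, 1, 0),
  c(1, 0, 1),
  c(0, 1, 0),
  c(1, 0, 0)
)
colnames(Y) <- c("A", "B", "C")

# Per-label imbalance ratio: most frequent label count / this label's count.
counts <- colSums(Y)
irlbl  <- max(counts) / counts

# Per-instance score, restricted to the labels active in that instance
# (an assumption of this sketch): 1 - geometric mean / arithmetic mean.
scumble_instance <- apply(Y, 1, function(row) {
  ir <- irlbl[row == 1]
  if (length(ir) == 0) return(0)
  1 - exp(mean(log(ir))) / mean(ir)
})

# Dataset-level score: mean over all instances.
scumble <- mean(scumble_instance)

print(round(scumble_instance, 4))
print(round(scumble, 4))
```

Instances with a single label, or with labels of equal frequency, score 0; the instance that pairs the most frequent label A with the rarest label C scores highest. On the real dataset, the mldr package's `plot()` method with `type = "LC"` should draw this kind of concurrence plot for an object returned by `mldr.datasets::get.mldr("rcv1sub5")`.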