Lewis, D. D.; Yang, Y.; Rose, T. G.; Li, F. (2004). RCV1: A new benchmark collection for text categorization research. The Journal of Machine Learning Research, 5, 361--397.
rcv1sub1
mldr.datasets::get.mldr("rcv1sub1")
Summary
Measure | Value |
---|---|
Instances | 6000 |
Attributes | 47337 |
Inputs | 47236 |
Labels | 101 |
Labelsets | 1028 |
Single labelsets | 657 |
Max frequency | 246 |
Cardinality | 2.8797 |
Density | 0.0285 |
Mean IR | 54.4923 |
SCUMBLE | 0.2237 |
TCS | 22.3134 |
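Several of the summary values are related to one another, so they can be cross-checked with a few lines of R. The sketch below copies the numbers from the table by hand and does not load the dataset. It assumes the usual definitions: density is cardinality divided by the number of labels, and TCS is the natural logarithm of inputs × labels × labelsets. The variable names are illustrative only and are not part of `mldr.datasets`.

```r
# Values copied by hand from the Summary table.
attributes  <- 47337
inputs      <- 47236
labels      <- 101
labelsets   <- 1028
cardinality <- 2.8797
density     <- 0.0285
tcs         <- 22.3134

# Attributes should equal the input features plus the label columns.
attributes_check <- inputs + labels

# Assumed definition: density = cardinality / number of labels.
density_check <- cardinality / labels

# Assumed definition: TCS = ln(inputs * labels * labelsets).
tcs_check <- log(inputs * labels * labelsets)

print(attributes_check)           # 47337
print(round(density_check, 4))    # 0.0285
print(round(tcs_check, 4))        # 22.3134
```

Under these assumed definitions, the recomputed values agree with the table to the four decimal places shown.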
Citation
@article{lewis2004rcv1,
title="RCV1: A new benchmark collection for text categorization research",
author="Lewis, D. D. and Yang, Y. and Rose, T. G. and Li, F.",
journal="The Journal of Machine Learning Research",
volume="5",
pages="361--397",
year="2004"
}
Concurrence plot
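In the concurrence plot, each sector represents a label and each link between two sectors represents a co-occurrence of those two labels in the same instances. SCUMBLE (Score of Concurrence among Imbalanced Labels) measures how much rare and frequent labels appear together in the same instances; for this dataset it is 0.2237.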