rcv1sub4
mldr.datasets::get.mldr("rcv1sub4")
Summary
Measure | Value |
---|---|
Instances | 6000 |
Attributes | 47330 |
Inputs | 47229 |
Labels | 101 |
Labelsets | 816 |
Single labelsets | 491 |
Max frequency | 950 |
Cardinality | 2.4837 |
Density | 0.0246 |
Mean IR | 89.3713 |
SCUMBLE | 0.2165 |
TCS | 22.0823 |
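
Some of the figures in the summary are derived from others, so they can be cross-checked by hand. The base R sketch below assumes the usual multilabel definitions: attributes are inputs plus labels, and density is cardinality divided by the number of labels. The variable names are illustrative only and are not part of mldr or mldr.datasets.

```r
# Values copied from the summary table (assumed to be rounded as shown)
instances   <- 6000
attributes  <- 47330
inputs      <- 47229
labels      <- 101
cardinality <- 2.4837

# Attributes should equal input features plus labels
stopifnot(inputs + labels == attributes)

# Density is cardinality normalised by the number of labels
density <- cardinality / labels
print(round(density, 4))

# Cardinality is the mean number of active labels per instance, so this
# approximates the total number of label assignments in the dataset
# (approximate because the tabulated cardinality is rounded)
total_assignments <- cardinality * instances
print(total_assignments)
```

The recomputed density rounds to the tabulated 0.0246, which suggests the table follows these definitions.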
Citation
Lewis, D. D.; Yang, Y.; Rose, T. G.; Li, F. (2004). RCV1: A new benchmark collection for text categorization research. The Journal of Machine Learning Research, 5, 361--397.
@article{,
title="RCV1: A new benchmark collection for text categorization research",
author="Lewis, D. D. and Yang, Y. and Rose, T. G. and Li, F.",
journal="The Journal of Machine Learning Research",
volume="5",
pages="361--397",
year="2004"
}
Concurrence plot
In this concurrence plot, sectors represent labels and links between them depict label co-occurrences. SCUMBLE is a measure designed to assess the concurrence among imbalanced labels.