Lewis, D. D.; Yang, Y.; Rose, T. G.; Li, F. (2004). RCV1: A new benchmark collection for text categorization research. The Journal of Machine Learning Research, 5, 361--397.
rcv1sub5
mldr.datasets::get.mldr("rcv1sub5")
Summary
| Measure | Value |
|---|---|
| Instances | 6000 |
| Attributes | 47336 |
| Inputs | 47235 |
| Labels | 101 |
| Labelsets | 946 |
| Single labelsets | 586 |
| Max frequency | 526 |
| Cardinality | 2.6415 |
| Density | 0.0262 |
| Mean IR | 69.6815 |
| SCUMBLE | 0.2381 |
| TCS | 22.2303 |
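
Several of the measures in the summary table can be cross-checked from the others. The sketch below uses base R only and assumes the usual definitions, which this page does not state: attributes as inputs plus labels, density as cardinality divided by the number of labels, and TCS as the natural logarithm of inputs × labels × labelsets. The variable names are illustrative and are not part of the mldr API.

```r
# Values copied from the summary table
inputs      <- 47235
labels      <- 101
labelsets   <- 946
cardinality <- 2.6415

# Attributes = input features + label columns
attributes <- inputs + labels

# Density (assumed definition): cardinality normalised by the number of labels
density <- cardinality / labels

# TCS (assumed definition): natural log of inputs * labels * labelsets
tcs <- log(inputs * labels * labelsets)

print(attributes)
print(round(density, 4))
print(round(tcs, 4))
```

Under these assumptions the three results should agree with the Attributes, Density and TCS rows to the precision shown. Mean IR and SCUMBLE depend on per-label frequencies and cannot be recovered from the summary alone.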
Citation
@article{,
  title="RCV1: A new benchmark collection for text categorization research",
  author="Lewis, D. D. and Yang, Y. and Rose, T. G. and Li, F.",
  journal="The Journal of Machine Learning Research",
  volume="5",
  pages="361--397",
  year="2004"
}
Concurrence plot
In the concurrence plot, each sector represents a label and each link between two sectors depicts a co-occurrence of those labels in the same instances. SCUMBLE measures the degree of concurrence between frequent and rare labels; for this dataset it is 0.2381.