rcv1sub1

mldr.datasets::get.mldr("rcv1sub1")

Downloads

Partitioning strategies: Random, Stratified, Iterative stratified
Validation schemes: Hold out, 2x5-fold cross validation, 10-fold cross validation
Formats (offered for every strategy and scheme): MULAN, MEKA, LibSVM, KEEL, mldr

Summary

Instances 6000
Attributes 47337
Inputs 47236
Labels 101
Labelsets 1028
Single labelsets 657
Max frequency 246
Cardinality 2.8797
Density 0.0285
Mean IR 54.4923
SCUMBLE 0.2237
TCS 22.3134
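
Several of the summary figures can be derived from the others, which is a handy sanity check after downloading. The sketch below assumes the usual definitions: attributes are inputs plus labels, density is cardinality divided by the number of labels, and TCS (assumed here to be the theoretical complexity score) is the natural logarithm of inputs × labels × labelsets. The variable names are illustrative and not part of any package.

```python
import math

# Figures copied from the Summary table.
instances = 6000
inputs = 47236
labels = 101
labelsets = 1028
cardinality = 2.8797  # mean number of active labels per instance

# Assumed definitions (not stated on this page):
attributes = inputs + labels                      # total columns in the dataset
density = cardinality / labels                    # cardinality normalised by label count
tcs = math.log(inputs * labels * labelsets)       # natural log of the product

print(f"attributes = {attributes}")
print(f"density    = {density:.4f}")
print(f"TCS        = {tcs:.4f}")
```

Under these assumptions the three printed values agree with the Attributes, Density and TCS rows of the table.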

Citation

Lewis, D. D.; Yang, Y.; Rose, T. G.; Li, F. (2004). RCV1: A new benchmark collection for text categorization research. The Journal of Machine Learning Research, 5, 361--397.
@article{,
  title="RCV1: A new benchmark collection for text categorization research",
  author="Lewis, D. D. and Yang, Y. and Rose, T. G. and Li, F.",
  journal="The Journal of Machine Learning Research",
  volume="5",
  pages="361--397",
  year="2004"
}

Concurrence plot

In this concurrence plot, sectors represent labels and the links between them depict label co-occurrences. SCUMBLE measures how strongly imbalanced labels concur, that is, how often rare labels appear in the same instances as frequent ones; for this dataset it is 0.2237.
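
To make the quantities behind the plot concrete, here is a minimal sketch on hypothetical toy data (four instances, labels "a", "b", "c"). It counts pairwise co-occurrences (the links in the plot), computes the per-label imbalance ratio IRLbl as the count of the most frequent label divided by the count of each label, and then a SCUMBLE score. The SCUMBLE step assumes the per-instance form 1 minus the ratio of the geometric mean to the arithmetic mean of the IRLbl values of the labels active in that instance, averaged over all instances. Treat it as an illustration of the idea, not as the reference implementation used to produce the figures on this page.

```python
from itertools import combinations
from math import prod

# Hypothetical toy data: each instance is the set of labels it carries.
toy_instances = [{"a", "b"}, {"a"}, {"a", "c"}, {"a", "b"}]


def label_counts(instances):
    """Number of instances in which each label appears."""
    counts = {}
    for labelset in instances:
        for label in labelset:
            counts[label] = counts.get(label, 0) + 1
    return counts


def cooccurrences(instances):
    """Number of instances shared by each pair of labels (the plot's links)."""
    pairs = {}
    for labelset in instances:
        for pair in combinations(sorted(labelset), 2):
            pairs[pair] = pairs.get(pair, 0) + 1
    return pairs


def irlbl(counts):
    """Imbalance ratio per label: most frequent count / this label's count."""
    top = max(counts.values())
    return {label: top / n for label, n in counts.items()}


def scumble(instances, ir):
    """Assumed definition: mean over instances of 1 - GM/AM of active IRLbl values."""
    scores = []
    for labelset in instances:
        values = [ir[label] for label in labelset]
        if not values:
            scores.append(0.0)  # unlabelled instance: no concurrence to measure
            continue
        geometric = prod(values) ** (1 / len(values))
        arithmetic = sum(values) / len(values)
        scores.append(1 - geometric / arithmetic)
    return sum(scores) / len(scores)


counts = label_counts(toy_instances)
pairs = cooccurrences(toy_instances)
ir = irlbl(counts)
mean_ir = sum(ir.values()) / len(ir)
score = scumble(toy_instances, ir)

print("co-occurrences:", pairs)
print("IRLbl:", ir)
print(f"Mean IR: {mean_ir:.4f}")
print(f"SCUMBLE: {score:.4f}")
```

In the toy data, the instance that pairs the frequent label "a" with the rare label "c" contributes the largest per-instance score, while the instance carrying only "a" contributes zero. That is the behaviour the measure is meant to capture: the score rises when labels of very different frequency share instances.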