enron

mldr.datasets::get.mldr("enron")

Downloads

Partitions are available for every combination of partitioning strategy, validation scheme and format:

Partitioning strategy: Random, Stratified, Iterative stratified
Validation: Hold out, 2x5-fold cross validation, 10-fold cross validation
Format: MULAN, MEKA, LibSVM, KEEL, mldr
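
As a rough illustration of what a random 10-fold cross-validation partition of the 1702 instances could look like, here is a minimal Python sketch. The helper name `random_kfold` and the seed are hypothetical, and the repository's actual partitioning procedure is not described on this page, so this should not be read as the way the downloadable files were produced. The stratified and iterative stratified strategies additionally balance label frequencies across folds and are not covered here.

```python
import random


def random_kfold(n_instances, k, seed=0):
    """Shuffle instance indices and deal them into k near-equal folds.

    Hypothetical helper: the name and the seed are illustrative only.
    """
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Fold j takes every k-th shuffled index, starting at position j.
    return [indices[j::k] for j in range(k)]


folds = random_kfold(1702, 10)

# Each fold serves once as the test set; the remaining folds form the training set.
test_idx = folds[0]
train_idx = [i for fold in folds[1:] for i in fold]

print(sorted(len(fold) for fold in folds))
```

With 1702 instances and 10 folds, two folds hold 171 instances and the other eight hold 170. A 2x5-fold scheme would repeat the same call twice with `k=5` and two different seeds.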

Summary

Instances 1702
Attributes 1054
Inputs 1001
Labels 53
Labelsets 753
Single labelsets 573
Max frequency 163
Cardinality 3.3784
Density 0.0637
Mean IR 73.9528
SCUMBLE 0.3028
TCS 17.5031
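
Several of the figures above are derived from the others, which gives a quick way to sanity-check the table. The Python sketch below assumes the usual definitions: attributes are inputs plus labels, density is cardinality divided by the number of labels, and TCS is the natural logarithm of inputs × labels × labelsets. These definitions are not stated on this page, so treat them as assumptions.

```python
import math

# Figures copied from the summary table.
attributes = 1054
inputs = 1001
labels = 53
labelsets = 753
cardinality = 3.3784

# Attributes are the input features plus the label columns.
attributes_check = inputs + labels

# Density: cardinality normalised by the number of labels (assumed definition).
density = cardinality / labels

# TCS: natural log of inputs * labels * distinct labelsets (assumed definition).
tcs = math.log(inputs * labels * labelsets)

print(attributes_check)   # 1054
print(round(density, 4))  # 0.0637
print(round(tcs, 4))      # 17.5031
```

Under these assumptions the recomputed values agree with the reported Attributes, Density and TCS.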

Citation

Klimt, B.; Yang, Y. (2004). The Enron Corpus: A New Dataset for Email Classification Research. In Proc. ECML04, Pisa, Italy, 217--226.
@incollection{,
  author = "Klimt, B. and Yang, Y.",
  title = "The Enron Corpus: A New Dataset for Email Classification Research",
  booktitle = "Proc. ECML04, Pisa, Italy",
  pages = "217--226",
  year = "2004"
}

Concurrence plot

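In this concurrence plot, sectors represent labels and the links between them depict label co-occurrences. SCUMBLE measures the concurrence among imbalanced labels, that is, how often rare and frequent labels appear together in the same instances. It ranges from 0 to 1, with higher values indicating more concurrence; for enron it is 0.3028 (see Summary).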
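
To make the quantities behind the plot concrete, the following Python sketch computes pairwise label co-occurrence counts (what the links depict), the per-label imbalance ratio, Mean IR and SCUMBLE on a tiny made-up label matrix. The toy data and variable names are hypothetical. The SCUMBLE formula used here, one minus the ratio of the geometric mean to the arithmetic mean of the imbalance ratios of each instance's active labels, averaged over instances, is an assumption about the definition and may differ in detail from what the repository uses.

```python
import math
from itertools import combinations

# Toy label matrix (hypothetical): rows are instances, columns are labels.
Y = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 1],
]
n_labels = len(Y[0])

# Pairwise co-occurrence counts: how many instances carry both labels.
cooccurrence = {}
for row in Y:
    active = [l for l, v in enumerate(row) if v]
    for a, b in combinations(active, 2):
        cooccurrence[(a, b)] = cooccurrence.get((a, b), 0) + 1

# Imbalance ratio per label: count of the most frequent label divided by
# this label's count (assumes every label occurs at least once).
counts = [sum(row[l] for row in Y) for l in range(n_labels)]
irlbl = [max(counts) / c for c in counts]
mean_ir = sum(irlbl) / n_labels

# SCUMBLE per instance over its active labels (assumed definition);
# instances without active labels score 0.
scores = []
for row in Y:
    irs = [irlbl[l] for l, v in enumerate(row) if v]
    if not irs:
        scores.append(0.0)
        continue
    geometric = math.prod(irs) ** (1 / len(irs))
    arithmetic = sum(irs) / len(irs)
    scores.append(1 - geometric / arithmetic)
scumble = sum(scores) / len(scores)

print(cooccurrence)       # {(0, 1): 1, (0, 2): 1}
print(mean_ir)            # 3.0
print(round(scumble, 4))  # 0.1
```

In the toy data, label 0 is frequent and labels 1 and 2 are rare. Only the two instances that mix a frequent label with a rare one contribute a non-zero score, which is the kind of concurrence the measure is meant to flag.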