Klimt, B.; Yang, Y. (2004). The Enron Corpus: A New Dataset for Email Classification Research. In Proc. ECML04, Pisa, Italy, 217--226.
enron
mldr.datasets::get.mldr("enron")
Summary
Instances | 1702 |
---|---|
Attributes | 1054 |
Inputs | 1001 |
Labels | 53 |
Labelsets | 753 |
Single labelsets | 573 |
Max frequency | 163 |
Cardinality | 3.3784 |
Density | 0.0637 |
Mean IR | 73.9528 |
SCUMBLE | 0.3028 |
TCS | 17.5031 |
Citation
Klimt, B.; Yang, Y. (2004). The Enron Corpus: A New Dataset for Email Classification Research. In Proc. ECML04, Pisa, Italy, 217--226.
@incollection{,
author = "Klimt, B. and Yang, Y.",
title = "The Enron Corpus: A New Dataset for Email Classification Research",
booktitle = "Proc. ECML04, Pisa, Italy",
pages = "217--226",
year = "2004"
}
Concurrence plot
In this concurrence plot, sectors represent labels and links between them depict label co-occurrences. SCUMBLE is a measure designed to assess the concurrence among imbalanced labels.