Klimt, B.; Yang, Y. (2004). The Enron Corpus: A New Dataset for Email Classification Research. In Proc. ECML04, Pisa, Italy, 217--226.
enron
mldr.datasets::get.mldr("enron")
Summary
| Measure | Value |
|---|---|
| Instances | 1702 |
| Attributes | 1054 |
| Inputs | 1001 |
| Labels | 53 |
| Labelsets | 753 |
| Single labelsets | 573 |
| Max frequency | 163 |
| Cardinality | 3.3784 |
| Density | 0.0637 |
| Mean IR | 73.9528 |
| SCUMBLE | 0.3028 |
| TCS | 17.5031 |
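
Several rows of the table are related by simple arithmetic. The sketch below assumes the usual definitions from the multilabel literature: cardinality is the mean number of active labels per instance, and density is cardinality divided by the number of labels. The symbols $n$, $L$ and $Y_i$ are notation introduced here for illustration and do not appear in the original page.

$$
\begin{aligned}
\text{Attributes} &= \text{Inputs} + \text{Labels} = 1001 + 53 = 1054 \\
\text{Cardinality} &= \frac{1}{n}\sum_{i=1}^{n} |Y_i| = 3.3784, \qquad n = 1702 \\
\text{Density} &= \frac{\text{Cardinality}}{|L|} = \frac{3.3784}{53} \approx 0.0637
\end{aligned}
$$

Under these assumptions the reported values are mutually consistent: on average an email carries about 3.4 of the 53 labels, roughly 6% of the label space.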
Citation
Klimt, B.; Yang, Y. (2004). The Enron Corpus: A New Dataset for Email Classification Research. In Proc. ECML04, Pisa, Italy, 217--226.
@incollection{,
author = "Klimt, B. and Yang, Y.",
title = "The Enron Corpus: A New Dataset for Email Classification Research",
booktitle = "Proc. ECML04, Pisa, Italy",
pages = "217--226",
year = "2004"
}
Concurrence plot
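In this concurrence plot, sectors represent labels and the links between them depict label co-occurrences. SCUMBLE is a measure designed to assess the concurrence among imbalanced labels, that is, how often rare and frequent labels appear together in the same instances; for this dataset its value is 0.3028.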