Hitesh Sajnani, Vaibhav Saini, Kusum Kumar, Eugenia Gabrielova, Pramit Choudary, Cristina Lopes (2013). The Yelp dataset challenge - Multilabel classification of Yelp reviews into relevant categories.
Yelp
mldr.datasets::get.mldr("Yelp")
Summary
| Measure | Value |
|---|---|
| Instances | 10806 |
| Attributes | 676 |
| Inputs | 671 |
| Labels | 5 |
| Labelsets | 32 |
| Single labelsets | 0 |
| Max frequency | 2120 |
| Cardinality | 1.6383 |
| Density | 0.3277 |
| Mean IR | 2.8756 |
| SCUMBLE | 0.0332 |
| TCS | 11.5839 |
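Several rows of this table can be derived from the others. The sketch below recomputes them in base R from values copied out of the table. It assumes the usual definitions: density is cardinality divided by the number of labels, and TCS is the natural logarithm of inputs × labels × labelsets. The variable names are illustrative only and are not part of any package.

```r
# Values copied from the summary table above.
attributes  <- 676
labels      <- 5
labelsets   <- 32
cardinality <- 1.6383

# Inputs are the attributes that are not labels.
inputs <- attributes - labels

# Density: average number of active labels per instance,
# normalised by the total number of labels.
density <- cardinality / labels

# TCS (theoretical complexity score), assumed here to be
# the natural log of inputs * labels * labelsets.
tcs <- log(inputs * labels * labelsets)

print(inputs)             # 671
print(round(density, 4))  # 0.3277
print(round(tcs, 4))      # 11.5839
```

The recomputed values agree with the Inputs, Density and TCS rows, which suggests the table was produced with these definitions.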
Citation
@online{,
title={The Yelp dataset challenge - Multilabel classification of Yelp reviews into relevant categories},
author={Hitesh Sajnani and Vaibhav Saini and Kusum Kumar and Eugenia Gabrielova and Pramit Choudary and Cristina Lopes},
year={2013},
url={https://www.ics.uci.edu/~vpsaini/}
}
Concurrence plot
In this concurrence plot, sectors represent labels and links between them depict label co-occurrences. SCUMBLE is a measure designed to assess the concurrence among imbalanced labels.
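To make the quantities behind such a plot concrete, here is a minimal base R sketch on a hypothetical toy label matrix (not the Yelp data). The co-occurrence counts and the per-label imbalance ratio (IRLbl) follow their standard definitions. The SCUMBLE part is an assumed reading of the measure: for each instance, one minus the ratio of the geometric mean to the arithmetic mean of the IRLbl values of its active labels, averaged over instances. Treat it as an illustration, not as the reference implementation.

```r
# Hypothetical toy label matrix: 4 instances (rows) x 3 labels (columns).
Y <- matrix(c(1, 1, 0,
              1, 0, 0,
              1, 0, 1,
              1, 1, 0),
            ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("A", "B", "C")))

# Co-occurrence counts: entry [i, j] is the number of instances
# carrying both label i and label j (the links in the plot).
cooc <- crossprod(Y)

# Imbalance ratio per label: count of the most frequent label
# divided by the count of each label.
counts <- colSums(Y)
irlbl  <- max(counts) / counts
mean_ir <- mean(irlbl)

# Per-instance SCUMBLE under the assumed formulation:
# 1 - geometric mean / arithmetic mean of the active labels' IRLbl.
scumble_ins <- apply(Y, 1, function(y) {
  ir <- irlbl[y == 1]
  if (length(ir) == 0) return(NA_real_)  # skip instances with no labels
  1 - exp(mean(log(ir))) / mean(ir)
})

# Dataset-level SCUMBLE: mean over instances with at least one label.
scumble <- mean(scumble_ins, na.rm = TRUE)

print(cooc)
print(irlbl)
print(scumble)
```

In this toy example the rare label C appears only together with the dominant label A, so the third instance contributes the highest per-instance score. A value near 0, like the 0.0332 reported for this dataset, indicates that rare and frequent labels seldom appear together in the same instance.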