Hitesh Sajnani, Vaibhav Saini, Kusum Kumar, Eugenia Gabrielova, Pramit Choudary, Cristina Lopes (2013). The Yelp dataset challenge - Multilabel classification of Yelp reviews into relevant categories.
Yelp
mldr.datasets::get.mldr("Yelp")
Summary
Measure | Value |
---|---|
Instances | 10806 |
Attributes | 676 |
Inputs | 671 |
Labels | 5 |
Labelsets | 32 |
Single labelsets | 0 |
Max frequency | 2120 |
Cardinality | 1.6383 |
Density | 0.3277 |
Mean IR | 2.8756 |
SCUMBLE | 0.0332 |
TCS | 11.5839 |
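
Several rows of this summary are tied together by simple arithmetic, which makes a handy sanity check. The sketch below assumes the usual multilabel definitions; the symbols $n$ (instances), $q$ (labels), $f$ (inputs), $ls$ (distinct labelsets) and $Y_i$ (the labelset of instance $i$) are notation introduced here, not taken from the page. The TCS line assumes it is the natural logarithm of the product of inputs, labels and labelsets, which reproduces the reported value.

```latex
\begin{align*}
\text{Card} &= \frac{1}{n}\sum_{i=1}^{n} |Y_i| = 1.6383
  && (n = 10806) \\
\text{Dens} &= \frac{\text{Card}}{q} = \frac{1.6383}{5} \approx 0.3277
  && (q = 5) \\
\text{TCS}  &= \ln\!\left(f \cdot q \cdot ls\right)
             = \ln(671 \cdot 5 \cdot 32) = \ln(107360) \approx 11.5839
\end{align*}
```

Read this way, a cardinality of about 1.64 means a typical review carries between one and two of the five category labels.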
Citation
Hitesh Sajnani, Vaibhav Saini, Kusum Kumar, Eugenia Gabrielova, Pramit Choudary, Cristina Lopes (2013). The Yelp dataset challenge - Multilabel classification of Yelp reviews into relevant categories.
@online{sajnani2013yelp,
title={The Yelp dataset challenge - Multilabel classification of Yelp reviews into relevant categories},
author={Hitesh Sajnani and Vaibhav Saini and Kusum Kumar and Eugenia Gabrielova and Pramit Choudary and Cristina Lopes},
year={2013},
url={https://www.ics.uci.edu/~vpsaini/}
}
Concurrence plot
In this concurrence plot, sectors represent labels and links between them depict label co-occurrences. SCUMBLE is a measure designed to assess the concurrence among imbalanced labels.