Dataset

Safety Risk Taxonomy

We have developed a fine-grained taxonomy of safety scenarios tailored to large language models. It is organized into major risk categories, such as violence, each divided into more specific subcategories, such as physical harm and psychological harm. Descriptions of every category and subcategory are given in our study. This two-level structure lets the taxonomy capture nuanced risks that may emerge in practical applications.
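The two-level structure described above can be sketched as a small data structure. This is only an illustrative sketch: the names `RiskCategory`, `TAXONOMY`, and `find_category` are hypothetical, and only the one category and two subcategories named in the text are filled in.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RiskCategory:
    """A major risk category and its finer-grained subcategories."""
    name: str
    subcategories: List[str] = field(default_factory=list)


# Only the category named in the text is listed here; the full taxonomy
# contains further major categories that are not reproduced in this sketch.
TAXONOMY = [
    RiskCategory("violence", ["physical harm", "psychological harm"]),
]


def find_category(subcategory: str,
                  taxonomy: List[RiskCategory] = TAXONOMY) -> Optional[str]:
    """Return the major category that contains `subcategory`, or None."""
    for category in taxonomy:
        if subcategory in category.subcategories:
            return category.name
    return None
```

For instance, `find_category("physical harm")` resolves to the major category `"violence"`, while a label that is not in the taxonomy yields `None`.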

Examples

The figure shows one example from each major risk category. Only examples from the text and visual modalities are presented, owing to the unique characteristics of the audio modality.

Quality Scores

The results in the following figure show that scores vary across categories, with the largest difference (2.7) observed between categories CO and VI, but remain consistently high overall.
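The maximum score difference is simply the gap between the highest-scoring and lowest-scoring categories. A minimal sketch of that computation follows; the function name `max_score_gap` is hypothetical, and the scores in the usage note are placeholders, not values from the study.

```python
from typing import Dict, Tuple


def max_score_gap(scores: Dict[str, float]) -> Tuple[str, str, float]:
    """Return (highest category, lowest category, score difference)."""
    if len(scores) < 2:
        raise ValueError("need at least two categories to compare")
    highest = max(scores, key=scores.get)  # category with the top score
    lowest = min(scores, key=scores.get)   # category with the bottom score
    return highest, lowest, scores[highest] - scores[lowest]
```

With placeholder scores `{"A": 8.0, "B": 6.5, "C": 7.25}`, the function returns `("A", "B", 1.5)`.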

The following figure compares the quality scores of SafeBench with those of other datasets.