Abstract: Improving the accuracy of load shedding while preserving real-time performance is an important problem. Sparsity is a widespread feature of big data streams. We therefore propose two load shedding methods for sparse big data streams, one for each of two scenarios. In the normal business scenario, we model the big data stream in a high-dimensional space and propose a load shedding method based on centrifugation, which uses the elastic distance to measure the distance between data items. In the anomaly-monitoring scenario, we analyze the features of the big data stream and propose a load shedding method based on equivalence classes, which uses a combined similarity to divide the data set into equivalence classes. The combined similarity is composed of processing behavior similarity and data similarity, and measures the difference between data items. Repeated tests show that the two load shedding methods proposed in this paper can significantly improve accuracy compared with conventional load shedding methods.