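Hanghang Tong and Jingrui He of Carnegie Mellon University (USA) will visit our university and give academic talks at 2:00 p.m. on November 10 in Conference Room 215 of the Xindian Building (信电楼). The talk titles, abstracts, and speaker biographies are given below. All who are interested are warmly invited to attend.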
Rare Category Analysis
Abstract:
Rare category analysis is an open challenge in machine learning with a wealth of applications, such as financial fraud detection, network intrusion detection, astronomy, and spam image detection.
In this talk, I will first give a high-level picture of where machine learning can help in rare category analysis, including (1) initial class label discovery, (2) unsupervised rare instance selection, (3) feature selection, and (4) rare instance retrieval. Then, I will focus on one particular task: rare category detection (initial class label discovery). Our recent work addresses both data with feature representations and graph data, under different amounts of available prior information. For data with feature representations, given enough prior information about the data set, we proposed nearest-neighbor-based methods, which essentially perform local density differential sampling. They are shown to be effective both theoretically and experimentally. On the other hand, when no prior information about the data set is available, we proposed a density-based method, which makes use of specially designed exponential families. For graph data, we designed two algorithms that take advantage of the global similarity between two examples. Given the same amount of information, the first algorithm performs better than state-of-the-art techniques, whereas given much less information, the second algorithm is comparable with state-of-the-art techniques.
Bio:
Jingrui He is a Ph.D. candidate in the Machine Learning Department at
Fast Algorithms for Querying and Mining Large Graphs
Abstract:
Graphs appear in a wide range of settings and have posed a wealth of fascinating problems. In this talk, I will first sketch our recent progress on (1) querying large graphs (e.g., given a social network, how can we measure the closeness between two people, and how can we track it over time?) and (2) mining large graphs (e.g., how can we identify abnormal behaviors in computer networks, and in the case of virus attacks, which nodes are the best to immunize?).
For the task of querying, our main finding is that many complex user-specific patterns on large graphs can be answered by means of proximity measurement. In other words, proximity allows us to query large graphs at the atomic level. Then, I will focus on the fast computation of proximity. We developed a family of fast solutions to compute proximity in several different scenarios. By carefully leveraging some important properties shared by many real graphs (e.g., the block-wise structure, the linear correlation, the skewness of real bipartite graphs, etc.), we can often achieve orders-of-magnitude speedups with little or no quality loss. For the task of mining, I will focus on anomaly detection. To this end, we proposed a family of example-based low-rank matrix approximation methods. The proposed algorithms are provably equal to or better than the best known methods in both space and time, with the same accuracy. On real data sets, they are up to 112x faster than the best competitors, for the same accuracy.
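As a rough illustration of what a proximity query on a graph can look like, the sketch below scores every node by its closeness to a query node. It assumes random walk with restart as the proximity measure, which is a common choice but is not named in the abstract. The function name `rwr_proximity`, the restart probability of 0.15, and the toy graph are all hypothetical, and the plain power iteration shown here is the slow baseline, not one of the fast solutions the talk describes.

```python
def rwr_proximity(adj, source, restart=0.15, iters=100):
    """Random-walk-with-restart scores from `source` to every node.

    adj: dict mapping each node to a list of its neighbours
         (assumed undirected and unweighted).
    restart: probability of jumping back to `source` at each step
             (0.15 is an assumed, illustrative value).
    """
    nodes = list(adj)
    score = {n: 0.0 for n in nodes}
    score[source] = 1.0  # the walker starts at the query node
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        for u in nodes:
            if not adj[u]:
                # Dangling node: return its walking mass to the source.
                nxt[source] += (1 - restart) * score[u]
                continue
            # Spread the walking mass evenly over the neighbours.
            share = (1 - restart) * score[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        # Restart mass (total mass is 1) goes back to the source.
        nxt[source] += restart
        score = nxt
    return score


# Hypothetical toy network: a triangle a-b-c with a tail c-d-e.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
scores = rwr_proximity(graph, "a")
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 4))
```

In this toy graph, nodes at the end of the tail receive lower scores than nodes inside the triangle containing the query node, which matches the intuition of closeness between two people in a social network.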
Bio:
Hanghang Tong received his Ph.D. from the Machine Learning Department at