Abstract: Most current credit evaluation models designed for large samples simply apply traditional evaluation methods, without considering in depth the distribution characteristics of large samples. This paper first proposes the concepts and definitions of the related attribute set, the boundary vector, and related notions to describe the distribution characteristics of large samples, and proves their main properties. The characteristics of the sample distribution are then studied in terms of similarity on two large-sample data sets. Finally, a hybrid large-sample credit evaluation model, the HLSCE model, is designed. The key idea of the HLSCE model is that, in large-sample data sets, the same attribute of samples in different local areas contributes differently to classification performance. Specifically, the HLSCE model uses a biologically inspired heuristic algorithm to divide the whole data set into several subsets according to the similarity between samples and boundary vectors, and then trains a base classifier on each subset. The empirical study shows that, compared with existing representative credit evaluation models, the HLSCE model achieves higher classification accuracy as well as better balance and stability.