Financial Reporting Fraud Detection: An Analysis of Data Mining Algorithms

Document Type: Original Article


1 Ph.D. Candidate in Accounting, Department of Accounting, Damavand branch, Islamic Azad University, Damavand, Iran

2 Assistant Professor of Accounting, Department of Accounting, Damavand branch, Islamic Azad University, Damavand, Iran (Corresponding Author)

3 Assistant Professor of Accounting, Department of Accounting, Damavand branch, Islamic Azad University, Damavand, Iran

4 Assistant Professor of Accounting, Department of Accounting, Qazvin branch, Islamic Azad University, Qazvin, Iran


In the last decade, high-profile financial frauds committed by large companies in both developed and developing countries have been discovered and reported. This study compares the performance of five popular statistical and machine learning models in detecting financial statement fraud. The sample consists of companies that had both fraudulent and non-fraudulent financial statements between 2011 and 2016. The results show that the artificial neural network performs well relative to the Bayesian network, discriminant analysis, logistic regression, and support vector machine. The results also reveal some diversity in the predictors used across the classification algorithms. Of the 19 predictors examined, only nine are consistently selected and used by the different classification algorithms: Employee Productivity, Accounts Receivable to Sales, Debt-to-Equity, Inventory to Sales, Sales to Total Assets, Return on Equity, Return on Sales, Liabilities to Interest Expenses, and Assets to Liabilities. These findings extend financial statement fraud research and can be used by practitioners and regulators to improve fraud risk models.

