Searching for Malware Dataset: A Systematic Literature Review

Malware is a widely discussed topic among academics and researchers, but the sources of the malware samples used in studies are rarely provided. This paper therefore presents a Systematic Literature Review (SLR) to identify which datasets previous researchers commonly used. Three journal databases were searched: IEEE, ScienceDirect, and ACM. The PRISMA statement was applied to keep the review process transparent. To narrow the search, the authors also defined inclusion and exclusion criteria. The inclusion criteria were: (1) the full article is written in English; (2) the paper is peer-reviewed; (3) the paper explicitly names the dataset or database; and (4) the paper explicitly states the method used to find malware characteristics and behavior. The exclusion criteria were: (1) articles written before 2015; (2) books and white papers; (3) articles already indexed in another journal database; and (4) papers shorter than four pages. After both filters were applied, 42 of 245 articles were eligible to answer the research questions (RQs): (1) where do researchers usually find malware databases or datasets?; (2) what methods did previous researchers apply to find malware characteristics or behavior?; and (3) which platforms does malware usually attack? The review recorded 37 datasets for RQ1, 47 methods for RQ2, and six platforms for RQ3.
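
The screening step described in the abstract can be pictured as a simple filter over bibliographic records. The following is only a minimal sketch under stated assumptions, not the authors' actual tooling: the `Record` fields, the `screen` function, and the title-based duplicate check are all hypothetical names and choices introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical bibliographic record; field names are assumptions."""
    title: str
    year: int
    pages: int
    kind: str              # e.g. "journal", "conference", "book", "white paper"
    language: str
    peer_reviewed: bool
    names_dataset: bool    # explicitly names the dataset or database
    names_method: bool     # explicitly states the analysis method
    source: str            # database the record was retrieved from


def is_included(r):
    # Inclusion criteria (1)-(4) from the abstract.
    return (r.language == "English" and r.peer_reviewed
            and r.names_dataset and r.names_method)


def is_excluded(r, seen_titles):
    # Exclusion criteria (1)-(4); a duplicate is approximated here as a
    # title already seen in an earlier record (an assumption).
    return (r.year < 2015
            or r.kind in {"book", "white paper"}
            or r.title.casefold() in seen_titles
            or r.pages < 4)


def screen(records):
    """Return the records that pass both filters, in input order."""
    seen = set()
    eligible = []
    for r in records:
        if is_included(r) and not is_excluded(r, seen):
            eligible.append(r)
        seen.add(r.title.casefold())
    return eligible
```

Calling `screen` on the combined search results from the three databases would yield the eligible subset; in the review itself, that subset was 42 of 245 articles.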

