Commercial anti-virus software is unable to provide protection against newly launched (a.k.a. “zero-day”) malware. In this paper, we propose a novel malware detection technique based on the analysis of byte-level file content. The novelty of our approach, compared with existing content-based mining schemes, is that it does not memorize specific byte sequences or strings appearing in the actual file content. Our technique is non-signature based and therefore has the potential to detect previously unknown and zero-day malware. We compute a wide range of statistical and information-theoretic features in a block-wise manner to quantify the byte-level file content. We leverage standard data mining algorithms to classify the file content of every block as normal or potentially malicious. Finally, we correlate the block-wise classification results of a given file to categorize it as benign or malware. Because the proposed scheme operates on byte-level file content, it does not require any a priori information about the filetype. We have tested our proposed technique using a benign dataset comprising six different filetypes (DOC, EXE, JPG, MP3, PDF and ZIP) and a malware dataset comprising six different malware types (backdoor, trojan, virus, worm, constructor and miscellaneous). We also perform a comparison with existing data mining based malware detection techniques. The results of our experiments show that the proposed non-signature based technique surpasses the existing techniques and achieves more than 90% detection accuracy.
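To make the block-wise idea concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not list the exact features, the block size, or the rule used to correlate per-block results, so the choices here are assumptions for illustration only: Shannon entropy, mean, variance, and distinct-byte count as features; a 1024-byte block; and a simple fraction-of-flagged-blocks threshold. The names `block_features`, `blockwise_features`, and `classify_file` are hypothetical.

```python
import math
from collections import Counter


def block_features(block: bytes) -> dict:
    """Compute a few byte-level statistics for one non-empty block.

    The feature set is an assumption; the paper's actual features
    are not specified in the abstract.
    """
    n = len(block)
    counts = Counter(block)  # frequency of each byte value in the block
    # Shannon entropy in bits per byte (0 for a constant block, 8 for uniform).
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    mean = sum(block) / n
    variance = sum((b - mean) ** 2 for b in block) / n
    return {
        "entropy": entropy,
        "mean": mean,
        "variance": variance,
        "distinct": len(counts),
    }


def blockwise_features(data: bytes, block_size: int = 1024) -> list:
    """Split data into fixed-size blocks (block_size is an assumed value)
    and compute features per block. The last block may be shorter."""
    return [
        block_features(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]


def classify_file(block_labels: list, threshold: float = 0.5) -> str:
    """Combine per-block labels (True = potentially malicious) into a
    file-level verdict. The thresholded-fraction rule is an assumption
    standing in for the paper's correlation step."""
    if not block_labels:
        return "benign"
    flagged_fraction = sum(block_labels) / len(block_labels)
    return "malware" if flagged_fraction > threshold else "benign"
```

In a full pipeline, the per-block feature dictionaries would be passed to a trained classifier to produce the boolean labels that `classify_file` consumes. Nothing in the sketch inspects file headers or extensions, which mirrors the claim that no a priori filetype information is needed.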
Tabish et al. (Sun,) studied this question.