Ideas for PowerShell Malware Detection Engine
This article is a summary of the ideas our group came up with during the hackathon at GCC 2023.
What is GCC 2023 Singapore?
Students were divided into groups A to G. The admins tried to make sure that no two students from the same country ended up in the same group.
Other participants' posts (mostly in Japanese):
- watasuke.net/blog/article/gcc2023-attend/
- A blog post from Watasuke, who was the tutor for our team.
- y05h1k1ng.github.io/posts/gcc2023-singapore/
- A blog post from Y05h1k1ng, who was in the same team as me.
- www.ta-oot.page/posts/002/
- A blog post from T-oot, who ranked high in the hackathon.
- https://blog.securesky-tech.com/entry/2023/03/02
- A blog post from Secure Sky Technology, who are the Security Camp Committee’s gold members.
- blog.y2a.dev/articles/2023/02-24/journal-gcc2023insingapore/
- A blog post from yu1hpa, who was in the same threat analysis class at Security Camp 2022.
- blog.security-camp.or.jp/posts/gcc-2023-singapore-report/
- A blog post from the Security Camp Council.
Hackathon
Lecture: Hackathon of PowerShell Malware Detection Engine
Instructor: Sh1n0g1, developer of the PowerShell detection engine z9, which was created in lecture z9 of Security Camp 2022.
Evaluation Criteria:
- idea
- code: Actual working scripts over fancy slides
Goal:
- Develop a PowerShell detection engine that can detect malicious PowerShell scripts.
Provided Files:
- A file explaining an overview of PowerShell and typical obfuscation processing.
- Test PowerShell scripts (benign/malicious scripts)
- XML files related to Windows PowerShell, generated using wevtutil.exe
- A simple Python script to parse the XML file.
The provided XML is a Windows event log recorded with script block logging, module logging, and transcription enabled.
Information on script block logging, module logging, and transcription can be found in a 2016 blog article by Mandiant: Greater Visibility Through PowerShell Logging | Mandiant
However, the links in the appendix are broken, so here are the links to the archived PDFs:
Article by Yamato Security: Documentation and scripts to properly enable Windows event logs.
Microsoft’s article: about Logging Windows - PowerShell
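To get a feel for this kind of event log, the script block logging records (event ID 4104) can be pulled out of the XML with a few lines of Python. This is only a sketch and not the script we were given: it assumes the file is a sequence of `<Event>` elements in the standard Windows event namespace with no XML declaration, and the function name `script_blocks_from_xml` is made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Namespace used by Windows event log XML.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}


def script_blocks_from_xml(xml_text):
    """Return {ScriptBlockId: reassembled script text} from event 4104 records."""
    # The export may contain several <Event> elements without a single root,
    # so wrap them in one (assumption: no <?xml ...?> declaration is present).
    root = ET.fromstring("<Events>" + xml_text + "</Events>")
    parts = defaultdict(dict)
    for event in root.iter("{%s}Event" % NS["e"]):
        if event.findtext("e:System/e:EventID", namespaces=NS) != "4104":
            continue  # not a script block logging record
        data = {
            d.get("Name"): (d.text or "")
            for d in event.iterfind("e:EventData/e:Data", NS)
        }
        # Long scripts are split across several events; MessageNumber orders them.
        number = int(data.get("MessageNumber") or 1)
        parts[data.get("ScriptBlockId", "")][number] = data.get("ScriptBlockText", "")
    return {
        block_id: "".join(chunks[n] for n in sorted(chunks))
        for block_id, chunks in parts.items()
    }
```

Reassembling by `ScriptBlockId` matters because a single long script can be logged as several 4104 events.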
About Us
Our group had five members, three of whom worked on the PowerShell detection engine, and the remaining two handled other group work. The detection engine development members included one high school student from Singapore and two students from Japan (y05h1k1ng and myself). We ranked third! 🎉
Problem Solving
Due to the limited time, we made the following assumptions, which helped us focus on the essence of the theme.
- The malware is written entirely in PowerShell
- Generic static analysis of obfuscated strings is difficult (a)
- Obfuscated strings are suspicious (b)
- The malware tries to communicate with the C2 server (c)
- The malware attempts to persist (d)
- No sandbox evasion (e)
We set (d) and (e) outside the scope of this challenge, as they are not unique to PowerShell.
Test Cases
Let’s consider various cases.
- Code that should be detected as malicious
- Code that communicates with a malicious infrastructure
- Code that should not be detected as malicious
- Dead code
- Comments
- Code paths that are not executed (if False: do_malicious)
These are rough sample cases, but capture the essence of the problem.
Approach
The approach we adopted was to compare the static and dynamic states of a script and evaluate the artifacts that only show up dynamically. Specifically, it compares the PowerShell script before execution with the pcap file captured while the script runs.
1. Static extraction
The first step extracts URLs, IP addresses, paths, cryptocurrency addresses, etc., from the PowerShell code. Note that obfuscated files do not yield any meaningful information.
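A minimal sketch of what static extraction could look like, limited to URLs, their host names, and IPv4 addresses. The regular expressions and the function name `extract_static_indicators` are illustrative assumptions rather than our hackathon code, and paths and cryptocurrency addresses are left out to keep it short.

```python
import re
from urllib.parse import urlsplit

# Deliberately simple patterns; real scripts will need more careful handling.
URL_RE = re.compile(r"https?://[^\s'\"<>)]+", re.IGNORECASE)
OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
IPV4_RE = re.compile(r"\b(?:%s\.){3}%s\b" % (OCTET, OCTET))


def extract_static_indicators(script_text):
    """Collect URLs, URL host names, and IPv4 addresses found in the script text."""
    urls = set(URL_RE.findall(script_text))
    hosts = set()
    for url in urls:
        try:
            host = urlsplit(url).hostname
        except ValueError:
            continue  # malformed URL, ignore it
        if host:
            hosts.add(host.lower())
    ips = set(IPV4_RE.findall(script_text))
    return {"urls": urls, "hosts": hosts, "ips": ips}
```

Host names are taken only from URLs here because a bare "looks like a domain" pattern would also match .NET type names such as System.Net.WebClient, which are common in PowerShell.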
2. Dynamic extraction
In the second step, we run the PowerShell code in a sandbox and extract the DNS queries from the pcap data captured by the sandbox.
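As a rough illustration of the pcap side, the sketch below pulls queried names out of a capture using only the standard library. It assumes a classic pcap file (not pcapng) with Ethernet frames carrying IPv4 and DNS over UDP port 53. The function names are made up, and in practice a packet library would be the more comfortable choice.

```python
import struct


def dns_queries_from_pcap(data):
    """Return the set of names queried over UDP/53 in a classic pcap byte string."""
    magic = data[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled here)")
    if struct.unpack_from(endian + "I", data, 20)[0] != 1:
        raise ValueError("only Ethernet captures are handled in this sketch")
    names = set()
    offset = 24  # skip the global header
    while offset + 16 <= len(data):
        incl_len = struct.unpack_from(endian + "I", data, offset + 8)[0]
        frame = data[offset + 16 : offset + 16 + incl_len]
        offset += 16 + incl_len
        name = _query_name(frame)
        if name:
            names.add(name)
    return names


def _query_name(frame):
    """Return the first question name of a DNS query frame, or None."""
    if len(frame) < 34 or frame[12:14] != b"\x08\x00":  # Ethernet II + IPv4 only
        return None
    ihl = (frame[14] & 0x0F) * 4  # IPv4 header length in bytes
    if frame[14 + 9] != 17:  # IP protocol 17 = UDP
        return None
    udp = 14 + ihl
    if len(frame) < udp + 8 + 12:
        return None
    if struct.unpack_from("!H", frame, udp + 2)[0] != 53:  # destination port
        return None
    dns = udp + 8
    flags, qdcount = struct.unpack_from("!HH", frame, dns + 2)
    if flags & 0x8000 or qdcount == 0:  # skip responses and empty questions
        return None
    labels, pos = [], dns + 12
    while pos < len(frame) and frame[pos] != 0:
        length = frame[pos]
        if length & 0xC0:  # compression pointer, not expected in a question
            return None
        labels.append(frame[pos + 1 : pos + 1 + length].decode("ascii", "replace"))
        pos += 1 + length
    return ".".join(labels).lower() or None
```

Only outgoing queries are collected, since the goal is to learn which names the script tried to resolve, whether or not they resolved.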
3. Diffing
In the third step, we compare the data extracted from the static and dynamic states and treat anything that appears only in the dynamic data as suspicious.
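The diffing step boils down to a set difference. The sketch below assumes both sides are plain collections of strings such as domain names; the function name `diff_indicators` and the normalization rules are illustrative choices.

```python
def diff_indicators(static_items, dynamic_items):
    """Return dynamic indicators that never appeared in the static view."""

    def normalize(item):
        # Compare case-insensitively and ignore the trailing dot of FQDNs.
        return item.strip().rstrip(".").lower()

    seen_statically = {normalize(i) for i in static_items}
    observed = {normalize(i) for i in dynamic_items}
    return sorted(observed - seen_statically)
```

For example, `diff_indicators({"Example.com"}, {"example.com.", "c2.evil.example"})` leaves only `c2.evil.example`, because `example.com` was already visible before execution.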
4. Evaluation
Lastly, we evaluate the suspicious data. For domain names obtained from diffing, we further check the ASN. If the ASN belongs to a background service such as Google, we remove the domain names that are already in the static data. The remaining domain names are considered to be suspicious. I realize that this evaluation method can easily be improved, but I will leave it as an exercise for the reader 😉
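One possible reading of the evaluation step, sketched under assumptions: domains whose ASN belongs to an allowlisted background service are dropped, and everything else stays suspicious. The `lookup_asn` callable is hypothetical (it stands in for whatever ASN database or service is available), and the allowlist contains only two example ASNs, Google (AS15169) and Microsoft (AS8075).

```python
# Example allowlist only; a real list would need to be curated.
BACKGROUND_ASNS = frozenset({15169, 8075})  # Google, Microsoft


def evaluate(domains, lookup_asn, background_asns=BACKGROUND_ASNS):
    """Keep the domains whose ASN is not a known background service.

    lookup_asn is a hypothetical callable: domain -> ASN number, or None if unknown.
    """
    suspicious = []
    for domain in domains:
        asn = lookup_asn(domain)
        if asn in background_asns:
            continue  # traffic to a background service, treated as noise
        suspicious.append(domain)  # unknown ASNs stay suspicious
    return suspicious
```

Passing the lookup in as a function keeps the sketch independent of any particular ASN data source. One obvious weakness is that malware hosted on an allowlisted provider would slip through.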
The following is a rough diagram of the above process.
Final Thoughts
The above is a rough overview of our idea. It took me 110 days to finally write this up (I was procrastinating and busy with other things). While there are many areas for improvement, I believe it’s an interesting idea. I hope you find it interesting as well!