Ideas for PowerShell Malware Detection Engine

This article is a summary of the ideas our group came up with during the hackathon at GCC 2023.

What is GCC 2023 Singapore?

gcc.ac/gcc_2023
Students are divided into groups from A to G. The admins tried to make sure that there are no students from the same country in the same group.
Other participants' posts (mostly in Japanese):
- watasuke.net/blog/article/gcc2023-attend/
  - A blog post from Watasuke, who was the tutor for our team.
- y05h1k1ng.github.io/posts/gcc2023-singapore/
  - A blog post from Y05h1k1ng, who was in the same team as me.
- www.ta-oot.page/posts/002/
  - A blog post from T-oot, who ranked high in the hackathon.
- https://blog.securesky-tech.com/entry/2023/03/02
  - A blog post from Secure Sky Technology, who are the Security Camp Committee’s gold members.
- blog.y2a.dev/articles/2023/02-24/journal-gcc2023insingapore/
  - A blog post from yu1hpa, who was in the same threat analysis class at Security Camp 2022.
- blog.security-camp.or.jp/posts/gcc-2023-singapore-report/
  - A blog post from the Security Camp Council.

Hackathon

Lecture: Hackathon of PowerShell Malware Detection Engine

Instructor: Sh1n0g1, developer of the PowerShell detection engine z9, which was created in lecture z9 of Security Camp 2022.

Evaluation Criteria:

Idea
Code: actual working scripts over fancy slides

Goal:

The development of a PowerShell detection engine that can detect malicious PowerShell scripts.

Provided Files:

A file explaining an overview of PowerShell and typical obfuscation processing.
Test PowerShell scripts (benign/malicious scripts)
XML files related to Windows PowerShell, generated using wevtutil.exe
A simple Python script to parse the XML file.

The provided XML is a Windows event log recorded on a machine with script block logging, module logging, and transcription enabled.
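To give an idea of what parsing such a log involves, here is a minimal Python sketch. It is not the script we were given. It assumes the export wraps all `Event` elements in a single root element, and it pulls the script text out of script block logging events (event ID 4104, data field `ScriptBlockText`). The function name `script_blocks` and the `SAMPLE` log are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by Windows event log XML.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Tiny hand-written sample (an assumption, not the provided data).
SAMPLE = """<Events>
  <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
    <System><EventID>4103</EventID></System>
    <EventData><Data Name="Payload">ignored</Data></EventData>
  </Event>
  <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
    <System><EventID>4104</EventID></System>
    <EventData><Data Name="ScriptBlockText">Write-Host hello</Data></EventData>
  </Event>
</Events>"""


def script_blocks(xml_text: str) -> list[str]:
    """Return the ScriptBlockText of every event with ID 4104."""
    root = ET.fromstring(xml_text)
    blocks = []
    for event in root.iter("{%s}Event" % NS["e"]):
        event_id = event.findtext("e:System/e:EventID", namespaces=NS)
        if event_id != "4104":
            continue  # not a script block logging event
        for data in event.findall("e:EventData/e:Data", NS):
            if data.get("Name") == "ScriptBlockText":
                blocks.append(data.text or "")
    return blocks


if __name__ == "__main__":
    for block in script_blocks(SAMPLE):
        print(block)
```

Long scripts are split across several 4104 events, so a real parser would also need to stitch the pieces back together. This sketch does not do that.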

Script block logging, module logging, and transcription are explained in a 2016 blog article by Mandiant: Greater Visibility Through PowerShell Logging | Mandiant

However, the links in the appendix are broken, so here are the links to the archived PDFs:

Article by Yamato Security: Documentation and scripts to properly enable Windows event logs.

Microsoft’s article: about Logging Windows - PowerShell

About Us

Our group had five members, three of whom worked on the PowerShell detection engine, and the remaining two handled other group work. The detection engine development members included one high school student from Singapore and two students from Japan (y05h1k1ng and myself). We ranked third! 🎉

Problem Solving

Due to the limited time, we made the following assumptions, which helped us focus on the essence of the theme.

The malware is written entirely in PowerShell
Generic static analysis of obfuscated strings is difficult (a)
Obfuscated strings are suspicious (b)
The malware tries to communicate with the C2 server (c)
The malware attempts to persist (d)
No sandbox evasion (e)

We set (d) and (e) outside the scope of this challenge, as they are not unique to PowerShell.

Test Cases

Let’s consider various cases.

Code that should be detected as malicious
- Code that communicates with a malicious infrastructure
Code that should not be detected as malicious
- Dead code
  - Comments
  - Code paths that are not executed (if False: do_malicious)

These are rough sample cases, but they capture the essence of the problem.

Approach

The approach we adopted was to compare the static and dynamic states of a script and evaluate the dynamically generated artifacts. Specifically, we compare the PowerShell script before execution with the pcap file captured while it runs.

1. Static extraction

The first step is to extract URLs, IP addresses, paths, cryptocurrency addresses, etc., from the PowerShell code. Note that obfuscated files do not yield any meaningful information.

Figure: Non-obfuscated PowerShell code sample. We could extract the URL and some interesting strings.

Figure: Obfuscated PowerShell code sample. Nothing makes real sense.
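As a rough illustration of this step (a sketch, not our hackathon code), static extraction can be as simple as a few regular expressions. The patterns, the function name `extract_static`, and both sample scripts are assumptions for this example. Only URLs and IPv4 addresses are covered here.

```python
import re
from urllib.parse import urlsplit

# Simplified patterns chosen for this sketch; real scripts need more care.
URL_RE = re.compile(r"https?://[^\s'\"<>)]+", re.IGNORECASE)
IPV4_RE = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
)


def extract_static(script: str) -> dict[str, set[str]]:
    """Collect URLs, their host names, and IPv4 literals found in the script text."""
    urls = set(URL_RE.findall(script))
    hosts = {h for h in (urlsplit(u).hostname for u in urls) if h}
    ips = set(IPV4_RE.findall(script))
    return {"urls": urls, "hosts": hosts, "ips": ips}


# Hypothetical plain script: the URL and IP are visible in the source text.
PLAIN = "(New-Object Net.WebClient).DownloadString('http://example.com/a.ps1'); ping 192.0.2.10"

# Hypothetical obfuscated script: the string "http" is built from char codes at run time.
OBFUSCATED = "iex ([char[]](104,116,116,112) -join '')"

if __name__ == "__main__":
    print(extract_static(PLAIN))
    print(extract_static(OBFUSCATED))
```

The obfuscated sample returns empty sets, which is exactly the limitation noted above: the interesting strings only exist once the script runs.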

2. Dynamic extraction

In the second step, we run the PowerShell code in a sandbox and extract the DNS queries from the pcap data captured there.
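The sandbox itself is out of scope for a snippet, but the pcap side can be sketched with only the Python standard library. The following is a simplified reader written for this article, not our actual tooling. It assumes a classic pcap file (not pcapng) with Ethernet frames, and it only looks at IPv4 UDP packets sent to port 53. The names `dns_queries` and `_qname` are made up.

```python
import struct
from typing import Optional


def _qname(payload: bytes) -> Optional[str]:
    """Decode the first question name of a DNS query, or None if it is not one."""
    if len(payload) < 13:
        return None
    flags, qdcount = struct.unpack("!HH", payload[2:6])
    if flags & 0x8000 or qdcount == 0:  # skip responses and empty messages
        return None
    labels, pos = [], 12  # the question section starts after the 12-byte header
    while pos < len(payload):
        length = payload[pos]
        if length == 0:
            return ".".join(labels).lower()
        if length & 0xC0:  # compression pointer: not expected in a plain question
            return None
        labels.append(payload[pos + 1 : pos + 1 + length].decode("ascii", "replace"))
        pos += 1 + length
    return None


def dns_queries(pcap: bytes) -> set[str]:
    """Return the set of names queried over UDP port 53 in a classic pcap capture."""
    magic = pcap[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    if len(pcap) < 24:
        raise ValueError("truncated pcap header")
    if struct.unpack(endian + "I", pcap[20:24])[0] != 1:
        raise ValueError("only Ethernet captures are handled")

    names, pos = set(), 24
    while pos + 16 <= len(pcap):
        incl_len = struct.unpack(endian + "I", pcap[pos + 8 : pos + 12])[0]
        frame = pcap[pos + 16 : pos + 16 + incl_len]
        pos += 16 + incl_len
        # Ethernet II header (14 bytes) carrying IPv4 (EtherType 0x0800).
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":
            continue
        ihl = (frame[14] & 0x0F) * 4  # IPv4 header length in bytes
        if frame[23] != 17:  # IP protocol 17 = UDP
            continue
        udp = frame[14 + ihl :]
        if len(udp) < 8 or struct.unpack("!H", udp[2:4])[0] != 53:
            continue
        name = _qname(udp[8:])
        if name:
            names.add(name)
    return names
```

A packet library would handle VLAN tags, IPv6, TCP, and DNS over HTTPS far better. The point here is only that the DNS names requested at run time can be recovered from the capture.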

3. Diffing

In the third step, we compare the data extracted in the static and dynamic steps and flag everything that appears only in the dynamic data as suspicious.
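In code, the diff boils down to a set difference. The sketch below is illustrative only. The function name `diff_indicators`, the normalization rules, and the host names are assumptions.

```python
def _normalize(host: str) -> str:
    """Lower-case a host name and drop whitespace and the trailing dot."""
    return host.strip().rstrip(".").lower()


def diff_indicators(static_hosts: set[str], dynamic_hosts: set[str]) -> set[str]:
    """Return hosts contacted at run time that never appear in the script text."""
    seen = {_normalize(h) for h in static_hosts}
    return {_normalize(h) for h in dynamic_hosts} - seen


if __name__ == "__main__":
    static_hosts = {"example.com", "dead-code.example"}   # found in the source text
    dynamic_hosts = {"Example.com.", "c2.example.net"}    # seen in DNS queries
    print(diff_indicators(static_hosts, dynamic_hosts))
```

Note how this handles the dead code test case: `dead-code.example` appears only in the static data, so it is never flagged, while `c2.example.net` shows up only at run time and is kept.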

4. Evaluation

Lastly, we evaluate the suspicious data. For the domain names obtained from diffing, we additionally check the ASN. If the ASN belongs to a provider of common background services, such as Google, we remove the domain names that are already in the static data. The remaining domain names are considered suspicious. This evaluation method can easily be improved, but I will leave that as an exercise for the reader 😉
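One possible reading of this step, written as a Python sketch rather than our actual code: drop candidates whose ASN is on an allowlist of background services and keep everything else. The ASN lookup is assumed to have been done elsewhere and is passed in as a plain dictionary. `evaluate`, `BACKGROUND_ASNS`, and the sample domains are hypothetical. AS15169 is Google's ASN, and 64500 comes from the range reserved for documentation.

```python
# Illustrative allowlist; which ASNs count as "background" is a judgment call.
BACKGROUND_ASNS = {15169}  # AS15169 = Google


def evaluate(
    candidates: set[str],
    asn_of: dict[str, int],
    background_asns: set[int] = BACKGROUND_ASNS,
) -> set[str]:
    """Keep candidate domains whose ASN is unknown or not a background service."""
    return {d for d in candidates if asn_of.get(d) not in background_asns}


if __name__ == "__main__":
    candidates = {"c2.example.net", "update.example.org", "unknown.example"}
    asn_of = {"c2.example.net": 64500, "update.example.org": 15169}
    print(evaluate(candidates, asn_of))
```

Domains with no known ASN stay suspicious in this sketch, which errs on the side of reporting. An attacker hosting C2 on a large cloud provider would slip through an ASN allowlist, which is one of the obvious areas for improvement.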

The following are rough diagrams of the above process.

Ingest flow graph of a benign PowerShell script

Ingest flow graph of a malicious PowerShell script

Final Thoughts

The above is a rough overview of our idea. It took me 110 days to finally write about this (I was procrastinating and busy with other things). While there are many areas for improvement, I believe it’s an interesting idea. I hope you find it interesting as well!
