Difference between pages "Proxy server" and "Word Document (DOCX)"

From ForensicsWiki
(Difference between pages)
Line 1: Line 1:
A '''proxy server''' is a server that services the requests of its clients by forwarding them to other servers.
+
DOCX is the file format used by Microsoft Word in Microsoft Office 2007 and later.
  
== Overview ==
+
DOCX should not be confused with [[DOC]], the format used by earlier versions of Microsoft Office.
  
Proxy servers are widely used by organizations and individuals for different purposes:
+
= Container Format =
  
* Internet sharing (like [[NAT]]);
+
DOCX is written in an OpenXML format, which consists of a [[ZIP archive]] file containing [[XML]] and binary files. The content can be analysed without modifying it by unzipping the file (e.g. with WinZip) and examining the contents of the archive.
* Traffic compression;
+
* Accelerating service requests by retrieving content from cache;
+
* and many others.
+
  
Proxy servers are commonly used by individuals who wish to violate network policies.
+
The file _rels/.rels contains information about the structure of the document. It contains paths to the metadata information as well as the main XML document that contains the content of the document itself.
* In China, proxy servers are commonly used by individuals to get around national connectivity policies. (User A can't reach website Z, but A can reach proxy server P which can reach website Z).
+
* Criminals frequently use proxy servers to hide the origin of their connections (User A connects to website Z through proxy server P; the packets appear to come from P, and not A).  
+
  
=== HTTP proxies ===
+
Metadata is usually stored in the folder docProps. That folder holds two or more XML files: app.xml, which stores metadata extracted from the Word application itself, and core.xml, which stores metadata about the document itself, such as the author's name and the time it was last printed.
  
''These proxy servers use HTTP.''
+
Another folder contains the actual content of the document; for a Word (.docx) document, that folder is named word. An XML file called document.xml is the main document and contains most of the content.
  
Example request (direct; with relative URI):
+
= Relationship to OOXML =
<pre>
+
GET / HTTP/1.1
+
Host: cryptome.org
+
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en; rv:1.9.0.3) Gecko/20080528 Epiphany/2.22 Firefox/3.0
+
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
+
Accept-Encoding: gzip,deflate
+
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
+
Keep-Alive: 300
+
Connection: keep-alive
+
If-Modified-Since: Tue, 14 Oct 2008 13:59:19 GMT
+
If-None-Match: "e01922-62e9-45937059ec2de"
+
Cache-Control: max-age=0
+
</pre>
+
Example request (using proxy; with absolute URI):
+
<pre>
+
GET http://cryptome.org/ HTTP/1.1
+
Host: cryptome.org
+
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en; rv:1.9.0.3) Gecko/20080528 Epiphany/2.22 Firefox/3.0
+
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
+
Accept-Encoding: gzip,deflate
+
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
+
Keep-Alive: 300
+
Proxy-Connection: keep-alive
+
If-Modified-Since: Tue, 14 Oct 2008 13:59:19 GMT
+
If-None-Match: "e01922-62e9-45937059ec2de"
+
Cache-Control: max-age=0
+
</pre>
+
''Note:'' this HTTP request was intercepted on its way to the proxy server.
+
  
According to RFC 2068 (section 5.1.2):
+
For most purposes OOXML may be considered a subset of DOCX (DOCX contains additional features, like OLE serialization).
<pre>
+
The absoluteURI form is required when the request is being made to a proxy.
+
</pre>
+
''Note:'' the proxy server converts the absolute URI to a relative URI before forwarding the request.
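
The rewrite of the request line can be illustrated with a minimal Python sketch. The function name <code>to_origin_form</code> is hypothetical, and the sketch only handles ordinary methods such as GET (it makes no attempt to handle CONNECT or to rewrite headers such as ''Proxy-Connection''); real proxy implementations do considerably more.

```python
from urllib.parse import urlsplit


def to_origin_form(request_line):
    """Rewrite 'GET http://host/path HTTP/1.1' as 'GET /path HTTP/1.1'.

    Hypothetical helper for illustration only.
    """
    method, target, version = request_line.split(" ", 2)
    parts = urlsplit(target)
    if not parts.scheme:
        # Target is already relative; leave the request line untouched.
        return request_line
    path = parts.path or "/"  # an empty path means the server root
    if parts.query:
        path += "?" + parts.query
    return f"{method} {path} {version}"


print(to_origin_form("GET http://cryptome.org/ HTTP/1.1"))
```

Applied to the proxied request shown above, the sketch yields the same request line as the direct request.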
+
  
=== HTTPS proxies ===
+
Documentation on OOXML may provide a guide to analysing a DOCX file.
  
''The same as above, but using HTTPS (HTTP over SSL/TLS).''
+
= External Links =
  
HTTP proxies that support the CONNECT method are sometimes called ''"HTTPS proxies"''. These HTTP proxies can tunnel almost any TCP-based protocol.
+
* [http://msdn.microsoft.com/en-us/library/aa338205.aspx Information from Microsoft about the structure of OpenXML documents]
  
Example request:
+
* [http://www.simson.net/clips/academic/2009.IEEE.DOCX.pdf The new XML Office Document Files: Implications For Forensics], [[Simson L. Garfinkel]] and James Migletz
<pre>
+
CONNECT home.netscape.com:443 HTTP/1.0
+
User-agent: Mozilla/1.1N
+
</pre>
+
  
=== SOCKS proxies ===
+
* [http://blog.kiddaland.net/2009/07/antiword-for-office-2007/ Perl script that displays the content of a Docx document, similar to Antiword]
  
SOCKS is an Internet protocol that allows client-server applications to transparently use the services of a network firewall.
+
* [http://blog.kiddaland.net/2009/06/office-2007-metadata/ Perl script that displays metadata information that is extracted from an OpenXML document]  
 
+
[[Category:File Formats]]
=== Web proxies (CGI proxies) ===
+
 
+
These are web sites that allow a user to access a site through them. They generally use PHP or CGI to implement the proxy functionality.
+
 
+
Example GET request from [http://anonymouse.ws/ Anonymouse] (to a web server):
+
<pre>
+
GET / HTTP/1.0
+
Host: [scrubbed server host]:8080
+
User-Agent: http://Anonymouse.org/ (Unix)
+
Connection: keep-alive
+
</pre>
+
 
+
Example GET request from [http://www.hidemyass.com/ HideMyAss.com]:
+
<pre>
+
GET / HTTP/1.0
+
Host: [scrubbed server host]:8080
+
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en; rv:1.9.0.3) Gecko/20080528 Epiphany/2.22 Firefox/3.0
+
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
+
</pre>
+
 
+
== Proxy detection ==
+
 
+
=== Server-side ===
+
 
+
==== New HTTP headers ====
+
 
+
Some proxy servers add new HTTP headers to the request, for example:
+
<pre>
+
GET / HTTP/1.1
+
Host: [scrubbed server host]:8080
+
Connection: keep-alive
+
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-ms-application, application/vnd.ms-xpsdocument, application/xaml+xml, application/x-ms-xbap, */*
+
Accept-Language: ru
+
UA-CPU: x86
+
Accept-Encoding: gzip, deflate
+
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; .NET CLR 2.0.50727; .NET CLR 3.0.04506)
+
X-Forwarded-For: [scrubbed client real IP address]
+
Via: 1.1 proxy11 (NetCache NetApp/5.6.1D24)
+
</pre>
+
 
+
''Note:'' this HTTP request was received from a proxy server using [[netcat]].
+
 
+
The new HTTP headers are ''X-Forwarded-For'' and ''Via''.
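
A server-side check for these two headers could look like the following Python sketch. The function name <code>find_proxy_headers</code> and the sample request are hypothetical; only the two header names come from the example above, and a proxy that adds neither header will not be detected this way.

```python
# Header names taken from the example request above; other proxies
# may add different headers or none at all.
PROXY_HEADERS = {"x-forwarded-for", "via"}


def find_proxy_headers(raw_request):
    """Return the names of proxy-related headers found in a raw HTTP request."""
    found = []
    lines = raw_request.splitlines()
    for line in lines[1:]:  # skip the request line
        if not line.strip():
            break  # blank line ends the header section
        name, sep, _value = line.partition(":")
        if sep and name.strip().lower() in PROXY_HEADERS:
            found.append(name.strip())
    return found


sample = (
    "GET / HTTP/1.1\r\n"
    "Host: example.org:8080\r\n"
    "X-Forwarded-For: 192.0.2.1\r\n"
    "Via: 1.1 proxy11 (NetCache NetApp/5.6.1D24)\r\n"
    "\r\n"
)
print(find_proxy_headers(sample))
```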
+
 
+
==== Mixed HTTP headers ====
+
 
+
Some proxy servers reorder the HTTP headers of the original request (see example above). [[Internet Explorer]] 7 puts the ''Host'' and ''Connection'' headers at the end of the request, not at the beginning.
+
 
+
==== Modified HTTP header values ====
+
 
+
Some proxy servers modify HTTP header values, replacing the originals (see example above). [[Internet Explorer]] 7 sends the header ''Connection: Keep-Alive'', not ''Connection: keep-alive''.
+
 
+
==== [[OS fingerprinting]] and User-Agent ====
+
 
+
The following ''User-Agent'' header was received by a web server (see example above):
+
<pre>
+
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; .NET CLR 2.0.50727; .NET CLR 3.0.04506)
+
</pre>
+
 
+
According to this header, the request was generated by [[Internet Explorer]] 7 (''MSIE 7.0'') on [[Windows]] Vista or [[Windows]] Server 2008 (''Windows NT 6.0'').
+
However, the connection was initiated with a TCP SYN packet carrying the following options:
+
<pre>
+
MSS
+
NOP
+
NOP
+
SACK permitted
+
NOP
+
Window scale
+
NOP
+
NOP
+
Timestamps
+
</pre>
+
 
+
[[Windows]] Vista, by contrast, commonly uses these options:
+
<pre>
+
MSS
+
NOP
+
Window scale
+
NOP
+
NOP
+
SACK permitted
+
</pre>
+
 
+
This mismatch suggests that at least one of the following is true:
+
 
+
* the ''User-Agent'' header was forged;
+
* the request was sent through a proxy server running a different [[OS]].
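
The comparison described above can be sketched in Python as follows. The function names are hypothetical, and the only signature included is the Windows Vista option order listed above; a practical tool would rely on a maintained fingerprint database rather than a single hard-coded list.

```python
# Option order that Windows Vista commonly uses, as listed above.
VISTA_SYN_OPTIONS = ["MSS", "NOP", "Window scale", "NOP", "NOP", "SACK permitted"]


def claims_vista(user_agent):
    """True if the User-Agent string claims Windows NT 6.0."""
    return "Windows NT 6.0" in user_agent


def is_inconsistent(user_agent, syn_options):
    """True if the User-Agent claims Windows NT 6.0 but the SYN options differ."""
    return claims_vista(user_agent) and list(syn_options) != VISTA_SYN_OPTIONS


observed = ["MSS", "NOP", "NOP", "SACK permitted", "NOP",
            "Window scale", "NOP", "NOP", "Timestamps"]
agent = ("Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; "
         ".NET CLR 2.0.50727; .NET CLR 3.0.04506)")
print(is_inconsistent(agent, observed))
```

A positive result only flags the connection for closer inspection; it does not by itself show which of the two explanations applies.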
+
 
+
==== Other methods ====
+
 
+
* Active detection: see [http://metasploit.com/research/projects/decloak/ Metasploit Decloaking Engine];
+
* Comparing source IP address with a list of known proxy servers.
+
 
+
=== On the way to proxy server ===
+
 
+
==== Absolute URI ====
+
 
+
HTTP clients (such as web browsers) generate absolute URIs only in requests to proxies.
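
A check for this on intercepted traffic might look like the following Python sketch. The function name <code>targets_proxy</code> is hypothetical, and the sketch only inspects the request line for an ''http://'' or ''https://'' target; it does not recognise CONNECT requests.

```python
def targets_proxy(request_line):
    """True if the request line carries an absolute URI (hypothetical helper)."""
    parts = request_line.split()
    if len(parts) != 3:
        return False  # not a well-formed "METHOD target VERSION" line
    _method, target, _version = parts
    return target.lower().startswith(("http://", "https://"))


print(targets_proxy("GET http://cryptome.org/ HTTP/1.1"))
print(targets_proxy("GET / HTTP/1.1"))
```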
+
 
+
==== Other methods ====
+
 
+
* Comparing destination IP address with a list of known proxy servers.
+
 
+
[[Category:Anti-Forensics]]
+
[[Category:Network Forensics]]
+

Revision as of 16:57, 27 August 2009

DOCX is the file format used by Microsoft Word in Microsoft Office 2007 and later.

DOCX should not be confused with DOC, the format used by earlier versions of Microsoft Office.

Container Format

DOCX is written in an OpenXML format, which consists of a ZIP archive file containing XML and binary files. The content can be analysed without modifying it by unzipping the file (e.g. with WinZip) and examining the contents of the archive.
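
As a rough illustration, the archive can also be inspected without extracting anything to disk. The Python sketch below uses the standard zipfile module; the function name list_docx_members and the path sample.docx are hypothetical.

```python
import zipfile


def list_docx_members(source):
    """Return (name, uncompressed size) for every entry in a DOCX archive.

    `source` may be a file path or a binary file-like object. The archive
    is opened read-only, so the evidence file is not modified.
    """
    with zipfile.ZipFile(source) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


if __name__ == "__main__":
    import sys

    # Usage (hypothetical path): python list_docx.py sample.docx
    for path in sys.argv[1:]:
        for name, size in list_docx_members(path):
            print(f"{size:>10}  {name}")
```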

The file _rels/.rels contains information about the structure of the document. It contains paths to the metadata information as well as the main XML document that contains the content of the document itself.
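
A minimal Python sketch of reading that file follows. The function name read_package_relationships is hypothetical; the namespace URI is the one used for package relationship parts, and the sketch assumes the archive actually contains _rels/.rels.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespace of package relationship parts.
RELS_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def read_package_relationships(source):
    """Map each relationship Type in _rels/.rels to its Target path.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("_rels/.rels"))
    return {
        rel.get("Type"): rel.get("Target")
        for rel in root.iter(RELS_NS + "Relationship")
    }
```

In a typical Word document the returned targets include docProps/core.xml, docProps/app.xml and word/document.xml, which are the parts discussed in the following paragraphs.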

Metadata is usually stored in the folder docProps. That folder holds two or more XML files: app.xml, which stores metadata extracted from the Word application itself, and core.xml, which stores metadata about the document itself, such as the author's name and the time it was last printed.
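
The two fields mentioned above can be pulled out of core.xml with a short Python sketch like the one below. The function name read_core_metadata and the dictionary keys are hypothetical; the sketch assumes the usual core-properties and Dublin Core namespaces and returns None for any field the document does not record.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces assumed for docProps/core.xml.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical output keys mapped to the elements that hold them.
FIELDS = {
    "author": "dc:creator",
    "last_printed": "cp:lastPrinted",
}


def read_core_metadata(source):
    """Return selected core.xml fields from a DOCX file (path or file object)."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    return {
        key: root.findtext(path, default=None, namespaces=NS)
        for key, path in FIELDS.items()
    }
```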

Another folder contains the actual content of the document; for a Word (.docx) document, that folder is named word. An XML file called document.xml is the main document and contains most of the content.

Relationship to OOXML

For most purposes OOXML may be considered a subset of DOCX (DOCX contains additional features, like OLE serialization).

Documentation on OOXML may provide a guide to analysing a DOCX file.

External Links