I have written a proxy server in Java. Its main purpose is to enable you to observe all the headers and data that are passing between your browser and the Internet. It can also be useful for getting round IP routing problems by routing web requests via an intermediate node. And it can be useful for connecting to a host by name where the DNS entry for that host is missing, or not pointing to where you want to connect, and you are behind a proxy, eg on the Logica network (see the translateHosts option below). It also has a rudimentary caching facility, designed only for debugging.
The program accepts HTTP requests on a port (port 2929 in the example configuration file below, but you can use any number you want), reads the details of the request made by the browser, and makes a fresh HTTP request to the actual web server. When the web server sends the page back, the proxy writes the bytes back to the browser.
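As an illustration of the "reads the details of the request" step, here is a minimal sketch of how the first line of a browser's proxy request could be split into the host, port and path needed for the fresh request. It is not taken from Proxy.java: the class name ProxyRequestLine and its fields are hypothetical, it uses java.net.URI (which is much newer than JDK 1.0.2), and it assumes a GET or POST with an absolute http:// URL. CONNECT requests, which carry host:port instead of a URL, are not handled.

```java
import java.net.URI;

// Hypothetical helper for illustration only; not part of Proxy.java.
final class ProxyRequestLine {
    final String method;       // eg GET or POST
    final String host;         // web server to contact
    final int port;            // 80 unless the URL names another port
    final String pathAndQuery; // what to ask the web server for
    final String version;      // eg HTTP/1.0

    private ProxyRequestLine(String method, String host, int port,
                             String pathAndQuery, String version) {
        this.method = method;
        this.host = host;
        this.port = port;
        this.pathAndQuery = pathAndQuery;
        this.version = version;
    }

    // Parses a line such as "GET http://www.example.com/index.html HTTP/1.0".
    static ProxyRequestLine parse(String line) {
        String[] parts = line.trim().split(" ");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Malformed request line: " + line);
        }
        URI uri = URI.create(parts[1]);
        if (uri.getHost() == null) {
            throw new IllegalArgumentException("Expected an absolute URL: " + parts[1]);
        }
        int port = uri.getPort() == -1 ? 80 : uri.getPort();
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        if (uri.getRawQuery() != null) {
            path = path + "?" + uri.getRawQuery();
        }
        return new ProxyRequestLine(parts[0], uri.getHost(), port, path, parts[2]);
    }

    // The request line to send when talking direct to the web server.
    String toOriginForm() {
        return method + " " + pathAndQuery + " " + version;
    }

    public static void main(String[] args) {
        ProxyRequestLine r = parse("GET http://www.example.com:8080/a/b?x=1 HTTP/1.0");
        System.out.println(r.host + " " + r.port + " " + r.toOriginForm());
    }
}
```

When the request is passed on to a chained proxy rather than direct to the web server, the absolute URL would be kept in the request line instead of being cut down to the path.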
It has no GUI, but you can telnet (eg using PuTTY) to the debug port (eg 2828) -- especially useful when the proxy is running on one PC and you are surfing and debugging on the other PC!
The proxy uses multiple threads to handle multiple simultaneous connections (which is why it has to be in Java -- Perl on Win32 doesn't do "fork").
Note that there are now a number of add-ons available for Firefox which enable you to view the HTTP headers, which may be easier than using this proxy. The one I use is Firebug.
The latest version is version 2.6.3. Save it as Proxy.java.
There is one source file Proxy.java. Compile it with "javac Proxy.java". It creates several class files. When compiling under newer JDKs, ignore the complaints about deprecated interfaces (eg "Note: Proxy.java uses or overrides a deprecated API.") or unchecked or unsafe operations -- it was written for JDK 1.0.2 and it should work on all subsequent JDKs. Let me know if not. There is more discussion of Java versions below in the Version History (search for "JDK").
Run it with "java Proxy properties", where "properties" is the name of the properties file.
For use at home, you run it on the PC with the modem. You need to have TCP/IP enabled on your network. Set up the browser on the second PC to use the other PC as its proxy for HTTP requests. It only handles GET, POST and CONNECT requests, ie simple web browsing. Don't expect "chat" applets, SecuRemote, and things like that to work, or FTP!
The proxy is configured using a properties file, whose name is specified on the command line. All but two of the properties are optional.
The properties file is read again each time the "h" debug command is issued, so many of the properties can be changed on the fly. The two ports cannot be changed like that, because the proxy only starts listening once, at startup.
The properties file is a standard java.util.Properties file, ie name=value lines. Comments start with "#". Don't leave trailing spaces. No case-folding is used in matching property names.
Properties can be strings, numbers or Boolean.
A Boolean property should be set as follows:
As a result of the above rules, the default value for a Boolean variable is always "false".
A minimal properties file looks like this:
inPort=2929
debugPort=2828
inPort is the TCP port on which the proxy listens to requests from the browser. The browser should be set up to use this port as its proxy server.
debugPort is the TCP port on which the proxy listens for an optional telnet connection (eg using PuTTY) for debugging and tracing. See later for the debug commands. To connect with PuTTY, set the Host Name to localhost, the Port to 2828, and the Connection type to Raw.
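The warnings above about trailing spaces and case-folding follow from the way java.util.Properties works, and the following small sketch shows both effects. It is only a demonstration, not code from Proxy.java: the class name PropertiesDemo is hypothetical, and it loads the properties from a string with a Reader (a newer API than JDK 1.0.2 offers) so that it is self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical demonstration class; not part of Proxy.java.
final class PropertiesDemo {
    // Loads name=value lines from a string instead of a file.
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return p;
    }

    public static void main(String[] args) {
        // Note the deliberate trailing space after 2828.
        Properties p = parse("# minimal configuration\ninPort=2929\ndebugPort=2828 \n");

        // A clean value converts to a number without trouble.
        System.out.println(Integer.parseInt(p.getProperty("inPort")));

        // The trailing space is kept as part of the value: prints [2828 ]
        System.out.println("[" + p.getProperty("debugPort") + "]");

        // ...so converting it to a number fails.
        try {
            Integer.parseInt(p.getProperty("debugPort"));
        } catch (NumberFormatException e) {
            System.out.println("debugPort is not a clean number");
        }

        // Property names are case-sensitive: prints null
        System.out.println(p.getProperty("InPort"));
    }
}
```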
proxyHost is the name of the host to be used as a chained proxy, ie to which this proxy should send all its requests (except those that match the nonProxyHosts parameter).
To find out the current name of the correct Logica proxy for your office, download the Proxy Auto Configuration file from http://pac.groupinfra.com/pac and have a look at the last few lines. If nothing happens when you try to open it in your browser it is because the browser is interpreting it instead of showing you the source code. In that case right click and save it to disc then open it in an editor.
The port to use on the proxyHost. Default is 80.
nonProxyHosts is a list of hosts to which requests should be sent direct, ie not via proxyHost. Use vertical bars "|" to separate host names. Host names can contain "*" as a wildcard, which matches a sequence of zero or more of any character. Example: *.logica.co.uk|*.*.logica.com|127.0.0.1|localhost
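To make the wildcard rule concrete, here is a minimal sketch of matching a host name against a bar-separated list in which "*" stands for zero or more of any character. It is my illustration of the rule as described, not the code in Proxy.java: the class name HostPatterns and its methods are hypothetical, and it assumes the comparison is case-sensitive, which the description above does not specify.

```java
// Hypothetical helper for illustration only; not part of Proxy.java.
final class HostPatterns {
    // True if text matches pattern, where '*' matches zero or more characters.
    static boolean matches(String pattern, String text) {
        int p = 0;      // position in pattern
        int t = 0;      // position in text
        int star = -1;  // position of the most recent '*' in pattern
        int mark = 0;   // text position where that '*' started matching
        while (t < text.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                star = p++;
                mark = t;
            } else if (p < pattern.length() && pattern.charAt(p) == text.charAt(t)) {
                p++;
                t++;
            } else if (star >= 0) {
                // Mismatch: let the last '*' swallow one more character and retry.
                p = star + 1;
                t = ++mark;
            } else {
                return false;
            }
        }
        // Any pattern characters left over must all be '*'.
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;
        }
        return p == pattern.length();
    }

    // True if host matches any of the patterns in a list separated by "|".
    static boolean matchesAny(String patternList, String host) {
        for (String pattern : patternList.split("\\|")) {
            if (matches(pattern, host)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String list = "*.logica.co.uk|*.*.logica.com|127.0.0.1|localhost";
        System.out.println(matchesAny(list, "www.logica.co.uk")); // true
        System.out.println(matchesAny(list, "www.example.com"));  // false
    }
}
```

Note that with this rule *.*.logica.com needs at least two dots before "logica.com", so a.b.logica.com matches but b.logica.com does not.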
plainHostDirect is a Boolean. Set this to make requests to plain hosts (ie with no domain name) go direct and not via the chained proxy. This saves having to add them to the nonProxyHosts list manually. I can't think of circumstances where you would not want this behaviour, but the default is off for backward compatibility.
Setting this string makes the proxy lie to the server about which browser we are. The string is put in the User-Agent header field.
saynotmodified is a string. I originally implemented this to speed up browsing on a slow line at home, especially using Netscape, which revalidates the page every time you change the window size. I changed it in version 2.5.8 to be a string, so that you could skip the revalidation of images etc but still get fresh HTML.
When set, the proxy watches for GET If-Modified-Since requests, checks to see if the URL matches the saynotmodified string (eg "*.gif|*.js|*.css|*.jpg") and, if it matches, immediately sends back Not Modified to the browser. Don't use it if you want your pages and images up to date.
Boolean. Tells the proxy to normalise line endings to CRLF in the debug output.
Boolean. Tells the proxy to change control characters to eg ^A for a Control-A character in the debug output to avoid confusing your terminal emulator. Tabs are allowed through as is, for the sizeslog output.
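The control character conversion can be sketched as follows. This is an illustration of the caret notation, not the code in Proxy.java: the class name ControlChars is hypothetical, and letting CR and LF through unchanged (as well as tabs) is my assumption, made so that line breaks in the output survive.

```java
// Hypothetical helper for illustration only; not part of Proxy.java.
final class ControlChars {
    // Replaces control characters with caret notation, eg Control-A becomes ^A.
    static String makeVisible(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // Assumption: tab, CR and LF pass through unchanged.
            boolean passThrough = c == '\t' || c == '\r' || c == '\n';
            if (c < 0x20 && !passThrough) {
                // Control-A is 0x01; adding 0x40 gives 'A' (0x41).
                out.append('^').append((char) (c + 0x40));
            } else if (c == 0x7f) {
                out.append("^?"); // conventional notation for DEL
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "a" + (char) 1 + "b\tc" + (char) 27 + "d";
        System.out.println(makeVisible(raw)); // a^Ab<tab>c^[d
    }
}
```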
Boolean. Tells the proxy to remove "Pragma: no-cache" request headers. This is for testing caching proxies.
Boolean. Normally the debug output only contains the content of items with content type text/* or application/x-javascript, so that it skips images etc. This option is useful if the server is sending something with an unusual content type which you want to see the content of.
Boolean. At debug level 7 and above, only the POST data that came in the same packet as the end of the headers is printed. Setting this option makes it print all the POST data, at debug level 7 or above.
For blocking advertisements, unwanted downloads etc set this to the name of a file containing snippets of URLs that you want to block.
A request is blocked if the URL matches any of the lines in the file. The URL is deemed to match a line if all the space-separated strings on the line appear in the URL. This gives a little flexibility, eg to block all Flash from hosts connected with www.bright.com, add a line "bright.com swf".
Comments in the blockfile start with "#". The blockfile is read again each time the properties file is read, which can be forced by typing the "h" debug command.
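The matching rule for the blockfile can be sketched like this. It is a minimal illustration of the rule as described, not the code in Proxy.java: the class name BlockList is hypothetical, the file contents are passed in as a string to keep the example self-contained, and it assumes that comments occupy a whole line and that blank lines are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Proxy.java.
final class BlockList {
    // Each rule is the list of strings from one line of the blockfile.
    private final List<String[]> rules = new ArrayList<>();

    BlockList(String fileContents) {
        for (String line : fileContents.split("\r?\n")) {
            String trimmed = line.trim();
            // Assumption: blank lines and lines starting with '#' are skipped.
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            rules.add(trimmed.split("\\s+"));
        }
    }

    // A URL is blocked if, for some line, every string on it appears in the URL.
    boolean isBlocked(String url) {
        for (String[] rule : rules) {
            boolean allPresent = true;
            for (String piece : rule) {
                if (!url.contains(piece)) {
                    allPresent = false;
                    break;
                }
            }
            if (allPresent) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        BlockList blocks = new BlockList("# block Flash from bright.com\nbright.com swf\n");
        System.out.println(blocks.isBlocked("http://www.bright.com/movie.swf"));  // true
        System.out.println(blocks.isBlocked("http://www.bright.com/index.html")); // false
    }
}
```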
Boolean. Prints the size of each item returned by the server. The sizes lines have tabs in for easy import into a spreadsheet, though some telnet clients unhelpfully log these as the equivalent spaces...
Boolean. Print a timestamp on every debug message.
A list of client machines which are allowed to use the proxy. This is to stop other people using the proxy -- unlikely to be a problem, but I got worried once. Separate the strings with vertical bars "|" as usual. If not specified, allows everyone in. To see what string to put in for a particular host, look at the debug output when you try to connect. The Java library call Socket.getInetAddress().toString() returns the name and the address, if available.
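If you want to see the form of string involved without making a connection, this small sketch prints what InetAddress.toString() returns, which is the same object type that Socket.getInetAddress() gives you. The class name ClientIdDemo, the host name "mypc" and the address are made up for the example; InetAddress.getByAddress is used so that no DNS lookup takes place.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical demonstration class; not part of Proxy.java.
final class ClientIdDemo {
    // Builds an InetAddress from a name and four address bytes and returns
    // its toString() form. The name may be null if no name is known.
    static String describe(String name, int a, int b, int c, int d) {
        byte[] raw = {(byte) a, (byte) b, (byte) c, (byte) d};
        try {
            return InetAddress.getByAddress(name, raw).toString();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Name and address both known: prints mypc/192.168.0.5
        System.out.println(describe("mypc", 192, 168, 0, 5));
        // No name available: the part before the slash is empty.
        System.out.println(describe(null, 192, 168, 0, 5));
    }
}
```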
translateHosts tells the proxy to use your hosts file to translate the host names for requests which are being sent via a chained proxy. This arose from the need to be able to connect to Government Gateway and Alerts Online test systems which use the same host names as the real systems. Set the parameter to the name of your hosts file, eg
translateHosts=C:/WINDOWS/system32/drivers/etc/hosts
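For readers unfamiliar with the hosts file, here is a minimal sketch of reading one and looking up a name in it. It is not the code in Proxy.java: the class name HostsFile is hypothetical, the file contents are passed in as a string, and the names and addresses in the example are invented. It assumes the usual hosts file layout (an address followed by one or more names, with "#" starting a comment), folds names to lower case, and lets the first entry for a name win.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Proxy.java.
final class HostsFile {
    // Builds a map from host name to address from hosts file contents.
    static Map<String, String> parse(String contents) {
        Map<String, String> map = new HashMap<>();
        for (String line : contents.split("\r?\n")) {
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash); // strip the comment
            }
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 2) {
                continue; // blank line, or an address with no names
            }
            for (int i = 1; i < fields.length; i++) {
                String name = fields[i].toLowerCase();
                if (!map.containsKey(name)) {
                    map.put(name, fields[0]); // first entry wins
                }
            }
        }
        return map;
    }

    // Returns the address for host if the hosts file lists it, else host itself.
    static String translate(Map<String, String> hosts, String host) {
        String address = hosts.get(host.toLowerCase());
        return address != null ? address : host;
    }

    public static void main(String[] args) {
        Map<String, String> hosts = parse(
                "127.0.0.1 localhost\n"
              + "# test system\n"
              + "10.1.2.3  www.example.test alias  # same name as the real system\n");
        System.out.println(translate(hosts, "www.example.test")); // 10.1.2.3
        System.out.println(translate(hosts, "other.example"));    // other.example
    }
}
```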
The directory to use for the cache files, for the experimental caching facility. This is not recommended for production use, but could be useful for debugging. Use forward slashes between the directory names, even on Windows, because backslashes need escaping in a Properties file.
Optional. Set it to one of these values. If neither is set, then pages are served from the cache if present, and written to the cache as they arrive from the server. This is fine for static sites, but not for debugging web applications.
Debug commands can be typed in the console window or in the debug telnet session. All debug output is sent to the console and also to the debug telnet session if there is one.
The debug listener acts on single input characters, except for the "quit" command. If typing in the console, you have to hit Return for the command to take effect. If talking via telnet, the command may be acted on as soon as you press the key, depending on your telnet program.
The commands are:
Instructions for capturing an HTTP session.
Set your browser to use localhost:2929 as a proxy as follows.
In IE5 and IE6 go to <Tools | Internet Options... | Connections | LAN Settings... | Automatic configuration> and uncheck both boxes under "Automatic configuration". Under "Proxy server", tick "Use a proxy server", and fill in Address localhost and Port 2929.
In IE4, go to <View | Internet Options... | Connection | Automatic configuration | Configure...> and remove the PAC setting string http://pac.logica.com/. Go to <View | Internet Options... | Connection | Proxy server> and set "Access the Internet using a proxy server" with Address http://localhost and Port 2929.
Run the local proxy with "java Proxy properties". I usually put this in a batch file so I can run it with a double-click from Windows Explorer.
Start PuTTY, set the Host Name to localhost, the Port to 2828, and the Connection type to Raw.
In PuTTY, start logging (click Logging on the left). Connect.
Set the logging level to 8 (display all content except images) by typing the digit "8" in the telnet window.
In the browser, do the process that you want logged.
Stop the proxy by typing "quit" in the telnet window. It doesn't echo what you type -- just keep going. In fact it just looks for the letters q-u-i-t in that order and ignores other characters between them, so if you hit the wrong key just press the right one, without bothering to backspace!
PuTTY says the connection has been lost. Close PuTTY.
Set your browser back to automatic proxy configuration or to get its settings from http://pac.logica.com/
Go and look at your log file on the disc.
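The forgiving behaviour of the "quit" command described in the steps above can be sketched as a tiny state machine. This is my illustration of that description, not the code in Proxy.java, and the class name QuitDetector is hypothetical. It does not model the other single-character debug commands.

```java
// Hypothetical helper for illustration only; not part of Proxy.java.
final class QuitDetector {
    private static final String WORD = "quit";
    private int matched = 0; // how many letters of WORD have been seen so far

    // Feed one typed character. Returns true once q, u, i and t have all been
    // seen in that order, regardless of any other characters in between.
    boolean accept(char c) {
        if (matched < WORD.length() && c == WORD.charAt(matched)) {
            matched++;
        }
        return matched == WORD.length();
    }

    // Convenience method: would typing this whole string stop the proxy?
    static boolean wouldQuit(String typed) {
        QuitDetector detector = new QuitDetector();
        boolean quit = false;
        for (int i = 0; i < typed.length(); i++) {
            quit = detector.accept(typed.charAt(i));
        }
        return quit;
    }

    public static void main(String[] args) {
        System.out.println(wouldQuit("quit"));   // true
        System.out.println(wouldQuit("qwuiot")); // true: wrong keys are ignored
        System.out.println(wouldQuit("qui"));    // false: still waiting for t
    }
}
```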
The instructions here assume that you are using the proxy from a web browser. That is, you have configured your web browser to use the Java proxy, then you navigate to web pages as usual using the mouse and keyboard to control the web browser.
But a standard web browser is not the only program which can make HTTP requests to web servers. Web service clients also make HTTP requests to servers and you may want to use the Java proxy to monitor the requests and responses. To do this, you need to configure the web service client to use a proxy.
If you want to use the proxy with a Java program, acting as an HTTP client, eg as a web service client, you have to configure the program to use the proxy. The way this is done depends on how you are running the client program:
Basically this is done by setting the http.proxyHost and http.proxyPort properties.
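As a minimal sketch, assuming the proxy is running on the same machine with inPort=2929, the two properties can be set from inside the client program before it opens its first connection. The class name UseLocalProxy is hypothetical.

```java
// Hypothetical example class; not part of Proxy.java.
final class UseLocalProxy {
    // Points the standard Java HTTP client classes at the proxy.
    static void configure(String host, int port) {
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", String.valueOf(port));
    }

    public static void main(String[] args) {
        // Assumes the proxy is on this machine, listening on port 2929.
        configure("localhost", 2929);
        System.out.println(System.getProperty("http.proxyHost") + ":"
                + System.getProperty("http.proxyPort")); // prints localhost:2929
        // ... the client's HTTP requests would be made after this point ...
    }
}
```

The same properties can be given on the command line without changing the program, eg "java -Dhttp.proxyHost=localhost -Dhttp.proxyPort=2929 MyClient", where MyClient stands for your own client class. These properties only affect plain HTTP requests, and by default Java sends requests addressed to localhost direct rather than via the proxy.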
I gave a lunchtime seminar on the proxy on Friday 27 April 2001. The slides are obtainable on request to me.
If you make any improvements to the code or documentation which you want to share, please publish your own version of the program. Please retain the copyright notices. If you let me know, I can put a link to your version here.
If you have any problems making it work, get in touch.
On the FCO Kiosks project, Paul Burgin used this proxy as a starting point and developed a caching proxy for use on a kiosk, with all sorts of extra functionality, eg for scheduled updates of pages, and interfacing to NetShift. Ask me if you would like the code.
The changes in 2.6.1 were:
The changes in 2.6 were:
The changes in 2.5.14 were:
The change in 2.5.12 is:
The change in 2.5.11 is:
The changes in 2.5.10 are:
The changes in 2.5.9 are as follows. Many of these changes were made in trying to get the proxy working with the Lenya Content Management System, which uses the PUT method, and with SharePoint. I gave up making it work with the Explorer view in SharePoint: perhaps the Explorer redirector is programmed not to send credentials via any proxy. But the proxy is much improved as a result. Apologies for not implementing the entire standard immediately, but it is just in my spare time!
The changes in 2.5.8 are:
The changes in 2.5.7 are:
Version 2.5.6 was not released -- it was my first attempt at supporting NTLM, before I knew about "Proxy-Support: Session-Based-Authentication". It converted all the NTLM authorisation headers into proxy-authorisation headers.
The changes in 2.5.5 are:
The only change in 2.5.4 is to add the ability to tell the proxy not to let any traffic through by setting a "blockall" variable. This is controlled from the debug interface by typing "s" for stop or "g" for go. This is for use when I'm opening up junk mail, to stop it going and fetching things from the Internet, and perhaps advertising the fact that I have opened the mail. I would have used S and Q which would be familiar to those who used terminals with XON/XOFF flow control in the old days, but Q is used for the quit command.
The only change in 2.5.3 is to add the configuration parameter plainHostDirect, to tell it to treat any plain host name (ie with no domain name) as if it was in the nonProxyHosts list. There are more plain host names on the intranet now, as a result of merging with CMG.
The only change in 2.5.2 is to fix some SSL logging which was annoying me (it logged the reply to the client, and it's much better for it just to log the word "Connected", because it's shorter, and doesn't give the impression it received something it didn't).
Sorry, I introduced a bug in 2.5 which made it always use the chained proxy if configured and ignore nonProxyHosts. I have fixed it in 2.5.1 (fingers crossed).
Changes made in version 2.5:
It still uses the JDK 1.0.2 API, and compiles and runs OK under JDK 1.0.2. It also compiles and runs under later JVMs -- I have tried up to JDK 1.4. When compiling under newer JVMs, ignore the complaints about deprecated interfaces -- I am not going to change it until I have evidence that the deprecated interfaces don't work. On one of my home PCs (64 Mbyte memory), it takes 6 seconds to compile with JDK 1.0.2, 16 seconds to compile with JDK 1.3, and 25 seconds to compile with JDK 1.4. That's progress.
There are problems running it under JDK 1.0.2 on Windows XP. As far as I can tell, there is a problem with the socket libraries and the connection to the browser gets reset instead of being closed cleanly. IE 6 then puts up a "Server cannot be found" message. The workround is to use a newer JDK -- I had to install one specially.
If you want it to recognise numeric IP addresses under JDK 1.0.2, you will need a copy of an extra little class I wrote java/net/InetAddressFix.java, and to uncomment one line in the source of Proxy.java (search for "InetAddressFix").
The previous version was version 2.4. In that version I added support for "100 Continue" headers, which I discovered being returned by MS IIS. There is a slight defect in the support for "100 Continue": strictly it should have another "while" loop in case two headers arrive in a single chunk of data, but this doesn't happen in practice.
I also included optional printing of the size of each item returned: this is useful for measuring the total size, including images, of a web page. See "sizesFlag" below.
The version before was version 2.3. In that version I added HTTP 1.1 support, including persistent connections and support for chunked transfer encoding.
I am keeping the support for HTTP 1.1 as simple as possible, bearing in mind the purpose: if you use an HTTP 1.0 browser, the proxy uses HTTP 1.0 to the server. However it has still almost doubled in size (the source is now 53K).
As it was such a major change from version 2.2 to 2.3, I have also not held back in refactoring and generally fixing up any parts which needed cleaning up -- if you "diff" it with the previous version it may not make much sense. I have changed the name of one of the (less used) options to "saynotmodified", and the logging format has changed a bit, but otherwise it should be a drop-in replacement for the previous version. Please let me know of any problems you have -- there may well be things that don't work with different JVMs, different browsers or different web servers, and I would be happy to make any fixes required to make it work in as wide a range of circumstances as possible.
Version 2.2 is the previous version, without HTTP 1.1 support, which is much simpler. If your browser sort of works with version 2.2 of the proxy, but then tends to hang, you can work round it by turning off the use of HTTP 1.1 through proxy connections. In IE 5 it's under <Tools | Internet Options... | Advanced>: uncheck "Use HTTP 1.1 through proxy connections".