Crawlers in Java

Heritrix is the Internet Archive’s open-source, extensible, web-scale, archival-quality web crawler project.

WebSPHINX (Website-Specific Processors for HTML INformation eXtraction) is a Java class library and interactive development environment for Web crawlers that browse and process Web pages automatically.

JoBo is a simple program for downloading complete websites to your local computer. Internally it is basically a web spider. Its main advantage over other download tools is that it can automatically fill out forms (e.g. for automated login) and use cookies for session handling.

Database Connection Pools

Jakarta DBCP
DBCP is a database connection pool that relies on the Jakarta commons-pool package for its underlying object pool mechanisms. Applications can use the DBCP component directly or through the existing interface of their container or supporting framework.

C3P0 is an easy-to-use library for augmenting traditional (DriverManager-based) JDBC drivers with JNDI-bindable DataSources, including DataSources that implement Connection and Statement Pooling, as described by the JDBC 3 spec and the JDBC 2 standard extension.

Proxool is a Java connection pool. It transparently adds connection pooling to your existing JDBC driver.
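
The libraries in this section all build on the same idea: keep a fixed set of expensive objects alive and lend them out instead of creating a new one per request. The following is a minimal sketch of that idea using only the JDK. The class name SimplePool and its methods are hypothetical and are not the API of DBCP, C3P0, or Proxool; a real connection pool would also validate, time out, and wrap JDBC connections.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical fixed-size object pool; an illustration only, not a library API. */
class SimplePool<T> {
    // Objects that are currently idle and available for borrowing.
    private final BlockingQueue<T> idle;

    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        // Create every pooled object up front.
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    /** Borrows an object, waiting up to timeoutMillis; returns null if none became free. */
    T borrow(long timeoutMillis) {
        try {
            return idle.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return null;
        }
    }

    /** Returns a borrowed object to the pool; it is dropped if the pool is already full. */
    void giveBack(T obj) {
        idle.offer(obj);
    }

    int idleCount() {
        return idle.size();
    }

    public static void main(String[] args) {
        // StringBuilder stands in for an expensive resource such as a database connection.
        SimplePool<StringBuilder> pool = new SimplePool<>(2, StringBuilder::new);
        StringBuilder resource = pool.borrow(100);
        System.out.println("idle while borrowed: " + pool.idleCount());
        pool.giveBack(resource);
        System.out.println("idle after return: " + pool.idleCount());
    }
}
```

In an application the borrow and giveBack calls would normally be paired in a try/finally block so that a borrowed object always goes back to the pool.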


Command Line Tools

Jakarta Commons CLI
The Apache Commons CLI library provides an API for processing command line interfaces. Command line processing has three stages: definition, parsing, and interrogation.
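
The three stages named above can be illustrated with a small hand-rolled parser that uses only the JDK. This is a sketch of the general pattern, not the Commons CLI API: the class MiniCli, its methods, and the option names "v" and "out" are all hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical mini parser showing definition, parsing, and interrogation. */
class MiniCli {
    // Stage 1 data: option name -> whether the option takes a value.
    private final Map<String, Boolean> defined = new LinkedHashMap<>();
    // Stage 2 result: option name -> value (empty string for flags).
    private final Map<String, String> parsed = new HashMap<>();

    // Stage 1 (definition): declare which options the program accepts.
    MiniCli define(String name, boolean takesValue) {
        defined.put(name, takesValue);
        return this;
    }

    // Stage 2 (parsing): match the raw arguments against the definitions.
    MiniCli parse(String[] args) {
        parsed.clear();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (!arg.startsWith("-") || !defined.containsKey(arg.substring(1))) {
                throw new IllegalArgumentException("Unknown option: " + arg);
            }
            String name = arg.substring(1);
            if (defined.get(name)) {
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("Missing value for: " + arg);
                }
                parsed.put(name, args[++i]); // consume the next argument as the value
            } else {
                parsed.put(name, "");
            }
        }
        return this;
    }

    // Stage 3 (interrogation): ask what was actually supplied.
    boolean has(String name) {
        return parsed.containsKey(name);
    }

    String valueOf(String name) {
        return parsed.get(name);
    }

    public static void main(String[] args) {
        MiniCli cli = new MiniCli()
                .define("v", false)   // flag without a value
                .define("out", true)  // option that takes a value
                .parse(new String[] {"-v", "-out", "report.txt"});
        System.out.println("verbose=" + cli.has("v") + ", out=" + cli.valueOf("out"));
    }
}
```

Keeping the stages separate means the option definitions can also drive help output or validation without touching the parsing loop.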

ArgParser is a Java package for specifying the command line options of a Java application. It supports range checking, multiple option names (aliases), single-word options, multiple values associated with an option, multiple option invocation, generating help information, custom argument parsing, and reading arguments from a file.


JArgs is a comprehensive command line option parsing suite, for use by Java programmers. Initially, parsing compatible with GNU-style ‘getopt’ is provided. JArgs is easy to use, thoroughly tested and well documented.

