MetaCombine NMF Document Clustering Web Service
version 0.10

2005-04-20
Urvashi Gadi
Emory University

Credits
=======

Urvashi Gadi - Lead developer.
Aaron Krowne - Project Manager.

Overview
========

This software provides a web service interface to the MetaCombine NMF
Clustering System Version 0.80 (a system for "document" clustering using the
Non-negative Matrix Factorization (NMF) method).

Clustering is a "preclassificatory" task -- it is used to discover latent
associations in the data, separating the data points out into "bins" which
can be interpreted as clusters or classes. For text documents, this can be
thought of as discovering classes based on the topics within the corpus.

This web service clustering system is part of the MetaCombine project, which
seeks to bring digital library resources together more meaningfully and to
help build more coherent services on top of them.

The purpose of this web service is to let third parties apply advanced
clustering techniques to their data without needing to be familiar with the
details of the clustering system, go through its complex installation
process, or supply the computational hardware.

See http://www.metacombine.org/ for more on MetaCombine.

Also see http://www.ockham.org/ for information on the OCKHAM project, which
is building a p2p network of library services (like this clustering service).

Note that clients of this service do not need to be written in PHP, as the
included one is; that client is merely a functional demonstration. It is easy
to write web service clients in many other languages as well.

Running
=======

To access the Metacluster web service you will need a client. The client can
be written in any language, use any component model, and run on any operating
system.

The server expects its input in the following sequence:

    BASEURL FIELDS HMODE CPARAMS

All fields are mandatory.

    BASEURL : Data repository URL
    FIELDS  : Clustering fields (title,subject,desc), separated by commas
              with no spaces
    HMODE   : Hierarchical/flat clustering mode [f/h]
                f : flat mode
                h : hierarchical mode
    CPARAMS : Semantic clustering parameters
                flat mode         : [ OPTS ] < LOWER UPPER TOTAL >
                hierarchical mode : [ OPTS ] < TOTAL | BRANCH LIMIT >

OPTS are optional flags, which consist of:

    -r                to perform contraction on first-cut clusters
    -m [ FRAC ]       to select multiclassification up to FRAC of highest
                      score
    -d                hierarchical clustering max-depth (0,1,2,... def. 5)
    -l                multiclassification limit on # of clusters per record
                      (def. # clusters)
    -u < THRESHOLD >  set the threshold for unclassification

For more information about the NMF Clustering web service, point any browser
to

    http://metacombine.org/schema/web_services/cluster-server.wsdl

Installation of sample Client Code
==================================

The included PHP script (cluster-client.php) creates a SOAP client. PHP does
not come with a bundled SOAP extension, so before you can use the client you
need to download and install a SOAP library. There are three major SOAP
implementations for PHP: PEAR::SOAP, NuSOAP, and PHP-SOAP. This client uses
PEAR::SOAP.

If the PEAR package manager is installed on your machine, run the following
command in your shell:

    % pear install SOAP

This will download, unzip, and install PEAR::SOAP.

PEAR::SOAP depends on the following packages: Mail_Mime, Net_URL,
HTTP_Request, and Net_DIME. Install whichever of these are not already
present on your machine.

Notes:

Since clustering is a long-running server process, the client might time out
due to network inactivity while it waits for a response. To avoid this,
increase $soapclient->__options['timeout'] in the client code accordingly.

License
=======

BSD. See included "LICENSE" file.
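
Example: Assembling the Request Arguments
=========================================

The input sequence described under "Running" (BASEURL FIELDS HMODE CPARAMS)
can be sketched in a language other than PHP. The Python below is only an
illustration under stated assumptions: the function name build_request, the
example repository URL, and the CPARAMS values are all hypothetical and do
not come from the service itself. It checks the four mandatory arguments and
returns them in the documented order. It does not call the service; the
actual operation name and message format should be taken from the WSDL.

```python
from urllib.parse import urlparse


def build_request(base_url, fields, hmode, cparams):
    """Validate the four mandatory arguments and return them in the
    order given in this README: BASEURL, FIELDS, HMODE, CPARAMS.

    Hypothetical helper for illustration; not part of the service.
    """
    parsed = urlparse(base_url)
    if not parsed.scheme or not parsed.netloc:
        raise ValueError("BASEURL must be a full repository URL")

    # FIELDS is a comma-separated list with no spaces.
    if not fields or any((not f) or "," in f or " " in f for f in fields):
        raise ValueError("FIELDS must be non-empty names without spaces or commas")

    # HMODE is 'f' (flat) or 'h' (hierarchical).
    if hmode not in ("f", "h"):
        raise ValueError("HMODE must be 'f' or 'h'")

    # CPARAMS is passed through unchanged; its exact syntax is defined
    # by the clustering system, so only emptiness is checked here.
    if not cparams.strip():
        raise ValueError("CPARAMS is mandatory")

    return [base_url, ",".join(fields), hmode, cparams]


if __name__ == "__main__":
    # Placeholder values following the flat-mode pattern
    # [ OPTS ] LOWER UPPER TOTAL; the numbers are illustrative only.
    args = build_request(
        "http://example.org/oai",   # hypothetical repository URL
        ["title", "subject"],
        "f",
        "-r 5 10 50",
    )
    print(args)
```

The sketch assumes that the angle brackets in the CPARAMS description are
notation for required values rather than literal characters, which is why the
string is passed through unchanged instead of being parsed.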