Last updated June 16, 2014 07:00, by ChooJun

USM Extract : A Sustainable Soft-Computing Platform

About USM Extract

Artificial Neural Networks (ANNs) are computational models that mimic the human nervous system. Computationally, they are algorithms built from weighted, interconnected processing units (resembling the neural map of the human brain). To address a particular problem with an ANN, the connections between processing units (known as weights) are adjusted according to a learning rule. ANNs are useful for pattern recognition tasks and can therefore help solve practical problems in a variety of domains, including business, medicine, and engineering.
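As a rough illustration of what "adjusting weights according to a learning rule" means, the following minimal sketch trains a single processing unit with the textbook perceptron rule to reproduce logical AND. It is a generic example only and is not one of the USM Extract models (FAM, PFAM, FMM, FCM); the class name PerceptronSketch and all of its members are hypothetical.

```java
public class PerceptronSketch {
    // Weights and bias of a single processing unit with two inputs.
    private final double[] weights = new double[2];
    private double bias = 0.0;

    // Step activation: output 1 only when the weighted sum is positive.
    public int predict(double[] x) {
        double sum = bias;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * x[i];
        }
        return sum > 0 ? 1 : 0;
    }

    // Perceptron learning rule: shift each weight by rate * error * input.
    public void train(double[][] inputs, int[] targets, int epochs, double rate) {
        for (int e = 0; e < epochs; e++) {
            for (int n = 0; n < inputs.length; n++) {
                int error = targets[n] - predict(inputs[n]);
                for (int i = 0; i < weights.length; i++) {
                    weights[i] += rate * error * inputs[n][i];
                }
                bias += rate * error;
            }
        }
    }

    public static void main(String[] args) {
        double[][] inputs = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        int[] targets = {0, 0, 0, 1}; // logical AND
        PerceptronSketch unit = new PerceptronSketch();
        unit.train(inputs, targets, 10, 1.0);
        for (double[] x : inputs) {
            System.out.println((int) x[0] + " AND " + (int) x[1] + " -> " + unit.predict(x));
        }
    }
}
```

The models shipped with USM Extract use different, more elaborate learning rules; see the references listed further down for their definitions.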

Proper data pre-processing, network structure selection, and network training algorithms are required to get the best out of ANNs. USM Extract comprises a series of ANN models and algorithms in an open source application environment, in which users can easily configure ANN-based solutions to suit their applications. In this way, USM Extract alleviates the arduous task of establishing ANN solutions for real-world problems. In USM Extract, certain ANN parameters are optimally set while others can be fine-tuned manually. The main objective is to let users fully exploit the advantages of ANNs without needing expert-level knowledge of ANNs.

System Configuration

  • Fedora / Ubuntu / CentOS / Windows
  • Oracle JavaSE Runtime Environment 6
  • 100MB free hard drive space
  • 4GB RAM

Languages & Technologies

USM Extract uses IceFaces, Spring, and Hibernate as its development frameworks and enriches them with the benefits of ANNs. These frameworks allow easy and rapid development of the USM Extract showcases.

Evaluation

Graphic credit to http://www.usm.my, http://www.openclipart.org/image/800px/svg_to_png/user.png, http://www.openoffice.org, http://www.gimp.org

Originality and Creation

Research models and algorithms of ANNs

USM Extract comprises a series of ANN models and algorithms. We adopt published academic research outcomes into USM Extract as an ANN-based package for solving real-world problems.

Class Diagram

The classes are implemented using generic Java inheritance concepts, which allows more research results to be incorporated in future. Please see the example class diagrams of FAM, PFAM, FMM, FCM, TFAM, TPFAM, and TFMM.

References of classification algorithms

  • FAM: Fuzzy Adaptive Resonance Theory
    • G. A. Carpenter, S. Grossberg, and D. B. Rosen, Fuzzy ART: Fast stable learning and categorization of analog patterns by an adaptive resonance system, Neural Networks, vol. 4, pp. 759-771, 1991.
    • G. A. Carpenter, S. Grossberg, N. Markuzon, J. H. Reynolds, and D. B. Rosen, Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps, IEEE Trans. Neural Networks, vol. 3, pp. 698-712, 1992.
  • PFAM: Probabilistic Fuzzy Adaptive Resonance Theory
    • C.P. Lim and R.F. Harrison, An incremental adaptive network for online supervised learning and probability estimation, Neural Networks, vol. 10, pp. 925-939, 1997.
    • D. F. Specht, Probabilistic neural networks, Neural Networks, vol. 3, pp. 109-118, 1990.
  • FMM: Fuzzy Min-Max
    • A. Quteishat, C.P. Lim, and K. S. Tan, A modified fuzzy min-max neural network with a genetic-algorithm-based rule extractor for pattern classification, IEEE Trans. Systems, Man and Cybernetics, Part A: Systems and Humans, vol. 40, pp. 641-650, 2010.
    • P.K. Simpson, Fuzzy min-max neural networks. Part 1: Classification, IEEE Trans. Neural Networks, vol. 3, pp. 776-786, 1992.
  • Classifier Agent with Bayesian Trust:
    • A. Quteishat, C.P. Lim, J. M. Saleh, J. Tweedale, and L. C. Jain, A neural network-based multi-agent classifier system with a Bayesian formalism for trust measurement, Soft Computing, vol. 15, pp. 221–231, 2011.
    • A. Quteishat, C.P. Lim, J. Tweedale, and L. C. Jain, A neural network-based multi-agent classifier system, Neurocomputing, vol. 72, pp. 1639–1647, 2009.

References of clustering algorithms

  • FCM: Fuzzy C-Means
    • C.P. Lim , and W.S. Ooi, An empirical analysis of colour image segmentation using fuzzy c-means clustering, International Journal of Knowledge Engineering and Soft Data Paradigms, vol. 2, pp. 97-106, 2010.
    • R.J. Hathaway and J.C. Bezdek, Fuzzy c-means clustering with incomplete data, IEEE Trans. Systems, Man, and Cybernetics, Part B, vol. 31, pp. 735-744, 2001.
  • FMM: Fuzzy Min-Max
    • B. Gabrys and A. Bargiela, General fuzzy min-max neural network for clustering and classification, IEEE Trans Neural Networks, vol. 11, pp. 769-783, 2000.
    • P.K. Simpson, Fuzzy min-max neural networks. Part 2: Clustering, IEEE Trans. Fuzzy Systems, vol. 1, pp. 32-45, 1993.

Generic data storage for data sets

USM Extract is designed with a generic data storage structure in an RDBMS (please see the Entity Relationship Diagram). All data sets are stored as database objects in the same format, so our web application showcase can retrieve any data set without changes to the source code.

Open source middleware

USM Extract uses open source middleware and therefore incurs no software licensing cost. USM Extract is a sustainable platform for solving real-world problems, and it allows users to use ANN-based solutions efficiently and effectively.

Innovative Ideas

  • The logical flow of USM Extract is as follows. The figure contains 3 sub-flows that use ANN models and algorithms.
    • ANN Training in USM Extract
    • ANN Prediction in USM Extract
    • Embedding USM Extract API into external middlewares

Graphic credit to http://www.usm.my, http://www.openclipart.org/detail/1089/earth-in-gentle-hands-by-liftarn, http://www.openoffice.org, http://www.gimp.org

Learning various types of Iris plants

  • IRIS: Iris Plant dataset. This data set allows us to evaluate the pattern classification capabilities of USM Extract in learning and differentiating various types of Iris plants.

Learning various types of diabetes

  • PID: Pima Indians Diabetes Data Set. This data set demonstrates how USM Extract can be used to learn and classify patients who have signs of diabetes.

Learning various types of wine

  • WINE91: Wine Data Set. The data samples are the results of a chemical analysis of wines grown in a certain region in Italy and derived from three different cultivars. A total of 13 features are used to determine the type of wine.

Learning various types of wine quality

  • WINE: Wine Quality Data Set. The data set covers variants of the Portuguese "Vinho Verde" wine. USM Extract helps classify wine quality as either excellent or poor.

Benefits

.edu perspective

  • As a development platform to facilitate researchers in exploring the usability of various ANN models and algorithms.
  • As a learning platform for tertiary students in exploring various ANN models and algorithms and the associated benefits.

.com perspective

  • As a common platform for integrating third party or proprietary solutions.
  • As a development platform for software houses to extend their product designs and solutions.


Quick Start Guide for Web Application Showcase

To best view all video clips, use VLC Player

Data set

Scripts for new data sets from UCI

  • Download a new data file from UCI, for example the PID data set
  • Duplicate the iris() Java Method inside the class edu.usm.extract.cmdline.MyDataMart (part of the USM Extract source code), rename the copy to pid(), and edit it according to the following instructions.
 // Part A: Provide the following information for the Java Fields below, here for the PID data set
 String DMID = "PID"; // an ID for your data set
 String DMShortName = "datamart13"; // a short name for your data set
 String rawTable = "PID"; // the table name in the database (RDBMS) for your data set
 String rawTableID = "REC_ID"; // the table column that uniquely identifies each record
 String classColumnName = "CLASS"; // the table column that represents the TARGET of the prediction process
 int classColumnPosition = 1; // the position of the TARGET column in your data set's table
 String INPUT_DATA_FILE = localPath+"pima-indians-diabetes.data"; // the file name of the data set downloaded from UCI
 ...
 ...
 ...
 // Part B: Repeat the following set of lines once for each FEATURE used in the USM Extract training/prediction process. For example, for the FEATURE called "pregnant time":
 features.add("PREGNANT_TIME");
 featureDesc.add("Number of times pregnant");
 featureType.add("F");
 featureDataType.add("N");
 ...
 ...
 ...
 // Part C: The number of lines below should equal the number of FEATUREs in the target data set. For example, the PID data set has 8 FEATUREs in total, so there should be 8 lines as below. The numbers are in increasing order and start from "3"; "1" and "2" represent the "REC_ID" and "TARGET" of the records in your data set.
 ids.add("3");
 ids.add("4");
 ids.add("5");
 ids.add("6");
 ids.add("7");
 ids.add("8");
 ids.add("9");
 ids.add("10");
 ...
 ...
 ...
 // Part D: The number of sets of lines below should equal the number of possible TARGET values in the data set. For example, the PID data set has 2: "Tested negative for diabetes" and "Tested positive for diabetes"
 classID.add("0");
 classDescVal.add("0");
 classDescName.add("Tested negative for diabetes");
 ...
 classID.add("1");
 classDescVal.add("1");
 classDescName.add("Tested positive for diabetes");
  • Call the newly created Java Method pid() from main() as shown below, then save and run the MyDataMart class (right-click the class name --> Run As --> Java Application)
 MyDataMart myObj = new MyDataMart();
 myObj.pid();
 myObj.wine91();
 myObj.wine();
 myObj.iris();
  • Execute the generated scripts (by data set) according to the following steps using the specified command, where "ig" is the database name of USM Extract residing in MySQL
 mysql -h <database server ip> -u root -p ig < <data set file name at Part A>_tables.sql
 mysql -h <database server ip> -u root -p ig < <data set file name at Part A>_views.sql
 mysql -h <database server ip> -u root -p ig < <data set file name at Part A>_initdata.sql
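Part C in the steps above lists the feature ids by hand. As a small sketch of the same numbering rule (ids "1" and "2" are reserved for REC_ID and TARGET, so feature ids start at "3"), a loop could generate the list instead. The class and method names here are hypothetical and are not part of MyDataMart.

```java
import java.util.ArrayList;
import java.util.List;

public class FeatureIdSketch {
    // Ids "1" and "2" are taken by REC_ID and TARGET, so features start at "3".
    static final int FIRST_FEATURE_ID = 3;

    // Builds the id list that Part C writes out as repeated ids.add(...) calls.
    static List<String> featureIds(int featureCount) {
        List<String> ids = new ArrayList<String>();
        for (int i = 0; i < featureCount; i++) {
            ids.add(String.valueOf(FIRST_FEATURE_ID + i));
        }
        return ids;
    }

    public static void main(String[] args) {
        // The PID data set has 8 features.
        System.out.println(featureIds(8)); // prints [3, 4, 5, 6, 7, 8, 9, 10]
    }
}
```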

Database in MySQL

  • Create a database named "ig" in MySQL, for example using the following commands in Linux
 mysql -h <database server ip> -u root -p
 create database ig;
  • Load the given script called extract2_mysql.sql, for example in Linux, using the following command
 mysql -h <database server ip> -u root -p ig < extract2_mysql.sql
  • Load the created scripts one by one for all data sets, for example
 mysql -h <database server ip> -u root -p ig < iris.data_tables.sql
 mysql -h <database server ip> -u root -p ig < iris.data_views.sql
 mysql -h <database server ip> -u root -p ig < iris.data_initdata.sql
 mysql -h <database server ip> -u root -p ig < pima-indians-diabetes.data_tables.sql
 mysql -h <database server ip> -u root -p ig < pima-indians-diabetes.data_views.sql
 mysql -h <database server ip> -u root -p ig < pima-indians-diabetes.data_initdata.sql
 mysql -h <database server ip> -u root -p ig < wine.data_tables.sql
 mysql -h <database server ip> -u root -p ig < wine.data_views.sql
 mysql -h <database server ip> -u root -p ig < wine.data_initdata.sql
 mysql -h <database server ip> -u root -p ig < winequality-white.csv_tables.sql
 mysql -h <database server ip> -u root -p ig < winequality-white.csv_views.sql
 mysql -h <database server ip> -u root -p ig < winequality-white.csv_initdata.sql
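Each data set above is loaded with the same three scripts in the same order (tables, views, initial data). A minimal sketch that prints the commands for a list of data files, which can help avoid typing mistakes, might look as follows. The host 127.0.0.1 is a placeholder and the class is hypothetical; the suffixes and file names are taken from the commands above.

```java
import java.util.ArrayList;
import java.util.List;

public class LoadCommandSketch {
    // Script suffixes in the order they are loaded above.
    static final String[] SUFFIXES = {"_tables.sql", "_views.sql", "_initdata.sql"};

    // Builds the three mysql commands for one data file.
    static List<String> loadCommands(String host, String database, String dataFile) {
        List<String> commands = new ArrayList<String>();
        for (String suffix : SUFFIXES) {
            commands.add("mysql -h " + host + " -u root -p " + database
                    + " < " + dataFile + suffix);
        }
        return commands;
    }

    public static void main(String[] args) {
        String[] dataFiles = {
            "iris.data", "pima-indians-diabetes.data", "wine.data", "winequality-white.csv"
        };
        for (String dataFile : dataFiles) {
            // 127.0.0.1 is a placeholder for the database server ip.
            for (String command : loadCommands("127.0.0.1", "ig", dataFile)) {
                System.out.println(command);
            }
        }
    }
}
```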

Database Login id and password

  • Change the database login ID and password in the following files
    • jdbc.properties
    • hibernate.cfg.xml
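Before deploying, it can be useful to confirm that the edited properties file still defines every connection setting. The sketch below does this for properties text using java.util.Properties. The key names jdbc.url, jdbc.username, and jdbc.password are assumptions made for illustration; check the jdbc.properties file shipped with USM Extract for the actual keys.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class JdbcPropsCheck {
    // Hypothetical key names; the real jdbc.properties may use different ones.
    static final String[] REQUIRED = {"jdbc.url", "jdbc.username", "jdbc.password"};

    // Returns the required keys that are absent or blank in the given properties text.
    static List<String> missingKeys(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        List<String> missing = new ArrayList<String>();
        for (String key : REQUIRED) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String text = "jdbc.url=jdbc:mysql://localhost/ig\njdbc.username=root\n";
        System.out.println("Missing keys: " + missingKeys(text));
    }
}
```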

Creation of the USM Extract war file for deployment

  • Download and install JavaSE 6 (see how to)
  • Download and install Apache Tomcat 6 (see how to)
  • Download and install Eclipse, and attach Tomcat to Eclipse (see how to)
  • Check out the source code from our Subversion repository using the following command and import the USM Extract Eclipse project into Eclipse (see sample how to)
 svn checkout https://svn.java.net/svn/extract~svn/tag/<release version>/extract
  • Alternatively, install Subclipse (see how to) and check out the source code from our Subversion repository using the following URL
 https://svn.java.net/svn/extract~svn/tag/<release version>/extract
  • Once all errors in the imported Eclipse project are resolved, create the war file for the USM Extract web instance in Eclipse via File --> Export --> Web --> WAR File.

Deployment of USM Extract in Apache Tomcat

  • With Tomcat properly installed (see how to), copy the created USM Extract war file into the following directory
 <tomcat installed location>/webapps/
  • Alternatively, run USM Extract inside Eclipse by right-clicking the project name --> Run As --> Run on Server
  • Once USM Extract is deployed successfully, the following page should appear via URL http://<tomcat ip address>:8080/extract/

User Guide for Showcase

  • Instructions are given in each USM Extract web application showcase GUI; alternatively, watch the ANN training and prediction procedure in this video clip.
  • Explore the showcase based on the following diagram
  • Comprehensive steps for using Subclipse in USM Extract deployment can be found here


Quick Start Guide for API Integration Showcase

A showcase for Java application of middleware integration is as follows.

USM Extract jar file

  • After checking out the source code from our Subversion repository, resolve all errors in the Eclipse project.
  • Create the jar file for USM Extract using Eclipse by File --> Export --> Java --> JAR File.

USM Extract configuration file

  • Configure USM Extract via
    • extract.properties
  • USM Extract is pre-configured with
    • PID: Pima Indians Diabetes Data Set
  • Change this file content according to the intended data set after reading and understanding the relevant comment for each parameter.

USM Extract API at the external middleware

  • Assume that the middleware used is able to invoke Java.
  • Include the USM Extract jar file into the intended development/deployment environment, as well as the following lines for accessing USM Extract.
    • The following lines show the loading procedure of the pre-configured job's information from extract.properties.
    • The propFile represents the file name and location of USM Extract configuration.
    • The dataMart has the same value as DMID in database script creation for a new data set.
    • The Java integer network represents algorithms that are available in USM Extract. The possible values for this Java Field are as follows:
      • for both classification network training and prediction
        • ExtractConstant.FAM
        • ExtractConstant.PFAM
        • ExtractConstant.FMM
        • ExtractConstant.TFAM (i.e. the FAM with Bayesian Trust )
        • ExtractConstant.TPFAM (i.e. the PFAM with Bayesian Trust)
        • ExtractConstant.TFMM (i.e. the FMM with Bayesian Trust)
      • for clustering network training
        • ExtractConstant.FCM
        • ExtractConstant.FMM2 (i.e. the FMM)

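		// Load the pre-configured job information from extract.properties
		// (requires java.io.FileInputStream and java.util.Properties).
		String propFile = "conf/extract.properties";
		Properties props = new Properties();
		props.load(new FileInputStream(propFile));
		// The data mart ID is the part of the "job" value before the first ';'.
		String dataMart = props.get("job").toString();
		dataMart = dataMart.substring(0, dataMart.indexOf(';'));
		// network is a Java integer, as described in the list above.
		int network = ExtractConstant.FAM;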
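The snippet above takes the data mart ID from the part of the "job" property before the first ';'. Note that substring(0, indexOf(';')) throws an exception when the value contains no ';'. The following self-contained sketch shows the same extraction with that case guarded. The sample property value and the class name are hypothetical; the real format of the "job" entry is documented in the comments of extract.properties.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class JobPropertySketch {
    // Returns the text before the first ';' of the "job" property, trimmed.
    static String dataMartOf(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        String job = props.getProperty("job");
        if (job == null) {
            throw new IllegalArgumentException("missing 'job' property");
        }
        int end = job.indexOf(';');
        // Fall back to the whole value when no ';' is present.
        return (end < 0 ? job : job.substring(0, end)).trim();
    }

    public static void main(String[] args) {
        // Hypothetical value: only the part before ';' matters here.
        System.out.println(dataMartOf("job=PID;hypothetical-remainder\n")); // prints PID
    }
}
```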

Training of classification and clustering networks

  • Invoke the Java Method called doSingleInstance with three Java Field values (i.e. propFile, dataMart, and network). This method belongs to the Java Class called edu.usm.extract.cmdline.Training.
  • After the Training procedure is completed, collect the training results from log/extract.log

		System.out.println(printUnitsGroupData(doSingleInstance(dataMart, network, propFile)));

Prediction of classification networks

Invoke the Java Method called doBatchInstance with three Java Field values (i.e. propFile, dataMart, and network), a Java String (i.e. dataFile), and one Java integer. This method belongs to the Java Class called edu.usm.extract.cmdline.Predicting.

  • The Java String dataFile is the name and location of a file containing multiple records, one per line, for the target data set (for example, the file pid.data_predictdata.txt).
  • The final Java integer argument gives the position of the TARGET value in each record; the value 9 is the TARGET position for the PID data set (using the sample included in the Subversion source code).
  • After the prediction procedure is completed, collect the prediction results from log/extract.log

		String dataFile = "./WebContent/sample/pid.data_predictdata.txt";
		// propFile is the configuration file name loaded earlier.
		doBatchInstance(network, dataMart, propFile, dataFile, 9);
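To make the meaning of the TARGET position concrete, the sketch below reads the field at a given position from one record. It assumes comma-separated records and 1-based positions, which matches the layout of the UCI PID file (8 features followed by the class); whether doBatchInstance counts positions the same way is an assumption. The class and method names are hypothetical.

```java
public class TargetColumnSketch {
    // Returns the field at a 1-based position in a comma-separated record.
    static String fieldAt(String record, int position) {
        String[] fields = record.split(",");
        if (position < 1 || position > fields.length) {
            throw new IllegalArgumentException(
                    "position " + position + " is outside 1.." + fields.length);
        }
        return fields[position - 1].trim();
    }

    public static void main(String[] args) {
        // A sample record in the PID layout: 8 feature values, then the class.
        String record = "6,148,72,35,0,33.6,0.627,50,1";
        System.out.println(fieldAt(record, 9)); // prints 1
    }
}
```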

User guide for showcase

  • Explore the showcase based on the following diagram


USM Extract Team

Source Codes

Check out the complete source code from our Subversion repository

 svn checkout https://svn.java.net/svn/extract~svn/tag/<release version>/extract

or use Eclipse to perform the same activity (see how to)

License

The software distribution is based on GNU General Public License, version 3 (GPL-3.0), see more details at URL http://www.opensource.org/licenses/gpl-3.0.html

Change Log

Release 2.0.0.201108

Release 1.x.x.x

USM Extract Award

Citation

If you would like to cite USM Extract, cite the following URL:

Choo Jun Tan and Chee Peng Lim, USM Extract: A Sustainable Soft-Computing Platform, 2011, software available at http://extract.java.net.

The bibtex format is as follows

@Manual{usmextract2011,
  author = {Choo Jun Tan and Chee Peng Lim},
  title = {{USM Extract}: A Sustainable Soft-Computing Platform},
  year = {2011},
  note = {Software available at \url{http://extract.java.net}}
}

Acknowledgements

Thanks to

Another project from University Science Malaysia, see Mobile Desktop Grid

 
 