Friday 13 December 2013

Natural Language Processing & Data Mining Algorithms & Predictive Analytics

Hi All,

I will write a fuller article on this later. For now: in our data warehouse work today, we are using natural language processing to get data out of handwritten notes and then map that data appropriately.

To understand data patterns, we are also using data mining algorithms so that any blank values in a data set are filled in with appropriate values.
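
As a very rough illustration of filling blanks, here is a minimal sketch that replaces a missing number with the column mean and a missing category with the most common value. This is only the simplest possible stand-in, not the algorithm we actually use; the function name `fill_blanks`, the column names and the sample rows are all made up for the example.

```python
from statistics import mean, mode

def fill_blanks(rows, column, numeric=True):
    """Return a copy of rows where None in `column` is replaced by the
    column mean (numeric) or the most common value (categorical)."""
    known = [r[column] for r in rows if r[column] is not None]
    fill = mean(known) if numeric else mode(known)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

# Hypothetical sample data: None marks a blank cell.
rows = [
    {"age": 30, "city": "Pune"},
    {"age": None, "city": "Pune"},
    {"age": 50, "city": None},
]

filled = fill_blanks(fill_blanks(rows, "age"), "city", numeric=False)
for row in filled:
    print(row)
```

Real data mining tools usually do better than this by predicting the blank from the other columns, but the shape of the task is the same.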

A good article on data mining:

http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm

Below are a few notes on predictive analytics:

http://www.ftpress.com/articles/article.aspx?p=2133374

The book by Thomas W. Miller is good on this subject.

Two predictive analytics models
1) Regression
2) Classification

Regression and classification are two common types of predictive models. Regression involves predicting a response with meaningful magnitude, such as quantity sold, stock price, or return on investment. Classification involves predicting a categorical response. Which brand will be purchased? Will the consumer buy the product or not? Will the account holder pay off or default on the loan? Is this bank transaction true or fraudulent?
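
To make the difference concrete, here is a small sketch in plain Python. The regression part fits a straight line by least squares and predicts a quantity; the classification part assigns a label using the nearest class average (a deliberately simple classifier chosen for the example). All the numbers, and the helper names `fit_line` and `nearest_centroid`, are invented for illustration.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx

def nearest_centroid(train, x):
    """Return the label whose average value is closest to x."""
    labels = {label for _, label in train}
    centroids = {
        label: mean([v for v, l in train if l == label]) for label in labels
    }
    return min(centroids, key=lambda label: abs(centroids[label] - x))

# Regression: predict a quantity (units sold) from price. Made-up data.
prices = [10, 12, 14, 16]
units = [100, 90, 80, 70]
slope, intercept = fit_line(prices, units)
predicted_units = slope * 13 + intercept
print("predicted units at price 13:", predicted_units)

# Classification: predict a category from a debt ratio. Made-up data.
loans = [(0.1, "pays off"), (0.2, "pays off"), (0.7, "defaults"), (0.9, "defaults")]
print("debt ratio 0.3:", nearest_centroid(loans, 0.3))
print("debt ratio 0.6:", nearest_centroid(loans, 0.6))
```

The first answer is a number with magnitude, the second is one of a fixed set of labels. That is the whole distinction.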

Suppose you want to do predictive analysis on an Excel file that you have with you:

1) The file can be divided into 2 halves (typically one half to build the model and the other half to test it)
2) Define which variables (Excel columns) you want to analyse as part of the model
3) If you are feeding this to a predictive analytics tool, you can select all columns
4) The tool will analyse all columns (variables) and come up with a model built on the variables that have the highest impact, such as the columns for age or location
5) Suppose your Excel file has 100 columns. The challenge in doing this manually is deciding which columns to use, since each column may have only a weak relation to the outcome. A computer does this well and very fast, so you do not need a traditional statistical analyst to come up with a hypothesis and design a model
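
The steps above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how any particular tool works: it splits the rows into two halves, ranks every column by the strength of its correlation with the target on the first half, and then checks the best column on the second half. The columns (`age`, `shoe_size`, `spend`) and the values are invented, and correlation is just one possible measure of "impact".

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def rank_columns(data, target):
    """Rank every non-target column by |correlation| with the target."""
    ys = [r[target] for r in data]
    scores = {
        col: abs(pearson([r[col] for r in data], ys))
        for col in data[0]
        if col != target
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical spreadsheet: one dict per row, one key per column.
rows = [
    {"age": 25, "shoe_size": 9, "spend": 52},
    {"age": 32, "shoe_size": 7, "spend": 66},
    {"age": 41, "shoe_size": 10, "spend": 80},
    {"age": 47, "shoe_size": 8, "spend": 95},
    {"age": 29, "shoe_size": 8, "spend": 60},
    {"age": 36, "shoe_size": 9, "spend": 71},
    {"age": 44, "shoe_size": 7, "spend": 90},
    {"age": 51, "shoe_size": 10, "spend": 101},
]

# Step 1: divide the file into two halves.
half = len(rows) // 2
train, test = rows[:half], rows[half:]

# Steps 2 to 4: let the computer look at all columns and pick the strongest.
ranking = rank_columns(train, "spend")
best = ranking[0][0]
print("ranking on first half:", ranking)

# Check that the chosen column still relates to the target on the second half.
check = pearson([r[best] for r in test], [r["spend"] for r in test])
print("correlation of", best, "with spend on second half:", round(check, 3))
```

With 100 columns the loop is exactly the same; only the amount of work grows, which is why this is better left to the computer.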

Three ways to come up with a model
1) Traditional - Start with a hypothesis and a theory. We come up with a theory and try to fit it to the data. Linear regression and logistic regression are examples.

2) Data adaptive - Also called statistical learning or data mining. We look at the data and come up with a model; the data determines the model. Either we go through the data manually and come up with the model, or we use a tool that goes through the data and comes up with the model (a neural network is one such tool).

3) Model dependent - We develop mathematical models, generate data from them, and compare this generated data with real data.
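
The model-dependent approach is the least familiar of the three, so here is a minimal sketch of it. We assume a mathematical model (customer waiting times follow an exponential distribution with a rate of 2 per minute), generate data from that model, and compare it with observed data. The model, the rate, the observed values and the function name `simulate_waits` are all assumptions made up for this example.

```python
import random
from statistics import mean

def simulate_waits(rate_per_minute, n, seed=0):
    """Generate n waiting times from the assumed exponential model."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [rng.expovariate(rate_per_minute) for _ in range(n)]

# Hypothetical real-world measurements, in minutes.
observed_waits = [0.4, 0.7, 0.3, 0.6, 0.5, 0.55, 0.45, 0.65]

simulated = simulate_waits(rate_per_minute=2.0, n=10_000)
gap = abs(mean(simulated) - mean(observed_waits))

print("simulated mean wait:", round(mean(simulated), 3))
print("observed mean wait: ", round(mean(observed_waits), 3))
print("gap between model and data:", round(gap, 3))
```

If the gap is small, the model is a plausible description of the real process; if it is large, the model (or its parameters) needs to change.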


It is often a combination of models and methods that works best. Consider an application from the field of financial research. The manager of a mutual fund is looking for additional stocks for a fund’s portfolio. A financial engineer employs a data-adaptive model (perhaps a neural network) to search across thousands of performance indicators and stocks, identifying a subset of stocks for further analysis. Then, working with that subset of stocks, the financial engineer employs a theory-based approach (CAPM, the capital asset pricing model) to identify a smaller set of stocks to recommend to the fund manager. As a final step, using model-dependent research (mathematical programming), the engineer identifies the minimum-risk capital investment for each of the stocks in the portfolio.

Below is an article on linear regression, which is widely used in predictive analytics, and on how it can be implemented with neural networks:

http://www.willamette.edu/~gorr/classes/cs449/linear1.html
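
As a rough illustration of how a neural network can do linear regression, here is a sketch of a single linear "neuron" (one weight, one bias, no activation function) trained by gradient descent on the mean squared error. This is my own minimal version, not code from the article; the function name `train_linear_neuron`, the learning rate, the number of epochs and the data are all assumptions chosen so the example converges.

```python
def train_linear_neuron(xs, ys, lr=0.05, epochs=5000):
    """Fit y = w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error of the neuron on every training example.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradient of (1/2) * mean squared error with respect to w and b.
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        # Move the weight and bias a small step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Made-up data lying exactly on the line y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]

w, b = train_linear_neuron(xs, ys)
print("learned weight:", round(w, 3), "learned bias:", round(b, 3))
```

The neuron ends up with the same line that ordinary least squares would give; the network just reaches it step by step instead of with a formula.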

A very good article on the basics of statistics:

http://home.ubalt.edu/ntsbarsh/stat-data/Topics.htm


