Introduction: Deriving meaningful information from heaps of data is now a minimal requirement for any organization's survival and sustenance. Many terms and buzzwords in this area blur the meaning and leave people confused: Big Data, Data Warehouse (DWH), BI analytics, AI (Artificial Intelligence), Data Science, Machine Learning, Advanced Analytics, Deep Learning, Cognitive Computing and Predictive Modeling, to name a few. I have seen institutions where a team builds classifiers with machine learning but is called the AI team, where business analysts run diagnostic analysis in Tableau but are called data scientists, and where we write conventional code or use an RPA tool such as Blue Prism to automate part of a business process (e.g. take data from one app, paste it into a file and format it before sending it as an expense report) and then unintentionally call that an AI ecosystem. College graduates, job seekers (fresh or lateral), business executives and technologists must make a real effort to understand the concepts behind the various data management subject areas (AI, ML, DS, Deep Learning, Cognitive Computing, Statistics, BI, DWH etc.) and the associated roles (data engineers, business analysts, data scientists, ML engineers, data modelers, data administrators etc.), and then plan to learn and apply them. Otherwise industries may lose revenue, Centers of Excellence (CoEs) and practices become blurred, and frustration creeps in over the gap between expectation and reality. Many of us have not yet gone far along the data maturity curve (exploration, diagnosis, prediction and prescription), so it is important to clarify these subject areas before it is too late. This article is an attempt to bring clarity to AI, Data Science and related terms, to help graduates, data practitioners, business executives and others build careers, establish practices, and develop community and competency in Data Science.
AI and Machine Learning: Artificial Intelligence is the subject area of computer science engineering that deals with creating products that can think and process information like humans. Researchers and industries tweak the definition to suit their area of application. We can search online, read articles and analyze AI products to get closer to a definitive definition of AI, though I don't think one exists yet. The goal of AI has always been to create expert decision systems that act on their decisions. Machine Learning (ML) is the subject area that deals with capturing hidden structure by learning from historical data. The foundations of ML algorithms lie primarily in probability, linear algebra and a bit of calculus (derivatives, second derivatives etc.). ML is used to create AI products, so ML is regarded as a subset of AI. For example, a self-driving vehicle (an AI product) will contain a data model trained on tons of images (direction, road, traffic, vehicles etc.) and audio recordings (honking, human voice commands etc.). But calling an ML model "AI" is not appropriate, and this is where people use the two words too interchangeably. It is like the twin-turbocharged V8 engine that makes a McLaren one of the fastest sports cars (please don't hold me to the detailed specifics): here the engine is the analogy for ML and the car for AI. Essentially, ML is the research field at the intersection of artificial intelligence, computer science and statistics.
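To make "learning hidden structure from historical data" a little more concrete, here is a minimal sketch in plain Python (standard library only). It fits a straight line to a handful of hypothetical historical records with ordinary least squares, one of the simplest ML models. The records and the function names `fit_line` and `predict` are illustrative assumptions, not part of any product.

```python
def fit_line(xs, ys):
    """Learn slope and intercept that minimize squared error (ordinary least squares)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope = sum of co-deviations / sum of squared x-deviations
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    if den == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = num / den
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Apply the learned line to a new, unseen input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical historical records generated by the hidden rule y = 2x + 1
history_x = [1, 2, 3, 4, 5]
history_y = [3, 5, 7, 9, 11]

model = fit_line(history_x, history_y)
print(model)               # (2.0, 1.0): the hidden rule, recovered from data
print(predict(model, 10))  # 21.0
```

The "training" step only sees examples, never the rule itself; the learned slope and intercept are the captured structure that the model then reuses on new inputs.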
Data Science and Statistics: The job title "data scientist" is widely credited to DJ Patil (together with Jeff Hammerbacher) around 2008. Data Science is the area of AI that combines several skills and steps: applying business domain knowledge, data extraction, data wrangling/munging, data cleaning and formatting, statistical modeling (establishing the significance of each data attribute/variable/parameter, correlations, causation etc.), data scripting/programming and machine learning, all the way to developing a data product or model. This is a vast area that requires multi-dimensional skills to produce a reasonable model such as "Predict credit card defaulters", "Classify customers with high CLV (Customer Lifetime Value)", "Identify patients with a high chance of diabetes" and many more. Statistics is a subject area in its own right that deals with deriving and/or establishing significance, relationships, inference and/or predictions from numbers (by sampling from a population) with the help of probability theory and proven theorems. Statistics is at the core of ML algorithms and therefore of AI. People sometimes use "data scientist" and "statistician" interchangeably, although a data scientist typically combines statistics with ML and programming: for example, a data scientist might develop a classifier that sorts loan applications by risk (high, moderate or low) using both statistics and ML.
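As a small illustration of the "relations" side of statistics, the sketch below computes Pearson's correlation coefficient for a sample of paired observations. The two variables (months since an account was opened, missed payments) and their values are hypothetical. A coefficient near -1 or +1 signals a strong linear relationship in the sample, but correlation alone does not establish causation, which the statistical modeling step described above has to examine separately.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample Pearson correlation: co-deviation scaled by both spreads."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # The 1/(n-1) factors of covariance and standard deviation cancel, so they are omitted
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - x_bar) ** 2 for x in xs))
    sy = sqrt(sum((y - y_bar) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)


# Hypothetical sample: months since account opened vs. missed payments
months = [3, 6, 12, 24, 36]
missed = [4, 3, 2, 1, 0]
print(round(pearson_r(months, missed), 3))  # strongly negative, close to -1
```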
Data Science and Machine Learning: By now, I think the distinction between Data Science and ML is clear. In this framing, Machine Learning is a subset of Data Science, which in turn is a subset of AI. It is very difficult to build a predictive, classification or prescriptive system without ML. Statistics helps build the model from sample data: it reveals trends, supports inference and indicates direction to a reasonable extent. That in turn guides the selection and/or construction of an ML algorithm suited to training on the full training data, with appropriately tuned parameters, to produce a data model that helps the business.
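The workflow described above (train on data, then pick tuning parameters that generalize) can be sketched with a tiny k-nearest-neighbours classifier for the loan-risk example. Everything here is a hypothetical illustration: the two features (debt-to-income ratio, number of missed payments), the records and the helper names are assumptions, and the features are deliberately left unscaled to keep the code short. The tuning parameter is `k`, chosen by accuracy on a held-out validation set rather than on the training records themselves.

```python
from collections import Counter

# Hypothetical labelled history: ((debt_to_income, missed_payments), risk_label)
TRAIN = [
    ((0.10, 0), "low"), ((0.20, 0), "low"), ((0.15, 1), "low"),
    ((0.13, 0), "high"),  # a noisy, probably mislabelled record
    ((0.60, 3), "high"), ((0.70, 4), "high"), ((0.65, 2), "high"),
]
# Held-out records used only to choose k, never to train
VALIDATION = [((0.12, 0), "low"), ((0.68, 3), "high")]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(train, point, k):
    """Majority vote among the k training records nearest to `point`."""
    nearest = sorted(train, key=lambda row: squared_distance(row[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, data, k):
    """Fraction of records in `data` whose label the classifier predicts correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in data)
    return hits / len(data)


def choose_k(train, validation, candidates):
    """Candidate k with the best validation accuracy (the first one wins ties)."""
    return max(candidates, key=lambda k: accuracy(train, validation, k))


for k in (1, 3, 5):
    print(k, accuracy(TRAIN, VALIDATION, k))
print("best k:", choose_k(TRAIN, VALIDATION, [1, 3, 5]))
```

With this toy data, k = 1 is fooled by the single noisy record (validation accuracy 0.5), while k = 3 and k = 5 both reach 1.0, so `choose_k` returns 3. A production setup would more likely use a library implementation and cross-validation, but the idea of tuning on data the model was not trained on is the same.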
Deep Learning & Cognitive Computing: These terms often fall victim to the loose use of the word AI. Once we peel back the layers of ML, we find a set of algorithms that are trained on historical records; after multiple iterations between training and test data, the process yields a reasonable model. One such group of algorithms, Neural Networks (NN), has been evolving since the perceptron days (computer science graduates will likely have studied it in college) and is built by simulating the behavior of neurons in the human brain. Neural network architectures such as CNNs, RNNs and LSTMs (a kind of RNN) are based on the concept of layers of neurons between the input and the output. These hidden layers are trained to learn various small- to medium-sized structures in the data being fed, such as human hair in image processing, or the previous day's stock value when predicting today's, among many other examples. Deep Learning refers to networks with many hidden layers: the more layers, the more complex and hierarchical the features a network can learn, which can improve predictions given enough data and computing power, although depth alone does not guarantee better results. I don't intend to go into the details of these algorithms here; they will come in upcoming articles. The point is to outline the key distinguishing elements between these AI-related terms. Cognitive Computing (CC) is a platform consisting of hardware, software and trained ML models, with an adequate UI (user interface), that supports decision-making. For example, such a system can help doctors arrive at a treatment decision for a patient (e.g. if medication combined with exercise can raise the chance of a cure by 75%+ within the next 3-4 weeks, can the decision on surgery be deferred until then?). CC features include being adaptive (re-orienting and optimizing the model as inputs and goals change), interactive (letting the user interact with the AI model through feedback and input), iterative (refining the problem statement) and contextual (adding semantics and context to the whole premise).
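The idea of "layers of neurons between input and output" can be shown with a minimal forward pass in plain Python. The weights below are hand-picked for illustration, not learned: the hidden layer computes OR and NAND of two binary inputs, and the output neuron combines them with AND, which yields XOR, a function a single perceptron cannot represent. Real deep learning frameworks use differentiable activations and learn the weights with backpropagation; in this sketch, making the network "deeper" simply means appending more `(weights, biases)` pairs to the list.

```python
def step(z):
    """Threshold activation used by the classic perceptron."""
    return 1 if z > 0 else 0


def layer(inputs, weights, biases):
    """Each neuron: weighted sum of the inputs plus its bias, then the activation."""
    return [step(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]


def forward(inputs, network):
    """Feed the inputs through every layer in order (hidden layers, then output)."""
    activations = inputs
    for weights, biases in network:
        activations = layer(activations, weights, biases)
    return activations


# Hand-picked weights (hypothetical, not trained)
XOR_NET = [
    ([[1, 1], [-1, -1]], [-0.5, 1.5]),  # hidden layer: neuron 1 = OR, neuron 2 = NAND
    ([[1, 1]], [-1.5]),                 # output layer: AND of the two hidden neurons
]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, forward([a, b], XOR_NET)[0])  # prints the XOR truth table
```

The hidden neurons each detect a simple pattern, and the output neuron combines them into something neither could express alone; stacking more such layers is, in essence, what gives deep networks their capacity.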
Summary: AI is the subject area that covers the whole range of topics related to advanced analytics. It may not be possible to have an absolute definition for every one of these topics, but it is very important to underscore their underlying constructs and concepts. Related terms such as DS (Data Science), ML (Machine Learning), NN (Neural Networks), CC (Cognitive Computing), DL (Deep Learning) and Statistics, along with several other core data fields such as Data Engineering, Data Analytics, Data Administration, DWH, BI, Big Data and Data Lakes, should first be understood through the common definitions used by practitioners (academia and industry), and then refined through real practice. Artificial general intelligence (AGI), also called general or strong AI, refers to a hypothetical next generation of AI in which machines could take over the entire range of general human tasks. RPA (Robotic Process Automation) ranges from simple automation (a specific segment of BPM) to end-to-end process automation involving a great deal of AI (please read my previous article on this topic); RPA combined with AI might lead to AGI someday. Irrespective of all these developments in research and business implementation, we must make the effort to understand the concepts and underlying principles so that we can design careers and programs that move us along the data maturity curve. Whenever we hear or use conflicting AI terms and roles, our effort should be to bring clarity to the terms, distinguish related meanings and appreciate the underlying effort in each field, whether it is AI, ML, Data Science, Cognitive Computing, prediction, or causation and correlation analysis. Business and technology are converging day by day, and the data management community needs practitioners from both domains, including newcomers, job seekers and fresh graduates. This article is an attempt to bring clarity to the AI field and related subject areas, helping practitioners (technologists or business people) learn and gain experience at the various data maturity phases, viz. Diagnostic, Explorative, Predictive and Prescriptive. Stay tuned for the next article on this topic, covering course curriculum and content to learn and practice.