Tuesday, August 9, 2011

Database - for CIOs

Madnick and Donovan (Operating Systems, pp. 337-8, 1974) listed the following functions of Information Management. “Information Management is quite simple… yet, one of the most important …”.
1.       Keeping track of all information through various tables
2.       Deciding policies on storage and access
3.       Allocating information resources
4.       De-allocating information resources
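The four functions above can be sketched as a toy manager. This is only an illustration under stated assumptions: the class name `InformationManager`, the block-counting model and the "owner only" access policy are all hypothetical and are not taken from Madnick and Donovan.

```python
class InformationManager:
    """Toy model of the four functions: track, decide policy, allocate, de-allocate."""

    def __init__(self, capacity_blocks):
        self.free_blocks = capacity_blocks
        # Function 1: a table that keeps track of every piece of information.
        self.table = {}  # name -> {"owner": ..., "blocks": ...}

    def may_access(self, user, name):
        # Function 2: the policy (assumed here): only the owner may access.
        entry = self.table.get(name)
        return entry is not None and entry["owner"] == user

    def allocate(self, user, name, blocks):
        # Function 3: allocate storage if the name is new and space is left.
        if name in self.table or blocks > self.free_blocks:
            return False
        self.table[name] = {"owner": user, "blocks": blocks}
        self.free_blocks -= blocks
        return True

    def deallocate(self, user, name):
        # Function 4: de-allocate, subject to the same access policy.
        if not self.may_access(user, name):
            return False
        self.free_blocks += self.table.pop(name)["blocks"]
        return True


im = InformationManager(capacity_blocks=10)
im.allocate("ann", "well-log", 4)
print(im.free_blocks)                     # blocks left after the allocation
print(im.deallocate("bob", "well-log"))   # refused: bob is not the owner
print(im.deallocate("ann", "well-log"))   # allowed: ann owns the entry
```

Note that the manager only tracks and allocates; it knows nothing about what "well-log" means.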
The term ‘file-system’ is used where the concern is simple logical organization. A ‘file’ or ‘data-set’ is a single, separate collection. A ‘data management system’ does some “structuring”, but NO interpretation. A ‘database system’ addresses both “structuring” and “interpretation”.
A database is therefore required to deliver two aspects –
1.       Structuring
2.       Interpretation
Until this is accomplished, the collection of data remains a data-set.
i.                     A classic MS-Excel data-set, if organized carefully, can structure the data but cannot interpret it.
ii.                   A database is not defined by data quantity. A small data-set can become a database, and a large data-set may remain a file.
To further the understanding of databases we need to deal with the “structure” and “interpretation” aspects. The term ‘semantics’ (the study of meaning) was later introduced to replace ‘interpretation’ as the pre-requisite of a database, since meaning is fundamental to interpretation.
Database = Structure + Semantic {Data-set}
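A minimal sketch of this equation, using Python's built-in sqlite3 module. The table name `well`, its columns and the permitted values are assumptions made up for illustration. The plain list of rows has structure only; the schema adds a small amount of semantics (what a valid depth or fluid is), so a meaningless row is rejected.

```python
import sqlite3

# A bare data-set: rows and columns, but nothing says what is valid.
rows = [("W-101", 2450.0, "oil"), ("W-102", -30.0, "banana")]


def build_database():
    """Create an in-memory table whose schema carries some semantics."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE well (
            well_id TEXT PRIMARY KEY,
            depth_m REAL NOT NULL CHECK (depth_m > 0),
            fluid   TEXT NOT NULL CHECK (fluid IN ('oil', 'gas', 'water'))
        )
        """
    )
    return conn


def load(conn, data):
    """Insert rows one by one; return (accepted, rejected) lists."""
    accepted, rejected = [], []
    for row in data:
        try:
            conn.execute("INSERT INTO well VALUES (?, ?, ?)", row)
            accepted.append(row)
        except sqlite3.IntegrityError:
            # The row violates a rule declared in the schema.
            rejected.append(row)
    return accepted, rejected


conn = build_database()
accepted, rejected = load(conn, rows)
print("accepted:", accepted)
print("rejected:", rejected)
```

A spreadsheet would have stored both rows without complaint. CHECK constraints capture only a thin slice of meaning, but they show where interpretation starts to live in the model rather than in the user's head.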

Structure

Data Structure as a distinctive part of a Program was defined by Niklaus Wirth (Algorithms + Data Structures = Programs). Thus, no program is possible without data structures! Over the past 40 years, data structures have developed into a complex, specialized area, and every computer science curriculum has courses addressing them. They are the backbone of every Information System. Ph.D.-level research on data structures is quite popular.
Data Structures emphasize the elegance, efficiency and effectiveness of storage, operations and access from a computer implementation perspective. Any data structure can become the foundation architecture for a database.
Databases also emphasize the intrinsic context and applicability of the data for the world. This is where Interpretation and Semantics come in.

Semantics

Semantics and the associated developments in computer science are among the most complex and innovative aspects of human discovery. They are applied in many areas such as pattern recognition, machine learning and knowledge representation. We will limit the discussion of semantics to the common and widely used implementations of ‘database’ and to how semantics constitutes a fundamental part of them.
Data Structures like trees created hierarchical databases, wherein the interpretation is captured as stages of a hierarchy, and the resulting models became database systems. IMS (IBM) was so popular in my programming days that CICS-IMS was a sure-success skill!
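A small sketch of the hierarchical idea, in plain Python rather than IMS. The class name `Segment` and the Company/Basin/Field/Well levels are assumptions chosen for illustration. Each record has exactly one parent, and the meaning of a record comes from its position in the hierarchy.

```python
class Segment:
    """One record in a hierarchy; every segment has at most one parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []

    def add(self, name):
        """Create a child segment under this one and return it."""
        child = Segment(name, parent=self)
        self.children.append(child)
        return child

    def path(self):
        """Walk up to the root; the path is the record's interpretation."""
        node, parts = self, []
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


company = Segment("Company")
basin = company.add("Basin-A")
field_x = basin.add("Field-X")
well = field_x.add("Well-7")
print(well.path())
```

The limitation is visible in the class itself: a well that belongs to two fields cannot be expressed without duplicating it.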
Network database systems modeled interpretation as a mesh of dependencies using a generalized graph structure. This allowed a record to have multiple parent-child dependencies. The approach acquired standard status through CODASYL and was used to represent the interpretation of real business interfaces through datasets.
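The difference from a hierarchy can be sketched with a simple graph. This is not CODASYL syntax; the class name `Network` and the supplier/customer/order records are hypothetical. The point is only that one member record may have several owners.

```python
from collections import defaultdict


class Network:
    """Owner-member links stored as a general graph."""

    def __init__(self):
        self.children = defaultdict(set)  # owner -> members
        self.parents = defaultdict(set)   # member -> owners

    def link(self, owner, member):
        """Record that `member` depends on `owner`."""
        self.children[owner].add(member)
        self.parents[member].add(owner)


net = Network()
net.link("Supplier-S1", "Order-42")
net.link("Customer-C9", "Order-42")   # the same order has a second parent
net.link("Customer-C9", "Order-43")

print(sorted(net.parents["Order-42"]))
print(sorted(net.children["Customer-C9"]))
```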
Relational databases developed from the work of Codd (1970) and represent datasets as ‘tables’ in certain unique forms called normalized tables. The business or real-world representation and use of data is modeled through techniques like Entity-Relationship models to capture the interpretation. The resulting design is accompanied by a data dictionary and supported by reference tables, such as master lists, that provide a controlled vocabulary and permitted values for every table. Together, these have made considerable progress in capturing the semantics of data in the model.
Most modern information systems deploy relational database models.
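A minimal sketch of a reference table acting as controlled vocabulary, again with Python's built-in sqlite3 module. The tables `well_status` and `well` and the status codes are invented for illustration. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript(
    """
    -- Reference (master) table: the controlled vocabulary.
    CREATE TABLE well_status (
        code    TEXT PRIMARY KEY,
        meaning TEXT NOT NULL
    );
    -- Data table: status must be one of the codes above.
    CREATE TABLE well (
        well_id TEXT PRIMARY KEY,
        status  TEXT NOT NULL REFERENCES well_status(code)
    );
    INSERT INTO well_status VALUES ('PRD', 'Producing'), ('SHT', 'Shut-in');
    INSERT INTO well VALUES ('W-101', 'PRD');
    """
)

# A status outside the vocabulary is refused by the database itself.
try:
    conn.execute("INSERT INTO well VALUES ('W-102', 'XYZ')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# The join turns a stored code back into its meaning.
rows = conn.execute(
    "SELECT w.well_id, s.meaning FROM well w "
    "JOIN well_status s ON s.code = w.status"
).fetchall()
print(rejected, rows)
```

The reference table plays the role of the data dictionary entry for `status`: the permitted values and their meanings are stored once, and every other table points to them.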

Database

1.       Is a model of the real-world interfaces of the data on a computer information system, providing a structure and semantics (interpretation)
2.       In the absence of semantic support, the database ceases to be one.
3.       The goodness of any database depends largely on the designed model, which expresses the interpretation captured from the real world
4.       A relational database is a database structure that provides semantics through techniques like E-R models and the data dictionary

Modern Information Systems and their value are completely dependent on databases. The 4th Paradigm of Science is essentially built on databases. For an organization, a team or the CIO, databases are the most important aspect of IS.
Contents in this post are Re-Created from publicly available information as a CIO sees it. For the spirit of this Read-Write culture check this http://blog.ted.com/2007/11/06/larry_lessig/

Monday, August 8, 2011

Oil & Gas Technical Information Systems

The Petroleum Exploration & Production business deals with a variety of sciences, technologies and specializations. Along with all the information systems of a standard business like ERP, EDMS, GIS etc., the Oil & Gas business typically handles 200-300 technical applications.
This note examines the experience with the technical applications over the last 3 decades. The SWOT of the application scenario is addressed. In the light of current trends in application organization and delivery, the high-value opportunities are identified. The CIO of an Oil & Gas business has the daunting task of specifying, developing and delivering these gold-mines for E&P.

Saturday, August 6, 2011

Know-All Knowledge in Computers

Thanks to the proliferation of computers into everything from the microwave to the corporate board-room, everyone and anyone believes and claims to be a computer expert! CIOs are often confronted with the Know-all mindset of the people they interact with. Dealing with this double-edged situation, which arises from the widespread and proliferating knowledge and (mis)understanding of computers, databases, Information Systems etc., is a challenge for the CIO.
A curious-faced young visitor at my home was in his 7th grade. His father, a Computer Science graduate engineer, was deliberating with me on the know-all issues. I did a small experiment.
Q. What is a Tau Transform and what is a database?  I asked both the father and son.
Neither knew much about the Tau Transform, except that the engineer recollected some mention of it in his 6th semester signal processing course.
Both had an answer for what a database is.  Both were right to some extent. There is probably not a single English-speaking educated person who would claim a lack of knowledge or understanding of the term ‘database’.
Here is an example of the excessive visibility of a term (database) giving a level of familiarity that leads to a sense of understanding. It is very valuable for the CIO and at the same time very dangerous for his work.
Far fewer people developed and are working on the Tau Transform. Say, a hundred or so world-wide. In contrast, since the initial work of Codd (1970), who formulated the relational model of data that laid the foundation for the database architectures driving the vast Information System space, a few hundred thousand professionals have been working on this simple, all-familiar term: ‘database’. The intricacy, detail and complexity in databases are deeper than those of the Tau Transform.
Yet, all of us know databases! How? In which context? To what extent?
Just anyone who can use MS-Excel knows databases and has a definition of and an opinion on them. This exuberance has led to complete clouding of what a database is and what is required to do it right. The consequence is hugely wasted effort and misconceptions about the strength of the organization’s database. Improper, Incomplete, Inadequate, Inefficient, and Ineffective databases are a bane and the reason why companies (teams or groups) are unable to leverage new computing techniques. Facebook, Twitter, You-Tube etc. are examples where the database is a well-designed KISS (Keeping It Simple and Serviceable), leading to a transformational collaboration facility.
Database is not a uniformly understood and addressed term. For the “Knowing”, it is an intricate Entity-Relationship model and Normalized description of the business’ data. For the “All-Knowing” (could be even your CIO, CFO, CEO, CxO, President, VP, or just anyone else who matters) it is the all-pervading equivalent of a mobile phone. Great to use, Nothing to Learn, Costing as little as $50!
The CIO meets many such cases of conflict between Knowing:All-Knowing. For a Head of IT it could be system security, RAM requirements, Backup architecture, Application needs, Standard Nomenclature, Cluster Architecture etc. Aggravated by Bing, Google and others, the CIO often stands in the same position as an experienced physician confronted with a Googled Patient holding the latest R&D details, completely interpreted out-of-context or out-of-relevance!
The CIO is responsible for bringing correct, complete and competent solutions, with proper understanding, for Information maturity.
  • Failed CIO is a failed database of the company.
  • Failed database is the failed Knowledge Management.
  • Failed KM is the business risk of 21st century.
What is a database? Try to answer this before reading further and grasping the CIO’s thoughts. I will take it further as an addendum to this post! It was hilarious to note reputed organizations’ sites misrepresenting this popular term.

Thursday, August 4, 2011

Driving the Change

Experience brings a distinct advantage in the form of “real” understanding unmatched by any PPT. The CIO has a responsibility to drive change in the organization. This post looks back at what drove some significant “changes”, what has changed and where change is being resisted.
1.       1960-70              : Any requirement to telephone involved a workflow. There was always a sweet-voiced Ms. Mary, the telephone operator. You requested her for your connection and then you spoke. The same process was followed with some unknown Mr. Operator to make outstation calls (trunk calls). Not much knowledge was needed from the requester. There was no need to remember any STD code or extension number. Operators even assisted with telephone directory look-up.
THIS GOT CHANGED. There is no need to expand how calls are made – Speed dialing, voice dialing etc.!
REMOVING Ms. MARY and Mr. OPERATOR from the JOB has happened. CHANGE led to the extinction of job roles.
2.       1980-90              : As a student in Bombay, planning to travel to Hyderabad for the holidays required an early morning trip to Bombay-VT and standing in a queue for 4-6 hrs to get the train ticket booked. Often, to celebrate the success, it was followed by a nice lunch in Church-Gate and a movie!
THIS GOT CHANGED. Faster than completing this Blog post, a ticket can be booked from any origin to any destination in India, online!
QUANTUM CHANGE IN SOLUTION EFFECTIVENESS (1000 times) THROUGH PROPER TECHNOLOGY ADOPTION caused this.
3.       1990-95              : Trading on the Bombay Stock Exchange or Madras Stock Exchange needed a broker. Paper applications, share documents, etc. were the norm. No two share certificates carried the same spelling of the name! There was no need to worry about Income-Tax. Unless disclosed, it could never be found (many shares were in some Shriikant’s name, thanks to the Gujarati influence on BSE!)
THIS GOT CHANGED. Income Tax Department issued the Unique PAN Card. REGULATIONS were strictly enforced for PAN CARD to do transactions. DMAT accounts came in and NAME of HOLDER is no more someone’s fancy! Income Tax department can track all transactions and so, it is necessary to file the income returns from market transactions.
NEW GUIDELINES (LAWS) AND GOVERNANCE have done this. TECHNOLOGY assisted the LAW and its GOVERNANCE mechanism.
Any successful long-term Business Transformation and the consequent change will entail these 3 things.
i.                     Some tasks becoming redundant and roles removed <What is not needed, expunge>
ii.                   Designing and building new effective methods with QUANTUM improvements <Make all needed tasks many times efficient and easy>
a.       Ease of learning and use (adaptation)
b.      Expansion of scope (applicability)
c.       Addressing known, emerging and new aspects (assimilation - integration)
iii.                 Stipulating critical and essential regulatory controls and effective governance of them
If these aspects do not dominate your thought process on CHANGE, think about what else has been a driver of change in your experience and examine it against them.
With every CHANGE, a work practice, job position or service method has undergone RADICAL transformation. It is imprudent to seek an operator to connect a telephone call today!
In organizational IT, many practices evolved and developed into the standard way of working. CHANGES to computer technology, software architecture, usage and cost structure have severely undermined many of these practices. I will list some for your thought, and we can discuss them later.
i.                     Strict review and audit of proposals for workstations, storage and network
ii.                   Keeping a team of “programmers” to help professionals to do standard improvements
iii.                 Preparing a monumental “Requirement Document” and specifications for new information systems
iv.                 Return-on-Investment (Cost-Benefit) justification for Information System proposals
v.                   Swearing by packaged vanilla applications that cost millions (specifically in the Oil&Gas technical space)
vi.                 Creating and storing 100s of versions of the management presentation as it develops
vii.                Getting excited with every new software for an application
viii.              Trusting the mouse!

Monday, August 1, 2011

Changing World of Knowledge

Knowledge and Knowledge representation are undergoing radical changes with the use of computer technologies. The context is not MS-Office, Facebook or live data feeds. This article is about how the logic of human minds is being replicated with technologies that have significant practical use in all facets of science, including Oil&Gas Exploration and Production (the area of my business).

Chess Grandmaster

Deep Blue made history when it beat the reigning world chess champion Garry Kasparov in 1997. With that 3 ½ : 2 ½ win, the era of clear machine dominance in one facet of intelligence opened. Subsequently, in 1998, REBEL, running on a standard AMD processor, played Viswanathan Anand, who was then world no-2, and beat him. By 2009 a chess engine that runs on a mobile phone had acquired the ‘Grandmaster’ norm! (http://en.wikipedia.org/wiki/Human-computer_chess_matches)

Jeopardy!

A very popular quiz program (http://en.wikipedia.org/wiki/Jeopardy! ) was played by the IBM Watson computer, which convincingly beat the best players (http://en.wikipedia.org/wiki/Watson_Jeopardy ).  Unlike a Chess program, Jeopardy involves complex integration of billions of items of information into a dynamic structure appropriate to the situation at hand. The game is very demanding on search, assimilation and inference.
These, along with the inventions being made using the 4th Paradigm of Science in Biotechnology, point to a changing role for computers.

Role of Computers over 5 decades

1.       1960-2011+         : Electronic Data Processing (Science & Commerce)
2.       1980-2011+         : Databases
3.       1985-2011+         : Simulations and Advanced Graphics
4.       2000-2011+         : Communications and social networks
5.       2005-2011+         : Knowledge and Digital Inference (Also 4th Paradigm of Science)
Databases (Master data, Transaction data or Meta data) represent a critical foundation layer recognized in the 1980s. Designing, developing and operating databases continues to consume the largest share of resources. Correctness, Completeness and Capability of databases to meet the changing (and growing) applications have become essential. The advent of automated measurements (DCS, SCADA) is expected to radically alter the need for humans to ‘Cure’ data and build databases.
HPCs (High Performance Computers), 4GLs, and Functional Development Environments (e.g. R, SAS, Mathcad etc.) have grown to provide necessary algorithmic power at the hands of executors to innovate beyond vanilla software packages. Ability to use these tools has opened new avenues of interpretation in all branches of science and technology.
Numerical data is amenable to Simulation and Advanced Graphics. Yet, a large quantum of human information is embedded into unstructured data (MS-Office Files, Pictures, Audio, Videos, Location details, Contexts etc.).
This is where the new technologies of Knowledge representation, assimilation and inference come into play.

Knowledge in Information Systems

IBM describes Watson as "an application of advanced Natural Language Processing, Information Retrieval, Knowledge Representation and Reasoning, and Machine Learning technologies to the field of open domain question answering" which is "built on IBM's DeepQA technology for hypothesis generation, massive evidence gathering, analysis, and scoring."
Designed for Complex Analytics, Watson includes cluster computers, 16TB RAM and a complex network of switches and storage. The hardware is fairly straightforward. What makes Watson different is a) data organization and b) Inference Logic.
Some snippets from Wikipedia:
1.       Watson can process 500 gigabytes, the equivalent of a million books, per second. Processing is not the same as reading or loading 500 GB/s into memory.
2.       Watson's hardware costs about $3 million. GDPF used for Seismic Data Processing costs more.
3.       Watson uses SUSE Linux              
4.       Solutions are developed using
a.       Java
b.      C++
c.       Apache Hadoop (for distributed computing)
d.      Apache UIMA (Unstructured Information Management Architecture)
e.      IBM’s DeepQA software
5.       More than 100 techniques are used in analysis (includes, integration, inference and assessment)
6.       Watson also used databases, taxonomies, and ontologies. Each of these databases took over 15 years to develop and is based on a sophisticated semantic architecture.
a.       DBPedia,
b.      WordNet, and
c.       Yago
“We are using information as it exists and making the computer smarter in analyzing that content to compute answers.” Watson works on this principle, which is also the principle of the 4th Paradigm.
The new age of Knowledge is inherently more dependent on the analytical outcomes from vast data. Microsoft Research has been working on a variety of new methods of working with computers that facilitate the transformation into the new world of knowledge. Visualizing, learning and adapting to the new methods of working is going to become the biggest challenge for the CIOs and CTOs of science- and technology-intensive industries like Oil&Gas.

All models are WRONG, but some are useful. A radical outlook on the way science has been functioning and on its limitations is given by Anderson (1). There are no fundamental organizing principles in Biology, and this in turn allowed the field to become intensely data-driven (rather than theory-driven). The Earth Sciences that drive the Oil & Gas industry also lack a strong theoretical framework. Unlike Biology, there has been NO collective emphasis on data collection, organization or analysis in the Earth Sciences. Bio-Informatics has become a well-funded and highly researched collaborative science since the 1990s. There is significant learning in these advancements for the field of Earth Sciences.
These two subjects, Biology & Geology, point to the two differing growth patterns of the respective sciences and their impact on scientific discovery.

ARE YOU READY?
1.       READY to UNLEARN the EDP perspective of Computers and Information Systems
2.       READY to organize, structure and semantically define your data
3.       READY for Analysis and Inference
4.       READY to accept the new findings
THE FUTURE IS BEYOND WHAT THE PRESENT IS POINTING TO.  

The CIO is responsible for imparting an understanding of the new technologies and facilitating the development of skills to adapt to the new representation of Knowledge. In Oil&Gas, which is intensely knowledge-driven, the CIO identifies and leads new studies and projects that will give the business the necessary advantage and Intellectual Property.