Monday, August 1, 2011

Changing World of Knowledge

Knowledge and knowledge representation are undergoing radical change through the use of computer technologies. The context here is not MS-Office, Facebook or live data feeds. This article is about how the logic of the human mind is being replicated by technologies that have significant practical use in all facets of science, including Oil & Gas Exploration and Production (my area of business).

Chess Grandmaster

Deep Blue made history when it beat reigning world chess champion Garry Kasparov in 1997. Its 3½ : 2½ victory opened an era of clear machine dominance in one facet of intelligence. Subsequently, in 1998, REBEL, running on a standard AMD processor, played Viswanathan Anand, who was then world no. 2, and beat him. By 2009 a chess engine running on a mobile phone had achieved a 'Grandmaster' norm! ( http://en.wikipedia.org/wiki/Human-computer_chess_matches)

Jeopardy!

IBM's Watson computer played Jeopardy!, a very popular quiz program (http://en.wikipedia.org/wiki/Jeopardy! ), and convincingly beat the best players (http://en.wikipedia.org/wiki/Watson_Jeopardy ). Unlike chess, Jeopardy! involves the complex integration of billions of items of information into a dynamic structure appropriate to the situation at hand. The game is very demanding on search, assimilation and inference.
These achievements, along with the inventions being made using the 4th Paradigm of Science in biotechnology, point to a changing role for computers.

Role of Computers over 5 decades

1.       1960-2011+         : Electronic Data Processing (Science & Commerce)
2.       1980-2011+         : Databases
3.       1985-2011+         : Simulations and Advanced Graphics
4.       2000-2011+         : Communications and social networks
5.       2005-2011+         : Knowledge and Digital Inference (Also 4th Paradigm of Science)
Databases (master data, transaction data or metadata) represent a critical foundation layer that was recognized in the 1980s. Designing, developing and operating databases continues to consume the largest share of resources. The correctness, completeness and capability of databases to meet changing (and growing) applications have become essential. The advent of automated measurements (DCS, SCADA) is expected to radically alter the need for humans to 'curate' data and build databases.
HPCs (High Performance Computers), 4GLs, and Functional Development Environments (e.g. R, SAS, Mathcad etc.) have grown to put the necessary algorithmic power in the hands of practitioners, letting them innovate beyond vanilla software packages. The ability to use these tools has opened new avenues of interpretation in all branches of science and technology.
Numerical data is amenable to simulation and advanced graphics. Yet a large quantum of human information is embedded in unstructured data (MS-Office files, pictures, audio, videos, location details, contexts etc.).
This is where the new technologies of knowledge representation, assimilation and inference come into play.

Knowledge in Information Systems

IBM describes Watson as "an application of advanced Natural Language Processing, Information Retrieval, Knowledge Representation and Reasoning, and Machine Learning technologies to the field of open domain question answering" which is "built on IBM's DeepQA technology for hypothesis generation, massive evidence gathering, analysis, and scoring."[2]
Designed for complex analytics, Watson includes cluster computers, 16 TB of RAM and a complex network of switches and storage. The hardware is fairly straightforward. What makes Watson different is a) its data organization and b) its inference logic.
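The four stages IBM names (hypothesis generation, evidence gathering, analysis and scoring) can be illustrated with a toy sketch. This is not DeepQA or anything close to it: the corpus, the entity list and the word-overlap scoring below are all assumptions invented for illustration, chosen only to show how the stages fit together.

```python
from dataclasses import dataclass

# Hypothetical mini-corpus standing in for Watson's sources (illustrative only).
CORPUS = [
    "Deep Blue beat Garry Kasparov at chess in 1997",
    "Watson is a question answering computer built by IBM",
    "Deep Blue was a chess computer built by IBM",
]
# Hypothetical list of known entities that may serve as candidate answers.
ENTITIES = ["Deep Blue", "Watson", "Garry Kasparov", "IBM"]
STOPWORDS = {"a", "an", "the", "is", "was", "at", "in", "by", "which", "what", "who"}


@dataclass
class Candidate:
    answer: str
    score: int


def tokens(text):
    """Lowercase the text, strip question marks and drop stopwords."""
    return {w for w in text.lower().replace("?", "").split() if w not in STOPWORDS}


def generate_hypotheses(question):
    """Hypothesis generation: every entity not already named in the question."""
    q = tokens(question)
    return [e for e in ENTITIES if not (tokens(e) & q)]


def gather_evidence(candidate):
    """Evidence gathering: every corpus sentence that mentions the candidate."""
    return [s for s in CORPUS if candidate.lower() in s.lower()]


def score(question, candidate):
    """Analysis and scoring: count question words found in the evidence,
    ignoring the candidate's own name."""
    q = tokens(question)
    name = tokens(candidate)
    return sum(len((tokens(s) - name) & q) for s in gather_evidence(candidate))


def answer(question):
    """Rank all candidate answers by score, best first."""
    ranked = [Candidate(c, score(question, c)) for c in generate_hypotheses(question)]
    return sorted(ranked, key=lambda c: c.score, reverse=True)


if __name__ == "__main__":
    for c in answer("Which chess computer beat Kasparov in 1997?"):
        print(c.answer, c.score)
```

The point of the sketch is the shape of the computation: many candidates are proposed cheaply, and the ranking comes from weighing evidence rather than from a single lookup.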
Some facts from Wikipedia:
1.       Watson can process 500 gigabytes, the equivalent of a million books, per second. Processing here does not mean reading or loading 500 GB/s into memory.
2.       Watson's hardware cost about $3 million. The GDPF used for seismic data processing costs more.
3.       Watson uses SUSE Linux
4.       Solutions are developed using
a.       Java
b.      C++
c.       Apache Hadoop (for distributed computing)
d.      Apache UIMA (Unstructured Information Management Architecture)
e.      IBM’s DeepQA software
5.       More than 100 techniques are used in analysis (including integration, inference and assessment)
6.       Watson also used databases, taxonomies, and ontologies. Each of these databases took over 15 years to develop and is based on a sophisticated semantic architecture.
a.       DBpedia,
b.      WordNet, and
c.       YAGO
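To give a feel for what a taxonomy or ontology adds over a plain database, here is a minimal sketch of inference over "is_a" relations. The triples are made up for illustration and are not taken from DBpedia, WordNet or YAGO; the function names are likewise hypothetical.

```python
# Hypothetical subject-predicate-object triples (not real DBpedia/WordNet/YAGO data).
TRIPLES = [
    ("sandstone", "is_a", "sedimentary rock"),
    ("limestone", "is_a", "sedimentary rock"),
    ("sedimentary rock", "is_a", "rock"),
    ("granite", "is_a", "igneous rock"),
    ("igneous rock", "is_a", "rock"),
]


def parents(term):
    """Direct parents of a term, as stated explicitly in the triples."""
    return {o for s, p, o in TRIPLES if s == term and p == "is_a"}


def ancestors(term):
    """All categories a term belongs to, following is_a links transitively."""
    found = set()
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for parent in parents(current):
            if parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found


def is_a(term, category):
    """True if the category can be inferred for the term, even if never stated directly."""
    return category in ancestors(term)


if __name__ == "__main__":
    print(is_a("sandstone", "rock"))
    print(is_a("granite", "sedimentary rock"))
```

No triple says that sandstone is a rock, yet the program can conclude it. That step from stored facts to derived facts is what separates a semantically defined data set from a table of records.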
“We are using information as it exists and making the computer smarter in analyzing that content to compute answers.” Watson works on this principle, which is also the principle of the 4th Paradigm.
The new age of knowledge is inherently more dependent on the analytical outcomes drawn from vast data. Microsoft Research has been working on a variety of new methods of working with computers that facilitate the transformation into the new world of knowledge. Visualizing, learning and adapting to these new methods of working is going to become the biggest challenge for the CIOs and CTOs of science- and technology-intensive industries like Oil & Gas.

All models are WRONG, but some are useful. A radical outlook on the way science has been functioning, and on its limitations, is given by Anderson (1). There are no fundamental organizing principles in Biology, and this in turn facilitated the field becoming intensely data-driven (rather than theory-driven). The Earth Sciences that drive the Oil & Gas industry also lack a strong theoretical framework. Unlike in Biology, there has been NO collective emphasis on data collection, organization or analysis in the Earth Sciences. Bio-Informatics has been a well funded and highly researched collaborative science since the 1990s. There is much that the Earth Sciences can learn from these advancements.
These two subjects, Biology and Geology, illustrate two differing growth patterns of the respective sciences and their impact on scientific discovery.

ARE YOU READY?
1.       READY to UNLEARN the EDP perspective of Computers and Information Systems
2.       READY to organize, structure and semantically define your data
3.       READY for Analysis and Inference
4.       READY to accept the new findings
THE FUTURE IS BEYOND WHAT THE PRESENT IS POINTING TO.  

The CIO is responsible for imparting an understanding of the new technologies and facilitating the development of skills to adapt to the new representation of knowledge. In Oil & Gas, which is intensely knowledge driven, the CIO identifies and leads new studies and projects that will give the business the necessary advantage and intellectual property.
