
Full version:
Hardware and software of the brain
After this introduction, let's try to describe human software in detail. For a computer, there is a hierarchy of programming languages. The first was assembly language, or just Assembler, which consists of instructions directly supported by the processor hardware of the computer in question. The next level contains languages for system programming. The best known is C, which was conceived as a universal Assembler that would work on different machines. Further up are application languages for writing software for end users and particular application domains. Typically, the languages of the lower levels are used to create a programming tool for the next level upwards. That is, when you use a single operator of an application language, it employs several elementary commands and instructions of the lower levels. As a result, very complicated and clever programs are composed of a few very simple instructions. Hence, we need to look for the most elementary internal actions supported by the brain's hardware.
The most obvious are memory read/write operations. Humans can remember separate images and later recall them. In addition, they can remember associations between two images. Then, when the first is presented, heteroassociative memory returns the second.
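As a rough illustration, here is a minimal Python sketch of these two memory operations – storing separate images and heteroassociative recall of a pair. The structures and names are invented for illustration only, not a model of real neural memory.

images = set()                 # remembered separate images
associations = {}              # remembered pairs: first image -> second image

def remember(image):
    images.add(image)

def associate(first, second):
    associations[first] = second

def recall(first):
    # heteroassociative read: present the first image, get the second back
    return associations.get(first)

remember("lemon")
associate("lemon", "sour")
print(recall("lemon"))         # sour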
The next function is evaluation. The limbic system receives input from different sensory channels and generates a positive or negative drive depending on how the current situation compares to expectations. These drives work as start/stop instructions, which are important for an asynchronous architecture.
Now add various methods of data processing implemented in structured multilayered neural nets, and you get a computer with a decent set of instructions which can be used to develop sophisticated programs.
System software and applications
Newborn humans are virtually helpless. They cannot walk or even stand and are incapable of coordinated actions. Moreover, they do not know the elementary actions themselves. On the other hand, it is well known that if a human has not learned any language at an early age, he cannot do so as an adult. Seemingly, human learning is arranged similarly to the installation of computer software. An operating system comes first. Accordingly, you need to learn some basic skills first; then the rest is added on this foundation.
Still, the difference is substantial. A computer comes with a ready instruction set and electronic circuits for algorithmic processing. The creation of human system software begins with the creation of a discrete processor on the basis of the analog neurosubstrate. This firmware operates in accordance with the principles outlined in Formal neurocomputer.
The rest of the system software constitutes what is known as "culture". It includes such elements as the habits of washing your face and brushing your teeth, preferred clothes and dishes, the usual times you go to bed and wake up, etc. The most characteristic feature of a culture is its language. As in computers, system software divides into the part that supports individual behavior and the part for communication and cooperation (networking). Each nation has its unique traditions of interaction.
Finally, when a person has more or less absorbed these components, it is time for professional education. That is, installation of application software begins. Computers have many programming languages, each developed for a certain class of applications. In humans, natural language serves as a universal one. Nevertheless, each profession develops its own dialect with special terminology and even a specific style of thinking.
Complete theory of natural language
When we hear "theory" in a discussion of human language, the first thing that comes to mind is grammar: syntax, punctuation, spelling, etc. Yet this is only a superficial wrapping. There is also at least semantics, which is more important than purely formal grammar, but you won't find a comprehensive course on that discipline. On the other hand, when professional linguists discuss language usage, there is one problem: how to tell a correct construct from an incorrect one? For a student, it is simple – just open a textbook. But how does the professor who writes those textbooks decide? It turns out they can't suggest anything better than intuitive "well-formedness".
Human language is part of a live computational system. For a neurocomputer, the problem of "well-formedness" becomes quite practical. It may be resolved on the basis of computational efficiency, reliability, informational capacity, and so on. Let's try to outline the structure of natural language from this perspective. We will keep English in mind because it is a lingua franca and its fixed word order makes it easy to parse. Nevertheless, most of the reasoning should be applicable to other human languages as well.
What is natural language?
Human language is a communication channel for a live computing system. It is very different from common computers. On one hand, knowledge of this system as a whole is very useful for understanding how this language works. On the other, even if we don't know its inner details, we can still access its functions via the language interface. Existing computers are famous for being universal. The human brain is universal too, yet at the same time it is extremely specialized. In fact, it has just a single task – homeostasis. Existence is the ultimate goal of this automatic control system. Variability is needed to adapt to a wide range of living conditions. The whole complexity of human behavior may be logically derived from this root. The brain as a whole may be represented as two regulators – an internal and an external one. The first looks inward, the other outward, but they do the same thing: maintain reality in accord with some ideal image. If your blood contains too few nutrients, eat something. If your shoes have worn out and no longer keep out water, throw them away and buy new ones. To evaluate the imbalance, we need two images – an ideal one and a real one. Our sensory organs permanently update the second. We don't know the exact details of its internal representation, but we can communicate it via the language channel with any precision required.
Assignment
In computers, we have many different languages, each for a specific purpose. Humans, instead, use one language in all of these cases. You can even utter certain words that directly affect internal organs or the state of the brain itself – this is the role of Assembler, the language for hardware programming. You can code an algorithm as with Basic, formulate a question – that is, a query to a database – as with SQL, or use it for communication as with HTML.
Classical hierarchy of language structure
This is widely known as
Lexicon – Syntax – Semantics – Pragmatics.
The first level contains the various words. All of them are divided into several parts of speech – nouns, adjectives, adverbs, and so on. Syntax is a set of rules which define how these words group into noun phrases, verb phrases, and other constructs. These further group into clauses and sentences that represent elementary complete ideas. The flow of the text represents the flow of ideas in the human mind. Note that syntax rules are formulated in grammatical categories. They are meaningless, used only to link words with each other. Some meaning emerges on the next, semantic level, when the syntax structure is considered together with the concrete words within it. Nevertheless, this meaning is still incomplete; it is called the "direct" meaning. On the last, pragmatic level, context is added and we obtain the "indirect", derived meaning of the text. Context also plays a crucial role in disambiguation.
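To make the scheme concrete, here is a toy Python sketch of the four levels for the sentence 'the dog barked loudly'. The lexicon, rules, and context are all invented for illustration and make no claim to linguistic accuracy.

lexicon = {"the": "DET", "dog": "NOUN", "barked": "VERB", "loudly": "ADV"}

def tag(words):                       # lexical level: words to parts of speech
    return [(w, lexicon[w]) for w in words]

def group(tagged):                    # syntactic level: purely formal grouping
    return {"noun_phrase": [w for w, t in tagged if t in ("DET", "NOUN")],
            "verb_phrase": [w for w, t in tagged if t in ("VERB", "ADV")]}

def interpret(tree):                  # semantic level: direct meaning of the structure
    return {"actor": tree["noun_phrase"][-1], "action": tree["verb_phrase"][0]}

def contextualize(meaning, context):  # pragmatic level: context yields the derived meaning
    meaning["referent"] = context.get(meaning["actor"], meaning["actor"])
    return meaning

parsed = contextualize(interpret(group(tag("the dog barked loudly".split()))),
                       {"dog": "the neighbor's dog"})
print(parsed)   # {'actor': 'dog', 'action': 'barked', 'referent': "the neighbor's dog"}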
This scheme looks perfect, but is it really workable? Grammar was added late in human history and is mainly associated with written language. Like any theory, it only partially fits the reality of live human communication. A typical cycle of learning a foreign language is studying a grammar textbook, passing the exam, then forgetting it all and taking a practical course of business English. Why such a discrepancy?
The problem is that the human brain groups words not according to formal part-of-speech categories but according to their meaning, and it takes context into account simultaneously. In fact, there are no sequential levels in the live neurocomputer. Everything happens in a complicated computational system of many different neural nets operating simultaneously in parallel. Then, is syntax really useful? Maybe we should reduce the previous scheme to Lexicon – Pragmatics?
In the very distant past, language, and life itself, was simpler. In the minds of those people, phrases were translated directly into static or dynamic images. Probably word categories were used too, but they were meaningful: not the noun but the object, not the verb but the action. With the development of civilization, new features were added. The gerund is a noun-type word derived from a verb. Why not? Can we take a movie, pick one frame from it, and consider that frame a static object? Yes, of course. The next example is abstract concepts. Take justice. Is it an object or an action? It looks like parts of speech and abstract grammar are necessary after all.
Language imperfection
Human language is a product of evolution. Nobody developed it intentionally. Various features were added by different people in different epochs. Some elements used in mathematics can easily be found in it, but look closer and you will discover that their development was simply never completed. Moreover, it runs on a live computational system which was created on the same principle. The main goal of this language is not precision and efficiency but workability in a highly varied, often harsh environment. It is easy and convenient for simple tasks and everyday use by millions of people. If you face complications and need extreme reliability, it is better to use more formalized tools.
The current state of human languages is one of overcomplication. Too many features have been piled together. Let's consider an example. The normal attribute of a noun is an adjective. What if we want to use another noun for this purpose? The basic way is 'leg of a chair'. For simplicity, English allows 'chair leg', but how do we reconcile it with the syntax rules? In Russian, there is a simple way to derive an adjective from a noun, and the two are spelled differently. In English, two options are used: either the syntax allows a noun as an attribute to another noun, or we make a double entry in the dictionary, one as a noun and another as an adjective. Both create problems for a computer parser. Even more problems arise on the semantic level, because all the semantic rules have to be doubled as well.
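A toy Python sketch of these two options, with an invented lexicon, shows where the duplication lands: either the syntax rules double (a noun may modify a noun), or the dictionary and the semantic rules tied to it double.

from itertools import product

# Option 1: keep 'chair' a noun only and add a syntax rule "a noun may modify a noun".
lexicon_single = {"chair": {"NOUN"}, "leg": {"NOUN"}}

# Option 2: duplicate the dictionary entry so that 'chair' is also an adjective.
lexicon_double = {"chair": {"NOUN", "ADJ"}, "leg": {"NOUN"}}

def readings(words, lexicon):
    # every POS assignment the lexicon allows; the parser must try them all
    return list(product(*(sorted(lexicon[w]) for w in words)))

print(readings(["chair", "leg"], lexicon_single))  # [('NOUN', 'NOUN')]
print(readings(["chair", "leg"], lexicon_double))  # [('ADJ', 'NOUN'), ('NOUN', 'NOUN')]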
Semantics of natural language
Basic semantic categories depend both on the structure of the real world and on the workings of our perception. They represent the features we extract from nature.
Language describes different types of reality. These may be external events in the environment, the speaker's own actions, or the same actions performed by another person.
When we learn a language, be it our native one at school or a foreign one, the focus is usually on grammar. Accordingly, the success of education is evaluated by the number of grammatical errors. Meanwhile, this is not the main goal of communication. If you miss a comma in a sentence but the reader understands it correctly, it does not matter. It is much worse if the sentence is grammatically correct but meaningless. I would prefer a language where I can freely choose options to express my ideas better, rather than constantly fear making an error.
Let's look at how natural language represents meaning. It generates 2D images in the neocortex. The first sentence creates an image; the next ones add details. To group words inside the sentence, grammar is used. A part of speech carries some generalized semantic load but is mainly needed in syntax rules.
The POS of each word is defined separately, by enumeration; it does not follow from any rule. Instead, the POS itself defines how the word is used in syntax.
It turns out that living humans use two fundamentally different systems of language processing: an intuitive one, acquired with the native spoken language, and a grammatical one, learned together with writing or with a foreign language.
Formal semantics
The semantics of human language can be formalized, as has already been done with the lexicon and syntax. What elements of meaning can be singled out in a text? Words have their own meaning, which directly links language to the real world. We will concentrate on the next level of semantics – the meaning of the syntax, that is, the meaning which emerges when words interact with each other according to the linguistic principle of compositionality. The sentence (or, in complex sentences, the clause) is the smallest complete structure of language. It is enough to represent an idea. How sentences are grouped into a text is a separate question. Let's discuss the meaning of the single sentence now.
Actions and items
Three different types of sentences exist: affirmative, interrogative, and imperative (orders). The last two are variations of the first, so let's consider the semantics of the affirmative sentence. When a person conceives it, it is transformed into some internal image. The image may have few details, as in 'A large air balloon hung in the sky.', or may be rich in various parts. In the latter case the parts are designated by the various phrases of the sentence. The structure usually forms a hierarchy where large-scale parts contain further details. At the top level, the sentence is divided into the subject phrase and the predicate phrase. Which is the main part of the sentence? Probably the predicate. In this case, the whole sentence denotes some action. Static sentences such as 'An apricot is a fruit.' are not an exception; rather, they are a particular case of inaction, when nothing changes. If the text is a list of actions, then the whole of it is the answer to the question "What happens?" – quite a reasonable approach to the world, and especially to life with its dynamism.
Other parts of the sentence play certain roles in this action. The subject is the actor, the direct object is what the action is applied to, while the prepositional object is an instrument or some other supplementary part. The roles may vary. If the action has no actor, the subject may designate the focus of attention. Note that the term 'object' is used differently in linguistics and in programming: the former is a purely formal element of the syntax, while the latter is meaningful and may be a very complicated construct. Objects in programming may represent both actions and items.
Now we need some lexical semantics. Of course, any word has its own meaning, but words fall into several large groups. Verbs usually designate actions, nouns – items. Other words are used to build complex constructs. Adjectives denote properties of items. If you add an adjective to a noun, you create a noun phrase and can add color, dimensions, smell, even surface texture to an object. Similarly, adverbs modify actions. The simplest verb phrase is verb + adverb. In addition, it may include other elements. As mentioned above, the subject denotes the main participant of the action. There is also the indirect object in the form of a prepositional phrase (He came with a new book.). The indirect object without a preposition (I gave him a new article.) is an ellipsis in which the preposition is dropped. An equivalent prepositional variant exists (I gave a new article to him.), so such reduced constructs may be considered derivative, auxiliary. The preposition itself designates a relation. This is especially obvious for spatial prepositions: 'upon' and 'under' designate a direction (vertical as opposed to horizontal in this case) and also determine what is on top.
The overall semantics of the sentence may be denoted as
predicate(subject_phrase, direct_object, adverb, prepositional_object)
This looks like a function in the C programming language or in mathematics, right? Sentences of natural language are a powerful tool for describing the variability of the analog world in discrete words. They are a subset of all the structures possible in mathematics. Why? Because they represent the internal machinery of human perception. If you write a program in a human-like language, the computer will think like a human being.
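As a rough Python sketch of this notation – the field names and the example sentence are invented for illustration – a sentence can be packed into such a function call directly:

def predicate(verb, subject_phrase=None, direct_object=None, adverb=None,
              prepositional_object=None):
    # bundle a sentence into the action-centered structure described above
    return {"action": verb, "actor": subject_phrase, "target": direct_object,
            "manner": adverb, "supplement": prepositional_object}

# 'He quickly gave a new article to her.'
sentence = predicate("gave",
                     subject_phrase="he",
                     direct_object=("article", {"new"}),   # noun plus adjective attribute
                     adverb="quickly",
                     prepositional_object=("to", "her"))   # the preposition encodes the relation
print(sentence["action"], sentence["actor"])               # gave he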
These are only the basics. They were enough for people millennia ago, but language then evolved and became more and more complicated. This evolution is contradictory: on one hand it made it possible to describe more situations, but on the other hand more and more troubles emerged. As new elements were added, or old ones found additional uses, they affected the previously clean composition. Nobody supervised these "amendments". Those who introduced new elements did not even think about what they were doing, so now we have literally a pile of features that are often not coordinated with each other. If you try to implement them mechanically from a list, the program simply will not work. The main problem for those who want to work with a closer approximation of human language is not only to implement more features separately, but also to ensure that they work in various combinations. Let's try to list these features one by one.
The adverb may modify the adjective: 'very big', 'brightly green'. How is that possible if we defined the adverb as an attribute of the verb? In principle, the receptors of the human eye encode both color and brightness, so the correct expression would be 'bright and green'; 'brightly green' may be a shortening of 'a green leaf shines brightly in the sun'.
An action may turn into an item. The verb even has more than one form for this: 'To define the concept is the first stage.' 'Defining the concept is the first stage.' Here the infinitive and the gerund are used. This is easily explained. The action is represented in the brain as a dynamic image, a movie. Take a single frame from this movie and you get a static item which may represent the action.
An action can also become a property of a noun in place of an adjective: 'running man'. 'Running' is a participle here, and we already see an ambiguity with the gerund at the level of word forms which could easily have been avoided. What were the creators of the language thinking? The explanation is in the next paragraph.
A noun may be a property of another noun: 'animal paw'. This also has a more explicit variant, 'animal's paw', and may be expressed by the basic construct 'paw of the animal'. There is yet another option in English: many words can be nouns and adjectives simultaneously, so this feature may be implemented both on the lexical and the syntactic level.
Auxiliary verbs. 'Be' and 'have' are used in verb tenses and compound predicates, but they retain their usual meaning there. In 'Maple is a tree.', 'is a tree' may be interpreted either as a purely formal construct representing the predicate as a whole or as a normal predicate phrase; in the latter case 'tree' becomes a direct object. This raises deeper semantic issues but is very convenient programmatically. Nevertheless, it can also cause problems on the syntax level. In 'The lemon is yellow.', the direct object cannot consist of an attribute alone – a full-scale noun phrase is required. A solution may be to treat this as an ellipsis of 'The lemon is a yellow fruit.'
Different types of actions. In some cases, an action is directed at some object; in others, such an object is absent. Accordingly, language distinguishes transitive and intransitive verbs. Reflexive verbs denote a situation where an action performed by a subject is directed at that subject itself. In English, this is encoded by means of a reflexive pronoun (he washed himself) or is not marked at all (he looks nice).
In complex sentences, the simple sentence becomes a clause. It still represents an action but may play different roles. In 'That I have a vacation is convenient now.', the subordinate clause stands in place of the subject. In 'I decided that I must take a vacation.', a similar clause is already the direct object. Indeed, a subordinate clause may serve as any part of the sentence. There is no problem if an action becomes an item and an item becomes an attribute.
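Reusing the invented predicate(...) structure from the sketch above, a clause in an argument position is simply a nested structure of the same kind – a hypothetical illustration, not a parsing algorithm:

def predicate(verb, subject_phrase=None, direct_object=None):
    return {"action": verb, "actor": subject_phrase, "target": direct_object}

# 'I decided that I must take a vacation.'
inner = predicate("take", subject_phrase="I", direct_object="a vacation")
outer = predicate("decided", subject_phrase="I", direct_object=inner)  # clause as direct object
print(outer["target"]["action"])   # take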
Compound sentences like 'People stood on the shore, and the ship moved in front of them.' are short lists. A more extended variant is the paragraph of a text.
Affirmative sentences are the basic units of a description; they represent knowledge. Questions and orders are related to using this knowledge. The former are queries for information extraction. The latter are used primarily in communication to prompt some action rather than to transmit data.
Relations
There are two main types of relations: cause-consequence and general-particular (or abstract-concrete). The first type links actions. It may be expressed by complex sentences in both directions – with subordinate clauses of cause or of goal. Conditional sentences are semantically close: the condition is not quite the cause, but may be considered a part of it. Causes and reasons may also be expressed by adverbial modifiers of a simple sentence. As these are made of noun-type words, some transformation is required. To represent the goal, the infinitive may be used – an example is the previous sentence itself. Reasons are often expressed by nouns, which may be interpreted as a reduced form: 'He turned his vehicle because of a man on the road.' vs. 'He turned his vehicle because a man stood on the road.' The first is a simple sentence, while the second is a complex sentence with a full subordinate clause of cause.
The concrete-abstract, or is-a, relation is used to represent a conceptual hierarchy. The human neocortex has at least three levels of such a hierarchy – the photographic image, abstractions within a given modality (vision, hearing, etc.), and complex multimodal images. Hence this type of semantics may be processed directly at the hardware level.
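A toy Python sketch, with invented structures, of how these two relation types might be stored after parsing – a cause-consequence link between two actions and an is-a chain between concepts:

causal_link = {
    "relation": "cause",
    "cause":  {"action": "stood", "actor": "a man", "place": "on the road"},
    "effect": {"action": "turned", "actor": "he", "target": "his vehicle"},
}
print(causal_link["effect"]["action"])   # turned

is_a = {"apricot": "fruit", "maple": "tree", "fruit": "plant part"}

def ancestors(concept, hierarchy):
    # walk the particular-to-general chain upward from a concrete concept
    chain = []
    while concept in hierarchy:
        concept = hierarchy[concept]
        chain.append(concept)
    return chain

print(ancestors("apricot", is_a))        # ['fruit', 'plant part']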
Spatial and temporal relations are the next most common. They are encoded by various prepositions: on, upon, by, under, before, after, etc. In this case, the prepositional phrase is called an adverbial modifier.
The text
Now that we have determined the semantics of the single sentence, let's try to understand how the meaning of a text is composed. In linguistics this is called coherence. It is closely related to the work of human memory and the upper levels of perception. What happens when you try to understand the architecture of some building? You walk around and look at it from different sides. Human memory is able to interpolate: if you need a picture from some point you didn't visit, you can easily imagine it. The result of such exploration is a set of images taken from a number of optimal points. The text reproduces this structure, only each image is replaced by a single sentence. The same principle works for processes. In this case, each reference point corresponds to a point on the time scale, so the set should be ordered. An intermediate state of the process may easily be interpolated from two neighboring points.
Nowadays, human texts have a more complicated structure. Sentences are grouped into a hierarchy of blocks on the principle of paragraph – chapter – book – library. Inside each block, elements of the lower level are listed as described above.
In addition, sentences are not just unrelated elements of a list; they are interlinked. The mechanism used resembles variables in programming languages, only the implementation is very rudimentary. The closest analogue of the variable is the pronoun. Their implementation in human language is so poor that binding pronouns to their values has become one of the major problems of natural language understanding.
Another method resembles the class-object pair. In programming languages, you declare a variable of some class and then assign an instance of that class (an object) to it. Human language doesn't like formalities: you can immediately use the name of a class as a variable with an already bound value. When you read 'apple', it may be either an abstract concept denoting the whole class or a concrete material fruit. Definite and indefinite articles may be used to distinguish between them. Unfortunately, the number of possible variants is so large that any programmatic implementation has to list them explicitly.
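A toy Python sketch, with invented rules covering only the simplest case, of how an article might bind a class name to a concrete instance and how a pronoun might then be resolved to that instance:

class Apple:
    pass                            # the abstract concept: the whole class of apples

instances = []                      # discourse memory: concrete objects mentioned so far
last_mentioned = None               # simplistic antecedent for pronoun binding

def mention(article, noun_class):
    global last_mentioned
    if article == "a":              # indefinite article: introduce a new instance
        obj = noun_class()
        instances.append(obj)
    else:                           # definite article: refer to an already known instance
        obj = instances[-1]
    last_mentioned = obj
    return obj

first = mention("a", Apple)         # 'I took an apple...'
second = mention("the", Apple)      # '...the apple was ripe.'
it = last_mentioned                 # '...it was ripe.' - pronoun bound to its antecedent
print(first is second, it is second)   # True True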



