
Recognizing categories, and recognizing particulars as forming a category

This is, of course, a deep unsolved problem in philosophy.

However, it seems to be soluble as a computer algorithm. Programs that do this ought to look conscious.

There are a lot of programs solving things that I thought were AI hard, for example recognizing pornography, recognizing faces in images, and predicting what music, books, or movies a particular customer might like.

We have clustering algorithms that work on points in spaces of reasonably small dimension. However, our instances are sparse vectors in a space of unenumerably large dimension.

Consider, for example, the problem of grouping like documents with like, for spam filtering. Suppose the properties of a document are all substrings of the document of twenty words or less and two hundred characters or less. In that case, there are as many dimensions as there are possible two hundred character strings.

1 Dimensional reduction

The combinatorial explosion occurs because we have taken the wrong approach to reducing problems that originate in the physical world of very large dimension, large because each quality of the objects involved or potentially involved is a dimension.

The cool magic trick that makes this manageable is dimensional reduction. Johnson and Lindenstrauss showed in 1984 that if one has O(2^n) points in a space of very large dimension, a random projection onto a space of dimension O(n) does not much affect distances and angles.

Achlioptas found that this is true for the not very random mapping wherein the elements of the matrix mapping the large space to the smaller space are +1 with probability \frac{1}{6}, 0 with probability \frac{4}{6}, and -1 with probability \frac{1}{6}, though a sparse matrix is apt to distort a sparse vector.
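A minimal sketch of such a projection in Python with numpy, assuming the standard \sqrt{3/k} scaling for the Achlioptas matrix; the sizes are arbitrary and only for illustration:

```python
import numpy as np

def achlioptas_projection(X, k, seed=0):
    """Project the rows of X (n points in d dimensions) down to k dimensions
    using the sparse Achlioptas matrix: entries are +1 with probability 1/6,
    0 with probability 2/3, and -1 with probability 1/6."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    R = rng.choice([1.0, 0.0, -1.0], size=(d, k), p=[1 / 6, 2 / 3, 1 / 6])
    # The sqrt(3/k) scaling keeps squared distances unchanged in expectation.
    return X @ (R * np.sqrt(3.0 / k))

# Illustrative sizes: 100 points in 10,000 dimensions reduced to 256.
X = np.random.default_rng(1).normal(size=(100, 10_000))
Y = achlioptas_projection(X, k=256)
```

Pairwise distances between the rows of Y should match those between the rows of X to within roughly a factor of 1 ± ε.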

There exists a set of m points that needs dimension O\left(\frac{\ln(m)}{ε^2}\right) in order to preserve the distances between all pairs of points within a factor of 1±ε.

This is apt to be a lot. We might well have ten million points, and wish to preserve distances within twenty five percent, in which case we need two hundred and fifty six dimensions. So a dimensionally reduced point is not necessarily reduced by a whole lot.
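Checking the arithmetic against the bound above, with m = 10^7 points and ε = 0.25:

\frac{\ln(10^7)}{0.25^2} = \frac{7\ln 10}{0.0625} \approx \frac{16.1}{0.0625} \approx 258

which rounds to the two hundred and fifty six dimensions quoted.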

Two hundred and fifty six dimensions is still an impractically large dimensionality, albeit an improvement on what was likely a near infinite number of dimensions.

To reduce it to something actually useful, we need principal component analysis, or, if non linear methods are required, as they likely will be, neural network methods.

For spaces of dimension higher than fifteen or so, clever methods of nearest neighbour search generally fail, and people generally wind up with brute force search, comparing each point to each of the others, and then aggregating into groups by making near neighbours into groups, and near groups into supergroups.
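In a reduced space of a few hundred dimensions that brute force comparison is straightforward; a minimal numpy sketch, illustrative only:

```python
import numpy as np

def nearest_neighbours(X):
    """Brute force: squared distances between every pair of reduced points;
    returns, for each point, the index of its nearest other point."""
    sq = np.sum(X * X, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)
    return np.argmin(d2, axis=1)
```

Grouping then proceeds by merging mutual near neighbours, and merging near groups into supergroups.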

Wikipedia reports two open source locality sensitive hashing methods, one of them used for exactly this problem, finding groups in emails.

The problem of finding near neighbours in social space, mapping the Kademlia algorithm to social space, is similar but a little different, since every vertex is already more likely to have connections to its neighbours. For an arbitrary vertex, whose connections we do not know, we want to find a vertex among those we do know that is more likely to have a connection to it, or to someone who has a connection to it, and that is likely to be nearer in terms of number of vertex traversals.

In which case everyone reports a value that reflects his neighbours, their neighbours, and their neighbours' neighbours: a neighbourhood smell that grows more similar as we approach a vertex likely to be the nearest neighbour of our target. The problem is to construct this smell as a moderately sized blob of data that can be widely shared, so that each vertex has a unique smell of, say, 256 or 512 bits, reflecting who it stably has the information to quickly connect with. You then look at those you have a connection with to find who is likely to have a connection to your target, and he looks up those he is connected with to find someone still more likely to have a connection.
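One possible construction, and it is only a guess at what such a smell might look like: each vertex derives a ±1 vector from its own identity, blends in its neighbours' current fingerprints with a decaying weight, and publishes the resulting sign pattern as its 256 bit smell. The function names and the decay factor below are hypothetical.

```python
import hashlib
import numpy as np

BITS = 256  # size of the published fingerprint

def base_vector(vertex_id: str) -> np.ndarray:
    """Deterministic +/-1 vector derived from the vertex's own identity."""
    digest = hashlib.sha512(vertex_id.encode()).digest()
    bits = np.unpackbits(np.frombuffer(digest, dtype=np.uint8))[:BITS]
    return bits.astype(np.float64) * 2.0 - 1.0

def neighbourhood_smell(vertex_id, neighbour_smells, decay=0.5):
    """Blend this vertex's own vector with its neighbours' fingerprints,
    so the published sign pattern reflects the surrounding region of the graph."""
    acc = base_vector(vertex_id)
    for s in neighbour_smells:          # each s is a 256 element array of 0/1 bits
        acc += decay * (s.astype(np.float64) * 2.0 - 1.0)
    return (acc > 0).astype(np.uint8)   # the 256 bit smell, packable for gossip

def similarity(a, b) -> float:
    """Fraction of bits in common; nearer neighbourhoods should score higher."""
    return float(np.mean(a == b))
```

Repeating the blending step over a few rounds of gossip lets each smell absorb the neighbours' neighbours, which is the property wanted above.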

Locality sensitive hashing, LSH, including the open source email algorithm, Nilsimsa Hash, attempts to distribute all points that are near each other into the same bin.

So in a space of unenumerably large dimension, such as the set of substrings of an email, or perhaps substrings of bounded length with bounds at spaces, carriage returns, and punctuation, we deterministically hash each substring, and use the hash to deterministically assign a mapping between the vector corresponding to this substring, and a vector in the reduced space.
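A minimal sketch of that deterministic mapping, the usual hashing trick, assuming word bounded substrings of up to twenty words and a reduced dimension of 1024, both arbitrary choices:

```python
import hashlib
import numpy as np

K = 1024  # dimension of the reduced space, an arbitrary choice

def reduce_document(text: str, k: int = K) -> np.ndarray:
    """Hash every word-bounded substring of up to twenty words (and at most
    two hundred characters) to a coordinate and a sign in the reduced space;
    the unenumerably many original dimensions are never enumerated."""
    words = text.split()
    v = np.zeros(k)
    for n in range(1, 21):
        for i in range(len(words) - n + 1):
            gram = " ".join(words[i:i + n])
            if len(gram) > 200:
                continue
            h = int.from_bytes(hashlib.blake2b(gram.encode(), digest_size=8).digest(), "little")
            sign = 1.0 if (h >> 63) & 1 else -1.0   # a sign bit keeps the mapping unbiased
            v[h % k] += sign
    return v
```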

The optimal instance recognition algorithm, for normally distributed attributes and for already existent, already known categories, is Mahalanobis distance.

Is not the spam probability of an email just T\cdot(S-G), where T is the vector of the email, and S and G are the average vectors of spam and good email?
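As a sketch, with T, S and G as just described, S being the spam mean and G the good mean:

```python
import numpy as np

def spam_score(t, spam_vectors, good_vectors):
    """Dot the email's reduced vector against the difference of the mean
    spam and mean good vectors; a positive score leans towards spam."""
    s = np.mean(spam_vectors, axis=0)
    g = np.mean(good_vectors, axis=0)
    return float(t @ (s - g))
```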

Variance works instead of probability – that is Mahalanobis distance – but this is most reasonable for things of reasonable dimension, like attributing skulls to races, while dimensional reduction is most useful in spaces of unenumerably large dimension, where distributions are necessarily non normal.

But variance is, approximately, the log of probability, so Mahalanobis is more or less Bayes filtering, or at least one can be derived in terms of the other.
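The sense in which this holds: for a multivariate normal with mean \mu and covariance \Sigma,

-\ln p(x) = \frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu) + \frac{1}{2}\ln\big((2\pi)^k\,\lvert\Sigma\rvert\big)

The first term is half the squared Mahalanobis distance and the second is a constant of the distribution, so comparing an email's Mahalanobis distances to the spam and good distributions differs from comparing their log likelihoods, which is what a Bayes filter does, only by constants.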

So we can reasonably reduce each email into a twenty questions space, albeit in practice a great deal more than twenty. Finding far from random dimensions that reduce it to a mere twenty or so is an artificial intelligence hard problem. If the dimensions are random, we need O(20\log(n)) dimensions, where n is the number of things. And n is apt to be very large.

Finding interesting and relevant dimensions, and ignoring irrelevant and uninteresting dimensions, is the big problem. It is the tie between categorizing the world into natural kinds and seeing what matters in the perceptual data while ignoring what is trivial and irrelevant. This requires non trivial, non local, and non linear combinations of data, for example adjusting the perceived colour of the apple for shadow and light colour, to see the apple, rather than merely the light scattered by the apple into the eye.

We then, in the reduced space, find natural groupings, a natural grouping being an elliptic region in high dimensional space where the density is anomalously large, or rather a normal distribution in high dimensional space such that assigning a particular email to a particular normal distribution dramatically reduces the entropy.
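In practice that is a Gaussian mixture fitted in the reduced space. A sketch using scikit-learn, with BIC standing in for the entropy reduction criterion and the file name purely a placeholder:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Dimensionally reduced emails, one row per email (placeholder file name).
X = np.load("reduced_emails.npy")

# Each mixture component is an elliptic region of anomalously high density.
# Pick the number of components by BIC, a proxy for the entropy reduction
# gained by assigning each email to a particular normal distribution.
models = [GaussianMixture(n_components=k, covariance_type="full", random_state=0).fit(X)
          for k in range(2, 30)]
best = min(models, key=lambda m: m.bic(X))

groups = best.predict(X)   # which natural grouping each email falls into
```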

We label each such natural grouping with the statistically improbable phrase that best distinguishes members of the grouping from all other such groupings.

The end user can then issue rules that mails belonging to certain groupings be given particular attention – or lack of attention, such as being sent direct to spam.

The success of face recognition and the like suggests that this might be just a problem of clever algorithms. Pile enough successful intelligence-like algorithms together, integrate them well, and perhaps we will have sentience. Analogously with autonomous cars: they had no new algorithms, they just made the old algorithms actually do something useful.

2 Robot movement

Finding movement paths is full of singularities. It looks to me that we force the problem down to two and a half dimensions, reduce the obstacles to stick figures, and then find a path to the destination. Hence the mental limit on complex knot problems.

Equivalently, we want to reduce the problem space to a collection of regions in which pathfinding algorithms that assume continuity work, and then construct a graph of such regions, where each node corresponds to a convex region within which continuity works, and each edge corresponds to an overlap between two such convex regions. Since the space is enormous, drastic reduction is needed.
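A minimal sketch of pathfinding over such a region graph, assuming the overlaps are given as an adjacency map from each region to its (neighbouring region, traversal cost) pairs:

```python
import heapq

def shortest_region_path(overlaps, start, goal):
    """Dijkstra over a graph whose nodes are convex regions and whose edges
    are overlaps between regions; within a region, straight-line motion works."""
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, region = heapq.heappop(heap)
        if region == goal:
            break
        if d > dist.get(region, float("inf")):
            continue
        for nxt, cost in overlaps.get(region, []):
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt], prev[nxt] = nd, region
                heapq.heappush(heap, (nd, nxt))
    if goal not in dist:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]
```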

In the case of robot movement we are likely to wind up with a very large graph of such convex regions within which the assumption of singularity free movement is correct. Because the graph is apt to be very large, finding an efficient path through it is apt to be prohibitive, which is apt to cause robot ground vehicles to crash because they cannot quickly figure out the path to evade an unexpected object, and makes it impractical for a robot to take a can of beer from the fridge.

We therefore use the SybilGuard algorithm to reduce the graph by treating groups of highly connected vertices as a single vertex.
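A sketch of that reduction, assuming the assignment of vertices to highly connected clusters has already been found, SybilGuard style:

```python
from collections import defaultdict

def contract(edges, cluster_of):
    """Collapse each tightly connected cluster into a single supernode,
    keeping the cheapest edge between any two clusters; pathfinding then
    runs first on the much smaller contracted graph."""
    best = defaultdict(lambda: float("inf"))
    for u, v, cost in edges:                    # edges: (vertex, vertex, cost)
        cu, cv = cluster_of[u], cluster_of[v]
        if cu != cv:
            key = (min(cu, cv), max(cu, cv))
            best[key] = min(best[key], cost)
    return [(a, b, c) for (a, b), c in best.items()]
```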

3 Artificial Intelligence

“Gradient descent is not what makes neural nets work.” – Comment by Bruce on Echo State Networks.

An Echo State Network is your random neural network, which mixes a great pile of randomness into your actual data, to expand it into a much larger pile of data that implicitly contains all the uncorrupted original information in its very great redundancy, albeit in massively mangled form. Then “You just fit the output layer using linear regression. You can fit it with something more complicated, but why bother; it doesn’t help.”
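A minimal echo state network sketch in numpy; the reservoir size, spectral radius, and ridge parameter are arbitrary choices, and only the readout is ever trained:

```python
import numpy as np

def echo_state_features(u, n_reservoir=500, spectral_radius=0.9, seed=0):
    """Drive a fixed random reservoir with the 1-D input sequence u and
    return the reservoir states; the reservoir itself is never trained."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.5, 0.5, size=n_reservoir)
    W = rng.normal(size=(n_reservoir, n_reservoir))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))  # echo state property
    x = np.zeros(n_reservoir)
    states = []
    for u_t in u:
        x = np.tanh(W_in * u_t + W @ x)
        states.append(x.copy())
    return np.array(states)

def fit_readout(states, targets, ridge=1e-6):
    """'Fit the output layer using linear regression': ridge regression on the states."""
    A = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(A, states.T @ targets)
```

Predictions are then states @ weights; swapping the linear readout for something fancier is exactly the step the comment says does not help.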

A generalization of “fitting the output layer using linear regression” is finding groupings, recognizing categories, in the space of very large dimension that consists of the output of the output layer.

Fitting by linear regression assumes we already have a list of instances that are known to be type specimens of the category, assumes that the category is already defined and we want an efficient way of recognizing new instances as members of this category. But living creatures can form new categories, without having them handed to them on a platter. We want to be able to discover that a group of instances belong together.

So we generate a random neural network, identify those outputs that provide useful information identifying categories, and prune those elements of the network that do not contribute useful information identifying useful categories.

That it does not help tells me that what you are doing amounts to a dimensional reduction on the outputs of an echo state network.

You are generating vectors in a space of uncountably large dimension, which vectors describe probabilities, and probabilities of probabilities (Bayesian regress, probability of a probability of a frequency, to correct priors, and priors about priors) so that if two vectors are distant in your space, one is uncorrelated with the other, and if two things are close, they are correlated.

Because the space is of uncountably large dimension, the vectors are impossible to manipulate directly, so you are going to perform a random dimensional reduction on a set of such vectors to a space of manageably large dimension.

At a higher level you eventually need to distinguish the direction of causation in order to get an action network, a network that envisages action to bring the external world through a causal path to an intended state, which state has a causal correlation to desire, a network whose output state is intent, and whose goal is desire. When the action network selects one intended state of the external world rather than another, that selection is purpose. When the action network selects one causal path rather than another, that selection is will.

The colour red is not a wavelength, a phenomenon, but a quale, an estimate of the likelihood that an object has a reflectance spectrum in visible light peaked at that wavelength; but that estimate of probability can then be used as if it were a phenomenon in forming concepts, such as blood, which in turn can be used to form higher level concepts, as when the Old Testament says of someone who needed killing, “His blood is on his own head”.

Concepts are Hebbian neural networks: “Neurons that fire together, wire together.”

This is related to random dimensional reduction. You have a collection of vectors in space of uncountably large dimension. Documents, emails, what you see when you look in a particular direction, what you experience at a particular moment. You perform a random dimensional reduction to a space of manageable dimension, but large enough to probabilistically preserve distances and angles in the original space – typically twenty or a hundred dimensions.

By this means, you are able to calculate distances and angles in your dimensionally reduced space which approximately reflect the distances and angles in the original space, which was probably of dimension 10^{100^{100^{100}}}, the original space being phenomena that occurred together, and collections of phenomena that occurred together that you have some reason for bundling into a collection, and your randomly reduced space having dimension of order that a child can count to in a reasonably short time.

And now you have vectors on which you can calculate the inner product and cross product, and perform matrix operations. This gives you qualia. Higher level qualia are awareness.

And, using this, you can restructure the original vectors, for example structuring experiences into events, structuring things in the visual field into objects, and then you can do the same process on collections of events, and collections of objects that have something in common.

Building a flying machine was very hard, until the Wright brothers said “three axis control: pitch, yaw, and roll”.

Now I have said the words “dimensional reduction of vectors in a space of uncountably large dimension, desire, purpose, intent, and will”.

Creative Commons License reaction.la gpg key 154588427F2709CD9D7146B01C99BB982002C39F
This work is licensed under the Creative Commons Attribution 4.0 International License.