You want your features to provide four things: information, linear independence, resilience, and speed.
Example: you want to figure out who someone is.
Obviously, your features should carry enough information that the model can accurately answer the question. Here, a name is a good feature.
You don’t want redundant features. The first character of the name is highly dependent on the name (linearly dependent, loosely speaking). Not perfectly, since you could get noise on the line, a misclick, or something similar, but very, very dependent. If the name is already a feature, the first character adds little.
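Linear dependence is strictly defined for numeric vectors, so for categorical features like these a stand-in is needed. The sketch below uses conditional entropy, which is a substitution chosen for illustration and not something stated above: it measures how many bits of the first letter are left unexplained once the name is known. The sample names and the single "misclick" row are made up.

```python
from collections import Counter, defaultdict
from math import log2


def conditional_entropy(pairs):
    """Return H(Y | X) in bits for a list of (x, y) observations."""
    total = len(pairs)
    by_x = defaultdict(Counter)
    for x, y in pairs:
        by_x[x][y] += 1
    h = 0.0
    for counts in by_x.values():
        n_x = sum(counts.values())
        for c in counts.values():
            # p(x, y) * log2 p(y | x), summed with a minus sign
            h -= (c / total) * log2(c / n_x)
    return h


# Hypothetical sample data: (name, recorded first letter)
names = ["Thomas", "Elizabeth", "Tom", "Liz", "Thomas", "Elizabeth"]
clean = [(n, n[0]) for n in names]

# Same data, but one "Thomas" row has a misclicked first letter.
noisy = clean[:-2] + [("Thomas", "R"), ("Elizabeth", "E")]

h_clean = conditional_entropy(clean)  # nothing left to learn from the letter
h_noisy = conditional_entropy(noisy)  # noise leaves a little independent signal
print(h_clean, h_noisy)
```

On clean data the first letter carries no information beyond the name; only the noisy row makes the two features less than perfectly dependent.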
You don’t want a system so brittle that noise in any one feature throws the model off. So you don’t want the name as your only feature, because people change their names, use nicknames, and so forth. If all you’ve got is the name, and Thomas is going by Tom, you’ve got a problem. A problem that the first letter of the name as a feature might solve.
Computers don’t know the connection between Thomas and Tom, much less Elizabeth and Liz, where even the first letter differs. So you’ve got to make judgement calls. But balancing the three aspects so that their norm is maximized is a good place to start.
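The Thomas/Tom and Elizabeth/Liz cases can be sketched in a few lines. This is a toy matcher under stated assumptions: the record layout, the `match_score` helper, and the "fraction of features that agree" rule are all hypothetical, chosen only to show how a second, partly redundant feature buys some resilience.

```python
def full_name(record):
    """Feature: the whole name, case-folded."""
    return record["name"].lower()


def first_letter(record):
    """Feature: only the first character of the name, case-folded."""
    return record["name"][0].lower()


def match_score(a, b, features):
    """Fraction of the given features on which records a and b agree."""
    agree = sum(1 for feature in features if feature(a) == feature(b))
    return agree / len(features)


# Hypothetical records: what is on file versus what was observed.
thomas, tom = {"name": "Thomas"}, {"name": "Tom"}
elizabeth, liz = {"name": "Elizabeth"}, {"name": "Liz"}

name_only = match_score(thomas, tom, [full_name])
with_initial = match_score(thomas, tom, [full_name, first_letter])
liz_score = match_score(elizabeth, liz, [full_name, first_letter])

print(name_only)     # 0.0
print(with_initial)  # 0.5
print(liz_score)     # 0.0
```

The first letter rescues Thomas/Tom but does nothing for Elizabeth/Liz, which is where the judgement calls come in.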
Of course, you also want your model to run on finite hardware, so speed is a factor too. Maximize norm(I, LI, R)*speed, where I, LI, and R stand for information, linear independence, and resilience.
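As a rough sketch, the objective norm(I, LI, R)*speed could be written as a scoring function. Several things here are assumptions: the norm is taken to be Euclidean, each aspect is assumed to be already expressed as a non-negative number on a comparable scale, and the function name and the example values are invented for illustration.

```python
from math import sqrt


def feature_set_score(information, independence, resilience, speed):
    """Euclidean norm of the three quality aspects, scaled by speed."""
    quality = sqrt(information**2 + independence**2 + resilience**2)
    return quality * speed


# Two hypothetical feature sets: one strong on every aspect but slow,
# one weaker on every aspect but ten times faster.
rich_but_slow = feature_set_score(0.9, 0.9, 0.9, speed=0.1)
lean_but_fast = feature_set_score(0.6, 0.6, 0.6, speed=1.0)

print(rich_but_slow < lean_but_fast)  # True
```

Because speed multiplies the norm, a feature set that is too slow to run scores poorly no matter how good it is on the other three aspects.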