> Your original question is too ambiguous to have a precise answer, so it cannot be turned into an algorithm. Your first attempt at a precise formulation turned out to provide useless answers.

I am not asking for an exact "yes" or "no"; I'm looking for an approximate/probabilistic solution, so it doesn't even have to return the correct answer. But it should get closer to the correct answer as I keep providing it with additional training samples of (coded, decoded).

> If you've ever suffered through a course of theoretical computer science, then you've probably seen a lot of proofs relevant to your problem.

I have, but none of them cover non-deterministic/approximate/probabilistic algorithms, that is, algorithms which sacrifice the guarantee of returning "yes" or "no" with probability 1.0 in exchange for being tractable.

As an example of what I mean: I would attack the halting problem (which cannot be solved exactly) with machine learning, trying to guess whether a program terminates on a given input, assuming I had training pairs or some kind of success function. The model would learn something, but of course it can never be perfect.
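To make that concrete, here is a toy sketch in pure Python (all the function names and features are mine, purely illustrative, not a serious method): extract a few crude textual features from a program's source and fit a tiny perceptron on labeled (program, halts) pairs. It can only pick up superficial correlations, which is exactly the point: it learns something, but can never be right in general.

```python
# Toy illustration only: no program can decide halting in general,
# so this "learner" can only pick up superficial textual cues.

def features(src):
    """Crude, hand-picked features of a program's source text."""
    return [
        1.0,                         # bias term
        float("while True" in src),  # has an unconditional loop
        float("break" in src),       # has an escape hatch
        float("for " in src),        # has a bounded-looking loop
    ]

def train(pairs, epochs=50, lr=0.1):
    """Perceptron over (source, halts) pairs, halts being True/False."""
    w = [0.0] * 4
    for _ in range(epochs):
        for src, halts in pairs:
            x = features(src)
            pred = sum(wi * xi for wi, xi in zip(w, x)) > 0
            err = (1 if halts else -1) - (1 if pred else -1)
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
    return w

def guess_halts(w, src):
    """A guess, not a decision: True means 'probably halts'."""
    return sum(wi * xi for wi, xi in zip(w, features(src))) > 0

pairs = [
    ("while True: pass", False),
    ("while True:\n    break", True),
    ("for i in range(10): print(i)", True),
    ("while True: x = 1", False),
]
w = train(pairs)
print(guess_halts(w, "for i in range(3): pass"))  # prints: True
```

It generalizes from "for-loops tended to halt in the training data", which is sometimes right and sometimes wrong, never a proof.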

> Best you can do is follow the advice in the thread you've been linked: forget about having an algorithm, make some educated guesses and see if any of them pan out.

I did, and unfortunately the programs I tried reported that the "coded" strings weren't in any base encoding.

> In this case, coming up with a question that is both precise and useful is incredibly difficult, and even if you could find one, I would guess that your question is either undecidable, or any algorithm that could answer it would run several orders of magnitude longer than you intend to wait.

Well, this kind of problem is actually solved every day by human children. They receive sound waves as input and eventually learn to map them to a language's phonology and grammar. I'd argue their problem is even harder, because nobody tells them the (coded, decoded) pairs. So if babies can solve this, I don't see why a computer can't in a much easier setting.

Here is a solution which is probabilistic and may not even work, but it runs in polynomial time: use a sequential state model, e.g. an HMM, a CRF, whatever you like.

Here's my intuition of why my solution could work:

Let's suppose we need to do grapheme-to-phoneme conversion. That is, given a word, we need to produce its phonetic pronunciation.

We have pairs like this:

(dog, dag)

(the, T^)

etc.

If I train a CRF/HMM with the orthographic representation as the observable states and the individual phonemes as the hidden states, I'll probably get an accuracy of at least 80% with no feature engineering at all.
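For concreteness, here is a minimal count-based HMM sketch in pure Python rather than a real CRF/HMM toolkit (all the function names are mine). It assumes, unrealistically, that each pair is already character-aligned with one phoneme symbol per letter, as in (dog, dag):

```python
from collections import defaultdict
import math

def train_hmm(pairs):
    """Count-based HMM estimates from (spelling, pronunciation)
    pairs. Hidden states = phoneme symbols, observations = letters.
    Sketch assumes each pair is character-aligned (equal lengths)."""
    start = defaultdict(int)                       # first-phoneme counts
    trans = defaultdict(lambda: defaultdict(int))  # prev phoneme -> next
    emit = defaultdict(lambda: defaultdict(int))   # phoneme -> letter
    for word, pron in pairs:
        assert len(word) == len(pron), "sketch assumes aligned pairs"
        start[pron[0]] += 1
        for i, (letter, phone) in enumerate(zip(word, pron)):
            emit[phone][letter] += 1
            if i > 0:
                trans[pron[i - 1]][phone] += 1
    return start, trans, emit

def logp(table, key, smooth=1e-6):
    """Smoothed log probability of `key` under a count table."""
    total = sum(table.values()) or 1
    return math.log((table.get(key, 0) + smooth) / (total + smooth))

def viterbi(word, start, trans, emit):
    """Most likely phoneme sequence for `word` (Viterbi decoding)."""
    states = list(emit)
    best = {s: (logp(start, s) + logp(emit[s], word[0]), [s])
            for s in states}
    for letter in word[1:]:
        new = {}
        for s in states:
            lp, prev = max((best[r][0] + logp(trans[r], s), r)
                           for r in states)
            new[s] = (lp + logp(emit[s], letter), best[prev][1] + [s])
        best = new
    return "".join(max(best.values())[1])

pairs = [("dog", "dag"), ("dot", "dat"), ("go", "go")]
start, trans, emit = train_hmm(pairs)
print(viterbi("dog", start, trans, emit))  # prints: dag
```

A real grapheme-to-phoneme system would need a proper toolkit and an alignment step, since words and their pronunciations rarely have equal length.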

The baseline does so well because there is some kind of mapping/function between how a word is spelt and how it is pronounced. Granted, English has very irregular spelling, but the observation still holds.

On the other hand, if I asked you to tell me whether there is a mapping between the name of an animal and the name of its meat, a trained CRF/HMM will have terrible accuracy because there is nothing to learn: the null hypothesis is true. That is, from the characters of an animal's name in English you cannot predict how the name of its meat is written.

(cow, beef)

(lamb, mutton)

(pig, pork)

So my hypothesis is this: for pairs of (coded, decoded), train a sequential model and then test it. If the accuracy/F1 is low enough, that supports the null hypothesis; if it's high enough, it is evidence that some deterministic mapping exists.

Of course, this works for grapheme-to-phoneme conversion because we intuitively know how to model it. But for arbitrary data? The relevant "features" may be mathematical functions that aren't discoverable without some intuition about how such encodings work.

So I've given an actual idea that's actionable. It probably won't work, but at least you can see it's going in the right direction.

I hope this provides some more grounds for discussion; I'd like to hear what kinds of approaches you have in mind.