[Recommended tagging for machine learning # 4] Machine learning script …?

4 minute read

<ENGLISH>

Hi, I hope you’re doing well.
I’m so sleepy… because of my gym session this morning. But I’d like to resume my project… with a drink :stuck_out_tongue_closed_eyes: yahoo!

So today’s topic is finally… machine learning! We already have all the elements we need for training and testing, so the only thing left to do is train my machine!
Let’s start… but I have to say one thing before we begin.

I can’t write the machine learning code myself…!

Really sorry, oh, stop!! Don’t throw that stone in your hand… yes, put it down. It’s not that I won’t write it, I actually can’t.
Instead, I’d like to use a script from another site. And I think you know it. Here:
Let’s get started with machine learning Part 3 Let’s implement Bayesian filter –gihyo.jp
This is a very good site for getting started with machine learning. I really recommend it.

So, shall we call it a day…? Hmm. Actually, I had to change a few points to fit my purpose, so I’d like to show what I changed and how. Nothing about machine learning itself today…

def train(self, doc, cat):
    word = getwords(doc)
    for w in word:
        self.wordcountup(w, cat)
    self.catcountup(cat)

This is the train function: it gets the words in doc, then counts each word up under the category cat. However, this only supports one category per web page, while in my case two or more categories may be tagged on a single page. So I changed the script like this.

def train(self, doc, cats):
    word = getwords(doc)
    for w in word:
        for cat in cats:
            self.wordcountup(w, cat)
    for cat in cats:
        self.catcountup(cat)

cats is now a list rather than a single string, and the inner for loop counts each word up under every category. Note that catcountup should still run once per document for each category, not once per word, or the category counts get inflated by document length.
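As a minimal sketch of what this multi-tag training does (using hypothetical stand-ins for getwords, wordcountup, and catcountup, since the real ones live on the classifier class):

```python
from collections import defaultdict

# hypothetical stand-ins for the classifier's internal counters
wordcount = defaultdict(lambda: defaultdict(int))  # word -> category -> count
catcount = defaultdict(int)                        # category -> count

def getwords(doc):
    # the real version does morphological analysis;
    # a whitespace split is enough for illustration
    return doc.lower().split()

def train(doc, cats):
    for w in getwords(doc):
        for cat in cats:
            wordcount[w][cat] += 1
    for cat in cats:
        catcount[cat] += 1

# one web page tagged with two categories at once
train("machine learning with python", ["python", "ml"])
print(wordcount["learning"]["python"], wordcount["learning"]["ml"])  # 1 1
print(catcount["python"], catcount["ml"])  # 1 1
```

Every word is counted once under each tag, while each tag’s document count goes up only once per page.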

Next is modifying how the result is shown. The original script looks like this.

def classifier(self, doc):
    best = None
    max = -sys.maxint
    word = getwords(doc)

    for cat in self.catcount.keys():
        prob = self.score(word, cat)
        if prob > max:
            max = prob
            best = cat
    return best

This function returns only the best category name. However, I’d like to see every category with its probability, so I modified it like this.

def classifier(self, doc):
    word = getwords(doc)
    pList = []

    for cat in self.catcount.keys():
        # score() returns a log probability
        pList.append([cat, self.score(word, cat)])

    # highest score first
    return sorted(pList, key=lambda pair: pair[1], reverse=True)

The previous code returned just the most probable tag, but I’d like to know the result for every tag, so this version returns the whole sorted list.
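For instance, sorting a list of [tag, score] pairs in descending order works like this (the scores here are made-up log probabilities, just for illustration):

```python
# made-up log-probability scores for three hypothetical tags
pList = [["python", -12.3], ["ml", -8.7], ["gym", -25.0]]

# sort so the most probable tag comes first
ranked = sorted(pList, key=lambda pair: pair[1], reverse=True)
print(ranked)  # [['ml', -8.7], ['python', -12.3], ['gym', -25.0]]
```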

So the engine of the machine learning is just borrowed from someone else’s idea… Next time I’d like to show you the results of the training and my analysis of them.

Hi, this is Umemura.

I always write my posts after a can of beer. One can is just right.

So, today I will finally get to the main part: machine learning. I’m sorry to have kept you waiting this long. It’s finally starting. No, wait… I have one thing to apologize for first.

**We will not code the machine learning ourselves this time!**

No, stop, don’t throw stones! … That’s right, I won’t do it. Or rather, I can’t. Instead, we will use the Naive Bayes sample code from the following site for the machine learning.

Let’s get started with machine learning Part 3 Let’s implement Bayesian filter –gihyo.jp

This article and series are very educational. In fact, I also got started in machine learning with this article as my entry point. It is structured so carefully that anyone can work through machine learning once they recall high-school mathematics: probability, algebra, and differentiation.

Well then, today’s content is over… Just kidding!
That would be too sad, so today I would like to introduce how I modified this Naive Bayes code.
First of all, the following part.

def train(self, doc, cat):
    word = getwords(doc)
    for w in word:
        self.wordcountup(w, cat)
    self.catcountup(cat)

Here, doc is the sentence to be learned and cat is the tag to apply, but the original only allows one tag per sentence. This time one page can have multiple tags, so let’s change cat to cats so that it takes a list of tags.

def train(self, doc, cats):
    word = getwords(doc)
    for w in word:
        for cat in cats:
            self.wordcountup(w, cat)
    for cat in cats:
        self.catcountup(cat)

Each tag in the list simply has its counts increased accordingly, with catcountup called once per document for each tag.

Next, how to show the estimated result: the original script just returns the tag with the highest probability.

def classifier(self, doc):
    best = None
    max = -sys.maxint
    word = getwords(doc)

    for cat in self.catcount.keys():
        prob = self.score(word, cat)
        if prob > max:
            max = prob
            best = cat
    return best

That’s not enough to work with, so I’ll return all the tags and their probabilities (log probabilities, actually), sorted in descending order of probability.

def classifier(self, doc):
    word = getwords(doc)
    pList = []

    for cat in self.catcount.keys():
        # score() returns a log probability
        pList.append([cat, self.score(word, cat)])

    # highest score first
    return sorted(pList, key=lambda pair: pair[1], reverse=True)
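Since score() returns log probabilities, the values in the returned list can be converted back into a normalized probability distribution when readable numbers are wanted. A sketch with made-up scores:

```python
import math

# made-up log scores, in the [tag, score] shape the classifier returns
scores = [["ml", -8.7], ["python", -12.3], ["gym", -25.0]]

# subtract the maximum before exponentiating to avoid underflow,
# then normalize so the values sum to 1
m = max(s for _, s in scores)
exps = [(cat, math.exp(s - m)) for cat, s in scores]
total = sum(e for _, e in exps)
probs = [(cat, e / total) for cat, e in exps]
print(probs[0][0])  # ml
```

The ranking is unchanged; only the scale becomes an ordinary 0-to-1 probability.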

That’s it for today: I introduced the machine learning script. Next time, I’d like to train with this script, show the results, and consider them from various angles.

See you again!