Understanding MLB transaction news using Microsoft Cognitive Services (LUIS AI) — Part 2

Manvel Ghazaryan
5 min read · Nov 21, 2020

In Part 1 we looked at the basics of LUIS AI & created a model to predict the PlayerPlacedOnBereavementList intent.

Let’s now look at more complex text. Recall that I extracted all transaction news for all teams from January to September. One intent whose news text varies a lot is the player trade (96 examples). Here are a few example transaction news items about a player trade:

Detroit Tigers traded LF Cameron Maybin to Chicago Cubs for SS Zack Short.

Baltimore Orioles traded RHP Mychal Givens to Colorado Rockies for 1B Tyler Nevin; SS Terrin Vavra and Player To Be Named Later.

Cleveland Indians traded RHP Mike Clevinger; LF Greg Allen and Player To Be Named Later to San Diego Padres for C Austin Hedges; RHP Cal Quantrill; RF Josh Naylor; Gabriel Arias; Joey Cantillo and Owen Miller.

Pittsburgh Pirates traded CF Jarrod Dyson to Chicago White Sox for Future Considerations.

Atlanta Braves traded 1B Yonder Alonso to San Diego Padres for cash.

Cincinnati Reds traded LHP Jose Salvador to Los Angeles Angels.

San Francisco Giants traded cash to Chicago White Sox for CF Luis Alexander Basabe.

This is definitely more complex than the PlayerPlacedOnBereavementList intent.

Let us follow the same steps and create a model to properly predict the intent & extract data.

Create An Intent

Let’s create a TradePlayers intent and add a few examples. (Note: the examples are already labelled in the screenshot, but labelling should be done after we create the entities to extract data.)

I want to create a decomposable machine learning entity called Trade. Looking at the example player trade news, we can see there are a few data points to extract.

For example, given the following:

Detroit Tigers traded LF Cameron Maybin to Chicago Cubs for SS Zack Short.

I’d like to extract:

· Trading Team: Detroit Tigers

· Traded Player: LF Cameron Maybin

· Buying Team: Chicago Cubs

· Bought Player: SS Zack Short

For PlayerPlacedOnBereavementList we extracted the player’s name & position independently. How about adding structure to the Traded/Bought Player entities and representing each of them as position + personName? Let’s try that.

Since we already have the team/position list entities defined, I’m adding them as features.
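To make the intended shape of the extracted data concrete, here is a minimal sketch of the Trade structure as plain Python data classes. The class and field names are illustrative assumptions that mirror the entity names above; they are not something LUIS generates.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Player:
    # personName part of the entity.
    person_name: str
    # position part of the entity; some news items list a player without one
    # (e.g. "Gabriel Arias" in the Cleveland example), so it is optional here.
    position: Optional[str] = None


@dataclass
class Trade:
    # Hypothetical container mirroring the Trade entity and its children.
    trading_team: str
    buying_team: str
    traded_players: List[Player] = field(default_factory=list)
    # Stays empty when the trade is for cash or future considerations.
    bought_players: List[Player] = field(default_factory=list)


# "Detroit Tigers traded LF Cameron Maybin to Chicago Cubs for SS Zack Short."
trade = Trade(
    trading_team="Detroit Tigers",
    buying_team="Chicago Cubs",
    traded_players=[Player("Cameron Maybin", "LF")],
    bought_players=[Player("Zack Short", "SS")],
)
```

Keeping the players in lists leaves room for trades that involve several players on either side.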

With this in place we can go back & label example intents.

Label Example Intents

I tried to add different examples of player trade news to the intent. We can see from the examples that not all data points are always available to extract.

For example, in one labelled example there is no Bought Player because the trade was made for cash.

In another example, there is more than one traded/bought player.

Recall that we’re interested in roster changes. So if a team trades a player to another team for cash, all we need to do is move the player from the trading team’s roster to the buying team’s roster.
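The roster update described above can be sketched as follows. This is only an illustration: the `apply_trade` function and the dictionary-of-sets roster model are assumptions, not part of the actual parser app.

```python
from typing import Dict, List, Set


def apply_trade(
    rosters: Dict[str, Set[str]],
    trading_team: str,
    buying_team: str,
    traded_players: List[str],
    bought_players: List[str],
) -> None:
    """Move players between the two team rosters in place."""
    trading_roster = rosters.setdefault(trading_team, set())
    buying_roster = rosters.setdefault(buying_team, set())
    # Traded players leave the trading team and join the buying team.
    for name in traded_players:
        trading_roster.discard(name)
        buying_roster.add(name)
    # Bought players (if any) move in the opposite direction.
    for name in bought_players:
        buying_roster.discard(name)
        trading_roster.add(name)


# "Atlanta Braves traded 1B Yonder Alonso to San Diego Padres for cash."
rosters = {"Atlanta Braves": {"Yonder Alonso"}, "San Diego Padres": set()}
apply_trade(rosters, "Atlanta Braves", "San Diego Padres", ["Yonder Alonso"], [])
```

For a cash trade the list of bought players is simply empty, so only one roster move happens.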

Once we have all examples labeled (recommended minimum is 15–30 examples) we can train & test.

Train & Test

I’m following the advice to use 80 % of the examples as training data (added to the intent as labelled examples) & 20 % of the examples as test data.

Click Train, wait until LUIS AI is done training & let’s see how well our model performs.

I’ll look at simple cases first.
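One way to produce such a split is sketched below. The `split_examples` helper, the fixed seed, and the placeholder example strings are assumptions for illustration; the split in this article was done by hand.

```python
import random
from typing import List, Sequence, Tuple


def split_examples(
    examples: Sequence[str], train_share: float = 0.8, seed: int = 42
) -> Tuple[List[str], List[str]]:
    """Shuffle the examples reproducibly and cut them into train and test parts."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]


# Placeholder strings standing in for the 96 player trade news items.
examples = [f"trade news {i}" for i in range(96)]
train, test = split_examples(examples)
```

With 96 examples this yields 77 training examples and 19 test examples.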

Given text:

tampa bay rays traded rhp dylan covey to boston red sox.

LUIS predicts with 92 % confidence that this is a TradePlayers intent & properly extracts the data.

Another example:

Houston Astros traded 3B Jack Mortensen to Washington Nationals for cash.

In this case the prediction score is 89 %, which is still high by our definition (recall we agreed high means > 85 %). The data is extracted properly.

Publish & Test With Real Data

I’ve added a couple more intents & defined entities to extract data. The final list of intents is the following:

So far I have been using transaction news for all teams from January to September to train & test (manually).

There is also transaction news for October. I want to publish the app & this time ask LUIS to predict the intent of each news item.

First I publish the app and obtain the URL of the prediction endpoint:

https://eastus2.api.cognitive.microsoft.com/luis/prediction/v3.0/apps/{app-guid}/slots/staging/predict?subscription-key={subscription-guid}&verbose=true&show-all-intents=true&log=true&query=YOUR_QUERY_HERE
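A minimal sketch of calling this endpoint from Python, using only the standard library, might look like the following. The function names and the `APP_GUID` / `SUBSCRIPTION_KEY` placeholders are assumptions, and the sample response is hand-written to follow the v3 prediction response layout (`prediction.topIntent` plus a score per intent); check it against what your own app returns.

```python
import json
from typing import Any, Dict, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

ENDPOINT = "https://eastus2.api.cognitive.microsoft.com"


def build_prediction_url(
    app_id: str, subscription_key: str, query: str, slot: str = "staging"
) -> str:
    """Build the prediction URL with the same query parameters as above."""
    params = urlencode({
        "subscription-key": subscription_key,
        "verbose": "true",
        "show-all-intents": "true",
        "log": "true",
        "query": query,  # urlencode takes care of escaping the news text
    })
    return f"{ENDPOINT}/luis/prediction/v3.0/apps/{app_id}/slots/{slot}/predict?{params}"


def predict(url: str) -> Dict[str, Any]:
    """Send the GET request and decode the JSON body (needs real credentials)."""
    with urlopen(url) as response:
        return json.load(response)


def top_intent(response: Dict[str, Any]) -> Tuple[str, float]:
    """Return the top intent name and its score from a prediction response."""
    prediction = response["prediction"]
    name = prediction["topIntent"]
    return name, prediction["intents"][name]["score"]


url = build_prediction_url(
    "APP_GUID",
    "SUBSCRIPTION_KEY",
    "Houston Astros traded 3B Jack Mortensen to Washington Nationals for cash.",
)

# Hand-written sample, not a captured response.
sample_response = {
    "query": "Houston Astros traded 3B Jack Mortensen to Washington Nationals for cash.",
    "prediction": {
        "topIntent": "TradePlayers",
        "intents": {"TradePlayers": {"score": 0.89}, "None": {"score": 0.02}},
        "entities": {},
    },
}
intent, score = top_intent(sample_response)
```

`predict(url)` is left uncalled here because it needs a published app and a valid subscription key.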

I will now re-run the transaction news parser app & this time request LUIS AI to give an intent prediction.

At the same time I parsed the transaction news for October and labelled it (as was done for the news from January to September). I want to compare the LUIS predictions with my own labelling (obviously not all news will be predicted properly; I’ll focus on the ones that have an intent created).

Time to analyze the results and see how well our model performed.

First let’s look at the activated from restricted label. It has 14 transaction news items. The model correctly predicted the PlayerActivated intent for all of them; the lowest prediction score is 95 %.

Next is the activated from injured list label. It has 42 transaction news items. The model correctly predicted the PlayerActivated intent for all entries; the lowest prediction score is 95 %.

The sent outright label has 76 transaction news items. All of them were correctly predicted to be PlayerSentOutright. The lowest prediction score is 72 %. If we apply our confidence threshold of 85 % (the arbitrary number we agreed on), 17 of the 76 items have a prediction score below 85 %. In other words, 59 of 76, or about 78 %, of the predictions satisfy our confidence threshold.
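The threshold check can be sketched in a few lines. The individual scores below are invented placeholders; only the counts (76 predictions, 17 of them below 85 %) come from the results above.

```python
from typing import Sequence


def share_at_or_above(scores: Sequence[float], threshold: float = 0.85) -> float:
    """Fraction of prediction scores that meet the confidence threshold."""
    if not scores:
        return 0.0
    return sum(score >= threshold for score in scores) / len(scores)


# Placeholder values: 59 scores above the threshold and 17 below it.
scores = [0.90] * 59 + [0.80] * 17
share = share_at_or_above(scores)
print(f"{share:.1%} of predictions meet the threshold")
```

The result is 59 / 76, which is roughly 78 %.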

The player activated label contains 188 records. All of them were correctly predicted as the PlayerActivated intent, with a lowest prediction score of 87 %.

Note that while the labels are different, some of them fall under the same intent. That is because our app needs to know when a player is activated, no matter from which list.
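A comparison like the one above could be automated along these lines. The label-to-intent mapping follows the four labels discussed here; the `summarize` helper and the sample rows are assumptions for illustration, not the real October data.

```python
from typing import Dict, Iterable, Tuple

# Several manual labels map to the same intent.
LABEL_TO_INTENT = {
    "activated from restricted": "PlayerActivated",
    "activated from injured list": "PlayerActivated",
    "player activated": "PlayerActivated",
    "sent outright": "PlayerSentOutright",
}


def summarize(rows: Iterable[Tuple[str, str, float]]) -> Dict[str, Dict[str, float]]:
    """Per label: item count, correct predictions, and lowest prediction score."""
    summary: Dict[str, Dict[str, float]] = {}
    for label, predicted_intent, score in rows:
        stats = summary.setdefault(
            label, {"total": 0, "correct": 0, "lowest_score": 1.0}
        )
        stats["total"] += 1
        if predicted_intent == LABEL_TO_INTENT.get(label):
            stats["correct"] += 1
        stats["lowest_score"] = min(stats["lowest_score"], score)
    return summary


# Invented sample rows: (manual label, predicted intent, prediction score).
rows = [
    ("player activated", "PlayerActivated", 0.97),
    ("player activated", "PlayerActivated", 0.87),
    ("sent outright", "PlayerSentOutright", 0.72),
]
summary = summarize(rows)
```

Each entry in `summary` then gives the numbers reported per label: how many items there were, how many were predicted correctly, and the lowest score.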

I also checked entities and all were correctly extracted.

In closing

I think LUIS AI is a powerful tool for creating applications that require NLU features. It is relatively easy to get started with and is a good choice, especially for teams that don’t have AI/ML engineers skilled in the NLU domain. In my opinion it is also a great fit for the prototyping phase.

How well it performs for a given domain is yet to be discovered. In the domain of understanding MLB roster change news (R&D, prototyping phase) it was good enough, considering my complete lack of NLU skills and the overall effort spent.
