Understanding MLB transaction news using Microsoft Cognitive Services (LUIS AI) — Part 1

Nov 15, 2020

As I mentioned in an earlier post, I work for a company in the sports domain. Our company is involved in many different sports; I’m part of the Baseball Department. We build products that give players feedback on their performance. Simply put, a player throws or hits a ball and later gets a report with various data attributes about their performance (ball speed, spin rate, trajectory, etc.).

One of the needs we have is an up-to-date team roster. Ideally, we’d want to update a team’s roster as soon as there is a change, e.g. when a player is traded, injured, or put on the restricted list.

This is a challenging task, as there is no API (that I know of) that provides roster updates.

We found a web page that provides MLB transaction news. Each MLB team has a dedicated page that lists its transactions for a given month. For example, this is the Boston Red Sox transactions page for January 2020: https://www.mlb.com/redsox/roster/transactions/2020/01

Processing this data is not straightforward. We can parse the HTML pages, but the transaction text then needs to be understood before it can be processed.

Natural Language Understanding

Natural-language understanding is a branch of AI/ML. Microsoft Azure Cognitive Services provides the Language Understanding (LUIS AI) service, which helps with natural language understanding and doesn’t require knowledge of AI/ML. LUIS AI integrates well with chat bot apps (most of the examples I could find build chat bots), but I want to give it a try and see how much it can help with our use case: understanding MLB transaction news.

High-Level Overview Of LUIS AI

At the heart of LUIS AI are intents and entities. I’m going to quote the LUIS documentation.

An intent represents a task or action the user wants to perform. It is a purpose or goal expressed in a user’s utterance.

We don’t have users. In our case, we want to express each kind of transaction news as an intent (player traded, player put on the injured list, etc.).

An entity extracts data from a user utterance at prediction runtime.

An entity will represent a piece of data from a transaction news entry (player, team, player’s position, etc.).

A prediction score indicates the degree of confidence LUIS has for prediction results of a user utterance.

When LUIS predicts an intent, it also provides a prediction score. This will help us understand how well LUIS AI was trained for a particular intent.

There are more pieces that help with building, training, and improving LUIS AI models. We’ll go over some of them when we start to build our own models for transaction news.

I want to build a LUIS AI app that understands transaction news entries and maps them to intents. In essence, I want to:

· Determine intents (and provide some examples)

· Define entities to extract data

· Label example intents using entities we defined

· Train

· Test

· Publish

· Improve

Collecting And Preparing Data To Train LUIS AI

Before we can start creating intents and defining entities to extract data, we need to understand what news we might get. I created a console application to parse all transaction news entries for all teams from January through September.
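As a rough sketch of the collection step, the monthly page URLs can be generated from the pattern of the Red Sox link shown earlier. The function names are mine, and I'm assuming other teams follow the same URL pattern with their own slug; only the `redsox` slug and the 2020/01 example come from this post.

```python
BASE_URL = "https://www.mlb.com"


def transactions_url(team_slug: str, year: int, month: int) -> str:
    """Build the monthly transactions page URL for one team."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be 1-12, got {month}")
    # The month is zero-padded, as in .../2020/01 for January.
    return f"{BASE_URL}/{team_slug}/roster/transactions/{year}/{month:02d}"


def monthly_urls(team_slug: str, year: int, first_month: int, last_month: int) -> list:
    """URLs for an inclusive range of months, e.g. January through September."""
    return [
        transactions_url(team_slug, year, month)
        for month in range(first_month, last_month + 1)
    ]
```

Calling `monthly_urls("redsox", 2020, 1, 9)` gives the nine pages to download and parse for one team; fetching and HTML parsing are left out here.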

Then I labeled each transaction news entry and grouped the entries by label. Here are the labels with the number of transaction news entries for each (in descending order):

· Player assigned to a team (1017 entries)

· Invited non-roster to spring training (702)

· Player signed to minor league contract (674)

· Recalled (500)

· Player placed on Injured List (483)

· Optioned to training site (368)

· Team selected Contract (246)

· Player activated from injured list (238)

· Designated for assignment (207)

· Team signed player (153)

· Player sent outright (133)

· Player Released (119)

· Player Trade (96)

· Extended Injury List (69)

· Player Activated (67)

· Team signed free agent (55)

· Player claimed off waivers (55)

· Player roster status changed (51)

· Player placed on restricted list (42)

· Player placed on paternity (18)

· Player placed on bereavement (9)

· Player returned to a team (8)

· Player reassigned to a minor league (7)

· Player elected free agency (6)

· Player sent on a rehab assignment (3)

· Player retired (1)
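I did the grouping largely by eye, but a first pass like this could be automated with simple phrase matching. The snippet below is only a sketch of that idea: the phrase rules and the "Unlabeled" bucket are hypothetical, not the rules behind the counts above.

```python
from collections import Counter

# Hypothetical phrase-to-label rules; a real list would need many more.
RULES = [
    ("on the bereavement list", "Player placed on bereavement"),
    ("on the paternity list", "Player placed on paternity"),
    ("on the restricted list", "Player placed on restricted list"),
]


def label_entry(text: str) -> str:
    """Return the first label whose phrase occurs in the entry."""
    lowered = text.lower()
    for phrase, label in RULES:
        if phrase in lowered:
            return label
    return "Unlabeled"


def count_labels(entries: list) -> list:
    """Count entries per label, most frequent label first."""
    return Counter(label_entry(entry) for entry in entries).most_common()
```

Anything that lands in "Unlabeled" still needs a manual look, which is where most of the effort goes.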

You can see this is a long list. Let’s pick a relatively simple label and try to create a model for it with a high prediction score.

Creating LUIS App

Looking at the example data for the Player placed on bereavement label (9 entries), it looks straightforward, so I would like to start with that. A few examples of this data:

Baltimore Orioles placed LHP John Means on the bereavement list.
Tampa Bay Rays placed CF Manuel Margot on the bereavement list.

Given this sample input text, I want LUIS to predict, with a high prediction score (let’s say high means greater than or equal to 0.85), the intent of a player being placed on the bereavement list.

Let’s go to the LUIS portal and create an application called Transaction News Processor. Navigating to Intents, we see that a None intent has already been created for us. In short, the None intent is a way for LUIS to recognize text that is outside of our domain. For chat bot apps, where the text input comes from a human, this makes sense, as users can type whatever they want. In our case, the likelihood of this happening is very small, since we’re parsing transaction news from a dedicated HTML page. But let’s follow the advice and add a few examples that are outside of our domain (baseball transactions).

Create An Intent

OK, with the None intent in place, let’s now create our first intent: PlayerPlacedOnBereavementList. We have 9 examples of a player being put on the bereavement list. The LUIS documentation advises having at least 15–30 entries. We can add example entries ourselves to get to 20 total examples. We’ll follow the pattern:

{MLB TEAM} placed {PLAYER POSITION} {PLAYER} on the bereavement list.

Let’s now add the examples to the PlayerPlacedOnBereavementList intent.
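The extra examples can be typed by hand, but filling the pattern programmatically is one way to get some variety. This is a minimal sketch under my own assumptions: the player names are invented placeholders, and the team and position values are just a few taken from this post.

```python
import random
from itertools import product

TEMPLATE = "{team} placed {position} {player} on the bereavement list."

TEAMS = ["Baltimore Orioles", "Tampa Bay Rays", "Boston Red Sox"]
POSITIONS = ["LHP", "CF", "SS"]
# Placeholder names, invented for illustration only.
PLAYERS = ["Alex Example", "Sam Placeholder"]


def generate_examples(count: int, seed: int = 0) -> list:
    """Return `count` distinct sentences that follow the pattern."""
    candidates = [
        TEMPLATE.format(team=team, position=position, player=player)
        for team, position, player in product(TEAMS, POSITIONS, PLAYERS)
    ]
    if count > len(candidates):
        raise ValueError("not enough combinations for the requested count")
    # A seeded generator keeps the selection reproducible between runs.
    return random.Random(seed).sample(candidates, count)
```

With 9 real entries, `generate_examples(11)` would bring the total to 20.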

Create Entities

Following the LUIS documentation’s advice, let’s create a decomposable machine-learning entity. As the documentation states:

Entity decomposability is important for both intent prediction and for data extraction with the entity.
Start with a machine-learning entity, which is the beginning and top-level entity for data extraction. Then decompose the entity into subentities.

I will create a PlayerOnBereavement machine-learning entity and add a structure.

We know we want to extract

· Team

· Position player plays in

· Player

We end up with the following entity

Entities are used to extract data from an intent. We already saw a machine-learning entity that is decomposed into sub-entities.

Another type of entity is a list entity. To quote the LUIS documentation:

List entities represent a fixed, closed set of related words along with their synonyms

This is a very good candidate for representing the 30 MLB teams.

Let’s create an MlbTeamList list entity, which will help extract team information.

Note that I’m adding Twitter handles as synonyms. Although I’m not expecting the MLB transactions page to provide Twitter handles, I might use the same app to try to understand tweets about MLB transactions.
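To make the idea of a list entity concrete, here is a small local simulation of a canonical value with synonyms. It is not how LUIS is implemented: the substring matching ignores case and word boundaries, which may differ from LUIS’s own matching, and the synonyms and handles shown are illustrative.

```python
from typing import Optional

# Canonical team name -> synonyms (illustrative subset, two of 30 teams).
MLB_TEAM_LIST = {
    "Boston Red Sox": ["Red Sox", "@RedSox"],
    "Los Angeles Dodgers": ["Dodgers", "@Dodgers"],
}


def build_lookup(list_entity: dict) -> dict:
    """Map every lower-cased form (canonical or synonym) to its canonical name."""
    lookup = {}
    for canonical, synonyms in list_entity.items():
        for form in [canonical, *synonyms]:
            lookup[form.lower()] = canonical
    return lookup


def find_team(text: str, lookup: dict) -> Optional[str]:
    """Return the canonical team mentioned in the text, if any."""
    lowered = text.lower()
    # Try longer forms first so "boston red sox" is matched before "red sox".
    for form in sorted(lookup, key=len, reverse=True):
        if form in lowered:
            return lookup[form]
    return None
```

Either a full team name or a handle resolves to the same canonical value, which is the behavior I want from MlbTeamList.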

Following the same logic, we can create a PlayerPositionList list entity to extract the player’s position from an intent.

The last sub-entity of the PlayerOnBereavement machine-learning entity is Player. A player is represented by a person’s name in the text. LUIS has many prebuilt entities to extract data such as numbers, dates, etc. One of these prebuilt entities is personName. Let’s add the personName prebuilt entity to our Entities.

These are the entities we have created so far:

· PlayerOnBereavement: our decomposed machine-learning entity

· MlbTeamList: list entity that should extract the MLB team

· PlayerPositionList: list entity that should extract the player’s position

· personName: prebuilt entity that extracts the player’s name

Another LUIS concept is machine-learning features. As stated in the documentation:

Machine-learning features give LUIS important cues for where to look for things that distinguish a concept. They’re hints that LUIS can use, but they aren’t hard rules. LUIS uses these hints in conjunction with the labels to find the data.

There are two types of features:

· Phrase list feature

· Model (intent or entity) as a feature

We created 3 entities and can add them as features to the sub-entities of the PlayerOnBereavement machine-learning entity to help with data extraction.

Let’s add features to the sub-entities.

Before moving on, let’s recap what we have.

An intent — PlayerPlacedOnBereavementList

A machine learning entity — PlayerOnBereavement, decomposed to

Team sub-entity with the MlbTeamList list entity as a feature
Position sub-entity with the PlayerPositionList list entity as a feature
Player sub-entity with the prebuilt personName entity as a feature
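The same recap can be written down as a small data structure. This is just a plain Python outline of the setup, with class names of my own choosing; it is not the LUIS export format.

```python
from dataclasses import dataclass, field


@dataclass
class SubEntity:
    name: str
    feature: str  # name of the entity attached as a feature


@dataclass
class MachineLearnedEntity:
    name: str
    children: list = field(default_factory=list)


player_on_bereavement = MachineLearnedEntity(
    name="PlayerOnBereavement",
    children=[
        SubEntity("Team", feature="MlbTeamList"),
        SubEntity("Position", feature="PlayerPositionList"),
        SubEntity("Player", feature="personName"),
    ],
)

# One line per sub-entity, e.g. "Team <- MlbTeamList".
summary = [f"{child.name} <- {child.feature}" for child in player_on_bereavement.children]
```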

With this in place, let’s now look at labeling the example intents.

Label Example Intents

I now go back to the example intents and label them. By doing so, we help LUIS learn how to extract data from an intent.
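In the portal, labeling is done by selecting text with the mouse. Conceptually, each label marks a span of characters in the example. The sketch below shows that idea with character offsets; the offset representation and the function names are my assumptions, not something taken from the LUIS portal.

```python
def span_of(utterance: str, value: str) -> tuple:
    """Return (start, end) of the first occurrence of value; end is exclusive."""
    start = utterance.index(value)  # raises ValueError if value is absent
    return start, start + len(value)


def label_utterance(utterance: str, labels: dict) -> dict:
    """Map each sub-entity name to the character span of its value."""
    return {name: span_of(utterance, value) for name, value in labels.items()}


utterance = "Baltimore Orioles placed LHP John Means on the bereavement list."
spans = label_utterance(
    utterance,
    {"Team": "Baltimore Orioles", "Position": "LHP", "Player": "John Means"},
)
# spans == {"Team": (0, 17), "Position": (25, 28), "Player": (29, 39)}
```

Because `span_of` takes the first occurrence, a short value such as a one-letter position could match inside another word, so real labels are better picked by position than by search.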

Train & Test

Once the labeling process is finished, we can click Train to kick off the LUIS training process based on the labeled intent/entity examples.

After it’s done, let’s create a few test examples to see how LUIS performs.

Preferably, test examples should contain data that wasn’t seen before.

Given the text:

Washington Nationals placed SS John Stewart on bereavement list.

Here’s the prediction:

Not bad: LUIS is 98% sure that this is the PlayerPlacedOnBereavementList intent.

It also correctly extracted the team (Washington Nationals), player position (SS), and player (John Stewart).
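In code, the 0.85 threshold I set earlier could be applied like this. The sample dictionary is hand-written for illustration and is not real LUIS output: its layout is my assumption of what a V3 prediction response looks like (`prediction.topIntent` plus per-intent scores), and the None score is made up.

```python
from typing import Optional

MIN_SCORE = 0.85  # "high" as defined earlier in this post


def accepted_intent(response: dict, min_score: float = MIN_SCORE) -> Optional[str]:
    """Return the top intent if its score meets the threshold, else None."""
    prediction = response["prediction"]
    top_intent = prediction["topIntent"]
    score = prediction["intents"][top_intent]["score"]
    return top_intent if score >= min_score else None


# Hand-written sample modeled on the test above (assumed response layout).
sample_response = {
    "query": "Washington Nationals placed SS John Stewart on bereavement list.",
    "prediction": {
        "topIntent": "PlayerPlacedOnBereavementList",
        "intents": {
            "PlayerPlacedOnBereavementList": {"score": 0.98},
            "None": {"score": 0.01},
        },
        "entities": {},
    },
}

intent = accepted_intent(sample_response)
```

Returning `None` for low scores gives a natural place to route uncertain entries to manual review instead of updating a roster automatically.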

Let’s complicate things a bit. Let’s create a test example where the player’s name contains text similar to a position:

Los Angeles Dodgers placed P J.P. Morgan on the bereavement list.

Great results! A 99% prediction score that this is indeed the PlayerPlacedOnBereavementList intent.

The data is also correctly extracted: team (los angeles dodgers), player position (P), and player (J.P. Morgan).

We’ll look into more complex text in the next post.