As was the case with the last tutorial, let's set up our environment.
This time we are going to do something a little different: we are going to use regular expressions to parse natural language and extract the logical information. Unlike the previous tutorial, this process is slightly more involved, so a system like this can take a bit longer to develop and deploy.
If you don't know what regular expressions are, take a look here. In general, a regular expression is a sequence of characters that defines a pattern for matching text.
Now, let's define some regular expression templates.
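To make that concrete, here is a minimal sketch of the kind of pattern we will use. The variable names are only for illustration. In a pattern, `.*` stands for "any run of characters, possibly empty", and `match` only succeeds if the pattern fits starting at the beginning of the text.

```python
import re

# ".*" means "any run of characters (possibly empty)", so this pattern fits any
# text that starts with "how" and later contains "cold" and then "kitchen".
pattern = re.compile("how.*cold.*kitchen.*")

matching = pattern.match("how cold is it in the kitchen")
nonMatching = pattern.match("is the kitchen cold")

print(matching is not None)     # True
print(nonMatching is not None)  # False: the text does not start with "how"
```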
#Here we define the regular expression templates
EXPRESSIONS = [("what.*temp.*kitchen.*", ["kitchen", "temperature"]),
               ("how.*hot.*kitchen.*", ["kitchen", "temperature"]),
               ("how.*cold.*kitchen.*", ["kitchen", "temperature"]),
               ("what.*temp.*bath.*", ["bathroom", "temperature"]),
               ("how.*hot.*bath.*", ["bathroom", "temperature"]),
               ("how.*cold.*bath.*", ["bathroom", "temperature"]),
               ("what.*temp.*living.*", ["livingroom", "temperature"]),
               ("how.*hot.*living.*", ["livingroom", "temperature"]),
               ("how.*cold.*living.*", ["livingroom", "temperature"]),
               ("what.*temp.*family.*", ["livingroom", "temperature"]),
               ("how.*hot.*family.*", ["livingroom", "temperature"]),
               ("how.*cold.*family.*", ["livingroom", "temperature"]),
               ("what.*temp.*bed.*", ["bedroom", "temperature"]),
               ("how.*hot.*bed.*", ["bedroom", "temperature"]),
               ("how.*cold.*bed.*", ["bedroom", "temperature"]),
               ("what.*temp.*dining.*", ["diningroom", "temperature"]),
               ("how.*hot.*dining.*", ["diningroom", "temperature"]),
               ("how.*cold.*dining.*", ["diningroom", "temperature"])]
#Here we define some indexes to keep track of the format of the expressions data-structure
REGEX = 0
LOGICAL_FORM = 1
LOGICAL_FORM_ROOM = 0
LOGICAL_FORM_ATTRIBUTE = 1
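The template table above repeats the same three question stems for every room keyword. As a sketch of one way to cut down on the typing, the same 18 entries can be built from two small tables. The names `QUESTION_STEMS`, `ROOM_KEYWORDS`, and `GENERATED_EXPRESSIONS` are hypothetical and are not used elsewhere in this tutorial.

```python
# Hypothetical helper tables (assumed names, not part of the tutorial code).
QUESTION_STEMS = ["what.*temp", "how.*hot", "how.*cold"]
ROOM_KEYWORDS = [("kitchen", "kitchen"),
                 ("bath", "bathroom"),
                 ("living", "livingroom"),
                 ("family", "livingroom"),
                 ("bed", "bedroom"),
                 ("dining", "diningroom")]

# Combine every room keyword with every question stem.
GENERATED_EXPRESSIONS = [(stem + ".*" + keyword + ".*", [room, "temperature"])
                         for keyword, room in ROOM_KEYWORDS
                         for stem in QUESTION_STEMS]

print(len(GENERATED_EXPRESSIONS))  # 18
print(GENERATED_EXPRESSIONS[0])    # ('what.*temp.*kitchen.*', ['kitchen', 'temperature'])
```

Writing the list out by hand, as we do in this tutorial, is easier to read at a glance. Generating it becomes more attractive as the number of rooms and phrasings grows.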
Again, let's get some input for our system, in the same format as the last tutorial.
inputString = "How cold is it in the kitchen?" # The question that we want to answer
systemInput = {"question": inputString,
               "history": []} # A simple data structure that holds the question and the history
Next, we are going to write a function to clean the input, in much the same way as we did in the last tutorial. This time, though, we aren't going to split the string into words; we are just going to clean the original text.
def cleanText(inputString):
    #First we convert the input to lower case
    loweredInput = inputString.lower()
    #Then we remove all the characters that are not alphanumeric or spaces
    cleanedInput = ""
    for character in loweredInput: #For every character in the lower-cased question
        if character.isalnum() or character == " ": #If it is a letter, a digit, or a space...
            cleanedInput += character #...then we add it to the cleaned input string, building it up character by character
        #Any other character (punctuation, for example) is ignored because we no longer need it
    #This is what our input question looks like now...
    print("cleanedInput:", cleanedInput)
    #Finally, return the cleaned input
    return cleanedInput
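Since this tutorial is about regular expressions, it is worth noting that the same cleaning step can be sketched with one. The function name `cleanTextWithRegex` is hypothetical. One difference to be aware of: this version keeps only ASCII letters, digits, and spaces, whereas `isalnum()` also accepts letters from other alphabets.

```python
import re

def cleanTextWithRegex(inputString):
    # Lower-case the text, then delete every character that is not
    # a lower-case ASCII letter, a digit, or a space.
    return re.sub(r"[^a-z0-9 ]", "", inputString.lower())

print(cleanTextWithRegex("How cold is it in the kitchen?"))  # how cold is it in the kitchen
```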
Now that we have our cleaned text, all we need to do is see whether it matches any of the templates that we have written. If it matches, then we (usually) know what the user is talking about, and if it doesn't, we can ask the user to rephrase their question.
Let's write a function to perform this task.
#We start by importing the python library for regular expressions
import re #You can find support for regular expressions in most modern programming languages
def extractLogicalForm(inputString):
    #Look through all the patterns that we have until we find one that matches
    for regex, logicalForm in EXPRESSIONS:
        compiledRegex = re.compile(regex) #Compile the pattern (convert it from a string to something python understands)
        result = compiledRegex.match(inputString) #Check to see if the regular expression matches the question, starting from its first character
        if result is not None: #If there is a match...
            print("We found a match!")
            print("\tRegex:", regex)
            print("\tLogical Form:", logicalForm)
            return logicalForm #Return the logical form so we are able to use it later
    print("We didn't find a match") #If we didn't find a match, say so
    return None
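One detail of the function above is easy to miss: `match` only succeeds when the pattern fits starting at the very first character of the text, so a question with a few extra words in front will not match. The sketch below contrasts it with `search`, which scans the whole text. The example question and variable names are made up for illustration, and whether the looser behaviour is what you want depends on your system.

```python
import re

pattern = re.compile("how.*cold.*kitchen.*")
question = "tell me how cold is it in the kitchen"

anchoredResult = pattern.match(question)   # Only tries the start of the text
scanningResult = pattern.search(question)  # Tries every starting position

print(anchoredResult is not None)  # False
print(scanningResult is not None)  # True
```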
Next we are going to define a dummy function that returns a fake value when asked for the temperature of a specific room. This is the same dummy function from the previous tutorial. Again, you can replace its contents with whatever is needed to actually look up the temperature.
def getAttributeValue(room, attribute):
    if room is not None and attribute is not None:
        if room == "livingroom" and attribute == "temperature":
            return 72
        elif room == "bathroom" and attribute == "temperature":
            return 73
        elif room == "kitchen" and attribute == "temperature":
            return 81
        elif room == "bedroom" and attribute == "temperature":
            return 68
        elif room == "diningroom" and attribute == "temperature":
            return 79
        else:
            return 50
    else:
        raise Exception("There was an error parsing the sentence, got: " + str(room) + ", " + str(attribute))
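If you do replace the dummy function, one possible shape for it is a table lookup rather than a chain of `if` statements. This is only a sketch: the names `FAKE_READINGS`, `DEFAULT_READING`, and `lookUpAttributeValue` are hypothetical, and the values are the same fake ones used above.

```python
# Hypothetical lookup table holding the same fake readings as the dummy function.
FAKE_READINGS = {("livingroom", "temperature"): 72,
                 ("bathroom", "temperature"): 73,
                 ("kitchen", "temperature"): 81,
                 ("bedroom", "temperature"): 68,
                 ("diningroom", "temperature"): 79}
DEFAULT_READING = 50  # Returned for any room/attribute pair that is not in the table

def lookUpAttributeValue(room, attribute):
    if room is None or attribute is None:
        raise ValueError("There was an error parsing the sentence, got: "
                         + str(room) + ", " + str(attribute))
    return FAKE_READINGS.get((room, attribute), DEFAULT_READING)

print(lookUpAttributeValue("kitchen", "temperature"))  # 81
print(lookUpAttributeValue("garage", "temperature"))   # 50
```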
Now that we have everything, we can plug the input into our functions and get the answer that we want.
rawInputString = systemInput["question"]
print("Got input:", rawInputString)
cleanedInput = cleanText(rawInputString)
extractedLogicalForm = extractLogicalForm(cleanedInput)
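if extractedLogicalForm is None: #No template matched, so ask the user to rephrase
    print("Sorry, I didn't understand that. Could you rephrase your question?")
else:
    targetAttribute = extractedLogicalForm[LOGICAL_FORM_ATTRIBUTE]
    targetRoom = extractedLogicalForm[LOGICAL_FORM_ROOM]
    print("The", targetAttribute, "in the", targetRoom, "is", getAttributeValue(targetRoom, targetAttribute))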
As was the case with the last tutorial, you should be able to see how this system can fail. Although it is looser and can handle more varied queries, you still need to enumerate every query you expect beforehand and write a regular expression for each one, which can be daunting in a real-world system. Spelling still matters greatly: if a keyword is misspelled, no template matches and the system can't handle the question.
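The sketch below shows both kinds of failure on a small scale. To keep it self-contained it uses only the three kitchen templates, and the names `KITCHEN_EXPRESSIONS` and `findLogicalForm` are hypothetical stand-ins for the full table and function used earlier.

```python
import re

# A one-room subset of the templates, just for illustration.
KITCHEN_EXPRESSIONS = [("what.*temp.*kitchen.*", ["kitchen", "temperature"]),
                       ("how.*hot.*kitchen.*", ["kitchen", "temperature"]),
                       ("how.*cold.*kitchen.*", ["kitchen", "temperature"])]

def findLogicalForm(cleanedQuestion):
    # Return the logical form of the first template that matches, or None.
    for regex, logicalForm in KITCHEN_EXPRESSIONS:
        if re.match(regex, cleanedQuestion) is not None:
            return logicalForm
    return None

print(findLogicalForm("how cold is it in the kitchen"))  # ['kitchen', 'temperature']
print(findLogicalForm("how cold is it in the kitchin"))  # None: misspelled keyword
print(findLogicalForm("is the kitchen warm right now"))  # None: phrasing we never wrote a template for
```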
How would you handle an input such as "How hot is it in the place where I typically cook?"