MySQL - Natural Language Fulltext Search



Before we fully get into the concept of Natural Language Full-text Search, let us try to understand the context of it. Nowadays, the keywords used for searches might not always match the results that users expect. So search engines are designed to focus on increasing search relevance to reduce the accuracy gap between search queries and search results. Thus, results are displayed in order of most relevance to the search keyword.

Similarly, in relational databases like MySQL, full-text search is a technique used to retrieve result-sets that might not perfectly match the search keyword. There are three types of search modes used with full-text search −

  • Natural Language Mode

  • Query Expansion Mode

  • Boolean Mode

The Natural Language Full-text search performs the usual Full-text search in the IN NATURAL LANGUAGE mode. When a Full-text search is performed in this mode, the search results are displayed in the order of their relevance to the keyword (against which this search is performed). This is the default mode for the Full-text search.

Since this is a Full-text search, the FULLTEXT indexes must be applied on text-based columns (like CHAR, VARCHAR, TEXT datatype columns). The FULLTEXT index is a special type of index that is used to search for the keywords in the text values instead of trying to compare the keyword with these column values.

Syntax

Following is the basic syntax to perform the Natural Language Full-text Search −

SELECT * FROM table_name 
WHERE MATCH(column_name(s)) 
AGAINST ('keyword_name' IN NATURAL LANGUAGE MODE);

Example

Let us understand how to perform Natural Language Full-text Search on a database table in the following example.

For that, we will first create a table named ARTICLES containing the title and description of an article. The FULLTEXT index is applied on text columns article_title and descriptions as shown below −

CREATE TABLE ARTICLES (
   ID INT AUTO_INCREMENT NOT NULL PRIMARY KEY,
   ARTICLE_TITLE VARCHAR(100),
   DESCRIPTION TEXT,
   FULLTEXT (ARTICLE_TITLE, DESCRIPTION)
) ENGINE = InnoDB;

Now, let us insert details about articles, like their titles and DESCRIPTION, into this table using the following queries −

INSERT INTO ARTICLES (ARTICLE_TITLE, DESCRIPTION) VALUES 
('MySQL Tutorial', 'MySQL is a relational database system that uses SQL to structure data stored'),
('Java Tutorial', 'Java is an object-oriented and platform-independent programming language'),
('Hadoop Tutorial', 'Hadoop is framework that is used to process large sets of data'),
('Big Data Tutorial', 'Big Data refers to data that has wider variety of data sets in larger numbers'),
('JDBC Tutorial', 'JDBC is a Java based technology used for database connectivity');

The table is created is as follows −

ID ARTICLE_TITLE DESCRIPTION
1 MySQL Tutorial MySQL is a relational database system that uses SQL to structure data stored
2 Java Tutorial Java is an object-oriented and platform-independent programming language
3 Hadoop Tutorial Hadoop is framework that is used to process large sets of data
4 Big Data Tutorial Big Data refers to data that has wider variety of data sets in larger numbers
5 JDBC Tutorial JDBC is a Java based technology used for database connectivity

Using the Natural Language Mode in Full-text search, search for records of articles relevant to data, with the keyword 'data set'.

SELECT * FROM ARTICLES 
WHERE MATCH(ARTICLE_TITLE, DESCRIPTION) 
AGAINST ('data set' IN NATURAL LANGUAGE MODE);

Output

Following is the output −

ID ARTICLE_TITLE DESCRIPTION
4 Big Data Tutorial Big Data refers to data that has wider variety of data sets in larger numbers
1 MySQL Tutorial MySQL is a relational database system that uses SQL to structure data stored
3 Hadoop Tutorial Hadoop is framework that is used to process large sets of data

As we see above, among all the articles present in the table, three search results are obtained which are relevant to the term 'data set' and are arranged in the order of their relevance. But note how keyword 'data set' is not a perfect match in the 'MySQL Tutorial' article record and its still retrieved because MySQL deals with data sets as well.

The Natural Language Full-text Search uses tf-idf algorithm, where 'tf' refers to term frequency and 'idf' is inverse document frequency. The search refers to the frequency of a word in a single document, and the number of documents the word is present in. However, there are some words that the search usually ignores, like words having less than certain characters. InnoDB ignores words with less than 3 characters while MyISAM ignores words less than 4 characters. Such words are known as Stopwords (the, a, an, are etc.).

Example

In the following example, we are performing a simple Natural Language Full-text Search on the ARTICLES Table created above. Let us see how stop words impact the Full-text search by performing it against two keywords: 'Big Tutorial' and 'is Tutorial'.

Searching 'Big Tutorial':

Following query performs the full-text search in Natural Language Mode against 'Big Tutorial' keyword −

SELECT ARTICLE_TITLE, DESCRIPTION FROM ARTICLES 
WHERE MATCH(ARTICLE_TITLE, DESCRIPTION)
AGAINST ('Big Tutorial' IN NATURAL LANGUAGE MODE);

Output:

The output is obtained as −

ARTICLE_TITLE DESCRIPTION
Big Data Tutorial Big Data refers to data that has wider variety of data sets in larger numbers
MySQL Tutorial MySQL is a relational database system that uses SQL to structure data stored
Java Tutorial Java is an object-oriented and platform-independent programming language
Hadoop Tutorial Hadoop is framework that is used to process large sets of data
JDBC Tutorial JDBC is a Java based technology used for database connectivity

Searching 'is Tutorial':

Following query performs the full-text search in Natural Language Mode against 'is Tutorial' keyword −

SELECT ARTICLE_TITLE, DESCRIPTION FROM Articles 
WHERE MATCH(ARTICLE_TITLE, DESCRIPTION)
AGAINST ('is Tutorial' IN NATURAL LANGUAGE MODE);

Output:

The output is obtained as −

ARTICLE_TITLE DESCRIPTION
MySQL Tutorial MySQL is a relational database system that uses SQL to structure data stored
Java Tutorial Java is an object-oriented and platform-independent programming language
Hadoop Tutorial Hadoop is framework that is used to process large sets of data
Big Data Tutorial Big Data refers to data that has wider variety of data sets in larger numbers
JDBC Tutorial JDBC is a Java based technology used for database connectivity

As we see in the example above, since the word 'Tutorial' is present in all the records of the table, all of them are retrieved in both cases. However, the order of relevance is determined by the second word of the keyword specified.

In the first case, as the word 'Big' is present in 'Big Data Tutorial', that record is retrieved first. In the second case, the order of records in the result-set are the same as that of original table since the word 'is' is a stop word, so it is ignored.

Natural-language-Fulltext-search Using a Client Program

We can also Perform Natural-language-fulltext-search operation on a MySQL database using the client program.

Syntax

To perform the Natural-language-Fulltext-search through a PHP program, we need to execute the following SELECT statement using the mysqli function query() as follows −

$sql = "SELECT * FROM Articles WHERE MATCH(ARTICLE_TITLE, DESCRIPTION)  AGAINST ('data set' IN NATURAL LANGUAGE MODE)";
$mysqli->query($sql);

To perform the Natural-language-Fulltext-search through a JavaScript program, we need to execute the following SELECT statement using the query() function of mysql2 library as follows −

sql = `SELECT * FROM Articles  WHERE MATCH(ARTICLE_TITLE, DESCRIPTION) AGAINST ('data set' IN NATURAL LANGUAGE MODE)`;
con.query(sql);

To perform the Natural-language-Fulltext-search through a Java program, we need to execute the SELECT statement using the JDBC function executeQuery() as follows −

String sql = "SELECT * FROM Articles WHERE MATCH(ARTICLE_TITLE, DESCRIPTION)  AGAINST ('data set' IN NATURAL LANGUAGE MODE)";
statement.executeQuery(sql);

To perform the Natural-language-Fulltext-search through a python program, we need to execute the SELECT statement using the execute() function of the MySQL Connector/Python as follows −

natural_language_search_query = 'SELECT * FROM Articles WHERE MATCH(ARTICLE_TITLE, DESCRIPTION) AGAINST ('data set' IN NATURAL LANGUAGE MODE)'
cursorObj.execute(natural_language_search_query)

Example

Following are the programs −

$dbhost = 'localhost';
$dbuser = 'root';
$dbpass = 'password';
$dbname = 'TUTORIALS';
$mysqli = new mysqli($dbhost, $dbuser, $dbpass, $dbname);
if ($mysqli->connect_errno) {
    printf("Connect failed: %s
", $mysqli->connect_error); exit(); } // printf('Connected successfully.
'); $s = "SELECT * FROM Articles WHERE MATCH(ARTICLE_TITLE, DESCRIPTION) AGAINST ('data set' IN NATURAL LANGUAGE MODE)"; if ($r = $mysqli->query($s)) { printf("Table Records: \n"); while ($row = $r->fetch_assoc()) { printf(" ID: %d, Title: %s, Descriptions: %s", $row["id"], $row["ARTICLE_TITLE"], $row["DESCRIPTION"]); printf("\n"); } } else { printf('Failed'); } $mysqli->close();

Output

The output obtained is as shown below −

Table Records:
ID: 4, Title: Big Data Tutorial, Descriptions: Big Data refers to data that has wider variety of data sets in larger numbers
ID: 1, Title: MySQL Tutorial, Descriptions: MySQL is a relational database system that uses SQL to structure data stored
ID: 3, Title: Hadoop Tutorial, Descriptions: Hadoop is framework that is used to process large sets of data   
var mysql = require("mysql2");
var con = mysql.createConnection({
  host: "localhost",
  user: "root",
  password: "password",
}); //Connecting to MySQL

con.connect(function (err) {
  if (err) throw err;
  //   console.log("Connected successfully...!");
  //   console.log("--------------------------");
  sql = "USE TUTORIALS";
  con.query(sql);

  //display the table details!...
  sql = `SELECT * FROM Articles  WHERE MATCH(ARTICLE_TITLE, DESCRIPTION)  AGAINST ('data set' IN NATURAL LANGUAGE MODE)`;
  con.query(sql, function (err, result) {
    if (err) throw err;
    console.log(result);
  });
});    

Output

The output obtained is as shown below −

We get the following output, after executing the above NodeJs Program.
[
  {
    id: 4,
    ARTICLE_TITLE: 'Big Data Tutorial',
    DESCRIPTION: 'Big Data refers to data that has wider variety of data sets in larger numbers'
  },
  {
    id: 1,
    ARTICLE_TITLE: 'MySQL Tutorial',
    DESCRIPTION: 'MySQL is a relational database system that uses SQL to structure data stored'
  },
  {
    id: 3,
    ARTICLE_TITLE: 'Hadoop Tutorial',
    DESCRIPTION: 'Hadoop is framework that is used to process large sets of data'
  }
]  
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class NaturalLanguageSearch {
   public static void main(String[] args) {
      String url = "jdbc:mysql://localhost:3306/TUTORIALS";
      String username = "root";
      String password = "password";
      try {
         Class.forName("com.mysql.cj.jdbc.Driver");
         Connection connection = DriverManager.getConnection(url, username, password);
         Statement statement = connection.createStatement();
         System.out.println("Connected successfully...!");

         //displaying the fulltext records in the Natural language mode:
         ResultSet resultSet = statement.executeQuery("SELECT * FROM Articles WHERE MATCH(ARTICLE_TITLE, descriptions)  AGAINST ('data set' IN NATURAL LANGUAGE MODE)");
         while (resultSet.next()){
            System.out.println(resultSet.getString(1)+" "+resultSet.getString(2)+ " "+resultSet.getString(3));
         }
         connection.close();
      } catch (Exception e) {
         System.out.println(e);
      }
   }
}          

Output

The output obtained is as shown below −

Connected successfully...!
4 Big Data Tutorial Big Data refers to data that has wider variety of data sets in larger numbers
1 MySQL Tutorial MySQL is a relational database system that uses SQL to structure data stored
3 Hadoop Tutorial Hadoop is framework that is used to process large sets of data
import mysql.connector
# Establishing the connection
connection = mysql.connector.connect(
   host='localhost',
   user='root',
   password='password',
   database='tut'
)
# Creating a cursor object
cursorObj = connection.cursor()
natural_language_search_query = '''
SELECT * FROM Articles 
WHERE MATCH(ARTICLE_TITLE, DESCRIPTION) 
AGAINST ('data set' IN NATURAL LANGUAGE MODE)
'''
cursorObj.execute(natural_language_search_query)
# Fetching all the results
results = cursorObj.fetchall()
# Display the result
print("NATURAL LANGUAGE search results:")
for row in results:
   print(row)
cursorObj.close()
connection.close()            

Output

The output obtained is as shown below −

NATURAL LANGUAGE search results:
(4, 'Big Data Tutorial', 'Big Data refers to data that has wider variety of data sets in larger numbers')
(1, 'MySQL Tutorial', 'MySQL is a relational database system that uses SQL to structure data stored')
(3, 'Hadoop Tutorial', 'Hadoop is framework that is used to process large sets of data')
Advertisements