Hive Interview Questions



Dear readers, these Hive Interview Questions have been designed specially to get you acquainted with the nature of questions you may encounter during your interview for the subject of Hive. As per my experience good interviewers hardly plan to ask any particular question during your interview, normally questions start with some basic concept of the subject and later they continue based on further discussion and what you answer −

There are two types. Managed table and external table. In managed table both the data an schema in under control of hive but in external table only the schema is under control of Hive.

No Hive does not provide insert and update at row level. So it is not suitable for OLTP system.

Alter Table table_name RENAME TO new_name

Using REPLACE column option

ALTER TABLE table_name REPLACE COLUMNS ……

It is a relational database storing the metadata of hive tables, partitions, Hive databases etc

Depending on the nature of data the user has, the inbuilt SerDe may not satisfy the format of the data. SO users need to write their own java code to satisfy their data format requirements.

Hive is a tool in Hadoop ecosystem which provides an interface to organize and query data in a databse like fashion and write SQL like queries. It is suitable for accessing and analyzing data in Hadoop using SQL syntax.

hdfs://namenode_server/user/hive/warehouse

  • Local mode
  • Distributed mode
  • Pseudodistributed mode

Yes. The TIMESTAMP data types stores date in java.sql.timestamp format

There are three collection data types in Hive.

  • ARRAY
  • MAP
  • STRUCT

Yes, using the ! mark just before the command.

For example !pwd at hive prompt will list the current directory.

The hive variable is variable created in the Hive environment that can be referenced by Hive scripts. It is used to pass some values to the hive queries when the query starts executing.

Using the source command.

Example −

Hive> source /path/to/file/file_with_query.hql

It is a file containing list of commands needs to run when the hive CLI starts. For example setting the strict mode to be true etc.

The default record delimiter is − \n

And the filed delimiters are − \001,\002,\003

The schema is validated with the data when reading the data and not enforced when writing data.

SHOW DATABASES LIKE ‘p.*’

With the use command you fix the database on which all the subsequent hive queries will run.

There is no way you can delete the DBPROPERTY.

It sets the mapreduce jobs to strict mode.By which the queries on partitioned tables can not run without a WHERE clause. This prevents very large job running for long time.

This can be done with following query

SHOW PARTITIONS table_name PARTITION(partitioned_column=’partition_value’)

org.apache.hadoop.mapred.TextInputFormat

org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

When we issue the command DROP TABLE IF EXISTS table_name

Hive throws an error if the table being dropped does not exist in the first place.

The data stays in the old location. It has to be moved manually.

ALTER TABLE table_name
CHANGE COLUMN new_col  INT
BEFORE x_col

No. It only reduces the number of files which becomes easier for namenode to manage.

By using the ENABLE OFFLINE clause with ALTER TABLE atatement.

By Omitting the LOCAL CLAUSE in the LOAD DATA statement.

The new incoming files are just added to the target directory and the existing files are simply overwritten. Other files whose name does not match any of the incoming files will continue to exist.

If you add the OVERWRITE clause then all the existing data in the directory will be deleted before new data is written.

It creates partition on table employees with partition values coming from the columns in the select clause. It is called Dynamic partition insert.

A table generating function is a function which takes a single column as argument and expands it to multiple column or rows. Example exploe()

If we set the property hive.exec.mode.local.auto to true then hive will avoid mapreduce to fetch query results.

The LIKE operator behaves the same way as the regular SQL operators used in select queries. Example −

street_name like ‘%Chi’

But the RLIKE operator uses more advance regular expressions which are available in java

Example − street_name RLIKE ‘.*(Chi|Oho).*’ which will select any word which has either chi or oho in it.

No. As this kind of Join can not be implemented in mapreduce

In a join query the smallest table to be taken in the first position and largest table should be taken in the last position.

It controls ho wthe map output is reduced among the reducers. It is useful in case of streaming data

Select cast(price as FLOAT)

Hive will return NULL

No. The name of a view must be unique whne compared to all other tables and views present in the same database.

No. A view can not be the target of a INSERT or LOAD statement.

Indexes occupies space and there is a processing cost in arranging the values of the column on which index is cerated.

SHOW INDEX ON table_name

This will list all the indexes created on any of the columns in the table table_name.

The values in a column are hashed into a number of buckets which is defined by user. It is a way to avoid too many partitions or nested partitions while ensuring optimizes query output.

It is query hint to stream a table into memory before running the query. It is a query optimization Technique.

Yes. A partition can be archived. Advantage is it decreases the number of files stored in namenode and the archived file can be queried using hive. The disadvantage is it will cause less efficient query and does not offer any space savings.

It is a UDF which is created using a java program to server some specific need not covered under the existing functions in Hive. It can detect the type of input argument programmatically and provide appropriate response.

The local inpath should contain a file and not a directory. The $env:HOME is a valid variable available in the hive environment.

The TBLPROPERTIES clause is used to add the creator name while creating a table.

The TBLPROPERTIES is added like −

TBLPROPERTIES(‘creator’= ‘Joan’)

What is Next ?

Further you can go through your past assignments you have done with the subject and make sure you are able to speak confidently on them. If you are fresher then interviewer does not expect you will answer very complex questions, rather you have to make your basics concepts very strong.

Second it really doesn't matter much if you could not answer few questions but it matters that whatever you answered, you must have answered with confidence. So just feel confident during your interview. We at tutorialspoint wish you best luck to have a good interviewer and all the very best for your future endeavor. Cheers :-)

hive_questions_answers.htm
Advertisements