Skip to main content

Lab – More Hive Queries and Hive Built-In functions

Objectives

This lab builds on the last lab. We will analyse the data uploaded to Hive using some HQL queries

Deliverable

Screenshot of each query results. One for each query. Each query should be fully displayed. No need to show the entire query result. Partial display of the result is sufficient.
  1. Open the Hive query editor in Cloudera
  2. Select all column from the salesdata table
  3. Select orderdate, salesamount and rowid from salesdata table and show 100 records.
  4. Modify the above step to show orderdate column heading as OrderDate, salesamount as Sales, and rowed as RowNum
  5. Create a new hive query to find OrderMonth, the total number of orders, total sales, average sales, minimum sales and maximum sales for each month. Sort so that newer sales statistics are at the top.  
  6. Modify the query you created for question 5 to show breakdown for each category
  7. Use GROUPING SETS to modify the above query to show the statistics for each category and for each ordermonth.
  8. Create a new query to extract ordered, orderdate, quantity, rate, discountpct, QuoteAmt and QuoteAmtRound.
    1. QuoteAmt = quantity*rate*(1-discountpct)
    2. QuoteAmtRound = quantity*rate*(1-discountpct)
Select orderid, orderdate, quantity, rate, discountpct,
   quantity*rate*(1-discountpct) as QuoteAmt,
   round(quantity*rate*(1-discountpct)) as QuoteAmtRound
from salesdata


  1. Use rand, floor and ceiling function to simulate new data. Create a query that has a random number, salesamount, ordered, wagemargin, Random sales amount, wage margin floor and wage margin ceiling column.
select rand(), saleamount, orderid, wagemargin,
   round(saleamount*rand()) as RandSaleAmount,
   floor(wagemargin) as WageMarginFlr,
   ceiling(wagemargin) as WageMarginCl
from salesdata


  1. Using the datediff function find the difference between two dates.  Create a query that has a ordered, productcategory, productsubcategory, orderdate, projectcompletedate, and duration.
Select orderid, productcategory, productsubcategory,
   orderdate, projectcompletedate,
   datediff(projectcompletedate, orderdate) as duration
from salesData


  1. Generate some statistics by month using year, month, and avg functions. Create a query that has productcategory,  productsubcategory, yearofOrder, monthofOrder, and Average duration.
select
   productcategory, productsubcategory,
   year(orderdate) year, month(orderdate) month,
   avg(datediff(projectcompletedate, orderdate)) Avgduration
from salesdata
group by    
   productcategory,productsubcategory,
   year(orderdate), month(orderdate)
order by
   3,4
  1. Find the last day of the month using data_sub(), to_date(), cast(), and concat() functions.
select distinct
date_sub(to_date(
concat(cast(year(orderdate) as string),"-",cast(month(orderdate)+1 as string),"-01")
)
,1 ),
orderdate
from salesdata
limit 100;


  1. Identify large sales with IF function.  
select orderid, saleamount,
if(saleamount > 5000, 1, 0) as LargeSale
from salesdata
limit 1000;


  1. Create sales size categories  
select orderid, saleamount,
   case
       when saleamount > 5000 then 'large'
       when saleamount > 1000 then 'medium'
       else 'small'
   end as SalesSize
from salesdata
limit 1000;


  1. Perform analysis by reassigning regions. Find yearly total sales for regions - southwest(south and west), east,  and central.
select  
case
    when lower(region) = 'west' then 'southwest'
    when lower(region) = 'south' then 'southwest'
    else region
end as newregion,
year(orderdate) as y,
sum(saleamount) as TotalSales
from salesdata
group by
case
    when lower(region) = 'west' then 'southwest'
    when lower(region) = 'south' then 'southwest'
    else region
end,
year(orderdate)
limit 100;


Comments

Popular posts from this blog

A Comprehensive Evaluation of the Internal Consulting Process: Steps and Considerations

Introduction Internal consulting has emerged as a critical function within organizations, offering in-house expertise to solve complex business problems and drive change. It closely mirrors external consulting in methodology but is differentiated by the consultant's intimate knowledge of the organization and a vested interest in its long-term success. This article aims to evaluate the key steps involved in the internal consulting process, offering insights into each phase's significance and challenges. Steps in the Internal Consulting Process The internal consulting process can generally be segmented into five distinct stages: Initial Assessment, Data Collection and Analysis, Solution Development, Implementation, and Evaluation. Below is an evaluation of each step: Step 1: Initial Assessment Objective: To understand the problem or opportunity area and define the scope of the project. Significance: A well-defined scope ensures that the consulting project stays focused and manage...

The Evolving Landscape of Consulting Practice: Changes and Implications

Introduction Consulting is a field that thrives on its ability to adapt to market demands and emerging trends. As businesses evolve due to technological advancements, shifts in consumer behavior, and fluctuations in global markets, consulting practices must keep pace. This article explores some of the significant changes currently transforming the consulting industry and discusses their implications for both consultants and clients. Technological Disruption Data Analytics and Artificial Intelligence Consulting firms are increasingly integrating data analytics and artificial intelligence into their service offerings. These technologies allow consultants to offer data-driven insights that can significantly enhance strategic decision-making. This evolution means consultants now need skills in data interpretation and analysis, alongside their traditional expertise in business strategy. Virtual Consulting Platforms The advent of digital platforms enables consulting services to be offered re...

The Imperative of Transition Structure in Implementing Change and A Model for Effective Transition

Introduction Organizational change is an inevitable phenomenon in the dynamic business landscape of today. While the conception of change is significant, its successful implementation is even more crucial. One key factor that often determines the success of implementing change is the presence of a well-designed transition structure. This article aims to discuss the necessity of having a transition structure in place and proposes a model to effectively guide the transition during organizational change. The Need for a Transition Structure Aligning Stakeholders Any significant change involves a variety of stakeholders, from senior management to front-line employees. A transition structure ensures that all parties are aligned, understand their roles, and are committed to the objectives of the change. Mitigating Risks Change often comes with risks, such as resistance from employees, potential loss in productivity, or lapses in quality. A structured approach can help mitigate these risks by ...