Problem Scenario 36 : You have been given a file named spark8/data.csv (type,name).
data.csv
1,Lokesh
2,Bhupesh
2,Amit
2,Ratan
2,Dinesh
1,Pavan
1,Tejas
2,Sheela
1,Kumar
1,Venkat
1. Load this file from hdfs and save it back as (id, (all names of same type)) in results directory. However, make sure while saving it should be
Problem Scenario 87 : You have been given below three files
product.csv (Create this file in hdfs)
productID,productCode,name,quantity,price,supplierid
1001,PEN,Pen Red,5000,1.23,501
1002,PEN,Pen Blue,8000,1.25,501
1003,PEN,Pen Black,2000,1.25,501
1004,PEC,Pencil 2B,10000,0.48,502
1005,PEC,Pencil 2H,8000,0.49,502
1006,PEC,Pencil HB,0,9999.99,502
2001,PEC,Pencil 3B,500,0.52,501
2002,PEC,Pencil 4B,200,0.62,501
2003,PEC,Pencil 5B,100,0.73,501
2004,PEC,Pencil 6B,500,0.47,502
supplier.csv
supplierid,name,phone
501,ABC Traders,88881111
502,XYZ Company,88882222
503,QQ Corp,88883333
products_suppliers.csv
productID,supplierID
2001,501
2002,501
2003,501
2004,502
2001,503
Now accomplish all the queries given in solution.
Select product, its price , its supplier name where product price is less than 0.6 using SparkSQL
Problem Scenario 65 : You have been given below code snippet.
val a = sc.parallelize(List("dog", "cat", "owl", "gnu", "ant"), 2)
val b = sc.parallelize(1 to a.count.tolnt, 2)
val c = a.zip(b)
operation1
Write a correct code snippet for operationl which will produce desired output, shown below.
Array[(String, Int)] = Array((owl,3), (gnu,4), (dog,1), (cat,2>, (ant,5))
Problem Scenario 82 : You have been given table in Hive with following structure (Which you have created in previous exercise).
productid int code string name string quantity int price float
Using SparkSQL accomplish following activities.
1. Select all the products name and quantity having quantity <= 2000
2. Select name and price of the product having code as 'PEN'
3. Select all the products, which name starts with PENCIL
4. Select all products which "name" begins with 'P\ followed by any two characters, followed by space, followed by zero or more characters
Problem Scenario 37 : ABCTECH.com has done survey on their Exam Products feedback using a web based form. With the following free text field as input in web ui.
Name: String
Subscription Date: String
Rating : String
And servey data has been saved in a file called spark9/feedback.txt
Christopher|Jan 11, 2015|5
Kapil|11 Jan, 2015|5
Thomas|6/17/2014|5
John|22-08-2013|5
Mithun|2013|5
Jitendra||5
Write a spark program using regular expression which will filter all the valid dates and save in two separate file (good record and bad record)
Problem Scenario 45 : You have been given 2 files , with the content as given Below
(spark12/technology.txt)
(spark12/salary.txt)
(spark12/technology.txt)
first,last,technology
Amit,Jain,java
Lokesh,kumar,unix
Mithun,kale,spark
Rajni,vekat,hadoop
Rahul,Yadav,scala
(spark12/salary.txt)
first,last,salary
Amit,Jain,100000
Lokesh,kumar,95000
Mithun,kale,150000
Rajni,vekat,154000
Rahul,Yadav,120000
Write a Spark program, which will join the data based on first and last name and save the joined results in following format, first Last.technology.salary
Problem Scenario 42 : You have been given a file (sparklO/sales.txt), with the content as given in below.
spark10/sales.txt
Department,Designation,costToCompany,State
Sales,Trainee,12000,UP
Sales,Lead,32000,AP
Sales,Lead,32000,LA
Sales,Lead,32000,TN
Sales,Lead,32000,AP
Sales,Lead,32000,TN
Sales,Lead,32000,LA
Sales,Lead,32000,LA
Marketing,Associate,18000,TN
Marketing,Associate,18000,TN
HR,Manager,58000,TN
And want to produce the output as a csv with group by Department,Designation,State with additional columns with sum(costToCompany) and TotalEmployeeCountt
Should get result like
Dept,Desg,state,empCount,totalCost
Sales,Lead,AP,2,64000
Sales.Lead.LA.3.96000
Sales,Lead,TN,2,64000
Problem Scenario GG : You have been given below code snippet.
val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "spider", "eagle"), 2)
val b = a.keyBy(_.length)
val c = sc.parallelize(List("ant", "falcon", "squid"), 2)
val d = c.keyBy(.length)
operation 1
Write a correct code snippet for operationl which will produce desired output, shown below. Array[(lnt, String)] = Array((4,lion))
PDF + Testing Engine |
---|
$59.6 |
Testing Engine |
---|
$51.6 |
PDF (Q&A) |
---|
$39.6 |
Cloudera Free Exams |
---|
|