Develop required queries using Spark SQL on AWS EMR

Here are the queries to process the semi-structured JSON data using Spark SQL. Spark SQL has the ability to query files directly by providing their path in a SELECT query.

The column order_items is of type string and has a JSON array stored in it. We can convert a string that contains a JSON array to a Spark Metastore array using the from_json function of Spark SQL. However, we need to make sure to specify the schema as the second argument while invoking from_json on the order_items column in our data set. We can convert order_items to a Spark Metastore array using from_json as below.

SELECT order_id, order_date, order_customer_id, order_status,
    explode_outer(from_json(order_items, 'array>')) AS order_item

Here is the final query, which has the core logic to compute monthly revenue considering only COMPLETE or CLOSED orders.

SELECT date_format(order_date, 'yyyy-MM') AS order_month,
    round(sum(order_item.order_item_subtotal), 2) AS revenue
FROM (
    SELECT order_id, order_date, order_customer_id, order_status,
        explode_outer(from_json(order_items, 'array>')) AS order_item
)
WHERE order_status IN ('COMPLETE', 'CLOSED')
GROUP BY 1
ORDER BY 1

Develop the DBT Models using Spark on AWS EMR

Let us go ahead and set up the project to develop the required DBT models to compute monthly revenue. We'll break the overall logic to compute monthly revenue into 2 dependent DBT models. Here are the steps involved to complete the development process.

1. Run the example models and confirm that the project is set up successfully.
2. Update the project file (change the project name and also make the required changes related to the models).
3. Develop the required DBT models with the core logic.

The first model, order_details_exploded.sql, preserves the logic for exploded order details in the form of a view. The second model, monthly_revenue.sql, preserves the results in a table at the specified S3 location.
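A minimal sketch of what the first model, order_details_exploded.sql, might look like as a dbt model materialized as a view. The source name `retail.orders` and the struct fields other than order_item_subtotal are assumptions for illustration; the original post does not show the full schema or the input location.

```sql
-- order_details_exploded.sql (sketch; source name and full struct schema are assumed)
{{ config(materialized='view') }}

SELECT order_id,
       order_date,
       order_customer_id,
       order_status,
       -- parse the JSON array string into an array of structs, then explode
       -- one row per order item (explode_outer keeps orders with no items)
       explode_outer(from_json(order_items,
           'array<struct<order_item_id: BIGINT, order_item_subtotal: DOUBLE>>')) AS order_item
FROM {{ source('retail', 'orders') }}  -- hypothetical dbt source
```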
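A corresponding sketch of the second model, monthly_revenue.sql, materialized as a table written to an S3 location. With the dbt-spark adapter, `location_root` controls where the table's data files land; the bucket path below is a placeholder, not the post's actual location.

```sql
-- monthly_revenue.sql (sketch; S3 path is a placeholder)
{{ config(
    materialized='table',
    location_root='s3://your-bucket/warehouse'
) }}

SELECT date_format(order_date, 'yyyy-MM') AS order_month,
       round(sum(order_item.order_item_subtotal), 2) AS revenue
FROM {{ ref('order_details_exploded') }}  -- depends on the first model
WHERE order_status IN ('COMPLETE', 'CLOSED')
GROUP BY 1
ORDER BY 1
```

Using `ref('order_details_exploded')` is what makes the two models dependent: dbt builds the view first and resolves the reference to its fully qualified name.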
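As a quick, self-contained illustration of why the schema must be passed as the second argument to from_json, the following statement parses a JSON array literal (the literal itself is made up for this example) into an array of structs:

```sql
-- from_json turns the string into array<struct<...>> rather than leaving it as a plain string
SELECT from_json('[{"order_item_subtotal": 299.98}]',
                 'array<struct<order_item_subtotal: DOUBLE>>') AS order_item_array;
```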