Sunday, January 26, 2020

Concepts And Technology Of Data ETL Computer Science Essay

Extraction-Transformation-Loading (ETL) is the process of moving data from various sources into a data warehouse. In this research we analyze the concept of ETL, using Microsoft SSIS (SQL Server Integration Services) as the basis of the research. The research explains the specific steps: (a) extracting data from one or more external data sources, (b) transforming data to ensure consistency and satisfy business requirements, and (c) loading data into the resultant data warehouse. An in-depth analysis of the Microsoft SSIS tools that support the ETL process is included, covering (a) the Data Flow Engine, (b) the Scripting Environment and (c) the Data Profiler.

Key Words: ETL process, Microsoft SQL Server Integration Services, SSIS.

1. Introduction

ETL is the most important process in a Business Intelligence (BI) project [1]. When international companies such as Toyota want to reallocate resources, the resources must be reallocated wisely. Consolidating data into useful information from multiple regions such as Japan, the US and the UK is difficult for many reasons, including overlapping and inconsistent relationships among the regional companies. For example, the method of storing a name differs between the companies: in Japan it is stored as T.Yoon Wah, in the US as Yoon Wah Thoo, and in the UK as YW.Thoo. When such data are combined to generate useful information, this can lead to inconsistent data. To solve the problem, a star-schema/snowflake-schema data warehouse takes the data from many transactional systems and copies the data into a common format, with a relational database design completely different from that of a transactional system, containing many star schema configurations [7].
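The three regional name formats above illustrate the kind of inconsistency an ETL transformation must resolve. As a rough sketch (in Python rather than SSIS, with hypothetical parsing rules inferred only from the three example formats in the text), each format can be reduced to a common matching key of given-name initials plus surname initial:

```python
def match_key(raw: str, region: str) -> str:
    """Reduce each regional name format to a comparable key
    '<given-name initials>.<surname initial>'.

    Hypothetical rules, inferred from the examples in the text:
      JP stores 'T.Yoon Wah'   (surname initial, then given names)
      US stores 'Yoon Wah Thoo' (given names, then surname)
      UK stores 'YW.Thoo'       (given-name initials, then surname)
    """
    if region == "JP":
        surname_initial, given = raw.split(".", 1)
        initials = "".join(word[0] for word in given.split())
    elif region == "US":
        *given, surname = raw.split()
        initials = "".join(word[0] for word in given)
        surname_initial = surname[0]
    elif region == "UK":
        initials, surname = raw.split(".", 1)
        surname_initial = surname[0]
    else:
        raise ValueError(f"unknown region {region!r}")
    return f"{initials}.{surname_initial}"
```

All three stored forms of the same person then reduce to the key "YW.T", so the records can be matched even though no single region stores the full name.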
Performing the tasks associated with moving, correcting and transforming the data from transaction systems to a star-schema data warehouse is called Extraction, Transformation and Loading (ETL). ETL allows migrating data from relational databases into a data warehouse and converting the various formats and types into one consistent system. It is commonly used for data warehousing, where regular updates from one or more systems are merged and refined so that analysis can be done using more specialized tools. Typically the same process is run over and over as new data appears in the source application [2]. The ETL process consists of the following steps [3]:

1. Import data from various data sources into the staging area.
2. Cleanse the data of inconsistencies (either an automated or a manual effort).
3. Ensure that the row counts of imported data in the staging area match the counts in the original data source.
4. Load data from the staging area into the dimensional model.

2. In-depth research on ETL

In Fig. 1, we abstractly describe the general framework for ETL processes. In the bottom layer we depict the data stores involved in the overall process. On the left side, we can observe the original data providers (typically, relational databases and files). The data from these sources are extracted (as shown in the upper left part of Fig. 1) by extraction routines, which provide either complete snapshots or differentials of the data sources. Then, these data are propagated to the Data Staging Area (DSA), where they are transformed and cleaned before being loaded into the data warehouse. The data warehouse is depicted in the right part of Fig. 1 and comprises the target data stores, i.e., fact tables and dimension tables. [4]

2.1 Extraction

The extraction step gathers the data from several sources and performs analysis and cleaning of the data.
The analysis part reads raw data that was written directly to disk, to flat files, or to relational tables of a structured system. Data can be read multiple times if needed in order to achieve consistency. Cleansing is also done in the extraction step: the process eliminates duplicate or fragmented data and excludes unwanted or unneeded information. The next step moves on to transformation. In Microsoft SSIS, we can use the Data Flow components called Integration Services sources to retrieve data in several formats through a connection manager. The source formats vary: OLE DB, Flat File, ADO NET source, Raw File source, etc. [11]

2.2 Transformation

The transformation step may be the most complex part of the ETL process, because much of the data processing happens during this step. Its purpose is to prepare the data to be stored in the data warehouse. Converting the data, such as changing data types and lengths, combining data, and verifying and standardizing the data, is done in this step. SSIS provides plenty of transformation tools to help developers achieve their targets. Transformations in SSIS are grouped into categories to help designers develop their projects: Business Intelligence, Row Transformations, Rowset, Split and Join Transformations, Auditing Transformations, and Custom Transformations. Examples commonly used in the ETL process are the Data Conversion transformation, which converts the data type of a column to a different data type, and the Conditional Split transformation, which routes data rows to different outputs. More transformation examples can be found in the SQL Server documentation on MSDN [10].

2.3 Loading

The loading step is the final step of the ETL process; it stores the generated data in the data warehouse. The loading step can follow the star schema [5] or snowflake schema [6] in order to achieve data consolidation [7].
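The extraction, transformation and loading steps just described can be sketched as three plain functions. This is a Python illustration only, not SSIS code; in SSIS these map to source, transformation and destination components, and the field names `id` and `amount` are invented for the example:

```python
def extract(sources):
    """Step 1: pull rows from each named source into a staging list,
    tagging each row with where it came from."""
    staging = []
    for name, rows in sources.items():
        for row in rows:
            staging.append({**row, "source": name})
    return staging

def transform(staging):
    """Steps 2-3: cleanse the staged rows by dropping duplicates
    and rows with missing measures (fragments)."""
    seen, clean = set(), []
    for row in staging:
        key = (row["id"], row["source"])
        if row.get("amount") is None or key in seen:
            continue  # exclude fragments and duplicates
        seen.add(key)
        clean.append(row)
    return clean

def load(clean, warehouse):
    """Step 4: append the cleansed rows to the target fact table."""
    warehouse.setdefault("fact_sales", []).extend(clean)
    return warehouse
```

Chaining `load(transform(extract(sources)), {})` over two small source lists leaves only the deduplicated, non-null rows in the `fact_sales` table, mirroring the staging-area workflow described above.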
The implementation in SSIS uses Integration Services destinations, which are similar to the Integration Services sources: a connection manager is used to choose one or more data destinations into which the output is loaded. [12]

3. Microsoft SQL Server Integration Services

ETL tools are created to let developers plan, configure and manage the ETL process. With the tools developed by Microsoft, developers now have the ability to more easily automate the importing and transformation of data from many systems. The Microsoft SQL Server 2005 component that helps automate the ETL process is called SQL Server Integration Services (SSIS). This tool is designed to deal with common issues in the ETL process. We build up this research paper from the ground up based on studying the ETL tool built by Microsoft, SSIS.

3.1 SSIS Architecture

Fig. 2 shows an overview of the SSIS architecture. SSIS is a component of SQL Server 2005/2008; it can be used to design an ETL process from scratch and automate it, alongside many supporting tools such as the database engine, Reporting Services, Analysis Services, etc. SSIS segregates the Data Flow Engine from the Control Flow Engine (or SSIS Runtime Engine), a design intended to achieve a high degree of parallelism and improve overall performance.

Figure 2: Overview of SSIS architecture.

SSIS consists of two main components, listed below:

SSIS Runtime Engine. The SSIS runtime engine manages the overall control flow of a package. It contains the layout of packages, runs packages, and provides support for breakpoints, logging, configuration, connections and transactions. The runtime engine is a parallel control-flow engine that coordinates the execution of tasks or units of work within SSIS and manages the engine threads that carry out those tasks. The SSIS runtime engine performs the tasks inside a package in the traditional manner.
When the runtime engine meets a data flow task in a package during execution, it creates a data flow pipeline and lets that data flow task run in the pipeline. [9]

SSIS Data Flow Engine. The SSIS Data Flow Engine handles the flow of data from data sources, through transformations, to destinations. When the Data Flow task executes, the data flow engine extracts data from data sources, runs any necessary transformations on the extracted data, and then delivers the data to one or more destinations. The architecture of the data flow engine is buffer-oriented: the engine pulls data from the source, stores it in memory, and performs the transformations in the buffer itself rather than processing on a row-by-row basis. The benefit of this in-buffer processing is that it is much quicker, as there is no need to physically copy the data at every step of the data integration; the data flow engine processes data as it is transferred from source to destination. [9] We can carry out ETL in practice in the Data Flow task, which can be found in Fig. 2: extract data from several sources, transform and manipulate the data, and load it into one or more destinations.

3.1.1 Data Flow Engine

Following on from the SSIS Data Flow Engine mentioned previously, here we discuss how it relates to the ETL process through its data flow elements. SSIS has three different types of data flow components: sources, transformations, and destinations. Sources extract data from data stores such as relational tables and views in files, relational databases, and Analysis Services databases, corresponding to the Extraction step of the ETL process. Transformations modify, summarize, and clean data. Destinations load data into data stores or create in-memory datasets, corresponding to the Loading step of ETL. In addition, SSIS provides paths that connect the output of one component to the input of another. Paths define the sequence of components, and allow the user to add labels to the data flow or view the source of a column.
Figure 3: Data Flow Elements

Figure 3 shows a data flow that has a source, a transformation with one input and one output, and a destination. The diagram includes the inputs, outputs, and error outputs in addition to the input, output, and external columns.

Sources. In SSIS, a source is the data flow component that brings data into the data flow from external data sources. In a data flow, a source normally has one regular output. The regular output has output columns, which are columns the source adds to the data flow. An error output for a source has the same columns as the regular output, plus two extra columns that provide information about errors. The SSIS object model does not limit the number of regular outputs and error outputs that sources can contain. Most of the sources that SSIS includes, except the Script component, have exactly one regular output, and many of the sources have one error output. Custom sources can be coded to implement multiple regular outputs and error outputs. All the output columns are available as input columns to the next data flow component in the data flow.

Transformations. The possibilities for transformations are nearly infinite and vary widely. Transformations can execute tasks such as updating, summarizing, cleaning, merging, and distributing data. The inputs and outputs of a transformation define the columns of incoming and outgoing data. Depending on the operation performed on the data, some transformations have a single input and several outputs, while other transformations have several inputs and a single output. Transformations can also include error outputs, which provide information about the error that occurred, together with the data that failed: for instance, string data that could not be converted to a date data type. Some built-in transformations are described below:

Derived Column Transformation creates new column values by applying expressions to transformation input columns.
The output can be inserted into an existing column as a replacement value or added as a new column.

Lookup Transformation performs lookups by joining data in input columns with columns in a reference dataset. It is typically used when working with a subset of a master data set and seeking related transaction records.

Union All Transformation combines multiple inputs, applying UNION ALL to the multiple result sets.

Merge Transformation combines two sorted datasets into a single sorted dataset; it is similar to the Union All transformation. Use the Union All transformation instead of the Merge transformation when the inputs are not sorted, the output does not need to be sorted, or the transformation has more than two inputs.

Merge Join Transformation provides an output that is created by joining two sorted datasets using a FULL, LEFT, or INNER join.

Conditional Split Transformation routes data rows to different outputs depending on the content of the data. The implementation of the Conditional Split transformation is similar to an IF-ELSE decision structure in a programming language. The transformation evaluates expressions and, based on the results, directs each data row to the specified output. It has a default output, so if a row matches no expression it is directed to the default output.

Multicast Transformation distributes its input to one or more outputs. This transformation is similar to the Conditional Split transformation: both direct an input to multiple outputs. The difference is that the Multicast transformation directs every row to every output, while the Conditional Split directs each row to a single output. [18]

Destinations. A destination is the data flow component that writes the data from a data flow to a specific data store, or creates an in-memory dataset. An SSIS destination must have at least one input. The input contains input columns, which come from another data flow component.
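The difference between the Conditional Split and Multicast transformations described above can be sketched in a few lines. This is a Python analogy of the routing behavior, not SSIS code:

```python
def conditional_split(rows, cases):
    """Route each row to exactly one output: the first case whose
    predicate matches wins, otherwise the row goes to the 'default'
    output (like an if/else ladder, as with SSIS Conditional Split)."""
    outputs = {name: [] for name, _ in cases}
    outputs["default"] = []
    for row in rows:
        for name, predicate in cases:
            if predicate(row):
                outputs[name].append(row)
                break
        else:  # no predicate matched this row
            outputs["default"].append(row)
    return outputs

def multicast(rows, output_names):
    """Copy every row to every output (like SSIS Multicast)."""
    return {name: list(rows) for name in output_names}
```

Given three country rows and a single "jp" case, `conditional_split` sends one row to "jp" and the rest to "default", whereas `multicast` gives every output a full copy of all three rows.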
The input columns are mapped to columns in the destination. [17]

3.1.1.1 Example of a Data Flow Task

Here we present an example of creating a simple data flow task, i.e., an ETL process. First, drag the Data Flow task from the toolbox into the Control Flow.

3.1.2 Scripting Environment

If the built-in tasks and transformations do not meet the developer's needs, the SSIS Script task / Script component can be used to code the functions the developer wants to perform. Clicking the Design Script… button in the Script Task Editor opens Visual Studio for Applications, where the function can be coded. [19] There is an improvement in the scripting environment between SSIS 2005 and 2008. In SSIS 2005, double-clicking the Script task brings up the Script Task Editor, and the only script language is Microsoft Visual Basic .NET; in SSIS 2008, it is possible to choose C# or VB.NET.

Figure: Visual Studio for Applications (VSA)

The Script task is usually used for the following purposes:

Achieving a desired task by using technologies that are not supported by built-in connection types.

Generating a task-specific performance counter. For instance, a script can create a performance counter that is updated while a complex or poorly performing task runs.

Determining whether specified files are empty or how many rows they contain, and then, based on that information, affecting the control flow in a package. For example, if a file contains zero rows, the value of a variable is set to 0, and a precedence constraint that evaluates the value prevents a File System task from copying the file. [20]

3.1.3 Data Profiler

The purpose of data profiling is to help assess data quality.
A data profile is a collection of aggregate statistics about data that may include the number of rows in the Customer table, the number of distinct values in the Street column, the number of null or missing values in the Name column, the distribution of values in the Country column, or the strength of the functional dependency of the Street column on the Name column (that is, the street should always be the same for a given name value), etc. [16] SQL Server 2008 SSIS introduces the Data Profiling task in its toolbox, providing data profiling functionality inside the process of extracting, transforming, and loading data. By using the Data Profiling task, analysis of source data can be performed more efficiently, giving a better understanding of the source data and helping to avoid data quality problems before the data is loaded into the data warehouse. The outcome of this analysis is XML reports that can be saved to an SSIS variable or to a file, which can then be examined using the Data Profile Viewer tool. Data quality assessments can be performed on an ad hoc basis, and the data quality process can also be automated by integrating quality assessment into the ETL process itself. [13]

3.1.3.1 Example of the Data Profiling Task Using the AdventureWorks Database

After dragging the Data Profiling task into the Control Flow, double-click it to open the properties window and configure it. The Data Profiling task requires a connection manager in order to work. In the properties menu, the user chooses the destination type: a file destination or a variable. A faster way to build a profile is to use the quick profile feature:

Figure 4: Single Task Quick Profile Form

The Data Profiling task can compute eight different data profiles. Five of these profiles analyze individual columns, and the remaining three analyze multiple columns or relationships between columns and tables; for more details about each profile, refer to MSDN.
[16] A few examples are given to explain data profiling further:

Figure 5: Editing the Data Profiling Task

After mapping the destination and setting the other properties, run the package.

Figure 6: Data Profiling Task Successfully Executed

Once the task has executed successfully (green), the Data Profile Viewer is needed to view the result. The Data Profile Viewer is a stand-alone tool used to view and analyze the results of profiling. It uses multiple panes to display the requested profiles and the computed results, with optional details and drill-down capability. [16]

Column Value Distribution Profile: used to obtain the number of distinct values in a column. Figure 7: Result of Column Value Distribution Profile.

Column Null Ratio Profile: obtains the ratio of null values in a column. Figure 8: Result of Column Null Ratio Profile.

Column Statistics Profile: obtains the minimum, maximum, mean and standard deviation of a column. Figure 9: Result of Column Statistics Profile.

Column Pattern Profile: obtains the pattern of values in a column. Figure 10: Result of Column Pattern Profile.
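The column profiles just listed (distinct-value count, null ratio, basic statistics) amount to simple aggregate computations. A minimal Python sketch, illustrative only since the real Data Profiling task emits XML for the Data Profile Viewer; the `amount` column name is invented for the example:

```python
from statistics import mean

def profile_column(rows, column):
    """Compute a few of the profile statistics described above:
    null ratio, distinct count, and (for numeric columns) min,
    max and mean."""
    values = [row.get(column) for row in rows]
    non_null = [v for v in values if v is not None]
    profile = {
        "null_ratio": (1 - len(non_null) / len(values)) if values else 0.0,
        "distinct_count": len(set(non_null)),
    }
    # Column Statistics only make sense for numeric columns.
    if non_null and all(isinstance(v, (int, float)) for v in non_null):
        profile.update(min=min(non_null), max=max(non_null),
                       mean=mean(non_null))
    return profile
```

Profiling a four-row `amount` column with one null this way yields a null ratio of 0.25 together with the distinct count and the min/max/mean statistics, the same quantities shown in the Column Null Ratio and Column Statistics profiles above.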

Saturday, January 18, 2020

Critical Analysis of Frustration Essay

The concept of this story is that war can never be justified through self-righteousness. That is, if a person is trying to start a war because he thinks himself to be better than his target, there is no just reason for him to do it. Most, if not all, wars are primarily begun by one man believing himself to be better than his target. This is proven through the history of Rome and other nations. But what makes war justified? Is there ever a reason to destroy another's life? The answer is no. Starting a war can never be justified. This story proves that. In "Frustration" a man is trying to find proof that a justified war is indeed possible. His advanced technology allows him to run computer simulations which, in theory, would find a series of events that would make this man's war right. The main issue that is brought up is that a computer, in itself, lacks self-righteousness, meaning that it places a much higher value on human lives than the average human does. But how does this make any difference? Humans naturally believe that they are in some way better than each other. If a man thinks that he is better than another man, then what is to stop him from attempting to subdue and enslave the lesser man? It is this self-righteousness, unchanged by generations, that causes all wars. All through the ages humanity has stayed the same in its behaviors and characteristics. This story implies this even at its very beginning: "The whole meal has been prepared by computer. Untouched by human hands." With these words we know that this story takes place in the future. Herman Gelb, the main protagonist in the story, is at a meeting with the computer programmer Peter Jonsbeck. Immediately the conversation turns to the activities of old man Hargrove, the Secretary of Foreign Affairs. Herman is intrigued by the fact that Hargrove is intent on finding a war that is justifiable and efficient.
On questioning this, Herman gets the response from Peter, "He wants the world to be the way we are: noble, honest, decent, full of respect for human rights and so on." Now we know that it is the self-righteousness of Hargrove that is leading him to perform these computer simulations. It shows that, in this author's view, humanity is likely to stay unchanged in the future: ignorant in its distaste for the other man, self-righteousness driving it to believe it is the best in the world. And so our characteristic of self-righteousness never changes. Our personal beliefs can never justify war. Self-righteousness makes us blind to the true needs of our fellow man, and in the end brings dissent among people, and war. Peter makes the remark, "And they're keeping the pressure on us, too. They don't think we're perfect"; this shows that self-righteousness never travels one way. That is, one person may think he is superior to someone else, but that someone else will not think that person superior to him. In the story the three men all believe that they need to conquer the world in order to make it like them. They use this to justify their reason for war. When Peter says, "Hargrove thinks it is possible to find some combination of starting conditions and courses of development that will result in clear victory for us and not too much damage to the world, and he labors in constant frustration," we know that Hargrove is so consumed by his self-righteousness that he labors furiously in order to find some kind of equation that will make his war just. This relates to human nature in a way that is explicitly obvious. We do not want to be alone in the world and therefore try to make others be like us. Sadly, in this process we begin to believe that we are better than our fellow man. This is not so. "After all," says Peter, "it may be that even the losing nations would benefit from being directed by us, with our stronger economy and stronger moral sense."
These men in the story are blind. In believing that they have a stronger moral sense, they have proven their humanity. What kind of people, with a strong moral sense, would attempt to take over nations in order for those nations to be like them? As a people we desire not to be alone but to be supported. However, through this desire, we become blinded to our true motives. Attempting to assimilate a people is not moral or just. It is simply wrong. As a race we tend to put a much higher value on ourselves and our desires than on the needs of our neighbours. Computers have no sense of self-righteousness, which is one of the key components of humanity's character. Herman mentions to Peter that there would be casualties. In reply Peter says, "Yes, of course. But the computer will presumably compare the casualties and other damage, to the economy and ecology, for instance, to the benefits that would derive from our control of the world, and if it decides the benefits outweigh the casualties, then it will give the go-ahead for a 'just war'." The computers in the story could not find a good enough excuse for a "just war" because they had no sense of self-righteousness. They did not believe that they "needed" to be in control of the world and to subdue other nations. In response to an accusation from Gelb that Peter is in the computer programming business for the money, Peter defends himself, saying, "There won't be a war. There's no realistic combination of events that would make the computer decide on war." The computers put a much higher value on life than Hargrove and the others. Hargrove believed that the other nations needed to be controlled by them in order to be "proper" nations. The computers could not justify war by the humans' need to be in control.
Gelb is curious as to why there will not be a war, so Peter tells him, "I don't know any way of programming a computer to give what is most needed in any war, any persecution, any devilry, while ignoring any harm that may be done in the process." Sadly this is true. Any war, in order to be begun, needs some sort of devilry to get its kickoff. Computers have no sense of devilry because they have no self-righteousness, and therefore they cannot find a reason to start a war for just reasons. Those reasons simply do not exist. Human nature never changes. Wars are wrought. Lives are lost. These wars can never be justified by man's need to be superior to his neighbour. Ever since the dawn of mankind, wars have been fought because one man, or one people, believed they were superior. This characteristic has never changed through our history, and it never will. God gave us the command to love our neighbour as ourselves. However, as this story dictates, people often love themselves far more than their fellow man. Destruction and chaos have been driven by humanity's self-righteousness. But as the story dictates, self-righteousness is never focused one way. People will not freely believe that someone else is better than them. We want others to be like us because we do not want to be alone in the world. Unfortunately, through this process, we begin to believe that we are better than other people. This desire for attention leads to some of the world's greatest tragedies and depressions. Computers lack what war needs in order to be begun. They lack any sense of devilry and self-righteousness. They place a higher value on life than the average man. Never will they justify war for the sole reason of assimilating another people or nation. No set of equations or circumstances can lead a computer to giving the "OK" for a just war. It just will not happen.

Friday, January 10, 2020

The Ultimate Personal Statement for a Scholarship Trick

The Secret to the Personal Statement for a Scholarship

To engage the reader, you may begin the essay with an anecdote: a little story connected to the elements you plan to highlight in the statement. An essay outline will help you organize the overall structure of your essay. A comprehensive personal statement is one where you're not given specific instructions about what to cover in the essay. There are many essay designs, so it's important to understand precisely what's feasible for you. An excellent essay is one that leaves a lasting impression. People who don't know you well ought to be able to read your personal statement and understand the important points you want to convey. For instance, when speaking about your A Levels, say a bit about what you've learnt from every subject and how it will help you in your degree.

Type of Personal Statement for a Scholarship

If you're not certain, don't write anything irrelevant that may cost you the opportunity. There are a great many themes where you can use your reflective writing abilities, and you ought to pick the best one for yourself. If there are specific experiences or things that you don't feel comfortable sharing, it's best simply to avoid them. It's easy to spot an essay that's been written solely for the sake of it. The introduction is where you want to grab the reader's interest, so make it as attractive as possible. You should use your subject to reveal something deeper about yourself. Filling the personal essay with boring business speak or clutter will make it read like every other personal statement, and you will wind up sounding like everybody else out there.
The Lost Secret of the Personal Statement for a Scholarship

It's similar when it comes to a profession. There are a lot of reasons why you should prefer our services. Being part of a great educational institution can be very costly. If you know what to look for in an award, you will be less inclined to end up paying either to enter to win an award or to use a scholarship search site.

Highlight what you've learned and how you're applying the learning. In any case, mastering the art can also significantly improve your writing skills in a variety of areas. Even if you believe a subject isn't relevant, it will still teach you tonnes of new skills that are bound to come in handy. If, on the flip side, you only touch on a wide range of topics, you will end up with an essay that reads more like a resume. For instance, one author addresses the way American history classes don't usually cover the Vietnam War, despite the fact that it happened only a short time ago. At some point, you may even need to rewrite the whole essay. The essays may be the most important components of your application. In reality, a scholarship will not only give you the chance of career development but also higher odds of employment. Returning students who have to submit personal statement scholarship examples will be under a lot of pressure, since they will be competing with hundreds of other students. Once you have written a personal statement for a single scholarship, you will most likely be able to adapt it for others. A well-tailored statement of purpose for a scholarship will give you the edge over the rest of the applicants.

Wednesday, January 1, 2020

The Miller's Tale, The Canterbury Tales - 1624 Words

â€Å"The Miller’s Tale,† part of Geoffrey Chaucer’s larger work, â€Å"The Canterbury Tales,† is a bawdy and irreverent story about lust, deception, and consequences. Chaucer’s work centers around four main characters: John is a dimwitted carpenter, Alison is John’s young and wife, Nicholas is a scholar who resides in John’s household, and Absolon is a priest’s assistant with a romantic fixation on Alison. Throughout the tale, deceptive plots and questionable decisions abound, and no one is completely innocent of wrongdoing. Of course, like with most any other story involving lies and selfish pursuits, every character gets an informal education on natural consequences. It is worth noting that Nicholas is the only character in the story with any†¦show more content†¦John is a controlling and foolish man who considers his wife as much his possession as any other material thing that his wealth has afforded him, and for th is he learns an emotionally and physically painful lesson about just how little control he really has. Immediately after describing John, Chaucer turns to the carpenter’s attitude about his young wife, saying, â€Å"Jalous he was, and heeld hire narwe in a cage, / For she was wilde and yong, and he was old, / And deemed himself been lik a cokewold† (Chaucer, 116-118). One cannot help but raise a proverbial eyebrow at John’s denial of Alison’s agency. In Paul A. Olsen’s â€Å"Poetic Justice in The Miller’s Tale,† the author asserts that, â€Å"The possessive and stupid, the Carpenter Johns, deserve to lose what they have even as they are eyeing a bigger take; they deserve to be set down for mad even as they think they are getting a corner on God’s secrets† (para. 9). Indeed, John suffers these exact fates. Having been convinced by Nicholas that God would send down a great flood, John hangs three tubs from his ceiling and falls asleep in one, secure in the belief that this plan will save himself, Alison, and Nicholas from God’s wrath. 
The next morning, when John awakens in confusion and cuts down his tub, he crashes to the floor and suffers physical injury. Later, he sustains emotional injury through being labeled a