Tuesday, October 13, 2009

Abstract and Reading for Today's Topic: The ETL process in Data Warehousing

Abstract of Presentation:

Data warehousing is one approach for data integration. Today, Our's topic is mainly focusing on ETL process for data warehousing.We would start with introducing data integrations approaches. One way to another, data warehousing is one of the most commonly used data storage methods used by data integration system designers. Advantaged and Drawbacks of data warehouse will all be covered in our presentation.

ETL is the process used in data warehousing. What ETL does, the issues within ETL and what's the general solutions to the problems would lead us to a practical world for data warehousing. SSIS is a platform for building enterprise-level data integration and data transformation solutions. Its features and development will also be discussed in our presentation.

Last but not least, most interesting part is from Mike's demostration! Please stay tuned and enjoy.

Reading materials suggestions:
1. Take a look at this website: http://www.datawarehouse4u.info/
It shows how ETL work in data warehouse, which could give you a intuitive view of data integration process within data warehouse.
2. http://communication.howstuffworks.com/data-integration.htm
This would be your startpoint in understanding data warehouse from data integration perspective. How data warehouse is better than other approached? Check it out.
3. Data Cleaning: Problems and Current Approaches. This article is for data cleaning and ETL, which is suggested by Dr, Zheng. It's a super important paper, highly related to our topic and presentation materials.
4. Another one suggested to read before our presentation would be:High impact Data Warehousing with SQL Server Integration Services and Analysis Services.(take a look at first few slides).

There are many other materials we've read for preparing our presentation. We included them in our presentation slices(the last slide as bibliography). Please check them out for further your further study.
(thanks bing, minor edits - mtm)

8 comments:

  1. One question that was asked during your presentation was for alternatives with data warehouses and a technology that I have used in the past that comes to mind when talking about data integration is Business Objects - Crystal Reports. With Crystal Reports you can actually add multiple data sources to feed information into the report. These data sources can range from ODBC connections (to SQL Databases, Oracle Databases, etc.) to an access database or an excel spreadsheet. The GUI (wizards) within Crystal Reports allows for mapping and you can also select certain types of schemas within Crystal Reports. One thing that is different about a data warehouse and Crystal Reports is the flow of information. Datawarehouse usually receive lots of information from different databases by having each one of those data sources push the information to the datawarehouse. With Crystal Reports it is more of a pull flow in which Crystal Reports actually connects out to different data sources and aggregates the data. Overall Crystal Reports is a powerful reporting tool that allows the aggregation of data from different data sources to present a clear and concise report on information that can allow businesses to make decisions.

    We have implemented Crystal Reports at my job and use it to gather information from several different SQL databases to present different types of reports.

    ReplyDelete
  2. There are quite a few options out there when we think of Data Warehousing. But having worked with SSIS, I feel that it is one of the best and most efficient methods available.
    The number and kinds of Transformations that are possible using this package is amazing. Each Character of each Column can be transformed as per the required business logic.
    I used SSIS for data migration and transformation of millions of records at a time and not once did it fail.

    ReplyDelete
  3. MO BLOGGER, I am a big Crystal Reports fan, having used it for several years. It is a great reporting tool for pulling in data from several databases.

    SSIS and Crystal Reports aren't really the same sort of product. Crystal Reports produces very slick reports from different data sources, but doesn't actually move and copy data over. SSIS can copy data over to a central data source and is much more powerful in actually doing things with the data.

    In short, Crystal Reports is best for reporting, SSIS is best for actually doing something with the data. I think a company would actually need to use both of these tools (or something similar) if both data warehousing and data reporting is needed. What do you think?

    ReplyDelete
  4. Avik, I had my first exposure to SSIS while preparing for this presentation. What a great tool. It's not the easiest tool to use, but it is extremely powerful and useful - I'm definitely going to try to learn more about it, as it's a huge weapon to have in my data toolbox. Thanks for the post.

    ReplyDelete
  5. Michael raises a compelling point in that SSIS and Crystal Reports are intended for different purposes. Too often tools experience their own versions of "scope creep". Crystal and SSIS each provide value, but with very different purposes.

    ReplyDelete
  6. I agree as well. I was specifically referring to the reporting capabilites whether it is reporting coming from a datawarehouse or crystal reports. I actually have read a few articles that speak about using SSIS with crystal reports but the alternative that I wanted to point out was that when coming to generating reports you can use a data warehouse to pull all the information from different sources and report on OR you could use crystal reports and reach out to multiple sources to report information.

    Sorry if that was confusing. I didn't want to do a comparison or alternative between SSIS and crystal reports (they are more complimentary than substitutes).

    ReplyDelete
  7. I had one year of experience on Crystal reports when I worked as an IT specialist in GSU's Housing department. This is a great tool, expecially useful when combined with MySQL queries. It helped me to retrieve meaningful data from various tables by linking them together.
    My manager was happy for those reports I created from Crystal reports, which helped her make decision on how many females and males in each residential hall in each semester, how many people paid their fees vs. how many people haven't paid their fees in the current semester, which residential hall hosts the most freshmen/sophmore/seniors. It is a great tool for reporting. Sometimes it took me three hours to figure out the queries when I was assigned to a complicated project. But the efforts paid off when I retrieved the report from a great amount of data by linking more than 5 tables and the result was motivating. Perhaps my job title on this position should be changed to Data Analyst, not IT Specialist.
    I have no idea on SSIS, but would love to try it if I get chance.

    ReplyDelete
  8. MO Blog, I'm going to have to look at using Crystal Reports in the way you mentioned, which would be sort of a "Virtual Database Reporting" I would call it. Thanks for bringing this to my attention.

    ReplyDelete

Followers