Advanced Data Management Technologies Unit 5 — Case Studies J. Gamper Free University of Bozen-Bolzano Faculty of Computer Science IDSE ADMT 2015/16 — Unit 5 J. Gamper 1/47 Outline 1 Inventory Management 2 Order Management 3 OncoNet 4 MEDAN ADMT 2015/16 — Unit 5 J. Gamper 2/47 Inventory Management Outline 1 Inventory Management 2 Order Management 3 OncoNet 4 MEDAN ADMT 2015/16 — Unit 5 J. Gamper 3/47 Inventory Management Inventory Management Example (Inventory management) Consider a large grocery chain with a central warehouse and several retail stores. Advanced retail business requires inventory information. Making sure the right product is in the right store at the right time minimizes out-of-stocks and reduces overallinventory carrying costs. The retailer needs the ability to analyze daily quantity-onhand inventory levels by product and store. Design dimensional models that support the analysis of inventories for retail businesses (grocery stores). ADMT 2015/16 — Unit 5 J. Gamper 4/47 Inventory Management The Value Chain The value chain identifies the natural, logical flow of an organization’s primary activities. Operational systems provide snapshots at each step with interesting data and performance metrics. Retailer Issues Purchase Order Deliveries at Retailer Warehouse Retailer Warehouse Inventory Deliveries at Retail Store Retail Store Inventory Retail Store Sales ADMT 2015/16 — Unit 5 J. Gamper 5/47 Inventory Management Inventory Models 3 different inventory models Model 1: Inventory periodic snapshot model Model 2: Inventory transactions model Model 3: Inventory accumulating snapshot model ADMT 2015/16 — Unit 5 J. Gamper 6/47 Inventory Management Inventory Periodic Snapshot Model/1 Model 1: Inventory Periodic Snapshot: Every day (or at some other regular time interval) the inventory levels of each product is measured and stored as a new row in the fact table. Example: Inventory of retail store Business process: Analysis of retail store inventory. Granularity: Daily inventory by product at each individual store. Dimensions: Date, product, and store. Facts/measures: Quantity on hand. Date dimension DateKey (PK) FullDateDesc ... Store Dimension StoreKey (PK) StoreName ... ADMT 2015/16 — Unit 5 Store Inventory Snapshot Facts DateKey (FK) ProductKey (FK) StoreKey (FK) QuantityOnHand J. Gamper Product Dimension ProductKey (PK) ProductDesc ... 7/47 Inventory Management Inventory Periodic Snapshot Model/2 Inventory generates dense snapshot tables. In contrast, POS Retail Sales table was sparse. Consequently, inventory fact table is growing fast. 60.000 products x 100 stores = 6 Mio. rows each time. With a row width of 14 bytes, this is 84 MB each time. 1 year of daily snapshots would be 30 GB. Reduce snapshot frequencies over time. Last 60 days of inventory at daily level. Weekly snapshots for older data. Instead of 1.095 snapshots in 3 years, only 208 snapshots would be required. ADMT 2015/16 — Unit 5 J. Gamper 8/47 Inventory Management Inventory Periodic Snapshot Model/3 The quantity on hand is a semi-additive measure. Can be summarized across products and stores, but not across time Different in POS Retail Sales table: Sold entities are counted only once. All measures that record a static level (inventory, financial account balance, measures of intensity, e.g., temperature) are inherently non-additive across time and possibly other dimensions. Can be aggregated along time dimension by averaging. A note about SQL AVG function: Cannot be used to compute the average over time, since it averages over the number of rows. Avg inventory over a cluster of 3 products in 4 stores across 7 days would divide the summed value by 84. ADMT 2015/16 — Unit 5 J. Gamper 9/47 Inventory Management Inventory Transactions Model/1 Model 2: Inventory Transactions: Every transaction that affects the inventory is recorded. Example: Inventory transactions in the store chain Receive product. Place product in to inspection hold. Release product from inspection hold. Return product to vendor due to inspection failure. Place product in bin. Authorize product for sale. Pick product for shipment. Ship product to customer. Receive product form customer. Return product to inventory from customer return. Remove product from inventory. ADMT 2015/16 — Unit 5 J. Gamper 10/47 Inventory Management Inventory Transactions Model/2 Star schema of the inventory transaction model. Date Dimension DateKey (PK) FullDateDesc ... Warhouse Dimension WarehouseKey (PK) WarehouseName WarehouseAddress WarehouseCity WarehouseZip ... Store Inventory Snapshot Facts DateKey (FK) ProductKey (FK) WarehouseKey (FK) VendorKey (FK) InventoryTransactionTypeKey (FK) InventoryTransactionQuantity Product Dimension ProductKey (PK) ... Vendor Dimension VendorKey (PK) ... Inventory Transaction Type Dimension InventoryTransactionTypeKey (PK) InventoryTransactionTypeDesc InventoryTransactionTypeGroup ... Contains most detailed information, e.g., How many shipments from a given vendor? On which products more than one round of inspection? Reconstruction of exact inventory numbers is possible, but not practical! Used in combination with other fact table. ADMT 2015/16 — Unit 5 J. Gamper 11/47 Inventory Management Inventory Accumulative Snapshot Model/1 Model 3: Inventory Accumulating Snapshot: One row in the fact table for each shipment of a particular product to the warehouse. Assumption that the inventory goes through a series of events, e.g., receiving, inspection, bin placement, authorization to sell, picking, boxing, and shipping. A row tracks the disposition of a shipment through these events in the warehouse. Row is updated as the shipment moves through the warehouse until it leaves the warehouse. Characterized by many date dimensions and many updates. ADMT 2015/16 — Unit 5 J. Gamper 12/47 Inventory Management Inventory Accumulative Snapshot Model/2 Date Received Dimension DateKey (PK) ... Date Inspected Dimension DateKey (PK) ... Date Placed in Inventory Dimension DateKey (PK) ... Date authorized to Sell Dimension DateKey (PK) ... Date Picked Dimension DateKey (PK) ... Date Boxed Dimension DateKey (PK) ... Date Shipped Dimension DateKey (PK) ... Date of Last Return Dimension DateKey (PK) ... ADMT 2015/16 — Unit 5 Warehouse Inventory Accumulating Facts Date Received Key (FK) Date Inspected Key (FK) Date Placed in Inventory Key (FK) Date Authorized to Sell key (FK) Date Picked Key (FK) Date Boxed Key (FK) Date Shipped Key (FK) Date of Last Return Key (FK) Product Key (FK) Warehouse Key (FK) Vendor Key (FK) Quantity Received Quantity Inspected Quantity Returned to Vendor Quantity Placed in Bin Quantity Authorized to Sell Quantity Picked Quantity Boxed Quantity Shipped Quantity Returned by Customer Quantity Returned to Inventory Quantity Damaged Quantity Lost Unit Cost Unit List Price Unit Average Price Unit Recovery Price J. Gamper Product Dimension ProductKey (PK) ... Warhouse Dimension WarehouseKey (PK) ... Vendor Dimension VendorKey (PK) ... 13/47 Inventory Management Value Chain Integration Integration accross the value chain is important for the analysis. Allows analysis across the business to better evaluate the overall performance (not just at the individual department level). End-to-end perspective high-level management to customer. This requires the integration and consistent handling/use of data. Solution: Individual fact tables for processes + shared dimensions. Shared dimensions are used by different data marts. POS Retail Sales Transation Facts ... Store Dimension StoreKey (PK) Promotion Dimension PromotionKey (PK) Product Dimension ProductKey (PK) Retail Inventory Snapshot Facts ... Vendor Dimension VendorKey (PK) Date Dimension DateKey (PK) Retail Inventory Snapshot Facts ... ADMT 2015/16 — Unit 5 J. Gamper Warhouse Dimension WarehouseKey (PK) 14/47 Inventory Management Data Warehouse Bus Architecture Data Warehouse Bus Architecture is a standard bus interface that supports the incremental development of a DW. Based on conformed (similar) dimensions that are shared by the DMs. Guides the overall design and breaks down the development process into small chunks (DMs). DMs can be realized at different times and by different groups. ADMT 2015/16 — Unit 5 J. Gamper 15/47 Inventory Management Data Warehouse Bus Matrix/1 Data Warehouse Bus Matrix is a tool to document the bus architecture. Rows represent business processes (translate into DMs). Columns represent a suite of standardized, common and shared dimensions. ADMT 2015/16 — Unit 5 Vendor Contract Shipper X X X Warehouse X X X X X X Promotion X X X X X X Store Product Business Processes Retail Sales Retail Inventory Retail Deliveries Warehouse Inventory Warehouse Deliveries Purchase Orders Date Shared Dimensions X X X X X X X X X J. Gamper 16/47 Inventory Management Data Warehouse Bus Matrix/2 Creating the DW bus matrix is one of the most important up-front deliverables of a DW implementation. Create a comprehensive list of dimensions before filling in the matrix. The rows provide a concise overview about the dimensionality of the individual DMs. The columns show the interaction between the DMs and the common/shared dimensions ADMT 2015/16 — Unit 5 J. Gamper 17/47 Inventory Management Conformed Dimensions Conformed Dimensions are either identical or strict mathematical subsets of the most granular, detailed dimension. Roll-up dimensions conform to the base-level dimension Example: The sales process captures data at the product level, while the forecasting process does it at the brand level Brand table conforms to the atomic product table as it is a strict subset of product table Product Dimension Product Key (PK) Product Description SKU Number Brand Description Subcategory Description Category Description Department Description Package Type Description Package Size ... ADMT 2015/16 — Unit 5 conforms Brand Dimension Brand Key (PK) Brand Description Subcategory Description Category Description Department Description J. Gamper 18/47 Order Management Outline 1 Inventory Management 2 Order Management 3 OncoNet 4 MEDAN ADMT 2015/16 — Unit 5 J. Gamper 19/47 Order Management Order Management/1 Order management consists of several critical business processes (order, shipment, invoiceprocessing, etc.) and measures (sales volume, invoice revenue, etc.). Warehouse bus matrix ADMT 2015/16 — Unit 5 Product Customer Deal Sales Rep Ship From Shipper Business Processes Quotes Orders Shipment Invoicing Date Shared Dimensions X X X X X X X X X X X X X X X X X X X X X X X X J. Gamper 20/47 Order Management Order Management/2 Star Schema Order Date Dimension OrderDateKey (PK) OrderDate Order Date Day of Week ... Requested Ship Date Dimension Requested Ship Date Key (PK) Requested Ship Date ... Product Dimension ProductKey (PK) Order Transaction Facts Order Date Key (FK) Requested Ship Date Key (FK) Product Key (FK) Customer Ship To Key (FK) Sales Rep Key (FK) Deal key (FK) Order Number (DD) Order Quantity Gross Order Dollar Amount Order Deal Discount Dollar Amount Net Order Dollar Amount Customer Ship To Dimension CustomerShipToKey (PK) ... Sales Rep Dimension SalesRepKey (PK) ... Deal Dimension DealKey (PK) ... Issues Should there be a “Ship Date Key” in the fact table? Can/should Order Date Key and Requested Ship Date Key be foreign keys to the same dimension table? ADMT 2015/16 — Unit 5 J. Gamper 21/47 Order Management Role-Playing Role-playing in DW occurs when a single dimension appears several times in the same fact table, e.g., order date and requested ship date. Should not be FK to the same dimension table. SQL would require the two dates to be the same. We might want to constrain the two dimensions differently. The underlying dimension may exist as a single physical table, but each of the roles should be presented in a separate view with different labels. Date Dimension DateKey (PK) Date Date Day of Week ... create view Order Date Dimension Order Date Key (PK) Order Date Order Date Day of Week ... ADMT 2015/16 — Unit 5 create view Requested Ship Date Dimension Requested Ship Date Key (PK) Requested Ship Date Requested Ship Date Day of Week ... J. Gamper 22/47 Order Management Multiple Hierarchies Multiple hierarchies often coexist in a dimension table, especially for customer-oriented dimensions. Natural geographic hierarchy Zip Code defines a second hierarchy Can have different number of levels. Should all be supported in a DW. Customer Ship To Dimension Customer Ship to Key (PK) Customer Ship To ID Customer Ship To Name Customer Ship To Address Customer Ship To City Customer Ship To State Customer Ship To Zip + 4 Customer Ship To Zip Region Customer Ship To Zip Sectional Center Customer Bill To Name Customer Bill To Address Attributes Customer Credit Rating Customer URL ... ADMT 2015/16 — Unit 5 J. Gamper 23/47 Order Management Fact Normalization Fact normalization: Normalize the fact table and collapse all measures into a single measure along with a special fact dimension that identifies the type of the measure. Makes only sense if fact table is sparsely populated and no computations are made between measures of different type; e.g., medical tests where different things are measured and data is sparse. Example: Normalize Order Transaction Fact Table Fact Dimension table has entries for Order Quantity, Gross Order Dollar Amount, Order Deal Discount Dollar Amount, Net Order Dollar Amount. Not normalized Normalized Order Transaction Facts Order Date Key (FK) Requested Ship Date (FK) ... Order Quantity Gross Order Dollar Amount Order Deal Discount Dollar Amount Net Order Dollar Amount Order Transaction Facts Order Date Key (FK) Requested Ship Date (FK) ... Entry Type (FK) Amount ADMT 2015/16 — Unit 5 J. Gamper Fact Dimension Entry Type Key (PK) Entry Type Unit of Measurement ... 24/47 OncoNet Outline 1 Inventory Management 2 Order Management 3 OncoNet 4 MEDAN ADMT 2015/16 — Unit 5 J. Gamper 25/47 OncoNet OncoNet/1 Collaboration with Hospital Meran (BSc thesis of A. Heinisch) OncoNet is an application for the management of patients undergoing a cancer therapy. Cancer therapy follows a treatment plan/protocol. Event type Data collection Data collection Data collection Medication Medication ADMT 2015/16 — Unit 5 Event description Ct Thorax Ct Abdomen Hemogram Zofran Adriblastina J. Gamper Date Day 60, 100, 360, 720 Day 60, 100, 360, 720 Day 110 Day 1 Day 1 26/47 OncoNet OncoNet/2 Business process: Analysis of cancer therapies. Queries to answer: How many patients with normal blood pressure after medication X? Which dosages of drug A were successful to reduce parameter Y? etc. Granularity: Individual events of the chemotherapy. Includes measurements, examinations, questionnaires, etc. ADMT 2015/16 — Unit 5 J. Gamper 27/47 OncoNet OncoNet/3 Patient and Drug Dimension Patient Dimension Patient Key (SK) Patient ID Patient First Name Patient Last Name Patient Gender Patient Address Patient ZIP Code Patient Phone Number Patient Profession Patient First Language Patient Height Patient Weight Patient Body Surface Area Patient Place of Birth Patient Birthday Date Key Patient Death Date Key Patient First Admission Doctor Patient First Admission Area Patient Smoking Indicator Patient Cigarettes per Day Patient Alcohol Indicator Patient Alcohol Amount per Day ... ADMT 2015/16 — Unit 5 Drug Dimension Drug Key (SK) Drug ID Drug Name Drug Category Drug Active Substance Drug Manufacturer Drug Quantity Unit Drug Quantity Unit Description Drug Administration Type Drug Administration Location Drug Packaging Drug Packaging AIC Code ... J. Gamper 28/47 OncoNet OncoNet/4 Normalized fact table Only one measure is used in the fact table. Type of measure is described in Event Dimension Table and Investigation Dimension Table. Chemotherapy Event Facts Date Key (FK) Prescribing Date Key (FK) Relative Date Key (FK) Patient Key (FK) Therapy Key (FK) Drug Key (FK) Event Key (FK) Investigation Key (FK) Numverical Value Textual Value ADMT 2015/16 — Unit 5 J. Gamper 29/47 OncoNet OncoNet/5 Patient Key Patient ID First Name Last Name Gender Language Weight Height . . . 1 345 Hans Maier male German 68 185 ... Date Key Date Type Full Date Day of Week Weekday Ind Month Year . . . 1 Normal 05.06.2012 Tuesday Weekday June 2012 . . . ... Date key Patient Key Therapy Key Event Key Investigation Key Numerical Value . . . 1 1 1 1 1 32,7 ... Therapy Key Therapy ID Therapy Name Therapy Type . . . 1 25436 NHL Chop 14 Profile ... Investigation Key Group Level Label 1 Haemogram MCH 2 Haemogram HB 3 Haemogram MCV Unit . . . 1 g/dl pg Event Key Event ID Event Name Event Type Responsible . . . 1 9372 Urgent Laboratory Laboratory Test Nurse ADMT 2015/16 — Unit 5 J. Gamper 30/47 MEDAN Outline 1 Inventory Management 2 Order Management 3 OncoNet 4 MEDAN ADMT 2015/16 — Unit 5 J. Gamper 31/47 MEDAN MEDAN MEDAN MEdical Data Warehousing and ANalysis. Collaboration between the Hospital Meran and the FUB. Objectives Conduct research and create competences in the field of medical data warehousing and analysis. Build a BI/DW solution Administrative DW Medical DW Develop and apply data analysis/mining techniques. ADMT 2015/16 — Unit 5 J. Gamper 32/47 MEDAN MEDAN Data Sources Data sources in a health care environment. Internal production systems (SQL, Excel, Text files, ...). External information systems. ADMT 2015/16 — Unit 5 J. Gamper 33/47 MEDAN Budgeting Using a Spreadsheet Example: Budgeting in the controlling department Complex and error-prone Excel spreadsheets have been used in the past. Monthly reports are sent to the Province. ADMT 2015/16 — Unit 5 J. Gamper 34/47 MEDAN Budgeting Using QlickView/1 A QlickView application to replace Excel. QlickView is a state-of-the-art data analysis tool. All data are kept in main memory → fast. Easy-to-use for small/medium sized applications. No solution for ETL/data staging. Direct access to the data sources. Data integration in QlickView. No DW in place. Reporting Controlling Dept QlikView Excel ADMT 2015/16 — Unit 5 ... Oracle J. Gamper 35/47 MEDAN Budgeting Using QlickView/2 QlickView application: Actual budget Big improvement over Excel solution. ADMT 2015/16 — Unit 5 J. Gamper 36/47 MEDAN Budgeting Using QlickView/2 QlickView application: Budget over time ADMT 2015/16 — Unit 5 J. Gamper 37/47 MEDAN MEDAN DW Solution Difficult to convince decision makers to build a DW as the core of a BI solution. Too much technology-driven (QlikView, . . . ). QlickView is mainly an analysis tool and cannot replace a DW. Good for quick and small ad-hoc solutions. Difficult to do data cleaning and hence to control data quality. Not scalable for many applications, changing sources, etc. Since Oracle technology was already in place, we convinced them to use this technology. ADMT 2015/16 — Unit 5 J. Gamper 38/47 MEDAN MEDAN Architecture Oracle ODI for ETL part and Data Storage. Oracle DB for the DW. QlickView and OBI for data analysis. ADMT 2015/16 — Unit 5 J. Gamper 39/47 MEDAN MEDAN Bottom Up Approach Prototype of DM CONT for the Hospital of Meran; consists of 3 cubes: Hospital Stays Services Transfers Each cube corresponds to a business process. Deploy DM CONT in the Hospital of Meran, then in other hospitals in South Tyrol. Repeat the same cycle for other DMs: DM Personnel, DM Pharmacy, DM Laboratory, etc. ADMT 2015/16 — Unit 5 J. Gamper 40/47 MEDAN Conceptual Model of Hospital Stays Cittadinanza* Sigla Nome* Nazione Nome* Regione Nome* Nome* ASL Nome* Comprensorio Nome* Comunità comprensoriale Nome* Sigla Provincia Distretto sanitario Cittaninanza* Sigla Nome* Nome* Nazione Nome* Regione Nome* ASL Nome* Each event stores a hospital stay of a patient. Similar model for the other data cubes: Descrizione* Tipo Descrizione* Sesso Comune residenza Comprensorio Nome* nascita Descrizione* Stato civile Descrizione* Stato assistito Assistito Presidio Reparto ministeriale Nome* Reparto nascita decesso Ricovero Data Dimissione Età a Età da Ricoveri Fascia età* Età Nome* Nome* Tipo MDC Nome* Services Transfer Titolo studio* CAP Nome* DRG Nosografico numero trasferimenti gg degenza numero accessi dh gg permesso gg classe numero prestazioni laboratorio numero prestazioni radiologia tariffa drg_provinciale tariffa drg_nazionale peso_drg Descrizione* Regime ricovero Descrizione* Motivo DH Descrizione* Tipo ricovero Descrizione* Provenienza Descrizione* Modalità dimissione Descrizione* Onere degenza Sigla Traumatismi Tipo* Riscontro autoptico Descrizione* Nome* Tipo proponente Diagnosi L1 Nome* Classe Diagnosi L2 Nome* Dimissione protetta Versione ICD9 Descrizione* Descrizione* Descrizione* Descrizione* Dimissione protetta grado Diagnosi L3 Shared dimensions are used. Descrizione* Cattegoria* Classe anestesiologica Descrizione* Descrizione* Visita anestesiologica Sigla Cattegoria* Versione ICD9 Descrizione* Ricovero ripetuto Nome* Intervento L1 Nome* Intervento L2 Nome* Descrizione* Ricovero appropriato Anno Intervento L3 Quadrimestre Semestre Trimestre Nome* ASL Nome* Comprensorio Nome* Area amministrativa Nome* Settore Nome* Centro di responsabilità Nome* Centro di costo aziendale Settimana mese Settimana anno Descrizione* Giorno settimana Giorno mese Giorno anno Mese Descrizione breve* Descrizione lunga* Ultimo giorno settimana Ultimo giorno mese Ultimo giorno anno Data Dimissione Ricovero ADMT 2015/16 — Unit 5 J. Gamper 41/47 MEDAN Conceptual Model of Services and Transfers Cittadinanza* Cittadinanza* Sigla Sigla Nome* Nome* Nazione Nazione Cittaninanza* Nome* Nome* Sigla Regione Regione Nome* Nome* Nazione Nome* Regione Nome* ASL Nome* Nome* Nome* ASL Nome* Comprensorio Nome* Comunità comprensoriale Comprensorio Nome* Nome* Nome* Sigla Ora metà Giorno metà Descrizione* Tipo Descrizione* Sesso nascita Orario 12 Orario 24 nascita decesso Data Orario Tipo assistito (interno/esterno) Mese Descrizione breve* Nome* Descrizione lunga* ASL Nome* Ultimo giorno mese Comprensorio Nome* Giorno anno Ultimo giorno anno Area amministrativa Nome* Settore Nome* Centro di responsabilità Nome* Ultimo giorno settimana Data Gerarchia ministeriale (p. e. branca) erogante Centro di costo aziendale richiedente Prestazione LAB Descrizione* Prestazione ministeriale Descrizione* Prestazione RAD Nome* Livello 2 Nome* Descrizione* Prestazione interna Livello 4 Livello 5 decesso Ricovero Trasferiemnti Data Nosografico gg degenza numero prestazioni laboratorio numero prestazioni radiologia tariffa drg_provinciale tariffa drg_nazionale Descrizione* Regime ricovero Descrizione* Motivo DH Descrizione* Nome* Tipo ricovero ICD9 Nome* Provenienza Importanza Nome* Modalità dimissione Nome* Onere degenza Codice Intervento Descrizione* Descrizione* Descrizione* Descrizione* Traumatismi Interventi Scheda Reparto Descrizione* Riscontro autoptico Descrizione* Tipo proponente Descrizione* Nome* Classe ICD9 Nome* Dimissione protetta Importanza Nome* Dimissione protetta grado Codice Diagnosi Nome* Descrizione* Descrizione* Classe anestesiologica Descrizione* Descrizione* Visita anestesiologica Diagnosi Scheda Reparto Descrizione* Nome* Nome* Nome* Tariffa Provinciale Nome* Tariffa Nazionale Nome* Data Dimissione Nome* Data Ricovero Ricovero ripetuto Centro di costo provinciale richiedente Anno Quadrimestre Nome* Semestre Trimestre Settimana mese ASL Nome* Comprensorio Nome* Descrizione* Area amministrativa Nome* Giorno settimana Settore Nome* Centro di responsabilità Nome* Centro di costo aziendale Descrizione* Ricovero appropriato Nome* Scheda Reparto erogante nascita Dimissione Età Nome* Livello 1 Livello 3 Descrizione* Prestazione AMB Reparto Età a Età da Fascia età* Giorno mese Descrizione* Stato civile Descrizione* Stato assistito Assistito Nome* Nosologico erogante Descrizione* Giorno settimana Descrizione* Comprensorio nascita Presidio Nosografico richiedente Descrizione* Descrizione* Tipo Descrizione* Sesso Comune residenza MDC Nome* Codice esame quantità importo Trimestre Settimana anno ASL Nome* Tipo Prestazioni Semestre Settimana mese Regione Nome* Reparto ministeriale Nome* Titolo studio* CAP Nazione Nome* Nome* DRG Anno Quadrimestre Sigla Nome* Nome* Nome* Descrizione* Stato civile Descrizione* Stato assistito Assistito Giorno periodo Descrizione* Minuto ora Minuto giorno Nome* Provincia Distretto sanitario Nome* Titolo studio* CAP Comune Ora quarto Comunità comprensoriale Sigla Nome* Nome* Reparto Ora 12 Ora 24 Comprensorio Nome* Cittaninanza* Distretto sanitario residenza ASL Nome* Provincia Presidio Reparto ministeriale Nome* Settimana anno Giorno mese Giorno anno Mese Descrizione breve* Descrizione lunga* Ultimo giorno settimana Ultimo giorno mese Ultimo giorno anno Data Dimissione Ricovero ADMT 2015/16 — Unit 5 J. Gamper 42/47 MEDAN Logical Model of Hospital Stays Snowflake schema F_RICOVERI D_REPARTI P * KEY_REPARTO CODICE_REPARTO REPA_NOME_IT REPA_NOME_DE CODICE_REPARTO_PRESIDIO REPA_PRES_NOME_IT REPA_PRES_NOME_DE CODICE_PRESIDIO PRES_NOME_IT PRES_NOME_DE CODICE_COMPRENSORIO COMP_NOME_IT COMP_NOME_DE CODICE_ASL ASL_NOME_IT ASL_NOME_DE CODICE_REGIONE REGI_NOME_IT REGI_NOME_DE CODICE_NAZIONE NAZI_NOME_IT NAZI_NOME_DE NAZI_CITTADINANZA_IT NAZI_CITTADINANZA_DE NAZI_SIGLA FLAG_STATO D_REPA_PK P * KEY_DRG CODICE_DRG DRG_NOME_IT DRG_NOME_DE TIPO_DRG_SIGLA TIPO_DRG_NOME_IT TIPO_DRG_NOME_DE CODICE_MDC MDC_NOME_IT MDC_NOME_DE FLAG_STATO D_DRG NUMBER (10) VARCHAR2 (5 BYTE) VARCHAR2 (200 BYTE) VARCHAR2 (200 BYTE) VARCHAR2 (2 BYTE) VARCHAR2 (20 BYTE) VARCHAR2 (20 BYTE) VARCHAR2 (5 BYTE) VARCHAR2 (200 BYTE) VARCHAR2 (200 BYTE) VARCHAR2 (1 BYTE) D_DRG_PK P * KEY_RICOVERO NOSOGRAFICO F KEY_ASSISTITO F KEY_REPARTO_RICOVERO F KEY_REPARTO_DIMISSIONE F KEY_DATA_RICOVERO F KEY_DATA_DIMISSIONE F KEY_DRG F KEY_CDC_AZIENDALE F KEY_GRUPPO_DIAGNOSI F KEY_GRUPPO_INTERVENTI F KEY_CAMPO_SDO1 F KEY_CAMPO_SDO2 F KEY_CAMPO_SDO3 F KEY_CAMPO_SDO4 KEY_INDICATORI NUMERO_TRASFERIMENTI NUMERO_ACCESSI_DH GG_DEGENZA GG_PERMESSO GG_CLASSE GG_OLTRE_SOGLIA NUMERO_PRESTAZIONI_LABORATORIO NUMERO_PRESTAZIONI_RADIOLOGIA TARIFFA_DRG_PROVINCIALE TARIFFA_DRG_NAZIONALE PESO_DRG TRANS_INIZIO TRANS_FINE FLAG_STATO D_ASSISTITI NUMBER (10) NUMBER (10) NUMBER (10) NUMBER (10) NUMBER (10) NUMBER (10) NUMBER (10) NUMBER (10) NUMBER (10) NUMBER (10) NUMBER (10) NUMBER (10) NUMBER (10) NUMBER (10) NUMBER (10) NUMBER (10) NUMBER (5) NUMBER (5) NUMBER (5) NUMBER (5) NUMBER (5) NUMBER (5) NUMBER (5) NUMBER (5) NUMBER (14,4) NUMBER (14,4) NUMBER (8,4) DATE DATE CHAR (1 BYTE) P * KEY_ASSISTITO CODICE_ASSISTITO ASSI_CODI_ASSI_CORR ASSI_CODI_TIPO ASSI_TIPO_DESC_IT ASSI_TIPO_DESC_DE ASSI_CODI_SESS ASSI_SESS_DESC_IT ASSI_SESS_DESC_DE ASSI_ETA ASSI_CODI_FASC_ETA ASSI_FASC_ETA_DESC_IT ASSI_FASC_ETA_DESC_DE ASSI_CODI_STAT_CIVI ASSI_STAT_CIVI_DESC_IT ASSI_STAT_CIVI_DESC_DE ASSI_CODI_STAT_ASSI ASSI_STAT_ASSI_DESC_IT ASSI_STAT_ASSI_DESC_DE ASSI_CODI_STAT_ANAG ASSI_STAT_ANAG_DESC_IT ASSI_STAT_ANAG_DESC_DE ASSI_TITO_STUD_IT ASSI_TITO_STUD_DE F KEY_DATA_NASCITA F KEY_DATA_DECESSO F KEY_COMUNE_NASCITA F KEY_COMUNE_RESIDENZA FLAG_STATO D_CAMP_SDO1_PK D_CAMP_SDO2_PK D_CDC_AZIE_PK NUMBER (10) VARCHAR2 (9 BYTE) VARCHAR2 (100 BYTE) VARCHAR2 (100 BYTE) VARCHAR2 (9 BYTE) VARCHAR2 (100 BYTE) VARCHAR2 (100 BYTE) VARCHAR2 (9 BYTE) VARCHAR2 (100 BYTE) VARCHAR2 (100 BYTE) VARCHAR2 (9 BYTE) VARCHAR2 (100 BYTE) VARCHAR2 (100 BYTE) VARCHAR2 (6 BYTE) VARCHAR2 (100 BYTE) VARCHAR2 (100 BYTE) VARCHAR2 (6 BYTE) VARCHAR2 (100 BYTE) VARCHAR2 (100 BYTE) VARCHAR2 (1 BYTE) D_GRUPPI_DIAGNOSI P * KEY_GRUPPO_DIAGNOSI D_GRUP_DIAG_PK NUMBER (10) D_BRIDGE_DIAGNOSI F * KEY_GRUPPO_DIAGNOSI F * KEY_DIAGNOSI LIVELLO_DIAGNOSI PESO_DIAGNOSI NUMBER (10) NUMBER (10) NUMBER (2) NUMBER (4,4) D_DIAGNOSI P * KEY_DIAGNOSI CODICE_VERSIONE_ICD9 VERS_ICD9_SIGLA CODICE_DIAG_L3 DIAG_L3_NOME_IT DIAG_L3_NOME_DE CODICE_DIAG_L2 DIAG_L2_NOME_IT DIAG_L2_NOME_DE CODICE_DIAG_L1 DIAG_L1_NOME_IT DIAG_L1_NOME_DE TIPO_NOME_IT TIPO_NOME_DE CATEGORIA_NOME_IT CATEGORIA_NOME_DE FLAG_STATO NUMBER (10) NUMBER (2) VARCHAR2 (20 BYTE) VARCHAR2 (5 BYTE) VARCHAR2 (300 BYTE) VARCHAR2 (300 BYTE) VARCHAR2 (5 BYTE) VARCHAR2 (300 BYTE) VARCHAR2 (300 BYTE) VARCHAR2 (5 BYTE) VARCHAR2 (300 BYTE) VARCHAR2 (300 BYTE) VARCHAR2 (300 BYTE) VARCHAR2 (300 BYTE) VARCHAR2 (300 BYTE) VARCHAR2 (300 BYTE) VARCHAR2 (1 BYTE) NUMBER (10) D_CAMP_SDO3_PK D_BRIDGE_INTERVENTI F * KEY_GRUPPO_INTERVENTI F * KEY_INTERVENTO LIVELLO_INTERVENTO PESO_INTERVENTO NUMBER (10) NUMBER (10) NUMBER (2) NUMBER (4,4) D_INTERVENTI P * KEY_INTERVENTO CODICE_VERSIONE_ICD9 VERS_ICD9_SIGLA CODICE_INTE_L3 INTE_L3_NOME_IT INTE_L3_NOME_DE CODICE_INTE_L2 INTE_L2_NOME_IT INTE_L2_NOME_DE CODICE_INTE_L1 INTE_L1_NOME_IT INTE_L1_NOME_DE CATEGORIA_NOME_IT CATEGORIA_NOME_DE FLAG_STATO D_INTE_CODI_INTE_IDX NUMBER (10) NUMBER (2) VARCHAR2 (20 BYTE) VARCHAR2 (5 BYTE) VARCHAR2 (300 BYTE) VARCHAR2 (300 BYTE) VARCHAR2 (5 BYTE) VARCHAR2 (300 BYTE) VARCHAR2 (300 BYTE) VARCHAR2 (5 BYTE) VARCHAR2 (300 BYTE) VARCHAR2 (300 BYTE) VARCHAR2 (300 BYTE) VARCHAR2 (300 BYTE) VARCHAR2 (1 BYTE) D_COMU_PK D_CAMPI_SDO3 NUMBER (10) VARCHAR2 (2 BYTE) VARCHAR2 (100 BYTE) VARCHAR2 (100 BYTE) VARCHAR2 (2 BYTE) VARCHAR2 (100 BYTE) VARCHAR2 (100 BYTE) VARCHAR2 (2 BYTE) VARCHAR2 (100 BYTE) VARCHAR2 (100 BYTE) VARCHAR2 (1 BYTE) D_CAMPI_SDO4 P * KEY_CAMPO_SDO4 CODICE_TRAUMATISMI TRAU_DESCR_IT TRAU_DESCR_DE CODICE_RISCONTRO_AUTOPTICO RISC_AUTO_DESCR_IT RISC_AUTO_DESCR_DE CODICE_CLASSE_ANESTETICA CLAS_ANES_DESCR_IT CLAS_ANES_DESCR_DE CODICE_VISITA_ANESTETICA VISI_ANES_DESCR_IT VISI_ANES_DESCR_DE FLAG_STATO D_CAMP_SDO4_PK NUMBER (10) VARCHAR2 (10 BYTE) VARCHAR2 (70 BYTE) VARCHAR2 (70 BYTE) VARCHAR2 (17 BYTE) VARCHAR2 (10 BYTE) VARCHAR2 (70 BYTE) VARCHAR2 (70 BYTE) VARCHAR2 (15 BYTE) VARCHAR2 (10 BYTE) VARCHAR2 (70 BYTE) VARCHAR2 (70 BYTE) VARCHAR2 (10 BYTE) VARCHAR2 (50 BYTE) VARCHAR2 (50 BYTE) VARCHAR2 (50 BYTE) VARCHAR2 (50 BYTE) VARCHAR2 (12 BYTE) VARCHAR2 (10 BYTE) VARCHAR2 (70 BYTE) VARCHAR2 (70 BYTE) VARCHAR2 (1 BYTE) NUMBER (10) VARCHAR2 (2 BYTE) VARCHAR2 (100 BYTE) VARCHAR2 (100 BYTE) VARCHAR2 (2 BYTE) VARCHAR2 (100 BYTE) VARCHAR2 (100 BYTE) VARCHAR2 (2 BYTE) VARCHAR2 (100 BYTE) VARCHAR2 (100 BYTE) VARCHAR2 (2 BYTE) VARCHAR2 (100 BYTE) VARCHAR2 (100 BYTE) VARCHAR2 (2 BYTE) VARCHAR2 (100 BYTE) VARCHAR2 (100 BYTE) VARCHAR2 (1 BYTE) P * KEY_CAMPO_SDO3 CODICE_MODALITA_DIMISSIONE MODA_DIMI_DESCR_IT MODA_DIMI_DESCR_DE CODICE_DIMISSIONE_PROTETTA DIMI_PROT_DESCR_IT DIMI_PROT_DESCR_DE CODICE_DIMI_PROTETTA_GRADO DIMI_PROT_GRAD_DESCR_IT DIMI_PROT_GRAD_DESCR_DE FLAG_STATO D_GRUPPI_INTERVENTI P * KEY_GRUPPO_INTERVENTI D_GRUP_INTE_PK D_COMUNI P * KEY_COMUNE CODICE_COMUNE COMU_NOME_IT COMU_NOME_DE COMU_CAP CODICE_PROVINCIA PROV_NOME_IT PROV_NOME_DE PROV_SIGLA CODICE_REGIONE REGI_NOME_IT REGI_NOME_DE CODICE_NAZIONE NAZI_NOME_IT NAZI_NOME_DE NAZI_CITTADINANZA_IT NAZI_CITTADINANZA_DE NAZI_SIGLA CODICE_ASL ASL_NOME_IT ASL_NOME_DE FLAG_STATO D_CAMPI_SDO2 P * KEY_CAMPO_SDO2 CODICE_REGIME_RICOVERO REG_RIC_DESCR_IT REG_RIC_DESCR_DE CODICE_MOTIVO_DH MOTI_DH_DESCR_IT MOTI_DH_DESCR_DE CODICE_TIPO_RICOVERO TIPO_RICO_DESCR_IT TIPO_RICO_DESCR_DE CODICE_ONERE_DEGENZA ONER_DEGE_DESCR_IT ONER_DEGE_DESCR_DE CODICE_CLASSE CLAS_DESCR_IT CLAS_DESCR_DE FLAG_STATO NUMBER (10) DATE VARCHAR2 (10 BYTE) VARCHAR2 (11 BYTE) VARCHAR2 (12 BYTE) VARCHAR2 (18 BYTE) VARCHAR2 (19 BYTE) VARCHAR2 (10 BYTE) VARCHAR2 (10 BYTE) NUMBER (1) NUMBER (2) NUMBER (3) NUMBER (1) NUMBER (1) NUMBER (1) NUMBER (1) NUMBER (2) VARCHAR2 (3 BYTE) VARCHAR2 (3 BYTE) VARCHAR2 (10 BYTE) VARCHAR2 (10 BYTE) NUMBER (2) NUMBER (1) NUMBER (1) NUMBER (1) NUMBER (4) D_CDC_AZIENDALI P * KEY_CDC_AZIE CODICE_CDC CDC_NOME_IT CDC_NOME_DE CODICE_CDR CDR_NOME_IT CDR_NOME_DE CODICE_SETTORE SETT_NOME_IT SETT_NOME_DE CODICE_AREA_AMMINISTRATIVA AREA_AMMI_NOME_IT AREA_AMMI_NOME_DE CODICE_COMPRENSORIO COMP_NOME_IT COMP_NOME_DE CODICE_ASL ASL_NOME_IT ASL_NOME_DE FLAG_STATO NUMBER (10) NUMBER (8) NUMBER (8) VARCHAR2 (2 BYTE) VARCHAR2 (60 BYTE) VARCHAR2 (60 BYTE) VARCHAR2 (2 BYTE) VARCHAR2 (20 BYTE) VARCHAR2 (20 BYTE) NUMBER (3) NUMBER (3) VARCHAR2 (50 BYTE) VARCHAR2 (50 BYTE) VARCHAR2 (2 BYTE) VARCHAR2 (20 BYTE) VARCHAR2 (20 BYTE) VARCHAR2 (2 BYTE) VARCHAR2 (60 BYTE) VARCHAR2 (60 BYTE) VARCHAR2 (2 BYTE) VARCHAR2 (60 BYTE) VARCHAR2 (60 BYTE) VARCHAR2 (60 BYTE) VARCHAR2 (60 BYTE) NUMBER (10) NUMBER (10) NUMBER (10) NUMBER (10) VARCHAR2 (1 BYTE) D_CAMPI_SDO1 D_ASSI_CODI_ASSI_CORR_IDX P * KEY_CAMPO_SDO1 NUMBER (10) CODICE_PROVENIENZA VARCHAR2 (2 BYTE) PROV_DESCR_IT VARCHAR2 (100 BYTE) PROV_DESCR_DE VARCHAR2 (100 BYTE) CODICE_TIPO_PROPONENTE VARCHAR2 (2 BYTE) TIPO_PROP_DESCR_IT VARCHAR2 (100 BYTE) TIPO_PROP_DESCR_DE VARCHAR2 (100 BYTE) FLAG_STATO VARCHAR2 (1 BYTE) F_RICO_FLAG_STAT_IDX D_DATE P * KEY_DATA DATA DATA_BREVE DATA_MEDIA_IT DATA_MEDIA_DE DATA_LUNGA_IT DATA_LUNGA_DE GIORNO_SETTIMANA_IT GIORNO_SETTIMANA_DE GIORNO_SETTIMANA GIORNO_MESE GIORNO_ANNO ULTIMO_GIORNO_SETTIMANA ULTIMO_GIORNO_MESE ULTIMO_GIORNO_ANNO SETTIMANA_MESE SETTIMANA_ANNO MESE_BREVE_IT MESE_BREVE_DE MESE_LUNGO_IT MESE_LUNGO_DE MESE_ANNO TRIMESTRE QUADRIMESTRE SEMESTRE ANNO D_DATE_PK NUMBER (10) VARCHAR2 (10 BYTE) VARCHAR2 (90 BYTE) VARCHAR2 (90 BYTE) VARCHAR2 (10 BYTE) VARCHAR2 (100 BYTE) VARCHAR2 (100 BYTE) VARCHAR2 (10 BYTE) VARCHAR2 (90 BYTE) VARCHAR2 (90 BYTE) VARCHAR2 (10 BYTE) VARCHAR2 (90 BYTE) VARCHAR2 (90 BYTE) VARCHAR2 (10 BYTE) VARCHAR2 (90 BYTE) VARCHAR2 (90 BYTE) VARCHAR2 (10 BYTE) VARCHAR2 (70 BYTE) VARCHAR2 (70 BYTE) VARCHAR2 (10 BYTE) VARCHAR2 (50 BYTE) VARCHAR2 (50 BYTE) VARCHAR2 (50 BYTE) VARCHAR2 (50 BYTE) VARCHAR2 (15 BYTE) VARCHAR2 (1 BYTE) NUMBER (10) VARCHAR2 (2 BYTE) VARCHAR2 (100 BYTE) VARCHAR2 (100 BYTE) VARCHAR2 (2 BYTE) VARCHAR2 (100 BYTE) VARCHAR2 (100 BYTE) VARCHAR2 (2 BYTE) VARCHAR2 (100 BYTE) VARCHAR2 (100 BYTE) VARCHAR2 (2 BYTE) VARCHAR2 (100 BYTE) VARCHAR2 (100 BYTE) VARCHAR2 (1 BYTE) D_INDICATORI P * KEY_INDICATORE * CODICE_RICOVERO_RIPETUTO RICO_RIPE_DESCR_IT RICO_RIPE_DESCR_DE * CODICE_RICOVERO_APPROPRIATO RICO_APPR_DESCR_IT RICO_APPR_DESCR_DE FLAG_STATO NUMBER (10) NUMBER (2) VARCHAR2 (50 BYTE) VARCHAR2 (50 BYTE) NUMBER (2) VARCHAR2 (50 BYTE) VARCHAR2 (50 BYTE) VARCHAR2 (1 BYTE) D_INDI_PK D_DIAG_CODI_DIAG_IDX ADMT 2015/16 — Unit 5 J. Gamper 43/47 MEDAN Multi-valued Dimensions Diagnoses and procedures are examples of multi-valued attributes/dimensions, i.e., a patient typically has multiple diagnoses (up to > 10). Solutions Reserve multiple columns (one for each diagnosis). Results in many empty cells, i.e., sparse fact table. Use several facts for a single hospital stay (one for each diagnosis). Similar to fact normalization for measures. Increases the number of tuples in fact table. Multiple columns Multiple facts Hospital Stay Facts Data Hospital Stay Admission (FK) Data Hospital Stay Discharge (FK) Health Record Number (FK) Health Record Year (FK) Patient Key (FK) ... Diagnosis Key 1 (FK) Diagnosis Key 2 (FK) Diagnosis Key 3 (FK) ... Hospital Stay Facts Data Hospital Stay Admission (FK) Data Hospital Stay Discharge (FK) Health Record Number (FK) Health Record Year (FK) Patient Key (FK) ... Diagnosis Key (FK) ... Bridge tables ADMT 2015/16 — Unit 5 J. Gamper 44/47 MEDAN Bridge Tables Bridge tables help to deal with multi-valued dimensions, i.e., many-to-many relationships. Bridge table implements two one-to-many relationships. Bridge table can also be used to represent many-to-many relationships between dimensional attributes, e.g., between books and authors. Hospital Stay Facts Data Hospital Stay Admission (FK) Data Hospital Stay Discharge (FK) Health Record Number (FK) Health Record Year (FK) Patient Key (FK) ... Diagnosis Group (FK) ... Diagnosis Group Dimension Diagnosis Group Key (PK) Diagnosis Group Name ... Diagnosis Group Bridge Diagnosis Group Key (FK) Diagnosis Group Key (FK) Diagnosis Level Weighting Factor ... ADMT 2015/16 — Unit 5 J. Gamper Diagnosis Dimension Diagnosis Key (FK) Diagnosis Name Diagnosis Code Diagnosis Type Diagnosis Category ... 45/47 MEDAN MEDAN Lessons Learned Developing a BI platform is a process that takes years. A well-designed and consistent DW is the foundation for BI. QlikView is a tool for quick analyses; it cannot replace a DW. Do not put anything in a single data mart. Use one DM for one business process (set of closely related business queries). Different opinions on bottom-up vs. top-down, but bottom- up seems to have more acceptance. Data modeling is difficult but very important. Helps to get a conformed view on the business. e.g., what is an admission/hospital stay? Different granularity by different users, e.g., Province, Hospital. Establishing a Business Intelligence Competence Center (BICC) is crucial in order to coordinate the whole project, take important decisions about the data. ADMT 2015/16 — Unit 5 J. Gamper 46/47 MEDAN Summary Different Inventory models: periodic snapshot model, transactions model, accumulating. Shared and conformed dimensions are crucial to integrate several DMs across a value chain. DW bus architecture is a standard interface to support the incremental development and successfull integration of separate DMs. DW bus matrix is a tool to document the DW bus architecture. Role-playing allows to physically store a dimension only once, but use it several times in different roles and with different names. Fact normalization collapses all measures into a single measure together with a special fact dimension to determine the type of the measure. Only useful when the fact table is sparse. Multi-valued dimensions if a dimension occurs more than once in a single fact, e.g., a patient has typically several diagnoses. Bridge tables can be used to represent such dimensions Bridge tables can be used to represent n-to-m relationships between dimensional attributes (i.e., multiple arcs in DFM) ADMT 2015/16 — Unit 5 J. Gamper 47/47