What is the difference between MERGE, JOIN and LOOKUP in DataStage?

sql server - What are the differences between Merge Join and Lookup transformations in SSIS? - Stack Overflow
Hi, I'm new to SSIS packages; I'm writing a package and reading up on them at the same time.
I need to convert a DTS package into an SSIS package, and I need to perform a join on two sources from different databases. I was wondering which is the better approach: a Lookup or a Merge Join?
On the surface they seem very similar. The Merge Join requires that the data be sorted beforehand, whereas the Lookup doesn't require this. Any advice would be very helpful. Thank you.
Screenshot #1 shows a few points that distinguish the Merge Join transformation from the Lookup transformation.
Regarding Lookup:
If you want to find matching rows in source 2 based on input from source 1, and you know there will be only one match for every input row, then I would suggest using the Lookup operation. An example would be an OrderDetails table where you want to find the matching Order Id and Customer Number; in that case, Lookup is the better option.
Regarding Merge Join:
If you want to perform joins like fetching all Addresses (Home, Work, Other) from the Address table for a given Customer in the Customer table, then you have to go with Merge Join, because a customer can have one or more addresses associated with them.
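As a rough SQL analogy of the two patterns just described (a sketch only: the OrderDetails, Orders, Customer and Address tables and their columns are illustrative assumptions, not schemas given anywhere in this thread):

-- Lookup-style: each OrderDetails row is expected to have exactly one matching
-- Order, so the join cannot multiply rows.
SELECT od.OrderDetailId, o.OrderId, o.CustomerNumber
FROM dbo.OrderDetails AS od
JOIN dbo.Orders AS o
  ON o.OrderId = od.OrderId;

-- Merge Join-style: one Customer can match several Address rows (Home, Work,
-- Other), so the result legitimately contains one or more rows per customer.
SELECT c.CustomerId, a.AddressType, a.AddressLine1
FROM dbo.Customer AS c
JOIN dbo.Address AS a
  ON a.CustomerId = c.CustomerId;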
An example to compare:
Here is a scenario to demonstrate the performance differences between Merge Join and Lookup. The data used here is a one-to-one join, which is the only scenario the two have in common for comparison.
I have three tables named dbo.ItemPriceInfo, dbo.ItemDiscountInfo and dbo.ItemAmount. Create scripts for these tables are provided under the SQL Scripts section.
Tables dbo.ItemPriceInfo and dbo.ItemDiscountInfo both have 13,349,729 rows. Both tables have ItemNumber as the common column. ItemPriceInfo has price information and ItemDiscountInfo has discount information. Screenshot #2 shows the row count in each of these tables. Screenshot #3 shows the top 6 rows to give an idea of the data present in the tables.
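The row counts shown in Screenshot #2 correspond to a simple count query (a sketch against the tables defined in the SQL Scripts section below):

SELECT 'dbo.ItemPriceInfo' AS TableName, COUNT_BIG(*) AS RowsInTable FROM dbo.ItemPriceInfo
UNION ALL
SELECT 'dbo.ItemDiscountInfo', COUNT_BIG(*) FROM dbo.ItemDiscountInfo;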
I created two SSIS packages to compare the performance of Merge Join and Lookup transformations. Both the packages have to take the information from tables dbo.ItemPriceInfo and dbo.ItemDiscountInfo, calculate the total amount and save it to the table dbo.ItemAmount.
The first package used the Merge Join transformation, with an INNER JOIN inside it to combine the data. Screenshots #4 and #5 show the sample package execution and the execution duration. It took 05 minutes 14 seconds 719 milliseconds to execute the Merge Join based package.
The second package used the Lookup transformation with Full cache (which is the default setting). Screenshots #6 and #7 show the sample package execution and the execution duration. It took 11 minutes 03 seconds 610 milliseconds to execute the Lookup based package. You might encounter the warning message "Information: The buffer manager has allocated nnnnn bytes, even though the memory pressure has been detected and repeated attempts to swap buffers have failed." Here is a link that talks about how to calculate lookup cache size. During this package execution, even though the Data Flow task completed faster, the pipeline cleanup took a lot of time.
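To get a rough sense of what a full cache has to hold in this example, the reference data can be sized with a query along these lines (an assumption-laden sketch: it guesses that dbo.ItemDiscountInfo is the reference side, and it counts only raw column bytes, ignoring the per-row overhead of the Lookup cache):

SELECT COUNT_BIG(*) AS ReferenceRows,
       SUM(CAST(DATALENGTH(ItemNumber) AS bigint) + DATALENGTH(Discount)) AS ApproxDataBytes
FROM dbo.ItemDiscountInfo;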
This doesn't mean the Lookup transformation is bad; it just has to be used wisely. I use it quite often in my projects, but then again I don't deal with 10+ million rows for lookups every day. Usually my jobs handle between 2 and 3 million rows, and for that the performance is really good. Up to 10 million rows, both performed equally well. Most of the time, what I have noticed is that the bottleneck turns out to be the destination component rather than the transformations. You can overcome that by having multiple destinations.
Here is an example that shows the implementation of multiple destinations.
Screenshot #8 shows the record count in all three tables. Screenshot #9 shows the top 6 records in each of the tables.
Hope that helps.
SQL Scripts:
CREATE TABLE [dbo].[ItemAmount](
[Id] [int] IDENTITY(1,1) NOT NULL,
[ItemNumber] [nvarchar](30) NOT NULL,
[Price] [numeric](18, 2) NOT NULL,
[Discount] [numeric](18, 2) NOT NULL,
[CalculatedAmount] [numeric](18, 2) NOT NULL,
CONSTRAINT [PK_ItemAmount] PRIMARY KEY CLUSTERED ([Id] ASC)) ON [PRIMARY]
CREATE TABLE [dbo].[ItemDiscountInfo](
[Id] [int] IDENTITY(1,1) NOT NULL,
[ItemNumber] [nvarchar](30) NOT NULL,
[Discount] [numeric](18, 2) NOT NULL,
CONSTRAINT [PK_ItemDiscountInfo] PRIMARY KEY CLUSTERED ([Id] ASC)) ON [PRIMARY]
CREATE TABLE [dbo].[ItemPriceInfo](
[Id] [int] IDENTITY(1,1) NOT NULL,
[ItemNumber] [nvarchar](30) NOT NULL,
[Price] [numeric](18, 2) NOT NULL,
CONSTRAINT [PK_ItemPriceInfo] PRIMARY KEY CLUSTERED ([Id] ASC)) ON [PRIMARY]
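For reference, the work both packages perform corresponds to a single set-based statement over these tables (a sketch: the thread never spells out the amount formula, so Price - Discount is assumed here):

INSERT INTO dbo.ItemAmount (ItemNumber, Price, Discount, CalculatedAmount)
SELECT p.ItemNumber,
       p.Price,
       d.Discount,
       p.Price - d.Discount  -- assumed formula for CalculatedAmount
FROM dbo.ItemPriceInfo AS p
INNER JOIN dbo.ItemDiscountInfo AS d
  ON d.ItemNumber = p.ItemNumber;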
Screenshots #1 through #9 referenced above are not included here.
A Merge Join is designed to produce results similar to how JOINs work in SQL.
The Lookup component does not work like a SQL JOIN.
Here's an example where the results would differ.
If you have a one-to-many relationship between input 1 (e.g., Invoices) and input 2 (e.g., Invoice Line Items), you want the combined result of these two inputs to include one or more rows per invoice.
With a Merge Join you will get the desired output.
With a Lookup, where input 2 is the lookup source, the output will be one row per invoice, no matter how many rows exist in input 2.
I don't recall which row from input 2 the data would come from, but I'm pretty sure you will get a duplicate-data warning, at least.
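In rough SQL terms (a sketch only: the Invoice and InvoiceLineItem tables are illustrative assumptions, and OUTER APPLY with TOP (1) merely approximates the Lookup's single-match behaviour):

-- JOIN semantics (like Merge Join): one output row per matching line item.
SELECT i.InvoiceId, li.LineItemId, li.Amount
FROM dbo.Invoice AS i
JOIN dbo.InvoiceLineItem AS li
  ON li.InvoiceId = i.InvoiceId;

-- Lookup-like semantics: exactly one output row per invoice, carrying an
-- arbitrary single matching line item (or NULLs if there is no match).
SELECT i.InvoiceId, x.LineItemId, x.Amount
FROM dbo.Invoice AS i
OUTER APPLY (SELECT TOP (1) li.LineItemId, li.Amount
             FROM dbo.InvoiceLineItem AS li
             WHERE li.InvoiceId = i.InvoiceId) AS x;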
So, each component has its own role in SSIS.
I will suggest a third alternative to consider. Your OLE DB Source could contain a query rather than a table, and you could do the join there. This is not good in all situations, but when you can use it you don't have to sort beforehand.
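Using the item tables from the earlier answer as stand-ins, such a source query might look like this (a sketch; it assumes both tables are reachable from the same connection, which is precisely the situation where this approach applies):

SELECT p.ItemNumber, p.Price, d.Discount
FROM dbo.ItemPriceInfo AS p
INNER JOIN dbo.ItemDiscountInfo AS d
  ON d.ItemNumber = p.ItemNumber;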
Lookup is similar to a left join in the Merge Join component. Merge Join can do other types of joins, but if a left join is what you want, the difference is mostly in performance and convenience.
Their performance characteristics can be very different depending on the relative amount of data to look up (the input to the Lookup component) and the amount of referenced data (the lookup cache or lookup data source size).
E.g. if you only need to look up 10 rows, but the referenced data set is 10 million rows, a Lookup in partial-cache or no-cache mode will be faster, as it will only fetch 10 records rather than 10 million. If you need to look up 10 million rows and the referenced data set is 10 rows, a fully cached Lookup is probably faster (unless those 10 million rows are already sorted anyway and you can try Merge Join). If both data sets are large (especially if larger than available RAM), or the larger one is sorted, Merge Join might be the better choice.
Merge Join allows you to join on multiple columns based on one or more criteria, whereas a Lookup is more limited in that it only fetches one or more values based on some matching column information -- the lookup query is run for each value in your data source (though SSIS will cache the data source if it can).
It really depends on what your two data sources contain and how you want your final source to look after the merge. Could you provide any more details about the schemas in your DTS package?
Another thing to consider is performance. If used incorrectly, each could be slower than the other, but again, it's going to depend on the amount of data you have and your data source schemas.
There are two differences:
Sorting:
A merge join requires both inputs to be sorted the same way.
A lookup does not require either input to be sorted.
Database query load:
A merge join does not query the database during the join; it only reads the two input flows (although the reference data typically comes from a query of the form 'select * from table order by join criteria').
A lookup will issue one query for each (distinct, if cached) value that it is asked to join on. This rapidly becomes more expensive than the single select above; a sketch of the two query shapes appears a little further down.
This leads to:
If it is no effort to produce a sorted list, and you want more than about 1% of the rows (a single-row select costs roughly 100x as much as the same row fetched while streaming, and you don't want to sort a 10-million-row table in memory), then merge join is the way to go.
If you only expect a small number of matches (distinct values looked up, when caching is enabled), then lookup is better.
For me, the tradeoff between the two comes between 10k and 100k rows needing to be looked up.
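A sketch of the two query shapes described above, again using the item tables from the earlier answer as stand-ins (@ItemNumber stands for the parameter the Lookup supplies per row in partial-cache or no-cache mode):

-- Merge join side: one streaming, sorted select over the whole reference table.
SELECT ItemNumber, Discount
FROM dbo.ItemDiscountInfo
ORDER BY ItemNumber;

-- Lookup side (partial cache / no cache): one parameterised point query per
-- (distinct) incoming value; @ItemNumber stands for the value SSIS supplies.
SELECT Discount
FROM dbo.ItemDiscountInfo
WHERE ItemNumber = @ItemNumber;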
Which one is quicker will depend on:
the total number of rows to be processed (if the table is memory-resident, sorting the data to merge it is cheap);
the number of duplicate lookups expected (lookup has a high per-row overhead);
whether you can select sorted data (note that text sorts are influenced by collation, so be careful that what SQL considers sorted is also what SSIS considers sorted);
what percentage of the entire table you will look up (merge requires selecting every row; lookup is better if you only need a few rows on one side);
the width of a row (rows per page strongly influences the I/O cost of doing single lookups vs. a scan; narrow rows favour merge);
the order of data on disk (if it is easy to produce sorted output, prefer merge; if you can organise the lookups to be done in physical disk order, lookups are less costly because of fewer cache misses);
network latency between the SSIS server and the destination (larger latency favours merge);
how much coding effort you wish to spend (merge is a bit more complex to write);
the collation of the input data -- the SSIS merge has weird ideas about the sorting of text strings which contain non-alphanumeric characters but are not nvarchar (this goes back to sorting: getting SQL to emit a sort that SSIS is happy to merge can be hard).
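One workaround commonly used for that last point (shown here as an assumption, not something stated in this thread) is to force a binary collation in the source ORDER BY so that the database's sort order matches SSIS's string comparison, and then mark the source output as sorted:

SELECT ItemNumber, Price
FROM dbo.ItemPriceInfo
ORDER BY ItemNumber COLLATE Latin1_General_BIN;
-- In the OLE DB Source's Advanced Editor, the output must then be flagged as
-- sorted (IsSorted = True, with SortKeyPosition set on ItemNumber) so a
-- downstream Merge Join accepts it without a separate Sort transformation.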
I know this is an old question, but one critical point that I feel was not covered by the answers given is that, because the Merge Join is merging two data flows, it can combine data from any source, whereas with the Lookup, one of the data sources (the lookup reference) must come from an OLE DB connection.
300 DataStage interview questions
1. What are the Environmental variables in Datastage?
2. Check for job errors in Datastage.
3. What are Stage Variables, Derivations and Constants?
4. What is Pipeline Parallelism?
5. Debug stages in PX.
6. How do you remove duplicates in a dataset?
7. What is the difference between Job Control and Job Sequence?
8. What is the max size of the Data Set stage?
9. Performance in the Sort stage.
10. How to develop an SCD using the LOOKUP stage?
12. What are the errors you experienced with DataStage?
13. What are the main differences between server jobs and parallel jobs in Datastage?
14. Why do you need the Modify stage?
15. What is the difference between the Sequential stage and the Dataset stage? When do you use them?
16. Memory allocation while using the Lookup stage.
17. What is a Phantom error in Datastage? How to overcome this error?
18. Parameter file usage in Datastage.
19. Explain the best approach to do an SCD type 2 mapping in a parallel job.
20. How can we improve the performance of a job while handling huge amounts of data?
21. How can we create read-only jobs in Datastage?
22. How to implement routines in Datastage? Does anyone have any material for Datastage?
23. How will you determine the sequence of jobs to load into the data warehouse?
24. How can we test jobs in Datastage?
25. DataStage - delete header and footer on the source sequential file.
26. How can we implement Slowly Changing Dimensions in DataStage?
27. Differentiate database data and data warehouse data.
28. How to run a shell script within the scope of a Datastage job?
29. What is the difference between Datastage and Informatica?
30. Explain job control language such as (DS_JOBS).
32. What is an Invocation ID?
33. How to connect two stages which do not have any common columns between them?
34. In SAP/R3, how do you declare and pass parameters in a parallel job?
35. Difference between a hash file and a sequential file?
36. How do you fix the error "OCI has fetched truncated data" in DataStage?
37. A batch is running and it is scheduled to run in 5 minutes. But after 10 days the time changes to 10 minutes. What type of error is this and how do you fix it?
38. Which partitioning do we have to use for the Aggregator stage in parallel jobs?
39. What is the baseline to implement partitioned or parallel execution in a Datastage job, e.g. is it only advised for more than 2 million records?
40. How do we create an index in Datastage?
41. What is the flow of loading data into fact and dimension tables?
42. What is a sequential file that has a single input link?
43. Aggregators - what does the warning "Hash table has grown to 'xyz' ...." mean?
44. What is a hashing algorithm?
45. How do you load partial data after a job has failed? The source has 10,000 records and the job failed after 5,000 records were loaded; the status of the job is Aborted. Instead of removing the 5,000 records from the target, how can I resume the load?
46. What are Orchestrate options in the generic stage, and what are the option names and values? Name an Orchestrate operator to call. What Orchestrate operators are available in Datastage for the AIX environment?
47. Is a Type 30D hash file GENERIC or SPECIFIC?
48. Is a hashed file an active or passive stage? When will it be useful?
49. How do you extract job parameters from a file?
50. 1. What about system variables? 2. How can we create containers? 3. How can we improve the performance of DataStage? 4. What are the job parameters? 5. What is the difference between a routine, a transform and a function? 6. What are all the third-party tools used in DataStage? 7. How can we implement a lookup in DataStage server jobs? 8. How can we implement Slowly Changing Dimensions in DataStage? 9. How can we join one Oracle source and a sequential file? 10. What are the Iconv and Oconv functions?
51. What are the difficulties faced in using DataStage, or what are the constraints in using DataStage?
52. Have you ever been involved in updating DS versions such as DS 5.X? If so, tell us some of the steps you have taken.
53. What are XML files, how do you read data from XML files, and what stage is to be used?
54. How do you track performance statistics and enhance them?
55. Types of views in Datastage Director? There are 3 types of views in Datastage Director: a) Job View - dates of jobs compiled; b) Log View - status of the job's last run; c) Status View - warning messages, event messages, program-generated messages.
56. What is the default cache size? How do you change the cache size if needed? The default cache size is 256 MB. We can increase it by going into Datastage Administrator, selecting the Tunables tab and specifying the cache size there.
57. How do you pass the parameter to the job sequence if the job is running at night?
58. How do you catch bad rows from an OCI stage?
59. What are QualityStage and ProfileStage?
60. What is the use and advantage of a procedure in Datastage?
61. What are the important considerations while using a Join stage instead of lookups?
62. How to implement a type 2 slowly changing dimension in Datastage? Give an example.
63. How to implement the type 2 Slowly Changing Dimension in DataStage?
64. What are static hash files and dynamic hash files?
65. What is the difference between Datastage server jobs and Datastage parallel jobs?
66. What is 'insert for update' in Datastage?
67. How did you connect to DB2 in your last project? Using DB2 ODBC drivers.
68. How do you merge two files in DS? Either use a Copy command as a before-job subroutine if the metadata of the 2 files is the same, or create a job to concatenate the 2 files into one if the metadata is different.
69. What is the order of execution done internally in the transformer, with the stage editor having input links on the left hand side and output links?
70. How will you call an external function or subroutine from Datastage?
71. What happens if the job fails at night?
72. Types of parallel processing? Parallel processing is broadly classified into 2 types: a) SMP - Symmetric Multi Processing; b) MPP - Massively Parallel Processing.
73. What is DS Administrator used for - did you use it?
74. How do you do an Oracle 4-way inner join if there are 4 Oracle input files?
75. How do you pass a filename as a parameter to a job?
76. How do you populate source files?
77. How do you handle date conversions in Datastage? Convert a mm/dd/yyyy format to yyyy-dd-mm? We use a) the "Iconv" function - internal conversion; b) the "Oconv" function - external conversion. The function to convert mm/dd/yyyy format to yyyy-dd-mm is Oconv(Iconv(Fieldname,"D/M
78. How do you execute a Datastage job from the command line prompt? Using the "dsjob" command as follows: dsjob -run -jobstatus projectname jobname
79. Differentiate Primary Key and Partition Key. A primary key is a combination of unique and not null; it can be a collection of key values called a composite primary key. A partition key is just a part of the primary key. There are several methods of
80. How to install and configure DataStage EE on Sun Microsystems multi-processor hardware running the Solaris 9 operating system?
81. What are all the third-party tools used in DataStage?
82. How do you eliminate duplicate rows?
83. What is the difference between a routine, a transform and a function?
84. Do you know about the INTEGRITY/QUALITY stage?
85. How to attach a .mtr file (MapTrace) via email; the MapTrace is used to record all the execute map errors.
86. Is it possible to calculate a hash total for an EBCDIC file and have the hash total stored as EBCDIC using Datastage? Currently the total is converted to ASCII, even though the individual records are stored as EBCDIC.
87. If you are running 4-way parallel and you have 10 stages on the canvas, how many processes does Datastage create?
88. Explain the differences between Oracle 8i/9i.
89. How will you pass the parameter to the job schedule if the job is running at night? What happens if one job fails in the night?
90. What is an environment variable?
91. How to find duplicate records using the Transformer stage in the server edition?
92. What is a phantom error in Datastage?
93. How can we increment the surrogate key value for every insert into the target database?
94. What is the use of environment variables?
95. How can we run a batch using the command line?
96. What is fact load?
97. Explain a specific scenario where we would use range partitioning.
98. What is job commit in Datastage?
99. Disadvantages of a staging area?
100. How do you configure api_dump?
102. Does the type of partitioning change for SMP and MPP systems?
103. What is the difference between RELEASE THE JOB and KILL THE JOB?
104. Can you convert a snowflake schema into a star schema?
105. What is the repository?
106. What is fact loading, and how do you do it?
107. What is the alternative way in which we can do job control?
108. Where can we use these stages - Link Partitioner, Link Collector and Inter Process (OCI) stage - in server jobs or in parallel jobs? And is SMP parallel or server?
109. Where can you output data using the Peek stage?
110. Do you know about MetaStage?
111. In which situation do we use the RUNTIME COLUMN PROPAGATION option?
112. What is the difference between Datastage and Datastage TX?
113. 1. Difference between a hash file and a sequential file? What is modulus? 2. What are the Iconv and Oconv functions? 3. How can we join one Oracle source and a sequential file? 4. How can we implement Slowly Changing Dimensions in DataStage? 5. How can we implement a lookup in DataStage server jobs? 6. What are all the third-party tools used in DataStage? 7. What is the difference between a routine, a transform and a function? 8. What are the job parameters? 9. Plug-in? 10. How can we improv
114. Is it possible to query a hash file? Justify your answer.
115. How to enable the Datastage engine?
116. How can I convert server jobs into parallel jobs?
117. Suppose you have a table "sample" with three columns Cola, Colb, Colc (values: 1 10 3 30 300). Assume Cola is the primary key. How will you fetch the record with the maximum Cola value into the target system using the Datastage tool?
118. How to parameterise a field in a sequential file? I am using Datastage as the ETL tool, with a sequential file as the source.
119. What is TX and what is its use in DataStage? As far as I know TX stands for Transformer Extender, but I don't know how it works and where it is used.
120. What is the difference between the Merge stage and the Lookup stage?
121. Importance of the surrogate key in data warehousing? A surrogate key is a primary key for a dimension table. The most important reason for using it is that it is independent of the underlying database, i.e. the surrogate key is not affected by changes going on with a database
122. What is the difference between symmetric multiprocessing and massively parallel processing?
123. What is the difference between the Dynamic RDBMS stage and the Static RDBMS stage?
124. How to run a job using the command line?
125. What is user activity in Datastage?
126. How can we improve job performance?
127. How can we create a rank using Datastage, like in Informatica?
128. What is the use of job control?
129. What does # indicate in environment variables?
130. What are the two types of hash files?
131. What are the different types of star schema?
132. What are the different types of file formats?
133. What are the different dimension tables in your project? Please explain with an example.
134. What is the difference between buildopts and subroutines?
135. How can we improve performance in the Aggregator stage?
136. What is SQL tuning? How do you do it?
137. What is the use of tunables?
138. How to distinguish the surrogate key in different dimension tables? How can we give it for different dimension tables?
139. How can we load a source into the ODS?
140. What is the difference between a sequential file and a dataset? When to use the Copy stage?
141. How to eliminate duplicate rows in Datastage?
142. What is a complex stage? In which situation do we use this one?
143. What is the Sequencer stage?
144. Where are the flat files actually stored? What is the path?
145. What are the different types of lookups in Datastage?
146. What are the most important aspects that a beginner must consider in doing his first DS project?
147. How to find errors in a job sequence?
148. Is it possible for two users to access the same job at a time in Datastage?
149. How to kill a job in Datastage?
150. How to find the process id? Explain with steps.
151. What is a job sequence used for? What are batches? What is the difference between a job sequence and batches?
152. What is integrated and unit testing in DataStage?
153. What are the Iconv and Oconv functions?
154. For what purpose are stage variables mainly used?
155. Purpose of using a key, and the difference between surrogate keys and natural keys.
156. How to read data from XL (Excel) files? My problem is that my data file has some commas in the data, but the delimiter we are using is |. How to read the data? Explain with steps.
157. How can I schedule the cleaning of the &PH& file by dsjob?
158. Hot fix for the ODBC stage for AS400 V5R4 in Data Stage 7.1.
159. What is the Datastage engine? What is its purpose?
160. What is the difference between a Transform and a Routine in DataStage?
161. What is the meaning of the following: 1) If an input file has an excessive number of rows and can be split up, then use standard 2) logic to run jobs in parallel; 3) Tuning should occur on a job-by-job basis. Use the power of the DBMS.
162. Why is a hash file faster than a sequential file and an ODBC stage?
163. Can both the source system (Oracle, SQL Server, etc.) and the target data warehouse (maybe Oracle, SQL Server, etc.) be on a Windows environment, or should one of the systems be in a UNIX/Linux environment?
164. How to write and execute routines for PX jobs in C++?
165. What is a routine?
166. How to distinguish the surrogate key in different dimension tables?
167. How can we generate a surrogate key in server/parallel jobs?
168. What is NLS in Datastage? How do we use NLS in Datastage? What are the advantages? At the time of installation I did not choose the NLS option; now I want to use it. What can I do - reinstall Datastage, or first uninstall and then install it once again?
169. How to read data from XL (Excel) files? Explain with steps.
170. What is meant by performance tuning technique? Example?
171. Differentiate between pipeline and partition parallelism.
172. What is the use of a hash file? Instead of a hash file, why can't we use a sequential file itself?
173. What is the Pivot stage? Why is it used? For what purpose will that stage be used?
174. How did you handle reject data?
175. What is the difference between ETL and ELT?
176. How can we create environment variables in Datastage?
177. What is the difference between static hash files and dynamic hash files?
178. How can we test the jobs?
179. What is the difference between a reference link and a straight link?
180. What are the command line functions that import and export the DS jobs?
181. What is the size of the flat file?
182. What is the difference between an operational data store (ODS) and a data warehouse?
183. I have a few questions: 1. What are the various processes which start when the Datastage engine starts? 2. What changes need to be done on the database side if I have to use the DB2 stage? 3. Is the Datastage engine responsible for compilation or execution or both?
184. Could anyone please tell the full details of Datastage certification: the title of the certification, the cost of the certification test, where tutorials for the certification are available, who conducts the certification exam, and whether there is any training institute or person for guidance?
185. How to use rank and update strategy in Datastage?
186. What is ad-hoc access? What is the difference between Managed Query and ad-hoc access?
187. What is Runtime Column Propagation and how do you use it?
188. How do we use the DataStage Director and its run-time engine to schedule running the solution, test and debug its components, and monitor the resulting executable versions on an ad hoc or scheduled basis?
189. What is the difference between the OCI stage and the ODBC stage?
190. Is there any difference between Ascential DataStage and DataStage?
191. How do you remove duplicates without using the Remove Duplicates stage?
192. If we are using two sources having the same metadata, how do we check whether the data in the two sources is the same or not? And if the data is not the same, I want to abort the job - how can we do this?
193. If a DataStage job aborts after, say, 1000 records, how do you continue the job from the 1000th record after fixing the error?
194. Can you tell me for what purpose .dsx files are used in Datastage?
195. How do you clean the Datastage repository?
196. Give one real-time situation where the Link Partitioner stage is used.
197. What are environment variables? What is their use?
198. How do you call procedures in Datastage?
199. How to remove duplicates in a server job?
200. What is the exact difference between the Join, Merge and Lookup stages?
202. What are the new features of Datastage 7.1 compared to Datastage 6.1?
203. How to run a job from the command prompt in UNIX?
204. How to know the number of records in a sequential file before running a server job?
205. Other than round robin, what is the algorithm used in the Link Collector? Also explain how it works.
206. How to drop the index before loading data into the target, and how to rebuild it, in Datastage?
207. How can we ETL an Excel file to a data mart?
208. What are the transaction size and array size in the OCI stage? How can these be used?
209. What is job control? How is it developed? Explain with steps.
210. My requirement is like this. Here is the codification suggested: SALE_HEADER_XXXXX_YYYYMMDD.PSV, SALE_LINE_XXXXX_YYYYMMDD.PSV. XXXXX = LVM sequence to ensure unicity and continuity of file exchanges (caution, there will be an increment to implement). YYYYMMDD = LVM date of file creation. Compression and delivery to: SALE_HEADER_XXXXX_YYYYMMDD.ZIP and SALE_LINE_XXXXX_YYYYMMDD.ZIP. If we run that job, the target file names are like sale_header_1_ and sale_line_1_. If we run next time means the
211. What is the purpose of the Exception activity in Datastage 7.5?
212. How to implement slowly changing dimensions in Datastage?
213. What does the separation option in a static hash file mean?
214. How to improve the performance of a hash file?
215. Actually my requirement is like that. Here is the codification suggested: SALE_HEADER_XXXXX_YYYYMMDD.PSV, SALE_LINE_XXXXX_YYYYMMDD.PSV. XXXXX = LVM sequence to ensure unicity and continuity of file exchanges (caution, there will be an increment to implement). YYYYMMDD = LVM date of file creation. Compression and delivery to: SALE_HEADER_XXXXX_YYYYMMDD.ZIP and SALE_LINE_XXXXX_YYYYMMDD.ZIP. If we run that job, the target file names are like sale_header_1_ and sale_line_1_. If we run next
216. How do you check for the consistency and integrity of the model and the repository?
217. How can we call a routine in a Datastage job? Explain with steps.
218. What is job control? How can it be used? Explain with steps.
219. How to find the number of rows in a sequential file?
220. If the size of a hash file exceeds 2 GB, what happens? Does it overwrite the current rows?
221. Where do we use the Link Partitioner in a Datastage job? Explain with an example.
222. How do I create a Datastage engine stop/start script? Actually my idea is as below: #!/bin/bash, dsadm user, su root, password (encrypted), DSHOMEBIN=/Ascential/DataStage/home/dsadm/Ascential/DataStage/DSEngine/bin; check ps -ef | grep DataStage (if a client connection is there, kill -9 the PID of the client connection); uv -admin -stop > /dev/null; uv -admin -start > /dev/null; verify the process; check the connection; echo "Started properly"; run it as dsadm.
223. Can we use a shared container as a lookup in Datastage server jobs?
224. What is the meaning of an instance in Datastage? Explain with examples.
225. What is the difference between "validated OK" and "compiled" in Datastage?
226. What are AuditStage, ProfileStage and QualityStage in Datastage? Please explain in detail.
227. What are PROFILE STAGE, QUALITY STAGE and AUDIT STAGE in Datastage? Please explain in detail.
228. What are the environment variables in Datastage? Give some examples.
229. What is the difference between the Merge stage and the Join stage?
230. Can anyone explain what the DB2 UDB utilities are?
231. What is the difference between the DRS and ODBC stages?
232. Will Datastage consider the second constraint in the transformer once the first condition is satisfied (if the link ordering is given)?
233. How do you do usage analysis in Datastage?
234. How can you implement slowly changing dimensions in Datastage? Explain. 2) Can you join a flat file and a database in Datastage? How?
235. How can you implement complex jobs in Datastage?
236. DataStage from staging to the MDW is only running at 1 row per second! What do we do to remedy this?
237. What is the meaning of "Try to have the constraints in the 'Selection' criteria of the jobs itself; this will eliminate the unnecessary records even getting in before joins are made"?
238. * What are constraints and derivations? * Explain the process of taking a backup in DataStage. * What are the different types of lookups available in DataStage?
239. How does DataStage handle user security?
240. What are the steps involved in the development of a job in DataStage?
241. What is a project? Specify its various components.
242. What does a config file in Parallel Extender consist of? The config file consists of the following: a) number of processes or nodes; b) actual disk storage location.
243. How to implement type 2 slowly changing dimensions in Datastage? Explain with an example.
244. How much would be the size of the database in DataStage? What is the difference between in-process and inter-process?
245. Briefly describe the various client components.
246. What are the Orabulk and BCP stages?
247. What is DS Director used for - did you use it?
248. What is the meaning of file extender in Datastage server jobs? Can we run the Datastage job from one job to another job? Where is that file data stored, and what is the file extender in DS jobs?
249. What is the max capacity of a hash file in DataStage?
250. What is merge and how can it be done? Please explain with a simple example taking 2 tables.
251. Is it possible to run parallel jobs in server jobs?
252. What are the enhancements made in Datastage 7.5 compared with 7.0?
253. If I add a new environment variable in Windows, how can I access it in DataStage?
254. What is OCI?
255. Is it possible to move data from an Oracle warehouse to a SAP warehouse using the DATASTAGE tool?
256. How can we create containers?
257. What is a data set, and what is a file set?
258. How can I extract data from DB2 (on IBM iSeries) to the data warehouse via Datastage as the ETL tool? I mean, do I first need to use ODBC to create connectivity and use an adapter for the extraction and transformation of data?
259. Is it possible to call one job in another job in server jobs?
260. How can we pass parameters to a job by using a file?
261. How can we implement a lookup in DataStage server jobs?
262. What is the User Variables activity? When is it used? How is it used? Where is it used? Give a real example.
263. Did you parameterize the job or hard-code the values in the jobs? Always parameterize the job. Either the values come from Job Properties or from a 'Parameter Manager' - a third-party tool. There is no way you will hard-code some parameters in your jobs. The o
264. What is a hashing algorithm? Explain briefly how it works.
265. What happens if the output of a hash file is connected to a transformer? What error does it throw?
266. What is merge and how do you use merge? Merge is nothing but a filter condition that has been used for a filter condition.
267. What will you do in a situation where somebody wants to send you a file and use that file as an input or reference and then run the job?
268. What is the NLS equivalent of the Oracle NLS code American_America.US7ASCII in Datastage NLS?
269. Why do you use SQL*Loader or the OCI stage?
270. What about system variables?
271. What are the differences between Datastage 7.0 and 7.5 in server jobs?
272. How does the hash file do a lookup in server jobs? How does it compare the key values?
273. How to handle rejected rows in Datastage?
274. How is Datastage 4.0 functionally different from the Enterprise Edition now? What are the exact changes?
275. What is the Hash File stage and what is it used for? Used for lookups. It is like a reference table. It is also used in place of ODBC/OCI tables for better performance.
276. What utility do you use to schedule jobs on a UNIX server other than using Ascential Director? Use the crontab utility along with the d***ecute() function, with the proper parameters passed.
277. How can I connect my DB2 database on AS400 to DataStage? Do I need to use ODBC first to open the database connectivity and then use an adapter for just connecting between the two?
278. What is OCI, and how do you use the ETL tools? OCI means orabulk data which is used by a client having bulk data; its retrieve time is much more, i.e., you use orabulk data divided and retrieved.
279. What is the difference between server jobs and parallel jobs?
280. What is the difference between Datastage and Datastage TX?
281. Can anyone tell me how to extract data from more than one heterogeneous source - for example, one sequential file, Sybase and Oracle - in a single job?
282. How can we improve the performance of DataStage jobs?
283. How good are you with your PL/SQL? On a scale of 1-10, say 8.5-9.
284. What are the Oconv() and Iconv() functions and where are they used? Iconv() converts a string to an internal storage format; Oconv() converts an expression to an output format.
285. If data is partitioned in your job on key 1 and then you aggregate on key 2, what issues could arise?
286. How can I specify a filter command for processing data while defining sequential file output data?
287. There are three different types of user-created stages available for PX. What are they? Which would you use? What are the disadvantages of using each type?
288. What is DS Manager used for - did you use it?
289. What are sequencers? Sequencers are job control programs that execute other jobs with preset job parameters.
290. Functionality of the Link Partitioner and Link Collector?
291. Containers: usage and types? A container is a collection of stages used for the purpose of reusability. There are 2 types of containers: a) local container - job specific; b) shared container - used in any job within a project.
292. Does the Enterprise Edition only add parallel processing for better performance? Are any stages/transformations available in the Enterprise Edition only?
293. What validations do you perform after creating jobs in Designer? What are the different types of errors you faced during loading and how did you solve them?
294. How can you do an incremental load in Datastage?
295. How do we use the NLS function in Datastage? What are the advantages of the NLS function? Where can we use it? Explain briefly.
296. Dimension modelling types along with their significance: data modelling is broadly classified into 2 types: a) E-R diagrams (entity-relationships); b) dimensional modelling.
297. Did you work in a UNIX environment? Yes. One of the most important requirements.
298. What other ETL tools have you worked with? Informatica, and also DataJunction if it is present in your resume.
299. What is APT_CONFIG in Datastage?
300. Is the BibhudataStage Oracle plug-in better than the OCI plug-in coming from DataStage? What are the BibhudataStage extra functions?
301. How do we do the automation of dsjobs?
302. What is troubleshooting in server jobs? What are the different kinds of errors encountered while running any job?
303. What are Datastage multi-byte and single-byte file conversions? How do we use those conversions in Datastage?
304. What other performance tunings have you done in your last project to increase the performance of slowly running jobs? Staged the data coming from ODBC/OCI/DB2UDB stages or any database on the server using hash/sequential files for optimum performance and also for data recovery in case the job aborts. Tuned the OCI stage for '
305. What are DataStage multi-byte and single-byte file conversions in mainframe jobs? What is UTF-8? What is the use of UTF-8?
306. What happens if RCP is disabled?
307. What are routines, where/how are they written, and have you written any routines before? Routines are stored in the Routines branch of the DataStage Repository, where you can create, view or edit them. The following are different types of routines: 1) Transform functions
308. What is version control?
309. What are the repository tables in DataStage and what are they?
310. I want to process 3 files sequentially, one by one. How can I do that? While processing the files, it should fetch the files automatically.
311. Where does a UNIX script of Datastage execute - on the client machine or on the server? Suppose it executes on the server; then how will it execute?
312. Please list the versions of the Datastage parallel and server editions, and the years in which they were released.
313. What are the job parameters?
314. Default nodes for the Datastage parallel edition?
315. Orchestrate vs. Datastage Parallel Extender?
316. Dimensional modelling is again subdivided into 2 types: a) star schema - simple and much faster, denormalized form; b) snowflake schema - complex with more granularity, more normalized form.
317. Tell me the environment in your last projects. Give the OS of the server and the OS of the client of your most recent project.
318. How can we join one Oracle source and a sequential file?
319. What are Modulus and Splitting in a dynamic hashed file? In a hashed file, the size of the file keeps changing randomly. If the size of the file increases it is called "Modulus". If the size of the file decreases it is called "Splitting".
320. Scenario-based question: suppose 4 jobs are controlled by a sequencer (job 1, job 2, job 3, job 4). If job 1 has 10,000 rows and, after running the job, only 5,000 rows have been loaded into the target table, the remaining rows are not loaded and your job is going to be aborted - how can you sort out the problem? Suppose the job sequencer synchronises or controls the 4 jobs but job 1 has a problem; in this condition you should go to Director and check what type of problem is showing - either a data type problem, a warning message, a job
321. What is the batch program and how can you generate it? A batch program is a program generated at run time to be maintained by Datastage itself, but you can easily change it on the basis of your requirement (extraction, transformation, loading). Batch progr
322. In how many places can you call routines? You can call them in four places: (i) Transform of a routine: (A) date transformation, (B) upstring transformation; (ii) Transform of the before & after subroutines; (iii) XML transformation; (iv) web-base t
323. How many jobs have you created in your last project? 100+ jobs for every 6 months if you are in development; if you are in testing, 40 jobs for every 6 months, although it need not be the same number for everybody.
324. What's the difference between Datastage developers and Datastage designers? What are the skills required for this?
325. Could you please help me with a set of questions on Parallel Extender?
326. What is the difference between Datastage and Informatica?
327. Suppose there are a million records - did you use OCI? If not, what stage do you prefer?
328. What are the types of hashed file?
329. How do you eliminate duplicate rows?
330. What is DS Designer used for - did you use it?
331. Compare and contrast ODBC and Plug-In stages. ODBC: a) poor performance; b) can be used for a variety of databases; c) can handle stored procedures. Plug-In: a) good performance; b) database specific (only one database); c) cannot handle stored procedures.
332. What is the project life cycle and how do you implement it?
333. Explain your last project and your role in it.
334. What are the stages you used often or worked with in your last project? A) Transformer, ORAOCI8/9, ODBC, Link-Partitioner, Link-Collector, Hash, ODBC, Aggregator, Sort.
335. Have you ever been involved in updating the DS versions like DS 5.X? If so, tell us some of the steps you have taken in doing so. Yes. The following are steps I have taken in doing so: 1) definitely take a backup of the whole project(s) by exporting the project as a .dsx file; 2) see that you are using the same parent
336. What versions of DS have you worked with? DS 7.0.2/6.0/5.2.
337. If you worked with DS 6.0 and later versions, what are Link-Partitioner and Link-Collector used for? Link Partitioner - used for partitioning the data. Link Collector - used for collecting the partitioned data.
338. How did you handle an 'Aborted' sequencer? In almost all cases we have to delete the data inserted by this from the DB manually, fix the job and then run the job again.
339. How did you connect with DB2 in your last project? Most of the time the data was sent to us in the form of flat files; the data is dumped and sent to us. In some cases where we needed to connect to DB2 for lookups, for instance, we used the ODBC drivers.
340. Read the string functions in DS. Functions like [] -> the substring function, and ':'
341. How would you call an external Java function which is not supported by DataStage? Starting from DS 6.0 we have the ability to call external Java functions using a Java package from Ascential. In this case we can even use the command line to invoke the Java function and write the re
342. The above might raise another question: why do we have to load the dimension tables first, then the fact tables? As we load the dimension tables, the (primary) keys are generated, and these keys are foreign keys in the fact tables.
343. Tell me one situation from your last project where you faced a problem and how you solved it. A. The jobs in which data is read directly from OCI stages were running extremely slowly; I had to stage the data before sending it to the transformer to make the jobs run faster. B. The job aborts
344. Does the selection of 'Clear the table and Insert rows' in the ODBC stage send a TRUNCATE statement to the DB, or does it do some kind of delete logic? There is no TRUNCATE on ODBC stages. It is Clear table blah blah, and that is a DELETE FROM statement. On an OCI stage such as Oracle, you do have both Clear and Truncate options. They are radically different.
345. How do you rename all of the jobs to support your new file-naming conventions? Create an Excel spreadsheet with new and old names. Export the whole project as a dsx. Write a Perl program which can do a simple rename of the strings by looking up the Excel file.
346. When should we use an ODS? DWHs are typically read-only and batch-updated on a schedule; ODSs are maintained in more real time, trickle-fed constantly.
347. How to create batches in Datastage from the command prompt?