Channel: SCN : Blog List - Data Services and Data Quality

Match strings with Match_Pattern


Match_pattern is a function in SAP Data Services, commonly used in the Validation transform, that matches an input string against a pattern. It can be used to match letters (a-z, A-Z), numbers (0-9) and special characters.

 

Match_pattern cannot be used to match substrings.

 

Syntax

 

match_pattern(input_string,pattern_string)

 

Here,

 

input_string is the string to be matched; it can contain letters, numbers and other characters.

pattern_string is the pattern against which the whole input string is matched.

 

Return value

 

The function returns 0 or 1.

 

If the return value is 1, the input string matches the pattern.

If the return value is 0, the input string does not match the pattern.
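
To see the return value in action, here is a minimal Data Services script sketch. It is illustrative only: the variable $G_PIN and the six-digit pattern '999999' are made up for this example and do not come from any real job.

# Minimal sketch only: $G_PIN is a hypothetical variable holding the value to test,
# and '999999' is an example pattern for a six-digit number.
$G_PIN = '600042';

if (match_pattern($G_PIN, '999999') = 1)
begin
   print('Value [$G_PIN] matches the six-digit pattern');
end
else
begin
   print('Value [$G_PIN] does not match the six-digit pattern');
end

The same expression can also be used directly as a rule in the Validation transform, for example match_pattern(COLUMN_NAME, '999999') = 1 (COLUMN_NAME being whatever column you want to validate).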

 

The examples below show how the pattern characters are used:

 

x - matches a lowercase letter (a-z)

  print(match_pattern('Janani', 'Xxxxxx'));           Return value: 1
  print(match_pattern('JANANI', 'Xxxxxx'));           Return value: 0

X - matches an uppercase letter (A-Z)

  print(match_pattern('JANANI', 'XXXXXX'));           Return value: 1
  print(match_pattern('Jeni Krish', 'Xxxx Xxxxx'));   Return value: 1

9 - matches a number (0-9)

  print(match_pattern(123, 999));                     Return value: 1

\ - escape character; the character that follows it (here the digit 3) is treated as a literal rather than as a pattern character

  print(match_pattern('jeni4', 'jeni[!\3]'));         Return value: 1, since the number 3 is not found in the string
  print(match_pattern('jeni3', 'jeni[!\3]'));         Return value: 0, since the number 3 is found in the string

* - matches characters appearing zero or more times

  print(match_pattern('janani', '*'));                Return value: 1

? - matches any single character occurring exactly once

  print(match_pattern('a1', 'a?'));                   Return value: 1, since exactly one character follows 'a'
  print(match_pattern('a1sdf', 'a?'));                Return value: 0, since more than one character follows 'a'

[ ] - matches any one of the characters listed inside the brackets, occurring once

  print(match_pattern('a1', 'a[123]'));               Return value: 1, since '1' is in the pattern's character list
  print(match_pattern('a4', 'a[123]'));               Return value: 0, since '4' is not in the pattern's character list

[!] - matches any character except the characters that appear after the exclamation point (e.g. '[!AB]' allows any name that does not start with A or B)

  print(match_pattern('Akash', '[!A]'));              Return value: 0, since the string starts with 'A'


Consuming REST web service (with Parameters) in BODS using HTTP Adapter


Let us try to understand how to access a REST-based web service (with parameters) in a batch job from BODS. This feature is supported from BODS 4.0 SP3 Patch 3.

 

In this example, we will call an external web service, passing two values (material number and type) as arguments, and capture the material attribute details returned by the web service function.

 

To implement this in SAP Data Services, we first need to create an HTTP adapter instance, then add an operation to the adapter instance and create an HTTP Adapter datastore. Next we need to import the required function module. Finally, we build a Job and a Dataflow to consume the function module.

 

To add an HTTP adapter instance in the Administrator:

 

  • Select Adapter Instances > Job Server.
  • Click the Adapter Configuration tab.
  • Click Add.
  • Select the adapter from the list of those available on this Job Server.
  • Enter the required information to create an HTTP adapter instance.
  • Click Apply.

Capture1.PNG

 

 

Let's name the Adapter Instance HttpAdapter. For the other fields, keep the default values.

 

The next step is to configure adapter operations. Adapter operations identify the integration operations available with the configured adapter instance.

 

To add an operation instance to an adapter instance

 

  • Select Adapter Instances > Job Server.
  • Click the Adapter Configuration tab.
  • Under Dependent Objects, click Operations.
  • Click Add to configure a new operation.
  • Select an operation type from the list (here Request/Reply), then click Apply. The options that appear depend on the operation specific design.
  • Complete the operation instance configuration form.
  • Click Apply.

 

Here for this demonstration I am using the below dummy web service.

 

http://test-batch-webapp/d/api/testcatalog/v1/shcoffering/pid/?client=COMMERCIAL&pid=71240403000

 

Capture2.PNG

 

 

 

  • Give the name of the input XSD in 'Request message format'.

  The input parameters can be converted to an XSD as follows (BODS can only use XML format for input and output):

 

 

Capture3.PNG

 

 

  • Give the schema name of the reply XML in 'Response message format'.
  • Enter 'param-value' in the 'Convert input XML to' column. This is required to convert the input XML to parameters.

 

   The next step is to create the Datastore. Go to the Datastore tab of the Local Object Library, right-click, and select New to create a Datastore.

 

 

For Datastore type, select Adapter. For Job Server, select the Job Server configured to handle HTTP adapter. For Adapter instance name, choose the instance name configured in the Administrator. Click OK to save values and create the datastore.

 


Capture4.PNG

 

To import message functions

 

 

  • In the Designer, double-click the HTTP datastore. The Adapter Metadata Browser window opens.
  • Right-click the operation instance to import and select Import from the menu.

 

  

The operation instance OPERATION_GET_DESC is added to the datastore.

 

So now we can find the imported function under the Message Functions section of the HTTP Datastore of the Local Object Library.

 

 

Double-click the function module to preview the Schema Definition. As mentioned previously this function module expects an input XML Schema as REQUEST and also returns an output XML Schema as REPLY or response.

Capture5.PNG

 

Now a Dataflow in a data services job can consume the function call. Below is the implementation screenshot of the Dataflow.

 

Capture6.PNG

 

 

Here the source table has the columns CLIENT and PID. The requirement is to pass this information to the web service and get the reply message.

 

Here the query transform Q_Test_Nest is used to generate an XML schema (TEST_ADAPTER_INPUT) that matches the required input template for the web service function. In the query transform Q_Test_XML_Func_Call, the function OPERATION_GET_DESC is called using New Function Call, and the generated XML schema (TEST_ADAPTER_INPUT) is passed as the input argument to the web service function.

Capture7.PNG

 

 

The query transform Q_Unnest_Records is used to unnest the return schema, and finally the required fields from the response XML are populated into the table C_TEST_STG.

Creating a data flow for a real-time service in Designer to be exposed as a web service call


Hi everyone! I am new to SAP Data Services and have been assigned to create a project in Data Services that can be accessed via a web service. The purpose of this project is to cleanse the input addresses. After reading the documentation, I have got the basics of how to do this. For example, I know that I need to use the Designer to create a real-time service that wraps a data flow to cleanse the address. Then I use the Management Console to add this real-time service to an Access Server and publish it as an operation of the Data Services web services. The issue I have is that I am not sure how to actually create the data flow. I know that I need an input/output schema with an address-cleanse transform in the middle, plus the data mappings. But I have not found a document or a page that describes such a data flow step by step and walks through an example. I wonder if you could please help me with a link or point me to a blog or tutorial somewhere that explains this scenario. I sincerely appreciate your answers. Thanks!

SCD Type Implementation in BODS


Here I am trying to explain the methods to implement SCD types in BO Data Services. The different types of slowly changing dimensions are listed below.

1. Type 0
2. Type 1
3. Type 2
4. Type 3
5. Type 4
6. Type 6 / Hybrid

 

In this document I will explain the first five SCD types with examples.

 

Source Data

 

Name     IDate         Designation
John     2002.12.01    A
Jay      2012.12.01    A
Jasil    2012.12.01    A

 

SCD TYPE 0

 

The SCD Type 0 method is passive: the value remains the same as it was when the dimension record was first entered. Type 0 applies to most date dimension attributes.

 

SCD TYPE 1

 

This method does not track any history: it overwrites the old data with new data without keeping the history. It is mainly used to correct errors such as misspelled names.

Let us assume the data given below is our target data after the first run.

 

ID   Name     IDate         Designation
1    John     2002.12.01    A
2    Jay      2012.12.01    A
3    Jasil    2012.12.01    A

 

 

During the next run, suppose the designation of John changes to 'B' on 2003.12.01. The output will then be:

 

ID   Name     IDate         Designation
1    John     2003.12.01    B
2    Jay      2012.12.01    A
3    Jasil    2012.12.01    A

 

Here no history is preserved; the designation and IDate are simply updated with the new values.

 

In BODS this can be implemented using the following transforms.

SCD1.JPG

 

Source: Test_Effective_date is our source, given above (section Source Data).

 

QR_MAP: Maps the source data through a Query transform without applying any transformation.

 

TBL_CPM: A Table Comparison transform used to compare the source data with the target table data.

 

MP_OPR: A Map Operation transform used to insert new data and update old data.

 

KEY_GEN: A Key Generation transform used to generate a surrogate key.

 

SCD TYPE 2

 

In a Type 2 slowly changing dimension, a new record is added to the table to represent the new information. Therefore, both the original and the new record are present, and the new record gets its own primary key. Two new columns are added: start_date and end_date.

 

Let us assume the data given below is our target data after the first run.

 

ID   Name     IDate         Designation   Start_Date    End_Date
1    John     2002.12.01    A             2002.12.01    9000.12.31
2    Jay      2002.12.01    A             2002.12.01    9000.12.31
3    Jasil    2002.12.01    A             2002.12.01    9000.12.31

 

During the next run, suppose the designation of John changes to 'B' on 2003.12.01. The output will then be:

 

 

ID   Name     IDate         Designation   Start_Date    End_Date
1    John     2002.12.01    A             2002.12.01    2003.12.01
2    Jay      2002.12.01    A             2002.12.01    9000.12.31
3    Jasil    2002.12.01    A             2002.12.01    9000.12.31
4    John     2003.12.01    B             2003.12.01    9000.12.31

 

Here a new row is inserted, and the end_date of the first row becomes the start date of the newly updated value. (Note: '9000.12.31' is the default end date.)

 

In BODS this can be implemented using the following transforms.

 

 

SCD2.JPG

 

SOURCE: This is our source data as given above (section Source Data).

 

QR_MAP: Maps the source data through a Query transform without applying any transformation.

 

TBL_CPM: A Table Comparison transform used to compare the source data with the target table data.

 

HIS_PRES: A History Preserving transform used to keep the history; if a source row is updated, a new row is inserted.

 

KEY_GEN: A Key Generation transform used to generate a surrogate key.

 

SCD TYPE 3

 

This method tracks changes using separate columns and preserves only limited history. Unlike Type 2, which preserves unlimited history, Type 3 is limited by the number of columns designated for storing historical data. The table structure also differs: Type 3 adds additional columns to hold the previous values. In the following example, an additional column has been added to record the previous designation, so only the previous value is stored.

 

Let us assume the data given below is our target data after the first run.

 

ID   Name     IDate         Curr_Designation   Effective_Date   Pre_Designation
1    John     2002.12.01    A                  2002.12.01       A
2    Jay      2002.12.01    A                  2002.12.01       A
3    Jasil    2002.12.01    A                  2002.12.01       A

 

During the next run, suppose the designation of John changes to 'B' on 2003.12.01. The output will then be:

 

In the target output, an extra column keeps the previous value of the changed column.

 

ID   Name     IDate         Curr_Designation   Effective_Date   Pre_Designation
1    John     2003.12.01    B                  2003.12.01       A
2    Jay      2002.12.01    A                  2002.12.01       A
3    Jasil    2002.12.01    A                  2002.12.01       A

 

 

(Note: on the first run, Pre_Designation and Curr_Designation are the same.)

 

In BODS this can be implemented using the following transforms.

 

SCD3.JPG

 

SOURCE: This is our source data as given above (section Source Data).

 

QR_JOIN: This Query transform joins the source with the target. It is a left outer join, with the source as the outer source and the target as the inner source. The join is based on the Name column on both sides (Source.name = SCD3.name).

 

QR_INSERT: This Query transform filters the records that are new, i.e. where SCD3.name is null.

 

QR_UPDATE: This Query transform filters the records that already exist in the target table but whose designation has changed, i.e. where SCD3.name is not null and the designation from the source differs from the designation currently stored in the SCD3 table. (A rough sketch of both filters follows.)
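
As a rough sketch (the column names are taken from the example tables in this post and may differ in a real repository), the WHERE clauses of the two Query transforms could look like this:

QR_INSERT (new records only):
    SCD3.NAME is null

QR_UPDATE (existing records whose designation has changed):
    SCD3.NAME is not null and SOURCE.DESIGNATION <> SCD3.CURR_DESIGNATION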

 

MP_UPDATE: A Map Operation transform used to update the target table by setting the map operation to 'Normal to Update'.

 

KEY_GEN: A Key Generation transform used to generate a surrogate key for each newly inserted row.

 

During the first run the source file has the data given above and the target table is empty, so the entire data set moves through the QR_INSERT flow and is loaded into the target table.

 

SCD TYPE 4

 

The SCD Type 4 design technique is used when an SCD Type 2 dimension grows rapidly due to frequently changing dimension attributes. In SCD Type 4, the frequently changing attributes are removed from the main table and added to a history table.

The Type 4 method is usually referred to as using "history tables": one table keeps the current data and an additional table keeps a record of some or all of the changes.

 

Let us assume the data given below is our target data after the first run.

 

ID   Name     IDate         Designation
1    John     2002.12.01    A
2    Jay      2002.12.01    A
3    Jasil    2002.12.01    A

 

And the History table will not have any data.

 

During the next run, suppose the designation of John changes to 'B' on 2003.12.01. There will then be two tables: one keeps the current data and the other keeps the changed data (the history).

 

SCD4_CUR

 

ID   Name     IDate         Designation
1    John     2003.12.01    B
2    Jay      2002.12.01    A
3    Jasil    2002.12.01    A

 

SCD4_HIST

 

Compared to the source, the history table has one additional column: the created date.

 

ID   Name     IDate         Designation   Created_Date
1    John     2002.12.01    A             2003.12.01

 

(Note: the created date is the date on which the designation changed.)

 

In BODS this can be implemented using the following transforms.

 

SCD4.JPG

 

SOURCE: This is our source data as given above (section Source Data).

 

QR_JOIN: This Query transform joins the source with the target. It is a left outer join, with the source as the outer source and the target as the inner source. The join is based on the Name column on both sides (Source.name = SCD4_Curr.name).

 

QR_INSERT: This Query transform filters the records that are new, i.e. where SCD4_Curr.name is null.

 

QR_UPDATE: This Query transform filters the records that already exist in the target table but whose designation has changed, i.e. where SCD4_Curr.name is not null and the designation from the source differs from the designation currently stored in the SCD4_Curr table.

 

QR_CURR: This Query transform is used to update the current table (SCD4_CUR) with the new designation.

 

QR_HIST: This Query transform is used to populate the history table (SCD4_HIST) with the previous designation.

 

MP_UPDATE: A Map Operation transform used to update the target table by setting the map operation to 'Normal to Update'.

 

KEY_GEN: A Key Generation transform used to generate a surrogate key for each newly inserted row.

 

During the first run the source file has the data given above and the target table is empty, so the entire data set moves through the QR_INSERT flow and is loaded into the target table.

There are different methods to implement SCD types in BODS; the above is one way to implement them.

 

Regards

Asgar

SAP Business Objects Data Quality Management ( DQM ) Enhancements for more fields other than address data for Duplicate Check.


Relevant Systems


These enhancements were successfully implemented in the following systems:

 

  1. Customer Relationship Management ( CRM ) 7.02 SP 4
  2. SAP Business Objects Data Quality Management, version for SAP Solutions 4.0 SP2

 

This blog is relevant for:

 

  • ABAP Developers who have been asked to include additional fields ( such as Date of Birth ) for duplicate matching with DQM
  • ABAP Developers who have been asked to encapsulate the DQM functionality into a re-usable function / object.

 

This blog does not cover:

 

  • Details of the Postal Validation, we are only concerned with Duplicate Check

 

 

Reference Material


User Guide:

http://help.sap.com/businessobject/product_guides/DQMgSAP/en/1213_dqsap_usersguide_en.pdf

 

 

 

Introduction


DQM is an add-on to your ERP / CRM system to provide address validation ( SAP calls it PV for Postal Validation ) and duplicate checking for Business Partners.

 

Technical details in this blog are relevant for a CRM 7.02 system, as this is where I successfully implemented the enhancements explained later; however, the enhancements should be relevant for any system covered by the DQM install guide.

 

In this blog entry I'll only cover items within a CRM system; I'm trying to get my colleague Brian Kuzmanoski to write a separate blog that deals with the DQM side of things, for completeness of the solution overview.

 

Keep in mind I'm not going to explore every nook and cranny of the DQM add-on, and this blog is just one answer to a couple of questions I've seen on SCN. This is the first time I've touched DQM, so I have probably gone through the same learning curve as yourselves.

 

This is a great tool for providing address validation and formatting plus Business Partner duplicate checks in conjunction with an address directory; in my case the Australian address directory ( every address in Australia ) is added to the DQM server.

 

Installing the add-on out of the box with CRM and performing the initial setup, you get address and duplicate checking via T-CODE “BP” and automatic integration into the Account BSP Components.

 

The one catch I've found with DQM is that the duplicate checking is address focused, which leads nicely into our problem definition. But first, a brief look at how the duplicate check solution works:

 

 

 

SPRO Activities


When you’ve successfully installed DQM on your CRM system, you can find all your SPRO activities under:

 

SAP NetWeaver->Application Server->Basis Services->Address Management->SAP BusinessObjects Data Quality Management

SAP NetWeaver->Application Server->Basis Services->Address Management->Duplicate Check

 

spro.png

 

 

Typical Scenario

 

You need to create a new customer account record in CRM, so you go to your BP Account BSP or transaction BP in the back-end CRM system and start entering the details. You then click Save, which triggers the duplicate check. The following process is triggered in CRM:

 

driver_flow.png

 

  1. A “Driver” record is constructed which is the record you want to create in the SAP system ( only in memory, not in the DB yet ).
  2. A selection of “Passenger” records are selected based on the Match Code from the CRM Database
  3. The “Driver” and “Passenger” records are passed to DQM to perform the matching algorithms
  4. A result set is determined with percentile weightings for the matching routines and takes into consideration your configured “Threshold” ( configured in SPRO ).

 

 

The Match Code – Table /FLDQ/AD_MTCCODE


As part of the setup of DQM you would have run three reports ( refer to the user guide ). One of these reports went through all your Business Partners, generated a Match Code for each and stored it in table /FLDQ/AD_MTCCODE.


The Match Code is just an index that reduces the number of potential records against which DQM needs to run its matching routines.

 

When a Driver record is created, DQM knows how to generate its Match Code. When the "Candidate Selection" occurs ( selecting all the records with the same Match Code in CRM ) you are taking a slice of possible data rather than comparing every Business Partner in your database. This is obviously for efficiency; however, there are some pitfalls, which we'll cover later.

 

 

The Results


If there are any possible matches then you may see several screens: one for the Address Validation and one if there are any possible duplicate records.

Presto, you've stopped a potential duplicate record entering your CRM system, or you've validated and formatted the new customer's address.

Notice from the screenshots that there is a Similarity %; this is what we're interested in, which begs the questions:

 

  • How can DQM put more weight on Date of Birth and Gender rather than the address details?
  • How do I pass the additional fields across to DQM?
  • Do I need to go through a stack of enhancements and potentially break the standard SAP solution?
  • What if I have my own services to create customers in CRM, can I encapsulate the Duplicate Check functionality and re-use it in my own code?

 

 

 

Problem Definition


You’ve installed DQM and taken advantage of the standard functionality provided, but you realise that you have requirements to match on more than address data such as Date of Birth, Middle Name or Gender.

 

You also want to create your own matching algorithms based on these additional fields on the DQM side which are not so address focused and want to put percentile weightings on the additional fields over address fields.

 

Remember what we talked about before: DQM is address focused, so it doesn't support all the Business Partner fields in the Driver and Passenger records when they are passed to DQM.

 

If you think of a typical scenario you probably want to match Date of Birth or Middle Name or Gender in combination with address details.  Out of the box you can’t use these three fields.

 

You may have custom services that create your customers, such as a web service, and therefore want to plug in the DQM Duplicate Check outside of the BAdi framework.

 

For this blog we’ll focus on passing across the following additional fields to DQM:

 

  • Date of Birth
  • Middle Name
  • Gender
  • Country of Birth

 

 

Technical Implementation


We took a path of least resistance approach to these enhancements. What we didn’t want to do was to completely re-write the duplicate checking framework just to accommodate a few additional fields from the Business Partner.

 

Yes, our solution can be considered a hack, but it doesn't force you down a path where you completely own the solution ( insert your favourite statement about TCO here ), with enhancement after enhancement all through standard SAP code, which is a nightmare to maintain in the long run.

First let’s define some requirements:

 

  • I want to use Date of Birth, Middle Name, Gender, Country of Birth in my driver and passenger records so DQM can match on these fields as well
  • I want a class/function that encapsulates the call to DQM and I can re-use it elsewhere in my SAP System.
  • I want to provide different % weightings to the other fields.

 

 

Encapsulating Badi ADDRESS_SEARCH


This is the Badi that is called to perform the duplicate check or address search.  If you dive into the implementation you will find:

 

badi.png

 

class.png

 

The method ADDRESS_SEARCH is where all the magic happens: SAP makes an RFC call into DQM to process the matching algorithms and returns some results.

 

 

 

 

 

Create a Class

 

We created a class with a global attribute go_fldq_duplicate_check, which is instantiated from class CL_EX_ADDRESS_SEARCH in the constructor.

We added a method called FIND_DUPLICATES where the Driver record is constructed. To keep it simple we added two structures as importing parameters for capturing the customer details:

 

  • IS_BUILHEADER – BOL Entity Structure for BuilHeader
  • IS_ADDRESS -  BOL Entity Structure for BuilAddress

 

When constructing the Driver record, all you are doing is adding lines to lt_search_params such as:

 

CLEAR ls_search_param.
ls_search_param-tablename = 'ADRC'.
ls_search_param-fieldname = 'CITY1'.
ls_search_param-content = is_address-city.
APPEND ls_search_param TO lt_search_params.

 

Passing Your Own Attributes – Hijack Other Fields


This is where we need to pass in our other fields: Date of Birth, Middle Name, Gender and Country of Birth. We do this by hijacking a couple of available fields from the supported field list. NB: we also pass in all the other address information, just to adhere to the standard functionality.

 

hijack1.png

 

You add these field values as search parameters:

 

CLEAR: ls_search_param.
ls_search_param-tablename = 'ADRC'.
ls_search_param-fieldname = 'NAME1'.
CONCATENATE is_builheader-birthdate is_builheader-middlename INTO ls_search_param-content SEPARATED BY space.
CONDENSE ls_search_param-content NO-GAPS.
APPEND ls_search_param TO lt_search_params.

CLEAR: ls_search_param.
ls_search_param-tablename = 'ADRC'.
ls_search_param-fieldname = 'NAME2'.
CONCATENATE is_builheader-*** is_builheader-countryorigin INTO ls_search_param-content SEPARATED BY space.
CONDENSE ls_search_param-content NO-GAPS.
APPEND ls_search_param TO lt_search_params.

 

Pitfalls Here


Yes, hijacking the NAME1 and NAME2 fields is not optimal; however, it provides a way to pass in additional fields so you can unpack them on the DQM side.

 

There are obviously some limitations here, e.g. the field lengths, and if you have a lot of additional fields to pass in, you need to separate them logically so that you can unpack them correctly in DQM every time.

 

 

 

Supported Field List - /FLDQ/AD_MTCCODE


This structure is used to construct the Passenger records sent to DQM. Start here, and also cross-check the corresponding structure on the DQM side.

 

 

 

 

 

 

 

 

 

 

Call to DQM


Once the Driver record is constructed, i.e. your lt_search_params table is populated, the call is constructed like this:

 

badi_call.png

 

Sample code is provided below; we'll just hard-code a few items for demo purposes.

 

If IF_EX_ADDRESS_SEARCH~ADDRESS_SEARCH is executed successfully you will hopefully have some results in the importing parameter EX_T_SEARCH_RESULTS that contains the % weightings for your potential duplicates.

 

From here you can build your own return structure with these and other customer details.

 

So there you have it: you should now have successfully encapsulated the DQM functionality in a class, and you can re-use it in other parts of your CRM system if you need to.

 

 

 

Pitfalls Here


When you encapsulate this functionality, you are constructing the Driver record manually which ensures your custom fields are passed in correctly.

 

In the typical scenario where you are creating a Business Partner via transaction BP or through the CRM Web UI, the Driver record is constructed using only available address fields plus First Name and Last Name.

 

You will need to make a small enhancement ( discussed next ) to ensure that the scenarios where you are using SAP GUI or CRM Web UI also pass the desired fields to the DQM matching algorithms, so you get consistent results across your business processes.

 

 

Driver and Passenger Records Enhancement

 

Before the physical RFC call is made, a selection of candidate records is chosen based on the Match Code.

Your Passenger records will not have all the custom fields you want sent across to DQM to perform the matching algorithms.

What we need to do here is create a small implicit enhancement where we can populate the other details of the candidate records with the same hijacking idea as we saw above.

 

 

Program /FLDQ/LAD_DUPL_CHECK_SEARCHF01 ( Transaction SE38 ) FORM GET_CANDIDATES


This is where the candidate selection occurs. The selections are stored in it_cand_matches, which has the table structure /FLDQ/AD_MTCCODE.

If you look at the table structure, you will see the available fields that make up the Passenger records; as you can see, these are mainly address fields, including NAME1 and NAME2.

 

This is where we need to make a small enhancement in order to populate Date of Birth, Gender, Middle Name and Country of Birth, as we did above when building the class. Only this time we're dealing with actual Business Partners in the database, so we need to read their records and populate the candidate selection records.

 

Make an implicit enhancement at the end of FORM GET_CANDIDATES, you can use the following code as a guide:

 

candidate.png

 

 

The ZCL_DQM_PARTNER class is a simple class for reading BUT000 and BUT020 without having to write direct SELECT statements.

When the code in the enhancement is executed, Date of Birth, Gender and Middle Name are populated into the Passenger records, so that your matching algorithms can use them when they arrive in DQM.

 

By performing this enhancement you are doing two things:

 

  1. All your additional fields are populated in the candidate selection so DQM can actually use those fields in the matching routines when comparing to the Driver record
  2. When a standard scenario is executed, such as creating a Business Partner in SAP GUI or CRM Web UI, these details are correctly passed to DQM for matching.

 

 

You are obviously placing a little more load on the system here by retrieving more details about the Business Partner.  Try and be as efficient as possible in this code to reduce any performance issues.

 

In reality, your Match Code should have reduced the number of candidates so any impact here should be minimal.

 

 

 

Populating the Driver Record when creating a Business Partner via transaction BP or via CRM Web UI


As mentioned a few times earlier, when you create a Business Partner via SAP GUI ( transaction BP ) or CRM Web UI, you don't have control over the Driver record as you did in your class with the FIND_DUPLICATES method.

 

There is a relatively simple solution for this that doesn’t involve any more enhancements to standard SAP code.

The DQM add-on in CRM allows you to implement your own code to determine the Driver record details and plug your code in via some settings in SPRO.

Under activity “Maintain Operating Parameters” you have two places where you can maintain your own function modules for some scenarios. You can refer to the User Guide for these.

 

The one we’re interested in is AD_ADDRESS_SEARCH_FUNC_NM.

 

operating_parm.png

 

Follow the user guide and implement your own Function Module as per those guidelines.

 

You place the name of your Function Module in the Parameter Value field above.

 

What your function module is going to do is essentially the same as what you did before in FIND_DUPLICATES, except this time you need to pull the Business Partner context out of memory and populate the Driver record with any Business Partner details ( such as Date of Birth ) that have been entered by the user.

Here is a sample implementation; feel free to make improvements:

 

FUNCTION zdqm_match_code_duplicate_chk.
*"----------------------------------------------------------------------
*"*"Local Interface:
*"  TABLES
*"      IM_DRIVER STRUCTURE  /FLDQ/AD_MTCCODE
*"----------------------------------------------------------------------

* Local tables
  DATA: lt_but000         TYPE TABLE OF bus000___i,
* Local structures
        wa_match_codes    TYPE /fldq/ad_mtccode,
        ls_but000         LIKE LINE OF lt_but000,
* Local variables
        lv_gender         TYPE char1,
        lv_tabix          TYPE sy-tabix,
        lv_records_in_mem TYPE i.

  LOOP AT im_driver INTO wa_match_codes.

*   Remember the driver row index; READ TABLE below overwrites SY-TABIX
    lv_tabix = sy-tabix.

*   Read the Business Partner data currently held in memory
    CALL FUNCTION 'BUP_BUPA_MEMORY_GET_ALL'
      EXPORTING
        i_xwa    = 'X'
      TABLES
        t_but000 = lt_but000.

    DELETE lt_but000 WHERE partner IS INITIAL AND type IS INITIAL.
    DESCRIBE TABLE lt_but000 LINES lv_records_in_mem.
    CHECK lv_records_in_mem > 0.

    IF lv_records_in_mem = 1.
      READ TABLE lt_but000 INTO ls_but000 INDEX 1.
    ELSE.
      READ TABLE lt_but000 INTO ls_but000
        WITH KEY name_first = wa_match_codes-name_first
                 name_last  = wa_match_codes-name_last.
    ENDIF.

*   Pack Date of Birth and Middle Name into NAME1
    CONCATENATE ls_but000-birthdt ls_but000-namemiddle INTO wa_match_codes-name1.
    CONDENSE wa_match_codes-name1 NO-GAPS.

*   Derive a single-character gender flag
    IF ls_but000-xsexf = abap_true.
      lv_gender = '1'.
    ELSEIF ls_but000-xsexu = abap_true.
      lv_gender = '0'.
    ELSEIF ls_but000-xsexm = abap_true.
      lv_gender = '2'.
    ENDIF.

*   Pack Gender and Country of Birth into NAME2
    CONCATENATE lv_gender ls_but000-cndsc INTO wa_match_codes-name2.
    CONDENSE wa_match_codes-name2 NO-GAPS.

    MODIFY im_driver FROM wa_match_codes INDEX lv_tabix.
    CLEAR: ls_but000, lt_but000, lv_gender.

  ENDLOOP.

ENDFUNCTION.

 

By implementing this Function Module you can now ensure that your additional fields are passed to DQM in the Driver Record when using the scenario of creating a Business Partner via SAP GUI or CRM Web UI.

 

 

Check Point – Is it working so far?


If you have:

 

  • Installed DQM correctly
  • Performed all necessary post install configuration and setup
  • Successfully performed the “Hand Shake” procedure with DQM
  • Setup your matching routines in DQM
  • Built your own class in CRM which can be tested in isolation
  • Created your enhancement point for Passenger records ( candidate selection )
  • Created a Function Module to implement the Driver record population during the standard Business Partner creation scenarios

 

You should be able to get some results back from testing your class. Just ensure you've actually set up some test data that makes sense in order to get results back from DQM.

 

 

Some Gotchas and Other Thoughts


With anything you work on for the first time, there are almost always downstream impacts that you didn’t consider.

 

Here are a couple:

 

 

BAPI_BUPA_CREATE_FROM_DATA


If you use this BAPI anywhere and you have installed DQM, then you will probably start to get failures if you're trying to create a Business Partner that is a possible duplicate, or whose address is invalid if you have Postal Validation switched on. This is a problem if you have any interfaces that consume this function module.

 

 

Testing


Apart from your normal unit testing you will need to go through some regression tests because you’ve created an implicit enhancement point that affects some standard Business Partner creation scenarios.

 

Ensure the following behaves as expected:

 

  • Create a business partner in SAP GUI via Transaction BP
  • Create a Business Partner in CRM Web UI
  • Ensure any interfaces that consume BAPI_BUPA_CREATE_FROM_DATA are working correctly now that DQM is switched on

 

 

Switching off PV and Duplicate Check


There are a few options for you to switch off the Postal Validation and Duplicate check if you need to.  Although this doesn’t really tie in to the theme of this blog, we discovered some limitations here.

 

  1. You can activate / de-activate the Postal Validation and Duplicate check in a SPRO activity.  This completely turns off these checks.  If you do this, you will need to re-run the match-code generation when you turn it back on in case new Business Partners were created whilst DQM was switched off.
  2. You can assign a program or transaction code ( see screenshot below ). Here you can suppress several functions: Validation ( PV ), Search ( Duplicate Check ) and Suggestions ( postal address suggestions ).

 

exception_list.png

 

The technical implementation of this check happens in Function Module /FLDQ/AD_CHECK_TCODE_VALIDITY.


This is where it gets interesting. The code behind this looks for the program name in SY-CPROG at runtime. Just beware that when this executes in a non-dialog process such as an RFC call or web service call, the value in SY-CPROG will be the overarching framework program; in the case of RFC calls, program SAPMSSY1. What this tells me is that the "Maintain Exception Table of Transaction Codes" activity is only meant for dialog processes ( e.g. creating a business partner via transaction BP ).

 

cprog.png

 

 

 

 

 

 

 

 

Summary and Final Thoughts


The DQM add-on is a really good piece of kit; we had the scenarios in this blog up and running in a couple of days, including the DQM side, which hopefully my colleague Brian Kuzmanoski will blog about very soon.

 

There are obviously a couple of limitations that we discovered, but remember the product is address focused. We've demonstrated that you can include additional fields for matching; even though our solution is not entirely optimal, we've avoided major enhancements, which was the goal.

 

Just keep in mind that if you’re trying to switch off DQM via the “Exception” list, there is a limitation here for non-dialog processes.

 

Finally, DQM should be implemented in conjunction with an overall master data governance strategy; it is an enabling tool for that strategy but by no means will it solve all your master data problems.

 

There are further things to explore here such as how to connect DQM to a MDG system or even MDM, where you would effectively be cleansing your master data before it’s even created in an SAP system such as CRM.

 

Hope you enjoyed this blog. Please look out for Brian Kuzmanoski's blog that covers the matching algorithms on the DQM side.

 

 

 

 

 

 

 

How to import XML schema


  Below are the steps to import an XSD file in BODS:

  • Open the local object library and go to the Formats tab.
  • Right-click XML Schema and click New. The Import XML schema format dialog box opens.

Import2.JPG

  • In the Format Name box, name the XML Schema.
  • For the File name/URL, click Browse to navigate to the XML schema file and open it.
  • For Namespace, select the namespace from the drop-down list.

Import1.JPG

  • In the Root element name list, click on the root element.
  • Click OK.
  • The XML schema is imported and we can see it in the local object library.

import7.jpg

 

 

Some tips to avoid XML parse errors at run-time:

  • The order of the elements in the XML file must be the same as in the XSD.
  • All mandatory fields specified in the XSD must be present in the XML file.
  • The data type of each element in the XML file must match the specification in the XSD.

BODS - SCD2 Teradata lock issue resolved


Problem:

 

While developing an SCD2 data flow using Teradata tables, a lock occurred between the updates and the reads of the Teradata target table, and the job hung for a long time.

 

 

SCD2

The goal of a slowly changing dimension of type two is to keep the old versions of records and just insert the new ones.

 

 

Solution:

 

If we follow the conventional SCD2 method explained above using Teradata tables, the job will hang with a deadlock. Because the Table Comparison (TC) points to the same Teradata target, the incoming records are compared against the Teradata target table while the job also tries to insert/update rows in the same target table. Due to this comparison and manipulation on the same table, Teradata raises a deadlock and the job hangs for a long time.

 

 

Normal Flow:

Source Table ---> Query --> TC --> HP --> KG --> Target Table

 

 

Solution:

 

Create a view over the target table using the LOCKING ROW FOR ACCESS method, as shown below.

 

 

CREATE VIEW TABLE_VW
AS LOCKING ROW FOR ACCESS
SELECT * FROM TABLE WHERE EFF_STAT = 'A';

 

 

Then use the view in the TC for comparing the target records. Now we have the same records for comparison and manipulation, but in different objects: as a view in the TC and as a table in the target.

 

LOCKING ROW FOR ACCESS, used in the view, allows dirty reads from the table while still allowing INSERT/UPDATE/DELETE operations on the table. Thus we can read the records from the table through the view and INSERT/UPDATE/DELETE on the same table as the target.

1.png

New Flow:

Source Table ---> Query --> TC (View) --> HP --> KG --> Target Table

Selecting only alphanumeric data


Use the regular expression function below to select only the alphanumeric data from the source:

 

match_regex(SQL_Latest_Employees.HRRA01_LAST_N , '[0-9A-Za-z]*',NULL ) = 1
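
For a quick standalone test, the same check can be wrapped in a short Data Services script. This is a sketch only: the variable $L_NAME is invented for the example and is not part of the original job.

# Hypothetical script test of the expression above; $L_NAME is an example variable.
$L_NAME = 'Smith99';

if (match_regex($L_NAME, '[0-9A-Za-z]*', NULL) = 1)
begin
   print('[$L_NAME] passed the alphanumeric check');
end
else
begin
   print('[$L_NAME] contains characters outside 0-9, A-Z and a-z');
end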


Step by Step for establishing RFC Connection between SAP-BW & Data Service


There are many descriptive documents that talk about the RFC connections between Data Services and BW, but I created this blog thinking it would be much more helpful if the same steps were provided with appropriate screenshots.

 

I assume that, before creating the RFC connections, Data Services is installed and your BASIS team has already imported the SAP-delivered functions into the SAP BW server. These functions are provided in the form of two transport files.

 

Following are the steps:

  • Installing Functions on the SAP Server
  • Creating RFC Connection in BO Data Services
  • Creating RFC Connection in BW
  • Install Authorizations

 

Overview with SAP Provided Diagram

SAP Provided.gif

 

This blog will only focus on establishing the RFC connection between the two servers.

 

Create RFC Connection in Data Services

 

 

1. Logon to Data Services Management Console

clip_image002.jpg

 

2. Go to SAP Connections -> RFC Server Interface

 

clip_image002.jpg

 

3. Click on RFC Server Interface

11.jpg

 

4. Select Tab RFC Server Interface Configuration

12.jpg

 

5. Select Add -> Provide the necessary server parameters.

    Note: Among the parameters, the Program ID is the one you need to provide. You can use any name based on the naming convention your client prefers.

 

Ex: Program_ID: XX_YYYY

13.jpg

 

Click on Apply.

The Program ID which you define here will be used in the SAP BW server ( or any other SAP server ).

 

6. Select the tab RFC Server Interface Status.

     Now you should be able to view the server interface name, which starts with the Program ID you provided.

     Select the check box for the server interface and click Start. If all the parameters are correct, the server instance will start with a green status.

14.jpg

 

With the above step you have created the RFC connection in Data Services. Now remember the Program_ID.  

 

In the next two steps I will talk about the RFC Connection in SAP BW. 

 

7. Now log on to SAP BW.

    Use transaction SM59 to create the RFC connection.

15.jpg

 

Expand TCP/IP Connection, then Select Create.

 

16.jpg

 

The Program ID is the one you have already created in the DS Management Console.

Then, once you click the Connection Test button, you should see the message "Connection Successful".

 

At this point your RFC Connection configuration is ready in both the systems ( DS & BW) and both the systems are ready for data transfer.

SAP BusinessObjects Data Quality Management (DQM): Enhancements for more fields other than address data for Duplicate Check (Part 2)


Relevant Systems

These enhancements were successfully implemented in the following systems:

  1. Customer Relationship Management (CRM) 7.02 SP4
  2. SAP BusinessObjects Data Quality Management (DQM), version for SAP Solutions 4.0 SP2
  3. SAP BusinessObjects Data Services 4.0

 

This blog is relevant for:

  • Data Services/DQM consultants who have been asked to customise the default DQM matching and break key generation jobs to include additional fields for duplicate matching.

 

This blog does not cover:

  • Details around Postal Validation; we are only concerned with the Duplicate Check process.
  • Details on how to make the required ABAP changes to pass through additional field data to the DQM jobs – see Part 1 of this article for more information.

 

Reference Material

User’s Guide: http://help.sap.com/businessobject/product_guides/DQMgSAP/en/1213_dqsap_usersguide_en.pdf

 

Introduction

DQM is an add-on for ERP/CRM systems to provide address validation (SAP calls it Postal Validation or PV) and duplicate checking for Business Partners. After installing the add-on and performing the initial setup you get address validation and duplicate checking via the “BP” transaction and automatic integration into the Account BSP Components out of the box.

 

The duplicate checking aspect of this product specifically, however, is heavily focused on address fields. It uses fuzzy logic algorithms to compare name and address data between Business Partners in order to determine whether duplicates exist. In some customer-specific cases, this isn’t a valid approach – e.g. in situations where address data is known to be either unreliable or variable, you would want your duplicate checking to happen on other available fields and reduce the emphasis on address.

 

This is the situation that arose at a particular customer that had purchased the DQM solution. This blog post will cover the steps we took to extend the standard DQM messages between ERP/CRM and the Data Services jobs and the DQM matching algorithm to include additional, customer-specific fields.

 

For additional background and a more detailed problem definition, see part 1 of this article: http://scn.sap.com/community/data-services/blog/2013/07/08/sap-business-objects-data-quality-management-dqm-enhancements-for-more-fields-other-than-address-data-for-duplicate-check written by Leigh Mason. He covers off the changes required on the ERP/CRM side and the DQM add-on. I will now proceed to detail the considerations on the Data Services side.

 

Final note before we begin: this article assumes a baseline installation of DQM has been installed (including the required DQM Data Services repositories).

 

Break Keys

A Break Key (or Match Code, an older term) is an index made up of search terms that are arranged in a specific order. It is used to reduce the number of potential records which are sent to DQM to match against when determining whether a record is a duplicate, by filtering out records that have no realistic probability of being duplicates.

 

DQM comes with five best practice Break Keys. All of them segment the record set on the country field along with some combination of the other address fields. For example, the default best practice Break Key “Match Code 1” is made up of the two-character ISO country code, the first three characters of the postcode and the first character of the street name.

 

Since we’re adding additional fields to the matching algorithm and de-emphasising the importance of addresses data matching, we need to re-think the Break Key we use. The Break Key must be generic enough to support the widest possible match scenario you want to support. For example: if you want DQM to indicate a potential duplicate even where the address details between records are completely different (i.e. a potential match is identified based on matching non-address fields), then the Break Key cannot use any address fields to segment the record set – the segment (or “break group”) must be broader than this to capture these potential matches.

 

One pitfall here is that if your matching algorithm is broad enough, you may run out of options as far as which fields you can use as part of your Break Key. The risk here is that using very wide Break Keys can result in large record sets being sent to DQM for matching, leading to performance issues, especially in a real-time processing scenario. Keep this in mind when defining custom Break Keys and refer to Appendix E of the User Guide for instructions on using the included Break Key Profiler to assess the effectiveness of your Break Key.

 

We ended up using the first three characters of the Last Name field, processed by a phonetic algorithm, as our break key. This is a rather wide index; however, it worked with our specific dataset.
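
Purely as an illustration (this is not the shipped CF_DQ_SAP_Calculate_Match_Code_Custom code, the variable names are invented, and the phonetic encoding step is left out), the core of such a custom break key function might reduce to a single Data Services expression like this:

# Sketch of a custom break key: first three characters of the (phonetically
# encoded) last name. $LastName and $BreakKey are illustrative names only;
# apply your phonetic encoding of choice before taking the substring.
$BreakKey = upper(substr(rtrim_blanks($LastName), 1, 3));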

 

Customising the Break Key

To customise the Break Key there are changes you must make on both the ERP/CRM back-end and in Data Services.

 

Data Services

  1. In the Data Services Designer change the “$$DQ_SAP_Match_Function” substitution parameter to “MatchCodeCustom”.
  2. Change the “CF_DQ_SAP_Calculate_Match_Code_Custom” custom function to consume additional data fields as required and return the Break Key for each record.

 

ERP/CRM Back End

  1. If required, create the required Function Modules to pass through additional fields to DQM when calculating Break Keys. Refer to Leigh’s earlier blog for more information on this and the “Understanding break keys and data fields” section in the User’s Guide.
  2. Break Key values are calculated by running an initial report in ERP/CRM for existing records and real-time on commit for newly created records. If the Break Key algorithm is changed, the Break Key values for existing records must be updated before it will take effect. To do this you must re-run the initial reports to re-generate Break Key Values.  Refer to Leigh’s earlier blog for more information on this and the “Run the initialization program” section in the User’s Guide.

 

Duplicate Check Algorithms

Now that we’ve configured our custom Break Key, let’s look at how we can customise the standard DQM matching algorithms to take into account the additional fields we want to match against.

 

Accessing Additional Fields

The standard input structure for duplicate check messages passed to DQM is as follows:

Image3.PNG

 

As Leigh explained in his blog post, the DQM RFC server places a hard constraint on the messages passed from an ERP/CRM back-end to the DQM services (both in terms of fields and the length of the overall message). The XML structure above cannot be customised (practically speaking).

 

In order to meet our requirements for matching on fields additional to those listed above, we created a design where we re-purposed fields that weren’t normally utilised in our use case. As we were dealing with Person Business Partner records (e.g. where ADDR_TYPE = “2”), the Organisation NAME1 and NAME2 fields in the structure above were typically blank. Depending on your particular requirements and Business Partner record types in use you may have to repurpose different fields to these.

 

In our case, the CRM component that calls DQM matching was customised to pipe Birthdate, Middle Name, Gender and Country of Origin data into these two fields (up to 80 characters of data in total). Refer to Leigh’s blog post for details more details on the ABAP side. As a result, we created the following specification for consumption by DQM:

Image4.PNG

 

The Data Services Dataflow where the DQM input structure is customised is DF_Realtime_DQ_SAP_Name_And_Address_Match (in the Data Services Job Job_Realtime_DQ_SAP_Name_And_Address_Match). The Create_Compound_Fields Query transform was customised to extract these four pieces of data from the NAME1 and NAME2 fields into their own fields on the output structure. Note: this was only done where the ADDR_TYPE is “2” – this piece of conditional logic is important in order to not break DQM matching functionality for non-Person record types.
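
Purely as an illustration of this unpacking step (the output column names and field lengths are invented here, and the fixed offsets assume the packing used on the ABAP side in Part 1: an 8-character birthdate followed by the middle name in NAME1, and a 1-character gender flag followed by the country of origin in NAME2), the mapping expressions in such a Query transform could look something like this:

# BIRTHDATE output column (only unpacked for Person records)
ifthenelse(ADDR_TYPE = '2', substr(NAME1, 1, 8), NULL)

# MIDDLE_NAME output column
ifthenelse(ADDR_TYPE = '2', substr(NAME1, 9, 40), NULL)

# GENDER output column
ifthenelse(ADDR_TYPE = '2', substr(NAME2, 1, 1), NULL)

# COUNTRY_OF_ORIGIN output column
ifthenelse(ADDR_TYPE = '2', substr(NAME2, 2, 39), NULL)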

 

Once you have these additional fields in your structure you are free to modify the rest of the Job_Realtime_DQ_SAP_Name_And_Address_Match job to suit your requirements. There are a couple of things to watch out for though – keep reading.

 

Updating the Match Transform

The Match transform in the DF_Realtime_DQ_SAP_Name_And_Address_Match dataflow performs matching using three match groups:  Level_Name_O (Organisation), Level_Name_P (Person) and Level_Address.

 

The Organisation and Person match group scores are taken into account irrespective of the type of Business Partner record being matched, because the Organisation match group is used for name cross-matching for Person records and vice versa. Names are cross-matched into the fields used for matching in the Create_Compound_Fields Query transform – look for the "CompoundNames*" output fields.

 

Thus, if you’re not replacing the default matching logic but simply enhancing it by adding fields and adjusting weighting, keep this cross-matching process in mind. If you add fields to the Organisation match group, you must do so for the Person match group irrespective of the Business Partner record type you’re interested in. Moreover, be careful to set the “One field blank operation” and “Both field blank operation” comparison rules to “IGNORE” in the field matching options:

Image1.PNG

 

This will ensure that if a field is not passed through in either the driver (new, un-committed record) or the passenger (existing record being matched against) record – as can happen if you've got Person-specific fields configured in the Organisation match group for cross-matching, but an Organisation-type record is being processed – then the weighting assigned to these fields will be redistributed equally to the other fields in the match group.

 

Please validate this thinking to ensure it meets your specific requirements, but this is the setup that we used in our case.

 

Weightings

Weightings occur on two levels. The first is on individual fields within a match group  in the Match transform discussed above. These can be adjusted within the Match transform in the Data Services Dataflow DF_Realtime_DQ_SAP_Name_And_Address_Match.

 

Subsequently, the match groups themselves are weighted and summed in order to calculate an overall match score. This weighting and the overall match score calculation is performed in the CF_DQ_SAP_Calculate_Match_Score Custom Function.

 

This Custom Function has two key elements to consider: by default it assigns a 30% weighting to the Person or Organisation match group score (e.g. "Name" data only in the out-of-the-box configuration) and a 70% weighting to the Address match group score. This is something you will need to re-evaluate in conjunction with the relative weightings you give to each field within the Match transform, as it will impact the overall contribution of a particular field to the final match score DQM passes back to the back-end system.

 

Also, it has procedural rules to take the higher of the Person or Organisation match group score (as it assumes cross-matching was performed, as discussed in the section above). This may also have to be adjusted depending on your matching setup.

 

In our case, we simply adjusted the high-level weightings to 90% on the Person match group and 10% on the Address match group but largely kept the function the same.

 

This Custom Function is also where any records with a match score below a particular threshold value are excluded from the outgoing result set:

 

#if the matchScore is below the Threshold we set it to -1 and will drop the record in the next transform.
IF ($MatchScore < $Threshold)
begin
     $MatchScore = -1;
end

 

Records with a match score of “-1” are then filtered out in the subsequent Query transform. While this functionality can be bypassed or customised, by default the $Threshold is passed to DQM by the back-end system and is configurable via the DQM IMG in either ERP or CRM as part of the initial DQM setup (refer to the User’s Guide for more information on this).
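
Pulling these pieces together, here is a minimal sketch of the adjusted two-stage calculation. The variable names $PersonGroupScore and $AddressGroupScore are made up for this illustration and are not the names used inside the shipped CF_DQ_SAP_Calculate_Match_Score function.

# Sketch only: combine the match-group scores with the adjusted 90/10 weighting,
# then drop anything below the threshold passed in from the back-end system.
$MatchScore = (0.90 * $PersonGroupScore) + (0.10 * $AddressGroupScore);

IF ($MatchScore < $Threshold)
begin
     $MatchScore = -1;   # filtered out by the following Query transform
end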

 

The threshold value you use should be determined by your requirements and matching rules – e.g. determine what the lowest possible match score, given your particular matching rules, should constitute a potential duplicate that is presented to an end-user for their consideration.

 

Note: after making and saving your changes to the Dataflow (Match Transform) or Custom Function in Data Services you must restart the corresponding Realtime Service (Service_Realtime_DQ_SAP_Name_And_Address_Match) in the Data Services Management Console before changes take effect.

 

Translating Requirements

Often times when the business wants to customise the matching process in DQM they will present their requirements as a list of procedural rules – e.g.: if fields a, b and c match then return 100% match; else if fields a and b match return 66%, else if fields a and c match return 50%, etc.

 

This is slightly different, conceptually, from the way DQM performs matching by default. DQM comes with a standard matching algorithm (using the Data Services Match transform) that considers all input fields in one pass. Matching is granular – i.e. a similarity score is calculated for each field instead of the binary match/no-match result assumed by the example above – and happens in stages. In the first stage, a similarity score is calculated for fields that are grouped together (e.g. Person, or Address fields), and in a subsequent stage these group match scores are weighted again to determine the overall match score for the full record.

 

While the matching logic used by DQM can be completely replaced by procedural logic that literally follows customer requirements, it often doesn’t make sense to rip out standard DQM functionality for a number of reasons:

  • Requirements can often be fully met, with substantially similar or better results delivered, simply by carefully setting the field weightings used during matching
  • Instead of a binary match/no-match result for each field comparison, using DQM-style matching logic can return a more granular similarity score for each field (e.g. a First Name match can be, say, 60% similar instead of simply 0% (not the same) or 100% (the same))
  • Similarity scores are calculated using data type-specific functionality in Data Services where possible (i.e. Data Services uses specific algorithms and fuzzy-logic matching for names, dates, addresses, etc.), leading to improved accuracy and a higher likelihood of finding matches where duplicates exist
  • DQM considers all of the input fields in one pass reducing the need to build difficult to maintain decision trees
  • Matching via weighted similarity scores is infinitely more adjustable as weightings can be tweaked throughout testing and even when the solution has gone live
  • This framework is delivered in the out-of-the-box DQM installation and leveraging it therefore saves a lot of time and effort

 

I include this section in this write up as this is a hurdle that we had to cross with our customer – but one I believe we were better off for crossing as it resulted in a more robust solution that didn’t require us to remove standard DQM functionality (on the Data Services side) and replace it with custom rules.

 

Without going into too much detail, we went through an analysis exercise where we decomposed the business rules we were provided and came up with field similarity score weightings that would result in the desired outcome in each case identified by the business.
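As a worked example of that kind of decomposition (using the hypothetical rules quoted earlier, not our customer's actual rules): if “a, b and c match” must score 100, “a and b” 66 and “a and c” 50, those three statements pin down the field weightings directly. A small Python sketch:

-------------------------------------------------------------------------------------------------

# Hypothetical decomposition of procedural rules into field weightings.
# Assumed rules: a+b+c -> 100, a+b -> 66, a+c -> 50.
w_c = 100 - 66          # c contributes what a+b is missing: 34
w_b = 100 - 50          # b contributes what a+c is missing: 50
w_a = 100 - w_b - w_c   # remainder: 16
weights = {"a": w_a, "b": w_b, "c": w_c}

def score(matched_fields):
    """Match score for a simple binary match/no-match view of the fields."""
    return sum(weights[f] for f in matched_fields)

print(score({"a", "b", "c"}))  # 100
print(score({"a", "b"}))       # 66
print(score({"a", "c"}))       # 50

-------------------------------------------------------------------------------------------------

In DQM the same weights are then applied to graded similarity scores rather than binary results, which is where this approach starts to outperform hard-coded rules.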

 

Indicatively, the weightings (and fields) we ended up going into testing with looked like this (note the significant reduction in the emphasis on address data):

Image2.PNG

 

Summary and Final Thoughts

The DQM add-on combined with the Data Quality functionality of Data Services is a really powerful option for ensuring the ongoing maintenance of data quality levels in operational systems. The out-of-the-box functionality will be a 100% fit with requirements for some customers and thus represents a great option that can simply be “plugged in”.

 

Between Leigh Mason’s original blog post on customising this solution and this follow-up, we have collectively shown how you can customise the solution where the out of the box fields and matching algorithms do not meet customer requirements. Our customisations aimed to minimise enhancements and attempted to leverage the core DQM functionality as much as possible while still meeting our customer’s specific matching requirements.

 

Ultimately, however, it is still important to ensure a good fit between requirements and DQM’s capabilities up-front before going ahead with an implementation. While DQM is easily customisable in the ways that we have demonstrated, attempting to circumvent its core matching is a more involved, error-prone and time-consuming endeavour and certainly not recommended.

 

This requirements analysis, however, should be performed from a bottom-up perspective and a top-down perspective simultaneously – i.e. consider the approach DQM takes to matching duplicates and assess whether it represents a better way of meeting the same matching outcomes as the customer’s stated requirements. That is certainly what we discovered at our customer.

 

And finally, as Leigh mentioned, DQM should be implemented in conjunction with an overall master data governance strategy – it’s not a replacement for one!

Adding a second to date


Hi All,

 

I found this interesting and wanted to share it with you all. There was a requirement where, because of the timestamp portion of a date column value, a BODS job was failing, and we had to update the date value by adding one second to it. After that, the job completed successfully. Here is the query for your reference.

 

select TO_CHAR(sysdate, 'DD-MON-YYYY HH:MI:SS AM') NOW, TO_CHAR(sysdate+1/(24*60*60),'DD-MON-YYYY HH:MI:SS AM') NOW_PLUS_1_SEC from dual;
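The arithmetic here is simply that 1/(24*60*60) of a day is one second. If you want to sanity-check the adjustment outside the database, here is a minimal Python sketch of the same calculation (not part of the BODS job itself):

-------------------------------------------------------------------------------------------------

from datetime import datetime, timedelta

now = datetime.now()
now_plus_1_sec = now + timedelta(seconds=1)   # same effect as sysdate + 1/(24*60*60)

print(now.strftime('%d-%b-%Y %I:%M:%S %p'))
print(now_plus_1_sec.strftime('%d-%b-%Y %I:%M:%S %p'))

-------------------------------------------------------------------------------------------------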

 

Hope this is helpful.

 

Thanks,

Abdulrasheed.

Python code to keep only alphanumeric character


A User_Defined transform using a Python script to keep only alphanumeric characters.

The input to the User_Defined transform is a Description field that contains invalid characters, as shown below.

 

Source

Name,Description

AAA,AAdesc@1

BBB,BBdesc$1

CCC,CCdesc*&1

DDD,DDKl@£$%[};'\D

 

Python script to remove the invalid character and keep only alphanumeric character

 

-------------------------------------------------------------------------------------------------

import re

var1 = locals()

var1[u'Description'] = record.GetField( u'Description')

var1[u'Description'] = re.sub("[^a-zA-Z0-9.]","",var1[u'Description'])

record.SetField(u'New_Description',var1[u'Description'])

-------------------------------------------------------------------------------------------------

 

Target

 

DESCRIPTION,NEW_DESCRIPTION

AAdesc@1, AAdesc1

BBdesc$1, BBdesc1

CCdesc*&1, CCdesc1

DDKl@£$%[};'\D, DDKlD
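The same regular expression can be tested outside Data Services with plain Python. This standalone sketch mimics what the User_Defined transform does for each record; note that the pattern also keeps the period character, since '.' is included inside the character class:

-------------------------------------------------------------------------------------------------

import re

def keep_alphanumeric(description):
    # Remove everything except letters, digits and the period character.
    return re.sub(r"[^a-zA-Z0-9.]", "", description)

for value in ["AAdesc@1", "BBdesc$1", "CCdesc*&1", "DDKl@£$%[};'\\D"]:
    print(value, "->", keep_alphanumeric(value))
# AAdesc@1 -> AAdesc1, BBdesc$1 -> BBdesc1, CCdesc*&1 -> CCdesc1, DDKl@£$%[};'\D -> DDKlD

-------------------------------------------------------------------------------------------------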

I installed DQM for SAP. You can too!


I just helped a customer install Data Quality Management for SAP from scratch.  2 days start-to-finish, including customizations.  We probably could have saved some time if we had all the files downloaded prior to starting the install.

 

Ingredients (all available from Software Download Center):

  • IPS 4.0 SP4 (release notes)
  • DS 4.1 SP1 Patch 1 (release notes)
  • DQM 4.0 SP02 -- Data Services features -- e.g. the jobs themselves (release notes)
  • DQM 4.0 SP02 -- Java features  (e.g. the RFC servers)
  • DQM 4.0 SP02 -- ABAP features (440_731 SP02)
  • an existing SAP ECC EHP6 system running on NW 7.31
  • Data Quality address directories

 

Instructions:

 

Most importantly, a list of SAP Notes we found helpful and got us past errors and hiccups.

 

1740516 - Compatibility requirements between DS, IPS, and Cleansing Package

1720236 - Available releases and patch levels of Data Services

1742633 - Permission error when logging into default local repo after DS 4.1 installation

1887978 - Address Cleanse Suggestions realtime job crashes with Access Violation

1732816 - How to manually import DQM ATL files correctly after failed install

1506464 - What version of DQM is deployed into SAP?

1857608 - BC set activation error on /FLDQ/RSPRODVER report

1517544 - How to activate or reactivate Business Configuration sets for DQM

1373324 - How long can expired USPS directories be used when running in noncertified mode?

1644004 - Enable or disable RFC Server trace logging (traces DQM traffic between SAP & DS)

1529071 - Sizing how many RFC Servers you should use

1544413 - Troubleshooting RFC Server - composite note

1764059 - Troubleshooting RFC Server - can't find log file

1851293 - Troubleshooting RFC Server - not all services found

 

Sure, this presumes you know how to install & configure a basic SAP Data Services environment. But it went surprisingly quickly and worked basically out-of-the-box. And if you have errors and roadblocks, SAP is here to support you if you create a message for component EIM-DQM-SAP.

Loading special characters from a flat file to target table


Scenario:

 

     To load special characters, such as non-English characters, from a flat file (.txt) into a target table whose code page is set to UTF-8.

 

In order to load special characters from a flat file into a target table, we need to set the appropriate code page or encoding in:

 

a) In the source file

 

We need to set the encoding of the source file to UTF-8. The exact steps differ based on the text editor used; below are the steps for changing the encoding in Notepad.

 

 

File --> Save As . Change the encoding to UTF-8

 

 

1.png
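If there are many source files to convert, the re-encoding can also be scripted instead of done through Notepad. A minimal Python sketch, assuming the files are currently in Windows-1252 (the file names and the source encoding are assumptions; adjust them to your case):

-------------------------------------------------------------------------------------------------

# Re-encode a flat file to UTF-8 so special characters survive the load.
# Assumption: the original file is encoded in Windows-1252 (cp1252).
source_path = "customer_data.txt"        # hypothetical input file
target_path = "customer_data_utf8.txt"   # hypothetical output file

with open(source_path, "r", encoding="cp1252") as src:
    content = src.read()

with open(target_path, "w", encoding="utf-8") as tgt:
    tgt.write(content)

print("Re-encoded", source_path, "to UTF-8 as", target_path)

-------------------------------------------------------------------------------------------------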

b) Flat file properties

 

    In BODS, normally the code page is set to <default> in flat file properties  as given below:

2.jpg

     Change it to utf-8.

 

     3.jpg

 

c) Datastore properties

 

    Go to the target data store properties in BODS. Normally the code page is set to <default> as given below:

 

4.jpg

 

 

 

    

      Change Code page and Server code page to utf-8.

 

   5.jpg

Complete information on BODS.


 

 

 

What is “SAP Business Objects Data Services”?

This is a software tool designed by Business Objects (a company that was acquired by SAP in 2007). Some of the basic purposes of this tool are to perform jobs like:

§  ETL (Extraction, Transformation and Loading) – Pulling out data from any system/database/tables, applying changes to modify the data or applying programming logic to enhance the extracted data, and loading the data into any other system/database or tables. E.g. ETL of data from a SQL Server database to Oracle.

§  Data Warehousing – A database specifically designed and developed in a particular format to enable easy data analysis or reporting. This could be developed using data from various databases or any other data sources.

§  Data Migration – Moving data from one place to another. This is a subset of ETL where data is relocated from one software system or database to another. This also involves modification and alteration of data.

§  Business Intelligence – A concept which combines the data warehousing system and reporting. This is applied to analyse an organisation's data to effectively perform functions like business performance improvement.

Why SAP Business Objects Data Services?

There are many other software tools on the market which are capable of performing the same functions or activities as mentioned above, or even more. They are direct competitors of Business Objects: Informatica, DataStage, Cognos, SSIS, etc. The above-mentioned activities can also be performed using programming tools like .NET or Java, or even directly at the database end in SQL Server or Oracle.

The BODS tool provides a very easy and efficient interface to perform these specialist tasks which involve data manipulation. The objects and functions within BODS are specifically designed to perform manipulations and transformations of huge and complex volumes of data very efficiently. There are system-provided objects and functions which can be dragged and dropped easily, and jobs can be created. And, being an SAP tool, it has very good compatibility with SAP applications compared to any other similar tool.

Common terms and terminologies

Designer

Designer is the graphical user interface that lets you create, test, execute and debug BODS Job. This is the space where the data transformations take place.

Repository

The repository is a database that stores the objects used in the Designer: the job metadata, the transformation rules, and the source and target metadata. There are primarily three types of repositories: Local, Central and Profiler. The Designer cannot even be opened for any task without a local repository; in other words, the local repository is mandatory for BODS to function. At this point we are not concerned with the other two repositories.

Engine

The BODS Engine executes the jobs created using the Designer. When the BODS application is started, there are enough Engines launched to effectively accomplish defined tasks.

Job Server

The Job Server is an application that launches the Data Services processing engine and serves as an interface to the engine and other components in the Data Services suite.

Access Server

The Access Server passes messages between web applications and the Data Services Job Server and engines.

Data store

A data store provides a connection to a data source such as a database. This is a linking interface between the actual backend database and Data services. Through the data store connection, Data Services is able to import descriptions of the data source such as its metadata.

CMC (Central Management Console)

This is a web based administration tool for BODS which is used for some basic functions such as repository registration, User Management etc.

These terms should be at the fingertips of a BODS programmer as they will be used very often while working on BODS.

 

BODS Architecture

 

The illustration below (Figure 1.1) shows the basic architecture of BODS.

Figure 1.1

Above diagram explains the relation among Designer, central and local repository and web application. We will gradually understand these later. In the next page we will start creating our first BODS application

In the first chapter of this article we learnt the very basics of BODS. In this part of the article we will begin with the BODS repository creation process.

Before launching the BODS application and starting actual BODS programming there are three mandatory activities which need to be completed. They are:

1.    Repository Creation

2.    Job Server Configuration

3.    Registering Repository with CMC.

 

Creating a Repository

 

The first and foremost activity after installing BODS and the database is creating a local repository. Below is the step-by-step process to create it:

  1. 1. Log on to your Database (SQL server 2008 Express in this case)
  2. 2. Create a new database (Example - DS_LOCAL_TEST in the figure 2.1).This database would be used for Local repository. (This article does not explain how to create database and tables outside BODS.)

Figure 2.1

  1. 3. Go to Start menu open SAP Business Objects Data Services 4.0 SP1 -> Data Services Repository Manager
  2. 4. A screen as shown in the below diagram (Figure 2.2) would popup. Enter the required parameters into the text boxes as stated below -

Figure 2.2

Below are the credentials one would need to fill in the Repository Manager.

Repository type – Local
Database Type – Microsoft SQL Server (in this case)
Database Server name – The machine/server name where the database resides. If you have installed the database on your local machine, the machine name or machine IP has to be provided.
Windows authentication – DO NOT check this option. It is always recommended to use password authentication.
User name – The login name used to log in to the above-mentioned database on the database server specified.
Password – The password for that login on the above-mentioned database.

  1. 5. Click on Create. The message shown below can be seen (Figure 2.3), indicating “The local repository was created successfully”. (If you do not get the success message, then one of the credentials is incorrect, such as the database server name, the database name or the user id/password.)

Figure 2.3

  1. 6. Close the window

 

Attaching the repository to the Job Server

 

The next step following the creation of the repository is attaching it to a Job Server. Only once it is attached to a Job Server can it be opened using the Designer. Below are the steps to associate a repository with a Job Server:

  1. 1. Go to Start Menu -> SAP Business Objects Data Services 4.0 SP1->Data Services Server Manager. A screen as shown below would be displayed (Figure 2.4)–

Figure 2.4

  1. 2. Click on “Configuration Editor” and the screen Figure 2.5 would be displayed.

Figure 2.5

  1. 3. Next click on “Add” button and below screen would be displayed (Figure 2.6)

Figure 2.6

  1. 4. Click on “Add” button on the “Associated Repositories” label to enable the options in the “Repository Information” label.
  2. 5. Enter the credentials as shown in Figure 2.7 and click “OK”.

Figure 2.7

§  Job Server name – Enter any name that you wish to assign to the Job Server; JobServer_Test in the example figure above.

§  Job Server port – By default it is 3500. If an error occurs when assigning port 3500, try incrementing the value to 3501, 3502 or higher.

§  Ignore the Check boxes and go to the label “Repository Information”.

§  Database type– Select the database in which you created the repository. Microsoft SQL Server in this case.

§  Database Server name– Name of the machine/server where database resides.

§  Database name– Name of the database which was created for local repository.

§  Username and Password– Same user name and password used to log in to SQL Server database in which the local repository database resides.

§  Leave the “Default repository” check box as is.

  1. 6. Click on “Apply” and screen as shown in Figure 2.8 would be displayed.

Figure 2.8

  1. 7. The “Associated Repositories” box would be automatically populated with the concerned repository name.
  2. 8. Click on OK. Screen as in figure 2.9 can be seen.

Figure 2.9

  1. 9. Click on OK and we get a screen as shown in figure 2.10

Figure 2.10

  1. 10. Click “Close and Restart” to finish the Job Server configuration for the repository created. This will restart the BODS Engine, and a dialog box may pop up confirming that the engine will be restarted.


In our earlier article, we learnt how to create a repository in BODS. Once that part is done, the final process before launching the BODS application is registering the repository with CMC.

The repository that was created and attached to the Job Server needs to be registered in the Business Objects Central Management Console, or CMC.

  1. 1.Goto Start Menu -> Information Platform Services 4.0 -> Information Platform Services -> Information Platform Services Central Management Console .A webpage as displayed as in Figure 2.11 would popup.

Figure 2.11

The credentials required for the figure 2.11 would need to be furnished as required.

System– The name or IP address of the machine on which the BODS server is installed(not to be confused with Database server).This BODS server could be the same machine on which you are working or can be installed on any other machine and connected via network.

User Name– This username is set by the Administrator user who installs the BODS tool. In this case I would be using the super user name “administrator”.

Password– The password set for the username administrator at the installation time.

Figure 2.12

  1. 2. After logging in, a screen as shown in figure 2.12 is opened up with the options as shown. At present we can ignore all the options except “Data Services”.
  2. 3. Click on the “Data Services” icon and a new window would be opened as shown in figure 2.13.

Figure 2.13

  1. 4. Right click on the link Repositories -> Configure repository or right click on the link Repositories -> Manage -> Configure repository and a new window as in figure 2.14 is opened up.

Figure 2.14

  1. 5. Each of the fields in figure 2.14 needs to be filled as mentioned below and also as shown in figure 2.15 -

§  Repository Name – Name of the database that was created for local repository. In our case we would use DS_LOCAL_TEST.

§  Description - A description in few words as a label.

§  Database Type – Microsoft SQL Server.

§  Hostname - This should be the machine/server name where the database is residing. If you have installed database in your local machine, then machine name or machine IP has to be provided.

§  Port – Leave the default port as it is ‘1433’.

§  Database Name – Same as repository name DS_LOCAL_TEST.

§  Windows Authentication – ‘No’

§  Is Profiler Repository – ‘No’

Figure 2.16

  1. 6. Click on “Test Connection” button and a message box as shown in figure 2.17 is displayed.

Figure 2.17

  1. 7. Click on “Save” and you can see the repository that was just registered with CMC and the status as active. Figure 2.18.

Figure 2.18

  1. 8. Click on “Logoff” and close the window. Now we are setup and are ready to launch the Designer and start BODS programming.

In our earlier article on Data Services we have learnt how to register the local repository with CMC. In this part of the article we will start using Data Services Designer.

 

Launching the BODS Designer to create Jobs

 

After the successful creation of the local repository, configuration of the Job Server and registration of the repository in CMC, you are now ready to launch the BODS Designer. After opening the Designer you will be able to create and execute BODS jobs.

Here are the steps for launching the BODS designer-

  1. 1. Go to Start Menu and click on SAP Business Objects Data Services 4.0 SP1 -> Data Services Designer. A screen as shown below would be displayed (Figure 3.1) -

Figure 3.1

  1. 2. Enter the credentials as required

System – host[port] – The name or IP address of the machine on which the BODS server is installed (not to be confused with the database server). This BODS server could be the same machine on which you are working, or it can be installed on any other machine and connected via the network.

User Name – This username is set by the Administrator user who installs the BODS tool. In this case I would be using the super user name “administrator”.

Password – The password set for the username administrator at the installation time.

  1. 3. After entering the credentials, Click on “Log On” and it would display all the repositories that we created and registered with CMC on that BODS server (Figure 3.2).We had only one repository created so that can be seen.

Figure 3.2

  1. 4. Select the repository (DS_LOCAL_TEST in this case) and click on OK (or double click on the repository selected) and the Designer gets opened as shown in figure 3.3.

Figure 3.3

  1. 5. The figure 3.3 shows the first screen we get once we launch the BODS Designer, and this is our playground. We create our BODS jobs, execute, and debug them using this Designer.

 

Creating Data Services Jobs

 

After the preparation steps and launching the Designer we are now set up to start our job. Before we start the programming there are program hierarchies that we need to understand.

§  Every job that we create in BODS should be embedded within a project.

§  The object “Project” is the highest in the hierarchy followed by “Job” then “Workflow”/”Conditions “and finally “Dataflow”, and in the Dataflow we drag and drop the various transformations or tables or objects to perform the actual job run.

The above statement may be little confusing or difficult to easily understand.

To make it simpler we will go with a real life example. Here BODS Designer is the world and Project would be country. This country can accommodate many states, so the states become our jobs. In the state there are districts and those are our Workflows or Condition flows and in those districts we have corporations/municipalities. Likewise in the workflows there are data flows. And in these corporations and municipalities there are people. Similarly in the dataflow there are tables and transformations.

World>Country >State>District>Corporations/Municipalities>People

Designer>Project>Job>Workflow/Condition>DataFlow>Tables/Transforms

We would familiarize with the windows that can be seen within the Designer window (like Local Object Library etc) on the go. As we show this in the coming examples and practice jobs these would be clearer.

Creating our first BODS Program

  1. 1.Click on Project ->New->Project.(Figure 4.1)

Figure 4.1

  1. 2.We would get a new window to create new project. Enter the name of the project you would like to assign. In this case we go by the name “Prj_Test”(Figure 4.2).

Figure 4.2

  1. 3. Click on “Create” button.
  2. 4. The Prj_Test is created. Now we have to create a job under the project. Right click on the project Prj_test that we just created and select the option “New Batch Job”(Figure 4.3). At present we are dealing with only batch jobs and not real time jobs. The basic difference between batch and real time job is TBD

Figure 4.3

  1. 5. Create a job with any name, We have used “Job Test” (figure 4.4)

Figure 4.4

  1. 6. After creating “Job Test”, we are now ready to create workflows/conditions and a dataflow. To execute transformations, the minimum object we need inside a Job is a dataflow. Workflows or Condition flows are not mandatory, but it is good programming practice to use them. From the tool bar on the right (indicated in figure 4.5), click on work flow (second object from the top) and place it anywhere on the Designer workspace. Rename the workflow (any name); I have used WF_TEST1.

Figure 4.5

  1. 7.Double click on the workflow WF_TEST1. From the tool bar on the right , click on dataflow (third object from the top) and place it anywhere on the Designer workspace within the workflow. Rename the dataflow as ‘DF_TEST1’ (figure 4.6).

Figure 4.6

  1. 8. We can now see the objects that we created, both in the “Local Object Library” by clicking on each of the tabs and also in the “Project Area” in a hierarchical format (figure 4.8).

Figure 4.9

  1. 9. Next we would start the actual transformation. For this we would begin with creating a “Datastore”. As discussed earlier, datastore is a logical connection to database.

To create a datastore in BODS and link it to a database, we should first have a database and a few tables already created in it. For example, in our case I have created a database “db_Test_Data” on my database server with two tables, tblEmployee and tblEmployeeSalary (these were created at the database backend and not in BODS).

We would try to now import these two tables using datastore connection into the BODS staging area (Staging is another common term that is used to describe the BODS Designer area itself where the data transformations are done).Figure 4.10 shows the Database and tables I have created in SQL Server database.

Figure 4.10

  1. 10.Now we create the datastore to connect to Database and import the tables. Figure 4.11 shows datastore creation. Go to datastore tab in the Local Object Library and right click on it and a popup window would come up. Click on “New”.

Figure 4.11

  1. 11. Once “New” is selected a screen as shown in figure 4.12 would be displayed.

Figure 4.12

  1. 12. Enter the credentials in the textboxes and select appropriate options (Figure 4.13).

Figure 4.13

Datastore name – The name you would like to give the new datastore that you are creating. Here I have used “DS_TEST1”.
Datastore Type – In our case it is “Database”
Database Type – My database is “Microsoft SQL Server”
Database version – The version on my system is “Microsoft SQL Server 2008”. Select the appropriate option for the version you are using.
Database Server name – This should be the name or IP of the machine/server which has the tables that you are going to import (not necessarily the one holding your local repository database; it could be the same or different). I have created both on the same database server.
Database name – Name of the database which contains the tables to be imported to BODS.
Username – User name which is used to log on to the database server.
Password – Password for the username which is used to log on to the database server.

  1. 13. After filling up these details click on “OK”.
  2. 14. Leave all the other check boxes and other buttons seen on the window for now. We do not need them for now.
  3. 13. We would now be able to see the datastore (DS_TEST1) which we just created on the Local object library as shown in the figure 4.14.

Figure 4.14

  1. 14. Click on the datastore DS_TEST1 to expand the tree; three nodes can be seen, namely “Functions”, “Tables” and “Template Tables”. Our next task is to import the tables “tblEmployee” and “tblEmployeeSalary” that we created at the database backend using this datastore. There are two ways one can do it.

1.    Double click on the “Tables” icon.

2.    Right click on the “Tables” icon and select “Import by Name”. Using this method we can import the tables faster from database, but one has to know the exact table name to do this.

Here I would go with method ‘1’,as I do not have many tables in my database.

  1. 15. After double clicking the “Tables” icon we would get a window as shown in figure 4.15.

Figure 4.15

Note - In case you have more tables in that database all of those tables would be listed. Since I have only two tables that I had created only those two are listed.

  1. 16. Select both the tables right click, a menu will popup click on “Import” (figure 4.16).

Figure 4.16

  1. 17. Now, once again click on the Tables icon on the Local object Library and you can see the tables we have imported in the tree (figure 4.17)

Figure 4.17

  1. 18. We now have the source system data in our datastore. (But this does not necessarily mean we have extracted the data.)

Exercise 1

As we have set up our datastore and are ready with the job, let’s try out a small exercise. We are here going to perform a small scale Data Migration or ETL, which includes the following scopes -

§  To extract this data from this source system for which we have created the datastore (DS_TEST1) into a staging area.

§  To do a small transformation on the incoming data by adding a new column to the table tblEmployee, called ‘EmpLastname’ and leave it blank.

§  To join the two tables tblEmployee and tblEmployeeSalary with the common field EmpNo.

§  Load the result into a table in another database.

(Do not panic seeing too many to-do items; these are easily doable tasks for a starter)

Solution design

Any programmer knows there are always different ways to approach a problem, but in the end the most efficient and easiest method is adopted. Here also we follow the same approach. In BODS the best way to make your programs execute efficiently is to reduce the number of objects and flows wherever possible. This ability comes only over a period of time, through practice and experience.

Let’s take our example; we can actually manage all the four requirements using just one dataflow. Here is how we do it –

  1. 1. Go to the job ‘Job_Test’ we had created, double click on the dataflow ‘DF_TEST1’.
  2. 2. Drag both the tables from the Local Object Library to the Designer workspace.
  3. 3. A dropdown menu would popup ( for both the tables) .Select ‘Make Source’ for both the tables(Figure 4.18)

Figure 4.18

  1. 4. Once both tables are dragged into the workspace, we need to use BODS-defined transformation objects. We cannot cover all of the transformation objects or functions, but here we are going to use one of the most useful and helpful ones: the ‘Query’ transform.
  2. 5. Click on the “Transforms” tab of the Local Object Library. Then, Click on the “Platform” icon to expand the node and we can see a list of transform objects. Drag the fourth transform from the top called ‘Query’ transform to the workspace between both the tables.
  3. 6. After placing the transform, Click and drag on the dark blue dot on the table (‘A’ in figure 4.19) and join the line to the inward arrow of the Query transform(‘B’ in figure 4.19). The table tblEmployee is already connected in figure 4.19 and the other table is still not connected.

Figure 4.19

  1. 7. Once both the tables are connected to the Query transform, open the query transform by double clicking on it. Inside the query transform we can see both the source tables that we have on the left side. The right side of the query transform which is now blank is the place where we have to perform our operations (figure 4.20).

Figure 4.20

  1. 8. Select and drag all the fields from tblEmployee to the right and also only the EMPSALARY field from the tblEmployeeSalary table (as per the requirement we need all the fields from tblEmployee along with salary field). And then click on the “WHERE” tab below the fields area. Here, we have to specify the join condition just like normal SQL query. The join condition here is “tblEmployee.EMP_NO = tblEmployeeSalary.EMP_NO”. To do this you do not have to type the field names. Just click on ‘EMP_NO’ field in the table tblEmployee on the left pane and drag it to the empty space inside “WHERE” tab.Figure 4.21 shows the activities discussed here.

Figure 4.21

  1. 9. Our next task is to add a new field to the output ‘EmpLastname’. To do this we need to right click on the right hand side pane of the query transform, on any of the existing fields (preferably the last field which is EMPSALARY in this case).Select the option ‘New Output Column’ (figure 4.22).

Figure 4.22

  1. 10. A window would be popped up as seen in figure 4.23, select “Insert Below”.

Figure 4.23

  1. 11. Another window is displayed to initialize and set the properties of the new field which we are going to create. Figure 4.24 shows the window with values entered as required for our scenario. Ignore all the other tabs for now and just enter the “Name” ,”Data type” and “Length” as required.

Figure 4.24

  1. 12. Click on “OK” and now we would be able to see the field on the right hand side pane of the query transform (Figure 4.25).

Figure 4.25

  1. 13. After we have added the ‘EMPLASTNAME’ field to the target side, we can check if syntactically we stand correct so far, and we do this using the “Validate” button. At this point by looking at the screen we can recognize, we are not yet done with the field mappings as the blue triangle is missing in the “EMPLASTNAME’ field. But just to understand the functionality of the “Validate” let’s click on it (figure 4.26).

Figure 4.26

  1. 14. As expected, we get an error message box clearly indicating that the field ‘EMPLASTNAME’ is not mapped. To correct this syntactically, we have to complete the mapping of that field. For this, go to the “Mapping” tab and, in the empty space of the Mapping tab, type two single quotes ‘’. With two single quotes we assign the default value of the field as blank (which is as per the requirement). Now validate again and we can see there are no errors. Next click on the “Back” button to go back to the dataflow (figure 4.27).

Figure 4.27
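For readers who think in SQL: the dataflow built up to this point is logically equivalent to an inner join that carries all tblEmployee fields, the EMPSALARY column and a blank EMPLASTNAME. The sketch below illustrates that logic with Python's built-in sqlite3 module; the sample rows and any column names other than EMP_NO and EMPSALARY are invented for the example.

-------------------------------------------------------------------------------------------------

# Stand-alone illustration of the Query transform logic (sample data invented).
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tblEmployee (EMP_NO INTEGER, EMP_NAME TEXT);
    CREATE TABLE tblEmployeeSalary (EMP_NO INTEGER, EMPSALARY REAL);
    INSERT INTO tblEmployee VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO tblEmployeeSalary VALUES (1, 50000), (2, 60000);
""")

# All tblEmployee fields, the salary field, and a new blank EMPLASTNAME column,
# joined on EMP_NO - the same thing the Query transform is configured to do.
rows = con.execute("""
    SELECT e.EMP_NO, e.EMP_NAME, s.EMPSALARY, '' AS EMPLASTNAME
    FROM tblEmployee e
    JOIN tblEmployeeSalary s ON e.EMP_NO = s.EMP_NO
""").fetchall()

for row in rows:
    print(row)
con.close()

-------------------------------------------------------------------------------------------------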

  1. 15. Now we have completed three of our four tasks. Next we have to create our target table to get the final output. Here, I am going to create another datastore for a database residing on another server, so the effect of this job will be like an actual data migration or an ETL where we extract tables from one database, transform them in BODS, add fields and load them into another database. The datastore I have created for the target table is DS_OUTPUT1 (I am not repeating the datastore creation steps). There are two ways we can add an output table to our transformations.

1.    Click on the “Template tables” icon on the Local Object Library under the appropriate datastore. Drag and drop it to the workspace.

2.    Select the “Template table” option from the tool bar on the right side of the workspace and click anywhere on the Designer workspace. A new dialog box would popup, enter the name of the output table as required and select the appropriate datastore under which the table needs to be created.

Here, I would go with method number ‘2’. Figure 4.29 shows this action step.

Figure 4.29

  1. 16.Here we can give any name to the output table that is being created, I name it “tblOutput” and select the correct datastore in which the table needs to be created, from the drop down “In datastore”. Then click “OK’.
  2. 17. We can now see the table that we created in the workspace. Join the table to the Query transform as we did to the other two tables. After doing that validate your job again. You would see there are no errors and now we are good to test run our job (figure 4.30).

Figure 4.30

 

Executing your BODS job

 

The job has been created and checked for syntax errors and we see that job is ready for execution. Before execution there is one more object we can quickly brief through, the Script object. Let’s add a script to our job and display the message “Hello World”. Without “Hello world” any programming looks incomplete. Figure 4.31 shows how to add script.

  1. 1. Click on the “Job_Test” icon on project area to navigate back up or you can do this using “Back” button also.
  2. 2. Click on the “Script” button on the toolbar and then click on any space on the designer workspace. You can see the script has been placed on our workspace.
  3. 3. Rename it to “SCR_MSG” and then connect it to the workflow “WF_TEST1”.

All these steps are shown in the figure 4.31.

Figure 4.31

  1. 4. Double click on ‘SCR_MSG’ to open the script.

Figure 4.32

  1. 6. Press the “Back” button or click on the Job icon to go back to the job level. Validate the job again to make sure everything is syntactically correct. Save the work you have done so far. Click on ‘Save All’ button on the toolbar on the top (shown in figure 4.33).
  2. 7. Now execute the job. This can be done in many ways

1.    Right click on the Job_Test on the project area and select ‘Execute’.

2.    Click on the ‘Execute’ icon on the upper tool bar or press F8.

Figure 4.33

  1. 8. Either ways we execute, we would get a screen as shown in figure 4.34. For now just ignore all of the check boxes and tabs that we see on the screen and click ‘OK’.

Figure 4.34

We can see the name of the Job server that we had created for this repository.

  1. 9. Once we press ‘OK’, the job begins execution and we can see the message Hello World; towards the end there is a message ‘Job is completed successfully’. Both of these are indicated in figure 4.35.

Figure 4.35

  1. 10. Congrats!! you have now successfully executed your first BODS job.
  2. 11. After execution of our job we can see the number of records that were processed. This can be done by clicking on the monitor icon (shown in figure 4.36) at the top of the screen. At a later stage, if you want to review or re-visit your old job execution details, click on the Monitor tab in the Project area (shown in figure 4.36) and you can see all the jobs you have executed. Click on the job you would like to see the status of. A few things to keep in mind are:

1.    In the Project area Monitor tab if you see a small red light as in this case it means the job was executed successfully.

2.    In the Project area Monitor tab if you see a Big red cross (we would see this next) it means the job was executed with errors.

3.    In the Project area Monitor tab if you see a green light it means the job is still being executed.

Figure 4.36

  1. 12. Our next most important activity is to check whether the data has actually been migrated to the target table, and whether it has been migrated as expected. To do this, go into the dataflow by clicking on the dataflow in the Project Area. Next click on the lens-like icon on the target table ‘TBLOUTPUT’ and there we see the output data as required.

Figure 4.37

Now, if you cross-verify this against our requirements, you can see that the output matches the expectations. This table can also be viewed at the database backend by querying the table.

 

Debugging a BODS job

 

We now know how to create and execute a job; our next task is debugging a job that has errors. For this, let's create an error in our job. Detach the link between our script SCR_MSG and workflow WF_TEST1, then save and execute the job. You can see that the job stops execution and a red indicator lights up at the top of the execution window (Figure 4.38). Click on the red button and you can see the error clearly listed.

Figure 4.38

You can test this further by creating your own errors and debugging them. Some errors may not be very easy to understand; this becomes easier with experience and practice.

Do’s and Don’ts and Programmer tips

  1. Do not copy-paste dataflows if you want to re-use the same dataflow with modifications. Dataflows should be replicated instead.
  2. To re-use a dataflow without any modification, select it from the Local Object Library and drag and drop it onto the Designer workspace.
  3. Do not rename tables or copy-paste target tables once created. Should you need to rename a table, delete it entirely from the Local Object Library and re-create it.
  4. In a dataflow, once a target table is created, we cannot have a starting point from that table. That target table would need to be used as a source in the next dataflow.
  5. There is no predefined or hard-and-fast rule for placing or arranging the transforms/tables on the workspace with respect to the layout. It is recommended to arrange the objects neatly aligned in the workspace; for example, avoid kinks when connecting two objects that are on a straight line.
  6. Keep saving your job at each step. BODS, like any other software, can crash at any time and you may lose the work done so far.
  7. ATL – The BODS program dump is called an ATL. This is similar to an ‘.exe’.


Same table as Source and Target in a dataflow without table lock (Teradata) – Issue Solution:


Scenario:

 

Consider a scenario where we have to use the same Teradata table as Source and Target in a single dataflow.

This may sometimes cause a table lock in the Teradata database, and the job will suspend without showing any progress.

 

1.PNG

Here the table bods_region is used as source & target, which causes the job to suspend.

 

Resolution:

 

To avoid this issue, we can divide the main dataflow execution into sub-dataflows. This can be achieved by adding a Data_Transfer transform in the dataflow.

 

2.PNG

 

Here the Data_transfer (DT_Test table) transform added will divide the execution into multiple sub dataflows (which can be viewed in ‘Optimized SQL’ as in below)

 

3.PNG

4.PNG

  • First sub dataflow will join the source tables and load to DT_Test table.
  • Second sub dataflow will read from DT_Test to the target bods_region table.

 

This resolves the Teradata table lock issue: after the first sub-dataflow completes, the lock on the bods_region table is released, so the second sub-dataflow is able to load data to the target successfully.

 

This resolution can be applied in any scenario where a lock occurs due to a simultaneous read/write, as sketched below.
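The idea can be illustrated outside Data Services too: read everything into a staging table first, finish that step, and only then write back to the original table, so the same table is never being read and written at the same time. A minimal Python sketch of the pattern using sqlite3 (table and column names are invented; this is an analogy, not Teradata code):

-------------------------------------------------------------------------------------------------

# "Stage first, then write back" - the pattern the Data_Transfer transform enforces.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE bods_region (region TEXT, amount REAL)")
con.execute("INSERT INTO bods_region VALUES ('NORTH', 10), ('SOUTH', 20)")
con.commit()

# Sub-dataflow 1: read from the source and load the transfer (staging) table.
con.execute("CREATE TABLE DT_Test AS SELECT region, amount * 1.1 AS amount FROM bods_region")
con.commit()   # the read of bods_region finishes here, so it is free again

# Sub-dataflow 2: load the target from the transfer table only.
con.execute("DELETE FROM bods_region")
con.execute("INSERT INTO bods_region SELECT region, amount FROM DT_Test")
con.commit()

print(con.execute("SELECT * FROM bods_region").fetchall())
con.close()

-------------------------------------------------------------------------------------------------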

How to create "Full Outer Join" in SAP BODS


Picture1.jpg

Although this can be done directly by using the "SQL Transform" and providing the full-outer-join query, it is said that this is not recommended due to performance reasons.

 

The picture explains how to perform this. We have two source tables; one Query transform performs the left outer join and another Query transform performs the right outer join. The outputs from both Query transforms are merged (union all), the duplicate rows are removed using another Query transform (Query_2), and the output is directed to the output table (TEST_OUTPUT).
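The same left-join-plus-right-join-plus-distinct pattern can be written in plain SQL, which is handy for checking the dataflow's output. The sketch below uses Python's built-in sqlite3 with made-up sample data; the right outer join is expressed as a left outer join with the tables swapped, and UNION (without ALL) removes the duplicate rows, playing the role of Query_2:

-------------------------------------------------------------------------------------------------

# Emulating a full outer join the same way the dataflow does.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t1 (id INTEGER, name TEXT);
    CREATE TABLE t2 (id INTEGER, city TEXT);
    INSERT INTO t1 VALUES (1, 'A'), (2, 'B');
    INSERT INTO t2 VALUES (2, 'X'), (3, 'Y');
""")

rows = con.execute("""
    SELECT t1.id, t1.name, t2.city
    FROM t1 LEFT OUTER JOIN t2 ON t1.id = t2.id
    UNION                            -- UNION (not UNION ALL) drops duplicate rows
    SELECT t1.id, t1.name, t2.city
    FROM t2 LEFT OUTER JOIN t1 ON t1.id = t2.id
""").fetchall()

for row in rows:
    print(row)   # rows of the full outer join; order may vary
con.close()

-------------------------------------------------------------------------------------------------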

.


error.jpg

 

 

The error occurs because BODS cannot obtain the required handle for connecting to Oracle. The most frequent cause is too small a value for the Oracle parameter "PROCESSES", which needs to be increased in order to solve this issue. Kindly follow the steps below:

 

1) Open the SQL interface like SQL*Plus etc.

 

2) Login as a system DBA (conn sys as sysdba)

 

3) Enter the SQL statement > alter system set processes=200 scope=spfile

 

4) Except for the current SQL*Plus window, close all other applications connecting to Oracle (like BODS, SQL Developer, etc.)

 

5) Enter the SQL command > startup force

 

This will restart the oracle database and the newly entered value for processes parameter will come into effect.

SAP Data Insight


Data Insight

Data Insight is used to perform a DHA (Data Health Assessment) on the data, to see if the data is good to use. We use Data Insight to test and profile the data before we use it in the ETL process. We can also say that Insight is used for data investigation as part of a DHA. It automates the analysis and monitors the data.

 

Using Data Insight we can perform the following tasks


Data Profiling

Column query

Integrity test and Custom query

Scheduling

Creating trend reports

Sampling reports

 

Getting started

Creating Connection


Navigation to data Insight


Note:- First we need to start the Data insight Engine before we use the tool.

 

To start the Data Insight Engine, follow the navigation below.

Start --> Program Files -->Business objects XI 3.0--> Business objects Data Insight --> Data Insight Engine

Insight1.png


Once you click on this, a DOS window will open and it will start the engine.

Once the Engine starts, open the Data Insight GUI from the same navigation path above.

Once Data Insight starts, you will see the screen below.


Insight2.png

Now we need to create a Project

To create a Project go to the navigation

File-> New Project -> Give the project name and Check the box for Share Project.

Insight3.png

By sharing, we can make it accessible to the rest.

 

Now you have to provide the connection name.

Choose Data base and click on the Down arrow to specify the database connection.

Insight4.png


If using for the first time, give your SQL server name and click on OK, it opens the data link properties window.

Give in the server name and username and password. In step 3 select your SQL database on which you want to perform the test.

Click on Test connection to see if the credentials are correct. And click on OK.

Insight5.png

Now it will open the below window for selecting Owners. You can click on OK. Now the Insight Window is open and you can see the selected DB available. Expand the data base to see the tables under it. Go to the selected table and expand it.

Insight6.png

Insight7.png

Here we have 4 tabs (Data Profile, Column Query, Referential Integrity, Custom Query) using which we can perform different types of tests on the data.

Data Profile

Using this we can perform tests like Summary on the data, Comparison,  Frequency Test, Word Frequency test, Uniqueness of data, Redundancy test.

Summary will give a snapshot of the data for decision making or further drill-down.
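For comparison only, a very small slice of what the Summary profile reports (row counts, nulls and distinct values per column) can be reproduced with a few lines of pandas. This is an illustrative sketch with a hypothetical CSV file, not a replacement for Data Insight:

-------------------------------------------------------------------------------------------------

# Tiny, illustrative column-profiling sketch (input file name is hypothetical).
import pandas as pd

df = pd.read_csv("customers.csv")

print("rows:", len(df))
print("nulls per column:")
print(df.isna().sum())
print("distinct values per column:")
print(df.nunique())
print("summary statistics:")
print(df.describe(include="all"))

-------------------------------------------------------------------------------------------------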

How can we carry out the Summary test?

You can perform the summary on the table level or on a column level as well.  Select the Check box under the Summary column and click on RUN.

It will give you the below Summary Profile on the data which gives a complete DHA on the data.


Insight12.png

You can check on Save report and click on close. Now it will ask you to save the profile  report. Click on Yes and give the Profile name and click on OK.

Insight9.png

Insight10.png

Now if you notice, the last run column is populated with the time stamp and the result. Click on the result next to the time stamp to see the results

Comparison test

Comparison is used to get the report of Count and percentages of rows with incomplete column values.

To do a comparison test, click on the check box under Comparison at the table level or the row level and click on RUN.

Insight11.png


Insight12.png

Now you can observe the result; it shows any matching or duplicate records found. In our case we don’t have any duplicate or matching records.

You can also click on print report to generate the report and also can export the report to different formats by clicking on the export report.

Insight13.png

Insight14.png

Once you close this report and click on close in the main window, it will ask you to save the result and same as the above procedure we can save the results.

Frequency (FRQ) is used to find the frequency distribution of distinct values in columns.

The working procedure is same as the above. Click on the check box under the FRQ and click on RUN to see the results. You can also click on print report to export it in to different formats. You can also save the result  by checking save report and click on close and give the profile name.

Please see the following screen shots.

Insight15.png

Insight16.png

Insight17.png

WFRQ (Word Frequency) – Frequency distribution of single words.

Same as the above procedure: click on the check box and click on Run to see the results.

UNQ (Unique) – This gives the count and percentage of rows with non-unique column values.

Same as the above procedure: click on the check box and click on Run to see the results.

RDN (Redundancy) – This test identifies the commonalities and outliers between the columns.

Same as the above procedure: click on the check box and click on Run to see the results.

Column Query :-  This is used to Analyze the data within the Data Insight.

  1. Select the column on which you want to perform the test and right click -> Add Combined Column Query

We can perform the following test using the Combined column Query.

 

Insight18.png

Format

Occurrence – Search for the occurrence (<, >, =, <=, >=) ‘n’ times

Pattern – Pattern of the data in the column

Pattern recognition – Recognising the string pattern with special characters

Range – Specify the min and max values for the range

Reference column – Reference column against which this column is checked

Specific value test – Search the column for a specific value


Select the Radio buttons on the left side and the respective selections will be activated on the right side.

Insight19.png

Once you select the query type on the left side, choose the respective options on the right side, tick the Return Data check box and click on Run.

In our example, we take the specific value test.

Select the specific values on left side and specify a value on the right hand side. Select the Return data check box and click on run. You will get the below result. You can click on print report to see the data in a report format, or you can click the check box save data and click on close. Click ok to save the report and give the report name and click on OK.

Happy Learning

Rakesh

Introduction, Artifacts and look and feel of BODS


SAP BO DATA Integrator / Data Services

 

 

Data Services integrates with SAP BI, SAP R/3, SAP applications and non-SAP warehouses.

Purpose: It performs ETL via batch jobs and an online (real-time) method, with bulk and delta load processing of both structured and unstructured data, to generate a warehouse (SAP or non-SAP).

 

Data Services is the combination of Data Integrator and Data Quality. Previously these were separate tools: Data Integrator was used to do the ETL part, and Data Quality to do the data profiling and data cleansing.

Now, with Data Services, both DI and DQ are combined into one interface so that it provides the complete solution (data integration and quality) under one platform.

This even combines the separate job servers & repositories of DI and DQ into one.

 

Data Federator: The output of Data Federator is virtual data. Federator provides data as input to Data Services, and using Federator we can project data from multiple sources as a single source.

 

Data Services Scenarios:-

Source                                            Ware House

SQL         --           DS           --             DB

Flat File    --           DS           --             DB

Flat File    --           DS           --             BI

R/3           --           DS           --             BI

R/3           --           DS           --             DB

SQL         --           DS           --             BI

 

We can move the data from any source to any target DB using Data Services.

Data Services is a utility for the ETL process; it is not a warehouse, so it does not stage any significant amount of data itself.

Data Services can create the ETL process and can build a warehouse (SAP or non-SAP).

 

DS is used mainly for three sorts of projects:

1) Migration

2) Warehouse or database building

3) Data Quality

 

Data Profiling:- Pre-processing of data before ETL to check its health. Through profiling we check whether the data is good or bad.

 

Advantages of Data Services over SAP BI/BW ETL process

 

It’s a GUI based frame work

It has multiple data sources in built configuration

It has numerous inbuilt Transformations (Integrator, Quality, Platform)

It does data profiling activity

It easily adds external systems

It supports Export Execution Command to load the data in to the ware house via batch mode process

It generates ABAP code automatically

It recognizes Structure and un structures data

It can generate a ware house (sap / Non Sap)

It supports huge data cleansing/ Consolidation/ Transformation

It can do real time data load/ Full data load/ Incremental Data load

 

Data Integrator / Data Services Architecture

 

intro1.png

There is no concept of process chains, DTPs, or InfoPackages when you use Data Services to load the data.

 

Data Integrator Components

 

Designer

intro2.png

It creates the ETL process

It has a wide set of transforms

It includes all the artifacts of the project (work flows, data flows, datastores, tables)

It is the gateway for profiling

All Designer objects are reusable

 

 

 

Management Console (URL-based / web-based tool)

 

intro3.png

It is used to configure the repositories

It allows us to configure user profiles for a specific environment

It allows us to create users and user groups and assign users to the groups with privileges

It allows us to schedule or execute jobs automatically

We can execute jobs from any geographic location, as it is a web-based tool

It allows us to connect repositories to the respective environment connections (Dev/Qual/Prod)

It allows us to customize the datastores

 

Access Server

 

It is used to run real-time jobs

It receives the XML input (real-time data)

XML inputs can be loaded into the warehouse through the Access Server

It is responsible for the execution of online/real-time jobs

 

Repository Manager

intro4.png

It allows us to create the repositories (local, central, and profiler)

Repositories are created on top of a standard database

The Data Services system tables are held here

 

 

Job Server

 

This is the server responsible for executing jobs. Without associating a local/central repository with the job server, we cannot execute a job.

 

Data Integrator Objects

 

Projects :-

 

A project is a folder where you store all related jobs in one place; we can call it a folder to organize jobs.

 

Jobs:-

Jobs are the executable objects in Data Services. A job is created under a project. There are two types:

Batch jobs

Real-time (online) jobs

 

Work Flows:-

A work flow acts as a folder that contains related data flows. Work flows are reusable.

 

Conditionals:-

A conditional contains work flows or data flows, and a script-defined condition controls whether they are triggered or not.
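As a rough sketch, the conditional's If condition is simply a Boolean expression over variables that a script has set earlier; the variable name $G_LOAD_TYPE below is an illustrative assumption, not a fixed name:

$G_LOAD_TYPE = 'FULL'

If the expression evaluates to true, the objects placed in the Then branch execute; otherwise the Else branch executes.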

 

Scripts:-

Scripts are sets of statements used to define or initialize global variables, control the flow of conditionals and of execution generally, print messages at runtime, and assign specific default values to variables.
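A minimal sketch of such an initialization script, assuming the global variables $G_LOAD_TYPE and $G_START_DATE have been declared at the job level (the names and values are illustrative only):

# Initialize the global variables that control the load
$G_LOAD_TYPE = 'DELTA';
$G_START_DATE = to_date('2014.01.01', 'YYYY.MM.DD');

# Print a message to the trace log at runtime
print('Load type: ' || $G_LOAD_TYPE || ', loading from ' || to_char($G_START_DATE, 'YYYY.MM.DD'));

Statements end with a semicolon, # starts a comment, and || concatenates strings.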

 

Data Flow:-

The actual data processing happens here.

 

Source Data Store:-

It is the connection through which source tables are imported from the database or SAP system into the Data Services local repository.

 

Target Data Store:-

It holds the collection of dimension and fact tables used to create the data warehouse.

 

Transformations:-

These are the transforms used to carry out the ETL process. They are broadly categorized into three groups: Platform, Data Quality, and Data Integrator.

 

File Format :-

It contains the definitions of various legacy-system file formats.

 

Variables:-

We can create local and global variables and use them in the project. Variable names start with the "$" symbol.
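For example, a global variable initialized in a script can later be referenced in a Query transform's WHERE clause; the table and column names here are illustrative assumptions:

SALES.ORDER_DATE >= $G_START_DATE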

 

Functions:-

We have numerous built-in functions (string, math, lookup, date, and so on).
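A few illustrative calls placed in a script (the input values are arbitrary examples):

# upper() converts to uppercase: prints BODS
print(upper('bods'));
# ltrim_blanks() strips leading blanks: prints data services
print(ltrim_blanks('   data services'));
# substr(string, start, length): prints Data
print(substr('Data Services', 1, 4));

Lookups against tables are typically done with the lookup_ext function, most often inside a Query transform.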

 

Template Table:-

These are temporary tables used to hold intermediate data or the final data.

 

Data Store:-

A datastore acts as a port through which you define connections to the source or target systems. You can create multiple configurations in one datastore to connect it to different systems.

 

ATL :-

ATL files are like BIAR files. The name comes from a company name; like BIAR, ATL does not stand for anything.

Projects, jobs, work flows, data flows, and tables can be exported to an ATL file so that they can be moved from Dev to Qual and from Qual to Prod.

Similarly, you can import the objects exported to ATL back into Data Services.

 

Thanks

Rakesh
