Cleaning and validating input data

The HR data source that I currently receive person data from has historically had data quality problems. These are much better than they were, but they still cause a few issues.

When I attended FIM training at OCG, I raised the issue of data cleanliness and was told, in simple terms, to make sure the input data is clean! If only life were so simple...

Back to reality, I have had to add code to my Advanced Flows to deal with, clean up and validate the input data.

A nice example follows – importing Surname from HR – dealing with:

  • Just plain bad data (the string “null” supplied as the value)
  • Validation (characters that should not be present – via regex replace)
  • Clean up (removing spaces from around hyphens in double-barrelled names); there is also a bit of trimming to remove any spaces before or after the string value
  • Surname missing!

Things like this remind me of why “Codeless Provisioning” was something I fought to get working (for too long), but ultimately had to abandon in favour of using code for almost everything. Doing so has been a real panacea for all of the rules and other funnies that I have had to accommodate.

Note: I made a little edit – I was not checking for the presence of AccountName before raising errors – should that attribute have been missing (highly unlikely, but not unknown to occur), that would have raised an error in itself. The edited code is a little more robust!

Case "sn-HRMA-Import"
    'HR attributes required are: Surname, AccountName
    'Ensure that this attribute has spaced hyphens corrected
    Dim surnamelogFileName As String = dtDateNowDay & "-" & dtDateNowMonth & "-" & dtDateNowYear & "_HRMA_Surname.log"
    Logging.SetupLogFile(surnamelogFileName, loggingLevel)
    If csentry("Surname").IsPresent Then
        If csentry("Surname").Value.ToLower = "null" Then
            'HR sent the literal string "null": clear sn and raise an error
            mventry("sn").Delete()
            If csentry("AccountName").IsPresent Then
                Throw New Exception("Error in Surname (null) for FedID: " & csentry("AccountName").Value)
            Else
                Throw New Exception("Error in Surname (null) for PID: " & csentry("PID").Value) 'PID is the anchor - if this is missing we have more serious problems
            End If
        Else
            'Strip every character outside the allowed set; any difference means the value is invalid
            Dim tString As String = Regex.Replace(csentry("Surname").Value, "[^a-zA-ZÀÈÌÒÙàèìòùÁÉÍÓÚÝáéíóúýÂÊÎÔÛâêîôûÃÑÕãñõÄËÏÖÜŸäëïöüÿ_\-\'\ ]", "")
            'tString = tString.Replace(" - ", "-")
            If tString <> csentry("Surname").Value Then
                If csentry("AccountName").IsPresent Then
                    Throw New Exception("Error in Last Name format for FedID: " & csentry("AccountName").Value & ", Surname: " & csentry("Surname").Value)
                Else
                    Throw New Exception("Error in Last Name format for PID: " & csentry("PID").Value & ", Surname: " & csentry("Surname").Value) 'PID is the anchor - if this is missing we have more serious problems
                End If
            Else
                'Valid: trim surrounding spaces and close up spaced hyphens
                mventry("sn").Value = Replace(csentry("Surname").Value.Trim, " - ", "-")
            End If
        End If
    Else
        'Surname missing: log it rather than raising an error
        If csentry("AccountName").IsPresent Then
            Logging.Log("Surname not Present for: " & csentry("AccountName").Value, True, 0)
        Else
            Logging.Log("Surname not Present for: " & csentry("PID").Value, True, 0) 'PID is the anchor - if this is missing we have more serious problems
        End If
    End If
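
If you want to try the surname checks outside the sync engine, the same logic can be pulled into a small standalone function. What follows is only a rough sketch, not the code I run in production: the names SurnameStatus and CheckSurname are made up for illustration, and it takes a plain string (Nothing standing in for a missing attribute) instead of a csentry. The character whitelist is the one from the flow rule above.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module SurnameCleaning

    ' Possible outcomes of checking an HR surname value (illustrative names).
    Enum SurnameStatus
        Ok
        Missing
        NullString
        InvalidCharacters
    End Enum

    ' Matches any single character outside the whitelist used in the flow rule above.
    Private ReadOnly Disallowed As New Regex("[^a-zA-ZÀÈÌÒÙàèìòùÁÉÍÓÚÝáéíóúýÂÊÎÔÛâêîôûÃÑÕãñõÄËÏÖÜŸäëïöüÿ_\-\'\ ]")

    ' Classifies the raw value. When the result is Ok, "cleaned" holds the
    ' trimmed value with spaced hyphens closed up; otherwise it is Nothing.
    Function CheckSurname(raw As String, ByRef cleaned As String) As SurnameStatus
        cleaned = Nothing
        If raw Is Nothing Then Return SurnameStatus.Missing
        If raw.ToLower() = "null" Then Return SurnameStatus.NullString
        If Disallowed.IsMatch(raw) Then Return SurnameStatus.InvalidCharacters
        cleaned = raw.Trim().Replace(" - ", "-")
        Return SurnameStatus.Ok
    End Function

    Sub Main()
        Dim cleaned As String = Nothing
        ' A spaced double-barrelled name, the string "null", a digit in the name, and a missing value.
        Console.WriteLine(CheckSurname(" Smith - Jones ", cleaned).ToString() & " [" & cleaned & "]")
        Console.WriteLine(CheckSurname("NULL", cleaned).ToString())
        Console.WriteLine(CheckSurname("Sm1th", cleaned).ToString())
        Console.WriteLine(CheckSurname(Nothing, cleaned).ToString())
    End Sub

End Module
```

In the flow rule itself, the Case branch would then only need to decide what to do with each status: set sn, throw, or log. Testing for a disallowed character with IsMatch gives the same answer as the Regex.Replace-and-compare approach, since the replaced string differs from the original exactly when such a character is present.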