Tech Tuesday: Source File Name from Folder Data Source
A very common question that the Support Team will get is:
“I’m using a SmartConnect map using a folder data source with csv files. Is there any way to get the filename that each document came from?”
The answer is, surprisingly, no.
Unfortunately, there isn’t a built-in variable that SmartConnect uses that has this information that we could just reference. To get the actual file name for a record, we will have to code it ourselves.
I can think of a couple different ways to do this:
- Don’t use a folder data source map. Instead make a map that mimics the behavior of a folder data source map by using scripts.
- Modify the actual source file with scripting to add a new column with the source file name added
Don’t use a folder data source
This seems to go against the whole idea of using a folder data in the first place.
But this is a clever idea and better yet, it is already mostly complete.
The premise in that article was that the user would like to use a ‘custom odbc query’ which we can run on a file data source but not a text file folder data source map. The solution was to copy the file from a folder to the name of the file that the ODBC Query on the map was expecting.
But in our case, this works the same in that now that we have read the file name using the map pre- script, we could copy that original file name into a custom global variable such as GBL_FILENAME.
Modify the actual source file
To me, this seems like the easiest solution. If we just had the file name in the source, we can reference it like any other column. The “trick” then would be – how do we get it there?
We can again use a “Before Map” task similar to the code in the first section. However, in this example we won’t rename the file. Instead, we will open the file and read the contents looking for a column named “FileName”. If there is a column named that, we can skip the file. Otherwise we add it and then add the filename data to the rows.
Map Pre-Script
Dim sourceFile As String = “D:\eone\test\folder_ds_test\source\”
Dim list_line As Object
Dim i As Int16
For Each fi As FileInfo In New DirectoryInfo(sourceFile).GetFiles(“*.csv*”)
Dim linesList As New Object
Dim newList As New System.Collections.Generic.List(Of String)
i = 0
linesList = System.IO.File.ReadAllLines(fi.FullName)
‘if the column list already includes FileName then exit
If linesList(0).Contains(“FileName”) Then
Continue For
End If
For Each list_line In linesList
If i = 0 Then
newList.Add(list_line + “,FileName”)
Else
newList.Add(list_line + “,” + fi.Name)
End If
i = i + 1
Next
‘overwrite existing file with my new modified version
System.IO.File.WriteAllLines(fi.FullName, newList.ToArray())
Next
Return True
The above script would be the Map Pre-script. Adjust the sourceFile path to be the path that your map is looking for the source files. Also on the “GetFiles()” method, this is the file filter for the GetFiles method that filters on only “.csv” files. If there were .txt and other files (which I would never recommend with a folder data source), the query wouldn’t find those files to change them.Figure 1:Files modified by the script
After running the map with the above script, we can see in the source folder there were three files initially. The newfile.txt wasn’t modified because of the “.csv” filter but the newfile2.csv was modified to include the new FileName column that was populated with the actual file name.
There is an existing feature request to add this ability so you should vote and comment on that, but until then hopefully one of these methods will suffice.
Best Regards,
Patrick Roth
eOne Support
Want to know more? Email me at pat.roth@eonesolutions.com.
I am voting for this feature request!
Actually, I’ve found that the assertion that there’s no out-of-box method to retrieve the file name from a folder data source is not correct.
When looking at a preview of source data in the destination mapping window, one will find a “SourceFile” field if they scroll the preview window all the way to the right. This “virtual” field can be assigned to the return value of a calculated column with this simple script…
return _SOURCEFILE
I use this method extensively in multiple integrations for our organization. The only caveat is that there must be at least one valid data file present in the source folder for the script to validate.
This value can also be assigned to a global variable within the same calculated column script. However, I have found that for some unknown reason in the latest version of SmartConnect, assigning it to the predefined GlobalFileName variable does not work (creating my own user defined variable does work). This appears to only be a problem in newer releases as using GlobalFileName did work in the 2014 version.
I folks will find this info useful.