AdlCopy is a command-line tool (it runs on the user’s machine) that allows you to copy data from Azure Storage Containers or Blobs into Azure Data Lake Store.
You can use the AdlCopy tool in two ways:
- Standalone, where the tool uses Data Lake Store resources to perform the task. This can be a cheap and more ad-hoc/manual option to move data from blob storage to ADLS compared to using Data Factory or ADLA U-SQL jobs. I assume only ADLS transaction costs will be made if your blob storage resides in the same region as your ADLS.
- Using a Data Lake Analytics account, where the units assigned to your Data Lake Analytics account are used to perform the copy operation. You would typically use this option when the data to be moved is in the range of gigabytes and terabytes, and you want better and predictable performance throughput.
Another advantage is that on execution the AdlCopy tool generates an ADLA job that contains U-SQL code for each source file from your blob storage that will be copied to the ADLS. You can reuse the code of this job to schedule execution of the copy process on a regular basis. Requirement for this scenario is that your blob storage source files have static paths and file names, as the U-SQL code of the generated job contains hardcoded references to the source files locations.
Download AdlCopy: http://aka.ms/downloadadlcopy
Instructions: https://azure.microsoft.com/en-us/documentation/articles/data-lake-store-copy-data-azure-storage-blob/
Copy all data from blob storage container to ADLS folder using a generated U-SQL job that can be reused:
1. Start Command Prompt
2. Navigate to dir where AldCopy is installed:
%HOMEPATH%\Documents\adlcopy
3. Run the following command (standalone, copy is performed by ADLS):
AdlCopy.exe /Source https://yourblob.blob.core.windows.net/yourcontainer/ /Dest swebhdfs://yourdatalakestore.azuredatalakestore.net/yourfolder/ /SourceKey –
4. Data is copied:
5. Run the following command to execute the process using the Data Lake Analytics service:
AdlCopy.exe /Source https://yourblob.blob.core.windows.net/yourcontainer/ /Dest swebhdfs://yourdatalakestore.azuredatalakestore.net/yourfolder/ /SourceKey – /Account yourDataLakeAnalyticsAccount /Units 2
6. Now because the ADLA service has been used a U-SQL job was generated and executed, which can be seen in the azure portal:
7. The U-SQL code shows that for each file in the blob storage EXTRACT from blob storage and OUTPUT to ADLS statements were created:
8. You can now choose “Duplicate Script”, delete the DROP ASSEMBLY IF EXIST and CREATE ASSEMBLY statements which you don’t need again, and save your new job. You can now execute/schedule the copy process again without the need of the AdlCopy tool.