Configure an Azure Data Lake for use in Popdock

Published: Dec 18, 2023
Written by Ethan Sorenson

Configuring Azure Blob storage presents a lot of settings and options. This article walks through those options and identifies the settings that give the best performance in Popdock.

Full Microsoft documentation for storage accounts and configuration can be found here.

What is a Data Lake?

A data lake is a single, centralized repository where you can store all your data, both structured and unstructured. A data lake enables your organization to quickly and more easily store, access, and analyze a wide variety of data in a single location. With a data lake, you don’t need to conform your data to fit an existing structure. Instead, you can store your data in its raw or native format, usually as files or as binary large objects (blobs).

Microsoft Learn

For Popdock, the advantage of using a Data Lake for large data sets, rather than storing that data in SQL or another application, is speed and flexibility. For speed, a Data Lake lets us read very large files quickly, without worrying about API limits or memory restrictions on a server. For flexibility, because all data in a Data Lake is stored in files, there is no schema the data must fit; we can work with any data structure, whether that is a standard ERP table or a heavily customized system.

1. Create a Storage Account

First we need to create a Storage account in Azure. A Storage account is the environment that will hold our folders (Containers).

  1. Log into the Azure Portal.
  2. Click Create a Resource.
  3. Select the Storage account resource.
  4. Select an Azure Subscription and Resource Group into which to add the Storage account.
  5. Provide a unique Storage account name.
  6. Select a Region to store the account data.

For best performance, we recommend selecting a region based on your location (United States, Canada, Western Europe, etc.) and the data center where Popdock is hosted. For example, if you are located in the United States, select the region of the United States data center that hosts Popdock. Click HERE to see a list of our Popdock Account Regions.

  1. Set the performance to Standard.
    • Note: Popdock currently only supports Standard performance.
  2. Set the desired Redundancy.
    • Note: We recommend Geo-redundant storage (GRS).
  3. Click Next: Advanced.
  4. On the Advanced tab, go to the Security section.
  5. Check the box for Allow enabling anonymous access on individual containers.
  6. Check the box for Enable storage account key access.
  7. Navigate to the Hierarchical namespace section.
  8. Check the box for Enable hierarchical namespace.
  9. Navigate to the Blob storage section.
  10. Set the Access tier to Hot.
  11. Click Review + create.
  12. Click Create.
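
For teams that prefer to keep these choices in a script or template, the settings above can be sketched as an ARM-template-style resource. This is a minimal illustration, not part of the Popdock setup itself: the account name `popdockdatalake01`, the region `eastus`, and the helper function names are hypothetical placeholders, and the API version is simply one we assume is available in your subscription. The name check covers only Azure's format rule (3 to 24 lowercase letters and digits); global uniqueness can only be confirmed by Azure.

```python
import json
import re

# Hypothetical placeholder values; replace with your own.
ACCOUNT_NAME = "popdockdatalake01"
REGION = "eastus"


def is_valid_account_name(name: str) -> bool:
    """Check the format rule: 3-24 characters, lowercase letters and digits only."""
    return re.fullmatch(r"[a-z0-9]{3,24}", name) is not None


def storage_account_resource(name: str, region: str) -> dict:
    """Build an ARM-template-style resource mirroring the portal choices above."""
    if not is_valid_account_name(name):
        raise ValueError(f"invalid storage account name: {name!r}")
    return {
        "type": "Microsoft.Storage/storageAccounts",
        "apiVersion": "2023-01-01",  # assumed API version
        "name": name,
        "location": region,
        "kind": "StorageV2",
        # Standard performance with Geo-redundant storage (GRS)
        "sku": {"name": "Standard_GRS"},
        "properties": {
            # Allow enabling anonymous access on individual containers
            "allowBlobPublicAccess": True,
            # Enable storage account key access
            "allowSharedKeyAccess": True,
            # Enable hierarchical namespace
            "isHnsEnabled": True,
            # Access tier: Hot
            "accessTier": "Hot",
        },
    }


if __name__ == "__main__":
    print(json.dumps(storage_account_resource(ACCOUNT_NAME, REGION), indent=2))
```

Running the script prints the resource as JSON, which can be compared against the portal's Review page before clicking Create.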

2. Create a Container

A Container is essentially a folder. Each Container can have its own security settings, so you can limit access to only some of the data in your Storage account.

Azure pricing is based on the amount of data stored, not the number of Containers used. Add as many Containers as needed to organize your data.

Each Container will be set up as a separate connection in Popdock.

  1. Search the Azure Portal for Storage accounts.
  2. Select your Storage account.
  3. From the left-hand navigation select Containers.
  4. Click + Container to add a new Container.
  5. Provide a Name for the Container.
  6. Set the Anonymous access level to Private (no anonymous access).
  7. Click Create.
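
If you plan several Containers, it can help to check the names before creating them. The sketch below validates Container names against Azure's naming rules (3 to 63 characters; lowercase letters, digits, and hyphens; starting and ending with a letter or digit; no consecutive hyphens) and builds an ARM-template-style resource with anonymous access set to Private. The account name, the Container names `erp-archive` and `crm-exports`, and the helper function names are hypothetical, and the API version is an assumption.

```python
import re

# Starts with a letter or digit; every hyphen is followed by a letter or digit,
# which rules out consecutive hyphens and a trailing hyphen.
# System containers such as $logs are intentionally not handled here.
_CONTAINER_NAME = re.compile(r"[a-z0-9](?:-?[a-z0-9])+")


def is_valid_container_name(name: str) -> bool:
    """Check a Container name against the length and character rules."""
    return 3 <= len(name) <= 63 and _CONTAINER_NAME.fullmatch(name) is not None


def container_resource(account: str, container: str) -> dict:
    """Build an ARM-template-style resource for a private Container."""
    if not is_valid_container_name(container):
        raise ValueError(f"invalid container name: {container!r}")
    return {
        "type": "Microsoft.Storage/storageAccounts/blobServices/containers",
        "apiVersion": "2023-01-01",  # assumed API version
        "name": f"{account}/default/{container}",
        # Private (no anonymous access)
        "properties": {"publicAccess": "None"},
    }


if __name__ == "__main__":
    # Hypothetical Containers; each one becomes its own connection in Popdock.
    for container in ["erp-archive", "crm-exports"]:
        print(container_resource("popdockdatalake01", container)["name"])
```

Because pricing does not depend on the number of Containers, splitting data along the lines of who needs access to it keeps each Popdock connection narrowly scoped.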

With the Azure Data Lake configured, you can proceed to add the connector in Popdock.
