azure-storage-file-datalake-py

Azure Data Lake Storage Gen2 SDK for Python. Use for hierarchical file systems, big data analytics, and file/directory operations.

Azure Data Lake Storage Gen2 SDK for Python

Hierarchical file system for big data analytics workloads.

Installation

pip install azure-storage-file-datalake azure-identity

Environment Variables

AZURE_STORAGE_ACCOUNT_URL=https://<account>.dfs.core.windows.net

Authentication

from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

credential = DefaultAzureCredential()
account_url = "https://<account>.dfs.core.windows.net"

service_client = DataLakeServiceClient(account_url=account_url, credential=credential)
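The account URL can also be read from the AZURE_STORAGE_ACCOUNT_URL variable listed above instead of being hard-coded. A minimal sketch, assuming a hypothetical helper named account_url_from_env; the ".dfs." host check is an illustrative sanity check and not something the SDK requires (it would reject custom domains or emulators):

```python
import os
from urllib.parse import urlparse


def account_url_from_env(var_name: str = "AZURE_STORAGE_ACCOUNT_URL") -> str:
    """Return the Data Lake account URL from the environment (hypothetical helper)."""
    url = os.environ.get(var_name, "").strip()
    if not url:
        raise RuntimeError(f"{var_name} is not set")
    host = urlparse(url).hostname or ""
    # Assumption: the Data Lake endpoint uses the "dfs" host, not the "blob" host.
    if ".dfs." not in host:
        raise ValueError(f"{url!r} does not look like a .dfs. endpoint")
    return url.rstrip("/")
```

The returned string can be passed as account_url when constructing DataLakeServiceClient.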

Client Hierarchy

Client                    Purpose
DataLakeServiceClient     Account-level operations
FileSystemClient          Container (file system) operations
DataLakeDirectoryClient   Directory operations
DataLakeFileClient        File operations

File System Operations

# Create file system (container)
file_system_client = service_client.create_file_system("myfilesystem")

# Get existing
file_system_client = service_client.get_file_system_client("myfilesystem")

# Delete
service_client.delete_file_system("myfilesystem")

# List file systems
for fs in service_client.list_file_systems():
    print(fs.name)

Directory Operations

file_system_client = service_client.get_file_system_client("myfilesystem")

# Create directory
directory_client = file_system_client.create_directory("mydir")

# Create nested directories
directory_client = file_system_client.create_directory("path/to/nested/dir")

# Get directory client
directory_client = file_system_client.get_directory_client("mydir")

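# Rename/move directory; new_name has the form "<file system>/<new path>"
# and the call returns a client for the renamed directory
directory_client = directory_client.rename_directory(new_name="myfilesystem/newname")

# Delete directory
directory_client.delete_directory()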

File Operations

Upload File

# Get file client
file_client = file_system_client.get_file_client("path/to/file.txt")

# Upload from local file
with open("local-file.txt", "rb") as data:
    file_client.upload_data(data, overwrite=True)

# Upload bytes
file_client.upload_data(b"Hello, Data Lake!", overwrite=True)

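# Append data in chunks (for large files)
file_client.create_file()  # Start from an empty file so offsets count from 0
file_client.append_data(data=b"chunk1", offset=0, length=6)
file_client.append_data(data=b"chunk2", offset=6, length=6)
file_client.flush_data(12)  # Commit: pass the total number of bytes appended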
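The append-then-flush pattern generalizes to a loop that tracks the running offset. The following is a sketch under stated assumptions: upload_in_chunks is a hypothetical helper, the 4 MiB default chunk size is an arbitrary illustration, and file_client is any object exposing the create_file, append_data, and flush_data calls used in this guide.

```python
from typing import BinaryIO


def upload_in_chunks(file_client, stream: BinaryIO, chunk_size: int = 4 * 1024 * 1024) -> int:
    """Append a stream chunk by chunk, then commit it with a single flush.

    Hypothetical helper; returns the total number of bytes written.
    """
    file_client.create_file()  # start from an empty file so offsets begin at 0
    offset = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Each chunk is staged at the current end of the uncommitted data.
        file_client.append_data(data=chunk, offset=offset, length=len(chunk))
        offset += len(chunk)
    # flush_data takes the final length of the file, i.e. the sum of all chunks.
    file_client.flush_data(offset)
    return offset
```

It could be called as upload_in_chunks(file_client, open("local-file.txt", "rb")), ideally inside a with block so the local file is closed afterwards.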

Download File

file_client = file_system_client.get_file_client("path/to/file.txt")

# Download all content
download = file_client.download_file()
content = download.readall()

# Download to file
with open("downloaded.txt", "wb") as f:
    download = file_client.download_file()
    download.readinto(f)

# Download a byte range (the first 100 bytes)
download = file_client.download_file(offset=0, length=100)
partial = download.readall()
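When a large file is read in several ranged requests, the offset and length pairs can be computed up front. A small sketch, assuming a hypothetical helper named byte_ranges; the total size would typically come from get_file_properties().size:

```python
from typing import Iterator, Tuple


def byte_ranges(total_size: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) pairs that cover total_size bytes (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, total_size, chunk_size):
        # The last range is shortened so it does not run past the end of the file.
        yield offset, min(chunk_size, total_size - offset)
```

Each pair can then be passed to download_file(offset=..., length=...).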

Delete File

file_client.delete_file()

List Contents

# List paths (files and directories)
for path in file_system_client.get_paths():
    print(f"{'DIR' if path.is_directory else 'FILE'}: {path.name}")

# List paths in directory
for path in file_system_client.get_paths(path="mydir"):
    print(path.name)

# Recursive listing
for path in file_system_client.get_paths(path="mydir", recursive=True):
    print(path.name)
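The listing results can be aggregated, for example to count entries under a directory. A sketch only: summarize_paths is a hypothetical helper, and it assumes each item exposes is_directory and content_length as the path objects returned by get_paths do.

```python
def summarize_paths(paths) -> dict:
    """Count files, directories, and total file bytes in a get_paths() iterable.

    Hypothetical helper; works on any objects with is_directory and content_length.
    """
    summary = {"files": 0, "directories": 0, "bytes": 0}
    for path in paths:
        if path.is_directory:
            summary["directories"] += 1
        else:
            summary["files"] += 1
            # content_length may be missing or None for some entries; treat it as 0.
            summary["bytes"] += getattr(path, "content_length", 0) or 0
    return summary
```

For example: summarize_paths(file_system_client.get_paths(path="mydir", recursive=True)).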

File/Directory Properties

# Get properties
properties = file_client.get_file_properties()
print(f"Size: {properties.size}")
print(f"Last modified: {properties.last_modified}")

# Set metadata
file_client.set_metadata(metadata={"processed": "true"})

Access Control (ACL)

# Get ACL
acl = directory_client.get_access_control()
print(f"Owner: {acl['owner']}")
print(f"Permissions: {acl['permissions']}")

# Set ACL
directory_client.set_access_control(
    owner="user-id",
    permissions="rwxr-x---"
)

# Update ACL entries recursively (merges with existing entries)
result = directory_client.update_access_control_recursive(
    acl="user:user-id:rwx"
)
print(f"Failures: {result.counters.failure_count}")
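ACL strings such as "user:user-id:rwx" follow the pattern [default:]type:id:permissions, with entries separated by commas. A minimal sketch for assembling them, assuming hypothetical helpers acl_entry and build_acl; the validation covers only plain rwx triplets and ignores special bits:

```python
import re

_PERMISSIONS = re.compile(r"^[r-][w-][x-]$")
_KINDS = {"user", "group", "mask", "other"}


def acl_entry(kind: str, permissions: str, principal_id: str = "", default: bool = False) -> str:
    """Build one ACL entry such as 'user:user-id:rwx' (hypothetical helper)."""
    if kind not in _KINDS:
        raise ValueError(f"unknown ACL entry type: {kind!r}")
    if not _PERMISSIONS.match(permissions):
        raise ValueError(f"permissions must look like 'rwx' or 'r-x', got {permissions!r}")
    scope = "default:" if default else ""
    return f"{scope}{kind}:{principal_id}:{permissions}"


def build_acl(*entries: str) -> str:
    """Join entries into the comma-separated string passed as acl=..."""
    return ",".join(entries)
```

The resulting string is intended for the acl argument of the access control calls shown above.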

Async Client

from azure.storage.filedatalake.aio import DataLakeServiceClient
from azure.identity.aio import DefaultAzureCredential

async def datalake_operations():
    # Both the credential and the client hold network sessions; close them on exit
    async with DefaultAzureCredential() as credential, DataLakeServiceClient(
        account_url="https://<account>.dfs.core.windows.net",
        credential=credential,
    ) as service_client:
        file_system_client = service_client.get_file_system_client("myfilesystem")
        file_client = file_system_client.get_file_client("test.txt")

        await file_client.upload_data(b"async content", overwrite=True)

        download = await file_client.download_file()
        content = await download.readall()

import asyncio
asyncio.run(datalake_operations())
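The async client pays off when several transfers run concurrently. The following is a sketch, assuming a hypothetical helper upload_many and an arbitrary default limit of 8 concurrent uploads; file_system_client is expected to be the async file system client from the example above.

```python
import asyncio
from typing import Dict, List


async def upload_many(file_system_client, items: Dict[str, bytes], max_concurrency: int = 8) -> List[str]:
    """Upload several small payloads concurrently (hypothetical helper).

    items maps destination paths to their content; returns the uploaded paths
    in the same order as items.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def upload_one(path: str, data: bytes) -> str:
        # The semaphore caps how many uploads are in flight at once.
        async with semaphore:
            file_client = file_system_client.get_file_client(path)
            await file_client.upload_data(data, overwrite=True)
            return path

    return await asyncio.gather(*(upload_one(p, d) for p, d in items.items()))
```

Inside datalake_operations it could be awaited as await upload_many(file_system_client, {"a.txt": b"1", "b.txt": b"2"}).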

Best Practices

  1. Enable hierarchical namespace on the storage account; directory and ACL operations depend on it
  2. Use append_data + flush_data for large file uploads
  3. Set default ACLs on directories so that newly created children inherit them
  4. Use async client for high-throughput scenarios
  5. Use get_paths with recursive=True for full directory listing
  6. Set metadata for custom file attributes
  7. Consider Blob API for simple object storage use cases

When to Use

Use this skill when working with Azure Data Lake Storage Gen2 from Python: hierarchical file systems, big data analytics workloads, and file or directory operations.
