
Data assets

dadosfera.services.maestro.data_assets.fetch_catalog_asset_count

fetch_catalog_asset_count(maestro_base_url, token, additional_params={})

Fetch the total number of data assets from Maestro.

PARAMETER DESCRIPTION
maestro_base_url

Base URL of the Maestro instance (e.g., 'https://maestro.example.com/api').

TYPE: str

token

Authentication token for API access.

TYPE: str

additional_params

Additional query parameters to include in the request. These parameters will override default sorting parameters if there are conflicts. Defaults to {}.

TYPE: Dict[str, str] DEFAULT: {}

RETURNS DESCRIPTION
int

Total number of data assets.

RAISES DESCRIPTION
HTTPError

If the API request fails with a non-200 status code.

ConnectionError

If there's a network connection error.

Timeout

If the request times out.

RequestException

For any other request-related errors.

Example:

>>> assets = fetch_catalog_asset_count(
...     maestro_base_url="https://maestro.example.com/api",
...     token="bearer_token_123",
...     additional_params={}
... )
>>> logger.info(assets)
5000
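The override behaviour described for additional_params can be illustrated without touching the network. The following is a minimal sketch, assuming the defaults are merged with a plain dict update; `build_count_params` and the `type` filter are hypothetical names used only for illustration and are not part of the library.

```python
from typing import Dict, Optional


def build_count_params(
    additional_params: Optional[Dict[str, str]] = None,
) -> Dict[str, object]:
    """Merge caller-supplied query parameters over the default size=1."""
    # Only one item is requested because the count comes from the response total.
    params: Dict[str, object] = {"size": 1}
    # dict.update lets caller-supplied keys win when they collide with defaults.
    params.update(additional_params or {})
    return params


print(build_count_params())                   # defaults only
print(build_count_params({"type": "table"}))  # hypothetical filter is added
print(build_count_params({"size": "10"}))     # caller value replaces the default
```

Using `None` as the default in the sketch avoids sharing one dictionary object between calls, which is the usual Python idiom when a default would otherwise be a mutable `{}`.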

Source code in dadosfera/services/maestro/data_assets.py
def fetch_catalog_asset_count(
    maestro_base_url: str, token: str, additional_params: Dict[str, str] = {}
) -> int:
    """Fetch the total number of data assets from Maestro.

    Args:
        maestro_base_url (str): Base URL of the Maestro instance
            (e.g., 'https://maestro.example.com/api').
        token (str): Authentication token for API access.
        additional_params (Dict[str, str], optional): Additional query parameters to include
            in the request. These parameters will override default sorting parameters if
            there are conflicts. Defaults to {}.
    Returns:
        int: Total number of data assets.
    Raises:
        requests.exceptions.HTTPError: If the API request fails with a non-200 status code.
        requests.exceptions.ConnectionError: If there's a network connection error.
        requests.exceptions.Timeout: If the request times out.
        requests.exceptions.RequestException: For any other request-related errors.
    Example:
        >>> assets = fetch_catalog_asset_count(
        ...     maestro_base_url="https://maestro.example.com/api",
        ...     token="bearer_token_123",
        ...     additional_params = {}
        ... )
        >>> logger.info(assets)
        5000
    """
    # Request a single item: only the "total" field of the response is needed.
    params = {"size": 1}

    params.update(additional_params)
    try:
        response = requests.get(
            f"{maestro_base_url}/catalog",
            params=params,
            headers={"Content-Type": "application/json", "Authorization": token},
        )
        # Turn 4xx/5xx responses into requests.exceptions.HTTPError.
        response.raise_for_status()
    except requests.exceptions.HTTPError as errh:
        logger.error(f"HTTP Error: {errh}")
        raise
    except requests.exceptions.ConnectionError as errc:
        logger.error(f"Error Connecting: {errc}")
        raise
    except requests.exceptions.Timeout as errt:
        logger.error(f"Timeout Error: {errt}")
        raise
    except requests.exceptions.RequestException as err:
        logger.error(f"Unexpected request error: {err}")
        raise

    return int(response.json()["total"])

dadosfera.services.maestro.data_assets.fetch_paginated_catalog_assets

fetch_paginated_catalog_assets(maestro_base_url, token, additional_params={}, size=50, start_page=1)

Fetch all data assets from Maestro's catalog using pagination.

This function retrieves the complete list of data assets by making multiple paginated requests to the Maestro catalog API. Assets are sorted by display name in ascending order. If the initial request fails, appropriate error messages will be logged.

PARAMETER DESCRIPTION
maestro_base_url

Base URL of the Maestro instance (e.g., 'https://maestro.example.com/api').

TYPE: str

token

Authentication token for API access.

TYPE: str

additional_params

Additional query parameters to include in the request. These parameters will override default sorting parameters if there are conflicts. Defaults to {}.

TYPE: Dict[str, str] DEFAULT: {}

size

Number of data assets to fetch per page. A larger size means fewer API calls but more data per request. Defaults to 50.

TYPE: int DEFAULT: 50

start_page

Page number to start fetching from. Useful for resuming interrupted fetches. Defaults to 1.

TYPE: int DEFAULT: 1
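The trade-off noted for size (fewer calls, more data per call) is simple arithmetic: fetching every asset takes roughly the total divided by the page size, rounded up, plus the one request used to obtain the count. A small sketch of that estimate follows; `estimated_page_requests` is a hypothetical helper, not a library function.

```python
import math


def estimated_page_requests(total_assets: int, size: int) -> int:
    """Number of paginated requests needed to retrieve total_assets items."""
    if size <= 0:
        raise ValueError("size must be a positive integer")
    # The last page may be only partially filled, hence the ceiling.
    return math.ceil(total_assets / size)


for page_size in (50, 100):
    calls = estimated_page_requests(5000, page_size)
    print(f"size={page_size}: {calls} page requests")
```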

RETURNS DESCRIPTION
List[Dict[str, Any]]

List[Dict[str, Any]]: List of data asset objects. Each object contains metadata about a data asset as returned by the Maestro API.

RAISES DESCRIPTION
HTTPError

If the API request fails with a non-200 status code.

ConnectionError

If there's a network connection error.

Timeout

If the request times out.

RequestException

For any other request-related errors.

Example:

>>> assets = fetch_paginated_catalog_assets(
...     maestro_base_url="https://maestro.example.com/api",
...     token="bearer_token_123",
...     size=100
... )
>>> logger.info(f"Retrieved {len(assets)} assets")
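The accumulate-until-total loop behind this function can be sketched without any network access. The version below is an illustration under assumptions: `collect_pages`, `fake_fetch_page`, and the in-memory `CATALOG` are hypothetical stand-ins for the Maestro API, and the empty-page guard is an addition made here so the loop also terminates when start_page is greater than 1.

```python
from typing import Any, Callable, Dict, List


def collect_pages(
    fetch_page: Callable[[int, int], List[Dict[str, Any]]],
    total: int,
    size: int = 50,
    start_page: int = 1,
) -> List[Dict[str, Any]]:
    """Accumulate pages until total items are collected or a page is empty."""
    items: List[Dict[str, Any]] = []
    page = start_page
    while len(items) < total:
        batch = fetch_page(page, size)
        if not batch:
            # Nothing left to read; stop rather than loop forever.
            break
        items.extend(batch)
        page += 1
    return items


# Hypothetical in-memory catalog standing in for the remote service.
CATALOG = [{"display_name": f"asset_{i:03d}"} for i in range(120)]


def fake_fetch_page(page: int, size: int) -> List[Dict[str, Any]]:
    """Return one 1-indexed page of the in-memory catalog."""
    start = (page - 1) * size
    return CATALOG[start : start + size]


assets = collect_pages(fake_fetch_page, total=len(CATALOG), size=50)
resumed = collect_pages(fake_fetch_page, total=len(CATALOG), size=50, start_page=3)
print(len(assets), len(resumed))
```

Resuming from a later page returns only the remaining items, so a caller that resumes an interrupted fetch would need to combine them with the assets already retrieved.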

Source code in dadosfera/services/maestro/data_assets.py
def fetch_paginated_catalog_assets(
    maestro_base_url: str,
    token: str,
    additional_params: Dict[str, str] = {},
    size: int = 50,
    start_page: int = 1,
) -> List[Dict[str, Any]]:
    """Fetch all data assets from Maestro's catalog using pagination.

    This function retrieves the complete list of data assets by making multiple paginated
    requests to the Maestro catalog API. Assets are sorted by display name in ascending order.
    If the initial request fails, appropriate error messages will be logged.

    Args:
        maestro_base_url (str): Base URL of the Maestro instance
            (e.g., 'https://maestro.example.com/api').
        token (str): Authentication token for API access.
        additional_params (Dict[str, str], optional): Additional query parameters to include
            in the request. These parameters will override default sorting parameters if
            there are conflicts. Defaults to {}.
        size (int, optional): Number of data assets to fetch per page. A larger size means
            fewer API calls but more data per request. Defaults to 50.
        start_page (int, optional): Page number to start fetching from. Useful for resuming
            interrupted fetches. Defaults to 1.

    Returns:
        List[Dict[str, Any]]: List of data asset objects. Each object contains metadata
            about a data asset as returned by the Maestro API.

    Raises:
        requests.exceptions.HTTPError: If the API request fails with a non-200 status code.
        requests.exceptions.ConnectionError: If there's a network connection error.
        requests.exceptions.Timeout: If the request times out.
        requests.exceptions.RequestException: For any other request-related errors.
    Example:
        >>> assets = fetch_paginated_catalog_assets(
        ...     maestro_base_url="https://maestro.example.com/api",
        ...     token="bearer_token_123",
        ...     size=100
        ... )
        >>> logger.info(f"Retrieved {len(assets)} assets")
    """
    num_data_assets = fetch_catalog_asset_count(
        maestro_base_url=maestro_base_url,
        token=token,
        additional_params=additional_params,
    )

    logger.info(f"Found {num_data_assets} of this type")

    page = start_page
    data_assets = []
    while len(data_assets) < num_data_assets:
        params = {"size": size, "page": page, "order": "asc", "sort_by": "display_name"}

        params.update(additional_params)
        try:
            response = requests.get(
                f"{maestro_base_url}/catalog",
                headers={"Content-Type": "application/json", "Authorization": token},
                params=params,
            )
            response.raise_for_status()
        except requests.exceptions.HTTPError as errh:
            logger.error(f"HTTP Error: {errh}")
            raise
        except requests.exceptions.ConnectionError as errc:
            logger.error(f"Error Connecting: {errc}")
            raise
        except requests.exceptions.Timeout as errt:
            logger.error(f"Timeout Error: {errt}")
            raise
        except requests.exceptions.RequestException as err:
            logger.error(f"Unexpected request error: {err}")
            raise
        page_assets = response.json()["data_assets"]
        if not page_assets:
            # An empty page means the catalog is exhausted (for example when
            # start_page > 1), so stop instead of looping forever.
            break
        data_assets.extend(page_assets)
        logger.info(f"Data assets retrieved: {len(data_assets)}")
        page += 1
    return data_assets

dadosfera.services.maestro.data_assets.get_data_asset_column_metadata

get_data_asset_column_metadata(maestro_base_url, token, data_asset_id, additional_params={})

Fetch detailed column metadata for a data asset.

Retrieves comprehensive metadata about all columns in a specific data asset, including data types, descriptions, and statistical information when available.

PARAMETER DESCRIPTION
maestro_base_url

Base URL of the Maestro instance (e.g., 'https://maestro.example.com/api').

TYPE: str

token

Authentication token for API access. Must have 'catalog:read' permission.

TYPE: str

data_asset_id

Unique identifier of the data asset (e.g., 'asset_abc123').

TYPE: str

additional_params

Additional query parameters. Common parameters include:

- include_stats: Include statistical information
- include_samples: Include value samples

Defaults to {}.

TYPE: Dict[str, str] DEFAULT: {}

RETURNS DESCRIPTION
Dict[str, Any]

Dict[str, Any]: Column metadata information including:

- columns (List[Dict]): List of column definitions

RAISES DESCRIPTION
HTTPError

For failed API requests. Common cases:

- 401: Invalid or expired token
- 403: Insufficient permissions
- 404: Data asset not found

ConnectionError

For network connectivity issues

Timeout

For request timeouts

RequestException

For other request-related errors
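Once the metadata dictionary is available, the columns list is usually what callers iterate over. The sketch below shows one defensive way to do that. It assumes each column entry carries a `name` key, which the documentation above does not confirm; `column_names` and the sample payload are hypothetical.

```python
from typing import Any, Dict, List, Optional


def column_names(metadata: Optional[Dict[str, Any]]) -> List[str]:
    """Return column names from a column-metadata payload, tolerating gaps."""
    if not metadata:
        # Covers both a missing payload (None) and an empty dictionary.
        return []
    return [col["name"] for col in metadata.get("columns", []) if "name" in col]


# Hypothetical payload shaped like the documented return value.
sample = {
    "columns": [
        {"name": "customer_id", "type": "integer"},
        {"name": "created_at", "type": "timestamp"},
        {"type": "string"},  # entry without a name is skipped
    ]
}
print(column_names(sample))
```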

Source code in dadosfera/services/maestro/data_assets.py
def get_data_asset_column_metadata(
    maestro_base_url: str,
    token: str,
    data_asset_id: str,
    additional_params: Dict[str, str] = {}
) -> Dict[str, Any]:
    """Fetch detailed column metadata for a data asset.

    Retrieves comprehensive metadata about all columns in a specific data asset,
    including data types, descriptions, and statistical information when available.

    Args:
        maestro_base_url (str): Base URL of the Maestro instance
            (e.g., 'https://maestro.example.com/api').
        token (str): Authentication token for API access. Must have 'catalog:read'
            permission.
        data_asset_id (str): Unique identifier of the data asset
            (e.g., 'asset_abc123').
        additional_params (Dict[str, str], optional): Additional query parameters.
            Common parameters include:
            - include_stats: Include statistical information
            - include_samples: Include value samples
            Defaults to {}.

    Returns:
        Dict[str, Any]: Column metadata information including:
            - columns (List[Dict]): List of column definitions

    Raises:
        requests.exceptions.HTTPError: For failed API requests. Common cases:
            - 401: Invalid or expired token
            - 403: Insufficient permissions
            - 404: Data asset not found
        requests.exceptions.ConnectionError: For network connectivity issues
        requests.exceptions.Timeout: For request timeouts
        requests.exceptions.RequestException: For other request-related errors

    """
    try:
        response = requests.get(
            f"{maestro_base_url}/catalog/data-asset/{data_asset_id}/columns-metadata",
            headers={"Content-Type": "application/json", "Authorization": token},
            params=additional_params,
        )
        response.raise_for_status()
        return response.json()
    # Log each failure, then re-raise so callers see the documented exceptions
    # instead of an implicit None return value.
    except requests.exceptions.HTTPError as errh:
        logger.error(f"HTTP Error fetching column metadata: {errh}")
        raise
    except requests.exceptions.ConnectionError as errc:
        logger.error(f"Connection Error: {errc}")
        raise
    except requests.exceptions.Timeout as errt:
        logger.error(f"Timeout Error: {errt}")
        raise
    except requests.exceptions.RequestException as err:
        logger.error(f"Unexpected error fetching column metadata: {err}")
        raise

dadosfera.services.maestro.data_assets.get_data_asset_data_preview

get_data_asset_data_preview(maestro_base_url, token, data_asset_id, additional_params={})

Fetch a data preview for a specific data asset.

Retrieves a sample of rows from the data asset to preview its content. The preview typically includes a limited number of rows and columns for quick inspection.

PARAMETER DESCRIPTION
maestro_base_url

Base URL of the Maestro instance (e.g., 'https://maestro.example.com/api').

TYPE: str

token

Authentication token for API access. Must have 'catalog:read' permission.

TYPE: str

data_asset_id

Unique identifier of the data asset (e.g., 'asset_abc123').

TYPE: str

additional_params

Additional query parameters. Common parameters include:

- limit: Maximum number of rows to return
- offset: Number of rows to skip
- columns: Specific columns to include

Defaults to {}.

TYPE: Dict[str, str] DEFAULT: {}

RETURNS DESCRIPTION
Dict[str, Any]

Dict[str, Any]: Preview data

RAISES DESCRIPTION
HTTPError

For failed API requests. Common cases:

- 401: Invalid or expired token
- 403: Insufficient permissions
- 404: Data asset not found

ConnectionError

For network connectivity issues

Timeout

For request timeouts

RequestException

For other request-related errors
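Because additional_params is typed as Dict[str, str], numeric options such as limit and offset need to be converted to strings before the call. The following is a minimal sketch of such a conversion. `build_preview_params` is a hypothetical helper, and joining column names with commas is an assumption about the expected format rather than documented behaviour.

```python
from typing import Dict, Optional, Sequence


def build_preview_params(
    limit: Optional[int] = None,
    offset: Optional[int] = None,
    columns: Optional[Sequence[str]] = None,
) -> Dict[str, str]:
    """Build a string-valued query dictionary for a preview request."""
    params: Dict[str, str] = {}
    if limit is not None:
        params["limit"] = str(limit)
    if offset is not None:
        params["offset"] = str(offset)
    if columns:
        # Assumption: the API accepts a comma-separated column list.
        params["columns"] = ",".join(columns)
    return params


print(build_preview_params(limit=10, offset=20, columns=["id", "name"]))
```

The resulting dictionary could then be passed as additional_params to get_data_asset_data_preview.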

Source code in dadosfera/services/maestro/data_assets.py
def get_data_asset_data_preview(
    maestro_base_url: str,
    token: str,
    data_asset_id: str,
    additional_params: Dict[str, str] = {}
) -> Dict[str, Any]:
    """Fetch a data preview for a specific data asset.

    Retrieves a sample of rows from the data asset to preview its content. The preview
    typically includes a limited number of rows and columns for quick inspection.

    Args:
        maestro_base_url (str): Base URL of the Maestro instance
            (e.g., 'https://maestro.example.com/api').
        token (str): Authentication token for API access. Must have 'catalog:read'
            permission.
        data_asset_id (str): Unique identifier of the data asset
            (e.g., 'asset_abc123').
        additional_params (Dict[str, str], optional): Additional query parameters.
            Common parameters include:
            - limit: Maximum number of rows to return
            - offset: Number of rows to skip
            - columns: Specific columns to include
            Defaults to {}.

    Returns:
        Dict[str, Any]: Preview data

    Raises:
        requests.exceptions.HTTPError: For failed API requests. Common cases:
            - 401: Invalid or expired token
            - 403: Insufficient permissions
            - 404: Data asset not found
        requests.exceptions.ConnectionError: For network connectivity issues
        requests.exceptions.Timeout: For request timeouts
        requests.exceptions.RequestException: For other request-related errors

    """
    try:
        response = requests.get(
            f"{maestro_base_url}/catalog/data-asset/{data_asset_id}/preview",
            headers={"Content-Type": "application/json", "Authorization": token},
            params=additional_params,
        )
        response.raise_for_status()
        return response.json()
    # Log each failure, then re-raise so callers see the documented exceptions
    # instead of an implicit None return value.
    except requests.exceptions.HTTPError as errh:
        logger.error(f"HTTP Error fetching data preview: {errh}")
        raise
    except requests.exceptions.ConnectionError as errc:
        logger.error(f"Connection Error: {errc}")
        raise
    except requests.exceptions.Timeout as errt:
        logger.error(f"Timeout Error: {errt}")
        raise
    except requests.exceptions.RequestException as err:
        logger.error(f"Unexpected error fetching data preview: {err}")
        raise

dadosfera.services.maestro.data_assets.get_data_asset_table_metadata

get_data_asset_table_metadata(maestro_base_url, token, data_asset_id, additional_params={})

Fetch comprehensive metadata information about a data asset.

Retrieves detailed metadata about a specific data asset including its general properties, schema information, and configuration details.

PARAMETER DESCRIPTION
maestro_base_url

Base URL of the Maestro instance (e.g., 'https://maestro.example.com/api').

TYPE: str

token

Authentication token for API access. Must have 'catalog:read' permission.

TYPE: str

data_asset_id

Unique identifier of the data asset (e.g., 'asset_abc123').

TYPE: str

additional_params

Additional query parameters to include in the request. Defaults to {}.

TYPE: Dict[str, str] DEFAULT: {}

RETURNS DESCRIPTION
Dict[str, Any]

Dict[str, Any]: Data asset metadata, including general properties, schema information, and configuration details.

RAISES DESCRIPTION
HTTPError

For failed API requests. Common cases:

- 401: Invalid or expired token
- 403: Insufficient permissions
- 404: Data asset not found

ConnectionError

For network connectivity issues

Timeout

For request timeouts

RequestException

For other request-related errors
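The common failure cases listed above can be turned into readable messages on the caller's side. The sketch below maps status codes to the documented descriptions. `describe_failure` is a hypothetical helper; with the requests library, the status code of a caught HTTPError is typically available as `error.response.status_code`.

```python
from typing import Dict

# Descriptions taken from the common cases documented for HTTPError.
COMMON_FAILURES: Dict[int, str] = {
    401: "Invalid or expired token",
    403: "Insufficient permissions",
    404: "Data asset not found",
}


def describe_failure(status_code: int) -> str:
    """Translate an HTTP status code into a short diagnostic message."""
    return COMMON_FAILURES.get(status_code, f"Unexpected HTTP status {status_code}")


for code in (401, 403, 404, 500):
    print(code, describe_failure(code))
```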

Source code in dadosfera/services/maestro/data_assets.py
def get_data_asset_table_metadata(
    maestro_base_url: str,
    token: str,
    data_asset_id: str,
    additional_params: Dict[str, str] = {}
) -> Dict[str, Any]:
    """Fetch comprehensive metadata information about a data asset.

    Retrieves detailed metadata about a specific data asset including its general properties,
    schema information, and configuration details.

    Args:
        maestro_base_url (str): Base URL of the Maestro instance
            (e.g., 'https://maestro.example.com/api').
        token (str): Authentication token for API access. Must have 'catalog:read'
            permission.
        data_asset_id (str): Unique identifier of the data asset
            (e.g., 'asset_abc123').
        additional_params (Dict[str, str], optional): Additional query parameters to include
            in the request.
            Defaults to {}.

    Returns:
        Dict[str, Any]: Data asset metadata, including general properties,
            schema information, and configuration details.

    Raises:
        requests.exceptions.HTTPError: For failed API requests. Common cases:
            - 401: Invalid or expired token
            - 403: Insufficient permissions
            - 404: Data asset not found
        requests.exceptions.ConnectionError: For network connectivity issues
        requests.exceptions.Timeout: For request timeouts
        requests.exceptions.RequestException: For other request-related errors

    """
    try:
        response = requests.get(
            f"{maestro_base_url}/catalog/data-asset/{data_asset_id}",
            headers={"Content-Type": "application/json", "Authorization": token},
            params=additional_params,
        )
        response.raise_for_status()
        return response.json()
    # Log each failure, then re-raise so callers see the documented exceptions
    # instead of an implicit None return value.
    except requests.exceptions.HTTPError as errh:
        logger.error(f"HTTP Error fetching data asset metadata: {errh}")
        raise
    except requests.exceptions.ConnectionError as errc:
        logger.error(f"Connection Error: {errc}")
        raise
    except requests.exceptions.Timeout as errt:
        logger.error(f"Timeout Error: {errt}")
        raise
    except requests.exceptions.RequestException as err:
        logger.error(f"Unexpected error fetching data asset metadata: {err}")
        raise