S3
dadosfera.services.s3.list_s3_objects
list_s3_objects(bucket_name, prefix, aws_access_key_id=None, aws_secret_access_key=None)
List objects in an S3 bucket with pagination support.
This function lists objects in an AWS S3 bucket under a specified prefix, handling pagination automatically. It filters out zero-byte objects and validates the response status for each page of results.
PARAMETER | DESCRIPTION
---|---
`bucket_name` | Name of the S3 bucket. Example: `"my-company-data-bucket"`. TYPE: `str`
`prefix` | Prefix to filter objects in the bucket. Acts like a folder path in the S3 bucket. Example: `"data/2024/01/"` or `"logs/"`. TYPE: `str`
`aws_access_key_id` | AWS access key ID. If not provided, falls back to default credentials. Defaults to `None`. TYPE: `Optional[str]`
`aws_secret_access_key` | AWS secret access key. If not provided, falls back to default credentials. Defaults to `None`. TYPE: `Optional[str]`
RETURNS | DESCRIPTION
---|---
`List[Dict[str, Any]]` | List of S3 object metadata dictionaries.
RAISES | DESCRIPTION
---|---
`Exception` | When the S3 API returns a non-200 status code.
`ClientError` | When AWS API calls fail. Common cases: invalid credentials, insufficient permissions, bucket does not exist, network issues.
`NoCredentialsError` | When no AWS credentials are available and none are provided.
Examples:
List all objects in a specific prefix:
>>> objects = list_s3_objects('my-bucket', 'data/2024/')
>>> for obj in objects:
... print(f"Found {obj['Key']} of size {obj['Size']}")
Using explicit credentials:
>>> objects = list_s3_objects(
... 'my-bucket',
... 'logs/',
... aws_access_key_id='AKIAXXXXXXXXXXXXXXXX',
... aws_secret_access_key='XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
... )
Notes
- Uses us-east-1 region by default
- Automatically handles pagination of results
- Filters out zero-byte objects (typically folder markers)
- Uses boto3 session for AWS API calls
- Validates HTTP status code for each page
Performance Considerations
- For buckets with many objects, this function may make multiple API calls
- Consider using prefix to narrow down results
- Response time depends on number of objects and network conditions
- Memory usage scales with number of non-zero-byte objects
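The pagination, status validation, and zero-byte filtering described above can be sketched as follows. This is a minimal illustration, not the library's actual source: `filter_page` and `list_objects_sketch` are hypothetical helper names, and the boto3 paginator call assumes valid credentials are available at call time.

```python
from typing import Any, Dict, List


def filter_page(page: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Validate one ListObjectsV2 page and drop zero-byte objects (folder markers)."""
    status = page["ResponseMetadata"]["HTTPStatusCode"]
    if status != 200:
        raise Exception(f"S3 API returned status {status}")
    # Pages past the end of a listing may have no "Contents" key at all.
    return [obj for obj in page.get("Contents", []) if obj["Size"] > 0]


def list_objects_sketch(bucket_name: str, prefix: str) -> List[Dict[str, Any]]:
    """Accumulate filtered objects across every page of the listing."""
    import boto3  # AWS SDK; requires credentials when actually called

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    objects: List[Dict[str, Any]] = []
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        objects.extend(filter_page(page))
    return objects
```

The paginator abstracts away continuation tokens, which is why callers never see truncated results even for buckets with millions of keys.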
See Also
- AWS S3 ListObjects documentation: https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjects.html
- boto3 S3 documentation: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html
Source code in dadosfera/services/s3.py
dadosfera.services.s3.get_objects_from_s3
get_objects_from_s3(bucket_name, prefix)
Retrieve and decode objects from AWS S3 with automatic character encoding detection.
This function retrieves objects from an AWS S3 bucket, automatically detects their character encoding using chardet, and returns their decoded contents. It uses the list_s3_objects function to get object metadata before downloading each object individually.
PARAMETER | DESCRIPTION
---|---
`bucket_name` | Name of the S3 bucket to search. Example: `"my-company-data-bucket"`. TYPE: `str`
`prefix` | Prefix (folder path) to filter objects in the bucket. Example: `"data/2024/01/"` or `"logs/"`. TYPE: `str`
RETURNS | DESCRIPTION
---|---
`List[Dict[str, str]]` | List of dictionaries containing file information. Each dictionary contains: `file_content` (str), the decoded content of the file; `key` (str), the full S3 key/path of the object; `file_name` (str), the extracted file name without extension. Example: `[{'file_content': 'content of file1...', 'key': 'data/2024/01/file1.txt', 'file_name': 'file1'}, ...]`. Returns an empty list if no objects are found.
RAISES | DESCRIPTION
---|---
`ClientError` | When AWS API calls fail. Common cases: invalid credentials, insufficient permissions, bucket does not exist, object does not exist, network issues.
`UnicodeDecodeError` | When file content cannot be decoded with the detected encoding.
`NoCredentialsError` | When no AWS credentials are available.
Examples:
>>> objects = get_objects_from_s3('my-bucket', 'data/2024/')
>>> for obj in objects:
... print(f"File {obj['file_name']} content length: {len(obj['file_content'])}")
Notes
- Uses us-east-1 region by default
- Uses chardet to detect file encoding
- Logs operations at INFO and DEBUG levels
- Requires list_s3_objects function
- Returns empty list instead of None when no objects found
Dependencies
- boto3: AWS SDK for Python
- chardet: Character encoding detection
- logging: For operation logging
- list_s3_objects: Custom function for listing S3 objects
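The two per-object transformations described above, deriving `file_name` from the key and decoding bytes with a detected encoding, can be sketched like this. The helper names `extract_file_name` and `decode_content` are illustrative, not the library's actual internals, and the chardet import is guarded so the sketch degrades to UTF-8 when chardet is absent.

```python
import os
from typing import Optional


def extract_file_name(key: str) -> str:
    """Derive the extension-less file name from a full S3 key."""
    base = key.rsplit("/", 1)[-1]     # "data/2024/01/file1.txt" -> "file1.txt"
    return os.path.splitext(base)[0]  # "file1.txt" -> "file1"


def decode_content(raw: bytes) -> str:
    """Decode raw object bytes, preferring chardet's detected encoding."""
    try:
        import chardet  # optional dependency in this sketch
        encoding: Optional[str] = chardet.detect(raw)["encoding"]
    except ImportError:
        encoding = None
    # Fall back to UTF-8 when detection is unavailable or inconclusive.
    return raw.decode(encoding or "utf-8")
```

Note that a `UnicodeDecodeError` can still surface here when the detected encoding does not actually match the file's bytes, which is why the docstring lists it.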
See Also
- AWS S3 GetObject documentation: https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObject.html
- chardet documentation: https://chardet.readthedocs.io/en/latest/usage.html
Source code in dadosfera/services/s3.py
dadosfera.services.s3.get_s3_bucket_size
get_s3_bucket_size(bucket_name, prefix='', aws_access_key_id=None, aws_secret_access_key=None)
Calculate the total size of an S3 bucket or prefix in bytes.
PARAMETER | DESCRIPTION
---|---
`bucket_name` | Name of the S3 bucket to search. Example: `"my-company-data-bucket"`. TYPE: `str`
`prefix` | Prefix (folder path) to filter objects in the bucket. Example: `"data/2024/01/"` or `"logs/"`. TYPE: `str`
`aws_access_key_id` | AWS access key ID. If not provided, falls back to default credentials. Defaults to `None`. TYPE: `Optional[str]`
`aws_secret_access_key` | AWS secret access key. If not provided, falls back to default credentials. Defaults to `None`. TYPE: `Optional[str]`
RETURNS | DESCRIPTION
---|---
`total_size` | Total size in bytes of the objects in the bucket under the given prefix. TYPE: `int`
RAISES | DESCRIPTION
---|---
`ClientError` | When AWS API calls fail. Common cases: invalid credentials, insufficient permissions, bucket does not exist, object does not exist, network issues.
Examples:
>>> total_size = get_s3_bucket_size('my-bucket', 'data/2024/')
>>> print(f"Total size: {total_size} bytes")
Notes
- Uses us-east-1 region by default
Dependencies
- boto3: AWS SDK for Python
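Conceptually, the size calculation reduces to summing the `Size` field over every page of a paginated listing. A minimal sketch under that assumption (the helper name `sum_object_sizes` is illustrative, and the pages would come from a boto3 `list_objects_v2` paginator in practice):

```python
from typing import Any, Dict, Iterable


def sum_object_sizes(pages: Iterable[Dict[str, Any]]) -> int:
    """Accumulate the Size of every object across ListObjectsV2-style pages."""
    total_size = 0
    for page in pages:
        # Empty listings have no "Contents" key; treat them as zero bytes.
        for obj in page.get("Contents", []):
            total_size += obj["Size"]
    return total_size
```

Because sizes come from listing metadata rather than object downloads, the cost of this operation scales with the number of keys listed, not the bytes stored.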
Source code in dadosfera/services/s3.py