This page describes how to grant the Service Account User role to Cloud Data Fusion to allow it to provision and run pipelines on Dataproc clusters.
For service accounts that are used by Dataproc, you also need to grant the datafusion.instances.runtime permission to access Cloud Data Fusion runtime resources.
Whether you use a user-managed service account or the default Compute Engine service account on the virtual machines in a cluster, you must grant the Service Account User role to Cloud Data Fusion. Otherwise, Cloud Data Fusion cannot provision a Dataproc cluster, and the following error appears when you execute a data pipeline:
PROVISION task failed in REQUESTING_CREATE state for program run [pipeline-name] due to Dataproc operation failure: INVALID_ARGUMENT: User not authorized to act as service account '[service-account-name]'
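When scanning pipeline logs, it can help to pull the service account name out of this error, because that is the account on which Cloud Data Fusion needs the Service Account User role. The following is a minimal sketch that assumes the log line contains the message text exactly as quoted above; real log entries may add prefixes or other fields, so it searches rather than matching the whole line. The function name `unauthorized_service_account` is hypothetical.

```python
import re
from typing import Optional

# Matches the tail of the provisioning error quoted above. Real log lines may carry
# extra prefixes or fields, so search() is used rather than a full-line match.
_NOT_AUTHORIZED = re.compile(
    r"User not authorized to act as service account '(?P<account>[^']+)'"
)


def unauthorized_service_account(log_line: str) -> Optional[str]:
    """Return the service account named in the error, or None if the line does not match."""
    # Only consider provisioning failures.
    if "PROVISION task failed" not in log_line:
        return None
    match = _NOT_AUTHORIZED.search(log_line)
    return match.group("account") if match else None
```

A return value of `None` means the line is not this particular authorization failure.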
Get the service account name
- In the Google Cloud console, go to the Identity and Access Management (IAM) page.
- From the project selector at the top of the page, choose the project, folder, or organization to which the Cloud Data Fusion instance belongs.
- Find and copy the Cloud Data Fusion service account name. It has the following format:
service-[project-number]@gcp-sa-datafusion.iam.gserviceaccount.com
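Because the name follows a fixed format, you can also derive it from the project number instead of copying it from the console. The following is a minimal sketch; the function name `data_fusion_service_account` is hypothetical. Note that the format uses the numeric project number, not the project ID.

```python
from typing import Union


def data_fusion_service_account(project_number: Union[int, str]) -> str:
    """Build the Cloud Data Fusion service account email for a project number."""
    number = str(project_number).strip()
    # The format uses the numeric project number, not the (alphanumeric) project ID.
    if not (number.isascii() and number.isdigit()):
        raise ValueError(f"expected a numeric project number, got {project_number!r}")
    return f"service-{number}@gcp-sa-datafusion.iam.gserviceaccount.com"
```

Passing a project ID such as `my-project` raises `ValueError`, which catches a common mix-up between the two identifiers.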
Give service account user permission
- In the Google Cloud console, go to the Service Accounts page.
- Click Select a project, choose the project where the service account that you want to use for the Dataproc cluster is located, and then click Open.
- Click the email address of the Dataproc service account.
- Click the Permissions tab. The page displays a list of principals that have been granted roles on the service account.
- Click Grant access.
- In the New principals field, paste the Cloud Data Fusion service account name that you copied previously.
- Select the Service Account User role.
- Click Save.
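The console steps above amount to a read-modify-write of the Dataproc service account's IAM policy. The following sketch shows only the "modify" step, assuming the policy is held as a dict shaped like the IAM policy JSON (a `bindings` list whose entries have `role` and `members`). Reading and writing the policy is not shown; the `gcloud iam service-accounts add-iam-policy-binding` command performs the whole sequence. The helper name `grant_role` is hypothetical.

```python
import copy
from typing import Any, Dict

# Predefined role that lets a principal act as (attach) a service account.
SERVICE_ACCOUNT_USER = "roles/iam.serviceAccountUser"


def grant_role(policy: Dict[str, Any], role: str, member: str) -> Dict[str, Any]:
    """Return a copy of an IAM policy dict with `member` added to `role`.

    Hypothetical helper: it only edits the dict. Conditional bindings are left
    alone, and adding a member that is already present is a no-op.
    """
    updated = copy.deepcopy(policy)
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        # Reuse an existing unconditional binding for the same role.
        if binding.get("role") == role and "condition" not in binding:
            members = binding.setdefault("members", [])
            if member not in members:
                members.append(member)
            return updated
    # No matching binding yet, so create one.
    bindings.append({"role": role, "members": [member]})
    return updated
```

For example, `grant_role(policy, SERVICE_ACCOUNT_USER, "serviceAccount:service-123456789012@gcp-sa-datafusion.iam.gserviceaccount.com")` adds the Cloud Data Fusion service account (placeholder project number) to the binding. Returning a copy leaves the original policy untouched if the later write step fails.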
Grant roles to Dataproc service accounts
Grant runner role permission
Grant the Cloud Data Fusion runner role (roles/datafusion.runner) to service accounts that are used by Dataproc. This authorizes the Dataproc service account to run Cloud Data Fusion pipelines in your project. For more information, see Requiring permission to attach service accounts to resources.
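If you script this grant, one option is the `gcloud projects add-iam-policy-binding` command with its `--member` and `--role` flags. The following sketch only builds the argument list so it can be reviewed before running; the function name, project ID, and service account email are hypothetical placeholders.

```python
from typing import List

# Role named in the text above.
RUNNER_ROLE = "roles/datafusion.runner"


def runner_binding_command(project_id: str, dataproc_sa_email: str) -> List[str]:
    """Build (but do not run) a gcloud command that grants the runner role on a project."""
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        f"--member=serviceAccount:{dataproc_sa_email}",
        f"--role={RUNNER_ROLE}",
    ]
```

The list can be passed to `subprocess.run` or rendered for review with `shlex.join`. Building an argument list instead of a single string avoids shell-quoting problems.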
Grant Cloud Storage admin permission
In Cloud Data Fusion versions 6.2.0 and later, grant the Storage Admin role (roles/storage.admin) to service accounts that are used by Dataproc in your project.
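The roles in this section depend on the Cloud Data Fusion version. The following sketch lists the project-level roles for a Dataproc service account. It assumes that the runner role applies to every version (the text gives no version limit for it) and that version strings begin with `major.minor[.patch]`. The function names are hypothetical.

```python
import re
from typing import List, Tuple

RUNNER_ROLE = "roles/datafusion.runner"
STORAGE_ADMIN_ROLE = "roles/storage.admin"


def _parse_version(version: str) -> Tuple[int, int, int]:
    """Parse the leading major.minor[.patch] of a version string; patch defaults to 0."""
    match = re.match(r"^(\d+)\.(\d+)(?:\.(\d+))?", version.strip())
    if not match:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def required_dataproc_roles(data_fusion_version: str) -> List[str]:
    """Return the project roles to grant to a Dataproc service account."""
    roles = [RUNNER_ROLE]  # Assumed to be needed in all versions.
    # Storage Admin is needed in 6.2.0 and later.
    if _parse_version(data_fusion_version) >= (6, 2, 0):
        roles.append(STORAGE_ADMIN_ROLE)
    return roles
```

Comparing integer tuples, rather than raw strings, orders versions such as 6.10.1 correctly after 6.2.0.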
What's next
- Learn more about Access control in Cloud Data Fusion.
- Learn more about Cloud Data Fusion service accounts.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2024-09-10 UTC.