Importing data into Google Colaboratory
What are the common ways to import private data into a Google Colaboratory notebook? Is it possible to import a private Google Sheet? You can't read from system files. The introductory docs link to a guide on using BigQuery, but that seems a bit...
https://colab.research.google.com/notebooks/io.ipynb : an official example notebook demonstrating local file upload/download and integration with Drive and Sheets.
The simplest way to share files is to mount your Google Drive.
To do this, run the following in a code cell:
from google.colab import drive
drive.mount('/content/drive')
Afterwards, your Drive files are mounted and you can browse them with the file browser in the side panel.
A full example is the io.ipynb notebook linked above.
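Once mounted, files on your Drive can be read with ordinary Python or pandas. A minimal sketch, where the path below is a placeholder for your own file:
import pandas as pd

# '/content/drive/My Drive/data.csv' is a hypothetical path; adjust it to your Drive layout
df = pd.read_csv('/content/drive/My Drive/data.csv')
print(df.head())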
Upload
from google.colab import files
files.upload()
Download
files.download('filename')
List the files in the working directory
import os
os.listdir('.')  # Colab's working directory is /content by default
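Putting the three together, a minimal round-trip sketch (results.csv is a hypothetical file name; pick your own in the dialog):
from google.colab import files
import os

uploaded = files.upload()        # opens a file picker; choose e.g. results.csv
print(os.listdir('.'))           # confirm the file landed in the working directory
files.download('results.csv')    # send a file from the Colab VM back to your machine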
A simple way to import data from your Google Drive - this saves people time (I don't know why Google doesn't list this step explicitly).
Install PyDrive and authenticate
!pip install -U -q PyDrive  ## you will have to install this for every Colab session
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
Upload
If you need to upload data from your local drive:
from google.colab import files
uploaded = files.upload()
for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(name=fn, length=len(uploaded[fn])))
When you run this, a file-selection button appears; find the file you want to upload and click Open.
Once the upload finishes, you will see:
sample_file.json (text/plain) - 11733 bytes, last modified: x/xx/2018 - 100% done
User uploaded file "sample_file.json" with length 11733 bytes
Creating the file for your notebook
If your data file is already in your Google Drive, you can skip to this step.
The file is now in your Google Drive. Find it in Google Drive, right-click it, and click 'Get shareable link'. A window like the following will appear:
https://drive.google.com/open?id=29PGh8XCts3mlMP6zRphvnIcbv27boawn
Copy '29PGh8XCts3mlMP6zRphvnIcbv27boawn' - that is the file ID.
In your notebook:
json_import = drive.CreateFile({'id':'29PGh8XCts3mlMP6zRphvnIcbv27boawn'})
json_import.GetContentFile('sample.json')  # 'sample.json' is the file name that will be accessible in the notebook
Importing the data into the notebook
To import the data you uploaded into the notebook (a JSON file in this example - how you load it will depend on the file/data type: .txt, .csv, etc.):
import json
sample_uploaded_data = json.load(open('sample.json'))
Now you can print to see the data is there:
print(sample_uploaded_data)
Step 1 - Mount your Google Drive in Colaboratory:
from google.colab import drive
drive.mount('/content/gdrive')
Step 2 - Now you will see your Google Drive files in the left pane (file explorer). Right-click the file that you need to import and select 'Copy path'. Then import as usual in pandas, using this copied path.
import pandas as pd
df=pd.read_csv('gdrive/My Drive/data.csv')
Done!
The simplest approach I've found is:
- Make a repository on GitHub with your dataset
- Clone your repository with !git clone --recursive [GITHUB LINK REPO]
- Find where your data is (!ls command)
- Open the file with pandas as you would in a normal Jupyter notebook (see the sketch after this list)
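A minimal sketch of that flow, with a hypothetical repository URL and file path:
!git clone --recursive https://github.com/your-user/your-dataset-repo.git  # hypothetical repo
!ls your-dataset-repo                                                      # locate the data file

import pandas as pd
df = pd.read_csv('your-dataset-repo/data/train.csv')                       # hypothetical path
df.head()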
This allows you to upload your files through Google Drive.
Run the below code (found this somewhere previously but I can't find the source again - credits to whoever wrote it!):
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse
from google.colab import auth
auth.authenticate_user()
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}
Click on the first link that comes up, which will prompt you to sign in to Google; after that another will appear asking for permission to access your Google Drive.
Then, run this which creates a directory named 'drive', and links your Google Drive to it:
!mkdir -p drive
!google-drive-ocamlfuse drive
If you run !ls now, there will be a directory called drive, and if you run !ls drive you can see all the contents of your Google Drive.
So, for example, if I save a file called abc.txt in a folder called ColabNotebooks in my Google Drive, I can now access it via the path drive/ColabNotebooks/abc.txt
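As a short sketch of that last step (abc.txt and ColabNotebooks are just the example names used above):
with open('drive/ColabNotebooks/abc.txt') as f:
    print(f.read())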
On the left bar of any Colaboratory notebook there is a section called "Files". Upload your files there and use this path:
"/content/YourFileName.extension"
e.g. pd.read_csv('/content/Forbes2015.csv')
The simplest solution I have found so far which works perfectly for small to mid-size CSV files is:
- Create a secret gist on gist.github.com and upload (or copy-paste the content of) your file.
- Click on the Raw view and copy the raw file URL.
- Use the copied URL as the file address when you call pandas.read_csv(URL) (a sketch follows below).
This may or may not work for reading a text file line by line or binary files.
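A minimal sketch, with a placeholder raw gist URL standing in for your own:
import pandas as pd

# replace the placeholder below with the raw URL copied from your gist
url = 'https://gist.githubusercontent.com/your-user/your-gist-id/raw/data.csv'
df = pd.read_csv(url)
df.head()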
Quick and easy import from Dropbox:
!pip install dropbox
import dropbox
access_token = 'YOUR_ACCESS_TOKEN_HERE' # https://www.dropbox.com/developers/apps
dbx = dropbox.Dropbox(access_token)
# response = dbx.files_list_folder("")
metadata, res = dbx.files_download('/dataframe.pickle2')
with open('dataframe.pickle2', "wb") as f:
    f.write(res.content)
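If, as the file name suggests, dataframe.pickle2 is a pickled pandas DataFrame, a hedged follow-up to load it would be:
import pandas as pd

# assumes the downloaded file really is a pandas pickle; otherwise use the loader matching your data
df = pd.read_pickle('dataframe.pickle2')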
You can also use my implementations on google.colab and PyDrive at https://github.com/ruelj2/Google_drive which makes it a lot easier.
!pip install -U -q PyDrive
import os
os.chdir('/content/')
!git clone https://github.com/ruelj2/Google_drive.git
from Google_drive.handle import Google_drive
Gd = Google_drive()
Then, if you want to load all files in a Google Drive directory, just
Gd.load_all(local_dir, drive_dir_ID, force=False)
Or just a specific file with
Gd.load_file(local_dir, file_ID)
This has been solved; details are in this answer, and you can use the function below: https://stackoverflow.com/questions/47212852/how-to-import-and-read-a-shelve-or-numpy-file-in-google-colaboratory/49467113#49467113
from google.colab import files
import zipfile, io, os
def read_dir_file(case_f):
    # author: yasser mustafa, 21 March 2018
    # case_f = 0 for uploading one file; case_f = 1 for uploading one zipped directory
    uploaded = files.upload()  # to upload a full directory, please zip it first (e.g. with WinZip)
    for fn in uploaded.keys():
        name = fn  #.encode('utf-8')
        #print('\nfile after encode', name)
        #name = io.BytesIO(uploaded[name])
    if case_f == 0:  # case of uploading 'One File only'
        print('\n file name: ', name)
        return name
    else:  # case of uploading a directory and its subdirectories and files
        zfile = zipfile.ZipFile(name, 'r')  # unzip the directory
        zfile.extractall()
        for d in zfile.namelist():  # d = directory
            print('\n main directory name: ', d)
        return d

print('Done!')
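A brief usage sketch, assuming a single CSV file is selected in the upload dialog (the variable names here are illustrative):
import pandas as pd

csv_name = read_dir_file(0)  # case_f = 0: upload one file and get its name back
df = pd.read_csv(csv_name)   # assumes the uploaded file is a CSV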
Here is one way to import files from Google Drive into notebooks.
Open the notebook, run the code below, and complete the authentication process:
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse
from google.colab import auth
auth.authenticate_user()
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}
Once you are done with the code above, run the code below to mount Google Drive:
!mkdir -p drive
!google-drive-ocamlfuse drive
Importing files from Google Drive into the notebook (e.g. Colab_Notebooks/db.csv)
Let's say your dataset file is in the Colab_Notebooks folder and its name is db.csv.
import pandas as pd
dataset=pd.read_csv("drive/Colab_Notebooks/db.csv")
I hope this helps.
If you want to do this without code, it's pretty easy. Zip your folder; in my case it is
dataset.zip
Then in Colab, right-click the folder where you want to put this file, click Upload, and upload the zip file. Then run this Linux command:
!unzip <your_zip_file_name>
You will see that the data has been uploaded successfully.
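Once extracted, read the files as usual; for example (the path below is hypothetical and depends on your archive's layout):
import pandas as pd

df = pd.read_csv('dataset/train.csv')  # hypothetical path inside the extracted dataset.zip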
As @Vivek Solanki mentioned, I uploaded the file in the "Files" section of the Colaboratory dashboard. Just take note of where the file was uploaded. For me,
train_data = pd.read_csv('/fileName.csv')
worked.
Reference: https://stackoverflow.com/questions/46986398/import-data-into-google-colaboratory