Problems encountered when processing large data with numpy

While using numpy to read a .csv file with more than four million rows, the following exception was raised:

numpy.core._exceptions.MemoryError: Unable to allocate array with shape (4566386, 23) and data type <U20

Here is my source code:

import numpy as np
import matplotlib.pyplot as mp
import sklearn.ensemble as se
import sklearn.metrics as sm
headers = None
data = []
with open('/home/tarena/桌面/i-80.csv', 'r') as f:
    for i, line in enumerate(f.readlines()):
        if i == 0:
            headers = line.split(',')[2:]  # column names, minus the first two columns
        else:
            data.append(line.split(',')[2:])  # every field is kept as a string
headers = np.array(data)  # line 13 in the traceback; np.array(headers) was probably intended
data = np.array(data)
print(headers.shape)
print(data.shape)

Here is the output:

Traceback (most recent call last):
  File "/home/tarena/桌面/read_forest.py", line 13, in <module>
    headers = np.array(data)
numpy.core._exceptions.MemoryError: Unable to allocate array with shape (4566386, 23) and data type <U20

Process finished with exit code 1
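
For a sense of scale, the size of the array named in the error message can be estimated from its shape and dtype. The sketch below uses only the numbers from the traceback; the one fact it relies on is that NumPy stores each character of a `<U20` string in 4 bytes, so every element takes 80 bytes regardless of how short the actual field is.

```python
import numpy as np

# Shape and dtype copied from the error message.
rows, cols = 4566386, 23
dtype = np.dtype('<U20')

# '<U20' is a fixed-width Unicode string: 20 characters x 4 bytes each.
bytes_per_item = dtype.itemsize
total_bytes = rows * cols * bytes_per_item

print(bytes_per_item)                   # 80
print(total_bytes)                      # 8402150240
print(round(total_bytes / 2**30, 2))    # size in GiB, about 7.8
```

So the string array alone would need close to 8 GiB, on top of the Python list of strings that `data` already holds. If the columns are in fact numeric (an assumption, the post does not say), the same shape as float32 would need about 0.4 GiB.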

Even though an error was raised, I still got a result.

Does anyone have a solution?
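
One possible direction, offered as a minimal sketch rather than a confirmed fix: convert each field to a number while reading, so that no large string array is ever built. This assumes every column after the first two is numeric, which the post does not state. The function name `load_numeric_csv` and its parameters are hypothetical, introduced only for illustration.

```python
import csv
import numpy as np

def load_numeric_csv(path, skip_cols=2, dtype=np.float32):
    """Read a CSV into a compact numeric array, dropping the first skip_cols columns.

    Hypothetical helper: assumes all remaining columns parse as floats and
    that every row has the same number of fields.
    """
    with open(path, 'r', newline='') as f:
        reader = csv.reader(f)
        headers = next(reader)[skip_cols:]  # first row holds the column names
        # Stream values through a generator so no list of strings is kept in memory.
        flat = np.fromiter(
            (float(v) for row in reader for v in row[skip_cols:]),
            dtype=dtype,
        )
    data = flat.reshape(-1, len(headers))
    return np.array(headers), data
```

It could be called as `headers, data = load_numeric_csv('/home/tarena/桌面/i-80.csv')`. If some columns hold text, the `float(v)` conversion raises a `ValueError`, and those columns would need to be dropped or encoded separately.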

Source: https://www.cnblogs.com/bitrees/p/11369327.html