  1. Strategy for partitioning dask dataframes efficiently

    Jun 20, 2017 · The documentation for Dask talks about repartitioning to reduce overhead here. However, they seem to indicate you need some knowledge of what your dataframe will look like …

  2. Dask DataFrame.to_parquet fails on read - Stack Overflow

    Mar 15, 2022 · Use dask.dataframe.read_parquet or other dask I/O implementations, not dask.delayed wrapping pandas I/O operations, whenever possible. Giving dask direct access to the file object or …

  3. How to transform Dask.DataFrame to pd.DataFrame?

    Aug 18, 2016 · How can I transform my resulting dask.DataFrame into a pandas.DataFrame (say I am done with the heavy lifting and just want to apply sklearn to my aggregated result)?

  4. Converting a DataFrame from pandas to dask - Stack Overflow

    Oct 22, 2020 · I followed the dask.dataframe.from_pandas documentation, which has optional arguments called npartitions and chunksize, so I tried writing something like this: import dask.dataframe …

  5. python - Why does Dask perform so much slower while multiprocessing …

    Sep 6, 2019 · 36 dask delayed 10.288054704666138s. My CPU has 6 physical cores. Question: Why does Dask perform so much slower while multiprocessing performs so much faster? Am I using Dask the wrong …

  6. dask: difference between client.persist and client.compute

    Jan 23, 2017 · More pragmatically, I recommend using persist when your result is large and needs to be spread among many computers and using compute when your result is small and you want it on just …

  7. Dask does not use all workers and behaves differently with different ...

    Apr 21, 2023 · Workers: 15, Threads: 15, Memory: 22.02 GiB, Dask version: 2023.2.0, Dask.Distributed version: 2023.2.0. 10 nodes: If I use 10 nodes, the calculations are interrupted after 40-45 minutes (40% …

  8. python - Difference between dask.distributed LocalCluster with threads ...

    Sep 2, 2019 · What is the difference between the following LocalCluster configurations for dask.distributed? Client(n_workers=4, processes=False, threads_per_worker=1) versus …

  9. Reading an SQL query into a Dask DataFrame - Stack Overflow

    May 24, 2022 · I'm trying to create a function that takes an SQL SELECT query as a parameter and uses Dask to read its results into a Dask DataFrame with the dask.dataframe.read_sql_query function.

  10. dask - distributed.worker Memory use is high but worker has no data …

    Feb 11, 2020 · The warning also says that Dask itself isn't holding on to any data, so there isn't much that it can do to help the situation (like remove its data). My guess is that some of the libraries that …
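The slowdown question does not show the asker's code, so the following is an assumption about the cause rather than a diagnosis: `dask.delayed` uses the threaded scheduler by default, and a pure-Python CPU-bound function holds the GIL, so the threads largely take turns. A sketch comparing schedulers on a hypothetical `cpu_bound` function; `scheduler="processes"` sidesteps the GIL much as `multiprocessing` does:

```python
import time

import dask


def cpu_bound(n):
    # Hypothetical pure-Python arithmetic loop: it holds the GIL throughout,
    # so several threads cannot execute it in parallel.
    total = 0
    for i in range(300_000):
        total += (i * n) % 7
    return total


def run(scheduler, n_tasks=6):
    # Build n_tasks independent delayed calls and time them on one scheduler.
    tasks = [dask.delayed(cpu_bound)(i) for i in range(n_tasks)]
    start = time.perf_counter()
    results = dask.compute(*tasks, scheduler=scheduler)
    return results, time.perf_counter() - start


if __name__ == "__main__":
    # The guard matters for "processes": worker processes may re-import this file.
    for scheduler in ("threads", "processes"):
        _, elapsed = run(scheduler)
        print(f"{scheduler:>9}: {elapsed:.2f}s")
```

Timings depend on the machine; for NumPy or pandas code that releases the GIL, the default threads are usually adequate.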
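The `LocalCluster` question turns on `processes=False`. A sketch, assuming `distributed` is installed, of the one configuration the snippet spells out; the comments summarize the threads-vs-processes trade-off. The helper `start_client` is hypothetical:

```python
from dask.distributed import Client


def start_client(processes: bool) -> Client:
    # processes=True (the LocalCluster default): each worker is a separate
    # Python process, so pure-Python work avoids GIL contention, but data
    # moving between workers must be serialized.
    # processes=False: all workers run as threads inside this process and
    # share its memory and its GIL; convenient for debugging or for code that
    # releases the GIL, such as NumPy and pandas.
    return Client(n_workers=4, processes=processes, threads_per_worker=1)


if __name__ == "__main__":
    client = start_client(processes=False)
    client.wait_for_workers(4)
    print(client.nthreads())  # worker address -> threads per worker
    client.close()  # also shuts down the cluster this client started
```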